RIKEN AIP Public

Talk by Mr. Shaojie Bai (CMU)

Start: 2019-11-29T15:45:00+09:00
End: 2019-11-29T16:30:00+09:00
Location: RIKEN AIP KYOTO, Seminar Room (Research Bldg No15)

Fri, 29 Nov 2019 15:45 - 16:30 JST

Artificial Intelligence Research Unit, Graduate School of Informatics, Kyoto University, Yoshida Honmachi, Sakyo-ku, Kyoto, 606-8501, Japan

Free admission

Description

Speaker: Shaojie Bai (Carnegie Mellon University)
https://jerrybai1995.github.io/

Title: Deep Equilibrium Models: One “Implicit” Layer is All You Need (NeurIPS 2019, spotlight oral)

Abstract: Deep learning has long focused on the hierarchy of representations, which is usually learned better by adding layers (i.e., depth) to increase both a model’s complexity and its expressivity. In this work, we revisit this view and argue for an alternative perspective, in which we define only one layer and the model’s output is defined implicitly. We show how this one-layer model is equivalent to an infinite-depth model, and how it reshapes our view of deep learning through the concepts of equilibria and dynamical systems. Specifically, we introduce the deep equilibrium (DEQ) model and discuss how we can 1) solve for the equilibria of this implicit-depth model directly via (black-box) quasi-Newton methods; 2) backpropagate directly from these equilibria with O(1) memory (whereas typical deep networks need O(L) memory for L layers); and 3) theoretically analyze the universality of the DEQ model’s representational power (i.e., prove that “one layer” is really all you need). Finally, we demonstrate that the DEQ approach does not depend on any particular architectural choice, and that it scales to large, realistic, high-dimensional sequence tasks, with results on par with (or better than) SOTA architectures (e.g., Transformers), despite using only a single layer and reducing memory consumption by up to 88%. This work is based on the NeurIPS 2019 paper “Deep Equilibrium Models”.
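The two computational ideas in points 1) and 2) of the abstract, solving for an equilibrium with a quasi-Newton method and backpropagating from that equilibrium without storing solver iterates, can be illustrated with a small NumPy sketch. It is not the authors' code. It assumes a hypothetical single layer f(z, x) = tanh(W z + U x + b) with W rescaled to spectral norm 0.5 so that a unique equilibrium z* = f(z*, x) exists; the names make_params, layer, broyden_solve, deq_forward and deq_backward are illustrative only. The forward pass runs Broyden's ("good") method on g(z) = f(z, x) - z. The backward pass uses the implicit function theorem, solving (I - J_f)^T u = dL/dz* with J_f = df/dz evaluated only at z*, so memory does not grow with the number of solver steps.

```python
import numpy as np


def make_params(n=6, d=3, seed=0):
    """Hypothetical parameters for a toy layer f(z, x) = tanh(W z + U x + b)."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((n, n))
    W *= 0.5 / np.linalg.norm(W, 2)  # spectral norm 0.5 makes f a contraction in z
    U = rng.standard_normal((n, d))
    b = rng.standard_normal(n)
    return W, U, b


def layer(z, x, W, U, b):
    """The single (weight-tied) layer whose fixed point defines the model output."""
    return np.tanh(W @ z + U @ x + b)


def broyden_solve(g, z0, tol=1e-12, max_iter=200):
    """Find a root of g with Broyden's good method, tracking an inverse-Jacobian estimate B."""
    z = z0.copy()
    gz = g(z)
    # g(z) = f(z) - z, so its Jacobian is close to -I when f is contractive.
    B = -np.eye(z.size)
    for _ in range(max_iter):
        if np.linalg.norm(gz) < tol:
            return z
        dz = -B @ gz  # quasi-Newton step
        z_new = z + dz
        g_new = g(z_new)
        dg = g_new - gz
        Bdg = B @ dg
        denom = dz @ Bdg
        # Sherman-Morrison rank-one update of B; skipped if numerically degenerate.
        if abs(denom) > 1e-12 * np.linalg.norm(dz) * np.linalg.norm(Bdg):
            B += np.outer(dz - Bdg, dz @ B) / denom
        z, gz = z_new, g_new
    if np.linalg.norm(gz) < tol:
        return z
    raise RuntimeError("Broyden's method did not converge")


def deq_forward(x, params):
    """Equilibrium z* = f(z*, x), found directly instead of by stacking layers."""
    W, U, b = params
    return broyden_solve(lambda z: layer(z, x, W, U, b) - z, np.zeros(W.shape[0]))


def deq_backward(z_star, x, params, grad_z):
    """Implicit gradients of a loss L(z*) w.r.t. W and x, using only z* (no stored iterates)."""
    W, U, b = params
    f = layer(z_star, x, W, U, b)
    s = 1.0 - f ** 2          # tanh' at the pre-activation
    J = s[:, None] * W        # J_f = df/dz at the equilibrium
    u = np.linalg.solve((np.eye(J.shape[0]) - J).T, grad_z)  # (I - J_f)^T u = dL/dz*
    v = u * s
    grad_W = np.outer(v, z_star)  # dL/dW_ij = u_i * s_i * z*_j
    grad_x = U.T @ v              # dL/dx = U^T (u * s)
    return grad_W, grad_x


if __name__ == "__main__":
    params = make_params()
    x = np.array([0.3, -1.2, 0.7])
    z_star = deq_forward(x, params)
    # Example loss L = 0.5 * ||z*||^2, so dL/dz* = z*.
    grad_W, grad_x = deq_backward(z_star, x, params, z_star)
    print("equilibrium:", z_star)
    print("dL/dx:", grad_x)
```

The dense `np.linalg.solve` call is only reasonable for this toy dimension; at realistic sizes the backward linear system would itself be solved iteratively, since forming J_f explicitly is expensive.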

About this community

Public events of RIKEN Center for Advanced Intelligence Project (AIP)