This is an online seminar. Registration is required.
【Deep Learning Theory Team】
【Date】2023/April/6 (Thu) 16:00-17:00 (JST)
【Speaker】Stefano Massaroli, Mila–Quebec AI Institute / Université de Montréal
【Title】Toward Large Convolutional Sequence Models
【Abstract】
In the realm of deep learning, large Transformers have proven effective due to their ability to learn at scale. However, the attention operator, a core building block of Transformers, exhibits quadratic cost in sequence length, making it challenging to access large contexts. In this talk, we will explore how long convolutions may provide a subquadratic drop-in replacement for attention. We will start from classic signal processing arguments and follow our research journey, which began with continuous-depth learning and neural differential equations and culminated in the development of “Hyena”, our latest sequence architecture. Hyena leverages implicitly parametrized convolutions interleaved with data-controlled gating and matches the performance of large Transformers on long-range reasoning and natural language modeling tasks. We will provide insights into the inner workings of Hyena through the lens of system theory, shedding light on how it enables efficient learning at scale. Join us as we explore the power of large convolutional sequence models and our journey to develop the Hyena architecture.
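To make the subquadratic claim concrete: a convolution with a filter as long as the sequence can be evaluated with FFTs in O(L log L) time, versus the O(L²) cost of attention. The sketch below is an illustrative assumption, not the official Hyena code; the function name `fft_long_conv` and the single-gate setup are hypothetical simplifications (in Hyena, the long filters are produced implicitly by a small network and several gated convolutions are composed).

```python
import torch

def fft_long_conv(u: torch.Tensor, k: torch.Tensor) -> torch.Tensor:
    """Long convolution via FFT: O(L log L) in sequence length L.

    u: input sequences, shape (..., L)
    k: filter of length L, broadcastable against u
    """
    L = u.shape[-1]
    # Zero-pad to 2L so the circular FFT convolution realizes a
    # linear (causal) convolution instead of wrapping around.
    u_f = torch.fft.rfft(u, n=2 * L)
    k_f = torch.fft.rfft(k, n=2 * L)
    return torch.fft.irfft(u_f * k_f, n=2 * L)[..., :L]

# Toy usage: a batch of sequences convolved with a filter as long as
# the sequence itself, then modulated by a data-controlled gate
# (elementwise multiplication), in the spirit of the Hyena operator.
B, D, L = 2, 8, 1024
u = torch.randn(B, D, L)       # input signal
k = torch.randn(D, L)          # long filter (implicit in Hyena)
gate = torch.randn(B, D, L)    # in practice, a projection of the input
y = gate * fft_long_conv(u, k)
```

Both the FFT convolution and the gating cost at most O(L log L) per channel, which is what lets such operators scale to contexts where quadratic attention becomes impractical.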
Public events of RIKEN Center for Advanced Intelligence Project (AIP)