[ABI Team Seminar] Talk by Frederik Kunstner (UBC) on “Adaptive Methods in Machine Learning and Why Adam Works so Well”

Wed, 29 May 2024 14:30 - 15:30 JST

Free admission


This talk will be held in a hybrid format: in person at the AIP Open Space of RIKEN AIP (Nihonbashi office) and online via Zoom. Note: the AIP Open Space is available only to AIP researchers.

May 29, 2024: 14:30 - 15:30 (JST)

Adaptive Methods in Machine Learning and Why Adam Works so Well

Frederik Kunstner (University of British Columbia)

The success of the Adam optimizer has made it the default in settings where stochastic gradient descent (SGD) performs poorly. However, our theoretical understanding of why Adam performs better is lagging. The literature presents many competing interpretations and hypotheses, but we do not yet have a clear understanding of which (if any) captures the key problem that Adam “fixes” to outperform SGD. This talk presents empirical results that evaluate recently developed assumptions to model difficulties of modern architectures such as large language models, where a large performance gap between SGD and Adam has been observed. We isolate a key property of language problems — a large vocabulary with a heavy-tailed, unbalanced distribution of output classes — as a potential cause of this performance gap.

Frederik Kunstner is a fifth-year PhD student at the University of British Columbia, working with Mark Schmidt. His work is at the intersection of the theory of optimization methods and their application to machine learning, focusing on modeling the difficulties involved in training modern models. Prior to his PhD, Frederik studied at EPFL in Switzerland and interned at the RIKEN Center for Advanced Intelligence Project with Emtiyaz Khan in Japan and at the Max Planck Institute for Intelligent Systems with Philipp Hennig in Germany.
