This talk will be held in a hybrid format: in person at the AIP Open Space of RIKEN AIP (Nihonbashi office) and online via Zoom. *The AIP Open Space is only available to AIP researchers.
DATE & TIME
May 29, 2024: 14:30 - 15:30 (JST)
TITLE
Adaptive Methods in Machine Learning and Why Adam Works so Well
SPEAKER
Frederik Kunstner (University of British Columbia)
ABSTRACT
The success of the Adam optimizer has made it the default in settings where stochastic gradient descent (SGD) performs poorly. However, our theoretical understanding of why Adam performs better is lagging. The literature presents many competing interpretations and hypotheses, but we do not yet have a clear understanding of which (if any) captures the key problem that Adam "fixes" to outperform SGD. This talk presents empirical results evaluating recently proposed assumptions for modeling the difficulties of modern architectures such as large language models, where a large performance gap between SGD and Adam has been observed. We isolate a key property of language problems, namely a large vocabulary with a heavy-tailed, unbalanced distribution of output classes, as a potential cause of this performance gap.
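For readers unfamiliar with the setup the abstract describes, the sketch below (not taken from the talk) trains a toy linear softmax classifier on synthetic data with a Zipf-like, heavy-tailed label distribution, once with plain SGD and once with the standard Adam update. All names, sizes, and learning rates are illustrative assumptions, not the speaker's experiments.

```python
# Illustrative sketch only: standard SGD and Adam updates on a toy softmax
# classifier with a heavy-tailed (Zipf-like) distribution of output classes.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "language-like" data: many classes, heavy-tailed label frequencies.
n_classes, n_features, n_samples = 200, 50, 5000
class_probs = 1.0 / np.arange(1, n_classes + 1)            # Zipf-like weights
class_probs /= class_probs.sum()
labels = rng.choice(n_classes, size=n_samples, p=class_probs)
class_means = rng.normal(size=(n_classes, n_features))
features = class_means[labels] + rng.normal(size=(n_samples, n_features))

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def batch_gradient(W, X, y):
    """Average cross-entropy gradient for a linear softmax model."""
    probs = softmax(X @ W)                                  # (batch, classes)
    probs[np.arange(len(y)), y] -= 1.0
    return X.T @ probs / len(y)                             # (features, classes)

def train(optimizer, lr, steps=2000, batch_size=64):
    W = np.zeros((n_features, n_classes))
    m = np.zeros_like(W)                                    # Adam first moment
    v = np.zeros_like(W)                                    # Adam second moment
    beta1, beta2, eps = 0.9, 0.999, 1e-8
    for t in range(1, steps + 1):
        idx = rng.integers(0, n_samples, size=batch_size)
        g = batch_gradient(W, features[idx], labels[idx])
        if optimizer == "sgd":
            W -= lr * g                                     # plain SGD step
        else:                                               # Adam step
            m = beta1 * m + (1 - beta1) * g
            v = beta2 * v + (1 - beta2) * g * g
            m_hat = m / (1 - beta1 ** t)                    # bias correction
            v_hat = v / (1 - beta2 ** t)
            W -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return W

# Learning rates are arbitrary, not tuned; this only illustrates the setup.
for opt, lr in [("sgd", 0.5), ("adam", 0.05)]:
    W = train(opt, lr)
    preds = softmax(features @ W).argmax(axis=1)
    print(f"{opt}: train accuracy = {(preds == labels).mean():.3f}")
```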
BIOGRAPHY
Frederik Kunstner is a fifth-year PhD student at the University of British Columbia, working with Mark Schmidt. His work is at the intersection of the theory of optimization methods and their application to machine learning, focusing on modeling the difficulties involved in training modern models. Prior to his PhD, Frederik studied at EPFL in Switzerland and had the opportunity to intern at the RIKEN Center for Advanced Intelligence Project with Emtiyaz Khan in Japan and at the Max Planck Institute for Intelligent Systems with Philipp Hennig in Germany.