
High-dimensional Statistical Modeling Team Seminar (Talk by Max Vladymyrov, Google Research)

Tue, 08 Mar 2022 10:00 - 11:00 JST
Online (link visible to participants)

Free admission
- Time zone: JST
- Seats are available on a first-come, first-served basis.
- When the seats are fully booked, we may stop accepting applications.
- Simultaneous interpretation will not be available.

Description

Title: Examples of bilevel optimization with applications in learning-to-learn and few-shot learning

Abstract:
During the talk, I'll describe a couple of projects that my colleagues and I at Google have recently published. Both relate to the theme of bilevel optimization, where one system (or set of parameters) controls, or adapts to, the behavior of another system. This framework arises often in the context of meta-learning, hyperparameter optimization, and many other settings. In particular, I will describe two projects: one on learning-to-learn and another on few-shot learning.
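As a toy illustration of the bilevel structure (a minimal sketch of my own, not code from the talk): the inner system fits a model by gradient descent, while the outer system tunes a meta-parameter, here the inner learning rate, against a validation loss. All data, names, and constants below are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
X_train, X_val = rng.normal(size=(32, 5)), rng.normal(size=(32, 5))
w_true = rng.normal(size=5)
y_train = X_train @ w_true + 0.1 * rng.normal(size=32)
y_val = X_val @ w_true + 0.1 * rng.normal(size=32)

def inner_loop(lr, steps=20):
    # Inner system: plain gradient descent on the training loss.
    w = np.zeros(5)
    for _ in range(steps):
        grad = X_train.T @ (X_train @ w - y_train) / len(y_train)
        w = w - lr * grad
    return w

def val_loss(lr):
    # Outer objective: validation loss of the inner solution.
    w = inner_loop(lr)
    return np.mean((X_val @ w - y_val) ** 2)

# Outer system: finite-difference gradient descent on the meta-parameter,
# clipped to a range where the inner loop stays stable.
lr, eps, meta_lr = 0.05, 1e-4, 0.01
for _ in range(50):
    meta_grad = (val_loss(lr + eps) - val_loss(lr - eps)) / (2 * eps)
    lr = float(np.clip(lr - meta_lr * meta_grad, 1e-3, 0.5))
print(f"tuned inner learning rate: {lr:.3f}, val loss: {val_loss(lr):.4f}")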
In the first part, I will talk about a method that we call BLUR (Bidirectional Learned Update Rules). There, we decompose backpropagation into its individual update equations and study the relations between the forward pass, the backward pass, and the weight updates. It turns out that backpropagation can be equivalently represented as a 2-state network. We generalize this formulation to a multi-state update network with a more general set of rules, parametrized by a set of low-dimensional meta-parameters. We show with several examples that learning rules obtained this way can generalize to unseen tasks and often learn faster than SGD.
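To make the idea of parametrized update rules concrete, here is a minimal sketch (my own illustration, not the BLUR implementation): each neuron carries k states, and the weight update is a sum of outer products of pre- and post-synaptic states, mixed by a small matrix of meta-parameters theta. With k = 2, placing activations in one state and error signals in the other, one particular choice of theta recovers backprop's usual outer-product update; meta-learning instead optimizes theta.

import numpy as np

k = 2  # number of states per neuron; k = 2 suffices to emulate backprop

def update_weights(W, pre, post, theta):
    # W: (n_out, n_in) weights; pre: (k, n_in) and post: (k, n_out) neuron states.
    # theta: (k, k) low-dimensional meta-parameters mixing pairs of states
    # into an outer-product weight update.
    dW = np.zeros_like(W)
    for a in range(k):
        for b in range(k):
            dW += theta[a, b] * np.outer(post[a], pre[b])
    return W + dW

rng = np.random.default_rng(0)
W = 0.1 * rng.normal(size=(3, 5))
pre = rng.normal(size=(k, 5))   # e.g. state 0: activations, state 1: error signals
post = rng.normal(size=(k, 3))
theta = np.zeros((k, k))
theta[1, 0] = -0.01  # recovers backprop: dW = -lr * outer(post error, pre activation)
W = update_weights(W, pre, post, theta)  # in meta-learning, theta itself is learned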
For the second part, I will discuss our recent work on HyperTransformers, where we train a transformer-based model for few-shot learning to generate the weights of a given CNN directly from the support samples. This effectively decouples the complexity of the large task space, learned by the transformer, from the complexity of individual tasks, performed by the CNN. Our method is particularly effective for small target CNN architectures, where learning a fixed universal task-independent embedding is not optimal and better performance is attained when information about the task can modulate all model parameters.
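A rough sketch of the weight-generation idea, with a plain MLP standing in for the transformer (all shapes and names are illustrative assumptions, not the paper's): support-sample embeddings are pooled into a task summary, which a generator maps to the full parameter tensor of a target CNN layer, so every generated weight can depend on the task.

import numpy as np

rng = np.random.default_rng(0)
n_support, embed_dim = 5, 16
conv_shape = (8, 3, 3, 3)   # (out_ch, in_ch, kh, kw) of one target CNN layer
n_weights = int(np.prod(conv_shape))

# Hypothetical generator parameters (in the paper this role is played by a
# transformer trained across many few-shot tasks).
W1 = 0.1 * rng.normal(size=(embed_dim, 64))
W2 = 0.1 * rng.normal(size=(64, n_weights))

def generate_conv_weights(support_embeddings):
    # Map a (n_support, embed_dim) support set to one CNN layer's weights.
    pooled = support_embeddings.mean(axis=0)   # task summary vector
    hidden = np.tanh(pooled @ W1)              # generator body
    return (hidden @ W2).reshape(conv_shape)   # task-conditioned weights

support = rng.normal(size=(n_support, embed_dim))  # embeddings of support samples
conv_w = generate_conv_weights(support)            # every parameter depends on the task
print(conv_w.shape)  # (8, 3, 3, 3)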

Bio:
https://research.google/people/106637/

About this community

RIKEN AIP Public

Public events of RIKEN Center for Advanced Intelligence Project (AIP)
