
Deep Learning Theory Team Seminar: Talk by Dr. Wei HUANG

Thu, 23 Dec 2021 13:00 - 14:00 JST
Online (link visible to participants)
Free admission

Description

This is an online seminar. Registration is required.
【Deep Learning Theory Team】
【Date】2021/Dec/23 (Thu) 13:00-14:00 (JST)

【Speaker】 Dr. Wei HUANG

Title: Closing the Gap between Theory and Applications in Deep Learning: A NTK Perspective
Abstract: Deep learning has been responsible for a step-change in performance across machine learning, setting new benchmarks in a large number of applications. During my Ph.D. study, I have sought to understand the theoretical properties of deep neural networks and to close the gap between the theory and application sides. This presentation will introduce three concrete works based on the neural tangent kernel (NTK), one of the recent seminal advances in deep learning theory. First, we study the training dynamics of deep nonlinear networks with orthogonal initialization via the NTK, proving that the two NTKs, one corresponding to Gaussian weights and one to orthogonal weights, are equal when the network width is infinite. This theoretical result implies that the two initializations share the same training dynamics in the NTK regime. In addition, through a thorough empirical investigation, we find that orthogonal initialization increases learning speed in scenarios with a large learning rate or large depth. The second work addresses the over-smoothing problem of graph neural networks by exploiting the Graph Neural Tangent Kernel (GNTK), which governs the optimization trajectory of wide Graph Convolutional Networks (GCNs) under gradient descent. We characterize the asymptotic behavior of the GNTK in the large-depth limit, which reveals that the trainability of wide and deep GCNs decays at an exponential rate during optimization. To overcome this exponential decay more fundamentally, we propose Critical DropEdge, a connectivity-aware and graph-adaptive sampling method inspired by our theoretical insights on trainability. Finally, by exploring the connection between generalization performance and training dynamics, we propose a theory-driven deep active learning method that achieves state-of-the-art performance. In particular, we prove that the convergence speed of training and the generalization performance are positively correlated under the ultra-wide condition, and we show that maximizing the training dynamics leads to better generalization. Empirical results show that our method not only consistently outperforms other baselines but also scales well to large deep learning models.
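For readers unfamiliar with the NTK, the kernel in question is Theta(x, x') = <grad_theta f(x; theta), grad_theta f(x'; theta)>, the inner product of parameter gradients of the network output. The sketch below is a minimal illustration (not the speaker's code) of the first result: it computes the empirical NTK of a toy two-layer ReLU network under i.i.d. Gaussian and under scaled-orthogonal first-layer weights and compares the two kernels as the width grows. The toy network, scalings, and function names are assumptions made purely for illustration.

```python
import numpy as np

def init_weights(m, d, mode, rng):
    """First-layer weights W (m x d): i.i.d. Gaussian, or orthonormal columns scaled by sqrt(m)."""
    if mode == "gaussian":
        return rng.standard_normal((m, d))
    q, _ = np.linalg.qr(rng.standard_normal((m, d)))  # orthonormal columns (m >= d)
    return np.sqrt(m) * q  # so that W^T W = m * I_d, matching the Gaussian case in expectation

def empirical_ntk(X, m, mode, rng):
    """Empirical NTK of f(x) = v^T relu(W x) / sqrt(m) at initialization."""
    n, d = X.shape
    W = init_weights(m, d, mode, rng)
    v = rng.standard_normal(m)
    pre = X @ W.T                       # (n, m) pre-activations W x
    act = np.maximum(pre, 0.0)          # relu(W x)
    der = (pre > 0).astype(float)       # relu'(W x)
    # NTK(x, x') = sum over parameters of df(x)/dtheta * df(x')/dtheta:
    K_v = act @ act.T / m                              # gradients w.r.t. v
    K_W = ((der * v) @ (der * v).T) * (X @ X.T) / m    # gradients w.r.t. W
    return K_v + K_W

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))
X /= np.linalg.norm(X, axis=1, keepdims=True)
for m in (64, 1024, 16384):
    gap = np.abs(empirical_ntk(X, m, "gaussian", rng)
                 - empirical_ntk(X, m, "orthogonal", rng)).max()
    print(f"width {m}: max |K_gauss - K_orth| = {gap:.4f}")  # gap shrinks as width grows
```

Both empirical kernels concentrate around the same deterministic limit as the width grows, which is the sense in which the two initializations yield the same training dynamics in the NTK regime described in the abstract.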

About this community

RIKEN AIP Public

Public events of RIKEN Center for Advanced Intelligence Project (AIP)
