Nihonbashi Office
Train faster, generalize better: Stability of stochastic gradient descent
We will discuss "Train faster, generalize better: Stability of stochastic gradient descent" by Moritz Hardt et al.
https://arxiv.org/pdf/1509.01240.pdf
This paper introduces the notion of "stability" for stochastic algorithms.
Intuitively, an algorithm is stable if its output changes only slightly (bounded by some epsilon) when a single sample in the training data set is replaced.
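For reference, the paper's uniform stability condition can be written as follows (recalled from memory, so the exact quantifiers and constants should be checked against the PDF): an algorithm A is epsilon-uniformly stable if, for all data sets S and S' of size n that differ in at most one example,

\[
\sup_{z}\; \mathbb{E}_{A}\big[\, f(A(S); z) - f(A(S'); z) \,\big] \;\le\; \epsilon ,
\]

where A(S) denotes the (possibly randomized) output of A trained on S and f(w; z) is the loss of model w on example z.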
With this notion, the paper proves two theorems about stable algorithms.
The first is that if a stochastic algorithm is stable, then its generalization gap is bounded.
The second is that if the number of iterations is bounded, then SGD is stable.
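Stated informally (again from memory, so the constants should be checked against the paper), the two results read: if the algorithm is epsilon-uniformly stable, then

\[
\big|\, \mathbb{E}_{S,A}\big[\, R_S[A(S)] - R[A(S)] \,\big] \big| \;\le\; \epsilon_{\mathrm{stab}} ,
\]

where R and R_S are the population and empirical risks; and, for a convex, L-Lipschitz, \beta-smooth loss with step sizes \alpha_t \le 2/\beta, running SGD for T steps is uniformly stable with

\[
\epsilon_{\mathrm{stab}} \;\le\; \frac{2L^2}{n} \sum_{t=1}^{T} \alpha_t ,
\]

so fewer iterations (or smaller step sizes) yield a smaller stability parameter and hence a smaller generalization gap.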
As a consequence, these two theorems indicate that if we can train DNNs faster, then they generalize better. (Remember the case of "random labelling" by Zhang et al.)
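As a rough illustration of the definition (not the paper's own experiments), stability can be probed empirically by training with the same SGD sample order on two data sets that differ in a single example and comparing the two resulting models on a fresh point. The sketch below uses made-up synthetic data and a simple logistic model, so every name and parameter in it is hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)


def sgd_logistic(X, y, lr=0.01, epochs=5, seed=0):
    """Plain SGD on the logistic loss; returns the final weight vector."""
    order_rng = np.random.default_rng(seed)  # same sample order for both runs
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        for i in order_rng.permutation(n):
            z = y[i] * (X[i] @ w)
            grad = -y[i] * X[i] / (1.0 + np.exp(z))  # gradient of log(1 + e^{-y w.x})
            w -= lr * grad
    return w


def loss(w, x, y):
    return np.log1p(np.exp(-y * (x @ w)))


# Synthetic data set S and a copy S' that differs in exactly one example.
n, d = 200, 5
X = rng.normal(size=(n, d))
y = np.sign(X[:, 0] + 0.1 * rng.normal(size=n))
X_prime, y_prime = X.copy(), y.copy()
X_prime[0] = rng.normal(size=d)        # replace a single training example
y_prime[0] = np.sign(rng.normal())

w_S = sgd_logistic(X, y, seed=1)
w_Sprime = sgd_logistic(X_prime, y_prime, seed=1)

# Stability proxy: how much the loss on a fresh point differs between the
# two models (a crude estimate of the quantity bounded in the definition).
x_test, y_test = rng.normal(size=d), 1.0
print("loss gap on a test point:",
      abs(loss(w_S, x_test, y_test) - loss(w_Sprime, x_test, y_test)))
```

Repeating this over many replaced examples and test points gives a crude empirical estimate of the quantity that uniform stability bounds, and running more epochs should tend to widen the gap, in line with the theorem above.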