
Why does SGD converge to generalizable solutions? A quantitative explanation via linear stability
Reporter:
Lei Wu, School of Mathematical Sciences, Peking University
Inviter:
Haijun Yu, Professor
Subject:
Why does SGD converge to generalizable solutions? A quantitative explanation via linear stability
Time and place:
16:00-17:00, May 26 (Thursday)
Abstract:

Deep learning models are often trained with far more parameters than training examples. In this over-parameterized regime, there exist many global minima of the training loss, but their test performance can differ dramatically. Remarkably, stochastic gradient descent (SGD) tends to select the good ones without any explicit regularization, suggesting that some "implicit regularization" is at work. This talk provides a quantitative explanation of this striking phenomenon from the perspective of dynamical stability. We prove that if a global minimum is linearly stable for SGD, then its flatness---as measured by the Frobenius norm of the Hessian---must be bounded independently of both the model size and the sample size. Moreover, this flatness can bound the generalization gap of two-layer neural networks. Together, these results show that SGD tends to converge to flat minima, and that flat minima provably generalize well. Both results are made possible by exploiting the particular geometry-aware structure of SGD noise.
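The flatness measure discussed in the abstract is the Frobenius norm of the loss Hessian at a minimum. As a minimal illustration (not from the talk itself; the toy quadratic landscapes here are purely hypothetical), the sketch below estimates this norm by central finite differences and compares a "flat" minimum against a "sharp" one sharing the same location:

```python
import numpy as np

def hessian_fro_norm(loss, w, eps=1e-4):
    """Estimate ||H||_F of `loss` at parameters `w` via central finite differences."""
    d = len(w)
    H = np.zeros((d, d))
    for i in range(d):
        for j in range(d):
            w_pp = w.copy(); w_pp[i] += eps; w_pp[j] += eps
            w_pm = w.copy(); w_pm[i] += eps; w_pm[j] -= eps
            w_mp = w.copy(); w_mp[i] -= eps; w_mp[j] += eps
            w_mm = w.copy(); w_mm[i] -= eps; w_mm[j] -= eps
            # Standard second-order mixed-difference stencil for d^2 f / dw_i dw_j
            H[i, j] = (loss(w_pp) - loss(w_pm) - loss(w_mp) + loss(w_mm)) / (4 * eps**2)
    return np.linalg.norm(H, 'fro')

# Two toy quadratic "loss landscapes" with the same global minimum at w = 0,
# differing only in curvature (flat vs. sharp):
flat  = lambda w: 0.5 * w @ np.diag([0.1, 0.1]) @ w
sharp = lambda w: 0.5 * w @ np.diag([10.0, 10.0]) @ w

w0 = np.zeros(2)
print(hessian_fro_norm(flat, w0))   # small ||H||_F: flat minimum
print(hessian_fro_norm(sharp, w0))  # large ||H||_F: sharp minimum
```

For a quadratic loss the finite-difference stencil is exact up to floating-point error, so the two printed values recover the Frobenius norms of the two curvature matrices; the talk's result says SGD's linear stability forces this quantity to stay bounded at the minima SGD can converge to.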