Colloquia of 2022 - Academic Events - Institute of Computational Mathematics and Scientific/Engineering Computing

Why does SGD converge to generalizable solutions? A quantitative explanation via linear stability

Speaker:

Lei Wu, School of Mathematical Sciences, Peking University

Host:

Prof. Haijun Yu

Title:

Why does SGD converge to generalizable solutions? A quantitative explanation via linear stability

Time and venue:

16:00-17:00, May 26 (Thursday)

Abstract:

Deep learning models are often trained with far more parameters than training examples. In this regime there exist many global minima, and their test performance can differ widely. Fortunately, stochastic gradient descent (SGD) can select the good ones without any explicit regularization, which suggests that some "implicit regularization" is at work. This talk will provide a quantitative explanation of this striking phenomenon from the perspective of dynamical stability. We prove that if a global minimum is linearly stable for SGD, then its flatness, as measured by the Frobenius norm of the Hessian, must be bounded independently of the model size and sample size. Moreover, this flatness can bound the generalization gap of two-layer neural networks. Together, these results show that SGD tends to converge to flat minima and that flat minima provably generalize well. These results are made possible by exploiting the particular geometry-aware structure of SGD noise.
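As a rough illustration of what linear stability means here, the sketch below uses a one-dimensional toy problem that is not taken from the talk. Each sample i is assumed to have the loss f_i(x) = 0.5 * h_i * x^2, so every sample is fit exactly at x = 0 and that point is a global minimum. Full-batch gradient descent multiplies x by (1 - lr * mean(h)) at each step, while single-sample SGD multiplies it by (1 - lr * h_i) for a uniformly drawn i. The function names, the curvatures, and the learning rate are hypothetical choices made for this example.

```python
def gd_factor(curvatures, lr):
    """Per-step contraction factor |1 - lr * mean(h)| of full-batch GD near x = 0."""
    h_mean = sum(curvatures) / len(curvatures)
    return abs(1.0 - lr * h_mean)


def sgd_mean_square_factor(curvatures, lr):
    """Per-step growth factor of E[x^2] under single-sample SGD near x = 0.

    With i drawn uniformly, x <- (1 - lr * h_i) * x, so
    E[x_new^2] = mean_i((1 - lr * h_i)^2) * E[x^2].
    """
    return sum((1.0 - lr * h) ** 2 for h in curvatures) / len(curvatures)


# Hypothetical toy data: two samples with very different curvatures.
curvatures = [0.0, 10.0]
lr = 0.3

gd_stable = gd_factor(curvatures, lr) <= 1.0
sgd_stable = sgd_mean_square_factor(curvatures, lr) <= 1.0

print("GD linearly stable: ", gd_stable)
print("SGD mean-square stable:", sgd_stable)
```

In this toy setting the minimum is stable for gradient descent but not for SGD in the mean-square sense, because the spread of the per-sample curvatures adds a noise term to the stability condition. This only hints at why stability under SGD constrains curvature more tightly than stability under gradient descent; the Frobenius-norm bound and the generalization result stated in the abstract are not reproduced here.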