2024年06月17日 星期一 登录 EN

学术活动
Why does SGD converge to generalizable solutions? A quantitative explanation via linear stability
首页 - 学术活动
报告人:
Lei Wu, School of Mathematical Sciences, Peking University
邀请人:
Haijun Yu, professor
题目:
Why does SGD converge to generalizable solutions? A quantitative explanation via linear stability
时间地点:
16:00-17:00 May 26(Thursday)
摘要:

Deep learning models are often operated with far more unknown parameters than training examples. In such a case, there exist many global minima, but their test performances can be very different. Fortunately, stochastic gradient descent (SGD) can select the good ones without needing any explicit regularizations, suggesting certain "implicit regularization" at work. This talk will provide a quantitative explanation of this striking phenomenon from the perspective of dynamical stability. We prove that if a global minimum is linearly stable for SGD, then the flatness---as measured by the Hessian's Frobenius norm---must be bounded independently of the model size and sample size. Moreover, this flatness can bound the generalization gap of two-layer neural networks. Together, we show that SGD tends to converge to flat minima and flat minima provably generalize well. Note that these are made possible by exploiting the particular geometry-aware structure of SGD noise.