Implicit biases of optimization algorithms for neural networks and their effects on generalization
Reporter:
Dr. Chao Ma, Stanford University
Inviter:
Pingbing Ming, Professor
Subject:
Implicit biases of optimization algorithms for neural networks and their effects on generalization
Time and place:
9:00-10:00, December 23 (Friday), Tencent Meeting ID: 247-520-003
Abstract:
Modern neural networks are usually over-parameterized: the number of parameters exceeds the number of training samples. In this regime the loss function tends to have many (even infinitely many) global minima, so in addition to convergence, an optimization algorithm faces the challenge of minima selection. Specifically, when training a neural network, the algorithm must not only find a global minimum but also select one that generalizes well from among many that generalize poorly. In this talk, we connect the implicit bias of optimization algorithms to generalization performance in two steps. First, using a linear stability analysis around global minima, we show that stochastic gradient descent (SGD) favors flat and uniform global minima. Then, we establish a theoretical connection between flatness and generalization performance based on a special multiplicative structure of neural networks. Together, these results show that SGD tends to find global minima that generalize well. We also derive bounds on the generalization error and adversarial robustness that depend on the SGD hyperparameters.
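The linear-stability idea can be sketched in its simplest setting: full-batch gradient descent on a one-dimensional quadratic, assuming the loss near a global minimum is well approximated by its second-order Taylor expansion f(x) = (a/2) x^2, where a is the sharpness (curvature). This is only an illustration, not the speaker's SGD analysis, which additionally accounts for gradient noise (the "uniformity" condition). It shows why the step size limits which minima are reachable: the update x_{t+1} = (1 - lr * a) x_t converges only when |1 - lr * a| < 1, i.e. roughly when a < 2 / lr. The function name gd_converges is hypothetical.

```python
def gd_converges(sharpness: float, lr: float, x0: float = 1.0,
                 steps: int = 200, tol: float = 1e-6) -> bool:
    """Run gradient descent on the quadratic f(x) = 0.5 * sharpness * x**2,
    whose global minimum is x = 0, and report whether the iterates end
    within `tol` of that minimum.

    Hypothetical helper for illustration: a local quadratic model of the
    loss around a global minimum, not a real neural-network loss.
    """
    x = x0
    for _ in range(steps):
        grad = sharpness * x          # f'(x) = sharpness * x
        x = x - lr * grad             # x_{t+1} = (1 - lr * sharpness) * x_t
    return abs(x) < tol


if __name__ == "__main__":
    lr = 0.1  # assumed step size; the stability threshold is 2 / lr = 20
    for a in [5.0, 10.0, 19.0, 21.0, 30.0]:
        status = "stable" if gd_converges(a, lr) else "unstable"
        print(f"sharpness={a:5.1f}  2/lr={2 / lr:.1f}  -> {status}")
```

With a fixed step size, minima sharper than 2 / lr are linearly unstable, so the iterates cannot settle there; larger step sizes therefore restrict gradient descent to flatter minima.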