The primary task of many applications is to approximate or estimate a function from samples drawn from a probability distribution on the input space. The deep approximation approach approximates the function by a composition of many layers of simple functions, which can be viewed as a series of nested feature extractors. The key idea of deep learning networks is to turn these layers of compositions into layers of tunable parameters that are adjusted through a learning process, so as to achieve a good approximation of the target function with respect to the input data.
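As a rough illustration (the notation here is ours, not taken from the talk), a depth-L compositional approximant has the form

f(x) \approx f_\theta(x) = T_L \circ \sigma \circ T_{L-1} \circ \cdots \circ \sigma \circ T_1(x), \qquad T_\ell(z) = W_\ell z + b_\ell,

where each layer applies a simple parametric map (here an affine map followed by a fixed nonlinearity \sigma), and the tunable parameters \theta = \{(W_\ell, b_\ell)\}_{\ell=1}^{L} are adjusted by minimizing an empirical loss over the samples.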
In this talk, we will discuss the mathematical foundation behind this new approach and the approximation rates of deep networks. We will also show how this approach differs from classical approximation theory, and how the new theory can be used to understand and design deep learning networks.