Interpolators (estimators that achieve zero training error) have attracted growing attention in machine learning, mainly because state-of-the-art neural networks appear to be models of this type. In this paper, we study minimum $\ell_2$-norm ("ridgeless") interpolation least squares regression, focusing on the high-dimensional regime in which the number of unknown parameters $p$ is of the same order as the number of samples $n$. We consider two different models for the feature distribution: a linear model, where the feature vectors $x_i \in \mathbb{R}^p$ are obtained by applying a linear transform to a vector of i.i.d. entries, $x_i = \Sigma^{1/2} z_i$ (with $z_i \in \mathbb{R}^p$); and a nonlinear model, where the feature vectors are obtained by passing the input through a random one-layer neural network, $x_i = \varphi(W z_i)$ (with $z_i \in \mathbb{R}^d$, $W \in \mathbb{R}^{p \times d}$ a matrix of i.i.d. entries, and $\varphi$ an activation function acting componentwise on $W z_i$). We recover, in a precise quantitative way, several phenomena that have been observed in large-scale neural networks and kernel machines, including the "double descent" behavior of the prediction risk, and the potential benefits of overparametrization.
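As a concrete illustration of the estimator studied here, the following Python sketch (not from the paper; the function name, sample sizes and parameter choices are illustrative assumptions) fits the minimum $\ell_2$-norm interpolator $\hat\beta = X^{+} y$ via the pseudoinverse and traces its out-of-sample risk as the aspect ratio $p/n$ varies, in the simplest isotropic instance of the linear model ($\Sigma = I$, unit noise variance).

import numpy as np

rng = np.random.default_rng(0)

def min_norm_risk(n, p, snr=1.0, n_test=2000, reps=20):
    # Monte Carlo estimate of the out-of-sample risk of the minimum
    # l2-norm ("ridgeless") least squares estimator beta_hat = pinv(X) @ y.
    risks = []
    for _ in range(reps):
        beta = rng.normal(size=p)
        beta *= np.sqrt(snr) / np.linalg.norm(beta)  # fix signal strength ||beta||^2 = snr
        X = rng.normal(size=(n, p))                  # isotropic features: x_i = z_i
        y = X @ beta + rng.normal(size=n)            # unit noise variance
        beta_hat = np.linalg.pinv(X) @ y             # min-norm interpolator (plain OLS when p < n)
        X_test = rng.normal(size=(n_test, p))
        risks.append(np.mean((X_test @ (beta_hat - beta)) ** 2))
    return float(np.mean(risks))

n = 100
for gamma in (0.2, 0.5, 0.9, 1.1, 2.0, 5.0):         # aspect ratio gamma = p/n
    print(f"p/n = {gamma:3.1f}   risk ~ {min_norm_risk(n, int(gamma * n)):.2f}")

Under these assumptions the simulated risk should spike near $p/n = 1$ and descend again in the overparametrized regime $p/n > 1$, which is the double descent shape referred to above; the paper derives exact asymptotics for this risk curve, and the simulation is only meant to make its qualitative shape visible.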
TH was partially supported by NSF Grants DMS-1407548 and IIS-1837931, and NIH Grant 5R01-EB001988-21.
AM was partially supported by NSF Grants DMS-1613091, CCF-1714305 and IIS-1741162, and ONR Grant N00014-18-1-2729.
RT was partially supported by NSF Grant DMS-1554123.
The authors are grateful to Brad Efron, Rob Tibshirani and Larry Wasserman for inspiring us to work on this in the first place. RT sincerely thanks Edgar Dobriban for many helpful conversations about the random matrix theory literature, in particular the literature trail leading up to Proposition 2 and the reference to Rubio and Mestre. We are also grateful to Dmitry Kobak for bringing related work to our attention and clarifying its implications for our setting.
"Surprises in high-dimensional ridgeless least squares interpolation." Ann. Statist. 50 (2) 949 - 986, April 2022. https://doi.org/10.1214/21-AOS2133