Surprises in high-dimensional ridgeless least squares interpolation
Trevor Hastie, Andrea Montanari, Saharon Rosset, Ryan J. Tibshirani
Ann. Statist. 50(2): 949-986 (April 2022). DOI: 10.1214/21-AOS2133

Abstract

Interpolators—estimators that achieve zero training error—have attracted growing attention in machine learning, mainly because state-of-the-art neural networks appear to be models of this type. In this paper, we study minimum $\ell_2$ norm ("ridgeless") interpolation least squares regression, focusing on the high-dimensional regime in which the number of unknown parameters $p$ is of the same order as the number of samples $n$. We consider two different models for the feature distribution: a linear model, where the feature vectors $x_i \in \mathbb{R}^p$ are obtained by applying a linear transform to a vector of i.i.d. entries, $x_i = \Sigma^{1/2} z_i$ (with $z_i \in \mathbb{R}^p$); and a nonlinear model, where the feature vectors are obtained by passing the input through a random one-layer neural network, $x_i = \varphi(W z_i)$ (with $z_i \in \mathbb{R}^d$, $W \in \mathbb{R}^{p \times d}$ a matrix of i.i.d. entries, and $\varphi$ an activation function acting componentwise on $W z_i$). We recover—in a precise quantitative way—several phenomena that have been observed in large-scale neural networks and kernel machines, including the "double descent" behavior of the prediction risk and the potential benefits of overparametrization.
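The following is a minimal sketch, not code from the paper: it forms the minimum $\ell_2$ norm ("ridgeless") interpolator $\hat\beta = X^{+} y$ via the pseudoinverse and traces its test risk across aspect ratios $\gamma = p/n$ in the simplest isotropic instance of the linear model ($\Sigma = I$, Gaussian features). The sample size, noise level, and signal scaling are illustrative assumptions, not values taken from the paper.

```python
# Minimal sketch (not the authors' code): min-l2-norm ("ridgeless") least
# squares in the isotropic linear model y = x^T beta + noise, x_i ~ N(0, I_p).
import numpy as np

rng = np.random.default_rng(0)
n, sigma = 200, 0.5          # sample size and noise level (illustrative choices)

def ridgeless_risk(p, n_test=2000):
    """Test risk of the minimum-l2-norm interpolator at dimension p."""
    beta = rng.standard_normal(p) / np.sqrt(p)   # signal scaled so ||beta||^2 ~ 1
    X = rng.standard_normal((n, p))
    y = X @ beta + sigma * rng.standard_normal(n)
    # Min-norm least squares solution: beta_hat = X^+ y (pseudoinverse).
    # For p <= n this is ordinary least squares; for p > n it interpolates
    # the training data while having the smallest l2 norm among interpolators.
    beta_hat = np.linalg.pinv(X) @ y
    X_test = rng.standard_normal((n_test, p))
    y_test = X_test @ beta + sigma * rng.standard_normal(n_test)
    return np.mean((X_test @ beta_hat - y_test) ** 2)

for gamma in [0.2, 0.5, 0.9, 1.1, 2.0, 5.0]:
    print(f"p/n = {gamma:4.1f}  risk = {ridgeless_risk(int(gamma * n)):.3f}")
```

Running the sketch shows the risk spiking near the interpolation threshold $\gamma = p/n = 1$ and descending again in the overparametrized regime $\gamma > 1$: the "double descent" shape the abstract refers to.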

Funding Statement

TH was partially supported by NSF Grants DMS-1407548, NSF IIS-1837931 and NIH 5R01-EB001988-21.
AM was partially supported by NSF Grants DMS-1613091, NSF CCF-1714305, NSF IIS-1741162 and ONR N00014-18-1-2729.
RT was partially supported by NSF Grant DMS-1554123.

Acknowledgments

The authors are grateful to Brad Efron, Rob Tibshirani and Larry Wasserman for inspiring us to work on this in the first place. RT sincerely thanks Edgar Dobriban for many helpful conversations about the random matrix theory literature, in particular, the literature trail leading up to Proposition 2 and the reference to Rubio and Mestre [60]. We are also grateful to Dmitry Kobak for bringing [42] to our attention and for clarifying the implications of that work for our setting.

Citation


Trevor Hastie, Andrea Montanari, Saharon Rosset, Ryan J. Tibshirani. "Surprises in high-dimensional ridgeless least squares interpolation." Ann. Statist. 50(2): 949-986, April 2022. https://doi.org/10.1214/21-AOS2133

Information

Received: 1 June 2019; Revised: 1 August 2021; Published: April 2022
First available in Project Euclid: 7 April 2022

MathSciNet: MR4404925
zbMATH: 1486.62202
Digital Object Identifier: 10.1214/21-AOS2133

Subjects:
Primary: 62J05, 62J07
Secondary: 62F12, 62J02

Keywords: interpolation, overparametrization, random matrix theory, regression, ridge regression

Rights: Copyright © 2022 Institute of Mathematical Statistics

JOURNAL ARTICLE
38 PAGES

