Surprises in high-dimensional ridgeless least squares interpolation
Trevor Hastie, Andrea Montanari, Saharon Rosset, Ryan J. Tibshirani
Ann. Statist. 50(2): 949-986 (April 2022). DOI: 10.1214/21-AOS2133

Abstract

Interpolators—estimators that achieve zero training error—have attracted growing attention in machine learning, mainly because state-of-the-art neural networks appear to be models of this type. In this paper, we study minimum $\ell_2$ norm ("ridgeless") interpolation least squares regression, focusing on the high-dimensional regime in which the number of unknown parameters $p$ is of the same order as the number of samples $n$. We consider two different models for the feature distribution: a linear model, where the feature vectors $x_i \in \mathbb{R}^p$ are obtained by applying a linear transform to a vector of i.i.d. entries, $x_i = \Sigma^{1/2} z_i$ (with $z_i \in \mathbb{R}^p$); and a nonlinear model, where the feature vectors are obtained by passing the input through a random one-layer neural network, $x_i = \varphi(W z_i)$ (with $z_i \in \mathbb{R}^d$, $W \in \mathbb{R}^{p \times d}$ a matrix of i.i.d. entries, and $\varphi$ an activation function acting componentwise on $W z_i$). We recover—in a precise quantitative way—several phenomena that have been observed in large-scale neural networks and kernel machines, including the "double descent" behavior of the prediction risk and the potential benefits of overparametrization.
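The following is a minimal sketch, not code from the paper: it forms the minimum $\ell_2$ norm ("ridgeless") interpolator $\hat\beta = X^{+} y$ via the pseudoinverse and traces its test risk across aspect ratios $\gamma = p/n$ in the simplest isotropic instance of the linear model ($\Sigma = I$, Gaussian features). The sample size, noise level, and signal scaling are illustrative assumptions, not values taken from the paper.

```python
# Minimal sketch (not the authors' code): min-l2-norm ("ridgeless") least
# squares in the isotropic linear model y = x^T beta + noise, x_i ~ N(0, I_p).
import numpy as np

rng = np.random.default_rng(0)
n, sigma = 200, 0.5          # sample size and noise level (illustrative choices)

def ridgeless_risk(p, n_test=2000):
    """Test risk of the minimum-l2-norm interpolator at dimension p."""
    beta = rng.standard_normal(p) / np.sqrt(p)   # signal scaled so ||beta||^2 ~ 1
    X = rng.standard_normal((n, p))
    y = X @ beta + sigma * rng.standard_normal(n)
    # Min-norm least squares solution: beta_hat = X^+ y (pseudoinverse).
    # For p <= n this is ordinary least squares; for p > n it interpolates
    # the training data while having the smallest l2 norm among interpolators.
    beta_hat = np.linalg.pinv(X) @ y
    X_test = rng.standard_normal((n_test, p))
    y_test = X_test @ beta + sigma * rng.standard_normal(n_test)
    return np.mean((X_test @ beta_hat - y_test) ** 2)

for gamma in [0.2, 0.5, 0.9, 1.1, 2.0, 5.0]:
    print(f"p/n = {gamma:4.1f}  risk = {ridgeless_risk(int(gamma * n)):.3f}")
```

Running the sketch shows the risk spiking near the interpolation threshold $\gamma = p/n = 1$ and descending again in the overparametrized regime $\gamma > 1$: the "double descent" shape the abstract refers to.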

Funding Statement

TH was partially supported by NSF Grants DMS-1407548, NSF IIS-1837931 and NIH 5R01-EB001988-21.
AM was partially supported by NSF Grants DMS-1613091, NSF CCF-1714305, NSF IIS-1741162 and ONR N00014-18-1-2729.
RT was partially supported by NSF Grant DMS-1554123.

Acknowledgments

The authors are grateful to Brad Efron, Rob Tibshirani and Larry Wasserman for inspiring us to work on this in the first place. RT sincerely thanks Edgar Dobriban for many helpful conversations about the random matrix theory literature, in particular, the literature trail leading up to Proposition 2 and the reference to Rubio and Mestre [60]. We are also grateful to Dmitry Kobak for bringing [42] to our attention and for clarifying the implications of that work for our setting.

Citation


Trevor Hastie, Andrea Montanari, Saharon Rosset, Ryan J. Tibshirani. "Surprises in high-dimensional ridgeless least squares interpolation." Ann. Statist. 50(2): 949-986, April 2022. https://doi.org/10.1214/21-AOS2133

Information

Received: 1 June 2019; Revised: 1 August 2021; Published: April 2022
First available in Project Euclid: 7 April 2022

MathSciNet: MR4404925
zbMATH: 1486.62202
Digital Object Identifier: 10.1214/21-AOS2133

Subjects:
Primary: 62J05, 62J07
Secondary: 62F12, 62J02

Keywords: interpolation, overparametrization, random matrix theory, regression, ridge regression

Rights: Copyright © 2022 Institute of Mathematical Statistics

JOURNAL ARTICLE
38 PAGES

