The interpolation phase transition in neural networks: Memorization and generalization under lazy training
Andrea Montanari, Yiqiao Zhong
Ann. Statist. 50(5): 2816-2847 (October 2022). DOI: 10.1214/22-AOS2211

Abstract

Modern neural networks are often operated in a strongly overparametrized regime: they comprise so many parameters that they can interpolate the training set, even if the actual labels are replaced by purely random ones. Despite this, they achieve good prediction error on unseen data: interpolating the training set does not lead to a large generalization error. Further, overparametrization appears to be beneficial in that it simplifies the optimization landscape. Here, we study these phenomena in the context of two-layer neural networks in the neural tangent (NT) regime. We consider a simple data model, with isotropic covariate vectors in d dimensions, and N hidden neurons. We assume that both the sample size n and the dimension d are large, and they are polynomially related. Our first main result is a characterization of the eigenstructure of the empirical NT kernel in the overparametrized regime Nd ≫ n. This characterization implies as a corollary that the minimum eigenvalue of the empirical NT kernel is bounded away from zero as soon as Nd ≫ n and, therefore, that the network can exactly interpolate arbitrary labels in the same regime.
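To make the first result concrete, the following is a minimal numerical sketch (not the authors' code; the ReLU activation, Gaussian covariates, and the specific values of n, d, N are illustrative assumptions). It forms the empirical NT kernel of a two-layer network with lazily trained first-layer weights, K_N(x_j, x_k) = (1/N) <x_j, x_k> sum_i sigma'(<w_i, x_j>) sigma'(<w_i, x_k>), and checks that its minimum eigenvalue is bounded away from zero when Nd ≫ n, which is what guarantees exact interpolation of arbitrary labels.

    # Minimal sketch (illustrative assumptions only, not the paper's code):
    # empirical NT kernel of a two-layer ReLU network under lazy training and
    # its minimum eigenvalue in the overparametrized regime N*d >> n.
    import numpy as np

    rng = np.random.default_rng(0)
    n, d, N = 500, 50, 200                         # sample size, dimension, hidden neurons: N*d = 10000 >> n

    X = rng.standard_normal((n, d)) / np.sqrt(d)   # isotropic covariates, roughly unit norm
    W = rng.standard_normal((N, d)) / np.sqrt(d)   # first-layer weights at random initialization

    # ReLU activation: sigma(t) = max(t, 0), hence sigma'(t) = 1{t > 0}.
    S = (X @ W.T > 0).astype(float)                # S[j, i] = sigma'(<w_i, x_j>), shape (n, N)

    # Empirical NT kernel: K[j, k] = (1/N) * <x_j, x_k> * sum_i sigma'(<w_i, x_j>) * sigma'(<w_i, x_k>)
    K = (X @ X.T) * (S @ S.T) / N

    lam_min = np.linalg.eigvalsh(K).min()
    print(f"minimum eigenvalue of the empirical NT kernel: {lam_min:.4f}")
    # K is the Gram matrix of the NT feature map, so a strictly positive minimum
    # eigenvalue means the linearized (NT) model can fit any label vector exactly.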

Our second main result is a characterization of the generalization error of NT ridge regression including, as a special case, min-ℓ2 norm interpolation. We prove that, as soon as Nd ≫ n, the test error is well approximated by that of kernel ridge regression with respect to the infinite-width kernel. The latter is, in turn, well approximated by the error of polynomial ridge regression, whereby the regularization parameter is increased by a "self-induced" term related to the high-degree components of the activation function. The polynomial degree depends on the sample size and the dimension (in particular, on log n / log d).
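The sketch below (again illustrative and not taken from the paper: the linear target f_star, the noise level, and the ridge parameter lam are hypothetical choices) runs NT kernel ridge regression in kernel form, f_hat(x) = K(x, X) (K(X, X) + lam I)^{-1} y; the ridgeless limit lam -> 0+ corresponds to the min-ℓ2 norm NT interpolator discussed in the abstract.

    # Minimal sketch of NT kernel ridge regression (illustrative assumptions:
    # linear target, Gaussian data, hypothetical ridge parameter lam).
    import numpy as np

    rng = np.random.default_rng(1)
    n, d, N, n_test = 500, 50, 200, 200
    lam = 0.1                                      # explicit ridge penalty; lam -> 0+ gives min-l2-norm interpolation

    X = rng.standard_normal((n, d)) / np.sqrt(d)
    X_test = rng.standard_normal((n_test, d)) / np.sqrt(d)
    W = rng.standard_normal((N, d)) / np.sqrt(d)

    beta = rng.standard_normal(d)
    f_star = lambda Z: Z @ beta                    # hypothetical (linear) target, purely for illustration
    y = f_star(X) + 0.1 * rng.standard_normal(n)

    def nt_kernel(A, B, W):
        """K[j, k] = (1/N) * <a_j, b_k> * sum_i sigma'(<w_i, a_j>) * sigma'(<w_i, b_k>) for ReLU."""
        SA = (A @ W.T > 0).astype(float)
        SB = (B @ W.T > 0).astype(float)
        return (A @ B.T) * (SA @ SB.T) / W.shape[0]

    K = nt_kernel(X, X, W)
    alpha = np.linalg.solve(K + lam * np.eye(n), y)   # kernel ridge coefficients
    y_hat = nt_kernel(X_test, X, W) @ alpha           # NT ridge prediction on test points

    test_err = np.mean((y_hat - f_star(X_test)) ** 2)
    print(f"NT ridge test error (lam={lam}): {test_err:.4f}")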

Funding Statement

This work was supported by the NSF through Award DMS-2031883 and by the Simons Foundation through Award 814639 for the Collaboration on the Theoretical Foundations of Deep Learning. We also acknowledge NSF Grants CCF-1714305 and IIS-1741162, and ONR Grant N00014-18-1-2729.

Acknowledgments

We thank Behrooz Ghorbani and Song Mei for helpful discussion.

Citation

Andrea Montanari, Yiqiao Zhong. "The interpolation phase transition in neural networks: Memorization and generalization under lazy training." Ann. Statist. 50(5): 2816-2847, October 2022. https://doi.org/10.1214/22-AOS2211

Information

Received: 1 August 2020; Revised: 1 June 2022; Published: October 2022
First available in Project Euclid: 27 October 2022

MathSciNet: MR4500626
zbMATH: 07628842
Digital Object Identifier: 10.1214/22-AOS2211

Subjects:
Primary: 62J07
Secondary: 62J05

Keywords: kernel ridge regression, memorization, neural tangent kernel, overfitting, overparametrization

Rights: Copyright © 2022 Institute of Mathematical Statistics

JOURNAL ARTICLE
32 PAGES

