Over-parametrized deep neural networks minimizing the empirical risk do not generalize well
Michael Kohler, Adam Krzyżak
Bernoulli 27(4): 2564-2597 (November 2021). DOI: 10.3150/21-BEJ1323

Abstract

Recently it was shown in several papers that backpropagation is able to find the global minimum of the empirical risk on the training data when using over-parametrized deep neural networks. In this paper, a similar result is shown for deep neural networks with the sigmoidal squasher activation function in a regression setting. In addition, a lower bound is presented which proves that these networks do not generalize well to new data, in the sense that networks which minimize the empirical risk do not achieve the optimal minimax rate of convergence for the estimation of smooth regression functions.
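As a purely illustrative sketch of the setting (not taken from the paper; sample size, network width, regression function, noise level and optimizer are all arbitrary choices), the following Python/PyTorch snippet trains a heavily over-parametrized network with sigmoid ("squasher") activations until the empirical L2 risk on noisy regression data is close to zero, and then reports the L2 error on new data:

# Illustrative sketch only: over-parametrized sigmoid network driving the
# empirical L2 risk on noisy regression data toward zero.
# All concrete choices (n, width, m, noise level, optimizer) are assumptions.
import torch

torch.manual_seed(0)

n = 50                                        # sample size
m = lambda x: torch.sin(2 * torch.pi * x)     # an assumed smooth regression function
x_train = torch.rand(n, 1)
y_train = m(x_train) + 0.1 * torch.randn(n, 1)  # noisy observations

# One hidden layer with width far exceeding the sample size.
width = 1000
net = torch.nn.Sequential(
    torch.nn.Linear(1, width),
    torch.nn.Sigmoid(),
    torch.nn.Linear(width, 1),
)

opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for step in range(5000):
    opt.zero_grad()
    loss = torch.mean((net(x_train) - y_train) ** 2)  # empirical L2 risk
    loss.backward()
    opt.step()

with torch.no_grad():
    x_new = torch.rand(2000, 1)
    train_risk = torch.mean((net(x_train) - y_train) ** 2).item()
    new_data_error = torch.mean((net(x_new) - m(x_new)) ** 2).item()
print(f"empirical risk: {train_risk:.2e}, L2 error on new data: {new_data_error:.2e}")

This only illustrates the estimation problem studied in the paper (empirical risk minimization by an over-parametrized network on noisy data); the paper's lower bound concerns the rate of convergence of such empirical risk minimizers, which a single numerical run cannot demonstrate.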

Acknowledgements

The authors would like to thank the Associate Editor and three anonymous referees for their invaluable comments, which greatly helped to improve the presentation. This work was supported in part by the Natural Sciences and Engineering Research Council of Canada under Grant RGPIN-2015-06412; the second author would like to thank NSERC for funding this work.

Citation

Michael Kohler, Adam Krzyżak. "Over-parametrized deep neural networks minimizing the empirical risk do not generalize well." Bernoulli 27(4), 2564–2597, November 2021. https://doi.org/10.3150/21-BEJ1323

Information

Received: 1 February 2020; Revised: 1 November 2020; Published: November 2021
First available in Project Euclid: 24 August 2021

MathSciNet: MR4303896
zbMATH: 1504.62052
Digital Object Identifier: 10.3150/21-BEJ1323

Keywords: neural networks, nonparametric regression, over-parametrization, rate of convergence

Rights: Copyright © 2021 ISI/BS

