Early stopping for L2-boosting in high-dimensional linear models

Bernhard Stankewitz

doi:10.1214/24-AOS2356

Abstract

Increasingly high-dimensional data sets require that estimation methods not only satisfy statistical guarantees but also remain computationally feasible. In this context, we consider $L^{2}$-boosting via orthogonal matching pursuit in a high-dimensional linear model and analyze a data-driven early stopping time τ of the algorithm, which is sequential in the sense that its computation is based on the first τ iterations only. This approach is much less costly than established model selection criteria that require the computation of the full boosting path, which may even be computationally infeasible in truly high-dimensional applications. We prove that sequential early stopping preserves statistical optimality in this setting in terms of a fully general oracle inequality for the empirical risk and recently established optimal convergence rates for the population risk. Finally, an extensive simulation study shows that, at a significantly reduced computational cost, the performance of early stopping methods is on par with other state-of-the-art algorithms such as the cross-validated Lasso or model selection via a high-dimensional Akaike criterion based on the full boosting path.
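To make the idea of a sequential stopping time concrete, the following is a minimal sketch of orthogonal matching pursuit that checks a residual-based rule before each iteration and never computes more of the boosting path than it uses. The abstract does not spell out the stopping rule, so the criterion here (stop once the mean squared residual falls to or below a threshold `kappa`, in the spirit of a discrepancy principle) is an assumption, as are the function name `omp_early_stopping` and the parameters `kappa` and `max_iter`. It is not the author's implementation.

```python
import numpy as np


def omp_early_stopping(X, y, kappa, max_iter=None):
    """Orthogonal matching pursuit with an assumed residual-based stopping rule.

    Hypothetical interface: `kappa` is a user-supplied threshold, for example
    an estimate of the noise variance. Returns the coefficient vector and the
    list of selected columns; the stopping time is len(selected).
    """
    n, p = X.shape
    if max_iter is None:
        max_iter = min(n, p)

    # Normalise correlations by column norms; guard against zero columns.
    col_norms = np.linalg.norm(X, axis=0)
    col_norms[col_norms == 0] = 1.0

    selected = []
    available = np.ones(p, dtype=bool)
    coef = np.zeros(p)
    residual = y.astype(float).copy()

    for _ in range(max_iter):
        # Assumed discrepancy-type rule: stop as soon as the empirical
        # squared residual norm is at or below kappa.
        if residual @ residual / n <= kappa:
            break

        # Select the unused column most correlated with the current residual.
        scores = np.abs(X.T @ residual) / col_norms
        scores[~available] = -np.inf
        j = int(np.argmax(scores))
        selected.append(j)
        available[j] = False

        # Orthogonal step: refit least squares on all selected columns.
        beta, *_ = np.linalg.lstsq(X[:, selected], y, rcond=None)
        coef[:] = 0.0
        coef[selected] = beta
        residual = y - X[:, selected] @ beta

    return coef, selected
```

Because the rule only inspects the current residual, the cost is that of the iterations actually performed, whereas criteria evaluated over the full path would require running the loop to `max_iter` first.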

Funding Statement

The research of the author has been partially funded by the Deutsche Forschungsgemeinschaft (DFG)—Project-ID 318763901-SFB1294. Cofunded by the European Union (ERC, BigBayesUQ, project number: 101041064). Views and opinions expressed are, however, those of the author(s) only and do not necessarily reflect those of the European Union or the European Research Council. Neither the European Union nor the granting authority can be held responsible for them.

Acknowledgments

The author is very grateful for the discussions with Markus Reiß and Martin Wahl that were indispensable during the preparation of this paper. Further, the author would like to thank Botond Szabo, Martin Spindler, Richard Nickl, the Associate Editor and two anonymous referees for very valuable feedback during the revision process.

Citation

Bernhard Stankewitz. "Early stopping for $L^{2}$-boosting in high-dimensional linear models." Ann. Statist. 52 (2) 491-518, April 2024. https://doi.org/10.1214/24-AOS2356

Information

Received: 1 October 2022; Revised: 1 October 2023; Published: April 2024

First available in Project Euclid: 9 May 2024

Digital Object Identifier: 10.1214/24-AOS2356

Subjects:

Primary: 62G05, 62J07

Secondary: 62F35

Keywords: adaptive estimation, discrepancy principle, early stopping, L2-boosting, oracle inequalities, orthogonal matching pursuit
