Boosting for high-dimensional linear models

Peter Bühlmann

doi:10.1214/009053606000000092

Ann. Statist. 34(2): 559-583 (April 2006). DOI: 10.1214/009053606000000092

Abstract

We prove that boosting with the squared error loss, L₂Boosting, is consistent for very high-dimensional linear models, where the number of predictor variables is allowed to grow essentially as fast as O(exp(sample size)), assuming that the true underlying regression function is sparse in terms of the ℓ₁-norm of the regression coefficients. In the language of signal processing, this means consistency for de-noising using a strongly overcomplete dictionary if the underlying signal is sparse in terms of the ℓ₁-norm. We also propose here an AIC-based method for tuning, namely for choosing the number of boosting iterations. This makes L₂Boosting computationally attractive since it is not required to run the algorithm multiple times for cross-validation as commonly used so far. We demonstrate L₂Boosting for simulated data, in particular where the predictor dimension is large in comparison to sample size, and for a difficult tumor-classification problem with gene expression microarray data.
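The abstract names the method but does not spell out the algorithm. As a rough illustration only, the following Python sketch shows one common form of L₂Boosting for a linear model: componentwise least squares with a shrunken step, stopped at the iteration that minimizes a corrected AIC. Everything specific here is an assumption rather than something stated in the abstract: the function name `l2boost`, the step size `nu`, the use of the trace of the boosting operator as degrees of freedom, the particular corrected-AIC formula, and the simplification that predictors and response are already centered (no intercept is fitted).

```python
import math
import random


def l2boost(X, y, n_iter=100, nu=0.1):
    """Componentwise L2Boosting sketch (hypothetical helper, not the paper's code).

    X: list of n rows, each a list of p floats; y: list of n floats.
    Assumes centered data and at least one non-constant-zero column.
    Returns (coef at the AICc-minimizing iteration, that iteration, AICc path).
    """
    n, p = len(X), len(X[0])
    cols = [[row[j] for row in X] for j in range(p)]
    norms = [sum(v * v for v in c) for c in cols]
    coef = [0.0] * p
    resid = list(y)
    # B maps y to the current fit; its trace is used as degrees of freedom.
    B = [[0.0] * n for _ in range(n)]
    best_aicc, m_stop, best_coef = math.inf, 0, list(coef)
    path = []
    for m in range(1, n_iter + 1):
        # Select the single predictor that reduces the residual sum of squares most.
        best_j, best_gain, best_beta = None, -1.0, 0.0
        for j in range(p):
            if norms[j] == 0.0:
                continue
            score = sum(a * b for a, b in zip(cols[j], resid))
            gain = score * score / norms[j]
            if gain > best_gain:
                best_j, best_gain, best_beta = j, gain, score / norms[j]
        x = cols[best_j]
        # Shrunken update of the coefficient and the residuals.
        coef[best_j] += nu * best_beta
        resid = [r - nu * best_beta * xi for r, xi in zip(resid, x)]
        # Operator update: B <- B + nu * H (I - B), with H = x x^T / ||x||^2.
        v = [x[k] - sum(x[i] * B[i][k] for i in range(n)) for k in range(n)]
        for i in range(n):
            s = nu * x[i] / norms[best_j]
            for k in range(n):
                B[i][k] += s * v[k]
        df = sum(B[i][i] for i in range(n))
        sigma2 = sum(r * r for r in resid) / n
        # Assumed corrected-AIC form; undefined cases are treated as +infinity.
        if sigma2 > 0.0 and df + 2.0 < n:
            aicc = math.log(sigma2) + (1.0 + df / n) / (1.0 - (df + 2.0) / n)
        else:
            aicc = math.inf
        path.append(aicc)
        if aicc < best_aicc:
            best_aicc, m_stop, best_coef = aicc, m, list(coef)
    return best_coef, m_stop, path


if __name__ == "__main__":
    # Synthetic example with more predictors than observations (made-up data).
    random.seed(0)
    n, p = 40, 80
    X = [[random.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [3.0 * r[0] - 2.0 * r[1] + random.gauss(0.0, 0.5) for r in X]
    coef, m_stop, _ = l2boost(X, y, n_iter=100, nu=0.1)
    selected = [j for j, c in enumerate(coef) if c != 0.0]
    print("stopping iteration:", m_stop)
    print("selected predictors:", selected)
```

Because the criterion is evaluated along a single boosting path, the stopping iteration is obtained from one run, which is the computational point the abstract makes against repeated cross-validation runs. The dense operator update costs on the order of n² operations per iteration, so this sketch is only suitable for small sample sizes.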

Citation

Peter Bühlmann. "Boosting for high-dimensional linear models." Ann. Statist. 34 (2) 559 - 583, April 2006. https://doi.org/10.1214/009053606000000092

Information

Published: April 2006

First available in Project Euclid: 27 June 2006

zbMATH: 1095.62077

MathSciNet: MR2281878

Digital Object Identifier: 10.1214/009053606000000092

Subjects:

Primary: 62J05, 62J07

Secondary: 49M15, 62P10, 68Q32

Keywords: Binary classification, gene expression, Lasso, matching pursuit, overcomplete dictionary, Sparsity, Variable selection, weak greedy algorithm

JOURNAL ARTICLE
25 PAGES

DOWNLOAD PDF + SAVE TO MY LIBRARY