Annals of Statistics

Boosting for high-dimensional linear models

Peter Bühlmann



We prove that boosting with the squared error loss, L2Boosting, is consistent for very high-dimensional linear models, where the number of predictor variables is allowed to grow essentially as fast as O(exp(sample size)), assuming that the true underlying regression function is sparse in terms of the 1-norm of the regression coefficients. In the language of signal processing, this means consistency for de-noising using a strongly overcomplete dictionary if the underlying signal is sparse in terms of the 1-norm. We also propose here an AIC-based method for tuning, namely for choosing the number of boosting iterations. This makes L2Boosting computationally attractive since it is not required to run the algorithm multiple times for cross-validation as commonly used so far. We demonstrate L2Boosting for simulated data, in particular where the predictor dimension is large in comparison to sample size, and for a difficult tumor-classification problem with gene expression microarray data.
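The procedure summarized in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the paper's stated implementation: the base procedure is taken to be componentwise linear least squares (one predictor fitted per iteration), the step-length factor `nu = 0.1` is a conventional choice, the stopping criterion is a corrected-AIC form in which the degrees of freedom are the trace of the boosting hat matrix, and the function name `l2boost_aic` is hypothetical.

```python
import numpy as np


def l2boost_aic(X, y, max_steps=200, nu=0.1):
    """Componentwise linear L2Boosting with an AIC-type stopping rule (sketch)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape

    x_mean = X.mean(axis=0)
    Xc = X - x_mean                      # centered predictors
    col_ss = (Xc ** 2).sum(axis=0)       # squared column norms; assumed nonzero

    coef = np.zeros(p)
    resid = y - y.mean()                 # step 0: fit the mean only
    B = np.full((n, n), 1.0 / n)         # hat matrix of the mean fit
    eye = np.eye(n)

    best_aic, best_m, best_coef = np.inf, 0, coef.copy()
    for m in range(1, max_steps + 1):
        # Least-squares slope of the current residuals on each single predictor.
        slopes = Xc.T @ resid / col_ss
        # Select the predictor that reduces the residual sum of squares most.
        j = int(np.argmax(slopes ** 2 * col_ss))
        xj = Xc[:, j]

        # Shrunken update of coefficient, residuals and hat matrix:
        # B_m = B_{m-1} + nu * H_j (I - B_{m-1}),  H_j = xj xj^T / ||xj||^2.
        coef[j] += nu * slopes[j]
        resid = resid - nu * slopes[j] * xj
        B = B + nu * np.outer(xj, xj @ (eye - B)) / col_ss[j]

        df = np.trace(B)                 # assumed degrees of freedom
        if df + 2 >= n:                  # criterion undefined beyond this point
            break
        sigma2 = np.mean(resid ** 2)
        # Assumed corrected-AIC form; smaller is better.
        aic = np.log(sigma2) + (1 + df / n) / (1 - (df + 2) / n)
        if aic < best_aic:
            best_aic, best_m, best_coef = aic, m, coef.copy()

    intercept = y.mean() - x_mean @ best_coef
    return intercept, best_coef, best_m
```

A single run of the loop yields the whole path of iterations, so the stopping iteration `best_m` is chosen without refitting on cross-validation folds. Tracking the full n-by-n hat matrix costs O(n²) per step, which is acceptable when the sample size is small relative to the number of predictors.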

Article information

Ann. Statist., Volume 34, Number 2 (2006), 559-583.

First available in Project Euclid: 27 June 2006


Primary: 62J05 (Linear regression), 62J07 (Ridge regression; shrinkage estimators)
Secondary: 49M15 (Newton-type methods), 62P10 (Applications to biology and medical sciences), 68Q32 (Computational learning theory [See also 68T05])

Keywords: Binary classification, gene expression, Lasso, matching pursuit, overcomplete dictionary, sparsity, variable selection, weak greedy algorithm


Bühlmann, Peter. Boosting for high-dimensional linear models. Ann. Statist. 34 (2006), no. 2, 559–583. doi:10.1214/009053606000000092.


