Institute of Mathematical Statistics Collections

Hierarchical selection of variables in sparse high-dimensional regression

Peter J. Bickel, Ya’acov Ritov, and Alexandre B. Tsybakov

Full-text: Open access


We study a regression model with a huge number of interacting variables. We consider a specific approximation of the regression function under two assumptions: (i) there exists a sparse representation of the regression function in a suggested basis, and (ii) there are no interactions outside the set of the corresponding main effects. We suggest a hierarchical randomized search procedure for selecting variables and their interactions. We show that, given an initial estimator, one can find an estimator with a similar prediction loss but a smaller number of non-zero coordinates.
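The hierarchy assumption in the abstract — interactions are considered only among selected main effects — can be illustrated with a minimal two-stage sketch. This is not the paper's randomized search procedure; it is an assumed simplification using a plain coordinate-descent Lasso for the selection step, with all function names (`lasso_cd`, `hierarchical_select`) hypothetical.

```python
import numpy as np
from itertools import combinations

def lasso_cd(X, y, lam, n_iter=200):
    """Plain coordinate-descent Lasso (illustrative stand-in for any
    sparse selector; not the algorithm of the chapter)."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)  # assumes no all-zero columns
    for _ in range(n_iter):
        for j in range(p):
            # partial residual excluding coordinate j
            r = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ r
            # soft-thresholding update
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
    return beta

def hierarchical_select(X, y, lam, eps=1e-8):
    """Stage 1: select main effects. Stage 2: allow pairwise
    interactions only among the selected main effects."""
    beta = lasso_cd(X, y, lam)
    main = [j for j in range(X.shape[1]) if abs(beta[j]) > eps]
    pairs = list(combinations(main, 2))
    if pairs:
        # augment the design with the permitted interaction columns
        Z = np.column_stack([X] + [X[:, a] * X[:, b] for a, b in pairs])
        beta = lasso_cd(Z, y, lam)
    return main, pairs, beta
```

The point of the restriction is computational: with p variables there are of order p² pairwise interactions, but only of order |main|² need to be examined once the hierarchy assumption is imposed.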

Chapter information

James O. Berger, T. Tony Cai and Iain M. Johnstone, eds., Borrowing Strength: Theory Powering Applications – A Festschrift for Lawrence D. Brown (Beachwood, Ohio, USA: Institute of Mathematical Statistics, 2010), 56-69

First available in Project Euclid: 26 October 2010

Digital Object Identifier: doi:10.1214/10-IMSCOLL605

Primary: 60K35: Interacting random processes; statistical mechanics type models; percolation theory [See also 82B43, 82C43] 62G08: Nonparametric regression
Secondary: 62C20: Minimax procedures 62G05: Estimation 62G20: Asymptotic properties

Keywords: linear models; model selection; nonparametric statistics

Copyright © 2010, Institute of Mathematical Statistics


Bickel, Peter J.; Ritov, Ya’acov; Tsybakov, Alexandre B. Hierarchical selection of variables in sparse high-dimensional regression. Borrowing Strength: Theory Powering Applications – A Festschrift for Lawrence D. Brown, 56–69, Institute of Mathematical Statistics, Beachwood, Ohio, USA, 2010. doi:10.1214/10-IMSCOLL605.


