The Annals of Statistics

High-dimensional variable selection

Larry Wasserman and Kathryn Roeder

Abstract

This paper explores the following question: what kind of statistical guarantees can be given when doing variable selection in high-dimensional models? In particular, we look at the error rates and power of some multi-stage regression methods. In the first stage we fit a set of candidate models. In the second stage we select one model by cross-validation. In the third stage we use hypothesis testing to eliminate some variables. We refer to the first two stages as “screening” and the last stage as “cleaning.” We consider three screening methods: the lasso, marginal regression, and forward stepwise regression. Our method gives consistent variable selection under certain conditions.
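
Since the abstract describes the screen-and-clean procedure only in words, a minimal sketch may help fix ideas. The version below uses lasso screening with the regularization level chosen by cross-validation on one half of the data, followed by t-test cleaning on the other half. The function name, the even data split, and the Bonferroni cutoff are illustrative assumptions, not the authors' exact implementation; in particular, the procedure as analyzed in the paper splits the data so that each stage sees fresh observations, whereas for brevity this sketch folds stages 1 and 2 into LassoCV's internal cross-validation on one half.

    # A sketch of screen-and-clean: lasso screening (stages 1-2) followed by
    # t-test cleaning (stage 3) on a held-out half of the data. Names and
    # tuning choices are illustrative, not the paper's implementation.
    import numpy as np
    from scipy import stats
    from sklearn.linear_model import LassoCV, LinearRegression

    def screen_and_clean(X, y, alpha=0.05, seed=0):
        # X: (n, p) design array; y: (n,) response. Returns the indices of
        # the selected variables.
        rng = np.random.default_rng(seed)
        n = X.shape[0]
        idx = rng.permutation(n)
        d1, d2 = idx[: n // 2], idx[n // 2:]

        # Stages 1-2: fit a path of lasso models on the first half and let
        # cross-validation pick one model from the set of candidates.
        lasso = LassoCV(cv=5).fit(X[d1], y[d1])
        screened = np.flatnonzero(lasso.coef_ != 0)
        if screened.size == 0:
            return screened

        # Stage 3: refit by least squares on the second half, then keep only
        # the screened variables whose coefficients survive a t-test at level
        # alpha / |screened| (Bonferroni). Assumes |screened| is much smaller
        # than n/2, so the OLS refit is well posed.
        Xs, ys = X[d2][:, screened], y[d2]
        ols = LinearRegression().fit(Xs, ys)
        resid = ys - ols.predict(Xs)
        df = len(ys) - screened.size - 1
        sigma2 = resid @ resid / df
        Xc = Xs - Xs.mean(axis=0)
        se = np.sqrt(sigma2 * np.diag(np.linalg.inv(Xc.T @ Xc)))
        pvals = 2 * stats.t.sf(np.abs(ols.coef_ / se), df)
        return screened[pvals < alpha / screened.size]

The data split is what makes the cleaning stage honest: the stage-3 tests use observations that played no role in screening, so under standard linear-model assumptions the usual t-distribution applies.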

Article information

Source
Ann. Statist., Volume 37, Number 5A (2009), 2178–2201.

Dates
First available in Project Euclid: 15 July 2009

Permanent link to this document
https://projecteuclid.org/euclid.aos/1247663752

Digital Object Identifier
doi:10.1214/08-AOS646

Mathematical Reviews number (MathSciNet)
MR2543689

Zentralblatt MATH identifier
1173.62054

Subjects
Primary: 62J05: Linear regression
Secondary: 62J07: Ridge regression; shrinkage estimators

Keywords
Lasso; stepwise regression; sparsity

Citation

Wasserman, Larry; Roeder, Kathryn. High-dimensional variable selection. Ann. Statist. 37 (2009), no. 5A, 2178–2201. doi:10.1214/08-AOS646. https://projecteuclid.org/euclid.aos/1247663752


References

  • Barron, A., Cohen, A., Dahmen, W. and DeVore, R. (2008). Approximation and learning by greedy algorithms. Ann. Statist. 36 64–94.
  • Bühlmann, P. (2006). Boosting for high-dimensional linear models. Ann. Statist. 34 559–583.
  • Candes, E. and Tao, T. (2007). The Dantzig selector: Statistical estimation when p is much larger than n. Ann. Statist. 35 2313–2351.
  • Donoho, D. (2006). For most large underdetermined systems of linear equations, the minimal l1-norm near-solution approximates the sparsest near-solution. Comm. Pure Appl. Math. 59 797–829.
  • Efron, B., Hastie, T., Johnstone, I. and Tibshirani, R. (2004). Least angle regression. Ann. Statist. 32 407–499.
  • Fan, J. and Lv, J. (2008). Sure independence screening for ultra-high-dimensional feature space. J. Roy. Statist. Soc. Ser. B 70 849–911.
  • Greenshtein, E. and Ritov, Y. (2004). Persistence in high-dimensional linear predictor selection and the virtue of overparametrization. Bernoulli 10 971–988.
  • Meinshausen, N. (2007). Relaxed lasso. Comput. Statist. Data Anal. 52 374–393.
  • Meinshausen, N. and Bühlmann, P. (2006). High dimensional graphs and variable selection with the lasso. Ann. Statist. 34 1436–1462.
  • Meinshausen, N. and Yu, B. (2009). Lasso-type recovery of sparse representations for high-dimensional data. Ann. Statist. 37 246–270.
  • Orwoll, E., Blank, J. B., Barrett-Connor, E., Cauley, J., Cummings, S., Ensrud, K., Lewis, C., Cawthon, P. M., Marcus, R., Marshall, L. M., McGowan, J., Phipps, K., Sherman, S., Stefanick, M. L. and Stone, K. (2005). Design and baseline characteristics of the osteoporotic fractures in men (MrOS) study—a large observational study of the determinants of fracture in older men. Contemp. Clin. Trials 26 569–585.
  • Robins, J., Scheines, R., Spirtes, P. and Wasserman, L. (2003). Uniform consistency in causal inference. Biometrika 90 491–515.
  • Spirtes, P., Glymour, C. and Scheines, R. (2001). Causation, Prediction, and Search. MIT Press.
  • Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. Roy. Statist. Soc. Ser. B 58 267–288.
  • Tropp, J. A. (2004). Greed is good: Algorithmic results for sparse approximation. IEEE Trans. Inform. Theory 50 2231–2242.
  • Tropp, J. A. (2006). Just relax: Convex programming methods for identifying sparse signals in noise. IEEE Trans. Inform. Theory 52 1030–1051.
  • Wainwright, M. (2006). Sharp thresholds for high-dimensional and noisy recovery of sparsity. Available at arxiv.org/math.ST/0605740.
  • The Wellcome Trust Case Control Consortium (2007). Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature 447 661–678.
  • Zhang, C. H. and Huang, J. (2008). The sparsity and bias of the lasso selection in high-dimensional linear regression. Ann. Statist. 36 1567–1594.
  • Zhao, P. and Yu, B. (2006). On model selection consistency of lasso. J. Mach. Learn. Res. 7 2541–2563.
  • Zou, H. (2006). The adaptive lasso and its oracle properties. J. Amer. Statist. Assoc. 101 1418–1429.