“Preconditioning” for feature selection and regression in high-dimensional problems



The Annals of Statistics

Debashis Paul, Eric Bair, Trevor Hastie, and Robert Tibshirani

Source: Ann. Statist. Volume 36, Number 4 (2008), 1595-1618.

Abstract

We consider regression problems where the number of predictors greatly exceeds the number of observations. We propose a method for variable selection that first estimates the regression function, yielding a “preconditioned” response variable. The primary method used for this initial regression is supervised principal components. Then we apply a standard procedure such as forward stepwise selection or the LASSO to the preconditioned response variable. In a number of simulated and real data examples, this two-step procedure outperforms forward stepwise selection or the usual LASSO (applied directly to the raw outcome). We also show that under a certain Gaussian latent variable model, application of the LASSO to the preconditioned response variable is consistent as the numbers of predictors and observations increase. Moreover, when the observational noise is rather large, the suggested procedure can give a more accurate estimate than the LASSO. We illustrate our method on some real problems, including survival analysis with microarray data.
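The two-step procedure described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function names `supervised_pc_fit` and `lasso_cd`, the screening size `n_keep`, the use of a single principal component, the penalty value `lam`, the plain coordinate-descent solver and the simulated latent-variable data are all hypothetical choices made for the example.

```python
import numpy as np

def supervised_pc_fit(X, y, n_keep=20):
    """Step 1 (assumed form): screen features by univariate association
    with y, then regress y on the leading principal component of the
    screened features. Returns the preconditioned response."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    # Absolute univariate score of each centered feature against y.
    scores = np.abs(Xc.T @ yc) / np.sqrt((Xc ** 2).sum(axis=0))
    keep = np.argsort(scores)[-n_keep:]
    # Leading left singular vector of the screened, centered matrix.
    U, _, _ = np.linalg.svd(Xc[:, keep], full_matrices=False)
    u1 = U[:, 0]
    # Least-squares fit of y on that single unit-norm component.
    return y.mean() + u1 * (u1 @ yc)

def lasso_cd(X, y, lam, n_iter=200):
    """Step 2: minimise (1/2n)||y - Xb||^2 + lam*||b||_1 on centered
    data by cyclic coordinate descent (a generic solver, chosen here
    only to keep the sketch dependency-free)."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    r = y - y.mean()                      # current residual
    beta = np.zeros(p)
    col_sq = (Xc ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            rho = Xc[:, j] @ r / n + col_sq[j] * beta[j]
            # Soft-thresholding update for coordinate j.
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            r -= Xc[:, j] * (new - beta[j])
            beta[j] = new
    return beta

# Hypothetical simulated data: one latent factor drives the first 10
# of p = 200 features and the noisy outcome, with n = 50 observations.
rng = np.random.default_rng(0)
n, p = 50, 200
u = rng.standard_normal(n)
X = rng.standard_normal((n, p))
X[:, :10] += 2.0 * u[:, None]
y = 3.0 * u + 2.0 * rng.standard_normal(n)

y_hat = supervised_pc_fit(X, y, n_keep=20)   # preconditioned response
beta_raw = lasso_cd(X, y, lam=0.5)           # usual LASSO on raw outcome
beta_pre = lasso_cd(X, y_hat, lam=0.5)       # LASSO on preconditioned response

print("selected (raw):           ", np.flatnonzero(beta_raw))
print("selected (preconditioned):", np.flatnonzero(beta_pre))
```

In practice the screening threshold and the penalty would be chosen by cross-validation rather than fixed as here, and forward stepwise selection could replace the LASSO in the second step, as the abstract notes.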

Primary Subjects: 62J07
Keywords: Model selection; prediction error; lasso

Links and Identifiers

Permanent link to this document: http://projecteuclid.org/euclid.aos/1216237293
Digital Object Identifier: doi:10.1214/009053607000000578

2008 © Institute of Mathematical Statistics