Open Access
“Preconditioning” for feature selection and regression in high-dimensional problems
Debashis Paul, Eric Bair, Trevor Hastie, Robert Tibshirani
Ann. Statist. 36(4): 1595-1618 (August 2008). DOI: 10.1214/009053607000000578

Abstract

We consider regression problems where the number of predictors greatly exceeds the number of observations. We propose a method for variable selection that first estimates the regression function, yielding a “preconditioned” response variable. The primary method used for this initial regression is supervised principal components. We then apply a standard procedure, such as forward stepwise selection or the lasso, to the preconditioned response variable. In a number of simulated and real data examples, this two-step procedure outperforms both forward stepwise selection and the usual lasso (applied directly to the raw outcome). We also show that, under a certain Gaussian latent variable model, application of the lasso to the preconditioned response variable is consistent as the numbers of predictors and observations grow. Moreover, when the observational noise is large, the proposed procedure can give a more accurate estimate than the lasso applied to the raw response. We illustrate our method on several real problems, including survival analysis with microarray data.
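To make the two-step idea concrete, the following is a minimal Python sketch of preconditioning: a simple supervised-principal-components step produces the preconditioned response, and the lasso is then run on it. The screening quantile, the use of a single principal component, and the scikit-learn penalty alpha are illustrative assumptions, not the paper's tuned procedure (which selects thresholds and components by cross-validation).

import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

rng = np.random.default_rng(0)

# Toy high-dimensional setup: p predictors, n << p observations.
n, p = 100, 1000
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:10] = 2.0                                  # ten truly active predictors
y = X @ beta + 5.0 * rng.standard_normal(n)      # large observational noise

# Step 1: precondition y via a simple supervised principal component.
# Score each predictor by its absolute correlation with y, keep the
# highest-scoring ones, and take the first principal component of that block.
Xc = (X - X.mean(axis=0)) / X.std(axis=0)
yc = y - y.mean()
scores = np.abs(Xc.T @ yc) / n
keep = scores >= np.quantile(scores, 0.95)       # screening cutoff (illustrative)

U, S, _ = np.linalg.svd(Xc[:, keep], full_matrices=False)
pc1 = (U[:, 0] * S[0]).reshape(-1, 1)            # leading principal component scores

# The preconditioned response is the fitted value from regressing y on pc1.
y_pre = LinearRegression().fit(pc1, y).predict(pc1)

# Step 2: apply the lasso to the preconditioned response
# (and to the raw response, for comparison).
lasso_pre = Lasso(alpha=0.1).fit(X, y_pre)
lasso_raw = Lasso(alpha=0.1).fit(X, y)

print("nonzero coefs, preconditioned:", np.flatnonzero(lasso_pre.coef_))
print("nonzero coefs, raw response:  ", np.flatnonzero(lasso_raw.coef_))

Because y_pre is a denoised, low-dimensional fit rather than the noisy outcome itself, the lasso in step 2 tends to select the truly active predictors at less aggressive penalty levels, which is the intuition behind the consistency result in the paper.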

Citation

Debashis Paul, Eric Bair, Trevor Hastie, Robert Tibshirani. "“Preconditioning” for feature selection and regression in high-dimensional problems." Ann. Statist. 36(4): 1595–1618, August 2008. https://doi.org/10.1214/009053607000000578

Information

Published: August 2008
First available in Project Euclid: 16 July 2008

zbMATH: 1142.62022
MathSciNet: MR2435449
Digital Object Identifier: 10.1214/009053607000000578

Subjects:
Primary: 62J07

Keywords: lasso, model selection, prediction error

Rights: Copyright © 2008 Institute of Mathematical Statistics
