Open Access
September 2008
Testing significance of features by lassoed principal components
Daniela M. Witten, Robert Tibshirani
Ann. Appl. Stat. 2(3): 986-1012 (September 2008). DOI: 10.1214/08-AOAS182

Abstract

We consider the problem of testing the significance of features in high-dimensional settings. In particular, we test for differentially expressed genes in a microarray experiment. We wish to identify genes that are associated with some type of outcome, such as survival time or cancer type. We propose a new procedure, called Lassoed Principal Components (LPC), that builds upon existing methods and can provide a sizable improvement. For instance, in the case of two-class data, a standard (albeit simple) approach might be to compute a two-sample t-statistic for each gene. The LPC method involves projecting these conventional gene scores onto the eigenvectors of the gene expression data covariance matrix and then applying an L1 penalty to denoise the resulting projections. We present a theoretical framework under which LPC is the logical choice for identifying significant genes, and we show that LPC can provide a marked reduction in false discovery rates over the conventional methods on both real and simulated data. Moreover, this flexible procedure can be applied to a variety of types of data and can be used to improve many existing methods for the identification of significant features.
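The two-class LPC step described in the abstract can be sketched roughly as follows. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation. The gene scores are unpooled (Welch-style) two-sample t-statistics. The eigenvectors are taken from the SVD of the column-centered expression matrix, whose right singular vectors are the eigenvectors of the gene-gene covariance matrix. The penalty `lam` is fixed by hand rather than chosen by any data-driven rule. The function names `two_sample_t`, `soft_threshold`, and `lpc_scores` are hypothetical. Because the retained eigenvectors are orthonormal, the L1-penalized regression of the scores on them has a closed form: soft-thresholding of the projections.

```python
import numpy as np


def two_sample_t(X, y):
    """Per-gene two-sample t-statistics (assumption: unpooled variances).

    X: n_samples x n_genes expression matrix; y: 0/1 class labels.
    """
    a, b = X[y == 0], X[y == 1]
    diff = b.mean(axis=0) - a.mean(axis=0)
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    return diff / se


def soft_threshold(z, lam):
    """Elementwise soft-thresholding: sign(z) * max(|z| - lam, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)


def lpc_scores(X, scores, lam):
    """Sketch of lassoed principal components for a vector of gene scores.

    Solves min_beta 0.5 * ||scores - V beta||^2 + lam * ||beta||_1, where the
    columns of V are eigenvectors of the gene covariance matrix, and returns
    the denoised scores V @ beta_hat.
    """
    Xc = X - X.mean(axis=0)  # center each gene (column)
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    keep = s > 1e-10 * s[0]  # drop directions with (numerically) zero variance
    V = Vt[keep].T  # n_genes x k, orthonormal columns
    # With V^T V = I, the lasso solution is soft-thresholding of V^T scores.
    beta_hat = soft_threshold(V.T @ scores, lam)
    return V @ beta_hat


# Usage on simulated data: 20 arrays, 200 genes, first 10 genes shifted in class 1.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 200))
y = np.repeat([0, 1], 10)
X[y == 1, :10] += 1.5
t = two_sample_t(X, y)
lpc = lpc_scores(X, t, lam=1.0)
```

With `lam = 0` and more arrays than genes, the sketch returns the original t-statistics unchanged, since the eigenvectors then span every gene direction; increasing `lam` shrinks small projections to exactly zero and pulls the scores toward a few dominant eigenvector directions.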

Citation


Daniela M. Witten and Robert Tibshirani. "Testing significance of features by lassoed principal components." Ann. Appl. Stat. 2(3): 986-1012, September 2008. https://doi.org/10.1214/08-AOAS182

Information

Published: September 2008
First available in Project Euclid: 13 October 2008

zbMATH: 1149.62092
MathSciNet: MR2516801
Digital Object Identifier: 10.1214/08-AOAS182

Keywords: Feature selection, gene expression, microarray, multiple testing

Rights: Copyright © 2008 Institute of Mathematical Statistics
