Open Access
Sparse linear discriminant analysis by thresholding for high dimensional data
Jun Shao, Yazhen Wang, Xinwei Deng, Sijian Wang
Ann. Statist. 39(2): 1241-1265 (April 2011). DOI: 10.1214/10-AOS870


In many social, economic, biological and medical studies, one objective is to classify a subject into one of several classes based on a set of variables observed from the subject. Because the probability distribution of the variables is usually unknown, the classification rule is constructed from a training sample. The well-known linear discriminant analysis (LDA) works well when the number of variables used for classification is much smaller than the training sample size. With advances in technology, modern statistical studies often face classification problems in which the number of variables is much larger than the sample size, and the LDA may perform poorly. We explore when and why the LDA performs poorly and propose a sparse LDA that is asymptotically optimal under some sparsity conditions on the unknown parameters. As an illustration, we discuss an example of classifying human cancer into two classes of leukemia based on a set of 7,129 genes and a training sample of size 72. A simulation is also conducted to check the performance of the proposed method.
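The abstract does not spell out the estimator, so the following is only a minimal sketch of what a thresholded two-class LDA rule could look like, not the authors' procedure. It assumes hard thresholding of the off-diagonal entries of the pooled sample covariance and of the estimated mean difference, with equal class priors. The function names and the threshold parameters `t_cov` and `t_mean` are hypothetical, and a pseudo-inverse is used as a convenience because a thresholded covariance estimate need not be invertible.

```python
import numpy as np


def hard_threshold(a, t):
    """Zero out every entry whose absolute value is at most t."""
    return np.where(np.abs(a) > t, a, 0.0)


def fit_sparse_lda(x1, x2, t_cov, t_mean):
    """Fit a two-class linear rule from thresholded estimates.

    x1, x2 : arrays of shape (n1, p) and (n2, p), one row per subject.
    t_cov, t_mean : hypothetical tuning thresholds (assumptions of this sketch).
    Returns the direction w and the midpoint of the two class means.
    """
    n1, n2 = len(x1), len(x2)
    m1, m2 = x1.mean(axis=0), x2.mean(axis=0)

    # Pooled sample covariance of the two classes.
    r1, r2 = x1 - m1, x2 - m2
    s = (r1.T @ r1 + r2.T @ r2) / (n1 + n2 - 2)

    # Threshold off-diagonal entries only; keep the sample variances.
    s_thr = hard_threshold(s, t_cov)
    np.fill_diagonal(s_thr, np.diag(s))

    # Threshold the estimated mean difference to obtain a sparse vector.
    delta = hard_threshold(m1 - m2, t_mean)

    # Pseudo-inverse: an assumption made here for numerical convenience.
    w = np.linalg.pinv(s_thr) @ delta
    return w, (m1 + m2) / 2


def predict(x, w, mid):
    """Assign class 1 when the linear score is nonnegative, else class 2."""
    return np.where((x - mid) @ w >= 0, 1, 2)
```

In practice the two thresholds would have to be tuned, for example by cross-validation; the sketch leaves them as plain arguments.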



Jun Shao, Yazhen Wang, Xinwei Deng, Sijian Wang. "Sparse linear discriminant analysis by thresholding for high dimensional data." Ann. Statist. 39(2): 1241-1265, April 2011.


Published: April 2011
First available in Project Euclid: 9 May 2011

zbMATH: 1215.62062
MathSciNet: MR2816353
Digital Object Identifier: 10.1214/10-AOS870

Primary: 62H30
Secondary: 62F12, 62G12

Keywords: classification, high dimensionality, misclassification rate, normality, optimal classification rule, sparse estimates

Rights: Copyright © 2011 Institute of Mathematical Statistics

