The Annals of Statistics

Sparse linear discriminant analysis by thresholding for high dimensional data

Jun Shao, Yazhen Wang, Xinwei Deng, and Sijian Wang



In many social, economic, biological and medical studies, one objective is to classify a subject into one of several classes based on a set of variables observed from the subject. Because the probability distribution of the variables is usually unknown, the classification rule is constructed from a training sample. The well-known linear discriminant analysis (LDA) works well when the number of variables used for classification is much smaller than the training sample size. Owing to advances in technology, modern statistical studies often face classification problems in which the number of variables is much larger than the sample size, and the LDA may perform poorly. We explore when and why the LDA performs poorly and propose a sparse LDA that is asymptotically optimal under some sparsity conditions on the unknown parameters. To illustrate the application, we discuss an example of classifying human cancer into two classes of leukemia based on a set of 7,129 genes and a training sample of size 72. A simulation is also conducted to check the performance of the proposed method.
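The idea behind thresholded sparse LDA can be illustrated with a minimal sketch: estimate the class means and pooled covariance from the training sample, zero out small entries of the mean-difference and off-diagonal covariance estimates, and plug the sparsified estimates into the usual LDA rule. This is only an illustration under assumed Gaussian data, not the authors' exact estimator; the function names and the threshold levels `t_mean` and `t_cov` are hypothetical and would in practice be chosen from theory or by cross-validation.

```python
import numpy as np

def hard_threshold(x, t):
    # Zero out entries whose magnitude does not exceed t
    return np.where(np.abs(x) > t, x, 0.0)

def sparse_lda_fit(X1, X2, t_mean, t_cov):
    """Fit a thresholded LDA rule from two training samples (rows = subjects)."""
    mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
    n1, n2 = len(X1), len(X2)
    # Pooled sample covariance
    S = ((n1 - 1) * np.cov(X1, rowvar=False)
         + (n2 - 1) * np.cov(X2, rowvar=False)) / (n1 + n2 - 2)
    # Sparsify: threshold the mean difference and the off-diagonal covariances
    delta = hard_threshold(mu1 - mu2, t_mean)
    St = hard_threshold(S, t_cov)
    np.fill_diagonal(St, np.diag(S))  # keep variances intact
    # LDA direction and cutoff; pinv guards against singularity when p > n
    w = np.linalg.pinv(St) @ delta
    b = w @ (mu1 + mu2) / 2.0
    return w, b

def sparse_lda_predict(x, w, b):
    # Classify to class 1 if the discriminant score exceeds the cutoff
    return 1 if x @ w > b else 2
```

A usage sketch: with two simulated Gaussian classes whose means differ in only a few coordinates, the thresholding step discards the noise coordinates, so the fitted direction `w` is sparse even when the dimension is large relative to the sample size.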

Article information

Ann. Statist., Volume 39, Number 2 (2011), 1241-1265.

First available in Project Euclid: 9 May 2011


Primary: 62H30: Classification and discrimination; cluster analysis [See also 68T10, 91C20]
Secondary: 62F12: Asymptotic properties of estimators; 62G12

Keywords: classification; high dimensionality; misclassification rate; normality; optimal classification rule; sparse estimates


Shao, Jun; Wang, Yazhen; Deng, Xinwei; Wang, Sijian. Sparse linear discriminant analysis by thresholding for high dimensional data. Ann. Statist. 39 (2011), no. 2, 1241--1265. doi:10.1214/10-AOS870.


