Electronic Journal of Statistics

Sparse supervised dimension reduction in high dimensional classification

Junhui Wang and Lifeng Wang



Supervised dimension reduction has proven effective in analyzing data with complex structure. The primary goal is to find a reduced subspace of minimal dimension that is sufficient for summarizing the data structure of interest. This paper investigates supervised dimension reduction in the high-dimensional classification context and proposes a novel method for estimating the dimension reduction subspace while retaining the ideal classification boundary based on the original dataset. The proposed method combines margin-based classification with shrinkage estimation, and estimates the dimension and the directions of the reduced subspace simultaneously. Both theoretical and numerical results indicate that the proposed method is highly competitive with existing methods, especially when the dimension of the covariates exceeds the sample size.
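To fix ideas, the classical baseline against which such methods are compared is sliced inverse regression (SIR), one of the keywords below: with a categorical response, each class serves as a slice, and the leading eigenvectors of the between-slice covariance of the standardized covariates estimate the dimension reduction directions. The sketch below illustrates plain SIR only; it is not the sparse margin-based estimator proposed in the paper, and it assumes the sample size exceeds the covariate dimension so that the sample covariance is invertible.

```python
import numpy as np

def sir_directions(X, y, n_dirs=1):
    """Sliced inverse regression with class labels as slices.

    Illustrative baseline only (assumes n > p so the sample
    covariance of X is invertible); not the paper's sparse method.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n, p = X.shape
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    # Standardize: Z = (X - mu) @ cov^{-1/2}
    evals, evecs = np.linalg.eigh(cov)
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = (X - mu) @ inv_sqrt
    # Weighted covariance of the slice (class) means of Z
    M = np.zeros((p, p))
    for c in np.unique(y):
        zc = Z[y == c].mean(axis=0)
        M += (np.sum(y == c) / n) * np.outer(zc, zc)
    # Leading eigenvectors of M, mapped back to the original X scale
    w, v = np.linalg.eigh(M)
    dirs = inv_sqrt @ v[:, ::-1][:, :n_dirs]
    return dirs / np.linalg.norm(dirs, axis=0)
```

When p > n, the covariance inverse above is ill-posed, which is precisely the regime where shrinkage-based estimators such as the one proposed here are needed.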

Article information

Electron. J. Statist., Volume 4 (2010), 914-931.

First available in Project Euclid: 15 September 2010


Primary: 62H30: Classification and discrimination; cluster analysis [See also 68T10, 91C20]

Keywords: dimension reduction; SAVE; SIR; large-p-small-n; support vector machine; tuning


Wang, Junhui; Wang, Lifeng. Sparse supervised dimension reduction in high dimensional classification. Electron. J. Statist. 4 (2010), 914--931. doi:10.1214/10-EJS572. https://projecteuclid.org/euclid.ejs/1284557753


