The Annals of Statistics

Innovated interaction screening for high-dimensional nonlinear classification

Yingying Fan, Yinfei Kong, Daoji Li, and Zemin Zheng

Abstract

This paper is concerned with the problems of interaction screening and nonlinear classification in a high-dimensional setting. We propose a two-step procedure, IIS-SQDA: the first step is an innovated interaction screening (IIS) approach based on transforming the original $p$-dimensional feature vector, and the second step is a sparse quadratic discriminant analysis (SQDA) that further selects important interactions and main effects while simultaneously conducting classification. Our IIS approach screens important interactions by examining only $p$ features instead of all $O(p^{2})$ two-way interactions. Our theory shows that the proposed method enjoys the sure screening property for interaction selection in the high-dimensional setting where $p$ grows exponentially with the sample size. In the selection and classification step, we establish a sparsity inequality on the estimated coefficient vector for QDA and prove that the classification error of our procedure is bounded above by the oracle classification error plus a term of smaller order. Extensive simulation studies and a real data analysis show that our proposal compares favorably with existing methods in both interaction selection and high-dimensional classification.
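To make the two-step structure concrete, the sketch below illustrates the idea in Python on simulated two-class Gaussian data. It is a minimal illustration under stated assumptions, not the authors' estimator: the graphical lasso stands in for the paper's precision-matrix estimator, the variance-difference statistic is a simplified proxy for the IIS screening statistic, the SQDA fit is replaced by an L1-penalized logistic regression on main effects plus retained quadratic terms, and all tuning constants (the graphical lasso alpha, the cutoff of 5 retained features, the penalty level C) are ad hoc choices for the demo.

    # Hypothetical sketch of the IIS-SQDA idea; not the authors' code.
    import numpy as np
    from itertools import combinations_with_replacement
    from sklearn.covariance import GraphicalLasso
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)

    # Toy data: both classes have zero mean and differ only in the
    # covariance of features 0 and 1, so the Bayes rule is quadratic
    # and the signal is a pure interaction.
    n, p = 200, 30
    Sigma1 = np.eye(p)
    Sigma1[0, 1] = Sigma1[1, 0] = 0.6
    X0 = rng.multivariate_normal(np.zeros(p), np.eye(p), size=n)
    X1 = rng.multivariate_normal(np.zeros(p), Sigma1, size=n)
    X = np.vstack([X0, X1])
    y = np.repeat([0, 1], n)

    # Step 1 (IIS, sketched): transform each class by an estimated
    # precision matrix, then rank the p transformed coordinates by how
    # much their class-conditional variances differ -- p screening
    # statistics in total, not O(p^2).
    def innovate(Xc):
        omega = GraphicalLasso(alpha=0.05).fit(Xc).precision_
        return Xc @ omega

    stat = np.abs(innovate(X0).var(axis=0) - innovate(X1).var(axis=0))
    retained = np.argsort(stat)[-5:]  # ad hoc: keep the top 5 features

    # Step 2 (SQDA stand-in, sketched): all main effects plus quadratic
    # terms among the retained features only; an L1-penalized logistic
    # fit does the joint selection and classification.
    quad = [X[:, j] * X[:, k]
            for j, k in combinations_with_replacement(sorted(retained), 2)]
    design = np.column_stack([X] + quad)
    clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
    clf.fit(design, y)
    print("training accuracy:", clf.score(design, y))

The point of the construction shows up in Step 1: the innovated transform concentrates the interaction information on individual coordinates, so only $p$ screening statistics are computed before any pairwise terms are ever formed.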

Article information

Source
Ann. Statist., Volume 43, Number 3 (2015), 1243-1272.

Dates
Received: October 2014
First available in Project Euclid: 15 May 2015

Permanent link to this document
https://projecteuclid.org/euclid.aos/1431695643

Digital Object Identifier
doi:10.1214/14-AOS1308

Mathematical Reviews number (MathSciNet)
MR3346702

Zentralblatt MATH identifier
1328.62383

Subjects
Primary: 62H30: Classification and discrimination; cluster analysis [See also 68T10, 91C20]
Secondary: 62F05: Asymptotic properties of tests; 62J12: Generalized linear models

Keywords
Classification; dimension reduction; discriminant analysis; interaction screening; sparsity; sure screening property

Citation

Fan, Yingying; Kong, Yinfei; Li, Daoji; Zheng, Zemin. Innovated interaction screening for high-dimensional nonlinear classification. Ann. Statist. 43 (2015), no. 3, 1243--1272. doi:10.1214/14-AOS1308. https://projecteuclid.org/euclid.aos/1431695643


References

  • [1] Bickel, P. J. and Levina, E. (2008). Regularized estimation of large covariance matrices. Ann. Statist. 36 199–227.
  • [2] Bickel, P. J., Ritov, Y. and Tsybakov, A. B. (2009). Simultaneous analysis of lasso and Dantzig selector. Ann. Statist. 37 1705–1732.
  • [3] Bunea, F. (2008). Honest variable selection in linear and logistic regression models via $\ell_{1}$ and ${\ell}_{1}+{\ell}_{2}$ penalization. Electron. J. Stat. 2 1153–1194.
  • [4] Cai, T. and Liu, W. (2011). A direct estimation approach to sparse linear discriminant analysis. J. Amer. Statist. Assoc. 106 1566–1577.
  • [5] Cai, T., Liu, W. and Luo, X. (2011). A constrained $\ell_{1}$ minimization approach to sparse precision matrix estimation. J. Amer. Statist. Assoc. 106 594–607.
  • [6] Candes, E. and Tao, T. (2007). The Dantzig selector: Statistical estimation when $p$ is much larger than $n$. Ann. Statist. 35 2313–2351.
  • [7] Fan, J. and Fan, Y. (2008). High-dimensional classification using features annealed independence rules. Ann. Statist. 36 2605–2637.
  • [8] Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. J. Amer. Statist. Assoc. 96 1348–1360.
  • [9] Fan, Y., Jin, J. and Yao, Z. (2013). Optimal classification in sparse Gaussian graphic model. Ann. Statist. 41 2537–2571.
  • [10] Fan, Y., Kong, Y., Li, D. and Zheng, Z. (2015). Supplement to “Innovated interaction screening for high-dimensional nonlinear classification.” DOI:10.1214/14-AOS1308SUPP.
  • [11] Fan, Y. and Lv, J. (2013). Asymptotic equivalence of regularization methods in thresholded parameter space. J. Amer. Statist. Assoc. 108 1044–1061.
  • [12] Friedman, J., Hastie, T. and Tibshirani, R. (2008). Sparse inverse covariance estimation with the graphical lasso. Biostatistics 9 432–441.
  • [13] Friedman, J., Hastie, T. and Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 33 1–22.
  • [14] Hall, P. and Jin, J. (2010). Innovated higher criticism for detecting sparse signals in correlated noise. Ann. Statist. 38 1686–1732.
  • [15] Hebiri, M. and van de Geer, S. (2011). The Smooth-Lasso and other $\ell_{1}+\ell_{2}$-penalized methods. Electron. J. Stat. 5 1184–1226.
  • [16] Jiang, B. and Liu, J. S. (2014). Variable selection for general index models via sliced inverse regression. Ann. Statist. 42 1751–1786.
  • [17] Jin, J. (2012). Comment: “Estimating false discovery proportion under arbitrary covariance dependence” [MR3010887]. J. Amer. Statist. Assoc. 107 1042–1045.
  • [18] Kooperberg, C., LeBlanc, M., Dai, J. Y. and Rajapakse, I. (2009). Structures and assumptions: Strategies to harness gene $\times$ gene and gene $\times$ environment interactions in GWAS. Statist. Sci. 24 472–488.
  • [19] Lv, J. and Fan, Y. (2009). A unified approach to model selection and sparse recovery using regularized least squares. Ann. Statist. 37 3498–3528.
  • [20] Mai, Q., Zou, H. and Yuan, M. (2012). A direct approach to sparse discriminant analysis in ultra-high dimensions. Biometrika 99 29–42.
  • [21] Pan, W., Basu, S. and Shen, X. (2011). Adaptive tests for detecting gene–gene and gene–environment interactions. Human Heredity 72 98–109.
  • [22] Rothman, A. J., Bickel, P. J., Levina, E. and Zhu, J. (2008). Sparse permutation invariant covariance estimation. Electron. J. Stat. 2 494–515.
  • [23] Shao, J., Wang, Y., Deng, X. and Wang, S. (2011). Sparse linear discriminant analysis by thresholding for high dimensional data. Ann. Statist. 39 1241–1265.
  • [24] Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B. Stat. Methodol. 58 267–288.
  • [25] Tibshirani, R., Hastie, T., Narasimhan, B. and Chu, G. (2002). Diagnosis of multiple cancer types by shrunken centroids of gene expression. Proc. Natl. Acad. Sci. USA 99 6567–6572.
  • [26] van’t Veer, L. J., Dai, H., van de Vijver, M. J., He, Y. D., Hart, A. A., Mao, M., Peterse, H. L., van der Kooy, K., Marton, M. J., Witteveen, A. T., Schreiber, G. J., Kerkhoven, R. M., Roberts, C., Linsley, P. S., Bernards, R. and Friend, S. H. (2002). Gene expression profiling predicts clinical outcome of breast cancer. Nature 415 530–536.
  • [27] van de Geer, S. A. (2008). High-dimensional generalized linear models and the lasso. Ann. Statist. 36 614–645.
  • [28] Yuan, M. (2010). High dimensional inverse covariance matrix estimation via linear programming. J. Mach. Learn. Res. 11 2261–2286.
  • [29] Yuan, M. and Lin, Y. (2007). Model selection and estimation in the Gaussian graphical model. Biometrika 94 19–35.
  • [30] Zhang, C.-H. (2010). Nearly unbiased variable selection under minimax concave penalty. Ann. Statist. 38 894–942.
  • [31] Zhang, T. and Zou, H. (2014). Sparse precision matrix estimation via lasso penalized D-trace loss. Biometrika 101 103–120.
  • [32] Zhu, J. and Hastie, T. (2004). Classification of gene microarrays by penalized logistic regression. Biostatistics 5 427–443.
  • [33] Zou, H. and Hastie, T. (2005). Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B. Stat. Methodol. 67 301–320.

Supplemental materials

Supplement to “Innovated interaction screening for high-dimensional nonlinear classification.” DOI:10.1214/14-AOS1308SUPP.