The Annals of Applied Statistics

Testing high-dimensional covariance matrices, with application to detecting schizophrenia risk genes

Lingxue Zhu, Jing Lei, Bernie Devlin, and Kathryn Roeder


Scientists routinely compare gene expression levels in cases versus controls, in part to identify genes associated with a disease. Similarly, detecting case-control differences in co-expression among genes can be critical to understanding complex human diseases; however, statistical methods have been limited by the high-dimensional nature of this problem. In this paper, we construct a sparse-Leading-Eigenvalue-Driven (sLED) test for comparing two high-dimensional covariance matrices. By focusing on the spectrum of the differential matrix, that is, the difference between the two covariance matrices, sLED provides a novel perspective that accommodates what we assume to be common in gene expression data, namely sparse and weak signals, and it is closely related to sparse principal component analysis. We prove that sLED achieves full power asymptotically under mild assumptions, and simulation studies verify that it outperforms other existing procedures under many biologically plausible scenarios. Applying sLED to the largest gene-expression dataset obtained from post-mortem brain tissue of schizophrenia patients and controls, we provide a novel list of genes implicated in schizophrenia and reveal intriguing patterns of change in gene co-expression among schizophrenia subjects. We also illustrate that sLED can be generalized to compare other gene-gene “relationship” matrices of practical interest, such as weighted adjacency matrices.
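The abstract describes sLED only in words. The following is a minimal sketch of one way to read it, under assumptions that go beyond this page. The differential matrix is taken to be D = Σ₂ − Σ₁ (cases minus controls). The statistic is taken to be T = max{λ_k(D̂), λ_k(−D̂)}, where λ_k(A) is the largest value of vᵀAv over unit vectors v with at most k nonzero entries. Significance is assessed by permuting case/control labels, as suggested by the keyword "permutation test". The sparse eigenvalue is approximated with the truncated power method. This is a swapped-in solver and may differ from the sparse PCA relaxation the authors actually use. All function names, and the parameters k, n_perm and beta, are hypothetical. The helper weighted_adjacency illustrates the closing remark about weighted adjacency matrices using a WGCNA-style |cor|^β transform (Zhang and Horvath, 2005).

```python
import numpy as np


def sample_cov(Z):
    """Sample covariance of an (n x p) data matrix with samples in rows."""
    return np.cov(Z, rowvar=False)


def weighted_adjacency(beta=6.0):
    """Hypothetical helper: WGCNA-style unsigned adjacency |cor|**beta."""
    return lambda Z: np.abs(np.corrcoef(Z, rowvar=False)) ** beta


def sparse_leading_eigenvalue(A, k, n_iter=500, tol=1e-10):
    """Approximate max v^T A v over unit vectors v with at most k nonzeros.

    Truncated power method started from the truncated leading eigenvector;
    an assumed stand-in solver, not necessarily the one used for sLED.
    """
    w, V = np.linalg.eigh(A)                  # eigenvalues in ascending order
    shift = max(0.0, -w[0])                   # A + shift*I is positive semidefinite
    B = A + shift * np.eye(A.shape[0])

    def truncate(x):
        keep = np.argsort(-np.abs(x))[:k]     # k entries of largest magnitude
        out = np.zeros_like(x)
        out[keep] = x[keep]
        return out / np.linalg.norm(out)

    v = truncate(V[:, -1])
    for _ in range(n_iter):
        x = B @ v
        if not np.any(x):                     # v lies in the null space of B
            break
        v_new = truncate(x)
        converged = np.linalg.norm(v_new - v) < tol
        v = v_new
        if converged:
            break
    return float(v @ A @ v)


def sled_statistic(X, Y, k, transform=sample_cov):
    """T = max(lambda_k(D), lambda_k(-D)) with D = transform(Y) - transform(X)."""
    D = transform(Y) - transform(X)
    return max(sparse_leading_eigenvalue(D, k), sparse_leading_eigenvalue(-D, k))


def sled_permutation_test(X, Y, k, n_perm=100, transform=sample_cov, seed=0):
    """Permutation p-value obtained by reshuffling the case/control labels."""
    rng = np.random.default_rng(seed)
    observed = sled_statistic(X, Y, k, transform)
    Z = np.vstack([X, Y])
    n1 = X.shape[0]
    exceed = 0
    for _ in range(n_perm):
        idx = rng.permutation(Z.shape[0])
        stat = sled_statistic(Z[idx[:n1]], Z[idx[n1:]], k, transform)
        exceed += stat >= observed
    return observed, (1 + exceed) / (1 + n_perm)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    p, n = 30, 200
    X = rng.standard_normal((n, p))           # simulated "controls": identity covariance
    Sigma = np.eye(p)
    Sigma[:5, :5] = 0.9                       # simulated "cases": 5 co-expressed genes
    np.fill_diagonal(Sigma, 1.0)
    Y = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    print(sled_permutation_test(X, Y, k=5, n_perm=99))
    print(sled_permutation_test(X, Y, k=5, n_perm=99,
                                transform=weighted_adjacency(beta=6.0)))
```

Passing a different `transform` replaces the covariance matrix with another gene-gene relationship matrix while leaving the test itself unchanged. The cardinality `k` acts as the sparsity constraint here. As with any sparse PCA method, it would need to be chosen with care.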

Article information

Ann. Appl. Stat. Volume 11, Number 3 (2017), 1810–1831.

Received: November 2016
Revised: April 2017
First available in Project Euclid: 5 October 2017

Digital Object Identifier: doi:10.1214/17-AOAS1062

Keywords: permutation test; high-dimensional data; covariance matrix; sparse principal component analysis


Zhu, Lingxue; Lei, Jing; Devlin, Bernie; Roeder, Kathryn. Testing high-dimensional covariance matrices, with application to detecting schizophrenia risk genes. Ann. Appl. Stat. 11 (2017), no. 3, 1810–1831. doi:10.1214/17-AOAS1062.

References

  • Anderson, T. W. (1958). An Introduction to Multivariate Statistical Analysis. Wiley, New York.
  • Bai, Z., Jiang, D., Yao, J.-F. and Zheng, S. (2009). Corrections to LRT on large-dimensional covariance matrix by RMT. Ann. Statist. 37 3822–3840.
  • Berthet, Q. and Rigollet, P. (2013). Optimal detection of sparse principal components in high dimension. Ann. Statist. 41 1780–1815.
  • Cai, T., Liu, W. and Xia, Y. (2013). Two-sample covariance matrix testing and support recovery in high-dimensional and sparse settings. J. Amer. Statist. Assoc. 108 265–277.
  • Cai, T. T. and Zhang, A. (2016). Inference for high-dimensional differential correlation matrices. J. Multivariate Anal. 143 107–126.
  • Chang, J., Zhou, W., Zhou, W.-X. and Wang, L. (2016). Comparing large covariance matrices under weak conditions on the dependence structure and its application to gene clustering. Preprint. Available at arXiv:1505.04493v3.
  • Chen, E. Y., Tan, C. M., Kou, Y., Duan, Q., Wang, Z., Meirelles, G. V., Clark, N. R. and Ma’ayan, A. (2013). Enrichr: Interactive and collaborative HTML5 gene list enrichment analysis tool. BMC Bioinform. 14 128.
  • d’Aspremont, A., El Ghaoui, L., Jordan, M. I. and Lanckriet, G. R. G. (2007). A direct formulation for sparse PCA using semidefinite programming. SIAM Rev. 49 434–448.
  • Diniz, L. P., Almeida, J. C., Tortelli, V., Vargas Lopes, C., Setti-Perdigão, P., Stipursky, J., Kahn, S. A., Romão, L. F., de Miranda, J., Alves-Leon, S. V., de Souza, J. M., Castro, N. G., Panizzutti, R. and Gomes, F. C. A. (2012). Astrocyte-induced synaptogenesis is mediated by transforming growth factor β signaling through modulation of D-serine levels in cerebral cortex neurons. J. Biol. Chem. 287 41432–41445.
  • Fromer, M., Roussos, P., Sieberts, S. K., Johnson, J. S., Kavanagh, D. H., Perumal, T. M., Ruderfer, D. M., Oh, E. C., Topol, A., Shah, H. R., Klei, L. L., Kramer, R., Pinto, D., Gümüş, Z. H., Cicek, A. E., Dang, K. K., Browne, A., Lu, C., Xie, L., Readhead, B., Stahl, E. A., Xiao, J., Parvizi, M., Hamamsy, T., Fullard, J. F., Wang, Y.-C., Mahajan, M. C., Derry, J. M. J., Dudley, J. T., Hemby, S. E., Logsdon, B. A., Talbot, K., Raj, T., Bennett, D. A., De Jager, P. L., Zhu, J., Zhang, B., Sullivan, P. F., Chess, A., Purcell, S. M., Shinobu, L. A., Mangravite, L. M., Toyoshiba, H., Gur, R. E., Hahn, C.-G., Lewis, D. A., Haroutunian, V., Peters, M. A., Lipska, B. K., Buxbaum, J. D., Schadt, E. E., Hirai, K., Roeder, K., Brennand, K. J., Katsanis, N., Domenici, E., Devlin, B. and Sklar, P. (2016). Gene expression elucidates functional impact of polygenic risk for schizophrenia. Nat. Neurosci. 19 1442–1453.
  • Gu, X., Li, A., Liu, S., Lin, L., Xu, S., Zhang, P., Li, S., Li, X., Tian, B., Zhu, X. and Wang, X. (2015). MicroRNA124 regulated neurite elongation by targeting OSBP. Mol. Neurobiol. 53 6388–6396.
  • Johnstone, I. M. and Lu, A. Y. (2009). On consistency and sparsity for principal components analysis in high dimensions. J. Amer. Statist. Assoc. 104 682–693.
  • Jolliffe, I. T., Trendafilov, N. T. and Uddin, M. (2003). A modified principal component technique based on the LASSO. J. Comput. Graph. Statist. 12 531–547.
  • Li, J. and Chen, S. X. (2012). Two sample tests for high-dimensional covariance matrices. Ann. Statist. 40 908–940.
  • McGrath, J., Saha, S., Chant, D. and Welham, J. (2008). Schizophrenia: A concise overview of incidence, prevalence, and mortality. Epidemiol. Rev. 30 67–76.
  • Owen, M. J., Sawa, A. and Mortensen, P. B. (2016). Schizophrenia. Lancet 388 86–97.
  • Oymak, S., Jalali, A., Fazel, M., Eldar, Y. C. and Hassibi, B. (2015). Simultaneously structured models with application to sparse and low-rank matrices. IEEE Trans. Inform. Theory 61 2886–2908.
  • International Schizophrenia Consortium, Purcell, S. M., Wray, N. R., Stone, J. L., Visscher, P. M., O’Donovan, M. C., Sullivan, P. F. and Sklar, P. (2009). Common polygenic variation contributes to risk of schizophrenia and bipolar disorder. Nature 460 748–752.
  • Purcell, S. M., Moran, J. L., Fromer, M., Ruderfer, D., Solovieff, N., Roussos, P., O’Dushlaine, C., Chambert, K., Bergen, S. E., Kähler, A., Duncan, L., Stahl, E., Genovese, G., Fernández, E., Collins, M. O., Komiyama, N. H., Choudhary, J. S., Magnusson, P. K. E., Banks, E., Shakir, K., Garimella, K., Fennell, T., DePristo, M., Grant, S. G. N., Haggarty, S. J., Gabriel, S., Scolnick, E. M., Lander, E. S., Hultman, C. M., Sullivan, P. F., McCarroll, S. A. and Sklar, P. (2014). A polygenic burden of rare disruptive mutations in schizophrenia. Nature 506 185–190.
  • Schizophrenia Working Group of the Psychiatric Genomics Consortium (2014). Biological insights from 108 schizophrenia-associated genetic loci. Nature 511 421–427.
  • Schott, J. R. (2007). A test for the equality of covariance matrices when the dimension is large relative to the sample sizes. Comput. Statist. Data Anal. 51 6535–6542.
  • Shen, H. and Huang, J. Z. (2008). Sparse principal component analysis via regularized low rank matrix approximation. J. Multivariate Anal. 99 1015–1034.
  • Srivastava, M. S. and Yanagihara, H. (2010). Testing the equality of several covariance matrices with fewer observations than the dimension. J. Multivariate Anal. 101 1319–1329.
  • Székely, G. J. and Rizzo, M. L. (2013). Energy statistics: A class of statistics based on distances. J. Statist. Plann. Inference 143 1249–1272.
  • Vu, V. Q., Cho, J., Lei, J. and Rohe, K. (2013). Fantope projection and selection: A near-optimal convex relaxation of sparse PCA. In Advances in Neural Information Processing Systems 26 2670–2678.
  • Witten, D. M., Tibshirani, R. and Hastie, T. (2009). A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Biostatistics 10 515–534.
  • Wu, T.-L. and Li, P. (2015). Tests for high-dimensional covariance matrices using random matrix projection. Preprint. Available at arXiv:1511.01611.
  • Yu, C.-Y., Gui, W., He, H.-Y., Wang, X.-S., Zuo, J., Huang, L., Zhou, N., Wang, K. and Wang, Y. (2014). Neuronal and astroglial TGFβ-Smad3 signaling pathways differentially regulate dendrite growth and synaptogenesis. Neuromolecular Med. 16 457–472.
  • Zhang, B. and Horvath, S. (2005). A general framework for weighted gene co-expression network analysis. Stat. Appl. Genet. Mol. Biol. 4 Art. 17, 45.
  • Zhu, L., Lei, J., Devlin, B. and Roeder, K. (2017). Supplement to “Testing high-dimensional covariance matrices, with application to detecting schizophrenia risk genes.” DOI:10.1214/17-AOAS1062SUPP.
  • Zou, H., Hastie, T. and Tibshirani, R. (2006). Sparse principal component analysis. J. Comput. Graph. Statist. 15 265–286.

Supplemental materials

  • Supplement to “Testing high-dimensional covariance matrices, with application to detecting schizophrenia risk genes”. This supplement provides additional simulation results as well as proofs of the theorems.