The Annals of Applied Statistics

A testing based approach to the discovery of differentially correlated variable sets

Kelly Bodwin, Kai Zhang, and Andrew Nobel

Abstract

Given data obtained under two sampling conditions, it is often of interest to identify variables that behave differently in one condition than in the other. We introduce a method for differential analysis of second-order behavior called Differential Correlation Mining (DCM). The DCM method identifies differentially correlated sets of variables, with the property that the average pairwise correlation between variables in a set is higher under one sample condition than the other. DCM is based on an iterative search procedure that adaptively updates the size and elements of a candidate variable set. Updates are performed via hypothesis testing of individual variables, based on the asymptotic distribution of their average differential correlation. We investigate the performance of DCM by applying it to simulated data as well as to recent experimental datasets in genomics and brain imaging.
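As an illustration of the quantity the abstract describes, the following Python sketch computes the average pairwise sample correlation of a variable set under each condition and takes the difference. It is not the DCM procedure itself: the iterative search, the hypothesis tests, and the variance estimator are not reproduced here. The function names, the NumPy-based layout (rows as samples, columns as variables), and the synthetic data are all assumptions made for this example.

```python
import numpy as np


def average_pairwise_correlation(X, idx):
    """Mean off-diagonal sample correlation among the columns `idx` of X.

    X is assumed to be an (n_samples, n_variables) array.
    """
    idx = np.asarray(idx)
    k = idx.size
    if k < 2:
        raise ValueError("need at least two variables in the set")
    R = np.corrcoef(X[:, idx], rowvar=False)  # k x k correlation matrix
    # Remove the diagonal (all ones) and average the k*(k-1) remaining entries.
    return (R.sum() - np.trace(R)) / (k * (k - 1))


def average_differential_correlation(X1, X2, idx):
    """Average pairwise correlation in condition 1 minus that in condition 2."""
    return (average_pairwise_correlation(X1, idx)
            - average_pairwise_correlation(X2, idx))


if __name__ == "__main__":
    # Hypothetical synthetic data: variables 0-4 share a latent factor in
    # condition 1 only, so they should appear differentially correlated.
    rng = np.random.default_rng(0)
    n, p = 200, 20
    X1 = rng.standard_normal((n, p))
    X1[:, :5] += rng.standard_normal((n, 1))  # shared factor, condition 1
    X2 = rng.standard_normal((n, p))          # no shared structure

    print(average_differential_correlation(X1, X2, range(5)))
    print(average_differential_correlation(X1, X2, range(5, 10)))
```

In this toy setup the first set should yield a clearly positive difference and the second a difference near zero. A search procedure such as the one the abstract describes would additionally need a principled way to decide which variables to add to or remove from the candidate set.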

Article information

Source
Ann. Appl. Stat., Volume 12, Number 2 (2018), 1180-1203.

Dates
Received: February 2016
Revised: March 2017
First available in Project Euclid: 28 July 2018

Permanent link to this document
https://projecteuclid.org/euclid.aoas/1532743490

Digital Object Identifier
doi:10.1214/17-AOAS1083

Mathematical Reviews number (MathSciNet)
MR3834299

Keywords
Differential correlation mining; association mining; biostatistics; genomics; high-dimensional data

Citation

Bodwin, Kelly; Zhang, Kai; Nobel, Andrew. A testing based approach to the discovery of differentially correlated variable sets. Ann. Appl. Stat. 12 (2018), no. 2, 1180–1203. doi:10.1214/17-AOAS1083. https://projecteuclid.org/euclid.aoas/1532743490

Supplemental materials

  • Differential correlation mining: Supplementary material. We provide the proof of Corollary 1.1, the derivation of the variance estimator, additional simulation results, extended real data results, and pseudocode for the algorithmic procedures.