## The Annals of Statistics

### Structural similarity and difference testing on multiple sparse Gaussian graphical models

Weidong Liu

#### Abstract

We present a new framework for inferring structural similarities and differences among multiple high-dimensional Gaussian graphical models (GGMs) corresponding to the same set of variables under distinct experimental conditions. The framework uses partial correlation coefficients to characterize potential changes in the dependency strength between two variables. A hierarchical method is further developed to recover edges with different or similar dependency strengths across multiple GGMs. In particular, we first construct two-sample test statistics for testing the equality of partial correlation coefficients and conduct large-scale multiple tests to estimate the substructure of differential dependencies. After removing the differential substructure from the original GGMs, a follow-up multiple testing procedure is used to detect the substructure of similar dependencies among the GGMs. In each step, the false discovery rate is controlled asymptotically at a desired level. Power results are proved, which demonstrate that our method is more powerful in finding common edges than the common approach of estimating the GGMs separately. The performance of the proposed hierarchical method is illustrated on simulated datasets.
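
The first step described in the abstract (two-sample tests for equality of partial correlations, followed by large-scale multiple testing) can be illustrated with a minimal sketch. It is not the paper's procedure: it assumes a low-dimensional setting with more samples than variables, and it substitutes the classical Fisher z-test and the Benjamini-Hochberg step-up rule for the paper's high-dimensional test statistics and FDR procedure. The function names `partial_correlations` and `differential_edges` are hypothetical.

```python
import numpy as np
from math import erfc, sqrt


def partial_correlations(X):
    """Sample partial correlations from the inverse sample covariance.

    Assumption: n > p, so the sample covariance is invertible.
    """
    omega = np.linalg.inv(np.cov(X, rowvar=False))
    d = np.sqrt(np.diag(omega))
    rho = -omega / np.outer(d, d)  # rho_ij = -omega_ij / sqrt(omega_ii * omega_jj)
    np.fill_diagonal(rho, 1.0)
    return rho


def differential_edges(X1, X2, alpha=0.05):
    """Pairs (i, j) whose partial correlations differ between two samples.

    Classical stand-in (not the paper's statistic): Fisher z-test per pair,
    then Benjamini-Hochberg at level alpha. Requires n1, n2 > p + 1.
    """
    n1, p = X1.shape
    n2 = X2.shape[0]
    r1, r2 = partial_correlations(X1), partial_correlations(X2)
    iu = np.triu_indices(p, k=1)  # each unordered pair once

    # Fisher z-transform; conditioning on p - 2 variables gives
    # approximate variance 1 / (n - (p - 2) - 3) = 1 / (n - p - 1).
    z1, z2 = np.arctanh(r1[iu]), np.arctanh(r2[iu])
    se = sqrt(1.0 / (n1 - p - 1) + 1.0 / (n2 - p - 1))
    stat = (z1 - z2) / se

    # Two-sided normal p-values: 2 * (1 - Phi(|t|)) = erfc(|t| / sqrt(2)).
    pvals = np.array([erfc(abs(t) / sqrt(2.0)) for t in stat])

    # Benjamini-Hochberg step-up: reject the k smallest p-values, where k is
    # the largest rank with p_(k) <= alpha * k / m.
    m = len(pvals)
    order = np.argsort(pvals)
    passed = pvals[order] <= alpha * np.arange(1, m + 1) / m
    k = int(passed.nonzero()[0].max()) + 1 if passed.any() else 0
    return sorted((int(iu[0][t]), int(iu[1][t])) for t in order[:k])
```

Calling `differential_edges(X1, X2)` on two data matrices with the same columns returns the estimated differential substructure as a list of index pairs. The second step in the abstract would then test the remaining pairs for common edges, which this sketch does not attempt.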

#### Article information

Source
Ann. Statist., Volume 45, Number 6 (2017), 2680-2707.

Dates
Revised: January 2017
First available in Project Euclid: 15 December 2017

https://projecteuclid.org/euclid.aos/1513328587

Digital Object Identifier
doi:10.1214/17-AOS1539

Mathematical Reviews number (MathSciNet)
MR3737906

Zentralblatt MATH identifier
06838147

Subjects
Primary: 62H12: Estimation; 62H15: Hypothesis testing

#### Citation

Liu, Weidong. Structural similarity and difference testing on multiple sparse Gaussian graphical models. Ann. Statist. 45 (2017), no. 6, 2680–2707. doi:10.1214/17-AOS1539. https://projecteuclid.org/euclid.aos/1513328587

#### Supplemental materials

• Structural similarity and difference testing on multiple sparse Gaussian graphical models. The supplementary material includes the proofs of Proposition 3.1, Theorems 3.3–3.5 and Lemmas 6.1–6.3. Part of the numerical results for Section 4 is also included.