## The Annals of Applied Statistics

### Bayesian inference for multiple Gaussian graphical models with application to metabolic association networks

#### Abstract

We investigate the effect of cadmium (a toxic environmental pollutant) on the correlation structure of a number of urinary metabolites using Gaussian graphical models (GGMs). The inferred metabolic associations can provide important information on the physiological state of a metabolic system and insights on complex metabolic relationships. Using the fitted GGMs, we construct differential networks, which highlight significant changes in metabolite interactions under different experimental conditions. The analysis of such metabolic association networks can reveal differences in the underlying biological reactions caused by cadmium exposure. We consider Bayesian inference and propose using the multiplicative (or Chung–Lu random graph) model as a prior on the graphical space. In the multiplicative model, each edge is chosen independently with probability equal to the product of the connectivities of the end nodes. This class of prior is parsimonious yet highly flexible; it can be used to encourage sparsity or graphs with a pre-specified degree distribution when such prior knowledge is available. We extend the multiplicative model to multiple GGMs linking the probability of edge inclusion through logistic regression and demonstrate how this leads to joint inference for multiple GGMs. A sequential Monte Carlo (SMC) algorithm is developed for estimating the posterior distribution of the graphs.
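The multiplicative prior described in the abstract can be illustrated with a minimal sketch. It assumes each node `i` has a connectivity `w[i]` strictly between 0 and 1, so that the inclusion probability of edge `(i, j)` is the product `w[i] * w[j]`. The function names and the example connectivities are hypothetical and are not taken from the paper or its Matlab supplement. The extension to multiple GGMs via logistic regression is not shown, because the abstract does not specify its form.

```python
import math
import random
from itertools import combinations


def edge_probability(w, i, j):
    """Inclusion probability of edge (i, j): product of the end-node connectivities."""
    return w[i] * w[j]


def sample_graph(w, rng):
    """Draw one graph from the prior; each edge is an independent Bernoulli trial."""
    return {
        (i, j)
        for i, j in combinations(range(len(w)), 2)
        if rng.random() < edge_probability(w, i, j)
    }


def log_prior(edges, w):
    """Log prior probability of a graph given as a set of pairs (i, j) with i < j."""
    total = 0.0
    for i, j in combinations(range(len(w)), 2):
        p = edge_probability(w, i, j)
        # Present edges contribute log(p); absent edges contribute log(1 - p).
        total += math.log(p) if (i, j) in edges else math.log1p(-p)
    return total


def expected_degrees(w):
    """Expected degree of node i under the prior: w[i] times the sum of the other w[j]."""
    s = sum(w)
    return [wi * (s - wi) for wi in w]


if __name__ == "__main__":
    # Hypothetical connectivities: two well-connected nodes and three sparse ones.
    w = [0.9, 0.9, 0.1, 0.1, 0.1]
    graph = sample_graph(w, random.Random(0))
    print("sampled edges:", sorted(graph))
    print("log prior of sample:", log_prior(graph, w))
    print("expected degrees:", expected_degrees(w))
```

Small connectivities across all nodes push the prior toward sparse graphs, while unequal connectivities shape the expected degree of each node. This is one way to read the abstract's remark that the prior can encode a pre-specified degree distribution.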

#### Article information

Source
Ann. Appl. Stat., Volume 11, Number 4 (2017), 2222–2251.

Dates
Revised: June 2017
First available in Project Euclid: 28 December 2017

Permanent link to this document
https://projecteuclid.org/euclid.aoas/1514430284

Digital Object Identifier
doi:10.1214/17-AOAS1076

Mathematical Reviews number (MathSciNet)
MR3743295

Zentralblatt MATH identifier
1383.62294

#### Citation

Tan, Linda S. L.; Jasra, Ajay; De Iorio, Maria; Ebbels, Timothy M. D. Bayesian inference for multiple Gaussian graphical models with application to metabolic association networks. Ann. Appl. Stat. 11 (2017), no. 4, 2222–2251. doi:10.1214/17-AOAS1076. https://projecteuclid.org/euclid.aoas/1514430284

#### Supplemental materials

• Supplement to “Bayesian inference for multiple Gaussian graphical models with application to metabolic association networks”. We provide additional material to support the results in this paper. This includes Matlab code, further discussions, detailed derivations and further results on the application to urinary metabolic data.