Bayesian Analysis
- Bayesian Anal.
- Volume 10, Number 1 (2015), 109-138.
Bayesian Structure Learning in Sparse Gaussian Graphical Models
Abstract
Decoding complex relationships among large numbers of variables with relatively few observations is one of the crucial issues in science. One approach to this problem is Gaussian graphical modeling, which describes conditional independence of variables through the presence or absence of edges in the underlying graph. In this paper, we introduce a novel and efficient Bayesian framework for Gaussian graphical model determination: a trans-dimensional Markov chain Monte Carlo (MCMC) approach based on a continuous-time birth-death process. We cover the theory and computational details of the method. It is easy to implement and computationally feasible for high-dimensional graphs. We show that our method outperforms alternative Bayesian approaches in terms of convergence, mixing in the graph space, and computing time. Unlike frequentist approaches, it offers a principled and, in practice, sensible approach to structure learning. We illustrate the efficiency of the method on a broad range of simulated data. We then apply the method to large-scale real applications from human and mammary gland gene expression studies to show its empirical usefulness. In addition, we implemented the method in the R package BDgraph, which is freely available at http://CRAN.R-project.org/package=BDgraph.
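To make concrete how a Gaussian graphical model ties edges to conditional independence, the sketch below is a minimal illustration and not the paper's birth-death sampler. It assumes a hypothetical 3-variable chain graph 0 - 1 - 2 with a hand-picked precision matrix `K`; the helper names (`invert`, `to_correlation`, `partial_correlations`, `edges_from_precision`) are invented for this example. A zero off-diagonal entry of `K` corresponds to a missing edge and a zero partial correlation, even though the marginal correlation between the same two variables is nonzero.

```python
import math

def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment with the identity matrix: [A | I].
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Choose the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def to_correlation(cov):
    """Rescale a covariance matrix to a correlation matrix."""
    n = len(cov)
    return [[cov[i][j] / math.sqrt(cov[i][i] * cov[j][j]) for j in range(n)]
            for i in range(n)]

def partial_correlations(precision):
    """Partial correlation rho_ij = -K_ij / sqrt(K_ii * K_jj) for i != j."""
    n = len(precision)
    return [[1.0 if i == j else
             -precision[i][j] / math.sqrt(precision[i][i] * precision[j][j])
             for j in range(n)] for i in range(n)]

def edges_from_precision(precision, tol=1e-10):
    """Edges of the graph are the nonzero off-diagonal entries of the precision matrix."""
    n = len(precision)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if abs(precision[i][j]) > tol]

# Assumed example: chain graph 0 - 1 - 2, so K[0][2] = 0 (no edge between 0 and 2).
K = [[2.0, -1.0, 0.0],
     [-1.0, 2.0, -1.0],
     [0.0, -1.0, 2.0]]

Sigma = invert(K)                      # covariance matrix implied by K
marginal_corr = to_correlation(Sigma)  # variables 0 and 2 are marginally correlated
partial_corr = partial_correlations(K) # but conditionally independent given variable 1
edges = edges_from_precision(K)

print("edges:", edges)
print("marginal corr(0, 2):", marginal_corr[0][2])
print("partial corr(0, 2):", partial_corr[0][2])
```

Structure learning runs in the opposite direction: the graph is unknown and must be inferred from data, which is the problem the birth-death MCMC method in this paper addresses by sampling over graphs and precision matrices jointly.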
Article information
Source
Bayesian Anal., Volume 10, Number 1 (2015), 109-138.
Dates
First available in Project Euclid: 28 January 2015
Permanent link to this document
https://projecteuclid.org/euclid.ba/1422468425
Digital Object Identifier
doi:10.1214/14-BA889
Mathematical Reviews number (MathSciNet)
MR3420899
Zentralblatt MATH identifier
1335.62056
Keywords
Bayesian model selection; Sparse Gaussian graphical models; Non-decomposable graphs; Birth-death process; Markov chain Monte Carlo; G-Wishart
Citation
Mohammadi, A.; Wit, E. C. Bayesian Structure Learning in Sparse Gaussian Graphical Models. Bayesian Anal. 10 (2015), no. 1, 109-138. doi:10.1214/14-BA889. https://projecteuclid.org/euclid.ba/1422468425
