Statistical Science

Experiments in Stochastic Computation for High-Dimensional Graphical Models

Beatrix Jones, Carlos Carvalho, Adrian Dobra, Chris Hans, Chris Carter, and Mike West

Source: Statist. Sci. Volume 20, Number 4 (2005), 388-400.

Abstract

We discuss the implementation, development and performance of methods of stochastic computation in Gaussian graphical models. We view these methods from the perspective of high-dimensional model search, with particular interest in how Markov chain Monte Carlo (MCMC) and other stochastic search methods scale with dimension. After reviewing the structure and context of undirected Gaussian graphical models and model uncertainty (covariance selection), we discuss prior specifications, including new priors over models, and then explore a number of examples using various methods of stochastic computation. Traditional MCMC methods are the point of departure for this experimentation; we then develop alternative stochastic search ideas and contrast this new approach with MCMC. Our examples range from low (12–20) to moderate (150) dimension, and combine simple synthetic examples with data analysis from gene expression studies. We conclude with comments about the need and potential for new computational methods in far higher dimensions, including constructive approaches to Gaussian graphical modeling and computation.

Keywords: Decomposable models; nondecomposable models; Markov chain Monte Carlo; shotgun stochastic search; parallel implementation

Links and Identifiers

Permanent link to this document: http://projecteuclid.org/euclid.ss/1137076659
Digital Object Identifier: doi:10.1214/088342305000000304
Mathematical Reviews number (MathSciNet): MR2210226
Zentralblatt MATH identifier: 1130.62408

2009 © Institute of Mathematical Statistics