The Annals of Applied Statistics

Copula Gaussian graphical models and their application to modeling functional disability data

Adrian Dobra and Alex Lenkoski

Full-text: Open access

Abstract

We propose a comprehensive Bayesian approach for graphical model determination in observational studies that can accommodate binary, ordinal or continuous variables simultaneously. Our new models are called copula Gaussian graphical models (CGGMs) and embed graphical model selection inside a semiparametric Gaussian copula. The domain of applicability of our methods is very broad and encompasses many studies from social science and economics. We illustrate the use of the copula Gaussian graphical models in the analysis of a 16-dimensional functional disability contingency table.
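
As a rough illustration of the Gaussian copula idea mentioned in the abstract, the sketch below simulates binary, ordinal and continuous variables from one latent Gaussian vector whose precision matrix has a zero entry for a missing edge in the graph. It is not the authors' implementation: the function name `sample_gaussian_copula`, the three-variable chain graph, the matrix `K`, the cut points and the log-normal margin are all assumptions chosen for illustration, and the sketch only simulates data; it does not perform graph selection.

```python
import numpy as np
from statistics import NormalDist


def sample_gaussian_copula(K, n, rng):
    """Draw n latent vectors Z ~ N(0, K^{-1}) and rescale each coordinate to unit variance."""
    sigma = np.linalg.inv(K)
    z = rng.multivariate_normal(np.zeros(K.shape[0]), sigma, size=n)
    return z / np.sqrt(np.diag(sigma))


# Hypothetical chain graph 1 - 2 - 3: the zero in K[0, 2] encodes conditional
# independence of latent variables 1 and 3 given latent variable 2.
K = np.array([[2.0, 0.8, 0.0],
              [0.8, 2.0, 0.8],
              [0.0, 0.8, 2.0]])

rng = np.random.default_rng(0)
z = sample_gaussian_copula(K, 5000, rng)

# Observed variables are monotone transformations of the latent scores.
cut = NormalDist().inv_cdf  # standard normal quantile function
binary = (z[:, 0] > cut(0.7)).astype(int)                        # two levels
ordinal = np.digitize(z[:, 1], [cut(0.2), cut(0.5), cut(0.9)])   # four ordered levels
continuous = np.exp(z[:, 2])                                     # assumed log-normal margin
```

Because every observed variable is a monotone function of its latent score, the dependence structure lives entirely in the latent Gaussian vector, which is why the graph can be read off the zero pattern of `K` regardless of the type of each margin.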

Article information

Source
Ann. Appl. Stat., Volume 5, Number 2A (2011), 969-993.

Dates
First available in Project Euclid: 13 July 2011

Permanent link to this document
https://projecteuclid.org/euclid.aoas/1310562213

Digital Object Identifier
doi:10.1214/10-AOAS397

Mathematical Reviews number (MathSciNet)
MR2840183

Zentralblatt MATH identifier
1232.62046

Keywords
Bayesian inference; Gaussian graphical models; latent variable model; Markov chain Monte Carlo

Citation

Dobra, Adrian; Lenkoski, Alex. Copula Gaussian graphical models and their application to modeling functional disability data. Ann. Appl. Stat. 5 (2011), no. 2A, 969–993. doi:10.1214/10-AOAS397. https://projecteuclid.org/euclid.aoas/1310562213


Supplemental materials

  • Supplementary material: C++ implementation of copula Gaussian graphical models. We provide source code for the methodology described in this paper. Our program takes advantage of cluster computing to run several Markov chains in parallel. By using this code, one can replicate the analyses of the Rochdale data and the NLTCS functional disability data for which we give sample input files.