Bayesian Analysis

Using Bayesian Latent Gaussian Graphical Models to Infer Symptom Associations in Verbal Autopsies

Zehang Richard Li, Tyler H. McCormick, and Samuel J. Clark

Advance publication

This article is in its final form and can be cited using the date of online publication and the DOI.

Full-text: Open access

Abstract

Learning dependence relationships among variables of mixed types provides insights in a variety of scientific settings and is a well-studied problem in statistics. Existing methods, however, typically rely on copious, high-quality data to learn associations accurately. In this paper, we develop a method for scientific settings where learning dependence structure is essential, but data are sparse and have a high fraction of missing values. Specifically, our work is motivated by survey-based cause-of-death assessments known as verbal autopsies (VAs). We propose a Bayesian approach to characterize dependence relationships using a latent Gaussian graphical model that incorporates informative priors on the marginal distributions of the variables. We demonstrate that such information can improve estimation of the dependence structure, especially in settings with little training data. We show that our method can be integrated into existing probabilistic cause-of-death assignment algorithms, where it improves model performance while recovering dependence patterns between symptoms that can inform efficient questionnaire design in future data collection.
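
To make the idea of a latent Gaussian model with known marginals concrete, the following is a minimal sketch and not the authors' implementation. It assumes two binary symptoms, each obtained by thresholding a standard normal latent variable so that its marginal probability matches a value supplied in advance (standing in for prior marginal information), while a latent correlation `rho` carries the dependence. The function names `simulate_binary_symptoms` and `precision_offdiag`, and all numeric values, are hypothetical and chosen only for illustration.

```python
import random
from statistics import NormalDist


def simulate_binary_symptoms(marginals, rho, n, seed=0):
    """Draw n records of two binary symptoms from a latent bivariate Gaussian.

    marginals: assumed marginal probabilities (p1, p2) that each symptom is present.
    rho: assumed correlation between the two latent Gaussian variables.
    """
    rng = random.Random(seed)
    std_normal = NormalDist()
    # Choose thresholds t_j so that P(Z_j > t_j) = p_j for a standard normal Z_j.
    thresholds = [std_normal.inv_cdf(1.0 - p) for p in marginals]
    records = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        eps = rng.gauss(0.0, 1.0)
        # z2 is standard normal with correlation rho to z1.
        z2 = rho * z1 + (1.0 - rho ** 2) ** 0.5 * eps
        records.append((int(z1 > thresholds[0]), int(z2 > thresholds[1])))
    return records


def precision_offdiag(rho):
    """Off-diagonal entry of the inverse of the 2x2 correlation matrix [[1, rho], [rho, 1]].

    A zero entry corresponds to a missing edge in the latent Gaussian graph.
    """
    return -rho / (1.0 - rho ** 2)


if __name__ == "__main__":
    data = simulate_binary_symptoms(marginals=(0.2, 0.6), rho=0.5, n=20000)
    rate1 = sum(x1 for x1, _ in data) / len(data)
    rate2 = sum(x2 for _, x2 in data) / len(data)
    both = sum(x1 * x2 for x1, x2 in data) / len(data)
    print("symptom 1 rate:", round(rate1, 3))
    print("symptom 2 rate:", round(rate2, 3))
    print("co-occurrence rate:", round(both, 3), "vs. independent:", 0.2 * 0.6)
    print("latent precision off-diagonal:", precision_offdiag(0.5))
```

In this toy setting the marginal rates are fixed by the thresholds, so the only quantity left to learn from the data is the latent dependence. This is one way to see why marginal information could help when training data are scarce.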

Article information

Source
Bayesian Anal., Advance publication (2018), 27 pages.

Dates
First available in Project Euclid: 24 September 2019

Permanent link to this document
https://projecteuclid.org/euclid.ba/1569290444

Digital Object Identifier
doi:10.1214/19-BA1172

Keywords
cause of death; mixed data; high dimensional; spike-and-slab; parameter expansion

Rights
Creative Commons Attribution 4.0 International License.

Citation

Li, Zehang Richard; McCormick, Tyler H.; Clark, Samuel J. Using Bayesian Latent Gaussian Graphical Models to Infer Symptom Associations in Verbal Autopsies. Bayesian Anal., advance publication, 24 September 2019. doi:10.1214/19-BA1172. https://projecteuclid.org/euclid.ba/1569290444



Supplemental materials

  • Supplementary Material to “Using Bayesian Latent Gaussian Graphical Models to Infer Symptom Associations in Verbal Autopsies”. PDF document of supplementary material. The R and Java replication code implementing the proposed method is available in the repository at https://github.com/richardli/LGGM.