Statistical Science

Approximate Bayesian Computation and Simulation-Based Inference for Complex Stochastic Epidemic Models

Trevelyan J. McKinley, Ian Vernon, Ioannis Andrianakis, Nicky McCreesh, Jeremy E. Oakley, Rebecca N. Nsubuga, Michael Goldstein, and Richard G. White

Abstract

Approximate Bayesian Computation (ABC) and other simulation-based inference methods are increasingly used for inference in complex systems, owing to their relative ease of implementation. We briefly review some of the more popular variants of ABC and their application in epidemiology, before using a real-world model of HIV transmission to illustrate some of the challenges of applying ABC methods to high-dimensional, computationally intensive models. We then discuss an alternative approach, history matching, that aims to address some of these issues, and conclude with a comparison of these methodologies.
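For readers new to ABC, its simplest variant, rejection ABC, works as follows. Draw a parameter from the prior, simulate data from the model at that parameter, and keep the draw only if a summary of the simulated data lies within a tolerance ε of the observed summary. The sketch below is a generic illustration under stated assumptions, not the authors' HIV model or their implementation. The simulator is a hypothetical Reed-Frost chain-binomial epidemic, the summary statistic is the final outbreak size, the prior is Uniform(0, 1) on the per-contact transmission probability p, and the population size, "observed" data and tolerance are made-up values. The sketch also shows why ABC becomes costly for expensive simulators: every candidate needs a full model run, and most candidates are rejected.

```python
import random


def reed_frost_final_size(p, n_susceptible, n_infected, rng):
    """Hypothetical toy simulator: a Reed-Frost chain-binomial epidemic.

    Returns the number of susceptibles ever infected (initial cases excluded).
    """
    s, i, total = n_susceptible, n_infected, 0
    while i > 0 and s > 0:
        # A susceptible escapes infection from all i current infectives
        # with probability (1 - p) ** i; otherwise it becomes a new case.
        escape = (1.0 - p) ** i
        new_cases = sum(1 for _ in range(s) if rng.random() >= escape)
        s -= new_cases
        total += new_cases
        i = new_cases  # infectives remain infectious for exactly one generation
    return total


def rejection_abc(observed, n_draws, epsilon, n_susceptible, n_infected, seed=1):
    """Rejection ABC: Uniform(0, 1) prior on p, accept if |S_sim - S_obs| <= epsilon."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        p = rng.random()  # candidate drawn from the Uniform(0, 1) prior
        simulated = reed_frost_final_size(p, n_susceptible, n_infected, rng)
        if abs(simulated - observed) <= epsilon:
            accepted.append(p)  # accepted draws approximate the posterior of p
    return accepted


if __name__ == "__main__":
    # Illustrative (made-up) settings: 50 susceptibles, 1 initial case,
    # an "observed" final size of 30 and a tolerance of 5.
    samples = rejection_abc(observed=30, n_draws=5000, epsilon=5,
                            n_susceptible=50, n_infected=1)
    print(f"accepted {len(samples)} of 5000 draws")
    if samples:
        print(f"approximate posterior mean of p: {sum(samples) / len(samples):.3f}")
```

In practice, the acceptance rate falls quickly as ε shrinks or as the number of summary statistics grows. This is one reason sequential variants such as ABC-SMC are used in place of plain rejection sampling.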

Article information

Source
Statist. Sci., Volume 33, Number 1 (2018), 4-18.

Dates
First available in Project Euclid: 2 February 2018

Permanent link to this document
https://projecteuclid.org/euclid.ss/1517562021

Digital Object Identifier
doi:10.1214/17-STS618

Mathematical Reviews number (MathSciNet)
MR3757500

Keywords
Approximate Bayesian Computation; history matching; emulation; Bayesian inference; infectious disease models

Citation

McKinley, Trevelyan J.; Vernon, Ian; Andrianakis, Ioannis; McCreesh, Nicky; Oakley, Jeremy E.; Nsubuga, Rebecca N.; Goldstein, Michael; White, Richard G. Approximate Bayesian Computation and Simulation-Based Inference for Complex Stochastic Epidemic Models. Statist. Sci. 33 (2018), no. 1, 4–18. doi:10.1214/17-STS618. https://projecteuclid.org/euclid.ss/1517562021


References

  • Andrianakis, I., Vernon, I., McCreesh, N., McKinley, T. J., Oakley, J. E., Nsubuga, R. N., Goldstein, M. and White, R. G. (2015). Bayesian history matching of complex infectious disease models using emulation: A tutorial and a case study on HIV in Uganda. PLoS Comput. Biol. 11 e1003968.
  • Andrianakis, I., McCreesh, N., Vernon, I., McKinley, T. J., Oakley, J. E., Nsubuga, R. N., Goldstein, M. and White, R. G. (2017). History matching of a high dimensional HIV transmission individual based model. SIAM/ASA J. Uncertain. Quantificat. 5 694–719.
  • Andrieu, C., Doucet, A. and Holenstein, R. (2010). Particle Markov chain Monte Carlo methods. J. R. Stat. Soc. Ser. B. Stat. Methodol. 72 269–342.
  • Andrieu, C. and Roberts, G. O. (2009). The pseudo-marginal approach for efficient Monte Carlo computations. Ann. Statist. 37 697–725.
  • Barnes, C. P., Filippi, S., Stumpf, M. P. H. and Thorne, T. (2012). Considerate approaches to constructing summary statistics for ABC model selection. Stat. Comput. 22 1181–1197.
  • Beaumont, M. A. (2003). Estimation of population growth or decline in genetically monitored populations. Genetics 164 1139–1160.
  • Beaumont, M. A. (2010). Approximate Bayesian Computation in evolution and ecology. Annu. Rev. Ecol. Evol. Syst. 41 379–406.
  • Beaumont, M. A., Zhang, W. and Balding, D. J. (2002). Approximate Bayesian Computation in population genetics. Genetics 162 2025–2035.
  • Beaumont, M. A., Cornuet, J.-M., Marin, J.-M. and Robert, C. P. (2009). Adaptive approximate Bayesian computation. Biometrika 96 983–990.
  • Blum, M. G. B. and François, O. (2010). Non-linear regression models for approximate Bayesian computation. Stat. Comput. 20 63–73.
  • Bornn, L., Pillai, N. S., Smith, A. and Woodard, D. (2017). The use of a single pseudo-sample in approximate Bayesian computation. Stat. Comput. 27 583–590.
  • Bortot, P., Coles, S. G. and Sisson, S. A. (2007). Inference for stereological extremes. J. Amer. Statist. Assoc. 102 84–92.
  • Brooks-Pollock, E., Roberts, G. O. and Keeling, M. J. (2014). A dynamic model of bovine tuberculosis spread and control in Great Britain. Nature 511 228–231.
  • Cameron, E., Battle, K. E., Bhatt, S., Weiss, D. J., Bisanzio, D., Mappin, B., Dalrymple, U., Hay, S. I., Smith, D. L., Griffin, J. T., Wenger, E. A., Eckhoff, P. A., Smith, T. A., Penny, M. A. and Gething, P. W. (2015). Defining the relationship between infection prevalence and clinical incidence of Plasmodium falciparum malaria. Nat. Commun. 6 8170.
  • Conlan, A. J. K., McKinley, T. J., Karolemeas, K., Pollock, E. B., Goodchild, A. V., Mitchell, A. P., Birch, C. P. D., Clifton-Hadley, R. S. and Wood, J. L. N. (2012). Estimating the hidden burden of bovine tuberculosis in Great Britain. PLoS Comput. Biol. 8 e1002730.
  • Craig, P. S., Goldstein, M., Seheult, A. H. and Smith, J. A. (1997). Pressure matching for hydrocarbon reservoirs: A case study in the use of Bayes linear strategies for large computer experiments. In Case Studies in Bayesian Statistics. 37–93. Springer.
  • Csilléry, K., Blum, M. G. B., Gaggiotti, O. E. and François, O. (2010). Approximate Bayesian Computation (ABC) in practice. Trends Ecol. Evol. 25 410–418.
  • Del Moral, P., Doucet, A. and Jasra, A. (2012). An adaptive sequential Monte Carlo method for approximate Bayesian computation. Stat. Comput. 22 1009–1020.
  • Diggle, P. J. and Gratton, R. J. (1984). Monte Carlo methods of inference for implicit statistical models. J. Roy. Statist. Soc. Ser. B 46 193–227.
  • Doucet, A., Pitt, M. K., Deligiannidis, G. and Kohn, R. (2015). Efficient implementation of Markov chain Monte Carlo when using an unbiased likelihood estimator. Biometrika 102 295–313.
  • Drovandi, C. C. and Pettitt, A. N. (2011). Estimation of parameters for macroparasite population evolution using approximate Bayesian computation. Biometrics 67 225–233.
  • Drovandi, C. C., Pettitt, A. N. and Faddy, M. J. (2011). Approximate Bayesian computation using indirect inference. J. R. Stat. Soc. Ser. C. Appl. Stat. 60 317–337.
  • Drovandi, C. C., Pettitt, A. N. and Lee, A. (2015). Bayesian indirect inference using a parametric auxiliary model. Statist. Sci. 30 72–95.
  • Drovandi, C. C., Pettitt, A. N. and McCutchan, R. A. (2016). Exact and approximate Bayesian inference for low integer-valued time series models with intractable likelihoods. Bayesian Anal. 11 325–352.
  • Fearnhead, P. and Prangle, D. (2012). Constructing summary statistics for approximate Bayesian computation: Semi-automatic approximate Bayesian computation. J. R. Stat. Soc. Ser. B. Stat. Methodol. 74 419–474.
  • Filippi, S., Barnes, C. P., Cornebise, J. and Stumpf, M. P. H. (2013). On optimality of kernels for approximate Bayesian computation using sequential Monte Carlo. Stat. Appl. Genet. Mol. Biol. 12 87–107.
  • Gibson, G. J. and Renshaw, E. (1998). Estimating parameters in stochastic compartmental models using Markov chain methods. IMA J. Math. Appl. Med. Biol. 15 19–40.
  • Goldstein, M. and Rougier, J. (2009). Reified Bayesian modelling and inference for physical systems. J. Statist. Plann. Inference 139 1221–1239.
  • Goldstein, M., Seheult, A. and Vernon, I. (2013). Assessing Model Adequacy, 2nd ed. Wiley, UK.
  • Gouriéroux, C., Monfort, A. and Renault, E. (1993). Indirect inference. J. Appl. Econometrics 8 S85–S118.
  • Henderson, D. A., Boys, R. J., Krishnan, K. J., Lawless, C. and Wilkinson, D. J. (2009). Bayesian emulation and calibration of a stochastic computer model of mitochondrial DNA deletions in substantia nigra neurons. J. Amer. Statist. Assoc. 104 76–87.
  • Holden, P. B., Edwards, N. R., Hensman, J. and Wilkinson, R. D. (2016). ABC for climate: Dealing with expensive simulators. Handbook of Approximate Bayesian Computation (ABC). Available at arXiv:1511.03475.
  • Ionides, E. L., Bretó, C. and King, A. A. (2006). Inference for nonlinear dynamical systems. Proc. Natl. Acad. Sci. USA 103 18438–18443.
  • Ionides, E. L., Bhadra, A., Atchadé, Y. and King, A. (2011). Iterated filtering. Ann. Statist. 39 1776–1802.
  • Ionides, E. L., Nguyen, D., Atchadé, Y., Stoev, S. and King, A. A. (2015). Inference for dynamic and latent variable models via iterated, perturbed Bayes maps. Proc. Natl. Acad. Sci. USA 112 719–724.
  • Jabot, F., Lagarrigues, G., Courbaud, B. and Dumoulin, N. (2014). A comparison of emulation methods for Approximate Bayesian Computation. Available at http://arxiv.org/abs/1412.7560.
  • Jandarov, R., Haran, M., Bjørnstad, O. and Grenfell, B. (2014). Emulating a gravity model to infer the spatiotemporal dynamics of an infectious disease. J. R. Stat. Soc. Ser. C. Appl. Stat. 63 423–444.
  • Jewell, C. P., Kypraios, T., Christley, R. M. and Roberts, G. O. (2009). A novel approach to real-time risk prediction for emerging infectious diseases: A case study in avian influenza H5N1. Prev. Vet. Med. 91 19–28.
  • Joyce, P. and Marjoram, P. (2008). Approximately sufficient statistics and Bayesian computation. Stat. Appl. Genet. Mol. Biol. 7.
  • Kypraios, T., Neal, P. and Prangle, D. (2017). A tutorial introduction to Bayesian inference for stochastic epidemic models using approximate Bayesian computation. Math. Biosci. 287 42–53.
  • Lenormand, M., Jabot, F. and Deffuant, G. (2013). Adaptive approximate Bayesian computation for complex models. Comput. Statist. 28 2777–2796.
  • Marin, J.-M., Pudlo, P., Robert, C. P. and Ryder, R. J. (2012). Approximate Bayesian computational methods. Stat. Comput. 22 1167–1180.
  • Marjoram, P., Molitor, J., Plagnol, V. and Tavaré, S. (2003). Markov chain Monte Carlo without likelihoods. Proc. Natl. Acad. Sci. USA 100 15324–15328.
  • McCreesh, N., Andrianakis, I., Nsubuga, R. N., Strong, M., Vernon, I., McKinley, T. J., Oakley, J. E., Goldstein, M., Hayes, R. and White, R. G. (2017). Universal, test, treat, and keep: Improving ART retention is key in cost-effective HIV care and control in Uganda. BMC Infect. Dis. To appear.
  • McKinley, T., Cook, A. R. and Deardon, R. (2009). Inference in epidemic models without likelihoods. Int. J. Biostat. 5.
  • McKinley, T. J., Ross, J. V., Deardon, R. and Cook, A. R. (2014). Simulation-based Bayesian inference for epidemic models. Comput. Statist. Data Anal. 71 434–447.
  • McKinley, T. J., Vernon, I., Andrianakis, I., McCreesh, N., Oakley, J. E., Nsubuga, R. N., Goldstein, M. and White, R. G. (2017). Supplement to “Approximate Bayesian computation and simulation-based inference for complex stochastic epidemic models.” DOI:10.1214/17-STS618SUPPA, DOI:10.1214/17-STS618SUPPB.
  • Meeds, E. and Welling, M. (2014). GPS-ABC: Gaussian process surrogate Approximate Bayesian Computation. Available at http://arxiv.org/abs/1401.2838v1.
  • Neal, P. (2012). Efficient likelihood-free Bayesian computation for household epidemics. Stat. Comput. 22 1239–1256.
  • Nunes, M. A. and Balding, D. J. (2010). On optimal selection of summary statistics for approximate Bayesian computation. Stat. Appl. Genet. Mol. Biol. 9.
  • O’Neill, P. D. and Roberts, G. O. (1999). Bayesian inference for partially observed stochastic epidemics. J. Roy. Statist. Soc. Ser. A 162 121–129.
  • O’Neill, P. D., Balding, D. J., Becker, N. G., Eerola, M. and Mollison, D. (2000). Analyses of infectious disease data from household outbreaks by Markov chain Monte Carlo methods. J. Roy. Statist. Soc. Ser. C 49 517–542.
  • Oakley, J. E. and Youngman, B. D. (2017). Calibration of stochastic computer simulators using likelihood emulation. Technometrics 59 80–92.
  • Pitt, M. K., Silva, R. d. S., Giordani, P. and Kohn, R. (2012). On some properties of Markov chain Monte Carlo simulation methods based on the particle filter. J. Econometrics 171 134–151.
  • Pukelsheim, F. (1994). The three sigma rule. Amer. Statist. 48 88–91.
  • Ratmann, O., Jørgensen, O., Hinkley, T., Stumpf, M., Richardson, S. and Wiuf, C. (2007). Using likelihood-free inference to compare evolutionary dynamics of the protein networks of H. pylori and P. falciparum. PLoS Comput. Biol. 3 2266–2278.
  • Ratmann, O., Andrieu, C., Wiuf, C. and Richardson, S. (2009). Model criticism based on likelihood-free inference, with an application to protein network evolution. Proc. Natl. Acad. Sci. USA 106 10576–10581.
  • Ratmann, O., Camacho, A., Meijer, A. and Donker, G. (2014). Statistical modelling of summary values leads to accurate Approximate Bayesian Computations. Available at arXiv:1305.4283v2.
  • Rubin, D. B. (1984). Bayesianly justifiable and relevant frequency calculations for the applied statistician. Ann. Statist. 12 1151–1172.
  • Sacks, J., Welch, W. J., Mitchell, T. J. and Wynn, H. P. (1989). Design and analysis of computer experiments. Statist. Sci. 4 409–435.
  • Sherlock, C., Thiery, A. H., Roberts, G. O. and Rosenthal, J. S. (2015). On the efficiency of pseudo-marginal random walk Metropolis algorithms. Ann. Statist. 43 238–275.
  • Silk, D., Filippi, S. and Stumpf, M. P. H. (2012). Optimizing threshold-schedules for approximate Bayesian computation sequential Monte Carlo samplers: applications to molecular systems. Available at arXiv:1210.3296v1.
  • Sisson, S. A., Fan, Y. and Tanaka, M. M. (2007). Sequential Monte Carlo without likelihoods. Proc. Natl. Acad. Sci. USA 104 1760–1765.
  • Tavaré, S., Balding, D. J., Griffiths, R. C. and Donnelly, P. (1997). Inferring coalescence times from DNA sequence data. Genetics 145 505–518.
  • Toni, T., Welch, D., Strelkowa, N., Ipsen, A. and Stumpf, M. P. H. (2009). Approximate Bayesian Computation scheme for parameter inference and model selection in dynamical systems. J. R. Soc. Interface 6 187–202.
  • Vernon, I., Goldstein, M. and Bower, R. G. (2010). Galaxy formation: A Bayesian uncertainty analysis. Bayesian Anal. 5 619–669.
  • Vernon, I., Goldstein, M. and Bower, R. (2014). Galaxy formation: Bayesian history matching for the observable universe. Statist. Sci. 29 81–90.
  • Wilkinson, R. D. (2013). Approximate Bayesian computation (ABC) gives exact results under the assumption of model error. Stat. Appl. Genet. Mol. Biol. 12 129–141.
  • Wilkinson, R. D. (2014). Accelerating ABC methods using Gaussian processes. In Proceedings of the 17th International Conference on Artificial Intelligence and Statistics (AISTATS) 33 1015–1023.
  • Wood, S. N. (2010). Statistical inference for noisy nonlinear ecological dynamic systems. Nature 466 1102–1104.

Supplemental materials

  • Supplement A: Bisection method. Details the bisection method used to generate tolerances at each generation of ABC.
  • Supplement B: Approximate posterior distributions for ABC vs. nonimplausible region for HM. Plots of the approximate posterior distributions after 11 generations of ABC, and depth plots after 9 waves of history matching. (Note that HM does not produce posterior samples; rather, these plots show the densities of nonimplausible points.)