Bayesian Analysis

Efficient Model Comparison Techniques for Models Requiring Large Scale Data Augmentation

Panayiota Touloupou, Naif Alzahrani, Peter Neal, Simon E. F. Spencer, and Trevelyan J. McKinley

Full-text: Open access


Selecting between competing statistical models is a challenging problem, especially when the competing models are non-nested. In this paper we offer a simple solution: an algorithm that combines MCMC and importance sampling to obtain computationally efficient estimates of the marginal likelihood, which can then be used to compare the models. The algorithm is successfully applied to a longitudinal epidemic data set, where calculating the marginal likelihood is made more challenging by the presence of large amounts of missing data. In this context, our importance sampling approach is shown to outperform existing methods for computing the marginal likelihood.
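The general idea of pairing MCMC output with importance sampling can be illustrated on a toy problem. The sketch below is not the authors' algorithm and involves no missing data or data augmentation. It assumes a conjugate model (y_i ~ Normal(theta, 1) with prior theta ~ Normal(0, tau^2)), chosen only because its marginal likelihood is available in closed form for comparison. All function names, the random-walk step size, and the variance inflation factor of 1.5 are illustrative assumptions.

```python
import math
import random


def log_likelihood(theta, ys):
    # Assumed toy model: y_i ~ Normal(theta, 1), independent.
    return sum(-0.5 * math.log(2 * math.pi) - 0.5 * (y - theta) ** 2 for y in ys)


def log_prior(theta, tau):
    # Assumed prior: theta ~ Normal(0, tau^2).
    return -0.5 * math.log(2 * math.pi * tau ** 2) - 0.5 * (theta / tau) ** 2


def log_normal_pdf(x, mean, sd):
    return -0.5 * math.log(2 * math.pi * sd ** 2) - 0.5 * ((x - mean) / sd) ** 2


def metropolis(ys, tau, n_iter, step, rng):
    # Random-walk Metropolis targeting the posterior of theta.
    theta = 0.0
    current = log_likelihood(theta, ys) + log_prior(theta, tau)
    draws = []
    for _ in range(n_iter):
        proposal = theta + rng.gauss(0.0, step)
        candidate = log_likelihood(proposal, ys) + log_prior(proposal, tau)
        if rng.random() < math.exp(min(0.0, candidate - current)):
            theta, current = proposal, candidate
        draws.append(theta)
    return draws


def log_marginal_is(ys, tau, draws, n_samples, inflate, rng):
    # Fit a normal proposal to the MCMC draws, widening its standard
    # deviation so that its tails cover the posterior.
    mean = sum(draws) / len(draws)
    var = sum((d - mean) ** 2 for d in draws) / (len(draws) - 1)
    sd = inflate * math.sqrt(var)
    log_w = []
    for _ in range(n_samples):
        theta = rng.gauss(mean, sd)
        log_w.append(
            log_likelihood(theta, ys)
            + log_prior(theta, tau)
            - log_normal_pdf(theta, mean, sd)
        )
    # Log of the average importance weight, computed stably.
    top = max(log_w)
    return top + math.log(sum(math.exp(lw - top) for lw in log_w) / n_samples)


def log_marginal_exact(ys, tau):
    # Closed form for the toy model: y ~ Normal(0, I + tau^2 * 1 1^T).
    n = len(ys)
    s = sum(ys)
    ss = sum(y * y for y in ys)
    return (
        -0.5 * n * math.log(2 * math.pi)
        - 0.5 * math.log(1 + n * tau ** 2)
        - 0.5 * (ss - tau ** 2 * s ** 2 / (1 + n * tau ** 2))
    )


rng = random.Random(1)
ys = [rng.gauss(1.0, 1.0) for _ in range(20)]  # simulated data
draws = metropolis(ys, tau=10.0, n_iter=5000, step=0.5, rng=rng)[500:]  # drop burn-in
estimate = log_marginal_is(ys, 10.0, draws, n_samples=5000, inflate=1.5, rng=rng)
exact = log_marginal_exact(ys, 10.0)
print("importance sampling estimate:", estimate)
print("closed-form value:", exact)
```

Comparing two models in this style would mean repeating the estimate for each model and differencing the log marginal likelihoods to obtain a log Bayes factor. The widened proposal is a design choice that keeps the importance weights bounded in this toy setting; models with large amounts of missing data, as in the paper, need additional machinery that this sketch does not attempt.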

Article information

Bayesian Anal., Volume 13, Number 2 (2018), 437-459.

First available in Project Euclid: 29 April 2017

Keywords: epidemics, marginal likelihood, model evidence, model selection

Creative Commons Attribution 4.0 International License.


Touloupou, Panayiota; Alzahrani, Naif; Neal, Peter; Spencer, Simon E. F.; McKinley, Trevelyan J. Efficient Model Comparison Techniques for Models Requiring Large Scale Data Augmentation. Bayesian Anal. 13 (2018), no. 2, 437--459. doi:10.1214/17-BA1057.


