Bayesian Analysis

Efficient Model Comparison Techniques for Models Requiring Large Scale Data Augmentation

Panayiota Touloupou, Naif Alzahrani, Peter Neal, Simon E. F. Spencer, and Trevelyan J. McKinley

Advance publication

This article is in its final form and can be cited using the date of online publication and the DOI.

Full-text: Open access


Selecting between competing statistical models is a challenging problem, especially when the competing models are non-nested. In this paper we offer a simple solution by devising an algorithm that combines MCMC and importance sampling to obtain computationally efficient estimates of the marginal likelihood, which can then be used to compare the models. The algorithm is successfully applied to a longitudinal epidemic data set, where calculating the marginal likelihood is made more challenging by the presence of large amounts of missing data. In this context, our importance sampling approach is shown to outperform existing methods for computing the marginal likelihood.
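To illustrate the general idea of combining MCMC with importance sampling, the sketch below estimates a marginal likelihood for a toy conjugate normal model. It is not the authors' algorithm: the model, the function names, the random-walk Metropolis sampler, and the choice of an inflated normal proposal fitted to the MCMC draws are all assumptions made for illustration, and there is no missing data or data augmentation here. The toy model is chosen because its marginal likelihood is available in closed form, so the estimate can be checked.

```python
import math
import random

# Toy model (an assumption for illustration): y_i ~ N(theta, 1), theta ~ N(0, prior_sd^2).

def log_likelihood(theta, y):
    return sum(-0.5 * math.log(2 * math.pi) - 0.5 * (yi - theta) ** 2 for yi in y)

def log_prior(theta, prior_sd):
    return -0.5 * math.log(2 * math.pi * prior_sd ** 2) - 0.5 * (theta / prior_sd) ** 2

def metropolis(y, prior_sd, n_iter, step, rng):
    """Random-walk Metropolis draws from the posterior of theta."""
    theta = 0.0
    lp = log_likelihood(theta, y) + log_prior(theta, prior_sd)
    draws = []
    for _ in range(n_iter):
        prop = theta + rng.gauss(0.0, step)
        lp_prop = log_likelihood(prop, y) + log_prior(prop, prior_sd)
        # 1 - random() lies in (0, 1], so the logarithm is always defined.
        if math.log(1.0 - rng.random()) < lp_prop - lp:
            theta, lp = prop, lp_prop
        draws.append(theta)
    return draws

def log_marginal_is(y, prior_sd, draws, n_is, inflate, rng):
    """Importance sampling estimate of log p(y).

    The proposal q is a normal distribution centred at the MCMC mean, with the
    MCMC standard deviation multiplied by `inflate` so that its tails are
    heavier than those of the posterior.
    """
    mean = sum(draws) / len(draws)
    var = sum((d - mean) ** 2 for d in draws) / (len(draws) - 1)
    q_sd = inflate * math.sqrt(var)
    log_w = []
    for _ in range(n_is):
        theta = rng.gauss(mean, q_sd)
        log_q = -0.5 * math.log(2 * math.pi * q_sd ** 2) - 0.5 * ((theta - mean) / q_sd) ** 2
        log_w.append(log_likelihood(theta, y) + log_prior(theta, prior_sd) - log_q)
    # Log-sum-exp for a numerically stable average of the weights.
    m = max(log_w)
    return m + math.log(sum(math.exp(lw - m) for lw in log_w) / n_is)

def log_marginal_exact(y, prior_sd):
    """Closed-form log p(y) for the toy model, used only as a check."""
    n, s, ss, t2 = len(y), sum(y), sum(yi * yi for yi in y), prior_sd ** 2
    return (-0.5 * n * math.log(2 * math.pi) - 0.5 * math.log(1 + n * t2)
            - 0.5 * (ss - t2 * s * s / (1 + n * t2)))

rng = random.Random(1)
y = [rng.gauss(1.0, 1.0) for _ in range(20)]            # simulated data
draws = metropolis(y, 2.0, 5000, 0.5, rng)[1000:]        # discard burn-in
estimate = log_marginal_is(y, 2.0, draws, 5000, 1.5, rng)
exact = log_marginal_exact(y, 2.0)
print(f"IS estimate: {estimate:.3f}, exact: {exact:.3f}")
```

Running the same procedure under each competing model and differencing the log marginal likelihoods would give a log Bayes factor. Inflating the proposal spread is one simple way to keep the importance weights bounded in this toy setting.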

Article information

Bayesian Anal. (2017), 23 pages.

First available in Project Euclid: 29 April 2017

Digital Object Identifier: doi:10.1214/17-BA1057

Keywords: epidemics; marginal likelihood; model evidence; model selection

Creative Commons Attribution 4.0 International License.


Touloupou, Panayiota; Alzahrani, Naif; Neal, Peter; Spencer, Simon E. F.; McKinley, Trevelyan J. Efficient Model Comparison Techniques for Models Requiring Large Scale Data Augmentation. Bayesian Anal., advance publication, 29 April 2017. doi:10.1214/17-BA1057.

