Bayesian Analysis

Exploiting Multi-Core Architectures for Reduced-Variance Estimation with Intractable Likelihoods

Nial Friel, Antonietta Mira, and Chris J. Oates

Full-text: Open access

Abstract

Many popular statistical models for complex phenomena are intractable, in the sense that the likelihood function cannot easily be evaluated. Bayesian estimation in this setting remains challenging, with a lack of computational methodology to fully exploit modern processing capabilities. In this paper we introduce novel control variates for intractable likelihoods that can dramatically reduce the Monte Carlo variance of Bayesian estimators. We prove that our control variates are well-defined and provide a positive variance reduction. Furthermore, we show how to optimise these control variates for variance reduction. The methodology is highly parallel and offers a route to exploit multi-core processing architectures that complements recent research in this direction. Indeed, our work shows that it may not be necessary to parallelise the sampling process itself in order to harness the potential of massively multi-core architectures. Simulation results presented on the Ising model, exponential random graph models and non-linear stochastic differential equation models support our theoretical findings.
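The abstract builds on the zero-variance control-variate construction, in which a function of the posterior score is added to the estimand so that the Monte Carlo variance of the resulting estimator shrinks. A minimal sketch of that idea is below; the toy Gaussian target, the variable names, and the use of i.i.d. draws in place of MCMC output are all illustrative assumptions, not the paper's own code or experimental setup.

```python
import numpy as np

# Sketch of a first-degree zero-variance control variate: for draws theta
# from a target p, the score u(theta) = d/dtheta log p(theta) has mean zero,
# so f_tilde = f + a*u estimates the same expectation for any coefficient a.
rng = np.random.default_rng(0)

# Toy "posterior": theta ~ N(mu, sigma^2), chosen so the score is analytic.
mu, sigma = 2.0, 1.5
theta = rng.normal(mu, sigma, size=50_000)   # stand-in for MCMC draws

f = theta                                    # estimand: the posterior mean
score = -(theta - mu) / sigma**2             # u(theta) = d/dtheta log p(theta)

# Variance-minimising coefficient a = -Cov(f, u) / Var(u), estimated
# from the same draws.
a = -np.cov(f, score)[0, 1] / np.var(score)
f_tilde = f + a * score

# Both estimators target E[theta] = mu, but f_tilde has far lower variance;
# for this linear estimand the reduction is essentially total.
print(np.var(f), np.var(f_tilde))
```

For this Gaussian example the optimal coefficient is a = sigma^2, which makes f_tilde exactly constant at mu; in general the degree of reduction depends on how well the control variate correlates with the estimand, which is what the paper's optimisation results address.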

Article information

Source
Bayesian Anal. Volume 11, Number 1 (2016), 215-245.

Dates
First available in Project Euclid: 8 April 2015

Permanent link to this document
https://projecteuclid.org/euclid.ba/1428516724

Digital Object Identifier
doi:10.1214/15-BA948

Mathematical Reviews number (MathSciNet)
MR3447097

Keywords
control variates; MCMC; parallel computing; zero variance

Citation

Friel, Nial; Mira, Antonietta; Oates, Chris J. Exploiting Multi-Core Architectures for Reduced-Variance Estimation with Intractable Likelihoods. Bayesian Anal. 11 (2016), no. 1, 215--245. doi:10.1214/15-BA948. https://projecteuclid.org/euclid.ba/1428516724

