Bayesian Analysis

Merging MCMC Subposteriors through Gaussian-Process Approximations

Christopher Nemeth and Chris Sherlock

Advance publication

This article is in its final form and can be cited using the date of online publication and the DOI.

Full-text: Open access

Abstract

Markov chain Monte Carlo (MCMC) algorithms have become powerful tools for Bayesian inference. However, they do not scale well to large-data problems. Divide-and-conquer strategies, which split the data into batches and, for each batch, run independent MCMC algorithms targeting the corresponding subposterior, can spread the computational burden across a number of separate computer cores. The challenge with such strategies lies in recombining the subposteriors to approximate the full posterior. By fitting a Gaussian-process approximation to each log-subposterior density, we obtain a tractable approximation to the full posterior. This approximation is exploited through three methodologies: firstly, a Hamiltonian Monte Carlo algorithm targeting the expectation of the posterior density provides a sample from an approximation to the posterior; secondly, evaluating the true posterior at the sampled points leads to an importance sampler that, asymptotically, targets the true posterior expectations; finally, an alternative importance sampler uses the full Gaussian-process distribution of the approximation to the log-posterior density to re-weight any initial sample, providing both an estimate of the posterior expectation and a measure of the uncertainty in it.
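
The recombination step described in the abstract can be illustrated with a short sketch. This is not the authors' implementation; it is a minimal illustration, assuming each batch has already produced MCMC samples together with evaluations of its log-subposterior density, and all names (rbf_kernel, LogSubposteriorGP, merged_log_posterior) are hypothetical. Each batch's (parameter, log-density) pairs are fitted with a Gaussian-process regression, and the full log-posterior is approximated by the sum of the fitted GP means, i.e. the product of the approximate subposterior densities.

    # Minimal, illustrative sketch (not the authors' code): fit a GP to each batch's
    # (parameter, log-subposterior) pairs and approximate the full log-posterior by
    # the sum of the GP posterior means. Names and settings here are assumptions.
    import numpy as np

    def rbf_kernel(X1, X2, lengthscale=1.0, variance=1.0):
        # Squared-exponential covariance between two sets of points (rows = points).
        d2 = (np.sum(X1**2, axis=1)[:, None] + np.sum(X2**2, axis=1)[None, :]
              - 2.0 * X1 @ X2.T)
        return variance * np.exp(-0.5 * d2 / lengthscale**2)

    class LogSubposteriorGP:
        # GP regression on (theta, log subposterior density) pairs from one batch.
        def __init__(self, thetas, log_dens, lengthscale=1.0, variance=1.0, jitter=1e-6):
            self.X = thetas
            self.prior_mean = log_dens.mean()  # crude constant prior mean
            K = rbf_kernel(thetas, thetas, lengthscale, variance)
            self.L = np.linalg.cholesky(K + jitter * np.eye(len(thetas)))
            self.alpha = np.linalg.solve(
                self.L.T, np.linalg.solve(self.L, log_dens - self.prior_mean))
            self.lengthscale, self.variance = lengthscale, variance

        def predict_mean(self, theta_new):
            # Posterior mean of the log-subposterior at new parameter values.
            Ks = rbf_kernel(theta_new, self.X, self.lengthscale, self.variance)
            return self.prior_mean + Ks @ self.alpha

    def merged_log_posterior(theta_new, subposterior_gps):
        # Summing the GP mean log-subposteriors corresponds to multiplying the
        # approximate subposterior densities, giving a tractable surrogate for
        # the full (unnormalised) log-posterior.
        return sum(gp.predict_mean(theta_new) for gp in subposterior_gps)

    # Hypothetical usage: one GP per batch, then evaluate the merged surrogate.
    # gps = [LogSubposteriorGP(samples_b, log_dens_b) for samples_b, log_dens_b in batches]
    # log_post_approx = merged_log_posterior(theta_grid, gps)

In the schemes summarised above, such a merged surrogate (and its gradient) would be explored with Hamiltonian Monte Carlo, and the resulting sample corrected by one of the importance-sampling steps.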

Article information

Source
Bayesian Anal. (2017), 24 pages.

Dates
First available in Project Euclid: 9 August 2017

Permanent link to this document
https://projecteuclid.org/euclid.ba/1502265628

Digital Object Identifier
doi:10.1214/17-BA1063

Keywords
big data, Markov chain Monte Carlo, Gaussian processes, distributed importance sampling

Rights
Creative Commons Attribution 4.0 International License.

Citation

Nemeth, Christopher; Sherlock, Chris. Merging MCMC Subposteriors through Gaussian-Process Approximations. Bayesian Anal., advance publication, 9 August 2017. doi:10.1214/17-BA1063. https://projecteuclid.org/euclid.ba/1502265628

