Bayesian Analysis

Merging MCMC Subposteriors through Gaussian-Process Approximations

Christopher Nemeth and Chris Sherlock

Full-text: Open access

Abstract

Markov chain Monte Carlo (MCMC) algorithms have become powerful tools for Bayesian inference. However, they do not scale well to large-data problems. Divide-and-conquer strategies, which split the data into batches and, for each batch, run independent MCMC algorithms targeting the corresponding subposterior, can spread the computational burden across a number of separate computer cores. The challenge with such strategies lies in recombining the subposteriors to approximate the full posterior. By fitting a Gaussian-process approximation to each log-subposterior density, we obtain a tractable approximation to the full posterior. This approximation is exploited through three methodologies: first, a Hamiltonian Monte Carlo algorithm targeting the expectation of the posterior density provides a sample from an approximation to the posterior; second, evaluating the true posterior at the sampled points leads to an importance sampler that, asymptotically, targets the true posterior expectations; finally, an alternative importance sampler uses the full Gaussian-process distribution of the approximation to the log-posterior density to re-weight any initial sample, providing both an estimate of the posterior expectation and a measure of the uncertainty in it.
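
To fix ideas, the display below sketches the subposterior factorisation on which such divide-and-conquer schemes rely and the additive way in which Gaussian-process surrogates of the log-subposteriors combine; the fractionated prior p(θ)^{1/B}, the surrogate notation f_b ~ GP(m_b, k_b) and the proposal density q are the usual conventions for this setting, not notation taken from the paper itself.

% Data split into B batches x_1, ..., x_B; each subposterior takes a
% fraction of the prior so that the product recovers the full posterior:
\[
  p_b(\theta \mid x_b) \;\propto\; p(\theta)^{1/B}\, p(x_b \mid \theta),
  \qquad
  p(\theta \mid x) \;\propto\; \prod_{b=1}^{B} p_b(\theta \mid x_b).
\]
% A Gaussian process fitted to each log-subposterior yields an additive
% surrogate for the full log-posterior:
\[
  \log p_b(\theta \mid x_b) \approx f_b(\theta), \quad f_b \sim \mathcal{GP}(m_b, k_b),
  \qquad
  \log p(\theta \mid x) \approx \mathrm{const} + \sum_{b=1}^{B} f_b(\theta).
\]
% Draws \theta^{(i)} from an approximation q (e.g. obtained by running HMC
% on the surrogate) can be re-weighted towards the true posterior by
% self-normalised importance sampling:
\[
  w^{(i)} \;\propto\; \frac{p(\theta^{(i)} \mid x)}{q(\theta^{(i)})},
  \qquad \theta^{(i)} \sim q.
\]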

Article information

Source
Bayesian Anal., Volume 13, Number 2 (2018), 507-530.

Dates
First available in Project Euclid: 9 August 2017

Permanent link to this document
https://projecteuclid.org/euclid.ba/1502265628

Digital Object Identifier
doi:10.1214/17-BA1063

Mathematical Reviews number (MathSciNet)
MR3780433

Zentralblatt MATH identifier
06989958

Keywords
big data; Markov chain Monte Carlo; Gaussian processes; distributed importance sampling

Rights
Creative Commons Attribution 4.0 International License.

Citation

Nemeth, Christopher; Sherlock, Chris. Merging MCMC Subposteriors through Gaussian-Process Approximations. Bayesian Anal. 13 (2018), no. 2, 507–530. doi:10.1214/17-BA1063. https://projecteuclid.org/euclid.ba/1502265628


