Bernoulli, Volume 23, Number 4A (2017), 2257-2298.

The geometric foundations of Hamiltonian Monte Carlo

Michael Betancourt, Simon Byrne, Sam Livingstone, and Mark Girolami

Full-text: Open access

Abstract

Although Hamiltonian Monte Carlo has proven an empirical success, the lack of a rigorous theoretical understanding of the algorithm has in many ways impeded both principled developments of the method and use of the algorithm in practice. In this paper, we develop the formal foundations of the algorithm through the construction of measures on smooth manifolds, and demonstrate how the theory naturally identifies efficient implementations and motivates promising generalizations.

Article information

Source
Bernoulli Volume 23, Number 4A (2017), 2257-2298.

Dates
Received: May 2015
First available in Project Euclid: 9 May 2017

Permanent link to this document
https://projecteuclid.org/euclid.bj/1494316818

Digital Object Identifier
doi:10.3150/16-BEJ810

Keywords
differential geometry; disintegration; fiber bundle; Hamiltonian Monte Carlo; Markov chain Monte Carlo; Riemannian geometry; symplectic geometry; smooth manifold

Citation

Betancourt, Michael; Byrne, Simon; Livingstone, Sam; Girolami, Mark. The geometric foundations of Hamiltonian Monte Carlo. Bernoulli 23 (2017), no. 4A, 2257-2298. doi:10.3150/16-BEJ810. https://projecteuclid.org/euclid.bj/1494316818

