Statistical Science

MCMC Methods for Functions: Modifying Old Algorithms to Make Them Faster

S. L. Cotter, G. O. Roberts, A. M. Stuart, and D. White

Full-text: Open access

Abstract

Many problems arising in applications result in the need to probe a probability distribution for functions. Examples include Bayesian nonparametric statistics and conditioned diffusion processes. Standard MCMC algorithms typically become arbitrarily slow under the mesh refinement dictated by the nonparametric description of the unknown function. We describe an approach to modifying a whole range of MCMC methods, applicable whenever the target measure has density with respect to a Gaussian process or Gaussian random field reference measure, which ensures that their speed of convergence is robust under mesh refinement.

Gaussian processes or random fields are fields whose marginal distributions, when evaluated at any finite set of $N$ points, are $\mathbb{R}^{N}$-valued Gaussians. The algorithmic approach that we describe is applicable not only when the desired probability measure has density with respect to a Gaussian process or Gaussian random field reference measure, but also to some useful non-Gaussian reference measures constructed through random truncation. In the applications of interest the data is often sparse and the prior specification is an essential part of the overall modelling strategy. These Gaussian-based reference measures are a very flexible modelling tool, finding wide-ranging application. Examples are shown in density estimation, data assimilation in fluid mechanics, subsurface geophysics and image registration.
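The defining property above can be illustrated numerically: evaluating a Gaussian process at any finite set of $N$ points yields an $\mathbb{R}^N$-valued Gaussian whose covariance matrix is the covariance function evaluated pairwise. The following sketch (not from the paper; the squared-exponential covariance and all parameter values are illustrative choices) draws such finite-dimensional samples via a Cholesky factor.

```python
import numpy as np

def sample_gp(points, ell=0.5, n_samples=3, jitter=1e-8, seed=0):
    """Draw samples of a Gaussian process with squared-exponential
    covariance, evaluated at a finite set of points: the result is an
    R^N-valued Gaussian with N = len(points)."""
    rng = np.random.default_rng(seed)
    d = np.subtract.outer(points, points)
    C = np.exp(-0.5 * (d / ell) ** 2)      # pairwise covariance matrix
    # small jitter keeps the (numerically near-singular) matrix SPD
    L = np.linalg.cholesky(C + jitter * np.eye(len(points)))
    return L @ rng.standard_normal((len(points), n_samples))

x = np.linspace(0.0, 1.0, 50)
samples = sample_gp(x)
print(samples.shape)  # (50, 3): three R^50-valued Gaussian draws
```

Refining the mesh simply enlarges $N$; the process itself, and hence the reference measure, is unchanged, which is the consistency that the function-space viewpoint exploits.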

The key design principle is to formulate the MCMC method so that it is, in principle, applicable for functions; this may be achieved by use of proposals based on carefully chosen time-discretizations of stochastic dynamical systems which exactly preserve the Gaussian reference measure. Taking this approach leads to many new algorithms which can be implemented via minor modification of existing algorithms, yet which show enormous speed-up on a wide range of applied problems.
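The best-known instance of this principle is the paper's preconditioned Crank-Nicolson (pCN) proposal $v = \sqrt{1-\beta^2}\,u + \beta\,\xi$ with $\xi \sim N(0, C)$, which exactly preserves the Gaussian reference $N(0, C)$, so the accept ratio involves only the log-likelihood $\Phi$ and no mesh-dependent prior term. A minimal sketch on a finite grid follows; the toy covariance, observation model, and parameter values are illustrative assumptions, not the paper's examples.

```python
import numpy as np

def pcn(phi, C_sqrt, u0, beta=0.2, n_iter=2000, seed=0):
    """Preconditioned Crank-Nicolson MCMC for a target with density
    proportional to exp(-phi(u)) with respect to N(0, C), where
    C_sqrt is a square root of C (e.g. a Cholesky factor)."""
    rng = np.random.default_rng(seed)
    u, phi_u = u0.copy(), phi(u0)
    accepts = 0
    for _ in range(n_iter):
        xi = C_sqrt @ rng.standard_normal(len(u))
        v = np.sqrt(1.0 - beta ** 2) * u + beta * xi  # preserves N(0, C)
        phi_v = phi(v)
        # accept with probability min(1, exp(phi(u) - phi(v)))
        if np.log(rng.uniform()) < phi_u - phi_v:
            u, phi_u = v, phi_v
            accepts += 1
    return u, accepts / n_iter

# Toy problem: Gaussian reference on a 100-point grid, one noisy
# point observation y of the unknown function at the midpoint.
N = 100
x = np.linspace(0.0, 1.0, N)
C = np.exp(-0.5 * (np.subtract.outer(x, x) / 0.2) ** 2) + 1e-8 * np.eye(N)
C_sqrt = np.linalg.cholesky(C)
y, sigma = 1.0, 0.1
phi = lambda u: (u[N // 2] - y) ** 2 / (2.0 * sigma ** 2)
u, rate = pcn(phi, C_sqrt, np.zeros(N))
print(u.shape, round(rate, 2))
```

Because the proposal leaves the reference measure invariant, the acceptance rate stays bounded away from zero as $N$ grows, in contrast to a standard random walk whose step size must shrink with the mesh.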

Article information

Source
Statist. Sci. Volume 28, Number 3 (2013), 424–446.

Dates
First available in Project Euclid: 28 August 2013

Permanent link to this document
https://projecteuclid.org/euclid.ss/1377696944

Digital Object Identifier
doi:10.1214/13-STS421

Mathematical Reviews number (MathSciNet)
MR3135540

Zentralblatt MATH identifier
1297.62217

Keywords
MCMC; Bayesian nonparametrics; algorithms; Gaussian random field; Bayesian inverse problems

Citation

Cotter, S. L.; Roberts, G. O.; Stuart, A. M.; White, D. MCMC Methods for Functions: Modifying Old Algorithms to Make Them Faster. Statist. Sci. 28 (2013), no. 3, 424--446. doi:10.1214/13-STS421. https://projecteuclid.org/euclid.ss/1377696944.


References

  • [1] Adams, R. P., Murray, I. and Mackay, D. J. C. (2009). The Gaussian process density sampler. In Advances in Neural Information Processing Systems 21.
  • [2] Adler, R. J. (2010). The Geometry of Random Fields. SIAM, Philadelphia, PA.
  • [3] Bennett, A. F. (2002). Inverse Modeling of the Ocean and Atmosphere. Cambridge Univ. Press, Cambridge.
  • [4] Beskos, A., Roberts, G., Stuart, A. and Voss, J. (2008). MCMC methods for diffusion bridges. Stoch. Dyn. 8 319–350.
  • [5] Beskos, A., Pinski, F. J., Sanz-Serna, J. M. and Stuart, A. M. (2011). Hybrid Monte Carlo on Hilbert spaces. Stochastic Process. Appl. 121 2201–2230.
  • [6] Beskos, A., Pillai, N. S., Roberts, G. O., Sanz-Serna, J. M. and Stuart, A. M. (2013). Optimal tuning of hybrid Monte Carlo. Bernoulli. To appear. Available at http://arxiv.org/abs/1001.4460.
  • [7] Cotter, C. J. (2008). The variational particle-mesh method for matching curves. J. Phys. A 41 344003, 18.
  • [8] Cotter, S. L. (2010). Applications of MCMC methods on function spaces. Ph.D. thesis, Univ. Warwick.
  • [9] Cotter, C. J., Cotter, S. L. and Vialard, F. X. (2013). Bayesian data assimilation in shape registration. Inverse Problems 29 045011.
  • [10] Cotter, S. L., Dashti, M. and Stuart, A. M. (2012). Variational data assimilation using targetted random walks. Internat. J. Numer. Methods Fluids 68 403–421.
  • [11] Cotter, S. L., Dashti, M., Robinson, J. C. and Stuart, A. M. (2009). Bayesian inverse problems for functions and applications to fluid mechanics. Inverse Problems 25 115008, 43.
  • [12] Da Prato, G. and Zabczyk, J. (1992). Stochastic Equations in Infinite Dimensions. Encyclopedia of Mathematics and Its Applications 44. Cambridge Univ. Press, Cambridge.
  • [13] Dashti, M., Harris, S. and Stuart, A. (2012). Besov priors for Bayesian inverse problems. Inverse Probl. Imaging 6 183–200.
  • [14] Diaconis, P. (1988). Bayesian numerical analysis. In Statistical Decision Theory and Related Topics, IV, Vol. 1 (West Lafayette, Ind., 1986) 163–175. Springer, New York.
  • [15] Duane, S., Kennedy, A. D., Pendleton, B. and Roweth, D. (1987). Hybrid Monte Carlo. Phys. Lett. B 195 216–222.
  • [16] Girolami, M. and Calderhead, B. (2011). Riemann manifold Langevin and Hamiltonian Monte Carlo methods (with discussion). J. R. Stat. Soc. Ser. B Stat. Methodol. 73 123–214.
  • [17] Glaunes, J., Trouvé, A. and Younes, L. (2004). Diffeomorphic matching of distributions: A new approach for unlabelled point-sets and sub-manifolds matching. In Computer Vision and Pattern Recognition, 2004. CVPR 2004. Proceedings of the 2004 IEEE Computer Society Conference on 2 712–718. IEEE.
  • [18] Hairer, M., Stuart, A. M. and Voss, J. (2007). Analysis of SPDEs arising in path sampling. II. The nonlinear case. Ann. Appl. Probab. 17 1657–1706.
  • [19] Hairer, M., Stuart, A. and Voß, J. (2009). Sampling conditioned diffusions. In Trends in Stochastic Analysis. London Mathematical Society Lecture Note Series 353 159–185. Cambridge Univ. Press, Cambridge.
  • [20] Hairer, M., Stuart, A. and Voss, J. (2011). Signal processing problems on function space: Bayesian formulation, stochastic PDEs and effective MCMC methods. In The Oxford Handbook of Nonlinear Filtering (D. Crisan and B. Rozovsky, eds.) 833–873. Oxford Univ. Press, Oxford.
  • [21] Hairer, M., Stuart, A. M. and Vollmer, S. (2013). Spectral gaps for a Metropolis–Hastings algorithm in infinite dimensions. Available at http://arxiv.org/abs/1112.1392.
  • [22] Hairer, M., Stuart, A. M., Voss, J. and Wiberg, P. (2005). Analysis of SPDEs arising in path sampling. I. The Gaussian case. Commun. Math. Sci. 3 587–603.
  • [23] Hills, S. E. and Smith, A. F. M. (1992). Parameterization issues in Bayesian inference. In Bayesian Statistics, 4 (Peñíscola, 1991) 227–246. Oxford Univ. Press, New York.
  • [24] Hjort, N. L., Holmes, C., Müller, P. and Walker, S. G., eds. (2010). Bayesian Nonparametrics. Cambridge Series in Statistical and Probabilistic Mathematics 28. Cambridge Univ. Press, Cambridge.
  • [25] Iserles, A. (2004). A First Course in the Numerical Analysis of Differential Equations. Cambridge Univ. Press, Cambridge.
  • [26] Kalnay, E. (2003). Atmospheric Modeling, Data Assimilation and Predictability. Cambridge Univ. Press, Cambridge.
  • [27] Lemm, J. C. (2003). Bayesian Field Theory. Johns Hopkins Univ. Press, Baltimore, MD.
  • [28] Liu, J. S. (2001). Monte Carlo Strategies in Scientific Computing. Springer, New York.
  • [29] Mattingly, J. C., Pillai, N. S. and Stuart, A. M. (2012). Diffusion limits of the random walk Metropolis algorithm in high dimensions. Ann. Appl. Probab. 22 881–930.
  • [30] McLaughlin, D. and Townley, L. R. (1996). A reassessment of the groundwater inverse problem. Water Res. Res. 32 1131–1161.
  • [31] Miller, M. T. and Younes, L. (2001). Group actions, homeomorphisms, and matching: A general framework. Int. J. Comput. Vis. 41 61–84.
  • [32] Neal, R. M. (1996). Bayesian Learning for Neural Networks. Springer, New York.
  • [33] Neal, R. M. (1998). Regression and classification using Gaussian process priors. Available at http://www.cs.toronto.edu/~radford/valencia.abstract.html.
  • [34] Neal, R. M. (2011). MCMC using Hamiltonian dynamics. In Handbook of Markov Chain Monte Carlo 113–162. CRC Press, Boca Raton, FL.
  • [35] O’Hagan, A., Kennedy, M. C. and Oakley, J. E. (1999). Uncertainty analysis and other inference tools for complex computer codes. In Bayesian Statistics, 6 (Alcoceber, 1998) 503–524. Oxford Univ. Press, New York.
  • [36] Pillai, N. S., Stuart, A. M. and Thiéry, A. H. (2012). Optimal scaling and diffusion limits for the Langevin algorithm in high dimensions. Ann. Appl. Probab. 22 2320–2356.
  • [37] Pillai, N. S., Stuart, A. M. and Thiéry, A. H. (2012). On the random walk Metropolis algorithm for Gaussian random field priors and gradient flow. Available at http://arxiv.org/abs/1108.1494.
  • [38] Richtmyer, R. D. and Morton, K. W. (1967). Difference Methods for Initial Value Problems. Wiley, New York.
  • [39] Robert, C. P. and Casella, G. (1999). Monte Carlo Statistical Methods. Springer, New York.
  • [40] Roberts, G. O., Gelman, A. and Gilks, W. R. (1997). Weak convergence and optimal scaling of random walk Metropolis algorithms. Ann. Appl. Probab. 7 110–120.
  • [41] Roberts, G. O. and Rosenthal, J. S. (1998). Optimal scaling of discrete approximations to Langevin diffusions. J. R. Stat. Soc. Ser. B Stat. Methodol. 60 255–268.
  • [42] Roberts, G. O. and Rosenthal, J. S. (2001). Optimal scaling for various Metropolis–Hastings algorithms. Statist. Sci. 16 351–367.
  • [43] Roberts, G. O. and Stramer, O. (2001). On inference for partially observed nonlinear diffusion models using the Metropolis–Hastings algorithm. Biometrika 88 603–621.
  • [44] Roberts, G. O. and Tweedie, R. L. (1996). Exponential convergence of Langevin distributions and their discrete approximations. Bernoulli 2 341–363.
  • [45] Rue, H. and Held, L. (2005). Gaussian Markov Random Fields: Theory and Applications. Monographs on Statistics and Applied Probability 104. Chapman & Hall/CRC, Boca Raton, FL.
  • [46] Smith, A. F. M. and Roberts, G. O. (1993). Bayesian computation via the Gibbs sampler and related Markov chain Monte Carlo methods. J. R. Stat. Soc. Ser. B Stat. Methodol. 55 3–23.
  • [47] Sokal, A. D. (1989). Monte Carlo methods in statistical mechanics: Foundations and new algorithms. Lecture notes, Cours de Troisième Cycle de la Physique en Suisse Romande, Univ. Lausanne.
  • [48] Stein, M. L. (1999). Interpolation of Spatial Data: Some Theory for Kriging. Springer, New York.
  • [49] Stuart, A. M. (2010). Inverse problems: A Bayesian perspective. Acta Numer. 19 451–559.
  • [50] Stuart, A. M., Voss, J. and Wiberg, P. (2004). Fast communication conditional path sampling of SDEs and the Langevin MCMC method. Commun. Math. Sci. 2 685–697.
  • [51] Tierney, L. (1998). A note on Metropolis–Hastings kernels for general state spaces. Ann. Appl. Probab. 8 1–9.
  • [52] Vaillant, M. and Glaunes, J. (2005). Surface matching via currents. In Information Processing in Medical Imaging 381–392. Springer, Berlin.
  • [53] van der Meulen, F., Schauer, M. and van Zanten, H. (2013). Reversible jump MCMC for nonparametric drift estimation for diffusion processes. Comput. Statist. Data Anal. To appear.
  • [54] Zhao, L. H. (2000). Bayesian aspects of some nonparametric problems. Ann. Statist. 28 532–552.