The Annals of Applied Probability

Optimal scaling and diffusion limits for the Langevin algorithm in high dimensions

Natesh S. Pillai, Andrew M. Stuart, and Alexandre H. Thiéry

Full-text: Open access

Abstract

The Metropolis-adjusted Langevin algorithm (MALA) is a sampling algorithm that makes local moves by incorporating information about the gradient of the logarithm of the target density. In this paper we study the efficiency of MALA on a natural class of target measures supported on an infinite dimensional Hilbert space. These measures have a density with respect to a Gaussian random field measure and arise in many applications, such as Bayesian nonparametric statistics and the theory of conditioned diffusions. We prove that, started in stationarity, a suitably interpolated and scaled version of the Markov chain corresponding to MALA converges to an infinite dimensional diffusion process. Our results imply that, in stationarity, MALA applied to an $N$-dimensional approximation of the target will take $\mathcal{O}(N^{1/3})$ steps to explore the invariant measure, comparing favorably with the Random Walk Metropolis algorithm, which was recently shown to require $\mathcal{O}(N)$ steps when applied to the same class of problems. As a by-product of the diffusion limit, it also follows that MALA is optimized at an average acceptance probability of $0.574$. Previous results were proved only for targets that are products of one-dimensional distributions, or for variants of this situation, which limited their applicability. The correlation in our target means that the rescaled MALA chain converges weakly to an infinite dimensional Hilbert space valued diffusion, and the limit cannot be described through analysis of scalar diffusions. The limit theorem is proved by showing that a drift-martingale decomposition of the Markov chain, suitably scaled, closely resembles a weak Euler–Maruyama discretization of the putative limit. An invariance principle is proved for the martingale, and a continuous mapping argument is used to complete the proof.
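
To make the algorithm in the abstract concrete, the following is a minimal Python sketch of a generic MALA chain. It is an illustration under stated assumptions, not the construction analysed in the paper: the target is taken to be a standard Gaussian on $\mathbb{R}^N$ rather than a measure with density with respect to a Gaussian random field, no preconditioning is used, and the step size is set to $\ell N^{-1/3}$ as a choice consistent with the $\mathcal{O}(N^{1/3})$ cost stated above. The names `mala_chain`, `demo`, `ell` and `step` are hypothetical.

```python
import numpy as np


def mala_chain(log_density, grad_log_density, x0, step, n_steps, rng):
    """Run a plain MALA chain; return (samples, mean acceptance probability)."""
    x = np.asarray(x0, dtype=float)
    logp_x, grad_x = log_density(x), grad_log_density(x)
    samples = np.empty((n_steps, x.size))
    accept_sum = 0.0
    for k in range(n_steps):
        # Langevin proposal: drift along the gradient plus Gaussian noise.
        noise = rng.standard_normal(x.size)
        y = x + step * grad_x + np.sqrt(2.0 * step) * noise
        logp_y, grad_y = log_density(y), grad_log_density(y)
        # Log proposal densities q(y | x) and q(x | y), up to a shared constant.
        log_q_forward = -np.sum((y - x - step * grad_x) ** 2) / (4.0 * step)
        log_q_reverse = -np.sum((x - y - step * grad_y) ** 2) / (4.0 * step)
        # Metropolis-Hastings log acceptance probability, capped at log(1) = 0.
        log_alpha = min(0.0, logp_y - logp_x + log_q_reverse - log_q_forward)
        accept_sum += np.exp(log_alpha)
        if np.log(rng.uniform()) < log_alpha:
            x, logp_x, grad_x = y, logp_y, grad_y
        samples[k] = x
    return samples, accept_sum / n_steps


def standard_gaussian_log_density(x):
    # Assumed toy target: log-density of N(0, I), up to an additive constant.
    return -0.5 * np.dot(x, x)


def standard_gaussian_grad(x):
    # Gradient of the toy log-density above.
    return -x


def demo(n_dim=100, ell=1.0, n_steps=2000, seed=0):
    rng = np.random.default_rng(seed)
    # Assumed scaling: step size proportional to N^(-1/3).
    step = ell * n_dim ** (-1.0 / 3.0)
    # Start in stationarity by drawing x0 from the toy target itself.
    x0 = rng.standard_normal(n_dim)
    return mala_chain(
        standard_gaussian_log_density, standard_gaussian_grad,
        x0, step, n_steps, rng,
    )


if __name__ == "__main__":
    samples, acceptance = demo()
    print("mean acceptance probability:", acceptance)
```

In a sketch like this, `ell` could be tuned until the reported mean acceptance probability is close to the value of $0.574$ quoted above. That value is an asymptotic statement, so a finite-dimensional run should only be expected to approximate it.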

Article information

Source
Ann. Appl. Probab., Volume 22, Number 6 (2012), 2320-2356.

Dates
First available in Project Euclid: 23 November 2012

Permanent link to this document
https://projecteuclid.org/euclid.aoap/1353695955

Digital Object Identifier
doi:10.1214/11-AAP828

Mathematical Reviews number (MathSciNet)
MR3024970

Zentralblatt MATH identifier
1272.60053

Subjects
Primary: 60J20: Applications of Markov chains and discrete-time Markov processes on general state spaces (social mobility, learning theory, industrial processes, etc.) [See also 90B30, 91D10, 91D35, 91E40]
Secondary: 65C05: Monte Carlo methods

Keywords
Markov chain Monte Carlo; Metropolis-adjusted Langevin algorithm; scaling limit; diffusion approximation

Citation

Pillai, Natesh S.; Stuart, Andrew M.; Thiéry, Alexandre H. Optimal scaling and diffusion limits for the Langevin algorithm in high dimensions. Ann. Appl. Probab. 22 (2012), no. 6, 2320–2356. doi:10.1214/11-AAP828. https://projecteuclid.org/euclid.aoap/1353695955


