Annals of Applied Probability

Nonasymptotic convergence analysis for the unadjusted Langevin algorithm

Alain Durmus and Éric Moulines

Full-text: Open access

Abstract

In this paper, we study a method to sample from a target distribution $\pi$ over $\mathbb{R}^{d}$ having a positive density with respect to the Lebesgue measure, known up to a normalization factor. This method is based on the Euler discretization of the overdamped Langevin stochastic differential equation associated with $\pi$. For both constant and decreasing step sizes in the Euler discretization, we obtain nonasymptotic bounds for the convergence to the target distribution $\pi$ in total variation distance. Particular attention is paid to the dependence on the dimension $d$, to demonstrate the applicability of this method in the high-dimensional setting. These bounds improve and extend the results of Dalalyan [J. R. Stat. Soc. Ser. B. Stat. Methodol. (2017) 79 651–676].
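
To make the sampling scheme described in the abstract concrete, here is a minimal sketch of the Euler discretization of the overdamped Langevin equation $dX_t = -\nabla U(X_t)\,dt + \sqrt{2}\,dB_t$, where $\pi \propto e^{-U}$. It is an illustration under stated assumptions, not the authors' implementation: the function names (`ula`, `grad_U_gaussian`, `decreasing_steps`), the standard Gaussian example target, and the schedule $\gamma_k = \gamma_0 / k^{1/2}$ are all hypothetical choices made for this example.

```python
import math
import random


def grad_U_gaussian(x):
    # Assumed example target: standard Gaussian, U(x) = |x|^2 / 2, so grad U(x) = x.
    return list(x)


def ula(grad_U, x0, step_sizes, rng):
    """Unadjusted Langevin algorithm (Euler discretization of the Langevin SDE).

    For each step size gamma in step_sizes, apply
        x <- x - gamma * grad_U(x) + sqrt(2 * gamma) * z,   z ~ N(0, I).
    Returns the list of iterates, one per step size.
    """
    x = list(x0)
    path = []
    for gamma in step_sizes:
        g = grad_U(x)
        scale = math.sqrt(2.0 * gamma)
        # One Euler step: gradient drift plus Gaussian noise, coordinate by coordinate.
        x = [xi - gamma * gi + scale * rng.gauss(0.0, 1.0) for xi, gi in zip(x, g)]
        path.append(x)
    return path


def decreasing_steps(gamma0, n_steps, power=0.5):
    # Hypothetical schedule gamma_k = gamma0 / k**power, for illustration only.
    return [gamma0 / (k ** power) for k in range(1, n_steps + 1)]
```

A constant step size corresponds to passing, for example, `[0.05] * 20000` as `step_sizes`, and a decreasing one to passing `decreasing_steps(0.1, 20000)`. For this Gaussian example with a constant step $\gamma \in (0, 2)$, the chain is a linear autoregression whose stationary variance is $1/(1-\gamma/2)$ per coordinate, slightly above the target variance of 1. This shows why an unadjusted chain with a fixed step is biased and why the dependence of the error on the step size matters.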

Article information

Source
Ann. Appl. Probab., Volume 27, Number 3 (2017), 1551-1587.

Dates
Received: March 2016
Revised: August 2016
First available in Project Euclid: 19 July 2017

Permanent link to this document
https://projecteuclid.org/euclid.aoap/1500451235

Digital Object Identifier
doi:10.1214/16-AAP1238

Mathematical Reviews number (MathSciNet)
MR3678479

Zentralblatt MATH identifier
1377.65007

Subjects
Primary: 65C05: Monte Carlo methods; 60F05: Central limit and other weak theorems; 62L10: Sequential analysis
Secondary: 65C40: Computational Markov chains; 60J05: Discrete-time Markov processes on general state spaces; 93E35: Stochastic learning and adaptive control

Keywords
Total variation distance; Langevin diffusion; Markov chain Monte Carlo; Metropolis adjusted Langevin algorithm; rate of convergence

Citation

Durmus, Alain; Moulines, Éric. Nonasymptotic convergence analysis for the unadjusted Langevin algorithm. Ann. Appl. Probab. 27 (2017), no. 3, 1551–1587. doi:10.1214/16-AAP1238. https://projecteuclid.org/euclid.aoap/1500451235



References

  • [1] Andrieu, C., De Freitas, N., Doucet, A. and Jordan, M. I. (2003). An introduction to MCMC for machine learning. Mach. Learn. 50 5–43.
  • [2] Bakry, D., Barthe, F., Cattiaux, P. and Guillin, A. (2008). A simple proof of the Poincaré inequality for a large class of probability measures. Electron. Commun. Probab. 13 60–66.
  • [3] Bakry, D., Cattiaux, P. and Guillin, A. (2008). Rate of convergence for ergodic continuous Markov processes: Lyapunov versus Poincaré. J. Funct. Anal. 254 727–759.
  • [4] Bakry, D., Gentil, I. and Ledoux, M. (2014). Analysis and Geometry of Markov Diffusion Operators. Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences] 348. Springer, Cham.
  • [5] Bobkov, S. G. (1999). Isoperimetric and analytic inequalities for log-concave probability measures. Ann. Probab. 27 1903–1921.
  • [6] Bolley, F., Gentil, I. and Guillin, A. (2012). Convergence to equilibrium in Wasserstein distance for Fokker–Planck equations. J. Funct. Anal. 263 2430–2457.
  • [7] Boucheron, S., Lugosi, G. and Massart, P. (2013). Concentration Inequalities: A Nonasymptotic Theory of Independence. Oxford Univ. Press, Oxford.
  • [8] Bubley, R., Dyer, M. and Jerrum, M. (1998). An elementary analysis of a procedure for sampling points in a convex body. Random Structures Algorithms 12 213–235.
  • [9] Cattiaux, P. and Guillin, A. (2009). Trends to equilibrium in total variation distance. Ann. Inst. Henri Poincaré Probab. Stat. 45 117–145.
  • [10] Chen, M. F. and Li, S. F. (1989). Coupling methods for multidimensional diffusion processes. Ann. Probab. 17 151–177.
  • [11] Cotter, S. L., Roberts, G. O., Stuart, A. M. and White, D. (2013). MCMC methods for functions: Modifying old algorithms to make them faster. Statist. Sci. 28 424–446.
  • [12] Dalalyan, A. S. (2017). Theoretical guarantees for approximate sampling from smooth and log-concave densities. J. R. Stat. Soc. Ser. B. Stat. Methodol. 79 651–676.
  • [13] Dalalyan, A. S. and Tsybakov, A. B. (2012). Sparse regression learning by aggregation and Langevin Monte-Carlo. J. Comput. System Sci. 78 1423–1443.
  • [14] Durmus, A., Moulines, É. and Pereyra, M. Sampling from convex non continuously differentiable functions, when Moreau meets Langevin. In preparation.
  • [15] Eberle, A. (2015). Reflection couplings and contraction rates for diffusions. Probab. Theory Related Fields 1–36.
  • [16] Ethier, S. N. and Kurtz, T. G. (1986). Markov Processes: Characterization and Convergence. Wiley, New York.
  • [17] Grenander, U. (1983). Tutorial in Pattern Theory. Brown Univ., Providence, RI.
  • [18] Grenander, U. and Miller, M. I. (1994). Representations of knowledge in complex systems. J. Roy. Statist. Soc. Ser. B 56 549–603.
  • [19] Holley, R. and Stroock, D. (1987). Logarithmic Sobolev inequalities and stochastic Ising models. J. Stat. Phys. 46 1159–1194.
  • [20] Ikeda, N. and Watanabe, S. (1989). Stochastic Differential Equations and Diffusion Processes. North-Holland Mathematical Library. Elsevier, Amsterdam.
  • [21] Karatzas, I. and Shreve, S. E. (1991). Brownian Motion and Stochastic Calculus. Springer, New York.
  • [22] Kullback, S. (1997). Information Theory and Statistics. Dover, Mineola, NY.
  • [23] Lamberton, D. and Pagès, G. (2002). Recursive computation of the invariant distribution of a diffusion. Bernoulli 8 367–405.
  • [24] Lamberton, D. and Pagès, G. (2003). Recursive computation of the invariant distribution of a diffusion: The case of a weakly mean reverting drift. Stoch. Dyn. 3 435–451.
  • [25] Lemaire, V. (2005). Estimation de la mesure invariante d’un processus de diffusion. Ph.D. thesis, Univ. Paris-Est.
  • [26] Lemaire, V. and Menozzi, S. (2010). On some non asymptotic bounds for the Euler scheme. Electron. J. Probab. 15 1645–1681.
  • [27] Lindvall, T. and Rogers, L. C. G. (1986). Coupling of multidimensional diffusions by reflection. Ann. Probab. 14 860–872.
  • [28] Lovász, L. and Vempala, S. (2007). The geometry of logconcave functions and sampling algorithms. Random Structures Algorithms 30 307–358.
  • [29] Mattingly, J. C., Stuart, A. M. and Higham, D. J. (2002). Ergodicity for SDEs and approximations: Locally Lipschitz vector fields and degenerate noise. Stochastic Process. Appl. 101 185–232.
  • [30] Meyn, S. and Tweedie, R. (2009). Markov Chains and Stochastic Stability, 2nd ed. Cambridge Univ. Press, New York.
  • [31] Meyn, S. P. and Tweedie, R. L. (1993). Stability of Markovian processes III: Foster-Lyapunov criteria for continuous-time processes. Adv. in Appl. Probab. 25 518–548.
  • [33] Nesterov, Y. (2004). Introductory Lectures on Convex Optimization: A Basic Course. Kluwer, Boston, MA.
  • [34] Parisi, G. (1981). Correlation functions and computer simulations. Nuclear Phys. B 180 378–384.
  • [35] Roberts, G. O. and Tweedie, R. L. (1996). Exponential convergence of Langevin distributions and their discrete approximations. Bernoulli 2 341–363.
  • [36] Roberts, G. O. and Tweedie, R. L. (2000). Rates of convergence of stochastically monotone and continuous time Markov models. J. Appl. Probab. 37 359–373.
  • [37] Talay, D. and Tubaro, L. (1991). Expansion of the global error for numerical schemes solving stochastic differential equations. Stoch. Anal. Appl. 8 483–509.