Bernoulli


High-dimensional Bayesian inference via the unadjusted Langevin algorithm

Alain Durmus and Éric Moulines


Abstract

We consider in this paper the problem of sampling from a high-dimensional probability distribution $\pi$ having a density w.r.t. the Lebesgue measure on $\mathbb{R}^{d}$, known up to a normalization constant, $x\mapsto\pi(x)=\mathrm{e}^{-U(x)}/\int_{\mathbb{R}^{d}}\mathrm{e}^{-U(y)}\,\mathrm{d}y$. Such a problem occurs naturally, for example, in Bayesian inference and machine learning. Under the assumptions that $U$ is continuously differentiable, $\nabla U$ is globally Lipschitz and $U$ is strongly convex, we obtain non-asymptotic bounds on the convergence to stationarity, in Wasserstein distance of order $2$ and in total variation distance, of the sampling method based on the Euler discretization of the Langevin stochastic differential equation, for both constant and decreasing step sizes. The dependence of these bounds on the dimension of the state space is explicit. The convergence of an appropriately weighted empirical measure is also investigated, and bounds on the mean square error and an exponential deviation inequality are reported for bounded measurable functions. An illustration of Bayesian inference for binary regression is presented to support our claims.
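
The sampler studied here is the unadjusted Langevin algorithm (ULA): the Euler discretization of the Langevin stochastic differential equation $\mathrm{d}X_{t}=-\nabla U(X_{t})\,\mathrm{d}t+\sqrt{2}\,\mathrm{d}B_{t}$, that is, the recursion $X_{k+1}=X_{k}-\gamma_{k+1}\nabla U(X_{k})+\sqrt{2\gamma_{k+1}}\,Z_{k+1}$ with i.i.d. standard Gaussian innovations $Z_{k+1}$ and step sizes $\gamma_{k+1}$. The following is a minimal illustrative sketch, not code from the paper: the names (ula, grad_U, step) are ours, a constant step size is used, and the target is a standard Gaussian, for which $U$ is $1$-strongly convex with $1$-Lipschitz gradient, matching the paper's assumptions.

    import numpy as np

    def ula(grad_U, x0, n_iter, step=1e-2, rng=None):
        """Unadjusted Langevin algorithm with constant step size.

        Euler discretization of dX_t = -grad U(X_t) dt + sqrt(2) dB_t:
        X_{k+1} = X_k - step * grad_U(X_k) + sqrt(2 * step) * Z_{k+1},
        with Z_{k+1} ~ N(0, I_d).
        """
        rng = np.random.default_rng() if rng is None else rng
        x = np.array(x0, dtype=float)
        samples = np.empty((n_iter, x.size))
        for k in range(n_iter):
            noise = rng.standard_normal(x.size)
            x = x - step * grad_U(x) + np.sqrt(2.0 * step) * noise
            samples[k] = x
        return samples

    # Standard Gaussian target: U(x) = ||x||^2 / 2, so grad U(x) = x.
    d = 100
    chain = ula(grad_U=lambda x: x, x0=np.zeros(d), n_iter=10_000, step=0.05)
    # Per-coordinate variance should be near 1, up to O(step) discretization
    # bias (ULA is "unadjusted": no Metropolis correction is applied).
    print(chain[2_000:].var(axis=0).mean())

Substituting for grad_U the gradient of the negative log-posterior of a ridge-penalized binary regression model yields the kind of Bayesian illustration reported in the paper.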

Article information

Source
Bernoulli, Volume 25, Number 4A (2019), 2854-2882.

Dates
Received: July 2017
Revised: July 2018
First available in Project Euclid: 13 September 2019

Permanent link to this document
https://projecteuclid.org/euclid.bj/1568362045

Digital Object Identifier
doi:10.3150/18-BEJ1073

Mathematical Reviews number (MathSciNet)
MR4003567

Zentralblatt MATH identifier
07110114

Keywords
Langevin diffusion; Markov chain Monte Carlo; Metropolis adjusted Langevin algorithm; rate of convergence; total variation distance

Citation

Durmus, Alain; Moulines, Éric. High-dimensional Bayesian inference via the unadjusted Langevin algorithm. Bernoulli 25 (2019), no. 4A, 2854--2882. doi:10.3150/18-BEJ1073. https://projecteuclid.org/euclid.bj/1568362045



References

  • [1] Albert, J.H. and Chib, S. (1993). Bayesian analysis of binary and polychotomous response data. J. Amer. Statist. Assoc. 88 669–679.
  • [2] Borodin, A.N. and Salminen, P. (2002). Handbook of Brownian Motion—Facts and Formulae, 2nd ed. Probability and Its Applications. Basel: Birkhäuser.
  • [3] Bubeck, S., Eldan, R. and Lehec, J. (2015). Finite-time analysis of projected Langevin Monte Carlo. In Proceedings of the 28th International Conference on Neural Information Processing Systems, NIPS’15 1243–1251. Cambridge, MA, USA: MIT Press.
  • [4] Bubley, R., Dyer, M. and Jerrum, M. (1998). An elementary analysis of a procedure for sampling points in a convex body. Random Structures Algorithms 12 213–235.
  • [5] Chen, M.F. and Li, S.F. (1989). Coupling methods for multidimensional diffusion processes. Ann. Probab. 17 151–177.
  • [6] Choi, H.M. and Hobert, J.P. (2013). The Polya-gamma Gibbs sampler for Bayesian logistic regression is uniformly ergodic. Electron. J. Stat. 7 2054–2064.
  • [7] Chopin, N. and Ridgway, J. (2017). Leave Pima Indians alone: Binary regression as a benchmark for Bayesian computation. Statist. Sci. 32 64–87.
  • [8] Dalalyan, A.S. (2017). Further and stronger analogy between sampling and optimization: Langevin Monte Carlo and gradient descent. In Proceedings of the 30th Annual Conference on Learning Theory.
  • [9] Dalalyan, A.S. (2017). Theoretical guarantees for approximate sampling from smooth and log-concave densities. J. R. Stat. Soc. Ser. B. Stat. Methodol. 79 651–676.
  • [10] Durmus, A. and Moulines, É. (2017). Nonasymptotic convergence analysis for the unadjusted Langevin algorithm. Ann. Appl. Probab. 27 1551–1587.
  • [11] Durmus, A. and Moulines, É. (2019). Supplement to “High-dimensional Bayesian inference via the unadjusted Langevin algorithm.” DOI:10.3150/18-BEJ1073SUPP.
  • [12] Eberle, A. Quantitative contraction rates for Markov chains on continuous state spaces. In preparation.
  • [13] Eberle, A. (2016). Reflection couplings and contraction rates for diffusions. Probab. Theory Related Fields 166 851–886.
  • [14] Eberle, A., Guillin, A. and Zimmer, R. (2018). Quantitative Harris type theorems for diffusions and McKean–Vlasov processes. Trans. Amer. Math. Soc. To appear.
  • [15] Ermak, D.L. (1975). A computer simulation of charged particles in solution. I. Technique and equilibrium properties. J. Chem. Phys. 62 4189–4196.
  • [16] Faes, C., Ormerod, J.T. and Wand, M.P. (2011). Variational Bayesian inference for parametric and nonparametric regression with missing data. J. Amer. Statist. Assoc. 106 959–971.
  • [17] Frühwirth-Schnatter, S. and Frühwirth, R. (2010). Data augmentation and MCMC for binary and multinomial logit models. In Statistical Modelling and Regression Structures 111–132. Heidelberg: Physica-Verlag/Springer.
  • [18] Gramacy, R.B. and Polson, N.G. (2012). Simulation-based regularized logistic regression. Bayesian Anal. 7 567–589.
  • [19] Grenander, U. (1996). Elements of Pattern Theory. Johns Hopkins Studies in the Mathematical Sciences. Baltimore, MD: Johns Hopkins Univ. Press.
  • [20] Grenander, U. and Miller, M.I. (1994). Representations of knowledge in complex systems. J. Roy. Statist. Soc. Ser. B 56 549–603. With discussion and a reply by the authors.
  • [21] Hanson, T.E., Branscum, A.J. and Johnson, W.O. (2014). Informative $g$-priors for logistic regression. Bayesian Anal. 9 597–611.
  • [22] Holmes, C.C. and Held, L. (2006). Bayesian auxiliary variable models for binary and multinomial regression. Bayesian Anal. 1 145–168.
  • [23] Joulin, A. and Ollivier, Y. (2010). Curvature, concentration and error estimates for Markov chain Monte Carlo. Ann. Probab. 38 2418–2442.
  • [24] Karatzas, I. and Shreve, S.E. (1991). Brownian Motion and Stochastic Calculus, 2nd ed. Graduate Texts in Mathematics 113. New York: Springer.
  • [25] Klartag, B. (2007). A central limit theorem for convex sets. Invent. Math. 168 91–131.
  • [26] Lamberton, D. and Pagès, G. (2002). Recursive computation of the invariant distribution of a diffusion. Bernoulli 8 367–405.
  • [27] Lamberton, D. and Pagès, G. (2003). Recursive computation of the invariant distribution of a diffusion: The case of a weakly mean reverting drift. Stoch. Dyn. 3 435–451.
  • [28] Lemaire, V. (2005). Estimation de la mesure invariante d’un processus de diffusion. Ph.D. thesis, Université Paris-Est.
  • [29] Lindvall, T. and Rogers, L.C.G. (1986). Coupling of multidimensional diffusions by reflection. Ann. Probab. 14 860–872.
  • [30] Mattingly, J.C., Stuart, A.M. and Higham, D.J. (2002). Ergodicity for SDEs and approximations: Locally Lipschitz vector fields and degenerate noise. Stochastic Process. Appl. 101 185–232.
  • [31] Neal, R.M. (1993). Bayesian learning via stochastic dynamics. In Advances in Neural Information Processing Systems 5, [NIPS Conference] 475–482. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.
  • [32] Nesterov, Y. (2004). Introductory Lectures on Convex Optimization: A Basic Course. Applied Optimization 87. Boston, MA: Kluwer Academic.
  • [33] Parisi, G. (1981). Correlation functions and computer simulations. Nuclear Phys. B 180 378–384.
  • [34] Polson, N.G., Scott, J.G. and Windle, J. (2013). Bayesian inference for logistic models using Pólya-Gamma latent variables. J. Amer. Statist. Assoc. 108 1339–1349.
  • [35] Roberts, G.O. and Tweedie, R.L. (1996). Exponential convergence of Langevin distributions and their discrete approximations. Bernoulli 2 341–363.
  • [36] Rossky, P.J., Doll, J.D. and Friedman, H.L. (1978). Brownian dynamics as smart Monte Carlo simulation. J. Chem. Phys. 69 4628–4633.
  • [37] Sabanés Bové, D. and Held, L. (2011). Hyper-$g$ priors for generalized linear models. Bayesian Anal. 6 387–410.
  • [38] Talay, D. and Tubaro, L. (1990). Expansion of the global error for numerical schemes solving stochastic differential equations. Stoch. Anal. Appl. 8 483–509.
  • [39] Villani, C. (2009). Optimal Transport: Old and New. Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences] 338. Berlin: Springer.
  • [40] Welling, M. and Teh, Y.W. (2011). Bayesian learning via stochastic gradient Langevin dynamics. In Proceedings of the 28th International Conference on Machine Learning (ICML-11) 681–688.
  • [41] Windle, J., Polson, N.G. and Scott, J.G. (2013). BayesLogit: Bayesian logistic regression. R package version 0.2. Available at http://cran.r-project.org/web/packages/BayesLogit/index.html.

Supplemental materials

  • Supplement to “High-dimensional Bayesian inference via the unadjusted Langevin algorithm”. Most proofs and derivations are deferred to a supplementary paper.