Bayesian Analysis

The Bayesian Update: Variational Formulations and Gradient Flows

Nicolas Garcia Trillos and Daniel Sanz-Alonso

Advance publication

This article is in its final form and can be cited using the date of online publication and the DOI.

Full-text: Open access

Abstract

The Bayesian update can be viewed as a variational problem by characterizing the posterior as the minimizer of a functional. The variational viewpoint is far from new and is at the heart of popular methods for posterior approximation. However, some of its consequences seem largely unexplored. We focus on the following one: defining the posterior as the minimizer of a functional gives a natural path towards the posterior by moving in the direction of steepest descent of the functional. This idea is made precise through the theory of gradient flows, which brings new tools to the study of Bayesian models and algorithms. Since the posterior may be characterized as the minimizer of different functionals, several variational formulations may be considered. We study three of them and their associated gradient flows. We show that, in all cases, the rate of convergence of the flows to the posterior can be bounded in terms of the geodesic convexity of the functional to be minimized. Each gradient flow naturally suggests a nonlinear diffusion with the posterior as invariant distribution. These diffusions may be discretized to build proposals for Markov chain Monte Carlo (MCMC) algorithms. By construction, the diffusions are guaranteed to satisfy a certain optimality condition, and rates of convergence are given by the convexity of the functionals. We use this observation to propose a criterion for the choice of metric in Riemannian MCMC methods.
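To make the step from a diffusion with the posterior as invariant distribution to an MCMC proposal concrete, the following is a minimal sketch and not the construction used in the article. It assumes the classical overdamped Langevin diffusion, discretizes it with an Euler-Maruyama step, and adds a Metropolis-Hastings correction (the scheme usually called MALA). The toy model (a standard normal prior on a scalar `u` with Gaussian observations of known variance `sigma2`), the function name `mala`, and the step size are all illustrative assumptions, chosen so that the exact posterior is available for comparison.

```python
import numpy as np


def mala(grad_log_post, log_post, u0, step, n_steps, rng):
    """Metropolis-adjusted Langevin sampler for a scalar parameter.

    Proposals are Euler-Maruyama steps of the Langevin diffusion
    du = grad log p(u | y) dt + sqrt(2) dW, followed by accept/reject.
    """
    u = float(u0)
    samples = np.empty(n_steps)
    for k in range(n_steps):
        # Forward proposal: drift along the gradient plus Gaussian noise.
        mean_fwd = u + step * grad_log_post(u)
        prop = mean_fwd + np.sqrt(2.0 * step) * rng.standard_normal()
        # Mean of the reverse move, needed for the Hastings ratio.
        mean_bwd = prop + step * grad_log_post(prop)
        # Log proposal densities (variance 2 * step), up to a shared constant.
        log_q_fwd = -(prop - mean_fwd) ** 2 / (4.0 * step)
        log_q_bwd = -(u - mean_bwd) ** 2 / (4.0 * step)
        log_alpha = log_post(prop) - log_post(u) + log_q_bwd - log_q_fwd
        if np.log(rng.uniform()) < log_alpha:
            u = prop
        samples[k] = u
    return samples


# Hypothetical toy model: prior u ~ N(0, 1), data y_i ~ N(u, sigma2).
rng = np.random.default_rng(0)
sigma2 = 0.5
y = 1.0 + np.sqrt(sigma2) * rng.standard_normal(20)

# Closed-form Gaussian posterior, used only to check the sampler.
post_var = 1.0 / (1.0 + y.size / sigma2)
post_mean = post_var * y.sum() / sigma2


def log_post(u):
    # Unnormalized log posterior: log prior plus log likelihood.
    return -0.5 * u ** 2 - 0.5 * np.sum((y - u) ** 2) / sigma2


def grad_log_post(u):
    return -u + np.sum(y - u) / sigma2


samples = mala(grad_log_post, log_post, u0=0.0,
               step=0.5 * post_var, n_steps=20000, rng=rng)
kept = samples[2000:]  # discard burn-in
print("exact posterior mean:", post_mean, "MALA estimate:", kept.mean())
print("exact posterior var: ", post_var, "MALA estimate:", kept.var())
```

In this toy model the negative log-posterior is strongly convex with constant `1 + n / sigma2`, which is the kind of convexity that controls the exponential convergence rate of the classical Langevin flow. The nonlinear diffusions and the choice of metric studied in the article may differ from this simple case.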

Article information

Source
Bayesian Anal., Advance publication (2018), 28 pages.

Dates
First available in Project Euclid: 20 December 2018

Permanent link to this document
https://projecteuclid.org/euclid.ba/1545296444

Digital Object Identifier
doi:10.1214/18-BA1137

Subjects
Primary:
62C10: Bayesian problems; characterization of Bayes procedures
62F15: Bayesian inference
49N99: None of the above, but in this section

Keywords
gradient flows; Wasserstein space; convexity; Riemannian MCMC

Rights
Creative Commons Attribution 4.0 International License.

Citation

Garcia Trillos, Nicolas; Sanz-Alonso, Daniel. The Bayesian Update: Variational Formulations and Gradient Flows. Bayesian Anal., advance publication, 20 December 2018. doi:10.1214/18-BA1137. https://projecteuclid.org/euclid.ba/1545296444




Supplemental materials