Open Access
2024 Uniform-in-time propagation of chaos for kinetic mean field Langevin dynamics
Fan Chen, Yiqing Lin, Zhenjie Ren, Songbo Wang
Electron. J. Probab. 29: 1-43 (2024). DOI: 10.1214/24-EJP1079
Abstract

We study the kinetic mean field Langevin dynamics under a functional convexity assumption on the mean field energy functional. Using hypocoercivity, we first establish the exponential convergence of the mean field dynamics and then show that the corresponding N-particle system converges exponentially at a rate uniform in N, up to a small error. Finally, we study the short-time regularization effects of the dynamics and prove its uniform-in-time propagation of chaos property in both the Wasserstein and entropic senses. Our results can be applied to the training of two-layer neural networks with momentum, and we include numerical experiments.
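
As an illustration of the kind of N-particle system the abstract refers to, the following is a minimal sketch, not the authors' code. It assumes a commonly used normalization of kinetic (underdamped) Langevin dynamics, dX = V dt, dV = -(V + D_m F(mu_N, X)) dt + sqrt(2) dW, where mu_N is the empirical measure of the positions. The toy energy F(m) = 0.5 * (E_m[tanh(x)] - target)^2 + 0.5 * reg * E_m[x^2], the function names `drift` and `simulate`, and all parameter values are hypothetical choices made for this example; the energy is convex in m and loosely mimics a two-layer network fitted to a single data point.

```python
import math
import random


def drift(x, mean_output, target, reg):
    """Intrinsic derivative D_m F(m, x) of the toy energy (an assumption of this sketch).

    F(m) = 0.5 * (E_m[tanh] - target)**2 + 0.5 * reg * E_m[x**2],
    so D_m F(m, x) = (E_m[tanh] - target) * tanh'(x) + reg * x.
    """
    return (mean_output - target) * (1.0 - math.tanh(x) ** 2) + reg * x


def simulate(n_particles=50, n_steps=200, dt=0.01, target=0.5, reg=1.0, seed=0):
    """Euler-Maruyama discretization of the assumed N-particle kinetic dynamics."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]  # positions
    vs = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]  # velocities (momentum)
    noise_scale = math.sqrt(2.0 * dt)
    for _ in range(n_steps):
        # Particles interact only through the empirical mean of tanh(x).
        mean_output = sum(math.tanh(x) for x in xs) / n_particles
        forces = [drift(x, mean_output, target, reg) for x in xs]
        new_xs = [x + v * dt for x, v in zip(xs, vs)]
        # Unit friction -v, mean field force -D_m F, and Brownian noise.
        vs = [
            v - (v + f) * dt + noise_scale * rng.gauss(0.0, 1.0)
            for v, f in zip(vs, forces)
        ]
        xs = new_xs
    return xs, vs


if __name__ == "__main__":
    positions, velocities = simulate()
    print(sum(math.tanh(x) for x in positions) / len(positions))
```

The velocity variable plays the role of momentum in the training interpretation. Each step costs O(N) here because the interaction enters only through one empirical average.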

Fan Chen, Yiqing Lin, Zhenjie Ren, and Songbo Wang "Uniform-in-time propagation of chaos for kinetic mean field Langevin dynamics," Electronic Journal of Probability 29, 1-43 (2024). https://doi.org/10.1214/24-EJP1079
Received: 27 July 2023; Accepted: 8 January 2024; Published: 2024