Mean-field Langevin dynamics and energy landscape of neural networks

Kaitong Hu; Zhenjie Ren; David Šiška; Łukasz Szpruch

doi:10.1214/20-AIHP1140


Kaitong Hu,¹ Zhenjie Ren,² David Šiška,³ Łukasz Szpruch³
¹CMAP, École Polytechnique, F-91128 Palaiseau Cedex, France
²CEREMADE, Université Paris Dauphine, F-75775 Paris Cedex 16, France
³School of Mathematics, University of Edinburgh, James Clerk Maxwell Building, Peter Guthrie Tait Road, Edinburgh EH9 3FD, UK

Ann. Inst. H. Poincaré Probab. Statist. 57(4): 2043-2065 (November 2021). DOI: 10.1214/20-AIHP1140


Abstract

Our work is motivated by a desire to study the theoretical underpinning for the convergence of stochastic gradient type algorithms widely used for non-convex learning tasks such as the training of neural networks. The key insight, already observed in (Mei, Montanari and Nguyen (2018); Chizat and Bach (2018); Rotskoff and Vanden-Eijnden (2018)), is that a certain class of finite-dimensional non-convex problems becomes convex when lifted to the infinite-dimensional space of measures. We leverage this observation and show that the corresponding energy functional defined on the space of probability measures has a unique minimiser, which can be characterised by a first-order condition using the notion of linear functional derivative. Next, we study the corresponding gradient flow structure in the 2-Wasserstein metric, which we call Mean-Field Langevin Dynamics (MFLD), and show that the flow of marginal laws induced by the gradient flow converges to a stationary distribution, which is exactly the minimiser of the energy functional. We observe that this convergence is exponential under conditions that are satisfied for highly regularised learning tasks. Our proof of convergence to the stationary probability measure is novel and relies on a generalisation of LaSalle's invariance principle combined with the HWI inequality. Importantly, we assume neither that the interaction potential of the MFLD is of convolution type nor that it has any particular symmetric structure. Furthermore, we allow for a general convex objective function, unlike most papers in the literature, which focus on quadratic loss. Finally, we show that the error between the finite-dimensional optimisation problem and its infinite-dimensional limit is of order one over the number of parameters.

The aim of our work is to study the theoretical foundation for the convergence of stochastic gradient type algorithms, which are very often used in non-convex learning problems, e.g. calibrating a neural network. The key observation, already noted in (Mei, Montanari and Nguyen (2018); Chizat and Bach (2018); Rotskoff and Vanden-Eijnden (2018)), is that a certain class of finite-dimensional non-convex problems becomes convex once embedded in the space of probability measures. Using this observation, we show that the corresponding energy functional defined on the space of probability measures has a unique minimiser, which can be characterised by a first-order condition using the notion of functional derivative. We then study the gradient flow structure with respect to the 2-Wasserstein metric, which we call the mean-field Langevin dynamics (MFLD), and we show that the marginal law of the gradient flow converges to a stationary law which corresponds to the minimiser of the same energy functional. Under certain regularity conditions on the initial problem, the convergence takes place at an exponential rate. Our proof of convergence to the stationary law is new and relies on LaSalle's invariance principle and the HWI inequality. Note that we do not assume that the interaction potential of the MFLD is of convolution type or symmetric. Moreover, our results apply to general convex objective functions, unlike many articles in the literature, which are restricted to quadratic functions. Finally, we show that the difference between the initial finite-dimensional optimisation problem and its limit in the space of probability measures is of order one over the number of parameters.
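
The abstract describes the MFLD as a noisy gradient flow over the law of the network parameters. As a rough illustration only, the following is a minimal sketch of an Euler-Maruyama particle approximation of such a dynamics for a two-layer network in mean-field scaling. Everything specific here is an assumption and not taken from the paper: the 1-D tanh neuron, the quadratic loss, the regulariser U(theta) = |theta|^2 / 2, the drift written as the loss gradient plus (sigma^2 / 2) grad U, the sin target, and all parameter values and function names.

```python
import numpy as np

def neuron_outputs(theta, x):
    """Per-particle neuron output c * tanh(a * x + b); theta has columns (a, b, c)."""
    a, b, c = theta[:, 0:1], theta[:, 1:2], theta[:, 2:3]   # each of shape (N, 1)
    return c * np.tanh(a * x[None, :] + b)                   # shape (N, M)

def network(theta, x):
    """Mean-field network: average of the neuron outputs over the N particles."""
    return neuron_outputs(theta, x).mean(axis=0)             # shape (M,)

def loss(theta, x, y):
    """Quadratic loss of the averaged network (an assumed choice of objective)."""
    return 0.5 * np.mean((y - network(theta, x)) ** 2)

def loss_gradient_per_particle(theta, x, y):
    """Gradient in theta of the first variation of the loss, evaluated at each particle."""
    a, b, c = theta[:, 0:1], theta[:, 1:2], theta[:, 2:3]
    t = np.tanh(a * x[None, :] + b)                          # (N, M)
    resid = y - network(theta, x)                            # (M,)
    dphi = np.stack([c * (1 - t**2) * x[None, :],            # d/da
                     c * (1 - t**2),                         # d/db
                     t], axis=-1)                            # d/dc -> (N, M, 3)
    return -(resid[None, :, None] * dphi).mean(axis=1)       # (N, 3)

def mfld_step(theta, x, y, step, sigma, rng):
    """One Euler-Maruyama step: gradient drift, regulariser drift, Gaussian noise."""
    # Assumed regulariser U(theta) = |theta|^2 / 2, so grad U(theta) = theta.
    drift = loss_gradient_per_particle(theta, x, y) + 0.5 * sigma**2 * theta
    noise = rng.standard_normal(theta.shape)
    return theta - step * drift + sigma * np.sqrt(step) * noise

def run_demo(n_particles=200, n_steps=500, step=0.05, sigma=0.1, seed=0):
    """Fit y = sin(x) on a grid (hypothetical toy task) and report loss before/after."""
    rng = np.random.default_rng(seed)
    x = np.linspace(-2.0, 2.0, 64)
    y = np.sin(x)
    theta = rng.standard_normal((n_particles, 3))
    start = loss(theta, x, y)
    for _ in range(n_steps):
        theta = mfld_step(theta, x, y, step, sigma, rng)
    return start, loss(theta, x, y), theta

if __name__ == "__main__":
    start, end, _ = run_demo()
    print(f"loss before: {start:.4f}, loss after: {end:.4f}")
```

The particle average stands in for the integral against the parameter law, so increasing `n_particles` moves the simulation closer to the infinite-dimensional problem the abstract refers to; the step size and noise level are chosen only to keep this toy run stable.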

Acknowledgements

The authors would like to thank the anonymous referees for their valuable comments which have helped improve the clarity of the paper.

The third and fourth authors acknowledge the support of The Alan Turing Institute under the Engineering and Physical Sciences Research Council grant EP/N510129/1.

Citation

Kaitong Hu, Zhenjie Ren, David Šiška, Łukasz Szpruch. "Mean-field Langevin dynamics and energy landscape of neural networks." Ann. Inst. H. Poincaré Probab. Statist. 57(4): 2043-2065, November 2021. https://doi.org/10.1214/20-AIHP1140

Information

Received: 23 May 2020; Revised: 7 December 2020; Accepted: 8 December 2020; Published: November 2021

First available in Project Euclid: 20 October 2021

MathSciNet: MR4328560

zbMATH: 1492.65023

Digital Object Identifier: 10.1214/20-AIHP1140

Subjects:

Primary: 37M25, 60H30

Keywords: gradient flow, mean-field Langevin dynamics, neural networks

JOURNAL ARTICLE
23 PAGES
