In this paper, a general stochastic optimization procedure is studied, unifying several variants of the stochastic gradient descent such as, among others, the stochastic heavy ball method, the Stochastic Nesterov Accelerated Gradient algorithm (S-NAG), and the widely used Adam algorithm. The algorithm is seen as a noisy Euler discretization of a non-autonomous ordinary differential equation, recently introduced by Belotto da Silva and Gazeau, which is analyzed in depth. Assuming that the objective function is non-convex and differentiable, the stability and the almost sure convergence of the iterates to the set of critical points are established. A noteworthy special case is the convergence proof of S-NAG in a non-convex setting. Under some assumptions, the convergence rate is provided under the form of a Central Limit Theorem. Finally, the non-convergence of the algorithm to undesired critical points, such as local maxima or saddle points, is established. Here, the main ingredient is a new avoidance of traps result for non-autonomous settings, which is of independent interest.
A.B. was supported by the “Futur & Ruptures” research program which is jointly funded by the IMT, the Mines-Télécom Foundation and the Carnot TSN Institute. S.S. was supported by the “Région Ile-de-France”.
We would like to thank the anonymous reviewers for their outstanding job of refereeing, especially for their comments on the avoidance of traps results and the stability of the iterates of the stochastic algorithm which helped us to improve and clarify our manuscript.
"Stochastic optimization with momentum: Convergence, fluctuations, and traps avoidance." Electron. J. Statist. 15 (2) 3892 - 3947, 2021. https://doi.org/10.1214/21-EJS1880