Analysis of an Adaptive Biasing Force method based on self-interacting dynamics

,


Introduction
Let µ ‹ be a probability distribution on the d-dimensional flat torus T d (with T " R{Z), of the type: where dx is the normalized Lebesgue measure on T d . For applications in physics and chemistry (e.g. in molecular dynamics), µ ‹ is referred to as the Boltzmann-Gibbs distribution associated with the potential energy function V and the inverse temperature parameter β ą 0. For applications in statistics (e.g. in Bayesian statistics),´βV is referred to as the log-likelihood. In this article, the function V : T d Ñ R is assumed to be smooth. In order to estimate integrals of the type ş ϕdµ ‹ , with ϕ : T d Ñ R, probabilistic methods are used, especially when d is large. The Markov Chain Monte Carlo (MCMC) method consists in interpreting the integral as the (almost sure) limit where`W t˘t ě0 is a d-dimensional Wiener process. In practice, discrete-time Markov processes, defined for instance using the Metropolis-Hastings algorithm or an Euler-Maruyama scheme applied to the overdamped Langevin process, are employed. The question of discretisation errors for MCMC algorithms is a classical topic that will not be addressed in this work. The convergence to equilibrium requires that the Markov process explores the entire energy landscape, which may be a very slow process. Indeed, in practical problems, the dimension d, i.e. the number of degrees of freedom in the system, is very large, and the probability distribution µ ‹ is multimodal: the function V admits several local minima (interpreted as potential energy wells) and β is large. In that situation, the Markov process is metastable: when it reaches an energy well, it tends to stay there for a long time (whose expectation goes to infinity when β goes to infinity) before hopping to another energy well. Asymptotic results for the exit time from energy wells when β Ñ 8 are given by Eyring-Kramers type formulas [18,35]. The metastability of the process substantially slows down the exploration of the energy landscape, hence the convergence when T Ñ 8 towards the target quantity ş ϕdµ ‹ .
In the development of Monte Carlo methods in the last decades, many techniques have been studied in order to efficiently sample multimodal distributions. The bottomline strategy to enhance sampling consists in biasing the dynamics and in reweighting the averages: indeed, for any smooth functionṼ : T d Ñ R, one has ş ϕe´β pV´Ṽ q e´βṼ ş e´β pV´Ṽ q e´βṼ " lim tÑ8 ş t 0 ϕpX s qe´β pV pXsq´Ṽ pXsqq ds ş t 0 e´β pV pXsq´Ṽ pXsqq ds , where the biased dynamics is given by dX t "´∇Ṽ pX t qdt`a2β´1dW t . This is nothing but an Importance Sampling method, and choosing carefully the functionṼ may substantially reduce the computational cost. Indeed, if the distribution with density proportional to e´βṼ pxq is not multimodal, the biased processX t converges to equilibrium and explores the state space faster than the unbiased process X t . In the sequel, we explain how to chooseṼ in order to benefit from the importance sampling strategy.
From now on, in order to simplify the notation, β " 1. In addition, without loss of generality, assume that ş T d e´V pxq dx " 1.
Analysis of an ABF method based on self-interacting dynamics Precisely, let ξ : T d Ñ T m be a smooth function, which is referred to as the reaction coordinate (following the terminology employed in the molecular dynamics community). Let us stress that the identification of appropriate reaction coordinates is a delicate question, which depends on the system at hand. The problem of automatic learning of good reaction coordinates currently generates a lot of research, see for instance [17,19] and the recent review [28]. We do not consider this question in the sequel. The biasing potential in the importance sampling schemes considered in this work will be of the typeṼ pxq " V pxq´Apξpxqq, where A : T m Ñ R. In practice, the number of macroscopic variables m is very small compared to the dimension d of the model (which describes the full microscopic system). As will be explained below, without loss of generality, we assume that ξpxq " ξpy, zq " z for all x " py, zq P T d´mˆTm .
This expression for the reaction coordinate simplifies the presentation of the method, however considering more general reaction coordinates ξ is possible up to adapting some definitions below. To explain the construction of the method and to justify its efficiency, we assume that the reaction coordinate is representative of the metastable behavior of the system: roughly, this means that only the exploration in the z variable is affected by the metastability, whereas the exploration in the y variable is much faster (see the discussion at the end of Section 2.3).
In this framework, the fundamental object is the free energy function A ‹ defined as follows: for all z P T m , For general considerations on the free energy and related computational aspects, we refer to [39,40]. By construction, if X " pY, Zq is a random variable with distribution µ ‹ , then the marginal distribution of Z is given by dν ‹ pzq " e´A ‹pzq dz.
Introduce the notation pY 0 t , Z 0 t q " X 0 t for the solution of the overdamped Langevin dynamics in the sense of weak convergence in the set PpT m q of probability distributions on T m .
Since the reaction coordinate is representative of the metastability of the system, this convergence shares the same computational issues as when considering the full process X 0 .
A much better performance can be attained considering the following biased dynam- Define the associated empirical measures on T d and T m respectively: i.e. at the limit the distribution of Z ‹ t is uniform on T m . This observation, which is referred to as the flat histogram property in the literature devoted to applications, means that the process X ‹ does not suffer from slow convergence to equilibrium due to energy barriers, compared to the process X 0 (see the end of Section 2.3 for more details on this question).
In practice, the free energy function A ‹ is not known, thus the ideal approach described above is not applicable. In fact, in many applications, the real objective is the computation of the free energy function. One of the important features of many free energy computation algorithms, such as the one studied in this work, is to compute an approximation of the free energy function on-the-fly, and to use this approximation to enhance sampling. Checking that such adaptive algorithms are efficient and consistent requires careful mathematical analysis.
In this article, we consider a class of adaptive biasing methods, where the dynamics is of the form where the function A t depends on time t, approximates A ‹ when t Ñ 8, and is defined in terms of the empirical measure The process`X t˘t ě0 "`Y t , Z t˘t ě0 is not a Markov process, instead it is a self-interacting diffusion process. The precise construction of the algorithm studied in this article is provided below. This article is organized as follows. The construction of the algorithm (2.5) studied in this work is presented in Section 2 below. The main result, Theorem 2.3, is stated in Section 2.3, and a comparison with the literature is given. Section 3 gives a proof of the well-posedness of the self-interacting dynamics (2.5) (Proposition 2.2). Section 4 exhibits the limiting flow (obtained by applying the ODE method) and establishes the asymptotic pseudotrajectory property. Finally, Section 5 provides the final crucial ingredients for the proof of the main result, Theorem 2.3: a PDE estimate which provides some uniform bounds, and a global asymptotic stability property for the limiting flow.

The Adaptive Biasing Force algorithm
The objectives of this section are to define the Adaptive Biasing Force method [22] studied in this article, and to state the main results.
Recall the definitions (1.1) and (1.2) of the target distribution µ ‹ and of the free energy A ‹ respectively. The potential energy function V is assumed to be of class C 8 .
The reaction coordinate ξ : T d Ñ T m satisfies ξpy, zq " z for all x " py, zq P T d .
This expression substantially simplifies the presentation compared with a more general choice of ξ : T d Ñ R m . In applications, this is not restrictive, and consists in considering the so-called extended ABF algorithm [27]. Precisely, an auxiliary variable Z is added to the state space, the extended potential energy function for X " pX, Zq is given by V pXq " V pXq`1 2σ 2 |ξpXq´Z| 2 , where σ ą 0 is a small parameter, and one sets ξpXq " Z.

Construction
The definition of the algorithm requires us to make precise how in the evolution equation (1.4), the biasing potential function A t , or its gradient ∇A t , is determined in terms of the empirical distribution µ t given by (1.5). The algorithm is based on the following identity: the gradient ∇A ‹ of the free energy function A ‹ defined by (1.2) is given by More generally, let A : T m Ñ R be a smooth function, and let dµ A ‹ pxq9e Apzq dµ ‹ py, zq be Then one has the identity The expressions for the gradient of the free energy function in equations (2.1) and (2.2) are simpler than (for instance) the expressions (5) and (6) in [38] which hold for a general reaction coordinate mapping ξ, whereas we consider only the case ξpy, zq " z. The occupation measures µ t defined by (1.5) are in general singular with respect to the Lebesgue measure on T m . In order to define the mapping µ t Þ Ñ A t , we introduce a regularization kernel K , depending on the parameter P p0, 1s, such that Indeed, formally, the expression (2.1) for ∇A ‹ is obtained with the kernel K pz, z 1 q replaced by a Dirac distribution δpz´z 1 q. See Assumption 2.1 below for precise conditions on the kernel function K . For every P p0, 1s and µ P PpT d q, define the mapping F rµs : T m Ñ R m as follows: F rµsp¨q " ť ∇ z V py, zqK pz,¨qdµpy, zq ť K pz,¨qdµpy, zq .
Due to the action of the regularization kernel K , in general F rµs cannot be written as a gradient. For instance if m " 1, a smooth function F : T Ñ R is a gradient if and only if its average value is zero ş F pzqdz " 0; in general, ş F rµspzqdz ‰ 0. The last ingredient in the construction is a projection operator P, such that one defines ∇A rµs " PpF rµsq. More precisely, for every P p0, 1s and µ P PpT d q, define the mapping A rµs as follows: where H 1 pT m q is the Sobolev space W 1,2 pT m q of real-valued square integrable functions on T m , whose gradient belongs to L 2 pT m q. As will be explained below, A rµs is solution of an elliptic PDE. Note that F rµs and A rµs are functions depending only on z P T m , with a dimension m much smaller than d the total number of degrees of freedom of the system. Typically, one has m P t1, 2, 3u, which makes it possible to use the algorithm in practice.
We are now in position to define the process considered in this article: it is the solution of the system Arbitrary (deterministic) initial conditions Y 0 " y 0 P T d´m , Z 0 " z 0 P T m (and thus µ 0 " δ py0,z0q and A 0 " A rµ 0 s) are provided. This process belongs to the class of selfinteracting diffusions, see [13,14,15,16] for standard references. Note that such processes (also called reinforced dynamics) have been used and studied for other stochastic algorithms, for instance for approximating quasi-stationary distributions of killed Markov processes [41,11]. In particular, although we stick in the present work to continuous-time settings, we remark that arguments similar to ours can be applied to discrete-time chains, as in [10].

Well-posedness of the system (2.5)
Recall that V : T d Ñ R is assumed to be of class C 8 . Let us first state the assumptions satisfied by the kernel function K . For all z P T m , one has In addition, if ψ : T d Ñ R is a continuous function, one has ĳ T d ψpy, z 1 qK pz 1 , zqdydz 1 Ñ Ñ0 ż T d´m ψpy, zqdy , @ z P T m .
Finally, there exists c K P p0, 8q, such that Define m " min z,z 1 PT m K pz 1 , zq and M pkq " max z,z 1 PT m |∇ k z K pz 1 , zq|`max z,z 1 PT m |∇ k z 1 K pz 1 , zq|, where k is a nonnegative integer and ∇ k denotes the derivative of order k. Owing to Assumption 2.1, one has m ą 0 and M pkq ă 8 for all P p0, 1s. However these estimates are not uniform with respect to , i.e. inf m . Using the bounds and passing to the limit Ñ 0, then one obtains m ď ψpzq ş ψpz 1 qdz 1 ď M for every non-negative function ψ and all z P T m . Taking the infimum and supremum over all such functions ψ gives m " 0 and M " 8.
Note that to establish the well-posedness of the system (2.5), where P p0, 1s is fixed, upper bounds are allowed to depend on . However, it will be crucial in Section 5 to derive some upper bounds which are uniform with respect to in order to prove the convergence when t goes to infinity of µ t and A t (to a limit depending on ), see The exact form of the kernel function K has no influence on the analysis below. Let us give an example: let K pz 1 , z 2 q " ś m j"1 k `z 2 j´z 1 j˘, where for all z P T" R{Z, s the so-called von-Mises kernel.
Owing to Assumption 2.1, it is straightforward to check that F rµs is of class C 8 , for any µ P PpT d q. Then the mapping A rµs is the solution of the elliptic linear partial differential equation ∆A rµs " divpF rµsq and standard elliptic regularity theory implies that A rµs is also of class C 8 . See Lemma 3.1 below for quantitative bounds (depending on ).

Proposition 2.2.
Under Assumption 2.1, for any initial conditions x 0 " py 0 , z 0 q P T d , the system (2.5) admits a unique solution, which is defined for all times t ě 0.
The proof of Proposition 2.2 is postponed to Section 3

Main result and discussion
The free energy is in fact defined up to an additive constant. Above, A ‹ has been normalized so that Theorem 2.3. Under Assumption 2.1, there exists 0 ą 0 and, for all p P r1,`8q, there exists C p P r0,`8q such that, for all P p0, 0 s, there exists a unique probability distribution µ 8 P PpT d q which satisfies In addition, one has the error estimate and, for any initial conditions x 0 " py 0 , z 0 q P T d , almost surely, one has the convergence the latter in the sense of weak convergence in the set PpT d q.
The first identity in Theorem 2.3 means that the limit µ 8 of µ t is the fixed point of the mapping µ Þ Ñ µ A rµs ‹ , see Section 4. Equivalently, the limit A rµ 8 The almost sure convergence results of Theorem 2.3 may be loosely rephrased as follows and implies that the empirical distribution ν t " 1 t ş t 0 δ ξpXsq ds satisfies the approximate asymptotic flat-histogram property lim Ñ0 lim tÑ8 ν t " dz.
We stress that µ 8 is not close (when Ñ 0) to the multimodal target distribution µ ‹ : with the notation above one has µ ‹ " µ 0 ‹ ‰ µ A‹ ‹ . However, the algorithm gives a way to approximate ş ϕdµ ‹ by reweighting: using the Cesaro Lemma, it is straightforward to Up to an error depending only on the width ą 0 of the kernel function K , the adaptive algorithm (2.5) is thus a consistent way to approximately compute ş ϕdµ ‹ , as well as the free energy function A ‹ . The approximate asymptotic flat-histogram property stated above shows that the sampling in the slow, macroscopic variable z is enhanced, hence the efficiency of the approach. Such results are a mathematical justification for the use of the ABF method based on self-interating dynamics in practical computations. Remark 2.5. The convergence of A t to A rµ 8 s when t Ñ 8 in fact holds for C k norms, for all integers k. However, the convergence ofĀ ‹´A rµ 8 s when Ñ 0 can be obtained only in W 1,p , for all p P r2, 8q (hence in C 0 due to a Sobolev embedding, for p ą m).
In fact, higher-order derivatives of F rµs (and of A rµs) are expected to blow up when Ñ 0.
From a theoretical point of view, several variants of the ABF algorithm have been considered in various works. In a series of papers [38,1,37,36], Lelièvre and his co-authors considered a process similar to (2.5) except that µ t is replaced by the law of X t . This corresponds to the mean-field limit of a system of N interacting particles as N goes to infinity [30]. The law of X t then solves a non-linear PDE, and long-time convergence is established through entropy techniques. In practice in fact, the biasing potential A t is obtained both from interacting particles and from interaction with the past trajectories, so that µ t is the empirical distribution of a system of N replicas of the system pX t , Y t q that contributes all to the same biasing potential A t .
The case of adaptive biasing algorithm with a self-interacting process is addressed in [26] for the ABF algorithm and in [7,8] for the related adaptive biasing potential (ABP) algorithm. We emphasize on the fact that in these works, µ t is replaced by a weighted empirical measureμ t given, in the spirit of an importance sampling scheme, bȳ Contrary to µ t in Theorem 2.3, this weighted empirical measure converges toward µ ‹ . This makes the theoretical study simpler than in the present case. However, in practice, there should be no reason to use this weighting procedure for ABF due to the identity (2.2). Indeed, provided that A t converges to some A 8 , in the idealized case where K is a Dirac mass, then (2.2) implies that necessarily A 8 " A ‹ . This is no longer true as soon as ą 0 (which is necessary for the well-posedness of the algorithm), and one of the main motivation of the present work was to determine whether the convergence of the natural (non re-weighted) version of ABF, which is the one used in practice, was robust with respect to the regularization step. Our results shows that this is true, provided is small enough.
Note that in the basic versions of the ABF algorithms, the biasing force is directly F rµ t s, without the projection to a gradient. This is not so important for one-dimensional reaction coordinates, but otherwise without projecting it is not possible to use the importance sampling reweighting procedure. Moreover, as studied in [2], the projection reduces the variance of the estimation. Finally, from a theoretical point of view, considering an overdamped Langevin diffusion with non-gradient drift of the form´∇V`F rµs (say for a fixed µ) leads to additional difficulties with respect to the gradient case. In particular, the invariant measure has no explicit form, which in our work is used several times, for instance for proving the crucial quantitative estimate of Proposition 5.7. It may be possible that the proof of Theorem 2.3 can be adapted to some extent to this non-gradient case, with explicit expressions replaced by general estimates on invariant measures of diffusions in terms of their drift, but at any rate this is not straightforward.
While the compact periodic case is rather standard in practical cases for molecular dynamics, applications in Bayesian stastistics like in [21] are more naturally set in R d .
The general strategy of the proof of Theorem 2.3 has already been applied to some self-interacting processes, see [32]. There are two possible ways to give sense to the ABF algorithm in a non-compact state: either the process is defined in the whole space but the biasing force is restricted to a compact subset (see [21,Section 3.4]), in which case adapting most of our proofs (under some suitable growth conditions on V at infinity) should not raise any particular difficulties; or alternatively the process and the adaptive force are both defined in the whole space, in which case an additional biasing confining potential is required (since the Lebesgue measure on a non-compact set has infinite mass, see the discussion in [38]). This second theoretical solution is not really practicable as it would require to keep in memory a function over a non-compact set. The case of a non-compact space with a biasing force defined in a compact subset is addressed for the ABP algorithm in [8]. Theorem 2.3 is only a qualitative result on the consistency of the algorithm, and as such it is not sufficient for comparing the efficiency of the algorithm with respect to classical non-adaptive MCMC. Quantitative results (like explicit convergence rates), which are classical for Markov processes, are much more difficult to establish for selfinteracting processes. In view of Remark 2.4, as far as the asymptotic quadratic risk of the estimator of ş ϕdµ ‹ is concerned, the question of efficiency is expected to boil down to the question of efficiency of the ideal importance sampling scheme where the biasing force is constant equal to ∇A ‹ , namely the solution of (1.3). The answer may depend on the observable ϕ but, for reversible processes such as the overdamped Langevin diffusion, bounds on the asymptotic variance are classically obtained from the spectral gap of the process. As the algorithm is meant to tackle metastability issues, a natural frame to discuss its efficiency is the low temperature regime. The question is thus to compare the Poincaré constant of the probability measures with respective log-density βV and βpV´A ‹ q as β Ñ`8. In the first case (the classical process) it is well-known to scale (up to a sub-exponential factor) as expp´βc˚q where c˚is the so-called critical depth of the potential. On the other hand, applying the results of [34], we see that the spectral gap of (1.3) can be obtained from the Poincaré inequality satisfied by the marginal law of the reaction coordinates on the one hand and by the conditional laws for fixed values of the reaction coordinates. The marginal law being uniform on the torus for all β, the scaling in β of the spectral gap of (1.3) is given by the scaling of the Poincaré inequality of the conditional laws, i.e. only the "orthogonal" metastability intervenes. The spectral gap of (1.3) then scales as expp´β sup zPT d c˚pzqq where c˚pzq is the critical depth of y Þ Ñ V py, zq. For a good choice of reaction coordinate, this can drastically improve the sampling rate. As a matter of fact, the ABF algorithm has been proving its efficiency for nearly 20 years in a large number of empirical studies, see e.g. [23,29,24,22].
Finally, let us discuss the parameter ε. The proof of Theorem 2.3 furnishes an explicit expression for ε 0 . Nevertheless, it is not completely clear that this provides a useful insight for a practical choice of the parameters. First, the explicit expressions of ε 0 and C p obtained from the proof have no reasons to be sharp, and moreover it is not clear whether our restriction on sufficiently small ε is a real limitation or simply an avatar of our particular theoretical proof. Second, the practical issue of the choice of the regularization kernel is closely related to the question of space discretisation, which is not addressed in Theorem 2.3 (see the discussion in [26,Section 4]). In order to design an asymptotically unbiased estimator of the free energy A ‹ , at least at the theoretical level, one could consider a process similar to (2.5) but with the constant parameter ε replaced by a vanishing function t Þ Ñ εptq. Provided that the decay of this function is sufficiently slow, and at the cost of additional technical arguments, it should be possible to extend the proof of Theorem 2.3 to this case. Again, the practical consequence that can be drawn from such a consideration are not straightforward. This may be investigated in future works.

Notation
Let N " t1, . . .u and N 0 " N Y t0u, and let k P N 0 be a nonnegative integer. Let C k pT n1 , R n2 q be the space of functions of class C k on T n1 with values on R n2 . The derivative of order k is denoted by ∇ k . The space C k pT n1 , R n2 q is equipped with the norm }¨} C k , defined by }φpxq}. To simplify, the dimensions n 1 and n 2 are omitted in the notation for the norm }¨} C k .
If φ : T n1 Ñ R n2 is a Lipschitz continuous function, its Lipschitz constant is denoted by Lippφq.
The space PpT d q of probability distributions on T d (equipped with the Borel σ-field) is equipped with the total variation distance d TV and with the Wasserstein distance d W1 .
Recall that one has the following characterizations: here for the total variation distance the supremum is taken over bounded measurable functions ψ.
The space PpT d q is also equipped with the following distance, which generates the topology of weak convergence: where the sequence S " tf n u nPN is dense in C 0 pT d , Rq, and, for all n P N, one has f n P C 8 and }f n } C 0 ď 1.

Proof of the well-posedness result Proposition 2.2
The objective of this section is to prove Proposition 2.2, which states that the system (2.5) is well-posed. Some auxiliary estimates are provided, where the upper bounds are allowed to depend on the parameter . Lemma 3.1 provides estimates for F rµs and A rµs, in C k , uniformly with respect to µ. Lemma 3.2 provides some Lipschitz continuity estimates with respect to µ, in total variation and Wasserstein distances.
Using the estimate above with ψ " ∇ z V and ψ " 1, it is then straightforward to deduce that This concludes the proof of the estimates for F rµs. To prove the estimates for A rµs, observe thatÃ rµs solves the Euler-Lagrange equation associated with the minimization problem in (2.4), ∆Ã rµs " div`F rµs˘.
Using the result proved above, and standard elliptic regularity theory and Sobolev embeddings (see for instance [20]), one obtains the required estimates forÃ rµs: for all P p0, 1s and k P N 0 , there exists C ,k P p0, 8q such that for all µ P PpT d q, }Ã rµs} C k pT m ,Tq ď C ,k .
Since A rµs andÃ rµs only differ by an additive constant, it only remains to prove that }A rµs} C 0 pT m ,Tq ď C ,0 .
This is a straightforward consequence of the estimate }Ã rµs} C 0 pT m ,Tq ď C ,0 and of (2.4).
This concludes the proof of Lemma 3.1.
Using the characterizations of total variation and Wasserstein distances and the regularity properties of V and K (Assumption 2.1), proceeding as in the proof of Lemma 3.1 then yields It remains to apply the same arguments as in the proof of Lemma 3.1 to obtain }Ã rµ 2 s´A rµ 1 s} C k pT m ,Tq`} A rµ 2 s´A rµ 1 s} C k pT m ,Tq ď L ,k dpµ 1 , µ 2 q, which concludes the proof of Lemma 3.2.

Well-posedness
Let T P p0, 8q be an arbitrary positive real number. Introduce the Banach spaces Cpr0, T s, T d q , E " L 2`Ω , Cpr0, T s, T d q˘, equipped with the norms defined by depending on the auxiliary parameter α P p0, 8q. Let Φ : E Ñ E be defined as follows: for all x "`y t , z t˘t ě0 , let µ x t " 1 t ş t 0 δ xs ds and A x t " A rµ x t s, for all t ě 0. Then X " Φpxq is the solution X "`Y ptq, Zptqq tě0 of # dY ptq "´∇ y V py t , z t qdt`?2dW pd´mq ptq, dZptq "´∇ z V py t , z t qdt`∇A x t pz t qdt`?2dW pdq ptq, with initial condition pY p0q, Zp0qq " x 0 P T d , which is fixed. If α is sufficiently large, then the mapping Φ is a contraction, due to Lemma 3.3 stated below.

Lemma 3.3.
There exists C P p0, 8q such that for all α P p0, 8q, and for all x 1 , x 2 P E, Proof of Lemma 3.3. Let x 1 " py 1 , z 1 q and x 2 " py 2 , z 2 q be two elements of E, and set for all t ě 0, one has the almost sure estimate Second, similarly one has, for all t ě 0, Finally, one obtains the almost sure estimate, then taking expectation concludes the proof of Lemma 3.3.
The proof of Proposition 2.2 is then straightforward.
Proof of Proposition 2.2. Observe that the following claims are satisfied.
• Owing to Lemma 3.1, for all x P E, one has the almost sure estimate sup tě0 }∇A x t } C 0 ď C ,0 , and owing to Lemma 3.2, the mapping t Þ Ñ A x t is Lipschitz continuous. Thus the mapping Φ is well-defined.
• The process`Y ptq, Zptq, A t , µ t˘t ě0 solves (2.5) if and only if X " pY, Zq is a fixed point of Φ. • The mapping Φ : E Ñ E is a contraction if α is sufficiently large, and admits a unique fixed point X, owing to Lemma 3.3.
Since the initial conditions x 0 and µ 0 , and the time T P p0, 8q are arbitrary, these arguments imply that the global well-posedness of (2.5) and this concludes the proof.

The limiting flow
Define the mapping Π : µ P PpT d q Þ Ñ Π rµs P PpT d q, for P p0, 1s, as follows: Π rµs " Z rµs´1e´V py,zq`A rµspzq dydz, with Z rµs " ť e´V py,zq`A rµspzq dydz. The notation V µ py, zq " V py, zq´A rµspzq is used in the sequel. The probability measure Π rµs is the unique invariant distribution for the The objectives of this section are twofold. First, one proves that, for every π P PpT d q, there exists a unique solution`Φ pt, πq˘t ě0 of the equation Φ pt, πq " e´tπ`ż t 0 e s´t Π rΦ ps, πqsds.
where the last inequality follows from Lemma 3.2. This concludes the proof of Lemma 4.1. Ψpπqptq " e´tπ`ż t 0 e s´t Π rπ s sds, for π "`π t˘t ě0 . Let d α pπ 1 , π 2 q " sup tě0 e´α t d T V pπ 1 t , π 2 t q, where α ą 0 is chosen below. Then, using Choose α " 2Lp q, and define π 0 "`π 0 t " π˘t ě0 , π n`1 " Ψpπ n q, n ě 0, using the Picard iteration method. Let T P p0, 8q be an arbitrary positive real number. Since C`r0, T s, PpT d q˘is a complete metric space (equipped with the distance d α ), theǹ π n˘n PN converges when n Ñ 8, and the limit π 8 solves the fixed point equation π 8 " Ψpπ 8 q, which proves the existence of a solution, and concludes the proof.
By construction, the flow Φ : R`ˆPpT d q Ñ PpT d q is continuous, when PpT d q is equipped with the total variation distance d TV . Adapting the proof of [13,Lemma 3.3], one checks that it is also a continuous mapping when PpT d q is equipped with the distance d w .

The asymptotic pseudotrajectory property
Recall that a continuous function ζ : R`Ñ PpT d q is an asymptotic pseudotrajectory for Φ , if one has sup sPr0,T s d w`ζ pt`sq, Φ ps, ζptqq˘Ñ tÑ8 0, for all T P R`. See for instance [6] for details.
The following result is the rigorous formulation of the link between the dynamics of the empirical measures µ t in the ABF algorithm, and of the limit flow. The proof requires auxiliary notations and results. For every ą 0 and µ P PpT d q, let V µ py, zq " V py, zq´A rµspzq, and define the infinitesimal generator L µ " ∆´∇V µ¨∇ .
Introduce the projection operator defined by K µ f " f´ş f dΠ rµs and let`P ,µ t˘t ě0 be the reversible semi-group generated by L µ on L 2 pT d q (see e.g. [4,Chapter 3]). Finally, let Q µ " Then one has the following result.

Lemma 4.4.
For every ą 0, there exists C P p0, 8q, such that for all f P C 0 pT d , Rq and all µ P PpT d q. Moreover, L µ Q µ "´K µ .
We refer to [13, Lemma 5.1] for a similar statement. A detailed proof is provided below since the structure of the self-interaction is different. Note that several references to [4] are made in the proof.
Proof. Remark that, from Lemma 3.1, V ε µ P C 8 pT d q, from which it is classical to see that P ,µ t f P C 8 pT d q for all f P C 8 pT d q. In particular, C 8 pT d q is a core for L µ , see [4, Section 3.2] and thus it is enough to prove the result for f P C 8 pT d q.
As a first step, for all P p0, 1s there exists R ą 0 such that for all µ P PpT d q, Π rµs satisfies a log-Sobolev inequality and a Sobolev inequality both with constant R , in the sense that for all positive f P C 8 pT d q, where p " 2d d´2 . Indeed, from Lemma 3.1, the density of Π rµs with respect to the Lebesgue measure is bounded above and below away from zero uniformly in µ P PpT d q.
The inequalities are then obtained by a perturbative argument from those satisfied by the Lebesgue measure, see [4, Proposition 5.1.6]).
As a second step, these inequalities imply the following estimates: for all P p0, 1s there exists R 1 ą 0 such that for all PpT d q, f P C 8 pT d q and t ě 0, Indeed, the first estimate is a usual consequence of the Poincaré inequality, implied by the log-Sobolev one (see [ for some c ą 0 which does not depend on µ P PpT d q thanks to Lemma 3.1. According to [4,Theorem 4.7.2], this implies that which concludes the proof of the third estimate. As a third step, we bound (using that }P ,µ t f } 8 ď }f } 8 for all t ě 0) and similarly Analysis of an ABF method based on self-interacting dynamics from which Q µ f is well defined for f P C 8 pT d q and satisfies (4.1) for some C . Finally, where we used again that }P ,µ t K µ f } 8 ď R 1 e´R pt´1q{2 }K µ f } L 2 pΠ ε rµsq for t ě 1, which vanishes at infinity.
Proof of Theorem 4.3. First, note that the claim is equivalent to the following statement (see [13,Proposition 3.5]): for all f P S and T P Q`, where ε t psq " ż e t`s e t δ Xτ´Π rµ τ s τ dτ and we recall that S " tf n u nPN is a dense sequence in C 0 pT d , Rq such that, for all n P N, one has f n P C 8 and }f n } C 0 ď 1.
Using a Borel-Cantelli argument, and the fact that S is a countable set, it is sufficient to establish that there exists C P p0, 8q, such that for all t ě 0 and f P S. Let f P S and introduce the function F : p0, 8qˆT d Ñ R defined by F pt, xq " t´1Q µt f . Then F is of class C 1,2 on p0, 8qˆT d . Indeed, first, it is straightforward to check that t Þ Ñ F rµ t s P C k pT d , R m q is of class C 1 , for all k P N 0 , since t Þ Ñ µ t P PpT d q (equipped with the Wasserstein distance) is of class C 1 . Second, A rµs is solution of the Euler-Lagrange equation ∆A rµs " divpF rµsq, which establishes that t Þ Ñ A rµ t s P C k pT m , Rq is also of class C 1 . Finally, it remains to apply standard arguments to establish the C 1 regularity of t Þ Ñ Q µt f . Applying Itô formula yields, for all t ě 0 and s P r0, T s, the equality F pe t`s , X e t`s q " F pe t , X e t q`ż e t`s e t`B τ`L µτ˘F pτ, X τ qdτ`?2 ż e t`s e t x∇F pτ, X τ q, dW pτ qy.
One has the estimate Finally, it is straightforward to check that using the identity 9 µ t " 1 t pδ Xt´µt q.
As a consequence, one obtains sup 0ďsďT It remains to deal with the error term ε 4 t psqf . Using Doob inequality implies This concludes the proof of the claim, for all t ě 0 and f P S.
Applying a Borel-Cantelli argument then concludes the proof.

Proof of Theorem 2.3
The objective of this section is to give a detailed proof of Theorem 2.3. There are two main ingredients. The first one is Proposition 5.3 below, which provides a uniform estimate over ą 0 for A rµs, in the C 0 norm (compare with Lemma 3.1 where the upper bound may depend on ). The second key ingredient is Proposition 5.7, which states a contraction property for the mapping Π , for an appropriate distance, for sufficiently small , when restricted to an attracting set identified below (compare with Lemma 4.1 which is valid on the entire state space, but where no upper bound for Lp q holds).
Combining these two ingredients provides a candidate for the limit as t Ñ 8, using a standard Picard iteration argument. Using Theorem 4.3 (asymptotic pseudo-trajectory property) then proves the almost sure convergence of µ t to this candidate limit.

Uniform estimate
The following PDE estimate is crucial for the analysis.
Proposition 5.1. Let m P N. For every p P r2, 8q, there exists C p P p0, 8q, such that the following holds: let F : T m Ñ R m be a continuous function, then the solution A of the elliptic PDE ∆A " divpF q, with the condition ş Apzqdz " 0, satisfies }A} W 1,p pT m ,Rq ď C p }F } C 0 pT m ,R m q , and if p ą m, then }A} C 0 pT m ,Rq ď C p }F } C 0 pT m ,R m q .
Proof. The proof combines three arguments.
• If p ą m, then by Sobolev embedding properties, one has with C p P p0, 8q.
• By elliptic regularity theory, one has }∇A} L p pT m ,R m q ď C p }F } L p pT m ,R m q ď C p }F } C 0 pT m ,R m q , with C p P p0, 8q, see [3,Theorem 15.12].
Using Proposition 5.1, one gets the following crucial estimate, which is uniform for ą 0 (contrary to those given in Lemmas 3.1, 3.2 and 4.1 above).
Proposition 5.3. One has the following estimate: Proof. Using Proposition 5.1 above, it suffices to check that That estimate is a straightforward consequence of the definition 2.3, of the boundedness of ∇ z V , and of the positivity of the kernel function K .

Attracting set
Introduce the following notation: for all B P CpT m , Rq, let dµ B py, zq " Z´1 B e´V py,zq`Bpzq dydz P PpT d q, with Z B " ť e´V py,zq`Bpzq dydz " ş e´A ‹pzq`Bpzq dz. First, for probability distribution of the form µ B , one has the following useful identity for F rµ B s. Proof. This is a straightforward consequence of the two identities below: for all z P T m , ż e´V py,zq dy " e´A ‹pzq , ż ∇ z V py, zqe´V py,zq dy "´∇ˆż e´V py,zq dy˙" e´A ‹pzq ∇A ‹ pzq.
The set of the probability distribution of the type µ B is an attractor for the dynamics of the limit flow, more precisely one has the following result.
Lemma 5.6. For every p P r2, 8q, there exists C p P p0, 8q, such that for every ą 0, and every B P CpT m , Rq, one has }A rµ B s´Ā ‹ } W 1,p pT m q ď C p ? e 2`}B} C 0`}A‹} C 0˘.
Proof. Using Proposition 5.1, one has the following inequality: Owing to Lemma 5.4 and using the Lipschitz continuity of A ‹ , for all z P T m , one haš owing to Assumption 2.1. This inequality concludes the proof.

Contraction property on the attracting set
Let M P p0, 8q. Introduce the set Owing to Proposition 5.3, if M ě M 0 , then A rµs P B M for every µ P PpT d q and ą 0.
Introduce the notation h B py, zq " Z´1 B e´V py,zq`Bpzq andΠ rh B s " h A rµ B s , so that h B andΠ rh B s are the density with respect to the lebesgue measure of, respectively, µ B and Π rµ B s.
In addition, using the Poincaré inequality and the definition of A rµs as the orthogonal projection in L 2 of F rµs, one has }A rµ B 1 s´A rµ B 2 s} 2 ď C}F rµ B 1 s´F rµ B 2 s} 2 .
Analysis of an ABF method based on self-interacting dynamics Then, using Lemma 5.4, one obtains, for all z P T m , using Lipschitz continuity of ∇A ‹ , and the lower bound One has }B 1 } C 0 ď M and }B 2 } C 0 ď M . In addition, owing to Assumption 2.1, one has ş |z 1´z |K pz 1 , zqdz 1 ď C ?
On the one hand, Choosing a sufficiently small parameter η one finally obtains the claim above.
Gathering the estimates finally concludes the proof of the estimate

Proof of the main result
The first part of this section is devoted to the construction of the candidate limits µ 8 and A 8 " A rµ 8 s, of µ t and A t respectively, for small enough .
Let¯ 0 " 1{pC 2 M p0q`1 q, where M " M p0q is given by Proposition 5.3 and C M is given by Proposition 5.7.
Let P p0,¯ 0 s, and consider A p0q P B M p0q . Define µ p0q " µ A p0q , h p0q " h A p0q , and by recursion, for all nonnegative integer k, let µ pk`1q " Π rµ pkq s , h pk`1q "Π rh pkq s, and let A pkq " A rµ pkq s. Then one has h pkq " h A pkq P B M p0q . We claim that`µ pkq˘k ě0 is a Cauchy sequence in the space PpT d q equipped with the total variation distance d TV . Indeed, for all k, ě 0, one has d TV pµ pkq , µ pk` q q ď }h pkq´hpk` q } 2 ď`C M p0q ? ˘k d 2 ph p0q , h p q q ď Cρ k , with ρ P p0, 1q. As a consequence, there exists µ 8 such that d TV pµ pkq , µ 8 q Ñ kÑ8 0.
Owing to Lemma 4.1, the mapping Π is continuous on PpT d q equipped with d TV , thus µ 8 " Π rµ 8 s. This implies that µ 8 " h A ‹ pxqdx where A ‹ " A rµ 8 s P B M p0q . It is then straightforward to check that h 8 " h A 8 is the unique fixed point of the mappingΠ (uniqueness is a consequence of Proposition 5.7).
We claim that, for any initial condition of the type µ B , then Φ pt, µ B q Ñ tÑ8 µ 8 , more precisely one has exponential convergence to the fixed point µ 8 : there exists cp q P p0, 8q such that, for all t ě 0, one has sup BPB M p0q d TV pΦ pt, µ B q, µ 8 q ď Ce´c p qt . (5.2) To prove this claim, observe that for all t ě 0, the probability distribution Φ pt, µ B q can be written as µ Bt , where B t P C 0 , see Proposition 5.5, and without loss of generality ş B t pzqdz " 0. In addition, B t P B M p1q , for all t ě 0, for some M p1q P p0, 8q depending only on M p0q : indeed, the identity and B t pzq is equal (up to an additive constant defined to respect the condition ş B t pzqdz " 0) to A ‹ pzq`log`ş e´V py,zq dyq.
Let 0 " 1{pC 2 M p1q`1 q, and assume in the sequel that P p0, 0 s. Note that M p1q ě M p0q , thus 0 ď¯ 0 . Then A 8 is well-defined, h 8 is the unique fixed point ofΠ , and one obtains d TV pΦ pt, µ B q, µ 8 q ď }h Bt´h  The idea of the proof, using concepts and tools developed in [6] may be described as follows. Since almost surely`µ t˘t ě0 is an asymptotic pseudo-trajectory for the semi-flow Φ , one has the following property: the limit set Lpµq is an attractor free set for the semiflow Φ in PpT d q, in particular it is invariant, i.e. for all t ě 0 one has Φ pt, Lpµqq " Lpµq. Let us check that Lpµq " tµ 8 u. First, introduce the set M " tµ B , B P CpT m , Rqu. Then Proposition 5.5 provides the inclusion Lpµq Ă M. Indeed, let ν P Lpµq and let t ě 0 be arbitrary, then by invariance there existsν P Lpµq such that ν " Φ pνq, thus dpν, Mq " dpΦ pνq, Mq ď 2e´t Ñ tÑ8 0. Similarly, let ν P Lpµq Ă M, and let t ě 0 be arbitrary, then ν " Φ pνq for someν P M. Thus dpν, µ 8 q " dpΦ pt,νq, Φ pt, µ 8 qq ď Ce´c t Ñ tÑ8 0.
Let us now provide a detailed proof using only the results presented above.
Since T 1 and T 2 are arbitrary, letting first T 2 Ñ 8, then T 1 Ñ 8, one has almost surely lim sup tÑ8 d w pµ e t , µ ‹ q " 0, which concludes the proof of the weak convergence of µ t to µ 8 .
It remains to check that A t " A rµ t s converges to A 8 " A rµ 8 s, in C k , for all k P N. This is a consequence of the regularity properties of K and of V , which proves that µ P pPpT d q, d w q Þ Ñ F rµs P C k is continuous for all k P N.
Using Sobolev embedding properties, as in the proof of Lemma 3.1, then concludes the proof.