Large Deviation Principle for invariant distributions of Memory Gradient Diffusions

In this paper, we consider a class of diffusion processes based on a memory gradient descent, i.e. whose drift term is built as the average, along the whole past of the trajectory, of the gradient of a coercive function U. Under some classical assumptions on U, this type of diffusion is ergodic and admits a unique invariant distribution. With a view to optimization applications, we want to understand the behaviour of the invariant distribution when the diffusion coefficient goes to 0. In the non-memory case, the invariant distribution is explicit and the so-called Laplace method shows that a Large Deviation Principle (LDP) holds with an explicit rate function, which leads to a concentration of the invariant distribution around the global minima of U. Here, except in the linear case, we have no closed formula for the invariant distribution, but we show that a LDP can still be obtained. Then, in the one-dimensional case, we derive some bounds for the rate function that lead to the concentration around the global minimum under some assumptions on the second derivative of U.


Introduction
The aim of this paper is to study some asymptotic properties of a diffusive stochastic model with memory gradient when the noise component vanishes. The evolution is given by the following stochastic differential equation (SDE) on R^d:

dX_t^ε = ε dB_t − ( 1 / ∫_0^t k(s) ds ) ( ∫_0^t k(s) ∇U(X_s^ε) ds ) dt,    (1.1)

where ε > 0 and (B_t) is a standard d-dimensional Brownian motion. A special feature of such an equation is the integration over the past of the trajectory, depending on a function k which quantifies the amount of memory. Our work is mainly motivated by optimization considerations. Indeed, in a recent work, Cabot, Engler, and Gadat (2009b) have shown that the solution of the deterministic dynamical system (ε = 0) converges to the minima of the potential U. Without memory, that is, without integration over the past of the trajectory, the model (1.1) reduces to the classical gradient descent model and such convergence results are well-known. Even in the deterministic framework, a potential interest of the gradient with memory is the capacity of the solution to avoid some local traps of U. Indeed, the solution of (1.1) (when ε = 0) may keep some inertia even when it reaches a local minimum of U, which implies a larger exploration of the space than a classical gradient descent, which cannot escape from local minima (see Alvarez (2000) and Cabot (2009)). Usually, such a property is obtained by introducing a small noise term.
In the classical case, this leads to the following usual SDE:

dX_t^ε = −∇U(X_t^ε) dt + ε dB_t.    (1.2)

As mentioned above, the behaviour of the invariant distribution of this model when ε goes to 0 is well-known. Using the so-called Laplace method, it can be proved that a Large Deviation Principle (LDP) holds and that the invariant distribution of (1.2) concentrates on the global minima of U when the parameter ε → 0 (see e.g. Freidlin and Wentzell (1979)).
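For the classical diffusion (1.2), the invariant measure is the explicit Gibbs measure ν_ε(dx) ∝ exp(−2U(x)/ε²) dx, so the concentration phenomenon can be observed directly. The sketch below (a numerical illustration with a hypothetical one-dimensional double-well potential, not taken from the paper) estimates the mass that ν_ε puts on the basin of the global minimum as ε decreases:

```python
import numpy as np

# Hypothetical double-well potential: the tilt 0.3*x puts the global
# minimum near x = -1 and a strictly higher local minimum near x = 1.
def U(x):
    return (x**2 - 1)**2 + 0.3 * x

xs = np.linspace(-3.0, 3.0, 20001)

def mass_left_of_zero(eps):
    # Unnormalized Gibbs density exp(-2 U / eps^2), shifted for numerical stability.
    w = np.exp(-2.0 * (U(xs) - U(xs).min()) / eps**2)
    return w[xs < 0].sum() / w.sum()

masses = [mass_left_of_zero(eps) for eps in (1.0, 0.5, 0.2)]
# masses increases towards 1 as eps decreases: concentration on the global minimum
```

The same qualitative behaviour is what Theorem 2.3 establishes for the memory diffusion, where no closed formula for ν_ε is available.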
It is then natural to investigate the stochastic memory gradient (1.1) in order to obtain similar results. A major difference with the usual gradient diffusion is that the integration over the past of the trajectory makes the process (X_t^ε)_{t≥0} non-Markovian. This can be overcome by introducing an auxiliary process (Y_t^ε) defined by

Y_t^ε = ( 1 / ∫_0^t k(s) ds ) ∫_0^t k(s) ∇U(X_s^ε) ds.

In general, the couple Z_t^ε = (X_t^ε, Y_t^ε) gives rise to a non-homogeneous Markov process (see Gadat and Panloup (2012)). In order to consider the notion of invariant measure, we concentrate on the case k(t) = e^{λt}, which turns (Z_t^ε) into a homogeneous Markov process. In this context, Gadat and Panloup (2012) have shown existence and uniqueness of the invariant measure ν_ε for (Z_t^ε).
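Assuming the standard Markovian lift dX_t = ε dB_t − Y_t dt, dY_t = λ(∇U(X_t) − Y_t) dt (this explicit form is an assumption here, consistent with the drift-energy dissipation used in Section 3), the couple (X, Y) is straightforward to simulate by an Euler–Maruyama scheme; the potential and all numeric values below are illustrative only:

```python
import numpy as np

def simulate(grad_U, lam=1.0, eps=0.3, x0=2.0, y0=0.0, dt=1e-3, T=20.0, seed=0):
    """Euler-Maruyama scheme for the (assumed) Markovian lift
       dX = eps dB - Y dt,  dY = lam * (grad_U(X) - Y) dt,  in dimension 1."""
    rng = np.random.default_rng(seed)
    n = round(T / dt)
    x, y = x0, y0
    xs = np.empty(n)
    for i in range(n):
        x += -y * dt + eps * np.sqrt(dt) * rng.standard_normal()
        y += lam * (grad_U(x) - y) * dt
        xs[i] = x
    return xs

# double-well potential U(x) = (x^2 - 1)^2, so grad U(x) = 4 x (x^2 - 1)
xs = simulate(grad_U=lambda x: 4.0 * x * (x**2 - 1.0))
```

The averaged drift gives the trajectory some inertia: contrary to the plain gradient diffusion, X can overshoot a local minimum before settling.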
In the present work, our objective is to obtain some sharp estimates of the asymptotic behaviour of (ν_ε) as ε → 0. More precisely, we shall first show that (ν_ε)_{ε>0} satisfies a Large Deviation Principle. Then, we will try to obtain some sharp bounds for the associated rate function in order to understand how the invariant probability is distributed as ε → 0. In particular, we will establish the concentration around the global minima of U up to technical hypotheses. In the classical setting of (1.2), this is an essential step towards implementing the strategy of so-called simulated annealing. Developing a simulated annealing optimization procedure for the memory gradient diffusion is certainly a motivation for the study of (1.1). This will be addressed in a forthcoming work.
The paper is also motivated by the extension of some Large Deviations results for invariant distributions to a difficult context where the process is not elliptic and the drift vector field is not the gradient of a potential. These two points, and especially the second one, strongly complicate the problem since explicit computations of the invariant measure are generally impossible. This implies that the works on elliptic Kolmogorov equations by Chiang, Hwang, and Sheu (1987), Miclo (1992) or Holley and Stroock (1988), for instance, cannot be extended to our context. For similar considerations in other non-Markov models, one should also mention the recent works on McKean–Vlasov diffusions by Herrmann and Tugaut (2010) and on self-interacting (with attractive potential) diffusions by Raimond (2009).
Here, in order to obtain a LDP for (ν_ε)_{ε>0}, we adapt the strategy of Puhalskii (2003) and Freidlin and Wentzell (1979) to our degenerate context. We shall first show a finite-time LDP for the underlying stochastic process. Second, we prove the exponential tightness of (ν_ε)_{ε>0} by using Lyapunov-type arguments. Finally, we show that the associated rate function, denoted by W in the paper, can be expressed as the solution of a control problem (equivalently, as the solution of a Hamilton-Jacobi equation). However, at first sight, the solution of the control problem is not unique. The uniqueness property follows from an adaptation of the results of Freidlin and Wentzell (1979) to our framework. In particular, we obtain a formulation of the rate function in terms of the costs to join the stable critical points of our dynamical system. Then, the second step of the paper (sharp estimates of W) is carried out through the study of the cost to join stable critical points.
The paper is then organized as follows. In Section 2, we recall some results about the long-time behaviour of the diffusion when ε is fixed. Moreover, we provide the main assumptions needed for obtaining the LDP for (ν_ε). In Section 3, we prove the exponential tightness of (ν_ε) and show that any rate function W associated with a convergent subsequence is a solution of a finite or infinite time control problem. In Section 4, we prove the uniqueness of W by adapting the Freidlin and Wentzell approach to our context (see also the works of (Biswas & Budhiraja, 2011) and (Cerrai & Röckner, 2005) for other adaptations of this theory). Since the study of the cost function is quite hard in a general setting, we focus in Section 5 on the case of a double-well potential U. In this context, we obtain some upper and lower bounds for the associated quasi-potential function. Then, we provide some conditions on U and on the memory parameter λ which allow us to prove the concentration of the invariant distribution around the global minima. Note that, even if our assumptions in this part seem a little restrictive, the proofs of the bounds (especially the lower bound) rely on an original (and almost optimal) use of some Lyapunov functions associated with the dynamical system.
Acknowledgments: The authors would like to thank Guy Barles for his hospitality, and are grateful to Guy Barles, Laurent Miclo and Christophe Prieur for helpful discussions and comments.

Notations and background on Large Deviation theory
In the paper, the scalar product and the Euclidean norm on R^d are respectively denoted by ⟨·,·⟩ and |·|. The space of d × d real-valued matrices is referred to as M_d(R) and we use the notation ‖·‖ for the Frobenius norm on M_d(R).
We denote by H(R_+, R^d) the Cameron–Martin space, i.e. the set of absolutely continuous functions ϕ : R_+ → R^d such that ϕ(0) = 0 and such that ϕ̇ ∈ L²_loc(R_+, R^d).
For a C²-function f : R^d → R, ∇f and D²f denote respectively the gradient and the Hessian matrix of f. In the one-dimensional case, we will switch to the notation f′ and f″. For a function f of the couple (x, y), ∇_x f and D²_x f stand for the gradient and the Hessian matrix in the x-variable; obviously, these notations are naturally extended to ∇_y f, D²_{x,y} f and D²_y f. Finally, for any vector v ∈ R^d, v^t will refer to the transpose of v. For a measure µ and a µ-measurable function f, we set µ(f) = ∫ f dµ.
Let us now recall some definitions relative to Large Deviation theory (see Dembo and Zeitouni (2010) for further references on the subject). Let (E, d) denote a metric space. A family of probability measures (ν_ε)_{ε>0} on E satisfies a Large Deviation Principle (shortened as LDP) with speed r_ε and rate function I if, for all open sets O and closed sets F,

lim inf_{ε→0} r_ε^{-1} log ν_ε(O) ≥ − inf_{x∈O} I(x)   and   lim sup_{ε→0} r_ε^{-1} log ν_ε(F) ≤ − inf_{x∈F} I(x).

The function I is said to be good if, for any c ∈ R, the set {x ∈ E, I(x) ≤ c} is compact. In this paper, we will use some classical compactness results in Large Deviation theory. A family of probability measures (ν_ε)_{ε>0} is said to be exponentially tight of order r_ε if, for every a > 0, there exists a compact set K_a ⊂ E such that

lim sup_{ε→0} r_ε^{-1} log ν_ε(K_a^c) ≤ −a.

Then, we recall the link between exponential tightness and the Large Deviation Principle (see Feng and Kurtz (2006), Chapter 3, for instance).
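As a toy example of these definitions (not taken from the paper), the Gaussian family N(0, ε²) satisfies a LDP on R with speed ε^{−2} and good rate function I(x) = x²/2, so ε² log P(|X| ≥ a) → −a²/2; this can be checked numerically from the complementary error function:

```python
import math

def log_tail(eps, a):
    # log P(|N(0, eps^2)| >= a) via the complementary error function
    return math.log(math.erfc(a / (eps * math.sqrt(2.0))))

a = 1.0
vals = [eps**2 * log_tail(eps, a) for eps in (0.5, 0.2, 0.05)]
# vals increases towards -a^2/2 = -0.5 as eps -> 0
```

The same normalization (speed ε^{−2}) is the one used for (ν_ε) throughout the paper.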
Proposition 2.1 Let (S, d) be a metric space and (ν_ε)_{ε>0} a family of exponentially tight probability measures on the Borel σ-algebra of S with speed r_ε. Then, there exists a subsequence (ε_k)_{k≥0} with ε_k → 0 along which the LDP holds with a good rate function I and speed r_{ε_k}.

Averaged gradient diffusions
As announced in the Introduction, with k(t) = e^{λt} we are interested in the stochastic evolution

dX_t^ε = ε dB_t − ( ∫_0^t e^{λs} ∇U(X_s^ε) ds / ∫_0^t e^{λs} ds ) dt,    (2.1)

where λ > 0 and (B_t)_{t≥0} is a standard d-dimensional Brownian motion. The process (X_t^ε)_{t≥0} is not a Markov process, but enlarging the space by defining the auxiliary process (Y_t^ε)_{t≥0} makes the couple Z_t^ε = (X_t^ε, Y_t^ε) Markovian (see Gadat and Panloup (2012) for instance). More precisely, (Z_t^ε)_{t≥0} satisfies:

dX_t^ε = ε dB_t − Y_t^ε dt,
dY_t^ε = λ (∇U(X_t^ε) − Y_t^ε) dt.    (2.2)

When necessary, we will denote by (Z_t^{ε,z})_{t≥0} the solution starting from z ∈ R^{2d} and by P_z^ε the distribution of this process on C(R_+, R^{2d}). In the sequel, we will also intensively use the deterministic system obtained when ε = 0 in (2.2): if (z(t))_{t≥0} := (x(t), y(t))_{t≥0}, the canonical differential system is

ẋ(t) = −y(t),
ẏ(t) = λ (∇U(x(t)) − y(t)).
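Assuming the deterministic dynamics ẋ = −y, ẏ = λ(∇U(x) − y) (the ε = 0 system, taken here in this explicit form), with λ = 1 the energy E(x, y) = U(x) + |y|²/2 satisfies Ė = −|y|² ≤ 0 along the flow; this dissipation is precisely the property invoked in Section 3. A minimal numerical sketch, with an illustrative double-well potential:

```python
import numpy as np

def flow(grad_U, x0, y0, dt=1e-3, T=30.0, lam=1.0):
    # Explicit Euler scheme for x' = -y, y' = lam * (grad_U(x) - y)
    x, y = x0, y0
    traj = []
    for _ in range(round(T / dt)):
        x, y = x + dt * (-y), y + dt * lam * (grad_U(x) - y)
        traj.append((x, y))
    return np.array(traj)

U = lambda x: 0.25 * (x**2 - 1.0)**2      # minima at x = -1 and x = 1
grad_U = lambda x: x * (x**2 - 1.0)
traj = flow(grad_U, x0=2.0, y0=0.0)
E = U(traj[:, 0]) + 0.5 * traj[:, 1]**2   # energy decays along the flow
```

Up to the O(dt²) error of the Euler scheme, E is non-increasing and the trajectory converges to an equilibrium (x*, 0) with ∇U(x*) = 0.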

Assumptions
Since ∇U is not necessarily Lipschitz continuous, we assume throughout the paper that there exists C > 0 such that for all x ∈ R^d, ‖D²U(x)‖ ≤ C U(x). This assumption ensures the non-explosion (in finite horizon) of (Z_t^ε)_{t≥0} (see Proposition 2.1 of Gadat and Panloup (2012)). Since ∇U is locally Lipschitz continuous, existence and uniqueness hold for the solution of (2.2), and (Z_t^ε)_{t≥0} is a Markov process with semi-group denoted by (P_t^ε)_{t≥0} and infinitesimal generator

L^ε f(x, y) = (ε²/2) Δ_x f(x, y) − ⟨y, ∇_x f(x, y)⟩ + λ ⟨∇U(x) − y, ∇_y f(x, y)⟩.

We first recall some results obtained by Gadat and Panloup (2012) on existence and uniqueness of the invariant distribution of (2.2). To this end, we need to introduce a mean-reverting assumption denoted by (H_mr) and a hypoellipticity assumption (H_Hypo). The mean-reverting assumption involves the behaviour of ⟨x, ∇U(x)⟩ at infinity.
Concerning the second assumption, let us define the set E_U and denote by M_U the complementary manifold M_U = R^d \ E_U; Assumption (H_Hypo) is then defined accordingly. The above assumption implies the uniqueness of the invariant distribution: the smoothness of U and the fact that dim(M_U) ≤ d − 1 ensure that the Hörmander condition is satisfied on a sufficiently large subspace of R^{2d}, whereas the fact that lim_{|x|→+∞} |x|^{−1} U(x) = +∞ (which implies that U grows at least linearly) is needed for the topological irreducibility of the semi-group (see Gadat and Panloup (2012) for details). Under these assumptions, we deduce the following proposition from Theorems 2.3 and 3.2 of Gadat and Panloup (2012):

Proposition 2.2 Assume (H_mr). Then, for all ε > 0, the solution of (2.2) admits an invariant distribution. Furthermore, if (H_Hypo) holds, the invariant distribution is unique and admits a λ_{2d}-a.e. positive density. We denote by ν_ε this invariant distribution.
Note that (H_mr) implies Assumption (H_1) of Gadat and Panloup (2012) in the particular case σ = I_d and r_∞ = λ.
Our goal is now to obtain a Large Deviation Principle for (ν_ε)_{ε>0} when ε → 0. To this end, we need to introduce some mean-reverting assumptions that are more constraining than (H_mr).

Remark 2.1 Assumptions (H_Q+) and (H_Q−) correspond respectively to super-quadratic and sub-quadratic potentials. For instance, assume that U(x) = (1 + |x|²)^p. When p ≥ 1, (H_Q+) holds with ρ ∈ (0, 1/(2p)), and if p ∈ (1/2, 1], (H_Q−) holds with a = p. These assumptions are adapted to a large class of potentials U with polynomial growth (more than linear). However, they do not cover potentials with exponential growth ((H_Q+)(ii) is no longer fulfilled).
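For the polynomial family U(x) = (1 + |x|²)^p quoted in Remark 2.1, the standing assumption ‖D²U(x)‖ ≤ C U(x) of Section 2 can be checked directly in dimension 1, where U″ has a closed form. The grid check below is purely illustrative (a finite grid only suggests, and does not prove, the bound):

```python
import numpy as np

def ratio_sup(p, xs):
    # U(x) = (1 + x^2)^p and its closed-form second derivative in d = 1
    U = (1.0 + xs**2)**p
    d2U = 2*p*(1 + xs**2)**(p - 1) + 4*p*(p - 1)*xs**2*(1 + xs**2)**(p - 2)
    return float(np.max(np.abs(d2U) / U))

xs = np.linspace(-50.0, 50.0, 100001)
sup_super = ratio_sup(2.0, xs)    # super-quadratic regime, p >= 1
sup_sub = ratio_sup(0.75, xs)     # sub-quadratic regime, p in (1/2, 1]
# both suprema stay bounded, consistent with |U''| <= C * U
```

For p = 2 the ratio (4 + 12x²)/(1 + x²)² is maximized at x² = 1/3 with value 4.5, so C = 4.5 works on this grid.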

Exponential tightness and Hamilton Jacobi equation
Let ϕ ∈ H. When existence holds, we denote respectively by z_ϕ := (z_ϕ(t))_{t≥0} and z̄_ϕ := (z̄_ϕ(t))_{t≥0} the solutions of the controlled systems associated with b and −b. Note that (H_Q+) and (H_Q−) ensure the finite-time non-explosion of z_ϕ and z̄_ϕ for all ϕ ∈ H (see e.g. Equation (3.4)). Thus, since ∇U is locally Lipschitz continuous, for all z ∈ R^{2d}, the solutions starting from z, respectively denoted by z_ϕ(z, ·) and z̄_ϕ(z, ·), exist and are unique. Finally, we will also need the following assumption.
(H_D): The set of critical points (x*_i)_{i=1,...,ℓ} of U is finite and each D²U(x*_i) is invertible. This assumption will be necessary to obtain some uniqueness properties. We are now able to state our first main result.
Equation (2.7) satisfied by W may be seen as a Hamilton-Jacobi equation (see e.g. Barles (1994) for further details on such equations).

Freidlin and Wentzell estimates
Let us stress that the main problem in the expression (2.8) is that uniqueness is only available conditionally on the values of W(z*_i), i = 1, ..., ℓ. Thus, in order to obtain a LDP, we now need to show that this uniqueness is not conditional, i.e. that the values of W(z*_i) are uniquely determined. We are going to obtain this result by following the Freidlin and Wentzell (1979) approach. To this end, we first recall some useful elements of Freidlin and Wentzell theory.
{i}-Graphs. Following the notations of Theorem 2.1, we denote by {z*_1, ..., z*_ℓ} the finite set of equilibria and we recall here the definition of the {i}-graphs defined on this set. For any i ∈ {1, ..., ℓ}, we denote by G(i) the set of oriented graphs with vertices {z*_1, ..., z*_ℓ} that satisfy the three following properties.
(i) Each state z*_j ≠ z*_i is the initial point of exactly one oriented edge in the graph. (ii) The graph does not contain any cycle. (iii) For any z*_j ≠ z*_i, there exists a (unique) path composed of oriented edges starting at the state z*_j and leading to the state z*_i.
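Conditions (i)–(iii) say exactly that an {i}-graph is a spanning arborescence oriented towards z*_i. A small brute-force enumeration (illustrative only, with the vertices relabelled 1, ..., ℓ) makes this concrete and recovers the Cayley count ℓ^{ℓ−2}:

```python
from itertools import product

def reaches(j, succ, i, ell):
    # Follow the unique outgoing edges; an acyclic graph reaches i in < ell steps.
    for _ in range(ell):
        if j == i:
            return True
        j = succ[j]
    return j == i

def graphs_G(i, ell):
    """Enumerate the {i}-graphs on vertices {1, ..., ell}: every vertex j != i
    emits exactly one oriented edge, and following the edges from any j
    leads to i (which also rules out cycles)."""
    others = [j for j in range(1, ell + 1) if j != i]
    choices = [[k for k in range(1, ell + 1) if k != j] for j in others]
    found = []
    for targets in product(*choices):
        succ = dict(zip(others, targets))
        if all(reaches(j, succ, i, ell) for j in others):
            found.append(succ)
    return found
```

For instance, with ℓ = 3 equilibria there are 3 = 3^(3−2) graphs in each G(i), matching the number of labelled spanning trees oriented towards the root.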
L² control cost between equilibria. We now define, for any couple of points (ξ_1, ξ_2), the minimal L² cost to go from ξ_1 to ξ_2 within a finite time t, and then the minimal L² cost to go from ξ_1 to ξ_2 within any time (2.9). The next corollary follows immediately from Theorem 2.1 and Theorem 2.2.
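To give intuition for such control costs in a case where everything is explicit, consider the memoryless benchmark ẋ = −U′(x) + control, whose Freidlin–Wentzell running cost is ½|ẋ + U′(x)|²; there, the minimal cost to climb from ξ_1 to ξ_2 is 2(U(ξ_2) − U(ξ_1)), realized along the time-reversed gradient flow. This is the classical case, not the memory dynamics of the paper; a quick discretized check with U(x) = x²/2:

```python
import numpy as np

U = lambda x: 0.5 * x**2   # quadratic potential, minimum at 0
dU = lambda x: x

# Time-reversed gradient flow x' = +dU(x), started near the minimum:
# for this U the path is x(t) = x0 * exp(t).
x0, x1 = 1e-2, 1.0
t = np.linspace(0.0, np.log(x1 / x0), 200001)
x = x0 * np.exp(t)
xdot = x                                  # x' = dU(x) = x along this path
cost_density = 0.5 * (xdot + dU(x))**2    # Freidlin-Wentzell running cost
# trapezoidal rule for the action integral
cost = float(np.sum(0.5 * (cost_density[1:] + cost_density[:-1]) * np.diff(t)))
# cost is close to 2 * (U(x1) - U(x0)) = 0.9999
```

In the memory setting no such closed formula is available, which is why Section 5 works with upper and lower bounds on the quasi-potential instead.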
Corollary 1 Assume that (H_Hypo) holds and that either (H_Q+) or (H_Q−) holds. Then (ν_ε) satisfies a large deviation principle with speed ε^{−2} and good rate function W, where W(z*_i) is given by (2.9).
Case of a double-well non-convex potential. In the sequel, we are interested in the location of the minimum of W. More precisely, we expect this minimum to be located on the set of global minima of U. Using Equation (2.8), this point is clear when U is a strictly convex potential.
Regarding now the non-convex case, the situation is more involved. Thus, we only focus on the double-well one-dimensional case. Without loss of generality, we assume that U has two local minima denoted by x*_1 and x*_2 satisfying condition (2.10), where x* is the unique local maximum between x*_1 and x*_2. We obtain the following result:

Theorem 2.3 Assume the hypotheses of Corollary 1 and that U satisfies (2.10). Then (ν_ε)_{ε>0} weakly converges towards δ_{z*_1} as ε → 0.

In the next sections, we prove the above statements. Note that throughout the rest of the paper, C will stand for any non-explicit constant. Note also that, except in Section 5, we will prove all the results with λ = 1 for the sake of convenience (one can deduce similar results with small modifications for any λ > 0).
3 Large Deviation Principle for invariant measures (ν_ε)_{ε∈(0,1]}

This section describes the proof of Theorem 2.1, which contains two important parts. The first one concerns the exponential tightness of the invariant measures (ν_ε)_{ε∈(0,1]} and the second one is a functional equality for any good rate function associated with any (LD)-convergent subsequence. We first establish a LDP for (Z^ε)_{ε>0} on C(R_+, R^{2d}) (the space of continuous functions from R_+ to R^{2d}), and then we detail how one can derive the exponential tightness property of (ν_ε)_{ε∈(0,1]} using suitable Lyapunov functions for our dynamical system. Finally, we show that a functional equality such as (2.7) holds.

Large Deviation Principle for (Z ε ) ε>0
The next lemma establishes a LDP for the trajectories of ((Z_t^ε)_{t≥0})_{ε>0} within a finite time.
Then, (Z^{ε,z_ε})_{ε>0} satisfies a LDP on C(R_+, R^{2d}) (endowed with the topology of uniform convergence on compact sets) with speed ε^{−2}. The corresponding (good) rate function I_z is defined for all absolutely continuous (z(t))_{t≥0} = (x(t), y(t))_{t≥0}, where z_ϕ(z, ·) = z(·) means that z_ϕ(z, t) = z(t) for all t ≥ 0. In particular, for all t ≥ 0, (P_t^ε(z_ε, ·))_{ε>0} satisfies a LDP with speed ε^{−2}. The corresponding rate function I_t(z, ·) is defined for all z, z′ ∈ R^{2d} by

I_t(z, z′) = inf { I_z(z(·)), z(·) ∈ Z_t(z, z′) },

where Z_t(z, z′) denotes the set of absolutely continuous functions z(·) such that z(0) = z and z(t) = z′.

Remark 3.1 Note that such a result is quite classical when z_ε = z and when the coefficients are Lipschitz continuous (see e.g. Azencott (1980)). Here, we have to handle the possibly super-linear growth of the drift vector field b (and also the degeneracy of the diffusion).
Proof: We wish to apply Theorem 5.2.12 of Puhalskii (2001). To this end, we need to prove the following four points.

• Uniqueness for the maxingale problem: this step is an identification of the (potential) LD-limits of (Z^ε)_{ε>0}. More precisely, we need to prove that the idempotent probability π_z(·) := exp(−I_z(·)) is the unique solution to the maxingale problem (z, G). The fact that π_z solves the maxingale problem follows from Theorem 3.1 and Lemma 3.2 of (Puhalskii, 2004). Setting E(x, y) = U(x) + |y|²/2, note that Lemma 3.2 can be applied since ⟨∇E(x, y), b(x, y)⟩ ≤ 0 (see condition (3.6a) of (Puhalskii, 2004)). Furthermore, since b is locally Lipschitz continuous, for all ϕ ∈ H, the ordinary differential equation defining z_ϕ has a unique solution. Thus, uniqueness for the maxingale problem is a consequence of the second point of Lemma 2.6.17 of (Puhalskii, 2001) and of Theorem 3.1 of (Puhalskii, 2004).
• Continuity condition: some continuity conditions must be satisfied by the characteristics of the diffusion. In fact, since the diffusive component is constant, it is enough to focus on the drift component and to show the continuity, for all t ≥ 0, of the corresponding functional. Since b is Lipschitz continuous on every compact set of R^{2d}, this point is obvious.
• Local majoration condition: in this step, we have to check that for all M > 0, there exists an increasing continuous map dominating the drift on {z, ‖z‖_∞ ≤ M}, with ‖z‖_∞ = sup_{t≥0} |z(t)|. Since b is locally bounded, this point is true.

• Non-explosion condition (NE): the non-explosion condition holds if (i) π_z is upper-compact, and (ii) for all t ≥ 0 and for all a ∈ (0, 1], the corresponding sublevel set is compact.

Point (i): The property that π_z is upper-compact means that for all a ∈ (0, 1], the set K_a := {z, π_z(z) ≥ a} is compact (for the topology of uniform convergence on compact sets). For this, we use the Ascoli Theorem. We first show the boundedness property for the paths of K_a.
From the definition of π_z, we observe that for any z ∈ K_a, there exists a control ϕ ∈ H such that z = z_ϕ and such that (3.2) holds. Using the above-defined function E, one checks an energy estimate for all p > 0, and the Gronwall Lemma implies (3.4). Finally, Equation (3.2) combined with (3.4) yields the boundedness of the paths of K_a. Now, let us prove that K_a is equicontinuous: for all t > 0, u, v ∈ [0, t] with u ≤ v and z ∈ K_a, we know that, for a suitable constant C_{t,a,z}, the controlled trajectories of K_a are a priori bounded. The two conditions of the Ascoli Theorem being satisfied, the compactness of K_a follows.

Point (ii): We do not detail this item, which easily follows from the controls established in the proof of (i) (see (3.4)). Finally, the other conditions of Theorem 5.2.12 of Puhalskii (2001) being trivially satisfied, the lemma follows.

Exponential tightness (Proof of i) of Theorem 2.1)
In the next proposition, we investigate the exponential tightness of (ν_ε)_{ε∈(0,1]}. Our approach consists in showing sufficiently sharp estimates for the hitting time of a compact set by the process (Z_t^ε)_{t≥0}.
Proposition 3.1 Assume (H_Q+) or (H_Q−). Then there exists a compact set B of R^{2d} such that the first hitting time τ_ε of B, defined as τ_ε = inf{t > 0, Z_t^ε ∈ B}, satisfies the three properties below. As a consequence, the family of invariant distributions (ν_ε)_{ε∈(0,1]} is exponentially tight.

The conclusion of the above proposition follows directly from Lemma 7 of Puhalskii (2003). A fundamental step of the proof of Proposition 3.1 is the next lemma, which shows some mean-reverting properties of the process (with some constants that do not depend on ε). Its technical proof is postponed to the appendix. Note that this lemma uses a key Lyapunov function V which is rather non-standard, due to the kinetic form of the coupled process.
• Proof of (i): We use a Lyapunov method to bound the second moment of the hitting time τ_ε. Let p ∈ (0, 1). By the Itô formula, we have (3.11), where (M_t) is the associated local martingale. Since V is a positive function, the l.h.s. can be bounded from below. Note that in the above expression, the martingale (M_t/ε)_{t≥0} has been compensated by its stochastic bracket in order to use exponential martingale properties later on; the l.h.s. of (3.13) is then estimated accordingly. A localization of (M_t) combined with the Fatou Lemma yields a bound valid for all stopping times τ. The final step relies on the fact that there exist p ∈ (0, 1) and M_1 > 0 such that (3.14) holds. Let us prove this inequality under condition (H_Q+) or (H_Q−). First, since m ∈ (0, 1), one can check that there exists C > 0 such that (3.15) holds. As a consequence, and owing to the assumptions on ∇U, the corresponding limits can be controlled. From now on, assume (3.17). By Lemma 3.2, we then obtain a bound for all (x, y) ∈ R^{2d} and ε ∈ (0, 1], where p is defined in Lemma 3.2. Under (3.17), one checks that 2p − 1 < p, and then, uniformly in ε, the limit vanishes and (3.14) follows. Next, we consider (3.6) with τ being τ_ε = inf{t ≥ 0, Z_t^ε ∈ B(0, M_1)}, where M_1 is such that (3.14) holds. Computing the integral, using the Fatou Lemma and applying the Jensen Inequality to x ↦ x^{1/ε²}, we obtain a bound valid for every (x, y) ∈ R^{2d} and all ε ∈ (0, 1]. The first statement follows using that V^p is locally bounded.
• Proof of (ii): Thanks to (3.15), we have a mean-reverting bound for all p > 0 and for |(x, y)| large enough. Multiplying by δ/ε², this inequality suggests computing exponential moments for appropriate p and τ. Applying the Itô formula to the function ψ_ε(x, y) = exp(δV^p(x, y)/ε²), we get, for all t, an expression where (M_t)_{t≥0} is a local martingale that we do not need to make explicit. Let us choose p ∈ (0, 1) such that inequality (3.10) of Lemma 3.2 holds. Since V(x, y) → +∞ as |(x, y)| → +∞ and since p > 0, we deduce that, for all positive δ, there exists M_2 > 0 such that (3.18) holds. Without loss of generality, we can assume that M_2 is such that (3.18) is valid for all (x, y) ∈ B(0, M_2)^c. It follows that the corresponding bound holds for all ε ∈ (0, 1], t ≥ 0 and (x, y) ∈ B(0, M_2)^c.
From the above inequality, we finally deduce (3.7).
• Proof of (iii): With the notations of the two previous parts of the proof, the properties (3.6) and (3.7) hold, and in this last part of the proof we set B = B(0, M) for a suitable M. First, remark that it is enough to show that the result holds with τ_ε ∧ 1 instead of τ_ε. Now, let K be a compact set of R^{2d} such that B ∩ K = ∅, and let (ε_n, z_n)_{n≥1} be a sequence such that ε_n → 0 and z_n ∈ K for all n ≥ 1. Up to an extraction, we can assume that (z_n)_{n≥1} is a convergent sequence; let z denote its limit. Lemma 3.1 implies that (L((Z^{ε_n,z_n})_{t∈[0,1]}))_{n≥1} is exponentially tight and hence tight. Using a second extraction, we can assume that (Z^{ε_n,z_n})_{n≥1} converges in distribution to Z^{(∞)}. Furthermore, since ε_n → 0, the limit process Z^{(∞)} is a.s. a solution of the o.d.e. ż = b(z) starting at z. The function b being locally Lipschitz continuous, uniqueness holds for the solutions of this o.d.e., and we can conclude that (Z^{ε_n,z_n})_{n≥1} converges in distribution to z(z, ·) (where z(z, ·) denotes the unique solution of ż = b(z) starting from z). The function z(z, ·) being deterministic, the convergence holds in fact in probability, and at the price of a last extraction, we can assume without loss of generality that (Z^{ε_n,z_n})_{n≥1} converges a.s. to z(z, ·). In particular, setting δ := d(K, B) (δ > 0), there exists n_0 ∈ N such that the conclusion holds for all n ≥ n_0, and the Fatou Lemma allows us to conclude. Finally, since t ↦ z(z, t) is a continuous function and since d(K, B(0, M + δ/2)) > 0, the stopping time τ_{z,δ/2} is clearly positive. The result follows, and this finishes the proof of Proposition 3.1.

Hamilton-Jacobi equation (Proof of ii) of Theorem 2.1)
This point is a consequence of the finite-time large deviation principle for (Z^ε)_{ε>0} (Lemma 3.1) and of the exponential tightness of (ν_ε)_{ε>0} (Proposition 3.1). This is the purpose of the next proposition, which is an adaptation of Corollary 1 of Puhalskii (2003).
Then, (ν_ε)_{ε>0} admits a (LD)-convergent subsequence, and any such subsequence (ν_{ε_k})_{k≥0} satisfies (3.20). With the terminology of Puhalskii (2003), Equation (3.20) means that the set function W̄ defined for all Γ ∈ B(R^{2d}) by W̄(Γ) = sup_{y∈Γ} exp(−W(y)) is an invariant deviability for (P_t^ε(z, ·))_{t≥0, z∈R^{2d}}. In Corollary 1 of Puhalskii (2003), this result is stated under a uniqueness assumption on the invariant deviabilities. The above proposition is in fact an extension of this corollary to the case where uniqueness is not fulfilled. We refer to Appendix A for details.
Owing to Proposition 3.1 and Lemma 3.1, Proposition 3.2 can be applied with I_t(z, ·) defined in (3.1). The rate function W is a solution of (3.20) and Equation (2.7) is satisfied. Thus, the next result proves assertion ii) of Theorem 2.1.
Proposition 3.3 Assume that either (H_Q+) or (H_Q−) is satisfied. Then any (good) rate function W associated with any (LD)-convergent subsequence (ε_n)_{n≥1} satisfies (2.7) for all t ≥ 0.

Proof: We know that W satisfies (3.20). Remarking that g : [0, t] → R^{2d} defined by g(s) = z_ϕ(t − s) is a controlled trajectory associated with −b and −ϕ, we deduce the corresponding identity for all t ≥ 0. The result follows from the change of variable φ = −ϕ.

Infinite horizon Hamilton-Jacobi equation
The aim of this part is to show that, when there is a finite number of critical points, we can "replace t by +∞" in (2.7). The proof is an adaptation of Theorem 4 of Biswas and Borkar (2009). The main novelty of our proof is the second step. Indeed, using arguments based on asymptotic pseudo-trajectories and Lyapunov functions, we prove that the optimal controlled trajectory is attracted by a critical point of the drift vector field.
Proof of (iii) of Theorem 2.1: The proof is divided into three parts. We first build an optimal path t ↦ z_ψ(z, t) for the Hamilton-Jacobi equation of interest. Then, we focus in the second step on its long-time behaviour and obtain that z_ψ(z, t) converges to a point z* which belongs to {z, b(z) = 0}.
In order to conclude, we need to prove the continuity of W at each point of {z, b(z) = 0}.This is the purpose of the third step.
• Step 1: We show that we can build a function ψ ∈ H such that, for all z ∈ R^{2d}, the couple (z_ψ(z, ·), ψ) satisfies (3.21). First, let T > 0 and consider a minimizing sequence. Since W is non-negative, the corresponding controls are bounded in L²([0, T], R^{2d}) endowed with the weak topology. Since lim_{|z|→+∞} E(z) = +∞, Equation (3.22) also follows in this case. Now, since b is locally Lipschitz, b is Lipschitz continuous on B(0, M), and a classical argument based on the Ascoli Theorem shows that there exists a subsequence converging to a couple (ẑ_T, ψ̂_T) which belongs to C([0, T], R^{2d}) × L²_w([0, T], R^{2d}). Using that b is a continuous function, one checks that ẑ_T(t) = z_{ψ̂_T}(z, t) for all t ∈ [0, T], and the couple (ẑ_T, ψ̂_T) satisfies (3.21) (for a fixed T). As a consequence, we can build (z_ψ(z, ·), ψ) by concatenation, which satisfies (3.21) for all t ≥ 0.

• Step 2: Dropping the initial condition z, we show that (z_ψ(t + ·))_{t≥0} converges as t → +∞ to a stationary solution of ż = −b(z). First, as in (3.23), (W(z_ψ(t)))_{t≥0} is a non-increasing, and thus bounded, function. Since W is a good rate function, for all M > 0, W^{−1}([0, M]) is a compact subset of R^{2d}. This means that (z_ψ(t))_{t≥0} is bounded. From (3.24), we deduce that (∫_t^{t+T} |ψ̇(s)|² ds)_{t≥0} is also bounded. Thus, as in Step 1, owing to the previous statements and to the fact that b is locally Lipschitz continuous, we deduce from the Ascoli Theorem that (z_ψ(t + ·))_{t≥0} is relatively compact (for the topology of uniform convergence on compact sets).
We now denote by z_ψ^∞(·) the limit of a convergent subsequence. Let us show that (z_ψ^∞(t))_{t≥0} is a solution of ż = −b(z). First, since (W(z_ψ(t)))_{t≥0} is non-increasing (and non-negative as a rate function), we again deduce from (3.24) that the residual cost vanishes. As a consequence, using that for all s ≥ 0 the map z ↦ z(s) − z(0) is continuous, the limit solves the o.d.e. It remains to show that (z_ψ^∞(t))_{t≥0} is stationary, i.e. that every limit point of (z_ψ(t))_{t≥0} belongs to {z ∈ R^{2d}, b(z) = 0}. Denote by (Φ_t(z))_{t,z} the flow associated with the o.d.e. ż = −b(z). Again, owing to the control obtained for all T > 0, (z_ψ(t))_{t≥0} is an asymptotic pseudo-trajectory for Φ (see (Benaim, 1996)). As a consequence, by Proposition 5.3 and Theorem 5.7 of Benaim (1996), the set K of limit points of (z_ψ(t))_{t≥0} is a (compact) invariant set for Φ such that Φ restricted to K has no proper attractor. This means that there is no strict invariant subset A of K such that d(Φ_t(z), A) → 0 for all z ∈ K. To this end, for a positive ρ, we consider L : R^{2d} → R defined as a perturbation of the energy E. If z is a solution of ż = −b(z), since K is a bounded invariant set and D²U is locally bounded, we can choose ρ small enough and α_ρ > 0 such that L is non-decreasing along the flow on K. For every starting point z ∈ K, the function t ↦ L(z(t)) is then non-decreasing and thus converges to some ℓ_∞ ∈ R. Since (z(t))_{t≥0} is bounded, an argument similar to the one developed in Step 1, combined with the Ascoli Theorem, yields that (z(t + ·))_{t≥0} is relatively compact. If (z(t_n + ·))_{n≥0} denotes a subsequence of (z(t + ·))_{t≥0}, we can assume (at the price of a further extraction) that (z(t_n + ·))_{n≥0} converges to z^∞(·). We then necessarily have L(z^∞(t)) = ℓ_∞ for all t ≥ 0, and by (3.25) we deduce that y^∞(t) = ∇U(x^∞(t)) = 0. This means that z^∞(·)
is a stationary solution and that every limit point of (z(t))_{t≥0} is an equilibrium point of the o.d.e. Thus, we can conclude that every limit point of (z_ψ(t))_{t≥0} belongs to {z, b(z) = 0}. Finally, since the set of limit points of (z_ψ(t))_{t≥0} is compact and connected, and {x, ∇U(x) = 0} is finite, it follows that z_ψ(t) → z* = (x*, 0) as t → +∞, where x* ∈ {x, ∇U(x) = 0}. Then, by (3.21), we will deduce the announced result if we prove that W is continuous at z*. This is the purpose of the next step.
• Step 3: We prove that for all z⋆ ∈ {z, b(z) = 0}, i.e. for all z⋆ = (x⋆, 0) with x⋆ ∈ {x, ∇U(x) = 0}, W is continuous at z⋆. Since D²U(x⋆) is invertible, we deduce from Lemma 4.1 (available in the appendix) that the dynamical system is locally controllable around z⋆, i.e. for all T > 0 and all ε > 0, there exists η > 0 such that for all z ∈ B(z⋆, η), I_T(z, z⋆) ≤ ε and I_T(z⋆, z) ≤ ε. Now, the definition of W implies that W(z⋆) ≤ W(z) + ε and W(z) ≤ W(z⋆) + ε. The continuity of W follows, which ends the proof of (iii) in Theorem 2.1 by letting t go to +∞ in (3.21).

Freidlin and Wentzell theory
In this section, we establish some sharp estimates of the behaviour of (ν_ε) as ε → 0. To this end, we follow the roadmap of Freidlin and Wentzell (1979). Our goal is twofold: first, we aim to obtain a uniqueness property for the rate function W defined in Theorem 2.1 and thus to derive a large deviation principle for (ν_ε). Second, we want to obtain a more explicit formulation of W in order to characterize, at least in some particular cases, the limit behaviour of (ν_ε) for some non-convex potentials U. In the rest of the paper, we assume that the potential U satisfies Assumption (H_D) defined in Section 2.4.1; the set of critical points of U is thus finite and we set {x ∈ R^d, ∇U(x) = 0} = {x⋆_1, . . ., x⋆_ℓ}. First, we classify the critical points, that is, we link the critical points of the vector field b with those of U and determine their nature (stable or unstable). Next, with respect to these critical points, we construct the so-called skeleton Markov chain associated with the process (X^ε_t, Y^ε_t). With all these ingredients, we finally derive the LDP for (ν_ε).
Proposition 4.1 Assume that D²U(x⋆_i) is invertible for all i ∈ {1, . . ., ℓ}. If x⋆_i is a minimum of U, then z⋆_i is a stable equilibrium of the deterministic dynamical system; otherwise, z⋆_i is an unstable equilibrium.
Proof: We denote by I the set of minima of U and by J the set of the other critical points. Let us compute the Jacobian matrix of the vector field b. Simple linear algebra then yields the characterization of the spectrum of the linearized vector field near each equilibrium z⋆_i. If x⋆_i ∈ J, then D²U(x⋆_i) has some negative eigenvalue µ. In that case, Db(z⋆_i) has some positive eigenvalue (since √(1/4 − µ) > 1/2 when µ < 0), and z⋆_i is thus an unstable equilibrium of the deterministic dynamical system. This ends the proof of the proposition.
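As a hedged numerical illustration of this spectral dichotomy (not the paper's exact computation), assume the one-dimensional linearization ż = Az with A = [[0, −1], [µ, −1]], where µ stands for U′′(x⋆_i); its eigenvalues are (−1 ± √(1 − 4µ))/2, so a negative µ produces a positive eigenvalue:

```python
import numpy as np

def linearized_eigs(mu):
    # Assumed 1-D linearization of the memory-gradient flow
    # x' = -y, y' = U'(x) - y around an equilibrium, with mu = U''(x*).
    A = np.array([[0.0, -1.0],
                  [mu,  -1.0]])
    return np.linalg.eigvals(A)

# At a local minimum (mu > 0): both eigenvalues have negative real part.
assert all(ev.real < 0 for ev in linearized_eigs(1.0))
# At a saddle or maximum (mu < 0): one positive eigenvalue, hence instability.
assert any(ev.real > 0 for ev in linearized_eigs(-1.0))
```

The characteristic polynomial is λ² + λ + µ, so the stability boundary sits exactly at µ = 0, matching the invertibility assumption on D²U(x⋆_i).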

Skeleton representation
The Freidlin and Wentzell (1979) description of the invariant measure ν_ε of the continuous-time Markov process strongly depends on its representation via the invariant measure of a specific skeleton Markov chain. This formula, due to Khas'minskiĭ (see Khas'minskiĭ (1980), Chapter 4) in the uniformly elliptic case, remains true in our framework even though the original process is hypoelliptic and defined on a non-compact manifold. This is the purpose of Proposition 4.2 below, but before a precise statement, we first need to define the skeleton Markov chain associated with our process.
Let ρ_0 be half of the minimum distance between two critical points: ρ_0 = (1/2) min_{i≠j} |z⋆_i − z⋆_j|. Now, let 0 < ρ_1 < ρ_0 and set g_i = B(z⋆_i, ρ_1). Each boundary ∂g_i is smooth, as well as that of the set g defined by g = ∪_i g_i. (4.2) Note that by construction, g_i ∩ g_j = ∅ if i ≠ j. Finally, we denote by Γ the complement of the ρ_0-neighbourhood of the set of critical points z⋆_i. We provide in Figure 1 a short summary of the construction of the sets (g_i)_i, g, Γ, as well as the positions of the critical points z⋆_i. It also provides an example of a trajectory (Z^{ε,z}_t)_{t≥0} (K will be defined in the sequel).
Figure 1: Graphical representation of the neighbourhoods g_i, the process (Z^{ε,z}_t)_{t≥0}, the skeleton chain and the compact sets K and K_1.

Now, we consider any initialisation on the boundary of the neighbourhoods of the critical points, z ∈ ∂g (in our figure, Z^{ε,z}_0 = z ∈ ∂g_1), and we define (Z̃_n)_{n∈N}, the skeleton Markov chain living on ∂g, through the classical construction of hitting and exit times of the neighbourhoods defined above. First, we set τ_0(∂g) = 0 and then follow the natural recursion of successive exit times from the neighbourhoods and return times to ∂g. We will show in Proposition 4.2 that for all n ≥ 0, τ_n(∂g) < +∞ a.s. The skeleton is then defined for all n ∈ N by Z̃_n = Z^{ε,z}_{τ_n(∂g)}. Note that (Z̃_n)_{n≥0} takes values in ∂g and that (Z̃_n)_{n≥0} is a Markov chain (this is actually a consequence of the strong Markov property). The set ∂g being compact, an invariant distribution of (Z̃_n)_{n∈N} exists; we denote it by μ̃^{∂g}_ε. The next proposition states that ν_ε may be related to μ̃^{∂g}_ε.
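To make the construction concrete, here is a minimal numerical sketch. It is an assumption-laden toy: it uses an overdamped, non-memory diffusion dX = −U′(X)dt + ε dB in a double well rather than the paper's memory system, and it records only the label of the visited neighbourhood rather than the exact point of ∂g:

```python
import numpy as np

def skeleton_chain(seed=0, eps=0.7, dt=1e-3, n_steps=200_000):
    # Toy analogue of the skeleton construction: record which neighbourhood
    # g_i of a critical point is hit between successive excursions.
    rng = np.random.default_rng(seed)
    U_prime = lambda x: 4 * x * (x**2 - 1)     # critical points at -1, 0, 1
    crit = np.array([-1.0, 0.0, 1.0])
    rho1 = 0.2                                 # radius of the g_i's
    x, outside, chain = -1.0 + rho1, True, []  # start on the boundary of g_1
    noise = eps * np.sqrt(dt) * rng.standard_normal(n_steps)
    for w in noise:
        x += -U_prime(x) * dt + w              # Euler-Maruyama step
        d = np.abs(x - crit)
        if outside and d.min() <= rho1:        # hitting time of some g_i
            chain.append(int(d.argmin()))
            outside = False
        elif not outside and d.min() > 2 * rho1:   # exit: wait for next hit
            outside = True
    return chain

chain = skeleton_chain()
print(chain[:10])
```

With a moderate noise level the chain visits both wells, which is the mechanism exploited below to compare transition costs between equilibria.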
Proposition 4.2 (i) With the notations introduced before, we have sup_{z∈∂g} E^ε_z[τ_1(∂g)] < +∞. (ii) For any Borel set A ∈ B(R^d × R^d) and any ρ_1 ∈ (0, ρ_0), the measure µ^{∂g}_ε is invariant for the process (Z^ε_t)_{t≥0}. Hence, µ^{∂g}_ε is a finite measure proportional to ν_ε.
Proof: We first prove (i). Owing to Proposition 3.1, we first check that one can find a compact set K such that g ⊆ K and such that for every compact set K_1 with K ⊆ K_1, the first hitting time τ(K) of K satisfies the required uniform bound. Then, the idea of the proof is to extend to our hypoelliptic context the proofs of Lemmas 4.1 and 4.3 of Khas'minskiĭ (1980), given there under elliptic assumptions. Let z ∈ ∂g, and define the intermediate hitting times recursively for all n ≥ 2; by construction, this yields a.s. a decomposition of τ_1(∂g). Then, by the strong Markov property and (4.6), it follows from a careful adaptation of the proofs of Lemmas 4.1 and 4.3 of Khas'minskiĭ (1980) that sup_{z∈∂g} E^ε_z[τ_1(∂g)] < ∞ if the two following points hold for all ε > 0:
• sup_{z∈K} P(τ(∂K_1) ≤ T) < 1 for some T > 0;
• sup_{z∈K\g} p_ε(z) < 1, where p_ε(z) := P(Z^{ε,z}_{τ(∂g∪∂K_1)} ∈ ∂K_1).
As concerns the first point, it follows from Remark 5.2 of Stroock and Varadhan (1972) that it is enough to check that there exist T > 0 and a control (ϕ(t))_{t∈[0,T]} with the required property. Indeed, in this case, using the support theorem of Stroock and Varadhan (1970), we obtain that sup_{z∈K} P(τ(∂K_1) ≤ T) < 1, and the first point follows from the strong Markov property (see Remark 5.2 of Stroock and Varadhan (1972) for details). Now, we build (ϕ(t))_{t≥0} as follows. Setting φ̇ = y + I_d, we obtain a controlled trajectory z_ϕ(z, ·), and it is clear from its design that for all M > 0, there exists T > 0 such that for all z ∈ K, |x_ϕ(T)| > M. The first point easily follows.
It is well-known (see for instance Stroock and Varadhan (1972)) that for all ε > 0, p_ε solves the associated boundary value problem. Thus, since sup_{z∈K_1\g} E[τ(∂g ∪ ∂K_1)] < +∞ and since h defined by h(x) = 1 on ∂K_1 and h(x) = 0 on ∂g is obviously continuous on ∂g ∪ ∂K_1, we can apply Theorem 9.1 of Stroock and Varadhan (1972) with k = f = 0 to obtain that z → p_ε(z) is a continuous map. Furthermore, for all z ∈ K\g, we can build a controlled trajectory starting at any z ∈ ∂K which hits ∂g before ∂K_1. Taking for instance φ = 0, we check that (E(x_0(t), y_0(t)))_{t≥0} is non-increasing (with E(x, y) = U(x) + |y|²/2) and that the accumulation points of (x_0(t), y_0(t)) lie in {z, b(z) = 0}. Thus, taking K_1 large enough so that sup_{(x,y)∈K} E(x, y) < inf_{(x,y)∈K_1^c} E(x, y) provides an admissible control for all z ∈ K. Finally, using again the support theorem of Stroock and Varadhan (1970) implies that for each z ∈ ∂K, p_ε(z) < 1. The second point then follows from the continuity of z → p_ε(z). This ends the proof of (i).
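The energy-decay step above can be checked numerically, assuming (as the proof suggests, with φ = 0) that the uncontrolled dynamics takes the form ẋ = −y, ẏ = U′(x) − y, in which case dE/dt = −|y|² ≤ 0 for E(x, y) = U(x) + |y|²/2. The potential below is an illustrative choice:

```python
import numpy as np

def energy_along_flow(x0=1.5, y0=0.0, dt=1e-3, n=5000):
    # Uncontrolled flow (phi = 0), as suggested in the proof:
    #   x' = -y,  y' = U'(x) - y,  E(x, y) = U(x) + |y|^2 / 2.
    U = lambda x: 0.25 * x**4          # coercive potential (assumption)
    dU = lambda x: x**3
    x, y = x0, y0
    energies = [U(x) + 0.5 * y**2]
    for _ in range(n):                 # explicit Euler integration
        x, y = x - y * dt, y + (dU(x) - y) * dt
        energies.append(U(x) + 0.5 * y**2)
    return energies

E = energy_along_flow()
# Since dE/dt = -|y|^2 <= 0, the energy should decrease along the flow.
assert E[-1] < E[0]
```

This monotone energy is what allows the choice of K_1 with sup_K E < inf_{K_1^c} E in the proof: a zero-control trajectory started in K can never cross the higher energy level of ∂K_1.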
Regarding now the second point (ii): as argued in the paragraph before the statement of this proposition, (Z̃_n)_{n∈N} possesses a unique invariant measure μ̃^{∂g}_ε. The fact that µ^{∂g}_ε is invariant for (Z^ε_t)_{t≥0} is standard and relies on the strong Markov property of the process (see e.g. Has'minskii (1960)).
Remark 4.1 One could also have used a uniqueness argument for viscosity solutions to obtain the continuity of z → p_ε(z), using the maximum principle for A_ε (as already used by Stroock and Varadhan (1972)). One may refer to Barles (1994) for further details.

Transitions of the skeleton Markov chain
This paragraph is devoted to the estimates obtained through the Freidlin and Wentzell theory for the skeleton Markov chain defined above. These estimates, together with Proposition 4.2, are then used to obtain the asymptotic behaviour of ν_ε. In view of Theorem 2.1, we know that there exists a subsequence (ε_n)_{n∈N} such that (ν_{ε_n}) satisfies a large deviation principle of rate ε²_n with good rate function W. In the sequel we consider this extracted subsequence but keep the notation ε. Hence, ε → 0 means ε_n → 0 as n → +∞ along the appropriate subsequence for which the large deviation principle holds. In the same way, "ε small enough" corresponds to n large enough.

Controllability and exit times estimates
In order to obtain some estimates related to the transitions of the skeleton Markov chain, the first step is to control the exit times of balls B(z⋆_i, δ), where z⋆_i denotes a critical point of ż = b(z) (similarly to Section 1, Chapter 6 of Freidlin and Wentzell (1979)). In our hypoelliptic framework, such controls of the exit times strongly rely on controllability around the equilibria. We have the following property: for all δ > 0, there exists ρ(δ) > 0 small enough such that local controllability holds on B(z⋆_i, ρ(δ)). Indeed, with B = (I_d 0; 0 0), the linearized system (at z⋆_i) associated with the controlled system ż = b(z) + (φ, 0) (4.9) can be written ż = Az + Bu, where u = (φ̃, ψ)^t with ψ ∈ H(R^d). Using that D²U(x⋆_i) is invertible, one easily checks that Span(Bu, ABu, u ∈ R^{2d}) = R^{2d}. As a consequence, the Kalman condition (see e.g. Coron (2007)) is satisfied, and it follows from Theorems 1.16 and 3.8 of Coron (2007) that the system (4.9) is locally exactly controllable at z⋆_i. The lemma is then proved. We are now able to obtain the following estimate.
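As an aside before the exit-time estimate, the Kalman rank condition invoked here can be illustrated numerically. For d = 1, writing µ for U′′(x⋆_i) and assuming the linearization A = [[0, −1], [µ, −1]] (consistent with a memory-gradient flow ẋ = −y, ẏ = U′(x) − y, with the control acting on the x-component), the rank of [B, AB] is 2 exactly when µ ≠ 0:

```python
import numpy as np

def kalman_rank(mu):
    # Assumed linearization at an equilibrium (d = 1), mu = U''(x*):
    # z' = Az + Bu, the control entering through the x-component only.
    A = np.array([[0.0, -1.0],
                  [mu,  -1.0]])
    B = np.array([[1.0, 0.0],
                  [0.0, 0.0]])
    K = np.hstack([B, A @ B])          # Kalman matrix [B, AB]
    return np.linalg.matrix_rank(K)

assert kalman_rank(2.0) == 2    # U''(x*) invertible -> controllable
assert kalman_rank(-1.0) == 2
assert kalman_rank(0.0) == 1    # degenerate Hessian -> Kalman condition fails
```

The failure at µ = 0 is exactly why the invertibility of D²U(x⋆_i) is assumed throughout.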
Lemma 4.2 Assume that (H_D) holds and that either (H_{Q+}) or (H_{Q−}) is satisfied. Then, for all γ > 0, there exist δ > 0 and ε_0 small enough such that the following exit-time bound holds.

Proof: Let i ∈ {1, . . ., ℓ} and fix any γ > 0. By Lemma 4.1 applied with T = 1, one can find ρ > 0 such that the local controllability estimate holds; we fix a = z and choose b accordingly. It is then possible to follow the proof of Lemma 1.7, Chapter 6 of Freidlin and Wentzell (1979). Second, using that G is a compact set, there exist a convergent sequence (z_k) in G and a sequence (ε_k) with ε_k → 0 along which the bound is asymptotically saturated. Now, owing to Lemma 3.1 and to (4.10), we can pass to the limit, where z_∞ := lim_{k→+∞} z_k. As a consequence, there exists ε_0 > 0 such that the bound holds for all ε ∈ (0, ε_0] and all z ∈ G. Following the same kind of argument, using again the key Lemma 4.1 and the finite-time large deviation principle, we also obtain that Lemma 1.8, Chapter 6 of Freidlin and Wentzell (1979) still holds. In our context, this leads to the following lemma.
Lemma 4.3 Assume that (H_D) holds and that either (H_{Q+}) or (H_{Q−}) is satisfied. For any i ∈ {1, . . ., ℓ} and any equilibrium z⋆_i of (2.3), define for ρ > 0 the neighbourhood G := B(z⋆_i, ρ) of z⋆_i. Then, for any γ > 0, there exists δ ∈ (0, ρ] such that, if we define g = B(z⋆_i, δ), we have for ε small enough:

Transitions of the Markov chain skeleton
By Proposition 4.2, the idea is now to deduce the behaviour of ν_ε from the control of the transitions of the skeleton chain (Z̃_n)_{n∈N}. We recall that for any t > 0, I_t(ξ_1, ξ_2) denotes the L²-minimal cost to go from ξ_1 to ξ_2 in the finite time t, and that I(ξ_1, ξ_2) = inf_{t>0} I_t(ξ_1, ξ_2). In the sequel, we will also need Ĩ(z⋆_i, z⋆_j), defined for all (i, j) ∈ {1, . . ., ℓ}² as the minimal cost to join z⋆_j from z⋆_i while avoiding the other equilibria of (2.3). In the following proposition, we prove that Ĩ(z⋆_i, z⋆_j) is finite.
In the proof, we assume that i ≠ j. The idea is to build a controlled trajectory starting at z⋆_i and ending at z⋆_j (in a finite time) that avoids the neighbourhoods ∪_{k≠i,j} g_k of the other equilibria. We first assume that d > 1. In this case, for any fixed t_0 > 0 and any ρ_1-neighbourhoods g_k of the z⋆_k, one can find a smooth trajectory (x_0(t))_{t≥0} satisfying x_0(0) = x⋆_i and x_0(t_0) = x⋆_j which avoids ∪_{k≠i,j} g_k. Then, denote by (y_0(t))_{t≥0} a solution of ẏ_0(t) = ∇U(x_0(t)) − y_0(t) with initial condition y_0(0) = 0, and let ϕ_0 ∈ H satisfy φ̇_0(t) = ẋ_0(t) + y_0(t). We thus obtain a controlled trajectory z_{ϕ_0}(z⋆_i, ·) which satisfies z_{ϕ_0}(z⋆_i, t) = (x_0(t), y_0(t)) for all t ∈ [0, t_0]. It remains to join (x⋆_j, 0) from (x⋆_j, y_0(t_0)) without hitting ∪_{k≠i,j} g_k. Let (x_1(t), y_1(t))_{t≥t_0} be defined for all t ≥ t_0 by x_1(t) = x⋆_j and y_1(t) = y_0(t_0) e^{t_0−t} (so that y_1 is a solution of ẏ_1 = −y_1 with y_1(t_0) = y_0(t_0)). Once again, (x_1(t), y_1(t))_{t≥t_0} can be viewed as a controlled trajectory. As a consequence, there exists T such that z_{ϕ_1}((x⋆_j, y_0(t_0)), T) ∈ g_j. Hence, one can find a controlled trajectory which starts from z⋆_i, ends in any sufficiently small neighbourhood of z⋆_j in a finite time, and avoids the ρ_1-neighbourhoods g_k, k ≠ i, j. It remains to use Lemma 4.1 to obtain a controlled trajectory starting at z_{ϕ_1}((x⋆_j, y_0(t_0)), T) and ending at z⋆_j within a finite time. The global controlled trajectory initialized at z⋆_i ends at z⋆_j with a finite L² control cost. The result then follows when d > 1.
Consider now the case d = 1 and let x⋆_i, x⋆_j be two critical points of U. Without loss of generality, one may suppose that x⋆_i < x⋆_j. From (H_D), the number of critical points lying in (x⋆_i, x⋆_j) is finite (denoted by p): x⋆_i < x⋆_{i_1} < · · · < x⋆_{i_p} < x⋆_j. Now, we consider a path which joins x⋆_i to x⋆_j, parametrised by a map α with α(0) = 0 and α(T) = 1 for a sufficiently large T which will be specified later. Of course, y_α(t) is then defined as
y_α(t) = ∫_0^t e^{s−t} U′(x_α(s)) ds. (4.11)
For the sake of simplicity, we consider only increasing maps α. If p = 0, we know that (x_α(t), y_α(t)) avoids ∪_{k≠i,j} (x⋆_k, 0), and then Ĩ(z⋆_i, z⋆_j) < +∞, which proves the proposition. If p > 0, there exist t_1, . . ., t_p such that x_α(t_k) = x⋆_{i_k}, and we shall prove that one can find α such that y_α(t_k) ≠ 0. Since we consider only increasing paths, we first show that one can find a monotone α such that y_α(t_1) ≠ 0. Let α be any C¹ increasing parametrisation defined on [0, t_1]. We know that U′ does not vanish on (x⋆_i, x⋆_{i_1}), and from equation (4.11), y_α(t_1) ≠ 0. Suppose without loss of generality that y_α(t_1) < 0; we then continue the parametrisation α from t_1 to t̃_1 such that x(t̃_1) = ξ_1, and let α remain constant on [t̃_1, t̃_1 + δt_1]. Expanding the integral that defines y_α (see Equation (4.11)) over [0, t_1], [t_1, t̃_1] and [t̃_1, t̃_1 + δt_1], a simple computation shows that one can find a sufficiently large δt_1 such that y_α(t̃_1 + δt_1) > 0, since U′(ξ_1) > 0. We then continue the parametrisation α until t_2, the time at which x⋆_{i_2} is reached. By construction, y_α(t_2) > 0. Now, one can repeat the same argument by induction to find α such that y_α(t_k) ≠ 0 for all k ≤ p. Thus, at time t_{p+1}, x_α(t_{p+1}) = x⋆_j. It remains now to join z⋆_j = (x⋆_j, 0) without hitting ∪_{k≠i,j} g_k, and this can be achieved exactly as in dimension d > 1.
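As a sanity check of the variation-of-constants formula (4.11), which expresses the solution of ẏ = U′(x(t)) − y with y(0) = 0 as y(t) = ∫₀ᵗ e^{s−t} U′(x(s)) ds, one can compare the integral against a direct ODE integration (the choices of U′ and of the path x(·) below are illustrative assumptions):

```python
import numpy as np

dU = lambda x: np.sin(x)              # stands in for U'(x); illustrative
x_path = lambda t: t**2               # an arbitrary smooth path t -> x(t)

def y_integral(t, n=20_000):
    # Closed form: y(t) = int_0^t e^{s-t} U'(x(s)) ds  (trapezoid rule)
    s = np.linspace(0.0, t, n)
    w = np.exp(s - t) * dU(x_path(s))
    return float(np.sum((w[1:] + w[:-1]) * np.diff(s)) / 2.0)

def y_ode(t, n=20_000):
    # Direct integration of y' = U'(x(t)) - y, y(0) = 0 (explicit Euler)
    dt, y = t / n, 0.0
    for k in range(n):
        y += (dU(x_path(k * dt)) - y) * dt
    return y

assert abs(y_integral(2.0) - y_ode(2.0)) < 1e-3
```

The same integral structure is what the proof expands over [0, t_1], [t_1, t̃_1] and [t̃_1, t̃_1 + δt_1] to force y_α away from 0 at the intermediate critical points.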
It is now possible to estimate the invariant measure μ̃^{∂g}_ε of the skeleton chain. The key estimate on the transition probability of (Z̃_n)_{n∈N}, denoted P_ε(z, ·), is as follows.
Proposition 4.4 For any γ > 0, there exist sufficiently small ρ_0 and ρ_1 satisfying 0 < ρ_1 < ρ_0 such that, with the definitions (4.2) and (4.4), the stated transition bounds hold for ε small enough. The proof is a simple adaptation of the proofs of Lemmas 2.1 and 2.2, Chapter 6 of Freidlin and Wentzell (1979), in view of our three Lemmas 4.1, 4.2, 4.3 and of Proposition 4.3.

{i}-Graphs and invariant measure estimation
We recall that {i}-graphs for Markov chains over any finite set {z⋆_1, . . ., z⋆_ℓ} are defined in paragraph 2.4.2, and that the set of all possible {i}-graphs is denoted by G(i). According to this definition, we can set W(z⋆_i) as the minimum, over all graphs g ∈ G(i), of the sum of the costs Ĩ along the edges of g; and, as pointed out in Lemma 4.1 of Freidlin and Wentzell (1979), one can check the corresponding characterization of its minima. We are now able to obtain the main result of this paragraph. From the skeleton representation (Proposition 4.2) and from the estimates given by Lemma 4.3 and Proposition 4.4, one may describe the asymptotic behaviour of ν_ε as ε → 0. The result is as follows.
Theorem 4.1 For any γ > 0, there exists ρ_1 satisfying 0 < ρ_1 < ρ_0 such that, if g_j = B(z⋆_j, ρ_1), the stated estimates hold for all i ∈ {1, . . ., ℓ}; likewise, we get their counterparts in terms of W. The proof of this theorem is straightforward from the previous results: as ε → 0, the invariant measure ν_ε only weights small neighbourhoods of the global minima of W. Such global minima are appropriately described using the quasi-potential I and the function W obtained through the {i}-graph structures.
5 Lower and upper bounds on the rate function for a double-well landscape in R

This last part is devoted to the proof of Theorem 2.3. Here, we concentrate on a one-dimensional potential U with a double-well profile and on the memory gradient system with a fixed memory parameter λ. In this case, we obtain the precise behaviour of the measure ν_ε as ε → 0. From the Freidlin and Wentzell estimates, we know that ν_ε concentrates on the minima of W (this set also consists of the minima of W). Here we want to show that ν_ε concentrates on the global minimum of U. To this end, we consider a double-well potential U whose two minima are denoted by x⋆_1 and x⋆_2. In order to show the final result, one needs to compare the costs I(z⋆_1, z⋆_2) and I(z⋆_2, z⋆_1). Without loss of generality, we fix x⋆_1 < x⋆_2 and assume that there exists a unique local maximum x⋆ of U such that x⋆_1 < x⋆ < x⋆_2, with U′(x⋆) = 0 and U′′(x⋆) < 0. We assume moreover that U(x⋆_1) < U(x⋆_2). Such a potential U is represented in Figure 2. We first describe how one can provide a lower bound on the cost I(z⋆_1, z⋆_2). We propose two approaches using sharp estimates based on particular Lyapunov functions. In the next subsection, we adopt a non-degenerate approach where the main idea is to project the drift vector field onto the gradient of the Lyapunov function. However, even if the idea seems original, the resulting bounds are not very satisfactory (see Proposition 5.1). In Subsection 5.2, we propose a second approach which provides better bounds (see Proposition 5.2).
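A concrete double-well potential satisfying these assumptions (an illustrative choice, not the one drawn in Figure 2) can be checked numerically:

```python
import numpy as np

# Asymmetric double well: U(x) = (x^2 - 1)^2 + c*x; the tilt c > 0 raises the
# right well so that U(x1*) < U(x2*), with a unique local maximum in between.
c = 0.3
U = lambda x: (x**2 - 1)**2 + c * x
dU = lambda x: 4 * x * (x**2 - 1) + c
ddU = lambda x: 12 * x**2 - 4

xs = np.linspace(-2, 2, 400_001)
# Critical points located by sign changes of U'.
roots = xs[:-1][np.sign(dU(xs[:-1])) != np.sign(dU(xs[1:]))]
assert len(roots) == 3                       # x1* < x* < x2*
x1, xmax, x2 = roots
assert U(x1) < U(x2)                         # left well is the global minimum
assert ddU(xmax) < 0                         # x* is a local maximum
```

The sign of the tilt c decides which well is global, which is exactly the asymmetry U(x⋆_1) < U(x⋆_2) exploited in the comparison of costs below.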

Lower bound using a non-degenerate approach
Let (β, γ) ∈ R². In this section, we consider a Lyapunov function defined in terms of the parameters β and γ. For the sake of simplicity, we omit the dependence on β and γ and denote this function by L.
Here, the main idea relies on the fact that ∇L corresponds to a favoured direction of the drift b. This will allow us to control the L² cost of moving the system from z⋆_1 to z⋆_2. First, let us remark that the cost I is necessarily bounded from below by the L² cost of an elliptic system. In the elliptic context, the L² cost I_{E,T} is defined over controls acting on both components, which can also be written as (5.1). As a consequence, since the set of admissible controls for the degenerate cost I_T is contained in the set of admissible controls for I_{E,T} (v is forced to be 0 in Equation (5.1)), we easily deduce that I_T is greater than I_{E,T}. This way, a lower bound on I_{E,T} will yield a lower bound on I_T. Now, let u and v be admissible controls for I_{E,T}; we have (5.2). Adapting the approach developed in Chiang et al. (1987), we use the Lyapunov function L to bound the term above from below (somehow, the Lyapunov function L plays the role of U). Indeed, if ∇L(z) ≠ 0, one can decompose b as b(z) = b_{∇L}(z) + b_⊥(z), (5.3), where b_{∇L}(z) is the orthogonal projection of b on the direction ∇L(z). In the special case ∇L(z) = 0, we set b_{∇L}(z) = 0 so that Equation (5.3) makes sense for any z. This way, if one can find β, γ and some α > 0 such that (5.4) holds, then one concludes a lower bound for all T > 0. Now, remark that for admissible controls, (z(t))_{t≥0} moves continuously from z⋆_1 to z⋆_2, so there exists t⋆ such that x(t⋆) = x⋆. In the definition of L, if β ≥ 0, one then obtains a lower bound on the cost; if β ≤ 0, the only available minoration is obtained by taking t = T, and we then get a weaker bound. The next proposition provides a lower bound on the cost in the (restrictive) case of a subquadratic potential U (that is, under (H_{Q−})).
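The orthogonal decomposition b = b_{∇L} + b_⊥ used here is the standard one, b_{∇L}(z) = (⟨b(z), ∇L(z)⟩ / |∇L(z)|²) ∇L(z), with the convention b_{∇L}(z) = 0 when ∇L(z) = 0. A quick numerical check of its defining properties (on hypothetical vectors, not the paper's b and L):

```python
import numpy as np

def project(b, g):
    # Orthogonal projection of b onto the direction g (standing for grad L(z)),
    # with the convention that the projection is 0 when g = 0.
    n2 = np.dot(g, g)
    return (np.dot(b, g) / n2) * g if n2 > 0 else np.zeros_like(b)

b = np.array([1.0, -2.0])
g = np.array([3.0, 1.0])
b_par = project(b, g)
b_perp = b - b_par

assert abs(np.dot(b_perp, g)) < 1e-12        # residual orthogonal to grad L
assert np.allclose(b_par + b_perp, b)        # exact decomposition (5.3)
```

Only the component b_{∇L} contributes to the derivative of L along trajectories, which is why the lower bound (5.4) can be phrased purely through this projection.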
At last, we also obtain the asymptotic behaviour as λ → +∞.

Proof: The idea is to optimize the ratio in Equation (5.4) for the largest possible α. This ratio can be written as a quadratic form in y and U′(x); an algebraic argument based on a simultaneous reduction of these quadratic forms then yields a suitable calibration for M and α.
Let us first compute the projection of b(z) on ∇L(z) when the latter does not vanish. Expanding ⟨b(z), ∇L(z)⟩ as a quadratic form in the variables (U′(x), y), we obtain (5.5). In the same way, one can compute (5.6). In order to balance the repelling effects on U′(x)² and y² in the quadratic form defined by M_1, a natural choice ties β and γ to M, and we set them accordingly. The end of the proof then reduces to an algebraic argument: writing (a, b) = (U′(x), y), we look for a bound similar to (5.4) with the largest possible α. The projection of b on ∇L can be expressed through the two quadratic forms q_{M_1} and q_{M_2} defined from expressions (5.5) and (5.6). To bound the ratio of these two quadratic forms, remark that M_1 is invertible except if M = 0, which is a rather trivial case. Moreover, M_1 is symmetric positive definite and M_2 is symmetric nonnegative, so a simultaneous reduction of q_{M_1} and q_{M_2} is possible. We denote by ρ_1 and ρ_2 the eigenvalues of M_1^{−1}M_2, associated with eigenvectors e_1, e_2 forming an orthonormal basis for q_{M_1}; if (ã, b̃) denote the coordinates in this basis, the ratio q_{M_2}/q_{M_1} is controlled by these eigenvalues. With our choice of β and γ, and setting ξ = U′′(x)/M ∈ [−1, 1], simple algebra yields the expression of M_1^{−1}M_2. For small M, it is immediate to show that M_1^{−1}M_2 becomes independent of ξ.
For large M, the maximum eigenvalue of M_1^{−1}M_2 is attained at ξ = 1, and one obtains again, after tedious computations, the announced value.
For any M > 0, the coefficient α(M) is then obtained as 1/ρ, and the result holds. At last, remark that for large λ, the matrix M_1^{−1}M_2 becomes diagonal with the two eigenvalues 2 and 0. This proves that lim_{λ→+∞} α_λ(M) = 2.
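The simultaneous-reduction step used in this proof can be illustrated numerically: for symmetric M_1 ≻ 0 and M_2 ⪰ 0, the ratio q_{M_2}(v)/q_{M_1}(v) is bounded by the largest eigenvalue of M_1^{−1}M_2 (the matrices below are generic illustrations, not the paper's M_1, M_2):

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative matrices: M1 symmetric positive definite, M2 symmetric PSD.
A = rng.standard_normal((2, 2))
M1 = A @ A.T + np.eye(2)
B = rng.standard_normal((2, 1))
M2 = B @ B.T

# Largest generalized eigenvalue of the pencil (M2, M1).
rho_max = max(np.linalg.eigvals(np.linalg.solve(M1, M2)).real)

# The bound q_{M2}(v) <= rho_max * q_{M1}(v) holds for every v.
for _ in range(1000):
    v = rng.standard_normal(2)
    assert v @ M2 @ v <= rho_max * (v @ M1 @ v) + 1e-10
```

The coefficient α(M) = 1/ρ in the proof is exactly the reciprocal of such a largest generalized eigenvalue.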
5.2 Lower bound using a degenerate approach (proof of (ii) of Theorem 2.3)

In order to take into account the degeneracy of the dynamical system in the control cost, we directly bound the terms in the integral defining I_T(z⋆_1, z⋆_2) by the gradient of a suitable Lyapunov function. This may lead to better estimates since, obviously, the previous paragraph relied on a minoration technique based on an elliptic argument.
Let α > 0 and (β, γ) ∈ R². Here we consider a Lyapunov function defined in terms of (α, β, γ), and we look for an ideal choice of these parameters. For all ϕ ∈ H(R_+, R^d), we set u = φ̇. If u denotes any admissible control and (z(t))_{t≥0} the corresponding controlled trajectory, we aim to obtain a bound such as (5.7) for all ϕ ∈ H(R_+, R^d). Recall that t⋆ is the first time at which x reaches the local maximum of U (i.e. x(t⋆) = x⋆). Such a lower bound is useful especially when α is positive and as large as possible and β is non-negative. Indeed, if we can obtain a lower bound of the form (5.7), then (5.8) follows. The next proposition shows that such a minoration (5.7) indeed holds for some suitable choice of β and γ; in some cases, this minoration is almost optimal.
Proof: In order to obtain a minoration such as (5.7), we fix any t > 0 and any admissible control u. Dropping the time parameter, we expand the derivative of the Lyapunov function along the trajectory. Let us now define the matrices M_1 and M_2, for all x ∈ R, so that this expansion can be written in quadratic-form terms. Remark again that the product yU′(x) is essential in the structure of the Lyapunov function, since it creates a repelling effect in M_2 between the variables y² and U′(x)². Without this term, there is no chance to obtain the positivity of M_1 − 2M_2. Moreover, we immediately see that β and γ should be positive.
It is then sufficient to show that the symmetric matrix S := M_1 − 2M_2 is positive. We again introduce a parameter ξ, but this time it is easier to manipulate ξ = U′′(x) ∈ [−M, M]. S is positive if and only if its principal minors are positive.
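The criterion invoked here is Sylvester's criterion: a symmetric matrix is positive definite iff all its leading principal minors are positive. It is easy to check numerically (the example matrices are illustrative, not the paper's S):

```python
import numpy as np

def sylvester_positive(S, tol=0.0):
    # Sylvester's criterion: all leading principal minors of S are positive.
    minors = [np.linalg.det(S[:k, :k]) for k in range(1, S.shape[0] + 1)]
    return all(m > tol for m in minors)

S_pos = np.array([[2.0, -1.0, 0.0],
                  [-1.0, 2.0, -1.0],
                  [0.0, -1.0, 2.0]])     # positive definite (tridiagonal)
S_indef = np.array([[1.0, 2.0],
                    [2.0, 1.0]])         # indefinite (eigenvalues 3 and -1)

# The criterion agrees with the eigenvalue characterization in both cases.
assert sylvester_positive(S_pos) == all(np.linalg.eigvalsh(S_pos) > 0)
assert sylvester_positive(S_indef) == all(np.linalg.eigvalsh(S_indef) > 0)
```

In the proof, only the minors up to order 3 are needed, with ∆_3 the determinant of S itself.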
Since ∆_3 is a quadratic polynomial in ξ, it is impossible to obtain the positivity of ∆_3(ξ) for all ξ ∈ R (this is why we assume that ‖U′′‖_∞ < +∞). For any α > 0, we aim to maximize the absolute values of the roots of ∆_3 among the convenient choices of β and γ. By a symmetry argument, it is easy to check that one should necessarily have B = 0, since in this case the roots of ∆_3 are opposite. The parameter β can thus be expressed in terms of α and γ, and for this choice the roots of ∆_3 are explicit. Note also that, for this choice, we obtain ∆_3(0) = −α² + 2λγ(1 + (α − 1)²) − 4γ²(1/α − 1)²λ².
When M is large, the admissible values of α vanish and our lower bound becomes useless. When M → 0, we obtain I(z⋆_1, z⋆_2) ≥ 2[U(x⋆) − U(x⋆_1)], which is optimal in view of the upper bound constructed in the next paragraph (and obviously better than the bound obtained in Proposition 5.1). The evolution of the admissible α is shown in Figure 3 for several values of λ.
As announced at the beginning of the section, one may remark that the second approach is clearly more efficient than the first one. However, we chose to keep the first approach since its idea may be of interest in a more general context, especially in an elliptic case where the drift vector field is not a gradient.

Upper bound for the cost function (proof of (i) of Theorem 2.3)
Recall that we assume there are two local minima of U, denoted by x⋆_1 and x⋆_2 with U(x⋆_1) < U(x⋆_2), and a local maximum denoted by x⋆. We set again z⋆_1 = (x⋆_1, 0), z⋆_2 = (x⋆_2, 0) and z⋆ = (x⋆, 0). In this particular setting, we want to obtain an upper bound for I(z⋆_2, z⋆_1), and then for W. This is the purpose of the next proposition.
Note that this point is a direct consequence of Lemma 4.1 if U′′(x_0) ≠ 0. However, we choose to give below an alternative proof, under a less constraining assumption, based on a turning argument (which implies that √U is sublinear). Under (H_{Q−}) we also have sup_x V(x, y) < +∞, and since p ∈ (0, 1), there exist α > 0 and β such that for all ε ∈ [0, 1] and all (x, y), we have AV^p(x, y) ≤ β − αV^{p+a−1}(x, y).
Throughout this paper, we denote by U : R^d → R a smooth (at least C²) and coercive function on R^d, i.e. such that U(x) → +∞ as |x| → +∞.