Convergence of the kinetic annealing for general potentials

The convergence of the kinetic Langevin simulated annealing is proven under mild assumptions on the potential U for slow logarithmic cooling schedules, which considerably extends the scope of the previous results of [15]. Moreover, non-convergence for fast logarithmic and non-logarithmic cooling schedules is established. The results are based on an adaptation to non-elliptic, non-reversible kinetic settings of a localization/local convergence strategy developed by Fournier and Tardif in [7] in the overdamped elliptic case, and on precise quantitative high-order Sobolev hypocoercive estimates.


Introduction and main results
Given a potential $U : \mathbb{R}^d \to \mathbb{R}$, the goal of a simulated annealing procedure is to minimize $U$ by designing a stochastic process $(X_t)_{t\ge 0}$ whose law at time $t$ is close to the probability density proportional to $e^{-\beta_t U(x)}\,dx$, where $\beta : \mathbb{R}_+ \to \mathbb{R}_+$ is called the cooling schedule. As $\beta$ goes to infinity, this probability law concentrates around the global minimizers of $U$.
The most classical case is based on the overdamped Langevin process driven by a standard Brownian motion $(B_t)_{t\ge 0}$ on $\mathbb{R}^d$. For a fixed $\beta$, the density $e^{-\beta U(x)}\,dx$ is a stationary measure for this process. The idea is thus that, if $\beta$ increases sufficiently slowly, the law of $X_t$ gets and remains close to its instantaneous equilibrium. As a consequence, the convergence of the simulated annealing algorithm, in the sense of convergence in probability of $U(X_t)$ toward $\min U$ as $t \to +\infty$, is related to the long-time convergence to equilibrium of the process at a fixed but high $\beta$. On the contrary, when $\beta$ goes to infinity too fast, the algorithm is expected to fail with positive probability, i.e. the law of $X_t$ is not expected to be close to $e^{-\beta_t U}$, nor $U(X_t)$ to converge to $\min U$.

The convergence of the overdamped Langevin simulated annealing for slow logarithmic cooling schedules (with the optimal condition involving the critical height of the potential, see below) and the non-convergence for fast logarithmic cooling schedules were first established by Holley, Kusuoka and Stroock [10,9], using Sobolev inequalities, for a potential $U$ on a compact manifold. The case of $\mathbb{R}^d$ has been studied by Chiang, Hwang and Sheu [5], Royer [17] and Miclo [13], under restrictive conditions on the behavior of $U$ at infinity, in particular $|\nabla U| \to +\infty$ at infinity. These conditions are related to the functional inequalities used in these works, in particular spectral gap and Nelson hypercontractivity inequalities. The question of relaxing these assumptions in order to consider slowly growing potentials has been addressed by Zitt in [22], essentially by replacing spectral gap inequalities by weaker functional inequalities. The results of Zitt apply for instance if, outside some ball, $U(x) = |x|^\alpha$ with $\alpha \in (0,1)$.
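As an illustration of the procedure described above, here is a minimal Python sketch of an overdamped annealing run. It is not taken from the paper: it assumes the normalization $dX_t = -\nabla U(X_t)\,dt + \sqrt{2/\beta_t}\,dB_t$ (one common convention compatible with the stationary density $e^{-\beta U}$; other references use a time-changed version), a logarithmic schedule $\beta_t = \ln(e^{c\beta_0}+t)/c$, and a plain Euler-Maruyama discretization. The function names, the double-well potential and the numerical parameters are all hypothetical choices.

```python
import numpy as np

def beta_schedule(t, c, beta0):
    # Assumed logarithmic cooling schedule: beta_t = ln(exp(c*beta0) + t) / c
    return np.log(np.exp(c * beta0) + t) / c

def overdamped_annealing(grad_U, x0, c, beta0, dt, n_steps, rng):
    # Euler-Maruyama scheme for the assumed SDE
    #   dX = -grad U(X) dt + sqrt(2 / beta_t) dB
    x = np.array(x0, dtype=float)
    for k in range(n_steps):
        beta = beta_schedule(k * dt, c, beta0)
        noise = rng.standard_normal(x.shape)
        x = x - grad_U(x) * dt + np.sqrt(2.0 * dt / beta) * noise
    return x

def grad_double_well(x):
    # Gradient of the illustrative potential U(x) = (x^2 - 1)^2
    return 4.0 * x * (x**2 - 1.0)

rng = np.random.default_rng(0)
x_final = overdamped_annealing(grad_double_well, [0.0], c=2.0, beta0=1.0,
                               dt=1e-3, n_steps=5000, rng=rng)
```

The time discretization introduces a bias that the continuous-time results do not account for, so this sketch should only be read as a picture of the mechanism.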
More recently, Fournier and Tardif in [7], and with one of the authors in [6], have been interested in somehow minimal conditions on the growth of $U$ at infinity. In [6], they established that, for coercive potentials in the sense that $x \cdot \nabla U(x) \ge 0$ for $|x|$ large enough, there is a phase transition for $U_\alpha(x) = \alpha \ln(1 + \ln(1 + |x|^2))$ at some value $\alpha^*$ of $\alpha$ (depending on the cooling schedule $\beta$ and the dimension), i.e. there is convergence if $\alpha > \alpha^*$ and non-convergence if $\alpha < \alpha^*$, which is related to the transience properties of Bessel processes. More generally, convergence of the annealing algorithm is also proven in [6] under conditions that allow arbitrarily slow growth. In [7], the convergence of the simulated annealing is established as soon as $\lim_{|x|\to\infty} U(x) = \infty$ and $\int_{\mathbb{R}^d} e^{-\alpha_0 U(x)}\,dx < \infty$ for some $\alpha_0 > 0$. Although it doesn't cover all the cases of [6] (notice indeed that the condition is not met for $U_\alpha$ for any $\alpha > 0$), this is a very simple and mild condition.

One of the main differences of [7,6] with respect to previous works is that the question of the recurrence of the process is treated separately from the question of convergence in probability to the minimum of $U$. Indeed, once recurrence is proven, it is essentially sufficient to use the known results of convergence in the compact case to conclude. Notice that, unfortunately, this localization argument does not provide a rate of convergence, as previous works did. Note as well that the idea that the behavior of $U$ at infinity is not so important already appears in [5,17] (see in particular [5, Lemma 6.4]), where it is proven that it is sufficient (under the conditions enforced in these works) to prove the result in the case where $U(x) = |x|^4$ for $|x|$ large enough.
On the other hand, the second author studied in [15] the simulated annealing based on the kinetic Langevin process. For a constant $\beta$, the invariant measure is proportional to $e^{-\beta H}$ with the Hamiltonian $H(x,y) = U(x) + |y|^2/2$. The use of this process, which is non-reversible and has a ballistic rather than diffusive behavior, is motivated by its better convergence properties with respect to the overdamped process (although, as discussed in [15], in the regime $\beta \to +\infty$, it doesn't reduce the critical height of the energy landscape). Convergence of the kinetic simulated annealing is established in [15] for slow logarithmic cooling schedules similar to the overdamped case, under restrictive conditions on $U$, namely $U$ is essentially quadratic at infinity and the Hessian of $U$ is bounded. The proof is similar to the overdamped Langevin case, except that establishing quantitative long-time convergence estimates (at a fixed $\beta$) for the process toward its equilibrium relies on so-called hypocoercive methods, as introduced by Villani in [20]. The arguments of [15] have been adapted to Generalized Langevin processes in [4].

The present work is concerned with the kinetic Langevin simulated annealing. Our contributions with respect to [15] are the following. First, following the method of [7], the conditions on $U$ are considerably weakened. Notice that, in the kinetic case, it means we can consider potentials that grow slower than those considered in [15], but also potentials that grow much faster (arbitrarily fast in fact), while the results of [15] require $U(x)/(|x|^2 + 1)$ and $\nabla^2 U$ to be bounded. Second, we prove the failure of the algorithm with fast cooling schedules, which was yet to be established in a hypocoercive case. Indeed, the failure of the overdamped Langevin simulated annealing is proven in [9] thanks to hypercontractivity results that are not available for the kinetic process, and the hypocoercive convergence results used in [15] to prove convergence are too weak to conclude about the failure of the algorithm in the fast cooling case (see the discussion at the beginning of Section 2.4). By proving the non-convergence of the algorithm under the same condition as in the overdamped case, we make rigorous the heuristic discussion in [15] according to which the kinetic process does not change the optimal condition on the cooling schedule. Third, as noted in [4], a technical truncation argument in [15], required for the rigorous computation of the modified entropy dissipation, is incorrect, and we have solved this issue (see Section 4 and more precisely Remark 2).
More precisely, we study the Markov process $(Z_t)_{t\ge 0} = (X_t, Y_t)_{t\ge 0}$ on $\mathbb{R}^{2d}$ that solves the kinetic Langevin equation (1), where $\gamma : \mathbb{R}_+ \to \mathbb{R}_+$ is a friction parameter. We retrieve the settings of [15] with $\gamma_t = \beta_t$.
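To fix ideas, the following Python sketch simulates a kinetic annealing trajectory. It rests on assumptions that are not guaranteed to match equation (1): the dynamics are taken to be $dX_t = Y_t\,dt$, $dY_t = -\nabla U(X_t)\,dt - \gamma_t Y_t\,dt + \sqrt{2\gamma_t/\beta_t}\,dB_t$, which is the standard kinetic Langevin form whose invariant measure at constant $\beta$ and $\gamma$ is proportional to $e^{-\beta H}$, and the schedule is taken to be $\beta_t = \ln(e^{c\beta_0}+t)/c$. The names `kinetic_annealing` and `hamiltonian`, the constant friction and the double-well potential are hypothetical.

```python
import numpy as np

def kinetic_annealing(grad_U, x0, y0, c, beta0, gamma, dt, n_steps, rng):
    # Euler-Maruyama scheme for the assumed kinetic Langevin dynamics
    #   dX = Y dt,  dY = -grad U(X) dt - gamma_t Y dt + sqrt(2 gamma_t / beta_t) dB
    x = np.array(x0, dtype=float)
    y = np.array(y0, dtype=float)
    for k in range(n_steps):
        t = k * dt
        beta = np.log(np.exp(c * beta0) + t) / c  # assumed cooling schedule
        g = gamma(t)                              # friction at time t
        noise = rng.standard_normal(y.shape)
        x_new = x + y * dt
        y = y - grad_U(x) * dt - g * y * dt + np.sqrt(2.0 * g * dt / beta) * noise
        x = x_new
    return x, y

def hamiltonian(U, x, y):
    # H(x, y) = U(x) + |y|^2 / 2
    return U(x) + 0.5 * float(np.sum(y**2))

U = lambda x: float(np.sum((x**2 - 1.0) ** 2))   # illustrative double well
grad_U = lambda x: 4.0 * x * (x**2 - 1.0)

rng = np.random.default_rng(1)
x_T, y_T = kinetic_annealing(grad_U, [0.0], [0.0], c=2.0, beta0=1.0,
                             gamma=lambda t: 1.0, dt=1e-3, n_steps=5000, rng=rng)
energy = hamiltonian(U, x_T, y_T)
```

Tracking `energy` along the run is the natural diagnostic here, since the convergence statement of the paper is expressed in terms of $H(Z_t)$.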
We remark that the extension to Generalized Langevin processes as in [4] would not raise any particular difficulty, but we don't consider it for the sake of clarity.We will work under the following set of assumptions : and there exists α 0 > 0 such that : • The cooling schedule β : R + → R + is given by for some parameters c > 0 and β 0 > 0.
• The friction γ : R + → R + is a C 1 function and there exists κ > 0 such that for all t 0, γ t κ and γ t 1 κ(1+t) .In particular there exists L such that γ t Lβ t .• The critical height c * of U is finite, where c * = sup x 1 ,x 2 c(x 1 , x 2 ) and where the infimum runs over ξ The condition min U = 0 is imposed for simplicity, it can always be enforced by changing U to U − min U .The specific form of β is also made for simplicity, since it is known that, in order to study the convergence in probability for large time for simulated annealing algorithm, only logarithmic schedules are relevant.In particular, notice that, under Assumption 1, the time-shifted process (Z t 0 +t ) t 0 for any t 0 0 satisfies Assumption 1 with the same U, c, κ, L and with β 0 replaced by β t 0 .
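The time-shift remark can be checked numerically. The sketch below assumes the schedule has the form $\beta_t = \ln(e^{c\beta_0}+t)/c$; this form is an assumption, chosen because it is logarithmic, starts at $\beta_0$, and satisfies exactly the restart property stated above (shifting time by $t_0$ amounts to replacing $\beta_0$ by $\beta_{t_0}$). The function name and the numerical values are illustrative.

```python
import math

def beta(t, c, beta0):
    # Assumed schedule: beta_t = ln(exp(c*beta0) + t) / c, so that beta(0) = beta0
    return math.log(math.exp(c * beta0) + t) / c

c, beta0, t0 = 1.5, 2.0, 10.0
times = (0.0, 1.0, 100.0)

# Schedule seen by the time-shifted process (Z_{t0+t})_{t>=0}
shifted = [beta(t0 + t, c, beta0) for t in times]
# Same schedule family restarted from the new initial value beta_{t0}
restarted = [beta(t, c, beta(t0, c, beta0)) for t in times]
```

Under this assumed form the two lists agree up to rounding, since $e^{c\beta_{t_0}} = e^{c\beta_0} + t_0$.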
The critical height $c^*$ represents the largest energy barrier the process has to cross in order to go from any local minimum to any global one. In the classical overdamped case, disregarding the question of the behavior of $U$ at infinity, it is well known that, at least in the case where the global minimizer of $U$ is unique, the algorithm converges if $c > c^*$ (slow cooling) and has a positive probability to fail (i.e. to never visit a global minimum) if $c < c^*$ (fast cooling). We retrieve this dichotomy in the kinetic case.
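For a one-dimensional potential sampled on a grid, the critical height can be computed by brute force. The sketch below assumes the classical definition of the elevation between two points, $c(x_1,x_2) = \inf_\xi \max_s U(\xi(s)) - U(x_1) - U(x_2)$ with $\min U = 0$, which may differ in detail from the definition used in Assumption 1. In dimension one, any continuous path from $x_1$ to $x_2$ covers the segment between them, so the infimum over paths reduces to the maximum of $U$ on that segment. The function name and the test potential are hypothetical.

```python
import numpy as np

def critical_height_1d(U_vals):
    # U_vals: values of U on an increasing 1D grid.
    # Assumed definition: c* = sup_{x1,x2} ( max_{[x1,x2]} U - U(x1) - U(x2) ),
    # after shifting U so that min U = 0.
    U = np.asarray(U_vals, dtype=float)
    U = U - U.min()
    best = 0.0
    for i in range(len(U)):
        # running_max[j] = max of U over grid points i..i+j
        running_max = np.maximum.accumulate(U[i:])
        elevation = running_max - U[i] - U[i:]
        best = max(best, float(elevation.max()))
    return best

xs = np.linspace(-2.0, 2.0, 401)
c_star = critical_height_1d((xs**2 - 1.0) ** 2)  # symmetric double well
```

For this symmetric double well the two minima sit at energy 0 and are separated by a barrier of height 1 at the origin, so the computed value is close to 1. With this convention, a logarithmic schedule with $c > 1$ falls in the slow cooling regime for this potential.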
In the slow cooling case, we extend the results of [15]:

Theorem 1. Under Assumption 1, assume furthermore that $c > c^*$. Then any solution $(Z_t)_{t\ge 0}$ of (1) satisfies, for all $\delta > 0$, $\mathbb{P}(H(Z_t) \ge \delta) \to 0$ as $t \to +\infty$.

Since $H(x,y) = U(x) + |y|^2/2$, this implies the convergence in probability of $X_t$ to the set of global minimizers of $U$.
On the other hand, if c < c * , then the process might remain stuck in a region that contains no global minimum of U .In fact, a slightly stronger condition is required.Indeed, it is possible that c * > 0 with all minima of U being global, in which case H(Z t ) may go to zero with fast cooling schedule, while the law of Z t is not close to its local equilibrium.To be more precise, we need some additional definitions.For x, y ∈ R d , let where the infimum runs over ξ ∈ C [0, 1] , R d , ξ(0) = x, ξ(1) = y .We define the depth of x ∈ R d by > 0 and a ∈ (0, D(x)), we define the cup of bottom x and height a (this is the vocabulary of [8]) as We want to discard pathological cases where there are two non-global local minima 1 (these cases do not prevent the result to hold, but the proof doesn't work directly, see Remark 1 below. Notice that the problem is not that there are two local minima with the same energy level and depth, which is a pretty common situation as soon as there are some symmetries in the system; the problem is that the elevation c(x 1 , x 2 ) between them is exactly the parameter c chosen by the user in the cooling schedule which, now, is a very unlikely situation).Hence, we work under the following condition.
If U has a finite number of minima and a unique global minimum, it is easily seen that there exists a non-global minimum x such that D(x) = c * .More generally, since for all a < D(x) the minimum value of U over the cup C(x, a) is U (x), if there exists a non-global minimum x of U with depth D(x) > c, the previous results implies that, with positive probability, inf t 0 U (X t ) U (x) > min U .Moreover, even if the probability to start in C(x, c) is initially zero, due to the controllability of the process, it is positive for all positive times (see e.g.[15, proposition 5]).As a conclusion, we immediately get the following: Corollary 3.Under Assumption 2, for all initial condition and all t 0 > 0, Notice that, in practice, one can keep track of X s(t) where U (X w )}, so that X s(t) may converge to a minimizer of U even if X t does not.However, our results show that this doesn't solve the issue of non-convergence for fast cooling schedules.
Remark 1. In Figure 1, in the case $c = b$ (so that Assumption 2 does not hold), we cannot deduce from our results that the process stays stuck with positive probability in the cup $C(x, b)$ because, for any $\delta > 0$, $C(x, b + \delta)$ contains $y$ (contrary to $C(x, b)$). In fact it is clear that the process can stay stuck with positive probability in $C(x, b + \delta)$ for any $\delta > 0$ (so that the conclusion of Corollary 3 also holds), but we cannot deduce it from our proof, which requires that the critical depth within the cup (i.e. for a suitable modification of the potential which only considers the local situation of this cup, see Section 2.4 and Figure 2) is strictly smaller than $c$ (in order to apply a variation of Theorem 1), while it is exactly $b = c$ in this example. We refer to [14], where a fine analysis is conducted on a related question on finite graphs.
Finally, we address the case of faster than logarithmic cooling schedules. For simplicity, we restrict this study to the case of a constant friction parameter $\gamma$ (although, as discussed at the end of the proof, it can be extended to the non-constant case with suitable conditions on $\gamma$ depending on $\beta$). In the following, we do not assume that $t \mapsto \beta_t$ is increasing, and possibly $\beta_t = +\infty$ for some $t$.
the following holds. For all $\delta > 0$ and all initial conditions $z_0 = (x_0, y_0) \in \mathbb{R}^{2d}$ with $|x_0 - x^*| \le \delta/2$, the solution $Z = (X, Y)$ of (1) is such that

The rest of the paper is dedicated to the proofs of Theorems 1, 2 and 4. It is organized as follows. The main steps of the proofs in the logarithmic case are presented in Section 2, while technical intermediary results are postponed to Sections 3, 4 and 5. More precisely, Section 3.1 is dedicated to the proof of a uniform in time energy bound, which is the main ingredient in the proof that the process goes back infinitely many times to a compact set. A result of small-time conditional regularization is proven in Section 3.2, which is used to replace deterministic initial conditions by smooth distributions (with some quantitative bounds). Section 4 presents hypocoercivity estimates similar to those of [15], which are used to prove the convergence of the algorithm in the slow cooling case. In the fast logarithmic cooling case, similar estimates have to be established in higher order Sobolev norms, which is the topic of Section 5. Section 6 is dedicated to the faster than logarithmic case, with the proof of Theorem 4.

Main steps of the proofs
The sketch of the proofs is the following.

The first point is to get uniform in time moment estimates. This would classically be done using Lyapunov arguments, but this would typically require some assumption on $\nabla U$, which we want to avoid. Instead, we adapt an argument from [7]. From these estimates, we get that $\liminf_{t\to+\infty} H(Z_t)$ is almost surely finite, i.e. there exists a (random) compact set $\{H \le A\}$ which will be visited infinitely often by the process.

The second step is to prove that, for any $A > 0$, there exists $A' > A$ such that, provided the process is in $\{H \le A\}$ at some time $t_0$, there is a probability at least, say, $1/4$, that the process remains in $\{H \le A'\}$ for all times $t \ge t_0$. This is reminiscent of the study in [9] of fast cooling schedules, where it is proven that there is a positive probability that, starting in a potential well of depth larger than $c$, the process never climbs high enough to exit the well (as in Theorem 2). We will follow a similar proof, except that we have to check the dependency of the estimates with respect to $t_0$ or, equivalently by taking $t_0 = 0$, to $\beta_0$. That way, we will conclude that, each time the process goes below $A$, it has a probability at least $1/4$ to never go above $A'$ again so that, if it goes below $A$ infinitely often, eventually it will stay below $A'$.
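The last claim of the second step is a geometric-trials argument, which can be sketched as follows. The notation is hypothetical and chosen for this illustration only: $S_0 = 0$, $T_k = \inf\{t \ge S_{k-1} : H(Z_t) \le A\}$ is the $k$-th return below $A$, and $S_k = \inf\{t \ge T_k : H(Z_t) > A'\}$ is the next exit above $A'$. The sketch assumes that the bound $1/4$ holds uniformly at each return time, which is precisely why the dependency on $t_0$ (or $\beta_0$) has to be controlled.

```latex
% Sketch under the assumption that, on {T_k < \infty},
%   P( S_k = \infty | F_{T_k} ) >= 1/4   (strong Markov property at T_k).
\[
  \mathbb{P}(S_k < \infty)
  = \mathbb{E}\Big[ \mathbf{1}_{\{T_k < \infty\}}\,
      \mathbb{P}\big(S_k < \infty \,\big|\, \mathcal{F}_{T_k}\big) \Big]
  \le \tfrac{3}{4}\, \mathbb{P}(T_k < \infty)
  \le \tfrac{3}{4}\, \mathbb{P}(S_{k-1} < \infty),
\]
\[
  \text{hence}\qquad
  \mathbb{P}(S_k < \infty) \le \Big(\tfrac{3}{4}\Big)^{k}
  \xrightarrow[k \to \infty]{} 0 .
\]
```

Almost surely, then, only finitely many excursions above $A'$ occur: on the event that the process returns below $A$ infinitely often, it eventually stays below $A'$.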
Combining the two previous steps, we get that the process is almost surely bounded. It is thus sufficient to prove the convergence of the process when the position space is a compact torus, which is then similar to [15], but without the issue of the behavior of $U$ at infinity.

The strategy for the non-convergence in the fast cooling case is similar to [9], the technical difficulties coming from the degeneracy of the process. Indeed, the estimates used in [15] in the kinetic case to prove convergence are too weak to get that the process has a positive probability to stay forever below some energy level. On the other hand, the estimates of [9] are based on the hypercontractivity of elliptic diffusions on compact manifolds. We are not aware of similar results for hypoelliptic diffusions, and thus we overcome this difficulty by working with higher order Sobolev norms.

The faster than logarithmic case is similar and somewhat simpler: in that case, the process has a positive probability to converge to $(x^*, 0)$, so that it is sufficient to linearize $\nabla U$ at this point and to study the corresponding Gaussian process.

Return to a compact set
Diffusions defined by (1) are time-inhomogeneous Markov processes with generator : First, let us check that the process is well-defined.
Proof. Since the coefficients of the diffusion (1) are all smooth, there is existence and uniqueness up to an explosion time $\xi$. For all $x, y \in \mathbb{R}^d$ and $t \ge 0$, Considering for $N \in \mathbb{N}$ the stopping time $\tau_N = \inf\{t \ge 0;\ H(Z_t) \ge N\}$, we get hence $\xi = +\infty$ almost surely, which concludes.
The first main point is to to strengthen the energy bound of Proposition 5 to a uniform in time estimate.For f a probability law or density on R 2d , we write E f and P f expectations and probabilities with respect to the process Z solving (1) with initial condition Z 0 distributed according to f .If f = δ z for some z ∈ R 2d , we write E z and P z instead.Lemma 6.Under Assumption 1, there exists b > α 0 that depends only on U such that the following holds.Provided β 0 b then, for any C ∞ probability density f 0 with compact support, The proof is postponed to Section 3.1.The next result enables the use of Lemma 6 when the initial condition is not smooth.Lemma 7.Under Assumption 1 with c > c * , fix A > 1 and ε > 0. Then there exist t * , b A , C 1 A , C > 0 that do not depend on β 0 such that, for all β 0 b A and z 0 ∈ R 2d with H(z 0 ) A, the following holds.Writing B = sup t t * H(Z t ) A + 1 , Moreover, the law at time t * of the process (1) with initial condition Z 0 = z 0 conditioned on the event B has a density f c t * that satisfies This is proven in Section 3.2.The two previous lemmas yield the following.
Proof.For t large enough, β t b where b > α 0 is the constant from Lemma 6.Notice that, for t 0 0, (Z t 0 +t ) t 0 solves (1) except that β 0 is replaced by β t 0 and γ is also timeshifted.Hence, by the Markov property, without loss of generality, we can assume that β 0 b.Moreover, by conditioning with respect to the initial condition, it is sufficient to consider the case of a deterministic initial condition z 0 ∈ R 2d .
Fix ε > 0 and A = H(z 0 ).From Lemma 7, there exist t * , C > 0 such that P z 0 (B) 1 − ε where B = sup t t * H(Z t ) A + 1 , and such that the law of the process at time t * and conditioned on B has a density ft * C 1 {H A+1} .Let C > 0 and f 0 be a C ∞ probability density on R 2d with compact support such that Cf 0 C 1 {H A+1} .Then, denoting by (F t ) t 0 the filtration associated with (Z t ) t 0 , by the strong Markov property and Lemma 6, for all t 0, As a consequence, by Fatou's Lemma, Finally which concludes since ε is arbitrary.

Position in a compact set
As in [7], with Proposition 8 at hand, we can now focus on the behavior of the process when the position is in a compact set, ignoring the behavior of U at infinity.To this end, we fix some parameter we consider periodic boundary conditions, which will be technically simpler than e.g.reflecting boundary conditions).We now define a process More precisely, write θ K : R d → M K the canonical projection and MK = ]−L K , L K ] d , so that M K = θ K ( MK ).For a non-negative function V : R d → R, we define the critical height c * (V ) with the same definition as c * except that U is replaced by V .
Let Ũ K ∈ C ∞ (R d ) be equal to U on some open set O K containing {U K}, nonnegative, 2L K -periodic, and such that c * K = c * ( Ũ K ) c * .Such a function exists from [7, Notation 9].We write U K the corresponding function on M K , given by U K (θ K (x)) = Ũ K (x), and Finally, given the same Brownian motion as in (1), we consider with Then, by design, For β > 0, write µ K β the probability measure on M K × R d with density proportional to e −βH K .Lemma 9. Fix δ, α > 0. There exists C > 0 such that for all β > 0, Proof.For completeness, we recall the short proof of [15, section 3.1].First, for some C > 0. The ratio of those two inequalities concludes the proof.
The goal of this section is to prove a similar result but with the law of Z K t instead of µ K β , with an explicit dependency of the constants in term of β 0 .Denote by f K t the law of Z K t .Since ∇U K and its derivatives are all bounded, we will be able to show that f K t is smooth, see Section 4. Consider the relative density We start with a uniform in time quantitative bound of the norm of h K t in L 2 µ K t , for nice initial conditions.Proposition 10.Under Assumption 1 with c > c * , let K 1.There exists b K > 0, which depends on U and K, such that, if The proof of this proposition is postponed to Section 4.
There exist C A > 0 and b A b A that do not depend on β 0 such that, for all β 0 b A and z 0 ∈ R 2d with H(z 0 ) A, we have that, , and for all t 0 In the rest of the proof we denote by C different constants.Let us introduce a function which will serve as a new initial condition.Let v d and w d be respectively the volumes of {H A + 1} and Notice that f 0 does not depend on β 0 , and that ∇f 0 is bounded since f 0 is smooth on a compact set.As in the proof of Lemma 8, from Lemma 7 and the strong Markov property, where Z K A ,β t * is a process similar to Z K A except that β 0 has been replaced by β t * and (γ t ) t 0 by (γ t+t * ) t 0 .Denote by h * t the relative density of the law of with respect to µ K A β t+t * .By the Cauchy-Schwartz inequality, For the second term, applying Lemma 9 with α = 1, 1) .For the first term, Proposition 10 applies if β 0 b K A , which yields for some C > 0 that does not depend on β 0 , where we used Assumption 1 and a uniform bound on f 0 and ∇f 0 .For any β 1, using that e −βU K A (x) 1, Hence, Everything put together gives, if , using the monotonicity of t → β t , we conclude with In fact, a similar proof already yields the convergence of the kinetic annealing on the compact torus: Proposition 12.Under Assumption 1 with c > c * , for all K > 1 and δ > 0, Proof.As in the proof of Proposition 8, by the Markov property, without loss of generality we assume that β 0 b K and that the initial condition is a Dirac mass at some z 0 ∈ M K × R d .Fix ε > 0, let A = H(z 0 ) and, from Lemma 7 (applied to the process Z K rather than Z, the proof is the same) let t 0 , C > 0 be such that the event B = {∀t ∈ [0, t 0 ], H(Z K ) A + 1} has a probability larger than 1 − ε and the law of We then have: where Z is a process similar to Z K A except that β 0 has been replaced by β t 0 and γ has been time-shifted.Denoting by ft the law of Zt (with initial condition f 0 ), Proposition 10 ) is bounded.As in the previous proof, applying the Cauchy-Schwarz 
inequality and Lemma 9 with α = δ/2 then yields : As a consequence, which concludes since ε is arbitrary.

Localization and convergence
Building upon the results of the previous section, we can now prove the following.
Proposition 13.Under Assumption 1 with c > c * , fix some A > 1.There exist b A > 1, K A > A which depends on A, U , c, and κ but not β 0 such that, for all β 0 b A and all initial condition z 0 ∈ {H A}, Proof.It is enough to show the same result for the process Z K A since, from (8), We use the definitions and notations of Lemma 11.Let Recall the definition B = sup t t * H(Z t ) A + 1 .From Ito's formula, where M t is a local martingale and R t = t t * L s Φ A (Z K A s )ds.We then get that for β 0 b A , We take β 0 large enough to get that, by Markov's inequality, P(E|B) 3/4, where , so that Consider the stopping time . As a conclusion, As announced, combining this result with Proposition 8 yields the following.
Proof.For A > 0, let It is sufficient to show that, for all A > 0, (Z t ) t 0 is bounded on Ω A since, from Proposition 8, P (∪ A 0 Ω A ) = 1.Hence, we fix A > 0. Let b A , K A be as in Proposition 13 and S 0 = t A where t A satisfies From Proposition 13 and the Markov property, where we used that β T k β t A b A for all k ∈ N.This implies that a.s.there exists J ∈ N such that S J = ∞.On Ω A , T J is finite and sup t T J H(Z t ) K A , which concludes.
We can now finally prove Theorem 1.

Full process on a compact space
One of the main points of the proof of Theorem 2 is essentially to get something of the form for suitable initial conditions $z_0$. First, let us highlight some of the difficulties in order to motivate the rest of this section. Refining the proof of Proposition 13, we would obtain a similar result but with $c$ replaced by $4c$. The factor 4 is due to two things. First, in Lemma 11, we prove a bound of order $1/t^2$ while the proof of Proposition 13 in fact only requires an integrable bound, i.e. $1/t^{1+\delta}$ for any $\delta > 0$ is enough. The second factor 2 is lost when the Cauchy-Schwarz inequality is used in (9). To solve this, the Cauchy-Schwarz inequality has to be replaced by the Hölder inequality, which means the $L^2$ estimate of Proposition 10 is not sufficient and we need $L^p$ estimates for all $p > 2$ or, even better, $L^\infty$ estimates (besides, in [15], the convergence is stated in relative entropy, which is weaker than $L^p$ for any $p > 1$, which is why we said in the introduction that these results were not sufficient to conclude in the fast cooling case). In fact, in the elliptic case, the proof of [9] relies on $L^\infty$ bounds.

In order to get such bounds in a hypocoercive case, we will work with $H^k$-Sobolev norms for $k \ge 1$, as in the work [21] of Zhang, and then use Sobolev embeddings. This should be done with a correct control of the dependency in time of the constants, and to do so it is convenient to work with a process (both position and velocity) in a compact manifold. This is done by replacing the Hamiltonian $H(x, y)$ where $W$ is some periodic potential with $W(y) = |y|^2/2$ below some threshold. This raises an issue in the dissipation of the hypocoercive modified entropy. Indeed, in the modified $H^1$-norm of Villani (and similarly in the modified $H^k$-norm of Zhang), the key point is that the missing coercivity in $x$ is recovered through a term where $[A, B] = AB - BA$ stands for the commutator of $A$ and $B$. When $H$ is replaced by $H_K$, this term becomes which is negative at maxima of $W$. For this reason, and since anyway we are not really interested in the process above some energy threshold, we will add some Brownian noise in the position variable in the region where $W(y) \ne |y|^2/2$.
As a conclusion, for these reasons, in this section, we consider a process where B is as in (1), B is another independent d-dimensional Brownian motion, and σ, W and U K are 2L K -periodic non-negative functions on R d (with The term σ∇U K has been added so that, for fixed γ, β, this new process admits the explicit stationary measure µ K β ∝ e −βH K (x,y) dxdy that satisfies a Poincaré inequality (see Section 5).Here and in all this section, when there is no ambiguity we identify 2L K -periodic functions on R d and functions on M K .Let us now give the precise definition of L K , σ, W and U K .In all this section we consider fixed U ∈ C ∞ (R d ), β, γ, x ∈ R d and a > 0 satisfying Assumption 2. Setting K = a+1, similarly to Section 2.2 we fix some Cσ 0 for some C > 0. Set σ = rσ 0 where r > 0 is chosen small enough that ∇σ ∞ ∇U K ∞ 1 2 and |∇σ| 2 σ.The useful properties of W and σ can be summarized as follows : Lemma 15.
• Seeing W as a periodic function from • On M K , W has a unique local minimum, which is global.
• There exist n ∈ ( √ 2K, m) and σ * > 0 such that, seeing σ as a periodic function from We write By construction of U K , W and σ, if we consider Z and Z K the solutions respectively of ( 1) and (10) with the same initial condition in C(x, c) and the same Brownian motion B, then H(Z t ) = H K (Z K t ) for all t τ := inf{s 0, H(Z s ) a}.In particular, for δ > 0 small enough so that c which means we are lead to prove that the left hand side has a positive probability.Denote by f K t the law of the solution of (10), write µ K β = Z K β −1 e −βH K (z) dz where Z K β makes µ K β a probability measure, and let Similarly to Section 2.3, the key point is the following estimate, proven in Section 5.
Lemma 17.Under Assumption 2, for all probability density f K 0 ∈ C ∞ (M K × M K ) and δ > 0, there exists C > 0 such that for all t 0, Proof.Applying Proposition 16, for t 0 for some C > 0, where we used Lemma 9 with α = δ/2.
Lemma 18.Under Assumption 2, for all probability density f K 0 ∈ C ∞ (M K × M K ) and δ > 0, there exists t b > 1 such that Proof.It is enough to show the result for δ satisfying c + δ < a.Let B the event There exists some constant , and from Ito's formula we can write : s ds, and if t b t 0 : where the constant C depends only on U K and its derivative, but not on t b .Since (1 + ln(1 + t))/(1 + t) 1+ δ 8c is integrable, we can take t b great enough so that the event E = sup t t b |R t | 1 10 has probability at least 3  4 .On E ∩B, M t takes value in − 1 10 , 11 10 because 0 ψ A 1. Using Doob's up-crossing inequality as in the proof of Lemma 13, we get that the probability of Ψ K going to 1 knowing B is less then 1 2 .We conclude by :

Non-convergence with fast cooling schedules
We are now ready to prove Theorem 2. In this section, Assumption 2 is enforced and we use the definitions and notations of Section 2.4.
We start with a result on the position of the process for small times, as well as a Doeblin-like condition, which will be proven in Section 3.2 : Lemma 19.For all t, δ > 0, write : Then, for all z 0 = (x 0 , y 0 ) with x 0 ∈ C(x, c), P z 0 (B t ) > 0, and more precisely for all compact set K included in the interior of C(x, c + δ) × R d , there exists ε > 0 such that where stands for the Lebesgue measure.
Proof of Theorem 2. By conditioning on the initial condition, it is sufficient to prove the result with a fixed initial condition z 0 = (x 0 , y 0 ) with x 0 ∈ C(x, c). Moreover it is sufficient to prove the result for δ > 0 small enough. We consider a fixed δ > 0 such that c a, which means (11) holds and a probability density and t b > 0 as in Lemma 18 applied with δ/2 instead of δ such that: In other words, denoting by f the law at time t b of the process solution to equation (10) with initial condition f K 0 and conditioned to {H K (Z t b ) c + δ/2}, and (Z K,t b ) t 0 the solution to equation (10) with initial condition f at time t b , Let B t b be as in Lemma 19. From Lemma 19 and the fact f has a bounded density on its support {H K c + δ/2} (which we see as a subset of [−L K , L K ] 2d ), there exists ε > 0 such that Finally, thanks to (11), we conclude that

Auxiliary results

Uniform energy bounds
This section is dedicated to the proof of Lemma 6.First, we consider a family of approximation functions η m ∈ C ∞ c for m 1 in order to justify some PDE computations below.Let Φ(s) = e Proposition 20.Assume U → +∞ at infinity and ∇U is bounded.
• There exists C > 0 such that L t η m Cβ t /m and |∇η m | Cβ t /m for all m 1.
Proof of Lemma 6.We first show the result when ∇U and its derivative are bounded, the general case being then obtained by approximating U .Hence, suppose for now that ∇U and its derivative are bounded.In this case, it can be shown that the law of the process at time t admits a bounded density f t such that , see the proof of Lemma 24.Define g t (z) = f t (z)e βtH(z) and u(t) = E (H(Z t )).Since f and U are smooth, so is g.Consider From the inequality ln(1 + x) ln(x) + 1/x for all x > 0, Since f t is bounded on [0, T ] × R 2d , and is in L 1 (dz) for all t 0, t → R 2d f t ln(f t ) is locally bounded and so is N (t).In order to differentiate N , we introduce for m 1 the approximation Integrating by parts, we see that the dual of the generator L t in L 2 (e −βtH ) is f t H = β t u(t).
Consider the carré du champ operator Γ t associated to L * t given by for smooth g 1 , g 2 , and Γ t (g) := Γ t (g, g).Using that e −βtH is invariant for L * t so that L * t ge −βtH = 0 for all smooth g and the diffusion property for any smooth φ, we get The first term is negative (since η m Γ t (g t ) 0) and the others are bounded as follows: thanks to Proposition 20.We conclude that for all m 1, 0 s t : By the monotone convergence theorem, N m (t) → N (t) as m → ∞, hence : On the other hand, from the variational formula for the entropy: we get with g 0 = e −α 0 H /Z α 0 , where Z α 0 = e −α 0 H , As a consequence, writing φ(t) = N (t) + ln(Z α 0 ) for t 0, Then Gronwall's lemma gives φ(t) φ(0) βt−α 0 β 0 −α 0 and using again that u(t) φ(t)/(β t −α 0 ) concludes the proof in the case where ∇U and all its derivatives are bounded.
Let us now consider the general case, without any assumption on $\nabla U$. For $n$ large enough, we fix some $U_n \in C^\infty(\mathbb{R}^d)$ equal to $U$ on $B(0, n)$, to $|x|$ outside of $B(0, n + 1)$, and such that for all $x \in \mathbb{R}^d$, $U_n(x) \geq \min(U(x), |x|) - 1$. Let $(Z^n_t)_{t \geq 0} = (X^n_t, Y^n_t)_{t \geq 0}$ be the diffusion defined by Equation (1) where $U$ is replaced by $U_n$ and starting from the same initial distribution $f_0$. By design, $Z_t$ and $Z^n_t$ are equal up to the time $\tau_n = \inf\{t \geq 0;\ |X_t| \geq n\}$. As $Z$ does not explode in finite time, $\lim_n \tau_n = \infty$ and, for all $t \geq 0$, $\lim_n H_n(Z^n_t) = H(Z_t)$ almost surely, where $H_n(x, y) = U_n(x) + |y|^2/2$. By Fatou's lemma, for all $t \geq 0$, where $\kappa_{\beta_0,n}(f_0)$ is defined as $\kappa_{\beta_0}(f_0)$ but with $U$ replaced by $U_n$. Now, since $f_0$ is compactly supported, $\kappa_{\beta_0,n}(f_0)$ is independent of $n$ for $n$ large enough. Finally, using that $e^{-\alpha_0 U_n(x)} \leq e^{\alpha_0}\left(e^{-\alpha_0 U(x)} + e^{-\alpha_0 |x|}\right)$, we can apply the dominated convergence theorem to get that $Z^n_{\alpha_0} \to Z_{\alpha_0}$ as $n \to \infty$, which concludes the proof.
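One possible way to build such a truncated potential is sketched below in dimension one. This is an illustrative construction, not necessarily the one intended by the proof: the helper names `smooth_step` and `truncate_potential` are hypothetical, and the interpolation by a smooth cutoff is an assumption. Since $U_n$ is a convex combination of $U$ and $|x|$, it satisfies $U_n \geq \min(U, |x|)$, which is stronger than the required lower bound.

```python
import math

def _h(s):
    """Smooth function vanishing, with all its derivatives, for s <= 0."""
    return math.exp(-1.0 / s) if s > 0.0 else 0.0

def smooth_step(s):
    """C-infinity step: equal to 0 for s <= 0 and to 1 for s >= 1."""
    return _h(s) / (_h(s) + _h(1.0 - s))

def truncate_potential(U, n):
    """Return U_n, equal to U on |x| <= n and to |x| on |x| >= n + 1 (d = 1).

    Assumes n >= 1, so that |x| is only used away from its kink at 0.
    """
    def U_n(x):
        w = smooth_step(abs(x) - n)  # weight of the linear-growth part
        return (1.0 - w) * U(x) + w * abs(x)
    return U_n

def double_well(x):
    """Hypothetical test potential with two wells."""
    return x ** 4 - 3.0 * x ** 2
```

For instance, `truncate_potential(double_well, 2)` coincides with the double well on $[-2, 2]$ and with $|x|$ outside $[-3, 3]$, so its gradient is bounded while the process started in a compact set is unchanged until it leaves $B(0, 2)$.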

Small time regularisation
Proof of Lemma 7. It is enough to show the first point for ∇U bounded.Indeed, for all Ũ equal to U on {U A + 1}, and ( Z) t 0 the corresponding process, we have the equality We only need a bound on sup t t * |Y t − y 0 | because : A with probability larger than 1 − ε.Thus, with probability at least 1 − ε, we have : .
Since A > 1, this is less than |y 0 | 2 + 1 2 and we write t For the second point, fix some t * t * 2 and let, for t t * , ( Xt , Ȳt ) be the solution of the system Ito's formula gives for t t * : Girsanov's theorem yields that, under the change of probability P → Q, ( Xt , Ȳt ) t t * is a solution to the original Equation ( 1) and, for all φ 0, t t * , It only remains to show that ( Xt , Ȳt ) has a density bounded by some e Cβ 0 .As a Gaussian process, the density of ( Xt , Ȳt ) is bounded by (2π) −d/2 / det(Q d ) where Q d is the covariance matrix of the process at time t * .By independence, det(Q d ) = (det(Q 1 )) d , and a straightforward computation yields Now, let , λ > 1 be such that λ 2 < 4/3 and take t * small enough (depending only on c and κ) so that uniformly in β 0 1 This choice ensures and thus the result.
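The factorisation $\det(Q_d) = (\det(Q_1))^d$ used above can be illustrated numerically. The sketch below relies on a toy model that is an assumption, since the auxiliary Gaussian system is not reproduced here: each coordinate pair follows $dX = Y\,dt$, $dY = \sqrt{2s}\,dB$ with a hypothetical noise intensity $s$ and a deterministic start, for which the covariance of $(X_t, Y_t)$ has entries $2st^3/3$, $st^2$ and $2st$. The function names are illustrative.

```python
def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result  # a row swap flips the sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for k in range(col, n):
                a[r][k] -= factor * a[col][k]
    return result

def covariance_1d(t, s=1.0):
    """Covariance of (X_t, Y_t) for the toy model dX = Y dt, dY = sqrt(2 s) dB."""
    return [[2.0 * s * t ** 3 / 3.0, s * t ** 2],
            [s * t ** 2, 2.0 * s * t]]

def covariance_d(t, d, s=1.0):
    """Block-diagonal covariance of d independent pairs ordered as (x_1, y_1, ..., x_d, y_d)."""
    q1 = covariance_1d(t, s)
    q = [[0.0] * (2 * d) for _ in range(2 * d)]
    for i in range(d):
        for r in range(2):
            for c in range(2):
                q[2 * i + r][2 * i + c] = q1[r][c]
    return q
```

In this toy model $\det(Q_1) = s^2 t^4/3$, so the determinant of the full covariance degenerates polynomially as $t \to 0$; this is the kind of small-time behaviour that has to be controlled when bounding the Gaussian density.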
Proof of Lemma 19. The first part follows from the controllability of the process, see [15, Proposition 5]. The second part follows from the fact that the distribution of the process killed when it leaves $C(x, c + \delta) \times \mathbb{R}^d$ solves a parabolic hypoelliptic Dirichlet problem, hence has a continuous positive density. In fact, in the time-homogeneous case, this is exactly [12, Theorem 2.20]. In our time-inhomogeneous case, we can proceed as follows. First, for some $A' > \max(|y_0|, A)$ large enough, consider $D$ the interior of $C(x, c + \delta) \times [-A', A']$ and $\tau = \inf\{s \geq 0,\ Z_s \notin D\}$. Then, for all $t > 0$, It is well-known that $p^D_t$ solves a parabolic equation with generator $L_t$ on $D$. Since $L_t$ is hypoelliptic, $p^D_t(z_0, \cdot)$ has a continuous density. Finally, the coefficients of $L_t$ being smooth and bounded on $D$, $p^D_t(z_0, \cdot)$ being not identically zero and the process being controllable, we can use the strong maximum principle of [18, Theorem 6.1] to deduce that $p^D_t(z_0, \cdot)$ cannot take the value 0 in $D$. As a continuous positive function, $p^D_t(z_0, \cdot)$ is thus lower bounded by a positive constant over any compact subset of $D$, which concludes the proof.

L 2 -hypocoercivity
In this section we use the definitions and notations of Section 2.2. The hypocoercivity issue arises in the proof of Proposition 10 when computing the evolution of the $L^2$-norm of $h^K_t$. In the standard elliptic case, as in [9], one would simply differentiate this quantity and conclude with a Poincaré inequality of the form: where $\Gamma$ is the carré du champ associated to the process. However, in the kinetic case, as we saw in (13), the carré du champ only involves the velocity gradient $\nabla_y$, which means such an inequality cannot hold since, for non-constant functions of $x$ only, the left hand side is positive while the right hand side vanishes. For this reason, we work with a modified norm as in [20].
More precisely, at a formal level, the proof of Proposition 10 is the following: writing Differentiating $\tilde N$, one can (formally) check that for some constant $C > 0$, the definition of $\tilde N$ being motivated by the $-\tilde I$ term in this inequality. Using a Poincaré inequality for $\mu^K_{\beta_t}$ (with the full gradient rather than $\Gamma_t$) with a constant $\lambda(\beta_t)$ that scales as $1/t^{c^*/c} \gg \beta_t' \beta_t$, we get that $\tilde N'(t) \leq 0$ for $t$ large enough (or equivalently for all $t \geq 0$ if $\beta_0$ is large enough), hence $\tilde N$ is bounded. We conclude the proof of Proposition 10 by bounding In the remainder of this section, this formal proof is made rigorous and we give the details of the computations.
Remark 2. The proof of [15] is based on a similar argument but, as mentioned in the introduction and as noticed by the authors of [4], it contains an error. The problem occurs in the rigorous justification of the differentiation of $\tilde N$. In [15], a compactly supported truncation function is added within the integral. This leads to additional terms in $\partial_t \tilde N$. One of these terms is said to be non-positive in [15, Lemma 16], which is false (there is a sign error). In [4], the authors add a small elliptic term to the dynamics, use elliptic regularity results to justify the computation, and then let the ellipticity parameter vanish. In this section, we give a corrected version of the argument of [15], combining some bounds on the density of the process (Lemma 24 below) and some moment estimates (Lemma 23 below). Also, notice that, by comparison with [15, 4], we have already reduced the problem to a compact state space for the position $x$: the non-compact part of the dynamics only concerns the velocity.
First, we need a few preliminary lemmas.We start by stating the following Poincaré inequality (with the full gradient) : and moreover lim For µ a probability measure, H 1 (µ) denotes the usual Sobolev space of functions in L 2 (µ) with derivative in L 2 (µ).

Conclusion follows by integrating with respect to x.
The next two lemmas will be used in forthcoming computations to justify that some quantities are finite, which allows us to interchange differentiation and integration.

Lemma 23. Fix some $K > 0$ and any $c > 0$. Then for all $\alpha > 0$ and initial condition is finite and locally bounded.
Proof.Write τ N = inf t 0, H K (Z K t ) N and φ t (x, y) = e (βt−α)H K (x,y) .For β 0 > α, we compute for t ∈ [0, T ] : We can choose β 0 great enough so that βt 0 for all t 0. We then classically get for t ∈ [0, T ] : The fact that f K 0 has a compact support and Fatou's lemma then yield the result.
Lemma 24.Fix K, c > 0. Then there exists b K such that if β 0 b K , the law f K t of the process defined by the Equation (7), with an initial condition f K 0 ∈ C ∞ with compact support, is smooth, bounded along with its derivative, and satisfies : and t → C t is locally bounded.
Proof. First, by Itô's formula, $f^K_t(x, y)$ is a weak measure solution of the forward equation where $\tilde L_t$ is the generator of the process $(\tilde X, \tilde Y)$ solving Using again Itô's formula, we get that the function given by (x, y, t) → e d t 0 γsds E x,y f K also solves equation (15). Uniqueness of the weak solution of (15) is ensured by [3, Theorem 9.8.7], hence from which we immediately get that $f^K_t$ is bounded uniformly over $[0, T]$ for all $T > 0$. Thanks to [11, Theorem 1], $f^K_t$ is smooth for all $t \geq 0$. We can always differentiate (15) in a weak sense (i.e. once integrated with respect to a smooth compactly supported function of time and space and formally integrating by parts), from which we get that $\nabla f^K_t$ is a weak solution of where $\tilde L_t$ acts component-wise on $\nabla f^K_t$ and which is bounded uniformly over $[0, T]$ for all $T > 0$. For a fixed $T > 0$, considering $(W_t)_{t \geq 0}$ the matrix-valued process solution of $dW_t = W_t J_{T-t}(\tilde X_t, \tilde Y_t)\,dt$ with $W_0 = I_d$ and using again the uniqueness of the weak solution of the PDE, we get the Feynman-Kac representation which can be obtained by Itô's formula for the process $(\tilde X_t, \tilde Y_t, W_t)_{t \geq 0}$ applied to the test function $h(x, y, w, t) = w \nabla f^K_t(x, y)$. Hence, $\nabla f^K_t$ is uniformly bounded over $[0, T]$, and the same argument applies to all derivatives of $f^K_t$.

Now, let us prove the Gaussian bound of the statement, starting with $f^K_t$. For 2 . Then for any $z = (x_1, y_1) \in \mathbb{R}^{2d}$: where $\omega_{2d}$ is the volume of the unit-sphere in dimension $2d$. Then there are two possibilities. If In any case, using the fact that $t \mapsto \mathbb{E}\left(e^{H_K(Z^K_t)/2}\right)$ is locally bounded for $\beta_0$ large enough, we have the result for $f^K_t$ with $\alpha = \frac{1}{4(d+1)}$. Recall the definition of the approximation functions $\eta_m$ from Proposition 20. We use them here with $U$ replaced by $U_K$ (or equivalently with $H$ replaced by $H_K$), but it does not change any of their properties since now $x$ is in a compact set. We now turn to $\nabla f^K_t$ by using the approximation functions $\eta_m$ from Section 3.1 and integrating by parts: for some constant $C > 0$ independent of $m$, where we used uniform bounds on the first two derivatives of $f^K_t$. We can then conclude in the same way as for $f^K_t$.
The main technical tool that we will need in order to study the evolution of those functionals will be, given $\phi : C^\infty \to C^\infty$, quantities of the form: where $D_h(\phi)$ is the pointwise differential of $\phi$, see [16]. The reason is that for regular enough $h$, by writing $L^K_{\beta,\gamma}$ the generator (6) on $M_K$ with fixed $\beta$ and $\gamma$, we have This kind of quantity has been studied in [16], where the author showed among other things: where Γ((

Proof. This is [16, Example 3]. Let us simply recall the key idea (to alleviate notations we omit the generator in the subscript of $\Gamma$). First, $\Gamma$ is linear in $\phi$, hence for $\phi$ as defined above: where $\Gamma$ is the classical carré du champ $\Gamma(h) = \gamma_t \beta_t^{-1} |\nabla_y h|^2$. We then have the following equality: where the brackets denote the commutator of two operators. An elementary computation concludes.
Proof of Proposition 10. First, $\beta_0$ must be large enough so that we can apply all previous lemmas of Section 4. Let us first justify why $\tilde N_m$, $\tilde N$, $\tilde I_m$ and $\tilde I$ are finite. They are all finite at time 0 because $f_0$ has compact support. For $m \in \mathbb{N}$, $\tilde N_m$ and $\tilde I_m$ are finite for all times as integrals of continuous functions with compact support.
Write ∇ * t = −∇+β t ∇H K the dual of the gradient operator ∇ in L 2 µ K βt .Integrating by parts, for some α > 0 and C t > 0 locally bounded and independent from m, using Lemma 24 and that the derivatives of f K t are bounded.Lemma 23 and the monotonous convergence of Ĩm towards Ĩ then implies that Ĩ is locally bounded.We conclude that Ñ is also locally bounded thanks to the Poincaré inequality of Proposition 21.As in (12), we have that the dual in L 2 (µ K β ) of the generator L t is With the regularity result from Lemma 24, and the compactly supported approximation functions η m , one can differentiate N m and I m to get for all m: From Lemma 25, we get Γ φ β,γ (h) 1 2 |∇h| 2 + Γ t ((∇ x + ∇ y )h), and since Cβt m , we get : Now we look at the derivative in β, knowing that ∂ β Z 0 by writting : This gives We now have to treat all those terms.We will use that N m N .First, since U K is bounded, for some C > 0 (in the rest of the proof we denote by C several constants which do not depend on t nor m).From Proposition 22, applied with g = h K t , we get : Again, because U K is bounded : Using again Proposition 22 we get : Finally, using that a.b Z β , we get : and we conclude as the second term.Finally, since γ only appears in the definition of σ, we have : From the previous computation, and from the fact that γ t Lβ t , we get : By integration, we get for all 0 s t: We consider β 0 great enough so that 1 − Cβ t β t 0 for all t.Using monotonous convergence, the fact that Ĩ Ĩm , and Fatou's lemma we get : Now, from the Poincaré inequality of Proposition 21 and the definition of σ, for some λ satisfying 1 β ln( λ) → −c * , so that we can find a λ 0 such that where α = (c−c * )/(2c).Taking β 0 large enough so that Cβ t (1+β 2 t ) λ 0 /4(e cβ 0 +t) −1+α , we have finally obtained that, if β 0 bK for some bK , then for all 0 s t and this concludes (as explained at the beginning of this section).

H k -hypocoercivity
In all this section, whose goal is to prove Proposition 16, we use the definitions and notations of Section 2.4.In particular, f K t stands for the law of the process defined by Equation (10), µ K β = e −βH K (z) dz/Z K β where Z K β makes µ K β a probability density on M K , and h K t = f K t /µ K βt .Similarly to the previous section, according to [19], f K t is a smooth function and solves but here the dual in L 2 (µ βt ) of the generator of the process is where the dual are taken in L 2 (µ K β ): Indeed, the additional diffusive part in x has been designed to be reversible.In order to prove Proposition 16, we introduce for m ∈ N the classical H m -Sobolev norms on The general strategy is similar to the L 2 case of Section 4, namely we will prove that h K t − 1 H m goes to zero for all m, and conclude by a Sobolev embedding for m large enough.The constant in the Sobolev embedding depends on the time t which should be compensated by the fact h K t − 1 H m goes to zero fast enough.As in L 2 case, we need to introduce some modified Sobolev norm to deal with the lack of dissipativity in the x variable in some part of the space.Following [21], for m ∈ N, we consider a modified H m -Sobolev norm of the form for some weights ω to be fixed later on.Recall the definition If y ∈ M, then we obtain ∇ x in the derivative of and for y / ∈ M, it comes from the σ(y)∇ * x ∇ x part of L K, * t .Here we used the notations We also define Ñm (t) = N m (t, β t ).Since the process is now in a compact set, we do not need the approximation functions η m , and the subscript here indicates the order of the Sobolev norm in contrast to the previous section.Besides, in Section 4, we had to keep track of the dependency of the constant in β 0 , as some uniformity in time was necessary in Proposition 13 for the renewal argument of the proof of Proposition 14.This is no longer the case here.
In order to study the evolution of $N_m$, we first need some commutation results and a control over the derivative of the $L^2$-norm of $\nabla^{m_1}_x \nabla^{m_2}_y h$, as in Lemma 25. Here, $\nabla^{m_1}_x \nabla^{m_2}_y h$ denotes the vector of all derivatives of $h$ of order $m_1$ in $x$ and $m_2$ in $y$: The method used here is adapted from [21] to take into account the time-inhomogeneity and the new dynamics. Recall the notation (16) for generalized $\Gamma$ operators.
Lemma 26.Let m 1 , m 2 ∈ N, then there exists some constant θ > 0 such that for all h ∈ C ∞ M 2 K : Proof.As in [15, Lemma 10], we have for smooth h : Then first : Then, we can write Then, using Cauchy-Schwarz inequality, Putting those last two lines together and using Cauchy-Schwarz we get : This concludes the proof.
Lemma 27.For all m ∈ N there exists some constant θ > 0 such that for all smooth h, denoting Proof.The derivative of P m is given by : We then have to study the two terms in the integral separately at first, by using commutators.For any smooth h : where we used that |∇σ∇U K | 1 2 .Similarly, we have : We conclude with the fact that : In order to state the main lemma, we introduce the following set : where P stands for polynomial.The main step in order to prove an analogous to Proposition 10 in higher Sobolev norms is then the following dissipation result.
Lemma 28.For all m 1, there exist q m , r m , (ω i,m ) i m and ω m some functions of β, all in P, such that if and : Proof.The proof is by induction.In fact we start at m = 0, setting simply Hence, which is the result for m = 0 and r 0 (β t ) = β −1 t min(1, κ).One could also have initialized the induction for m = 1 as in Proposition 10.Now, let's fix m ∈ N and suppose we have the result for m − 1 : We set with weights to be determined later on, so that In this equality, using the two previous lemmas, the terms of order m + 1 are bounded by: In order to get (18), we would like this to be less than −r m (β t ) Using that |∇σ| 2 σ, this is indeed the case if we impose the conditions ω m,m (β) 4βω m (β) + βω m 1 ,m and ω i,m (β) r m β max(κ −1 , 1 2 ).Next, the terms of order at most m − 1 are bounded by : which means, similarly, in order to get (18), we want to impose the conditions max 0 i m ω i,m θ(1+ γ t ) , ωm 2 , β −1 min(κ, 2)ω i,m ).These choices ensures that all the conditions in the computations above are met, which means that (18) holds.Moreover, all these functions are in P, which concludes.
Similarly to Proposition 21 but now in the fully (position and velocity) compact case, we have the following Poincaré inequality: Proposition 29.If U K : M K → R is a C ∞ function, W is as constructed in Section 2.4, and µ K β is the probability measure on M K × M K proportional to e −βH K , then there exists λ : R + → R + such that for all f ∈ C ∞ (M K × M K ) : We are now ready to prove Proposition 16.
From the Sobolev embedding, there exists C > 0 such that for any smooth h on . Then we have : .
Applying this with $h = h^K_t$ and using that $\|h^K_t - 1\|_{H^m(\mu^K_{\beta_t})} \leq C \beta_t^a \tilde N_m(t)$ for some $a, C > 0$, we get constants $C, b > 0$ such that: yielding the result.

Faster than logarithmic schedules
Proof of Theorem 4. Some parts of the proof follow the proof of Theorem 2, to which we refer for details, focusing on the new arguments in the present settings.
Up to a translation, we assume without loss of generality that $x^* = 0$. As in the proof of Theorem 2, it is in fact sufficient to prove that for all $\delta > 0$ for a given fixed initial condition $f_0$ and for $t_0$ large enough, and then use a controllability argument based on Lemma 19 and the Markov property to get the claimed result. Besides, it is sufficient to prove this result for $\delta$ small enough. Since the times in $[0, t_0]$ are treated with the controllability argument, we focus on the times larger than $t_0$: we are interested in an event under which the process stays in a small ball around 0, and we can thus modify $U$ outside such a ball without modifying the result.
Using that the law of Zt is Gaussian with zero mean, we get that there exist $K, h > 0$ such that for all $t \geq 0$ This is integrable in time if we choose $\alpha_t^2 = \sup_{s \geq t} 2 \ln(s) \kappa_s / h$ (which is non-increasing). Since $\ln(t) = o(\beta_t)$ and $\kappa_t \leq \sup_{s \geq 2t/3} 1/(r\beta_s) + e^{-rt/3}\left(1 + t \sup_{u \geq 0} 1/\beta_u\right)$, we get that $\kappa_t = o(1/\ln(t))$, in other words $\alpha_t \to 0$. We conclude by using the previous bounds on $\alpha_t$ and $\kappa_t$ and the fact that

Remark: in order to adapt the previous proof to a case where $\gamma_t$ is not constant, we can still find $M$ and $r$ such that (19) holds with the Jacobian matrix of the drift at time $t$, but they depend on $\gamma_t$; hence, when differentiating Z t − Zt 2 M, there is an additional term involving $\partial_t M$ which has to be sufficiently small to be absorbed by the contraction at rate $r$. Moreover, $r$ scales as $\gamma_t$ when $\gamma_t \to 0$ and as $1/\gamma_t$ when $\gamma_t \to +\infty$, which means that, for the proof to be valid, $\gamma_t$ or $1/\gamma_t$ should not be too small depending on $\beta_t$.

Figure 1: Case with two local non-global minima x and y at the same energy level. If c < b, then Assumption 2 holds by choosing either x = x or x = y and any a ∈ (c, b). If c ∈ (b, D(x)), again Assumption 2 holds with x ∈ {x, y} and any a ∈ (b, D(x)). However, if c = b, Assumption 2 does not hold with x ∈ {x, y}.

Figure 2: Construction of U K from U , with x chosen so that c * (U ) = D(x).
and such that, seen as a periodic function on R d , U K (x) = U (x) − U (x) for all x ∈ C(x, a).Such a function exists: indeed, since c is continuous, C(x, a) is a compact set, and then, under Assumption 2, sup{c(y, x), y ∈ C(x, a)} < c.We can thus choose δ ∈ (0, a − c) small enough so that sup{c(y, x):y ∈ C(x, a)} + 2δ < c and, using that the boundary of C(x, a) is in {U be obtained by a straight line from y to some y * ∈ C(x, a) with c(y, y * ) 2δ and then a path to x, where Assumption 2 is used.See Figure 2 for an illustration of the construction of U K .Concerning the new potential W for the velocity, first, write n = √ 2K + 1 and m = √ 2K + 2.Then, fix some 2L K -periodic W 1 ∈ C ∞ (R, R) such that for all s ∈ [−m, m], W 1 (s) = s 2 /2, W 1 is symmetric on [−L K , L K ], and increasing on [0, L K ].For y and notice that, by construction, for y ∈ M K \ M, σ(y) = σ * .Moreover, if |y| 2 /2 K then W (y) = |y| 2 /2 and σ(y) = 0.

− 1 s
to get the quantitative convergence speed ε t stated in the theorem.
M t∧σ is a bounded local martingale, hence a martingale.Moreover, since Φ A takes values in [0, 1], M t = M t∧σ for all t 0 on E ∩ B. By Doob's up-crossing inequality, for any T > 0, the probability that M t up-crosses [1/10, 9/10] before time T is bounded by βtH , g t solves ∂ t g t = β t Hg t + L * t g t .Using that η m is compactly supported for all m 1, we can differentiate N m (t) = m − β t Hg t ln(1 + g t )e −βtH + ∂ t g t ln(1 + g t )e −βtH + g t 1 + g t ∂ t g t e −βtH = R 2d η m L * t g t ln(1 + g t ) + t 1 + g t β t He −βtH .