Large deviations of empirical measures of diffusions in weighted topologies

We consider large deviations of empirical measures of diffusion processes. In a first part, we present conditions to obtain a large deviations principle (LDP) for a precise class of unbounded functions. This provides an analogue to the standard Cramér condition in the context of diffusion processes, which turns out to be related to a spectral gap condition for a Witten–Schrödinger operator. Secondly, we study more precisely the properties of the Donsker–Varadhan rate functional associated with the LDP. We revisit and generalize some standard duality results as well as a more original decomposition of the rate functional with respect to the symmetric and antisymmetric parts of the dynamics. Finally, we apply our results to overdamped and underdamped Langevin dynamics, showing the applicability of our framework for degenerate diffusions in unbounded configuration spaces.


Introduction
Empirical averages of diffusion processes and their convergence are commonly studied in statistical mechanics, probability theory and machine learning. In statistical physics, an observable averaged along the trajectory of a diffusion typically converges to the expectation with respect to its stationary distribution, which provides some macroscopic information on the system [74,84]. For reversible dynamics, this convergence is known to be characterized by an entropy functional [106,7], which generalizes results for small fluctuations such as the central limit theorem [75] or Berry-Esseen type inequalities [91]. It has been shown for some time that the approach can be extended to [...] derive a variational representation of the rate function similar to the Donsker-Varadhan formula [33]. This provides a variational representation of the principal eigenvalue for any non-symmetric linear second order differential operator associated with a diffusion, under confinement and regularity conditions. To the best of our knowledge, there is no such formula in an unbounded setting, a fortiori for unbounded functions. Finally, it has been shown in a pioneering work [15], for a specific choice of dynamics, that the above mentioned duality makes it possible to decompose the rate function into two parts: one corresponding to a "reversible" part and the other to an "irreversible" part of the dynamics. We extend these results to general diffusions by using Sobolev seminorms, a feature inspired by the small fluctuations framework developed in [75]. This decomposition turns out to be useful for various purposes. For illustration, we apply it to study more precisely the rate function of the Langevin dynamics, in particular its dependence on the friction both in the Hamiltonian and overdamped limits.
We now sketch the main results of the paper, the precise setting being presented in Section 2.1.

Main results
Consider a diffusion process $(X_t)_{t\geq 0}$ over a state space $\mathcal{X}\subset\mathbb{R}^d$ with generator $\mathcal{L}$, invariant probability measure $\mu$, and empirical measure
$$L_t := \frac{1}{t}\int_0^t \delta_{X_s}\,ds,\qquad t\geq 0,\qquad (1.1)$$
where $\delta_x$ is the Dirac measure at $x\in\mathcal{X}$.
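To make the empirical measure concrete, the following minimal sketch approximates $L_t(f)=\frac{1}{t}\int_0^t f(X_s)\,ds$ for a one-dimensional overdamped Langevin dynamics $dX_t=-V'(X_t)\,dt+\sqrt{2}\,dB_t$, using an Euler-Maruyama discretization. The discretization scheme, the function name `empirical_average`, the time step and the quadratic potential used in the usage note are illustrative assumptions, not part of the paper.

```python
import math
import random


def empirical_average(f, grad_V, x0=0.0, t_final=200.0, dt=1e-2, seed=0):
    """Approximate L_t(f) = (1/t) * int_0^t f(X_s) ds along one trajectory of
    dX = -V'(X) dt + sqrt(2) dB, discretized with Euler-Maruyama (assumed scheme)."""
    rng = random.Random(seed)
    n_steps = int(round(t_final / dt))
    x = x0
    running_sum = 0.0
    for _ in range(n_steps):
        # Left-point Riemann sum of the time integral of f along the path.
        running_sum += f(x) * dt
        # One Euler-Maruyama step: drift -V'(x), noise of variance 2 * dt.
        x += -grad_V(x) * dt + math.sqrt(2.0 * dt) * rng.gauss(0.0, 1.0)
    return running_sum / (n_steps * dt)
```

For the hypothetical choice $V(x)=x^2/2$ the invariant measure is the standard Gaussian, so `empirical_average(lambda x: x * x, lambda x: x)` should be close to 1 for large `t_final`, up to statistical and discretization errors. The large deviations principle quantifies how unlikely it is for such an average to stay away from its limit.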
Our first contribution is to prove a large deviations principle for the empirical measure $(L_t)_{t\geq 0}$ in a weak topology associated with an unbounded function $\kappa:\mathcal{X}\to[1,+\infty)$. That is, we prove the following type of long time scaling: for $\Gamma\subset\mathcal{P}(\mathcal{X})$,
$$\mathbb{P}\big(L_t\in\Gamma\big)\asymp e^{-t\inf_{\nu\in\Gamma}I(\nu)},\qquad (1.2)$$
where $I$ is a rate function. Here, $\mathcal{P}(\mathcal{X})$ denotes the set of probability measures on $\mathcal{X}$, and the above scaling holds for the weak topology on $\mathcal{P}(\mathcal{X})$ associated with measurable functions $f$ satisfying
$$\sup_{x\in\mathcal{X}}\frac{|f(x)|}{\kappa(x)}<+\infty.\qquad (1.3)$$
As is standard for LDPs on unbounded state spaces [109,115], our result relies on the existence of a twice differentiable Lyapunov function $W:\mathcal{X}\to[1,+\infty)$ such that
$$\Psi:=-\frac{\mathcal{L}W}{W}\qquad (1.4)$$
has compact level sets (in other words, it goes to infinity at infinity). In previous works, this condition implies the asymptotic equivalence (1.2) in the weak topology corresponding to the convergence of measures tested against bounded test functions [109,39,115]. In contrast, we show in Section 2 that the LDP holds for the weak topology associated with any cost function $\kappa$ controlled by $\Psi$ (see Section 2.1 for details). Moreover, the associated rate function $I:\mathcal{P}(\mathcal{X})\to[0,+\infty]$, also called entropy, reads
$$\forall\,\nu\in\mathcal{P}(\mathcal{X}),\qquad I(\nu)=\sup_{f}\big\{\nu(f)-\lambda(f)\big\},$$
where the supremum runs over the functions $f$ satisfying (1.3), and where
$$\lambda(f)=\lim_{t\to+\infty}\frac{1}{t}\log\mathbb{E}\Big[e^{\int_0^t f(X_s)\,ds}\Big]\qquad (1.5)$$
is the cumulant or free energy function.
We mention that our strategy relies on the Gärtner-Ellis theorem, according to which the existence and regularity of (1.5) implies the large deviations principle. We actually show that (1.5) is well-defined because it matches the principal eigenvalue of the Feynman-Kac operator
$$\big(P^f_t\varphi\big)(x)=\mathbb{E}_x\Big[\varphi(X_t)\,e^{\int_0^t f(X_s)\,ds}\Big].\qquad (1.6)$$
A key remark for defining the above operator is that the process
$$M_t=W(X_t)\,e^{-\int_0^t\frac{\mathcal{L}W}{W}(X_s)\,ds}\qquad (1.7)$$
is a local martingale, as noted by Wu in [115]. This makes it possible to define (1.6) for functions $\varphi$ such that $\|\varphi\|_{B^\infty_W}<+\infty$, as soon as $f$ is dominated by the function $\Psi$ defined in (1.4). As a result, for any such $f$, the operator (1.6) can be shown to be compact over the space of functions controlled by $W$ (see [55,47]), and the functional (1.5) is obtained as the largest eigenvalue of the operator (1.6) through a generalized Perron-Frobenius theorem (the Krein-Rutman theorem [27]).
The second part of our work consists in rewriting the rate function $I$. For this, we first show that
$$\forall\,\nu\in\mathcal{P}(\mathcal{X}),\qquad I(\nu)=\sup\left\{-\int_{\mathcal{X}}\frac{\mathcal{L}u}{u}\,d\nu,\ u\in\mathcal{D}^+(\mathcal{L})\right\},\qquad (1.8)$$
where $\mathcal{D}^+(\mathcal{L})$ is an appropriate domain defined in Section 3. This formula is similar to the one proved in [33], but differs by additional growth conditions in the definition of $\mathcal{D}^+(\mathcal{L})$. This result leads to a variational formula for the largest eigenvalue $\lambda(f)$ of the operator $P^f_t$ defined on a suitable functional space (see Corollary 3.2). We mention that the proof of (1.8) relies on the spectral problem associated with the Feynman-Kac operator (1.6), and uses tools from the recent work [47]. Finally, the variational representation (1.8) makes it possible to generalize the results of [15] by splitting $I$ into two parts. More specifically, denoting by $\mathcal{L}=\mathcal{L}_S+\mathcal{L}_A$ the decomposition into symmetric and antisymmetric parts of the generator considered on $L^2(\mu)$, we obtain, for any $\nu\ll\mu$, a decomposition of $I(\nu)$ into a contribution of the symmetric part and a contribution of the antisymmetric part of the dynamics (see Theorem 3.3).

Large deviations principle

Setting
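This section introduces the main notation used throughout the paper. We consider a diffusion process $(X_t)_{t\geq 0}$ evolving in $\mathcal{X}=\mathbb{R}^d$ with $d\in\mathbb{N}\setminus\{0\}$, and satisfying the following stochastic differential equation (SDE):
$$dX_t=b(X_t)\,dt+\sigma(X_t)\,dB_t,\qquad (2.1)$$
where $b:\mathcal{X}\to\mathbb{R}^d$, $\sigma:\mathcal{X}\to\mathbb{R}^{d\times m}$ and $(B_t)_{t\geq 0}$ is an $m$-dimensional Brownian motion. The generator of the dynamics (2.1), denoted by $\mathcal{L}$, reads
$$\mathcal{L}=b\cdot\nabla+S:\nabla^2,\qquad S=\frac{\sigma\sigma^T}{2},\qquad (2.2)$$
where $\sigma^T$ denotes the transpose of the matrix $\sigma$ and $\cdot$ is the scalar product on $\mathbb{R}^d$. Moreover, $\nabla^2$ stands for the Hessian matrix, and for two matrices $A,B$ belonging to $\mathbb{R}^{d\times d}$, we write $A:B=\mathrm{Tr}(A^TB)$. The conditions on $b$ and $\sigma$ will be made precise in Section 2.2. The function $S$ takes values in the set of symmetric positive matrices (not necessarily definite). We also introduce the carré du champ operator [5] associated with $\mathcal{L}$ defined by, for two regular functions $\varphi,\psi$:
$$\mathscr{C}(\varphi,\psi)=\frac{1}{2}\big(\mathcal{L}(\varphi\psi)-\varphi\,\mathcal{L}\psi-\psi\,\mathcal{L}\varphi\big)=\nabla\varphi\cdot S\nabla\psi.$$
We will use the space $C^\infty_c(\mathcal{X})$ (resp. $C_b(\mathcal{X})$) of smooth functions with compact support (resp. continuous and bounded functions), as well as the space $\mathscr{S}$ of smooth functions growing at most polynomially and whose derivatives also grow at most polynomially, that is, the functions $\varphi\in C^\infty(\mathcal{X})$ such that, for any multi-index $\alpha$, there exist $C>0$ and $N\in\mathbb{N}$ with $|\partial^\alpha\varphi(x)|\leq C(1+|x|^2)^N$ for all $x\in\mathcal{X}$, where $\partial^\alpha=\partial^{\alpha_1}_{x_1}\cdots\partial^{\alpha_d}_{x_d}$ with $\alpha=(\alpha_1,\ldots,\alpha_d)$. The space of bounded measurable functions, denoted by $B^\infty(\mathcal{X})$, is endowed with the norm $\|\varphi\|_{B^\infty}=\sup_{x\in\mathcal{X}}|\varphi(x)|$.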

Remark 2.2.
Note that the spaces (2.4) and (2.5) are defined for an arbitrary measurable function $W\geq 1$. It is possible to weaken the assumption $W\geq 1$ but we will not need these refinements in this paper.
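We denote by $\tau$-topology the weak topology on $\mathcal{P}(\mathcal{X})$ associated with the convergence of measures tested against functions belonging to $B^\infty(\mathcal{X})$ (we may also use the notation $\sigma(\mathcal{P}(\mathcal{X}),B^\infty)$); see [30]. This means that for a sequence $(\nu_n)_{n\in\mathbb{N}}$ in $\mathcal{P}(\mathcal{X})$, $\nu_n\to\nu$ in the $\tau$-topology if $\nu_n(\varphi)\to\nu(\varphi)$ for any $\varphi\in B^\infty(\mathcal{X})$. Recall that the $\tau$-topology is stronger than the usual weak topology $\sigma(\mathcal{P}(\mathcal{X}),C_b(\mathcal{X}))$ on $\mathcal{P}(\mathcal{X})$, which corresponds to the convergence $\nu_n(\varphi)\to\nu(\varphi)$ for any $\varphi\in C_b(\mathcal{X})$. The $\tau$-topology can be extended to account for convergence of measures tested against the larger class of functions $\varphi\in B^\infty_W(\mathcal{X})$. We denote by $\tau^W$ the associated topology $\sigma(\mathcal{P}_W(\mathcal{X}),B^\infty_W(\mathcal{X}))$, see [115,76].

We associate to the dynamics $(X_t)_{t\geq 0}$ the semigroup $(P_t)_{t\geq 0}$ defined by
$$(P_t\varphi)(x)=\mathbb{E}\big[\varphi\big(X_t^{(x)}\big)\big],\qquad (2.7)$$
where $X^{(x)}_t$ denotes the process started at $x\in\mathcal{X}$ and $\mathbb{E}$ stands for the expectation with respect to all realizations of the Brownian motion in (2.1). Let us mention that, with some abuse of notation but for the sake of readability, we will not write out explicitly the dependence of $X_t$ on $x$ in the proofs presented in Section 6, see the discussion at the beginning of that section. We say that a probability measure $\mu$ is invariant for the dynamics when $\mu(P_t\varphi)=\mu(\varphi)$ for any $t\geq 0$ and any $\varphi\in B^\infty(\mathcal{X})$. This implies in particular that $\mu(\mathcal{L}\varphi)=0$ for $\varphi\in C^\infty_c(\mathcal{X})$, see [46, Proposition 9.2].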
We now follow the path of [75,Chapter 2] for defining other useful functional spaces.
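For any probability measure $\mu\in\mathcal{P}(\mathcal{X})$, let
$$L^2(\mu)=\left\{\varphi\ \text{measurable}\ \middle|\ \int_{\mathcal{X}}|\varphi|^2\,d\mu<+\infty\right\}.\qquad (2.8)$$
For $\varphi\in C^\infty_c(\mathcal{X})$, we introduce the seminorm
$$|\varphi|^2_{H^1(\mu)}=\int_{\mathcal{X}}\mathscr{C}(\varphi,\varphi)\,d\mu=\int_{\mathcal{X}}\nabla\varphi\cdot S\nabla\varphi\,d\mu\qquad (2.9)$$
and the equivalence relation $\sim_1$ through: $\varphi\sim_1\psi$ if and only if $|\varphi-\psi|_{H^1(\mu)}=0$. We denote by $H^1(\mu)$ the closure of $C^\infty_c(\mathcal{X})$ quotiented by $\sim_1$ for the norm $|\cdot|_{H^1(\mu)}$. Note that $H^1(\mu)$ and $L^2(\mu)$ are not subspaces of each other in general, but $H^1(\mu)\subset L^2(\mu)$ for instance if $\mu$ satisfies a Poincaré inequality and $S$ is positive definite. The difference between $L^2(\mu)$ and $H^1(\mu)$ is however important for degenerate dynamics, see the application in Section 4.2. We now construct a space $H^{-1}(\mu)$ dual to $H^1(\mu)$ with the same density argument by introducing the seminorm: for $\varphi\in C^\infty_c(\mathcal{X})$,
$$|\varphi|^2_{H^{-1}(\mu)}=\sup_{\psi\in C^\infty_c(\mathcal{X})}\left\{2\int_{\mathcal{X}}\varphi\,\psi\,d\mu-|\psi|^2_{H^1(\mu)}\right\}.$$
In particular, when $S=\mathrm{Id}$ we have
$$|\varphi|^2_{H^1(\mu)}=\int_{\mathcal{X}}|\nabla\varphi|^2\,d\mu.$$
In this case, $|\cdot|_{H^1(\mu)}$ is the standard $H^1(\mu)$ Sobolev seminorm [83]. An in-depth discussion on the space $H^1(\mu)$ and its use for proving central limit theorems for Markov processes is provided in [75, Chapter 2].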
We also introduce some notation concerning the growth of functions. A function $f:\mathcal{X}\to\mathbb{R}$ is said to have compact level sets if for any $M\in\mathbb{R}$ the set $\{x\in\mathcal{X}\,|\,f(x)\leq M\}$ is compact (with the convention that $\emptyset$ is compact). A function $g$ is said to be negligible with respect to $f$ (denoted by $g\ll f$) if $f/g$ has compact level sets, and $g$ is said to be equivalent to $f$ (denoted by $g\sim f$) if there exist constants $c,c'>0$ and $R,R'\in\mathbb{R}$ such that $c\,f-R\leq g\leq c'f+R'$.

Remark 2.4.
The above definitions are useful when the state space $\mathcal{X}$ is unbounded. A sufficient condition for $f$ to have compact level sets in this case is for this function to be lower semicontinuous and to go to infinity at infinity (i.e. to be coercive). If $\mathcal{X}$ were bounded, all these criteria would be automatically met for smooth functions.
Finally, we denote by $\underline{\lim}$ and $\overline{\lim}$ the inferior and superior limits respectively, while for a subset $A\subset\mathcal{Y}$ of a topological space $\mathcal{Y}$, $\mathring{A}$ and $\bar{A}$ denote the interior and closure of $A$ for the chosen topology on $\mathcal{Y}$. The function $\mathbb{1}_A$ denotes the indicator function of the set $A$, i.e. $\mathbb{1}_A(x)=1$ if $x\in A$ and $\mathbb{1}_A(x)=0$ otherwise. For a Banach space $E$, $\mathcal{B}(E)$ refers to the Banach space of bounded linear operators over $E$ with the usual norm. We recall some elements of large deviations theory in Appendix A for the reader's convenience.

Statement of the main results
The large deviations principle relies on three standard assumptions: hypoellipticity of the generator, irreducibility of the dynamics, and a Lyapunov condition.
We start with our hypoellipticity assumption (which could certainly be relaxed for particular applications, see for instance [115]). It will be useful for proving regularity of the Feynman-Kac semigroup in Lemma 6.4. We denote by $A^\dagger$ the adjoint of a (closed) operator $A$ considered on $L^2(dx)$.

Assumption 2.5 (Hypoellipticity). The functions $b$ and $\sigma$ in (2.1) belong to $\mathscr{S}^d$ and $\mathscr{S}^{d\times m}$, respectively, and the generator $\mathcal{L}$ defined in (2.2) satisfies the hypoelliptic Hörmander condition. More precisely, $\mathcal{L}$ can be written as
$$\mathcal{L}=-\sum_{i=1}^d A_i^\dagger A_i+A_0,$$
where $(A_i)_{i=0}^d$ are first order differential operators with coefficients belonging to $\mathscr{S}$ such that the family
$$\{A_i\}_{i=1}^d,\quad\{[A_i,A_j]\}_{i,j=0}^d,\quad\{[[A_i,A_j],A_k]\}_{i,j,k=0}^d,\ \ldots$$
spans $\mathbb{R}^d$ at any $x\in\mathcal{X}$ for a finite number of commutators $n_x\in\mathbb{N}$.
This assumption is natural in practical situations, as illustrated in the applications of Section 4 covering elliptic and hypoelliptic diffusions, see [65,43,97] for details.
Note that excluding the operator $A_0$ from the first family means that, if $\mathcal{L}$ satisfies Assumption 2.5, $\partial_t+\mathcal{L}$ is hypoelliptic and the transition kernel of $(X_t)_{t\geq 0}$ has a smooth density for any $t>0$.
The regularity requirement comes together with a controllability condition (recall that $\sigma$ takes values in $\mathbb{R}^{d\times m}$).

Assumption 2.6 (Controllability). For any $x,y\in\mathcal{X}$ and $T>0$, there exists a control $u\in C^0([0,T],\mathbb{R}^m)$ such that the path $\phi\in C^0([0,T],\mathcal{X})$ satisfying
$$\phi(0)=x,\qquad\dot{\phi}(t)=b(\phi(t))+\sigma(\phi(t))\,u(t)$$
is well-defined and satisfies $\phi(T)=y$.

Assumption 2.6 together with Assumption 2.5 implies that the process is irreducible, in the sense that any open set can be reached with positive probability, which will be used in Lemma 6.5. Note that constructing a control $u\in C^0([0,T],\mathcal{X})$ may be difficult in general [70]. However, for the overdamped and underdamped Langevin dynamics we are interested in, building such a control turns out to be genuinely feasible, see [86,105,97,83,85] and references therein. Let us mention that the above two assumptions are standard for proving LDPs [109,115].
A recurrent idea when studying Markov chain stability and large deviations on an unbounded state space is to reduce the analysis to a compact set and to control the excursions of the dynamics out of this set with a Lyapunov function [87,115]. Our Witten-Lyapunov condition for the dynamics reads as follows (for the terminology, see Remark 2.12 below).

Assumption 2.7 (Witten-Lyapunov condition). There exists a function $W:\mathcal{X}\to[1,+\infty)$ of class $C^2(\mathcal{X})$, with compact level sets and such that
$$\Psi:=-\frac{\mathcal{L}W}{W}\qquad (2.12)$$
has compact level sets. Moreover, there exists a $C^2(\mathcal{X})$ function $\mathscr{W}:\mathcal{X}\to[1,+\infty)$ such that, for some constants $C_1>0$, $C_2\in\mathbb{R}$, the conditions (2.13) hold; in particular, $\mathscr{W}^2\leq C_1W$.

In all that follows, we consider an arbitrary function $\kappa:\mathcal{X}\to[1,+\infty)$ belonging to $\mathscr{S}$ such that:
• $\kappa\ll\Psi$;
• either (i) $\kappa$ is bounded, or (ii) $\kappa$ has compact level sets and there exists $c\in\mathbb{R}$ such that
$$\mathcal{L}(\kappa W)\leq c\,\kappa W.\qquad (2.14)$$

Remark 2.8.
Note that the condition $\mathscr{W}^2\leq C_1W$ implies in particular that $\mathscr{W}\ll W$. In addition, since $\kappa\ll\Psi$ and $\Psi\sim-\frac{\mathcal{L}\mathscr{W}}{\mathscr{W}}$, it holds $\kappa\ll-\frac{\mathcal{L}\mathscr{W}}{\mathscr{W}}$. These facts will be frequently used in the proofs. Moreover, the conditions (2.13) are not restrictive for exponential-like Lyapunov functions, as shown in Proposition 2.9 below, the idea being that $\mathscr{W}$ can be set to $\sqrt{W}$. The condition (2.14) also typically holds because $W$ is chosen of exponential type while $\kappa$ is a polynomial. In practice, the auxiliary function $\mathscr{W}$ is used to obtain some control in the proofs of Lemmas 6.2 and 6.4 (in particular to apply a Grönwall lemma). Assumption 2.7 could certainly be phrased differently, possibly with weaker conditions on the functions at stake.
Although we stated Assumption 2.7 in order to fit standard conditions when considering large deviations on unbounded state spaces [109,115], in practice it can be obtained from a non-linear Lyapunov condition in the spirit of [76] and [39, Condition 2.2]. This is the purpose of the next proposition, whose proof is postponed to Appendix B.

Proposition 2.9. Assume that there exists $V\in\mathscr{S}$ such that:
• $V$ has compact level sets;
• $|\sigma^T\nabla V|$ has compact level sets;
• for any $\theta\in(0,1)$, condition (2.15) holds.
Then Assumption 2.7 is satisfied with $W=e^{\theta V}$ and $\mathscr{W}=e^{\varepsilon V}$, for $\theta\in(0,1)$ and $\varepsilon<\theta/2$ small enough. In this case it holds [...]. Moreover, condition (2.14) holds true for any function $\kappa:\mathcal{X}\to[1,+\infty)$ of class $\mathscr{S}$ such that either (i) $\kappa$ is bounded or (ii) $\kappa$ has compact level sets, satisfies $\kappa\ll\Psi$ and there exists $C\geq 0$ with
$$\mathcal{L}\kappa\leq C\kappa,\qquad|\sigma^T\nabla\log\kappa|\leq C.\qquad (2.16)$$

Note that (2.15) means that the term $-\mathcal{L}V$ coming from the dynamics must compensate the quadratic loss proportional to $|\sigma^T\nabla V|^2$. We also mention that the condition (2.16) is not restrictive in general since it is typically satisfied by polynomial-like functions $\kappa$.
A first consequence of Assumptions 2.5 to 2.7 is the ergodicity of the dynamics, whatever the initial distribution for $X_0$.

Proposition 2.10. Suppose that Assumptions 2.5, 2.6 and 2.7 hold true. Then the process $(X_t)_{t\geq 0}$ admits a unique invariant probability measure $\mu\in\mathcal{P}_W(\mathcal{X})$. This measure has a positive $C^\infty(\mathcal{X})$-density with respect to the Lebesgue measure: there exists $\rho_\mu\in C^\infty(\mathcal{X})$ with $\rho_\mu>0$ such that $\mu(dy)=\rho_\mu(y)\,dy$. Moreover, the dynamics is ergodic with respect to $\mu$: there exist $C,c>0$ such that [...].

Proof. The existence of a unique local strong solution is standard when Assumption 2.5 holds, see [96, Chapter IX, Exercise 2.10]. Assumption 2.7 then implies the existence of [...], and global existence can be deduced from the above Lyapunov inequality [97]. The end of the proof is a direct application of [97, Theorem 8.9] since Assumption 2.6 together with Assumption 2.5 ensures irreducibility.
We can now present the large deviations principle associated with the empirical measure of the process $(X_t)_{t\geq 0}$ started at $x\in\mathcal{X}$:
$$L^{(x)}_t:=\frac{1}{t}\int_0^t\delta_{X_s}\,ds,\qquad (2.17)$$
where $\delta_y$ denotes the Dirac mass at $y\in\mathcal{X}$. When one considers large deviations principles for empirical averages of the form (2.17), the topology on probability measures has to be specified. As mentioned in the introduction, most LDPs are stated in topologies associated with bounded measurable functions (resp. continuous bounded functions), the so-called strong topology or $\tau$-topology (resp. weak topology). We now prove that, in our setting, a LDP holds in the $\tau^\kappa$-topology defined in Section 2.1, for any function $\kappa$ satisfying Assumption 2.7. The proof of Theorem 2.11 is presented in Section 6.1. We recall that a rate function is said to be good if its level sets are compact.
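Theorem 2.11. Suppose that Assumptions 2.5, 2.6 and 2.7 hold true, and consider a function $\kappa$ as in Assumption 2.7 and $x\in\mathcal{X}$ fixed. Then, the functional
$$f\in B^\infty_\kappa(\mathcal{X})\mapsto\lambda(f):=\lim_{t\to+\infty}\frac{1}{t}\log\mathbb{E}_x\Big[e^{\int_0^t f(X_s)\,ds}\Big]\qquad (2.18)$$
does not depend on $x$, is well-defined, convex and finite, and $(L^{(x)}_t)_{t\geq 0}$ satisfies a LDP in the $\tau^\kappa$-topology with the good rate function defined by:
$$\forall\,\nu\in\mathcal{P}(\mathcal{X}),\qquad I(\nu)=\sup_{f\in B^\infty_\kappa(\mathcal{X})}\big\{\nu(f)-\lambda(f)\big\}.\qquad (2.19)$$
More precisely, for any $\tau^\kappa$-measurable set $\Gamma\subset\mathcal{P}(\mathcal{X})$ and any $x\in\mathcal{X}$, it holds
$$-\inf_{\nu\in\mathring{\Gamma}}I(\nu)\leq\underline{\lim}_{t\to+\infty}\frac{1}{t}\log\mathbb{P}\big(L^{(x)}_t\in\Gamma\big)\leq\overline{\lim}_{t\to+\infty}\frac{1}{t}\log\mathbb{P}\big(L^{(x)}_t\in\Gamma\big)\leq-\inf_{\nu\in\bar{\Gamma}}I(\nu),\qquad (2.20)$$
where the interior and closure of $\Gamma$ are taken with respect to the $\tau^\kappa$-topology. Finally, for any $\nu\in\mathcal{P}(\mathcal{X})$, it holds $I(\nu)=0$ if and only if $\nu=\mu$; and, for any sequence $(t_n)_{n\geq 1}$ such that $t_n/\log(n)\to+\infty$ as $n\to+\infty$, it holds
$$L^{(x)}_{t_n}\xrightarrow[n\to+\infty]{}\mu\qquad (2.21)$$
almost surely in the $\tau^\kappa$-topology.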
Our conclusion is in essence close to that of [76], but the conditions to reach it seem more natural to us and correspond to usual conditions for proving large deviations principles in an unbounded state space, see [115,39] and [109, Section 9]. In particular, they allow us to derive the duality representation (2.19), and we do not need to consider non-linear operators. Our strategy (presented in Section 6.1) relies on the Gärtner-Ellis theorem [55,44,45,28], for which the existence of the free energy (2.18) is a key element. The originality of our work is to make use of the local martingale (1.7) introduced by Wu [115] in order to solve the spectral problem associated with the Feynman-Kac operator, which proves the existence of the limit in (2.18). This directly provides the LDP in the $\tau^\kappa$-topology by duality. However, there may be cases in which a LDP holds although the conditions of the Gärtner-Ellis theorem are not satisfied, for instance in the framework of the Sanov theorem [111], so our conditions may not be necessary.
Let us also mention that, in addition to (2.21), we also show for completeness in the proof of Theorem 2.11 that $(L^{(x)}_t)_{t\geq 0}$ [...]. Another advantage of our approach is to characterize precisely the set of functions for which a LDP holds from the standard condition on $\Psi$ defined in (2.12), like in [31,109].
This condition is also used in [115,Corollary 2.3] for proving a level 1 LDP for Langevin dynamics. We present below a clear connection with a spectral gap condition for the Witten-Schrödinger operator in the reversible case. The comparison with Cramér's condition for independent variables highlights the effect of correlations on fluctuations.
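Remark 2.12.
Consider the overdamped Langevin dynamics
$$dX_t=-\nabla V(X_t)\,dt+\sqrt{2}\,dB_t,$$
where $V:\mathcal{X}\to\mathbb{R}$ is a smooth potential with compact level sets. The generator of this dynamics is $\mathcal{L}=-\nabla V\cdot\nabla+\Delta$ and its invariant probability measure reads $\mu(dx)=Z^{-1}e^{-V(x)}\,dx$, where $Z$ is the normalization constant. We choose the Lyapunov function $W=e^{\theta V}$ for some $\theta\in(0,1)$. This is a standard choice for obtaining compactness of the evolution operator [97, Section 8], and optimal control representations of rate functions [39], see also Proposition 2.9. An easy computation shows that
$$\Psi=-\frac{\mathcal{L}W}{W}=\theta\big((1-\theta)|\nabla V|^2-\Delta V\big).\qquad (2.22)$$
However, we also know [112] that the generator $\mathcal{L}$ considered on $L^2(\mu)$ is unitarily equivalent to the operator
$$\widetilde{\mathcal{L}}=\Delta-\left(\frac{1}{4}|\nabla V|^2-\frac{1}{2}\Delta V\right)\qquad (2.23)$$
considered on $L^2(dx)$. In this case, the condition for (2.22) to have compact level sets when $\theta=1/2$ is actually equivalent to a confinement condition (or spectral gap condition [63]) for the Witten-Schrödinger operator $\widetilde{\mathcal{L}}$ defined in (2.23), since the potential appearing in (2.23) is exactly (2.22) with $\theta=1/2$. In that sense, Assumption 2.7 is a natural generalization of a spectral gap condition for the Witten Laplacian in the case of possibly non-reversible dynamics. This is why we call Assumption 2.7 a Witten-Lyapunov condition.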
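The computation behind the Witten-Lyapunov function for overdamped Langevin dynamics can be checked numerically. With $W=e^{\theta V}$ and generator $-\nabla V\cdot\nabla+\Delta$, the chain rule gives $\Delta W=(\theta\Delta V+\theta^2|\nabla V|^2)W$, hence $-\mathcal{L}W/W=\theta\big((1-\theta)|\nabla V|^2-\Delta V\big)$. The sketch below compares this closed form with a finite-difference evaluation in dimension one; the quartic potential, the function names and the step size are illustrative assumptions, not taken from the paper.

```python
import math


def V(x):
    # Hypothetical confining potential, chosen only for illustration.
    return 0.25 * x**4 + 0.5 * x**2


def dV(x):
    return x**3 + x


def d2V(x):
    return 3.0 * x**2 + 1.0


def psi_closed_form(x, theta):
    """Closed form theta * ((1 - theta) * V'(x)^2 - V''(x))."""
    return theta * ((1.0 - theta) * dV(x) ** 2 - d2V(x))


def psi_finite_difference(x, theta, h=1e-4):
    """Evaluate -(L W) / W with W = exp(theta * V) and L = -V' d/dx + d^2/dx^2,
    using centered finite differences for the derivatives of W."""
    W = lambda y: math.exp(theta * V(y))
    dW = (W(x + h) - W(x - h)) / (2.0 * h)
    d2W = (W(x + h) - 2.0 * W(x) + W(x - h)) / h**2
    generator_W = -dV(x) * dW + d2W
    return -generator_W / W(x)
```

For this potential, $|\nabla V|^2$ grows like $|x|^6$ while $\Delta V$ grows like $|x|^2$, so the closed form goes to infinity at infinity for any $\theta\in(0,1)$, which is the compact level set property required of $\Psi$.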
We now compare this Witten-Lyapunov condition to Cramér's exponential moment condition in the case of independent variables of law $\mu$. Consider a smooth potential $V(x)$ [...]
$$\forall\,\theta\in\mathbb{R},\qquad\int_{\mathcal{X}}e^{\theta\kappa}\,d\mu<+\infty.$$
This simple example shows that considering a correlated system instead of independent variables has a non-trivial effect on the stability of the system. Depending on the confinement potential, the Witten-Lyapunov condition for (2.22) to have compact level sets can be more or less restrictive than Cramér's condition for independent variables distributed according to the invariant measure $\mu$. Finally, we remark that for $q\in(1,3/2)$, the process is heavy-tailed in the sense that $2(q-1)<1$ and the observable $f(x)=x$ (assuming $d=1$) does not satisfy a LDP. In other words, the average position of the process defined by $\frac{1}{t}\int_0^t X_s\,ds$ cannot be shown to satisfy a large deviations principle at speed $t$ with our arguments. We finally mention that, in the case where the observable $f$ grows faster at infinity than the potential $\Psi$, it seems possible to derive a level 1 large deviations principle at a speed smaller than $t$. We refer to [90] for a recent account dealing with the case of an Ornstein-Uhlenbeck process, and to [16,2] for related issues.
We close this section with a practical corollary of Theorem 2.11 which generalizes the level 1 LDP proved in [...].

Corollary 2.13. Suppose that Assumptions 2.5, 2.6 and 2.7 hold true, and consider $f\in B^\infty_\kappa(\mathcal{X})$. Then the function $\theta\in\mathbb{R}\mapsto\lambda(\theta f)$ is well-defined and differentiable, and does not depend on $x$. Moreover, $L^{(x)}_t(f)$ satisfies a large deviations principle in $\mathbb{R}$ at speed $t$ with good rate function given by
$$\forall\,a\in\mathbb{R},\qquad I_f(a)=\inf\big\{I(\nu),\ \nu\in\mathcal{P}(\mathcal{X}),\ \nu(f)=a\big\},\qquad (2.25)$$
where $I$ is defined in (2.19). Finally, it holds
$$I_f(a)=\sup_{\theta\in\mathbb{R}}\big\{\theta a-\lambda(\theta f)\big\}.\qquad (2.26)$$

Corollary 2.13 is useful for practical applications, since (2.26) is a natural way to estimate the rate function $I_f$ associated with an observable $f$, see for instance [56,101,104,23,48].
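As an illustration of how a level 1 rate function can be obtained in practice from the cumulant function, the sketch below evaluates the Legendre-Fenchel transform $\sup_{\theta}\{\theta a-\lambda(\theta f)\}$ on a grid. The test case is an assumption chosen because it is explicit: for the Ornstein-Uhlenbeck process $dX_t=-X_t\,dt+\sqrt{2}\,dB_t$ and $f(x)=x$, the function $h(x)=e^{\theta x}$ satisfies $(\mathcal{L}+\theta f)h=\theta^2h$, so that $\lambda(\theta f)=\theta^2$ and the transform equals $a^2/4$. The function names and grid parameters are hypothetical.

```python
def scgf_ou(theta):
    """Cumulant function theta -> lambda(theta * f) for the assumed example:
    Ornstein-Uhlenbeck process dX = -X dt + sqrt(2) dB with f(x) = x."""
    return theta * theta


def rate_function(a, scgf, theta_min=-10.0, theta_max=10.0, n_grid=20001):
    """Grid approximation of sup over theta of (theta * a - scgf(theta))."""
    step = (theta_max - theta_min) / (n_grid - 1)
    best = -float("inf")
    for i in range(n_grid):
        theta = theta_min + i * step
        best = max(best, theta * a - scgf(theta))
    return best
```

In applications the cumulant function is rarely explicit and has to be estimated, for instance by Monte Carlo or by solving a discretized eigenvalue problem; the grid maximization above is only the last step of such a procedure, and the grid must be wide enough to contain the maximizer.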

Decomposition of the rate function
Our goal in this section is to rewrite $I$ in various ways, which is useful for theoretical understanding and practical purposes. In Section 3.1, we first show an extension of the standard Donsker-Varadhan formulation for $I$. This result is obtained by making use of the spectral analysis of the operator $P^f_t$ for $f\in B^\infty_\kappa(\mathcal{X})$, which is presented in Section 6.1. We then apply this result to obtain a variational representation for the principal eigenvalue $e^{t\lambda(f)}$ of $P^f_t$. Next, in Section 3.2, we split the expression of the rate function according to the symmetric and antisymmetric parts of the dynamics, extending the work [15] to general diffusions. Such a decomposition will prove useful in Section 4 to compare the entropy of overdamped and underdamped Langevin dynamics. Most of the proofs of this section are postponed to Section 6.2.

Donsker-Varadhan variational formula
We start with the variational representation of the entropy. Our proof, which can be found in Section 6.2.2, is an adaptation of [30, Lemma 4.2.35] relying on the Feynman-Kac semigroup and its spectral elements. In order to state the result, we need to make sense of $\mathcal{L}u$ for functions $u\in B^\infty_W(\mathcal{X})$. It turns out that the appropriate notion to this end is the extended domain $\mathcal{D}(\mathcal{L})$ of the generator $\mathcal{L}$ considered as an operator on $B^\infty_W(\mathcal{X})$, defined in the following way: a function $\varphi\in B^\infty_W(\mathcal{X})$ belongs to $\mathcal{D}(\mathcal{L})$ if and only if there exists a measurable function $\phi:\mathcal{X}\to\mathbb{R}$ such that, for any $x\in\mathcal{X}$ and $t\geq 0$,
$$\int_0^t P_s|\phi|(x)\,ds<+\infty,\qquad (3.1)$$
and
$$P_t\varphi(x)=\varphi(x)+\int_0^t P_s\phi(x)\,ds.\qquad (3.2)$$
In this case we write $\phi=\mathcal{L}\varphi$ (with some abuse of notation in view of the definition of $\mathcal{L}$ as a differential operator in (2.2), but of course the expressions coincide when $\varphi$ is a smooth test function with compact support).
When the $\tau$-topology is considered, such extended domains were already considered for instance in [114,115,76], see also [26, Chapter I, Definition 14.15]. For the unbounded functions we consider, one should think of $\phi=\mathcal{L}u$ as an element of $B^\infty_{\kappa W}(\mathcal{X})$ (see the proof of Lemma 6.10 below, as well as the comments following Proposition 3.1). The integrability condition (3.1) is reasonable in this context since $(P_t)_{t\geq 0}$ is a well defined semigroup on $B^\infty_{\kappa W}(\mathcal{X})$ in view of the Lyapunov condition (2.14).
We can now present the main result of this section.
In particular, the functional defined in (3.3) is equal to $+\infty$ if $\nu\notin\mathcal{P}_\kappa(\mathcal{X})$ or $\nu$ is not absolutely continuous with respect to $\mu$.
This result is standard when $\mathcal{X}$ is compact [33], but does not seem to be known for an unbounded space $\mathcal{X}$ and for the $\tau^\kappa$-topology we consider. In this situation the space $\mathcal{D}^+(\mathcal{L})$ has to be designed with some caution. Note that $\mathcal{D}^+(\mathcal{L})$ is not empty since it contains the functions of the form $u=e^\psi$ for $\psi\in C^\infty_c(\mathcal{X})$. Note also that the last statement of Proposition 3.1 is consistent with the Fenchel definition (2.19) of the rate function. In order to get some intuition on the formula (3.3), let us mention that the proof formally relies on replacing the supremum over functions $u\in\mathcal{D}^+(\mathcal{L})$ by the supremum over eigenfunctions $h_f$ satisfying
$$(\mathcal{L}+f)\,h_f=\lambda(f)\,h_f.$$
The above equation rewrites, since $h_f>0$ (see Lemmas 6.6 and 6.10),
$$-\frac{\mathcal{L}h_f}{h_f}=f-\lambda(f).$$
By integrating with respect to a measure $\nu\in\mathcal{P}_\kappa(\mathcal{X})$ we find
$$-\int_{\mathcal{X}}\frac{\mathcal{L}h_f}{h_f}\,d\nu=\nu(f)-\lambda(f),$$
and taking the supremum over $f$ formally leads to (2.19). Note that $\mathcal{L}h_f=\lambda(f)h_f-f\,h_f$ belongs to $B^\infty_{\kappa W}(\mathcal{X})$ (as the sum of an element in $B^\infty_W(\mathcal{X})$ and the product of a function in $B^\infty_W(\mathcal{X})$ and another one in $B^\infty_\kappa(\mathcal{X})$), which allows one to define $\mathcal{L}h_f$ in the weak sense (3.2).

A natural consequence of Proposition 3.1 is the following variational representation for the cumulant function. The proof, postponed to Section 6.2.3, relies on the convexity of the cumulant function to invert the Fenchel transform (2.19).

Corollary 3.2.
Suppose that Assumptions 2.5, 2.6 and 2.7 hold true, and consider $f\in B^\infty_\kappa(\mathcal{X})$. Then
$$\lambda(f)=\sup_{\nu\in\mathcal{P}_\kappa(\mathcal{X})}\big\{\nu(f)-I(\nu)\big\}=\sup_{\nu\in\mathcal{P}_\kappa(\mathcal{X})}\ \inf_{u\in\mathcal{D}^+(\mathcal{L})}\int_{\mathcal{X}}\left(f+\frac{\mathcal{L}u}{u}\right)d\nu.\qquad (3.5)$$

Corollary 3.2 may seem anecdotal, but it provides a variational representation for the principal eigenvalue of non-symmetric diffusion operators, as pioneered by Donsker and Varadhan in their seminal paper [33] for a compact space $\mathcal{X}$. To the best of our knowledge, this formula had not been shown in an unbounded setting, for which we need to introduce the "generalized domain" $\mathcal{D}^+(\mathcal{L})$ defined in (3.4). However, our set of assumptions implies that $\lambda(f)$ can be thought of as the largest eigenvalue of $\mathcal{L}+f$, and turns out to be isolated for any $f$ (because of the compactness of the resolvent provided by Lemma 6.6), whereas in [33], (3.5) may be the supremum of the essential spectrum of the operator. This suggests that (3.5) holds under weaker assumptions. A possible approach for generalizing our results may be to consider different methods for studying the long time behaviour of unnormalized semigroups, see for instance [20,6,21], or to resort to more subtle spectral analysis tools [113,116,53,13].

Entropy decomposition: symmetry and antisymmetry
Our goal is now to provide refined expressions for the rate function I in terms of symmetric and antisymmetric parts of the dynamics, inspired in particular by [15]. In the following, for any closed operator T , we denote by T * its adjoint on L 2 (µ), where µ is the invariant probability measure of the process, as obtained in Proposition 2.10.
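Considering the generator $\mathcal{L}$ of the diffusion (2.1), we can always decompose it into symmetric and antisymmetric parts with respect to $\mu$ through
$$\mathcal{L}=\mathcal{L}_S+\mathcal{L}_A,\qquad\mathcal{L}_S=\frac{\mathcal{L}+\mathcal{L}^*}{2},\qquad\mathcal{L}_A=\frac{\mathcal{L}-\mathcal{L}^*}{2}.\qquad (3.6)$$
It is important to note that $\mathcal{L}_A$ is a first order differential operator (and therefore obeys the chain rule of first order differentiation). We assume here that the operators $\mathcal{L}$, $\mathcal{L}_A$, $\mathcal{L}_S$ admit $C^\infty_c(\mathcal{X})$ as a common core (but the domains of these operators may be different).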
The decomposition (3.6) allows us to separate the rate function (3.3) into two parts. This is the purpose of the next key result, whose proof can be found in Section 6.2.4. It is inspired by the computations in [15, Proposition 2], which we simplify and generalize here through a variational Witten transform and the use of the Sobolev spaces introduced in Section 2.1. The algebra of the proof also suggests considering $I(\nu)$ for probability measures $\nu$ of the form $d\nu=e^v\,d\mu$.
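Theorem 3.3. Suppose that Assumptions 2.5, 2.6 and 2.7 hold true, and consider a measure $\nu\in\mathcal{P}_\kappa(\mathcal{X})$ such that $d\nu=e^v\,d\mu$ with $v\in H^1(\nu)$ and $\mathcal{L}_Av\in H^{-1}(\nu)$. Then, the rate function $I$ defined in (3.3) admits the following decomposition:
$$I(\nu)=I_S(\nu)+I_A(\nu),\qquad (3.7)$$
where
$$I_S(\nu)=\frac{1}{4}|v|^2_{H^1(\nu)}\qquad (3.8)$$
and
$$I_A(\nu)=\frac{1}{4}|\mathcal{L}_Av|^2_{H^{-1}(\nu)}.\qquad (3.9)$$

Theorem 3.3 expresses the rate function as the sum of dual norms of the symmetric and antisymmetric parts of the dynamics. Note also that we consider a measure of the form $d\nu=e^v\,d\mu$, that is, the Radon-Nikodym derivative of $\nu$ with respect to $\mu$ is positive. However, we believe that we can consider more general measures $\nu$, see Remark 6.12 in the proof. Since the measure $\nu$ at hand appears both inside the norms and in the definition of the norms themselves, a possibly clearer rewriting is the following: [...]. Moreover, the symmetric part of the rate function (3.8) can be written as a Fisher information for the invariant measure $\mu$, a standard result [55]: denoting by $\rho=d\nu/d\mu$, it holds
$$I_S(\nu)=\frac{1}{4}\int_{\mathcal{X}}\frac{\mathscr{C}(\rho,\rho)}{\rho}\,d\mu.$$
The next corollary builds upon (3.9) by rewriting $I_A$ using a Poisson equation, which can be manipulated more easily. The proof can be found in Section 6.2.5.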
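The algebra behind this splitting can be sketched formally as follows. This is only a heuristic computation under stated conventions, not the proof given in Section 6.2.4: it assumes $\mathscr{C}(\varphi,\psi)=\nabla\varphi\cdot S\nabla\psi$, $|\psi|^2_{H^1(\nu)}=\int\mathscr{C}(\psi,\psi)\,d\nu$ and $|\varphi|^2_{H^{-1}(\nu)}=\sup_\psi\{2\int\varphi\psi\,d\nu-|\psi|^2_{H^1(\nu)}\}$, ignores all domain and integrability issues, and uses the hypothetical substitution $u=e^{v/2+\psi}$.

```latex
% Formal sketch; u = e^{w} with w = v/2 + \psi, and d\nu = e^{v} d\mu.
\begin{align*}
\frac{\mathcal{L}u}{u} &= \mathcal{L}w + \mathscr{C}(w,w), \\
\int \mathcal{L}_S w \, d\nu &= -\int \mathscr{C}(w,v)\, d\nu
  && \text{(symmetry of $\mathcal{L}_S$ in $L^2(\mu)$)}, \\
\int \mathcal{L}_A w \, d\nu &= -\int w\, \mathcal{L}_A v \, d\nu
  && \text{(antisymmetry and chain rule for $\mathcal{L}_A$)}, \\
\int v\, \mathcal{L}_A v \, d\nu &= \int \mathcal{L}_A\big(G(v)\big)\, d\mu = 0
  && \text{with } G'(s) = s\,e^{s}, \\
-\int \frac{\mathcal{L}u}{u}\, d\nu
  &= \frac14 \int \mathscr{C}(v,v)\, d\nu
   + \int \psi\, \mathcal{L}_A v\, d\nu
   - \int \mathscr{C}(\psi,\psi)\, d\nu .
\end{align*}
% Taking the supremum over \psi and rescaling \psi \to \psi/2:
\begin{equation*}
\sup_{\psi}\Big\{ \int \psi\, \mathcal{L}_A v\, d\nu - |\psi|_{H^1(\nu)}^2 \Big\}
 = \frac14 \sup_{\psi}\Big\{ 2\int \psi\, \mathcal{L}_A v\, d\nu - |\psi|_{H^1(\nu)}^2 \Big\}
 = \frac14\, |\mathcal{L}_A v|_{H^{-1}(\nu)}^2 .
\end{equation*}
```

Under these conventions the first term is $\frac{1}{4}|v|^2_{H^1(\nu)}$ and only involves the symmetric part of the generator through the carré du champ, while the supremum over $\psi$ only involves $\mathcal{L}_A$. For a reversible dynamics, $\mathcal{L}_A=0$ and the second contribution vanishes.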

Corollary 3.4.
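Suppose that Assumptions 2.5, 2.6 and 2.7 hold true, and consider a measure $\nu\in\mathcal{P}_\kappa(\mathcal{X})$ such that $d\nu=e^v\,d\mu$ with $v\in H^1(\nu)$ and $\mathcal{L}_Av\in H^{-1}(\nu)$. Then, the antisymmetric part of the rate function (3.9) reads
$$I_A(\nu)=\frac{1}{4}|\psi_v|^2_{H^1(\nu)}=\frac{1}{4}\int_{\mathcal{X}}\nabla\psi_v\cdot S\nabla\psi_v\,d\nu,\qquad (3.10)$$
where $\psi_v$ is the unique solution in $H^1(\nu)$ to the Poisson equation
$$\nabla^*(S\nabla\psi_v)=\mathcal{L}_Av,\qquad (3.11)$$
the symmetric matrix $S$ being defined in (2.2) and $\nabla^*$ denoting the adjoint of the gradient operator in $L^2(\nu)$.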
It has been known for a long time [33] that the rate function of a reversible process is a Fisher information as in (3.8). The antisymmetric part of the rate function has been less investigated, although an expression like (3.10) already appears in [55] (see also [98,15]). However, our setting provides natural well-posedness conditions for both parts of the rate function to be finite. Moreover, the uniqueness of $\psi_v$ is a consequence of the definition of $H^1(\nu)$ through equivalence classes, see Section 2.1.
Interestingly, the solution $\psi_v$ of (3.11) can be formally represented through [83]
$$\psi_v=(-\mathcal{L}_\nu)^{-1}\mathcal{L}_Av=\int_0^{+\infty}e^{t\mathcal{L}_\nu}\mathcal{L}_Av\,dt,\qquad\mathcal{L}_\nu:=-\nabla^*S\nabla.$$
The stochastic process $(X^\nu_t)_{t\geq 0}$ associated with $\mathcal{L}_\nu$ is reversible with respect to $\nu$. Denoting by $e^{-V_\nu}$ the density of $\nu$ with respect to the Lebesgue measure, $(X^\nu_t)_{t\geq 0}$ is solution to the following SDE:
$$dX^\nu_t=\big(-S\nabla V_\nu+\nabla\cdot S\big)(X^\nu_t)\,dt+\sigma(X^\nu_t)\,dB_t.$$
Finally (3.10) takes the form
$$I_A(\nu)=\frac{1}{4}\int_0^{+\infty}\mathbb{E}_\nu\Big[(\mathcal{L}_Av)(X^\nu_t)\,(\mathcal{L}_Av)(X^\nu_0)\Big]\,dt.\qquad (3.12)$$
The antisymmetric part of the entropy is therefore the autocorrelation of $\mathcal{L}_Av$ along a reversible process that realizes the fluctuation corresponding to the measure $\nu$. From a mathematical point of view, it seems interesting to relate (3.12) to the so-called level 2.5 of large deviations [7,24], since this approach consists in considering joint fluctuations of the empirical measure and the associated empirical current. In this case, the large deviations function is explicit: this reflects the fact that a Markov process is characterized entirely by its density and current. Exploring further the connection between (3.12) and level 2.5 large deviations is an interesting direction for future work.

Remark 3.5.
It is also possible to consider the adjoint $\mathcal{L}^*$ not with respect to the invariant measure $\mu$ (whose analytical expression may be unknown), but instead with respect to a reference measure $\mu_{\mathrm{ref}}$ with a known analytical expression such that $\mathcal{L}^*=\mathcal{L}_1-\mathcal{L}_2+\xi$ for some measurable function $\xi$ (with $\mathcal{L}=\mathcal{L}_1+\mathcal{L}_2$). This leads to an additional term $-\int_{\mathcal{X}}\xi\,d\nu$ in the expression of the rate function (3.7), as can be readily checked by a straightforward adaptation of the proof. The operators $\mathcal{L}_1$ and $\mathcal{L}_2$ are the counterparts of the symmetric and antisymmetric parts of the generator in this decomposition. A typical situation to apply this strategy is provided by systems subject to a small external nonequilibrium forcing, the reference measure usually being chosen as the invariant measure at equilibrium, in the absence of external forcing. Atom chains in contact with an inhomogeneous heat bath were studied with this approach in [15], $\mu_{\mathrm{ref}}$ being the Gibbs measure associated with a fixed temperature profile.

Overdamped Langevin dynamics
In this section, we come back to the setting of Remark 2.12 by considering a diffusion process over $\mathcal{X} = \mathbb{R}^d$ subject to

$$dX_t = b(X_t)\,dt + \sqrt{2}\,dB_t. \tag{4.1}$$

This corresponds to (2.1) with $\sigma = \sqrt{2}$, in which case the generator reads

$$L = b \cdot \nabla + \Delta.$$

We will treat the reversible case where $b = -\nabla V$ for a smooth potential $V$, and the nonreversible case where $b = -\nabla V + F$ for a smooth vector field $F$ such that $\nabla \cdot (F e^{-V}) = 0$. In both cases, the invariant probability measure µ of the process is (assuming $e^{-V} \in L^1(\mathcal{X})$)

$$\mu(dx) = Z^{-1} e^{-V(x)}\,dx, \qquad Z = \int_{\mathcal{X}} e^{-V}.$$

Assumption 4.1. The potential $V \in \mathscr{S}$ has compact level sets, satisfies $e^{-V} \in L^1(\mathcal{X})$ and, for any θ ∈ (0, 1), it holds

This assumption is satisfied for smooth potentials growing like $|x|^q$ for q > 1 at infinity, and it also implies that the invariant probability measure µ satisfies a Poincaré inequality [4]. Similar conditions are derived in [76] in the context of large deviations. The next proposition is a direct application of Propositions 2.9 and 2.10, Theorem 2.11 and Corollary 3.4.

for any θ ∈ (0, 1) as a Lyapunov function in the sense of Assumption 2.7. For any fixed θ ∈ (0, 1), there exist C, c > 0 such that for any initial measure $\nu \in \mathcal{P}_W(\mathcal{X})$,

has compact level sets and, for any $\kappa : \mathcal{X} \to [1, +\infty)$ belonging to $\mathscr{S}$, bounded or with compact level sets and such that

$\frac{1}{t}\int_0^t \delta_{X_s}\,ds$ satisfies a large deviations principle in the $\tau^\kappa$-topology. The good rate function is defined by: for all $\nu \in \mathcal{P}_\kappa(\mathcal{X})$ with $d\nu = \rho\,d\mu = e^v\,d\mu$,
In this reversible example, we see that the rate function is only defined through its symmetric part (3.8), as shown in Theorem 3.3. We now consider a modification of this dynamics when a divergence-free drift is added. The next proposition is an extension of the examples proposed in [98] to the unbounded state space case.
with $F$ a smooth vector field such that $\nabla \cdot (F e^{-V}) = 0$ and where Ψ is defined in (4.4). Then, with the notation of Section 3.2, it holds $L_S = -\nabla V \cdot \nabla + \Delta$ and $L_A = F \cdot \nabla$. Moreover

and $(X_t)_{t \geq 0}$ satisfies a LDP in the $\tau^\kappa$-topology for any function κ belonging to $\mathscr{S}$, bounded or with compact level sets and such that

The associated rate function $I_F$ reads: for any ν such that $d\nu = e^v\,d\mu$ with $v \in H^1(\nu)$ and $F \cdot \nabla v \in H^{-1}(\nu)$,

Proposition 4.3 shows that, in this simple case, the equilibrium and nonequilibrium dynamics admit a LDP for the same class of functions but with different rate functions, the irreversible dynamics producing more entropy. It is therefore an extension of the case treated in [98, Theorem 2.2]. As with that result, Proposition 4.3 can be used to design algorithms with accelerated convergence to equilibrium, see also [66,67,37].
A setting in which Proposition 4.3 typically applies is when $V(x)$ behaves as $|x|^q$ for some q > 1 outside an open set centered on the origin, and $F = A\nabla V$ with $A \in \mathbb{R}^{d \times d}$ such that $A^T = -A$ (see [98]). The latter condition implies in particular that $F \cdot \nabla V = 0$, so (4.6) immediately holds.
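As a rough numerical illustration of this setting, the sketch below builds the drift $F = A\nabla V$ for a hypothetical quadratic potential $V(x) = |x|^2/2$ and a hypothetical antisymmetric matrix $A$ in dimension two, evaluates $F \cdot \nabla V$ at one point, and runs a basic Euler-Maruyama discretization of $dX_t = (-\nabla V + F)(X_t)\,dt + \sqrt{2}\,dB_t$. The potential, the matrix, the step size and all function names are assumptions made for this example only, and the time discretization is a standard scheme rather than anything taken from the paper.

```python
import math
import random

def grad_V(x):
    """Gradient of the hypothetical potential V(x) = |x|^2 / 2."""
    return list(x)

def mat_vec(A, v):
    """Matrix-vector product for lists of lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Hypothetical antisymmetric matrix (A^T = -A) in dimension 2.
A = [[0.0, 1.5], [-1.5, 0.0]]

def drift(x):
    """Total drift b = -grad V + F, with F = A grad V."""
    g = grad_V(x)
    F = mat_vec(A, g)
    return [-gi + Fi for gi, Fi in zip(g, F)]

def simulate(x0, dt, n_steps, rng):
    """Euler-Maruyama scheme; returns the time average of |X|^2."""
    x = list(x0)
    acc = 0.0
    for _ in range(n_steps):
        b = drift(x)
        x = [xi + bi * dt + math.sqrt(2.0 * dt) * rng.gauss(0.0, 1.0)
             for xi, bi in zip(x, b)]
        acc += dot(x, x)
    return acc / n_steps

x = [0.7, -1.2]
# F . grad V at the point x; it vanishes because A is antisymmetric.
orthogonality = dot(mat_vec(A, grad_V(x)), grad_V(x))
# Time average of |X|^2, to be compared with its mean under exp(-V), up to
# statistical and time-step errors.
average = simulate(x, dt=1e-2, n_steps=50_000, rng=random.Random(0))
```

With this choice of $V$, the invariant measure is the standard Gaussian whatever the antisymmetric matrix $A$ is, so the time average of $|X|^2$ should stay close to the dimension; only the fluctuations around that value depend on $A$.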

Underdamped Langevin dynamics
We now apply our framework to the underdamped Langevin dynamics. A first nice feature of our results is that, compared to [115], we obtain a stronger result under similar assumptions: our LDP for the empirical measure holds for a finer topology than the one associated with bounded measurable functions. Note however that [115, Corollary 2.3] obtains results similar to ours for a contraction of the rate function. In addition, Theorem 3.3 and Corollary 3.4 allow us to obtain precise results on the dependency of the rate function on the friction parameter γ.
We start by describing the Langevin equation in Section 4.2.1, before stating the large deviations principle in Section 4.2.2. Finally Section 4.2.3 provides asymptotics on the rate function depending on the friction.

Description of the dynamics
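The dynamics is set on $\mathcal{X} = \mathbb{R}^d \times \mathbb{R}^d$ and reads

$$\begin{cases} dq_t = p_t\,dt, \\ dp_t = -\nabla V(q_t)\,dt - \gamma p_t\,dt + \sqrt{2\gamma}\,dB_t, \end{cases} \tag{4.8}$$

where γ > 0 is a friction parameter, $V : \mathbb{R}^d \to \mathbb{R}$ is a smooth potential, and $(B_t)_{t \geq 0}$ is a d-dimensional Brownian motion. We could also consider the easier case where the position space is bounded ($q \in \mathbb{T}^d$) but leave this simple modification to the reader. The generator of the dynamics is

$$L_\gamma = L_{\mathrm{ham}} + \gamma L_{\mathrm{FD}}, \qquad L_{\mathrm{ham}} = p \cdot \nabla_q - \nabla V \cdot \nabla_p, \qquad L_{\mathrm{FD}} = -p \cdot \nabla_p + \Delta_p. \tag{4.9}$$

The operator $L_\gamma$ leaves invariant the measure (4.10). The invariant measure (4.10) can be written

$$\mu(dq\,dp) = Z^{-1} e^{-H(q,p)}\,dq\,dp, \tag{4.11}$$

where

$$H(q,p) = V(q) + \frac{|p|^2}{2} \tag{4.12}$$

is the Hamiltonian of the system, and we assume that the normalization constant Z in (4.11) is finite (which is indeed the case when $e^{-V} \in L^1(\mathbb{R}^d)$). In (4.9), the Liouville operator $L_{\mathrm{ham}}$ corresponding to the Hamiltonian part of the dynamics is antisymmetric in $L^2(\mu)$. On the other hand, the fluctuation-dissipation part with generator $L_{\mathrm{FD}}$ is symmetric in $L^2(\mu)$, so that $L_A = L_{\mathrm{ham}}$ and $L_S = \gamma L_{\mathrm{FD}}$ with the notation of Section 3.2.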
Before turning to the LDP associated with the Langevin dynamics (4.8), we give some intuition on the behaviour of the process as γ varies. First, it is clear that in the small γ limit, (4.8) becomes the Hamiltonian dynamics

$$dq_t = p_t\,dt, \qquad dp_t = -\nabla V(q_t)\,dt.$$

To be more precise, we introduce the process $(Q^\gamma_t, P^\gamma_t) = (q_{t/\gamma}, p_{t/\gamma})$ where $(q_t, p_t)_{t \geq 0}$ is solution to (4.8). It can then be shown that, in the limit γ → 0, the Hamiltonian $H(Q^\gamma_t, P^\gamma_t)$ converges to an effective diffusion on a graph [51,49,50,61]. In particular the relevant time scale in the underdamped limit is $\gamma^{-1} t$.

On the other hand, in the limit γ → +∞ and under an appropriate time rescaling, we recover the overdamped dynamics studied in Section 4.1. To see this, we integrate the second line in (4.8) to obtain

By introducing now $Q^\gamma_t = q_{\gamma t}$ and $P^\gamma_t = p_{\gamma t}$, the latter equality becomes

When γ → +∞, we observe that $Q^\gamma_t$ converges formally towards the solution of (4.1), see [93, Section 6.5]. The relevant time scale in the overdamped limit is therefore γt.
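The overdamped rescaling can be made explicit by a short formal computation. The lines below are a sketch that assumes (4.8) has the standard form $dq_t = p_t\,dt$, $dp_t = -\nabla V(q_t)\,dt - \gamma p_t\,dt + \sqrt{2\gamma}\,dB_t$ (unit mass and temperature); the rescaled Brownian motion $\widetilde{B}$ is notation introduced here for the example.

```latex
% Integrate the momentum equation over [0,t], using \int_0^t p_s\,ds = q_t - q_0:
p_t - p_0 = -\int_0^t \nabla V(q_s)\,ds - \gamma\,(q_t - q_0) + \sqrt{2\gamma}\,B_t ,
% hence
q_t - q_0 = -\frac{1}{\gamma}\int_0^t \nabla V(q_s)\,ds
            + \sqrt{\frac{2}{\gamma}}\,B_t - \frac{1}{\gamma}\,(p_t - p_0).
% Evaluate at time \gamma t, set Q^\gamma_t = q_{\gamma t}, P^\gamma_t = p_{\gamma t},
% and change variables s = \gamma u in the integral:
Q^\gamma_t - Q^\gamma_0 = -\int_0^t \nabla V(Q^\gamma_u)\,du
            + \sqrt{2}\,\widetilde{B}_t - \frac{1}{\gamma}\,(P^\gamma_t - P^\gamma_0),
\qquad \widetilde{B}_t := \gamma^{-1/2} B_{\gamma t},
% where \widetilde{B} is again a standard Brownian motion by Brownian scaling.
```

Formally dropping the last term, which carries a prefactor $1/\gamma$, leaves the overdamped equation $dQ_t = -\nabla V(Q_t)\,dt + \sqrt{2}\,d\widetilde{B}_t$.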
These remarks will be of interest below when studying the rate function associated with the dynamics (4.8).
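To experiment with the role of the friction, one can discretize the Langevin dynamics directly. The following is a minimal one-dimensional sketch that assumes the standard form $dq = p\,dt$, $dp = -V'(q)\,dt - \gamma p\,dt + \sqrt{2\gamma}\,dB$, uses a hypothetical harmonic potential $V(q) = q^2/2$, and relies on a plain Euler-Maruyama scheme. The function names and parameters are illustrative choices, and this explicit scheme is only reasonable when the time step is small compared with the friction.

```python
import math
import random

def langevin_step(q, p, grad_V, gamma, dt, xi):
    """One Euler-Maruyama step for
        dq = p dt,   dp = -grad_V(q) dt - gamma p dt + sqrt(2 gamma) dB,
    in dimension one; xi is a standard Gaussian sample supplied by the caller."""
    q_new = q + p * dt
    p_new = (p - grad_V(q) * dt - gamma * p * dt
             + math.sqrt(2.0 * gamma * dt) * xi)
    return q_new, p_new

def time_average(observable, gamma, dt, n_steps, seed=0):
    """Time average of observable(q, p) along one discretized trajectory
    for the hypothetical potential V(q) = q^2 / 2."""
    rng = random.Random(seed)
    q, p = 0.0, 0.0
    acc = 0.0
    for _ in range(n_steps):
        q, p = langevin_step(q, p, lambda x: x, gamma, dt, rng.gauss(0.0, 1.0))
        acc += observable(q, p)
    return acc / n_steps

# Empirical average of p^2, to be compared with its mean under exp(-H),
# up to statistical and time-step errors.
kinetic = time_average(lambda q, p: p * p, gamma=1.0, dt=1e-2, n_steps=100_000)
```

Running `time_average` for several values of `gamma` and several seeds gives a crude picture of how the spread of empirical averages depends on the friction, which is the quantity the rate function controls at the level of large deviations.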

Large deviations
In order to obtain a large deviations principle for (4.8), let us make the following classical assumption on the growth of the potential [115,86,77,83].
Assumption 4.4. The potential $V \in \mathscr{S}$ has compact level sets, satisfies $e^{-V} \in L^1(\mathcal{X})$ and there exist c

We can now find a Lyapunov function for (4.8) by following e.g. [115,105,86], as made precise in Appendix C. Recall that the Hamiltonian H is defined in (4.12).

is a Lyapunov function in the sense of Assumption 2.7. More precisely, for any γ > 0 and θ ∈ (0, 1), there exist ε > 0 and a, b, C > 0 such that

The Lyapunov function (4.13) can be adapted in cases where V has singularities, see [64,85]. We can now deduce our main theorem on the Langevin dynamics since Assumptions 2.5 and 2.6 are readily satisfied, see for instance [86].

$\delta_{(q_s,p_s)}\,ds$ satisfies a LDP in the $\tau^\kappa$-topology. Finally, for any $\nu \in \mathcal{P}_\kappa(\mathcal{X})$ such that $d\nu = e^v\,d\mu$ with $v \in H^1(\nu)$ and $L_{\mathrm{ham}} v \in H^{-1}(\nu)$, the rate function reads

where ψ is the unique solution in $H^1(\nu)$ to the Poisson problem:

The proof of Theorem 4.6 is a direct application of the results of Sections 2 and 3. For the expression of the rate function, we use (3.11) and (4.9) together with the fact that in this case, the matrix S defined in Section 2.1 reads

While κ can be chosen independently of the friction γ, it is interesting to note the dependency of the rate function (4.14) with respect to this parameter. We discuss more precisely the scaling of the rate function with respect to γ in the next section, depending on the form of ν.

Low and large friction asymptotics of the rate function
The next corollary shows how the decomposition (4.14) allows us to identify the most likely fluctuations in the overdamped and underdamped limits. By this we mean that, when γ → 0 or γ → +∞, most fluctuations become exponentially rare in γ or 1/γ, but some of them are associated with rate functions that vanish as γ → 0 and γ → +∞. The expression of these typical fluctuations is motivated by the discussion on the overdamped and underdamped limits in Section 4.2.1, from which the scalings of the rate function appear natural. Recall the definition of the marginal in position $\bar{\mu}$ in (4.10).

Corollary 4.7.
Suppose that the assumptions of Theorem 4.6 hold true.
The solution to this equation is $\psi(q, p) = -p \cdot \nabla_q v(q)$, which indeed belongs to $H^1(\nu)$ since $L_{\mathrm{ham}} v \in H^{-1}(\nu)$ (in fact we may add to ψ any function depending on q only, but the solutions would be equivalent by definition of the space $H^1(\nu)$ in Section 2.1). Plugging this solution into (4.14) leads to (4.16).

As a result, the solution ψ to (4.15) is ψ = 0 (again, up to a function of q only), from which (4.17) follows since $v \in H^1(\nu)$.

In the overdamped limit γ → +∞, the dominant fluctuations are position-dependent, and the rate function is actually that of the limiting overdamped dynamics (4.5) up to the time rescaling t → γt, which is consistent with the discussion on the overdamped limit in Section 4.2.1. On the other hand, in the Hamiltonian limit γ → 0, the dominant fluctuations are Hamiltonian, with the inverse time rescaling $t \to \gamma^{-1} t$. This is consistent with the small temperature limit of Hamiltonian systems [49]. Although Corollary 4.7 provides interesting information, its structure is quite rigid. For instance, in the overdamped limit, we consider only position-dependent perturbations, which is not realistic. We now refine the asymptotics by considering the next order correction in γ for the perturbation in both regimes, which shows the robustness of the analysis. In the result stated below, we consider a family of probability measures $\nu_\gamma$ indexed by γ > 0, and simply denote by ν the probability measure $\nu_0$.

Corollary 4.8.
Suppose that the assumptions of Theorem 4.6 hold true.
• Overdamped limit γ → +∞: Consider the measure $\nu_\gamma \in \mathcal{P}_\kappa(\mathcal{X})$ defined by $\nu_\gamma = e^{v_\gamma}\mu$ with $v_\gamma(q, p) = v(q) + \gamma^{-1}\tilde{v}(q, p)$, where $\bar{\nu} = e^{v}\bar{\mu}$.
• Hamiltonian limit γ → 0: Consider $\nu_\gamma = e^{v_\gamma}\mu$ with $v_\gamma(q, p) = g(H(q, p)) + \gamma\tilde{v}(q, p)$, (4.20)

We believe that it is also instructive to mention the relation between the rate function (4.14) and the asymptotic variance of the Langevin dynamics. Indeed, when considering small perturbations of the invariant measure, Corollary 4.8 shows that $I_\gamma \sim \min(\gamma, 1/\gamma)$.

Since we expect the asymptotic variance to be the inverse of the rate function around the invariant measure [29,98], the scalings (4.21) and (4.22) are consistent. However, as (4.14) suggests, this scaling is no longer true for general fluctuations. We now present the proof of Corollary 4.8.
Proof. We first consider the overdamped limit γ → +∞. Since $\tilde{v}$ is bounded we have, for any γ ≥ 1 and $\psi \in H^1(\nu_\gamma)$,

$$e^{\frac{\inf \tilde{v}}{\gamma}}\,|\psi|^2_{H^1(\nu)} \leq |\psi|^2_{H^1(\nu_\gamma)} \leq e^{\frac{\sup \tilde{v}}{\gamma}}\,|\psi|^2_{H^1(\nu)}. \tag{4.23}$$

Thus, the norms $H^1(\nu_\gamma)$ and $H^1(\nu)$ are equivalent for any fixed γ ≥ 1, and the functions of $H^1(\nu_\gamma)$ and $H^1(\nu)$ coincide (we repeatedly use this fact below, and we will use a similar argument when γ ≤ 1). A similar conclusion holds for the corresponding dual norms. This consequence of the boundedness of $\tilde{v}$ makes the analysis simpler.
Recall that we consider $v_\gamma(q, p) = v(q) + \gamma^{-1}\tilde{v}(q, p)$ in the overdamped limit. The symmetric part of the rate function is easily computed since v only depends on the position variable, namely

where we used that $\tilde{v}$ belongs to $H^1(\nu)$ and is bounded to expand the exponential. For the antisymmetric part, by (4.15), we have to consider the solution $\psi_\gamma \in H^1(\nu_\gamma)$ to

Corollary 4.7 suggests that at leading order in γ it holds $\psi_\gamma = \psi + O(\gamma^{-1})$ where $\psi(q, p) = p \cdot \nabla v(q)$. In order to make this idea more precise we compute

In what follows, we denote by $u = L_{\mathrm{ham}}\tilde{v} + \nabla_q v \cdot \nabla_p \tilde{v}$ the right hand side of the above equation. Since $\nabla_q v \cdot \nabla_p \tilde{v} \in H^{-1}(\nu_\gamma)$ and $L_{\mathrm{ham}}\tilde{v} \in H^{-1}(\nu_\gamma)$ by assumption, it holds $u \in H^{-1}(\nu_\gamma)$. Thus, multiplying by $\psi_\gamma - \psi$ and integrating with respect to $\nu_\gamma$ we obtain

Using the duality between $H^1(\nu_\gamma)$ and $H^{-1}(\nu_\gamma)$ (see [

where C is some constant independent of γ. This shows that $\psi_\gamma = \psi + \gamma^{-1}\tilde{\psi}_\gamma$ with $|\tilde{\psi}_\gamma|_{H^1(\nu)} \leq C$ for a constant C > 0 and all γ ≥ 1. Plugging this estimate into (4.14) and using that $\nabla_p \psi = \nabla_q v$, we obtain the second term on the right hand side of (4.18).

The arguments to prove the limit γ → 0 follow a similar path, so we only sketch the proof. First, the boundedness of $\tilde{v}$ again allows us to compare the Sobolev norms associated with ν and $\nu_\gamma$ for any γ ≤ 1 (by writing the counterpart of (4.23) in this regime). The first term on the right hand side of (4.19) is easily obtained as in Corollary 4.7 using that $g(H) \in H^1(\nu)$ and $\tilde{v}$ is bounded. Concerning the antisymmetric part, (4.15) now reads

$$-\Delta_p \psi_\gamma + (p - \nabla_p v_\gamma) \cdot \nabla_p \psi_\gamma = \gamma L_{\mathrm{ham}}\tilde{v},$$

since $L_{\mathrm{ham}}\, g(H(q, p)) = 0$. Because of the scaling in γ on the right hand side of the above equation, the solution $\psi_\gamma$ can be expanded as $\psi_\gamma = \gamma\tilde{\psi} + O(\gamma^2)$ in $H^1(\nu)$, where $\tilde{\psi}$ is solution to

$$-\Delta_p \tilde{\psi} + \big(1 - g'(H(q, p))\big)\, p \cdot \nabla_p \tilde{\psi} = L_{\mathrm{ham}}\tilde{v}.$$
This reasoning can be made rigorous by a precise asymptotic analysis as above. Plugging this expansion into (4.14) provides the second term on the right hand side of (4.19).

Conclusion and perspectives
The goal of this paper was twofold. Our first aim was to provide, given a diffusion process, a precise class of unbounded functions for which a large deviations principle holds. This question is answered in Section 2, where we prove a LDP for the empirical measure in a topology associated with unbounded functions, in relation with a Witten-Lyapunov condition. In particular, a comparison with Cramér's condition for independent variables shows the effect of correlations on the stability of the SDE at hand. These results extend in several directions and refine results from previous works [115,76]. However, the necessity of our Lyapunov condition for a LDP to hold is still an open problem, whereas the necessity of a similar condition is known for the Sanov theorem [111]. Our second concern was to provide finer expressions of the rate function governing the LDP, in particular in order to study Langevin dynamics, which appear for instance in molecular simulation. We answer this question in two ways in Section 3. We first provide an alternative variational formula for the rate function in Section 3.1, which gives as a by-product a very general representation formula for the principal eigenvalue of second order differential operators, without symmetry assumption. This extends the important work of Donsker and Varadhan [33] to an unbounded setting. In Section 3.2, we show a general decomposition of the rate function into symmetric and antisymmetric parts of the dynamics based on the computations in [15]. Interestingly, the proof of the result relies on a Witten-like transform in the above mentioned variational representation of the rate function. These results allow us to describe precisely the rate function of an irreversible overdamped Langevin dynamics in Section 4.1, revisiting results from [98] in an unbounded setting. More interestingly we provide in Section 4.2, for Langevin dynamics, asymptotics of the rate function for the overdamped and the underdamped limits.
We thus characterize the most likely fluctuations in both regimes with a natural physical interpretation. Considering piecewise deterministic processes [11,41,42] (which lack regularity) instead of the Langevin dynamics is also an interesting problem.
We would like to mention several interesting directions for future work. A first natural issue is to rephrase our results in the optimal control framework developed e.g. in [18,38,39]. This is particularly interesting for numerical purposes, since the optimal control representation can be learned on the fly with stochastic approximation methods [17,9,10,48]. We believe that such results can be obtained by exploiting the contraction principle provided by Corollary 2.13.
On a more theoretical ground, dual Sobolev norms have recently attracted attention in the optimal control community due to the so-called optimal matching problem, see for instance [80,81] and references therein. With these works in mind, the dual Sobolev norm in the antisymmetric part of the rate function described in Section 3.2 could be interpreted as an infinitesimal transport cost related to the antisymmetric part of the dynamics, which is an alluring interpretation of irreversibility. Note that the relations between optimal transport and large deviations theory have a fruitful history, see e.g. [58].
It has been known for some time in the physics literature that the empirical density of a diffusion may not contain enough information to describe its fluctuations in an irreversible regime. It is actually more relevant to consider the fluctuations of both the empirical density and current, a procedure sometimes called level 2.5 large deviations [24,7]. This framework can be used to provide a clear description of the rate function of irreversible dynamics. As shown in [7], such large deviations results can be derived by Krein-Rutman arguments like those used in the present paper. Therefore, we believe that our results can be extended to prove level 2.5 large deviations principles and characterize precisely the class of admissible currents.
Finally, it is important to understand the behaviour of observables which are not covered by our analysis. It has been recently shown [90] in the case of the Ornstein-Uhlenbeck process that observables growing too fast at infinity with respect to the confinement are characterized by a heavy tail behaviour. This leads to a level 1 large deviations principle at an anomalous speed with a localization in time of the fluctuation, and the Krein-Rutman strategy developed in the present paper does not apply. We therefore believe there are several interesting open questions in this direction.

Proofs
In all the proofs below, for conciseness, we write E x , P x , etc, with some abuse of notation, to indicate that the expectations we consider are taken with respect to all realizations of the dynamics (2.1) started from X 0 = x; and do not indicate explicitly the dependence of X t on x, in contrast to the convention used in Section 2.

Proof of the large deviations principle
As mentioned after Theorem 2.11, our proof relies on the Gärtner-Ellis theorem [28], for which we need several preliminary results. The key object is the cumulant functional λ defined in (2.18). Roughly speaking, the Gärtner-Ellis theorem (Theorem A.1 in Appendix A) states that if this functional is finite and Gâteaux-differentiable over $B^\infty_\kappa(\mathcal{X})$ and $(L_t)_{t \geq 0}$ defined in (2.17) is exponentially tight for the $\tau^\kappa$-topology, then $(L_t)_{t \geq 0}$ satisfies a LDP in the dual space of $B^\infty_\kappa(\mathcal{X})$. A reminder of this theorem and some elements of analysis are given in Appendix A.
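In the Gärtner-Ellis approach, the rate function is recovered from the cumulant functional through a Legendre-Fenchel transform. The sketch below illustrates this mechanism in a one-parameter toy setting only, not in the infinite-dimensional setting of the paper: it assumes a hypothetical quadratic cumulant function $\lambda(a) = a^2$ and approximates $I(y) = \sup_a \{a y - \lambda(a)\}$ by a maximum over a grid. The function names and the grid are choices made for the example.

```python
def cumulant(a):
    """Hypothetical scaled cumulant generating function, lambda(a) = a^2."""
    return a * a

def legendre_transform(y, a_min=-5.0, a_max=5.0, n_grid=10_001):
    """Grid approximation of I(y) = sup_a (a * y - cumulant(a))."""
    step = (a_max - a_min) / (n_grid - 1)
    best = float("-inf")
    for k in range(n_grid):
        a = a_min + k * step
        best = max(best, a * y - cumulant(a))
    return best

# For a quadratic cumulant function the transform is quadratic as well:
# the supremum is attained at a = y / 2, which gives I(y) = y^2 / 4.
rate_at_one = legendre_transform(1.0)
rate_at_zero = legendre_transform(0.0)
```

The grid must be wide enough to contain the maximizer $a = y/2$; for values of $y$ outside $[2 a_{\min}, 2 a_{\max}]$ the approximation would be truncated.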
However, studying the range of functions f for which the functional λ is finite and Gâteaux-differentiable is not an easy task. Formally, our strategy is to prove that r(f), the element of the spectrum of the operator L + f with the largest modulus, is a real eigenvalue for any function $f \in B^\infty_\kappa(\mathcal{X})$, and to show that it is actually equal to the cumulant function λ(f) defined in (2.18). This amounts to showing the well-posedness and regularity of a family of spectral problems. For this, we use several ideas from [47], which shows that under Lyapunov and irreducibility conditions, the eigenvalue problem to which λ is associated is well defined. In order to avoid technical difficulties related to unbounded operators, we study the semigroup $(P^f_t)_{t \geq 0}$ rather than its generator L + f, see Remark 6.11 below for more details. The seminal paper by Gärtner [55, Section 3] provides useful technical tools, as well as [44,115].

In all of this section, we suppose that Assumptions 2.5, 2.6 and 2.7 hold true and consider a function $\kappa : \mathcal{X} \to [1, +\infty)$ of class $\mathscr{S}$ as in Assumption 2.7, i.e. such that $\kappa \ll \Psi$ and either κ is bounded or has compact level sets and satisfies (2.14). We repeatedly use that $\kappa \ll -LW/W$ in view of (2.13). We start with important properties of key martingales that appear regularly in the proofs of the required technical results.

where $C_1 > 0$ and $C_2 \in \mathbb{R}$ are the constants from Assumption 2.7.

Proof. First, Itô formula gives
Since W is C 2 (X ) and σ is continuous, M t is a continuous local martingale [71]. Since it is non-negative, it is a supermartingale by Fatou's lemma, and the same conclusion holds for M t . On the other hand, (2.13) shows that which concludes the proof.
The use of the martingale $M_t$ is inspired by [115] where it is considered to control return times to compact sets. Here, it allows us to define the Feynman-Kac semigroup associated with the dynamics $(X_t)_{t \geq 0}$ with weight function $f \in B^\infty_\kappa(\mathcal{X})$.

Lemma 6.2. Fix $f \in B^\infty_\kappa(\mathcal{X})$. For any t ≥ 0, the Feynman-Kac operator

$$(P^f_t \varphi)(x) = \mathbb{E}_x\Big[\varphi(X_t)\, e^{\int_0^t f(X_s)\,ds}\Big]$$

is well defined. Moreover, $(P^f_t)_{t \geq 0}$ is a semigroup of bounded operators on $B^\infty_W(\mathcal{X})$. Finally, for any t > 0 and any a > 0, there exist $c_{a,t} \geq 0$ and a compact subset $K_{a,t} \subset \mathcal{X}$ such that

Proof. We first show that for any $f \in B^\infty_\kappa(\mathcal{X})$, $(P^f_t)_{t \geq 0}$ is a semigroup of bounded operators on $B^\infty_W(\mathcal{X})$, before turning to the proof of (6.4). For a fixed $f \in B^\infty_\kappa(\mathcal{X})$, since $\kappa \ll \Psi$, there exists c > 0 such that, for any t > 0,

Using Lemma 6.1, the supermartingale property leads to

and hence

As a result $(P^f_t)_{t \geq 0}$ is a semigroup of bounded operators over $B^\infty_W(\mathcal{X})$. We next prove (6.4) for a fixed $f \in B^\infty_\kappa(\mathcal{X})$, which we assume non-zero without loss of generality. Note that

Since Ψ has compact level sets and $\kappa \ll \Psi$, for any a > 0 there exists a compact set $K_a \subset \mathcal{X}$ and a constant $b_{0,a}$ such that

where α > 0 is a constant to be chosen later on. This implies that

Therefore (by some standard approximation arguments relying on stopping times, as discussed for instance in [97])

We can now bound the right hand side of the above equation with a technique similar to the one used in [47, Section 2.3]. Indeed, for any $x \in \mathcal{X}$,

Plugging this estimate into (6.6) and using that $W \geq 1$ leads to

where the last bound is due to Lemma 6.1.
Using this estimate to bound the right hand side of (6.5), we end up with Integrating with respect to time leads to Since W W , there exists a compact set K a,t ⊂ X such thatb a W e −(a+α)t W outside K a,t , so that we have We can now assume that we chose from the begining α > log(2)/t (recall that t is fixed). Setting c a,t =b a sup Ka,t W , this leads to which proves (6.4).
Lemma 6.2 proves crucial to obtain the compactness of the evolution operator $P^f_t$, as noted in [47] (a result inspired by [97, Theorem 8.9]). Note however that $(P^f_t)_{t \geq 0}$ is a priori not a strongly continuous semigroup on $B^\infty_W(\mathcal{X})$, see the discussion in [114, Proposition B13] and Remark 6.11 below for more details.
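For intuition, a Feynman-Kac expectation of the usual form $\mathbb{E}_x\big[\varphi(X_t)\exp\big(\int_0^t f(X_s)\,ds\big)\big]$ can be approximated by plain Monte Carlo. The sketch below does this for a hypothetical one-dimensional Ornstein-Uhlenbeck dynamics $dX_t = -X_t\,dt + \sqrt{2}\,dB_t$, with an Euler-Maruyama discretization and a left-point rule for the time integral. The dynamics, the function name and the parameters are assumptions made for the example, and no variance reduction is attempted.

```python
import math
import random

def feynman_kac_estimate(phi, f, x0, t, dt, n_paths, seed=0):
    """Monte Carlo estimate of E_x[ phi(X_t) * exp(int_0^t f(X_s) ds) ] for the
    hypothetical dynamics dX = -X dt + sqrt(2) dB started at x0."""
    rng = random.Random(seed)
    n_steps = int(round(t / dt))
    total = 0.0
    for _ in range(n_paths):
        x = x0
        log_weight = 0.0
        for _ in range(n_steps):
            log_weight += f(x) * dt  # left-point rule for the time integral
            x += -x * dt + math.sqrt(2.0 * dt) * rng.gauss(0.0, 1.0)
        total += phi(x) * math.exp(log_weight)
    return total / n_paths

# With f = 0 and phi = 1 the estimator is identically one, and with a constant
# weight f = c and phi = 1 it reduces to exp(c * t) whatever the trajectory is.
no_weight = feynman_kac_estimate(lambda x: 1.0, lambda x: 0.0, 0.5, 1.0, 0.01, 50)
constant_weight = feynman_kac_estimate(lambda x: 1.0, lambda x: 0.3, 0.5, 1.0, 0.01, 50)
```

For unbounded weights f the exponential factor can have a very large variance, which is one practical reflection of why growth conditions on f relative to the confinement matter.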
Another key ingredient is the regularization property of the evolution. The following bound on the Feynman-Kac semigroup depending on the weight function f is one element in this direction. Lemma 6.3. Suppose that Assumptions 2.5, 2.6 and 2.7 hold true, and fix $f, g \in B^\infty_\kappa(\mathcal{X})$. Then, for any t > 0, any $\varphi \in B^\infty_W(\mathcal{X})$ and any $x \in \mathcal{X}$, it holds

Proof. Using the inequality $|e^a - e^b| \leq |a - b|\, e^{|a| + |b|}$ for $a, b \in \mathbb{R}$, we have, for $x \in \mathcal{X}$,

which is the desired conclusion.
We can now use Lemma 6.3 to show an important regularization property of the Feynman-Kac semigroup. Lemma 6.4. For any $f \in B^\infty_\kappa(\mathcal{X})$, $\varphi \in B^\infty_W(\mathcal{X})$, any t > 0 and any compact $K \subset \mathcal{X}$, the function $P^f_t(\varphi 1_K)$ is continuous.
Let us insist on the fact that the statement of Lemma 6.4 is a consequence of Hörmander's theorem [43,Theorem 4.1] when f has polynomial growth and is smooth. However, the result is more difficult to obtain when f is irregular. Note for instance that we cannot rely on the continuity property proved in Section 6.2.3 below since the space of smooth functions with compact support is not dense in B ∞ W (X ). The idea of the proof is to use the local martingales introduced in Lemma 6.1 to show that the regularization property of Hörmander's theorem is preserved when f is irregular but does not grow too fast.
Proof. We use Assumption 2.5 to revisit [55, pages 34-35] in an unbounded setting and with a hypoelliptic flavour. First, we note that for f ∈ C ∞ c (X ), the result is a direct application of Assumption 2.5 combined with Hörmander's theorem, since the evolution operator P f t can be shown to be an integral operator with a transition probability which admits a density p f (t, x, y) belonging to C ∞ ((0, +∞) × X × X ) (see for instance [69] for f = 0, which can easily be extended to f ∈ C ∞ c (X ) with the hypoelliptic result of [43,Theorem 4.1]). In particular, P f t (ϕ1 K ) is continuous.
We now use an approximation argument inspired by [55,Section 3] for a generic κ for any n ∈ N, and such that f n → f almost everywhere as n → +∞ (such a sequence exists by Lusin's theorem, see [102,Chapter 2]). By modifying the proof of Lemma 6.3, and since f n B ∞ κ f B ∞ κ , we have for any ϕ ∈ B ∞ W (X ), n ∈ N and x ∈ X , Our goal is now to show that P fn t (ϕ1 K ) converges uniformly over any compact K to P f t (ϕ1 K ), by proving that the right hand side of (6.8) goes uniformly to 0 over K .
This will conclude the proof since a uniform limit of continuous functions is continuous. We introduce to this end the events Ψ(X s ) ds m , (6.9) and fix a compact set K ⊂ X . The right hand side of (6.8) can then be split into two for which we show convergence to 0, uniformly for x ∈ K , starting with (A). Since κ −LW /W , there exists c > 0 such that a e a , and W 1, so that By definition of M t in (6.1) we have The Cauchy-Schwarz inequality then shows that √ C 1 e C2t/2 W (x). Next, by Tchebychev's inequality and since W 1, As a result, we obtain, for x ∈ K , Therefore, for any ε > 0, we can choose m 0 such that (A) ε.
Using the definition (6.9) we have where B R is the ball of center 0 and radius R > 0 to be chosen. Let us first bound (B ), which retains only the parts of the trajectories performing excursions out of B R . Using κ Ψ, for ε > 0 and m 0 as fixed above, there exist R > 0, C R > 0 such that We fix R > 0 and C R > 0 such that the above inequality holds true. Using again g n 2 f B ∞ κ κ, we are led to where the last line follows from the definition (6.9) of E m . Therefore, once m is fixed, there exists R > 0 such that for any n 1 and x ∈ K , it holds (B ) ε. It remains to control (B ) in order to obtain the uniform convergence to zero of (6.8) over K as n → +∞. In fact, where (P s ) s 0 is the evolution semigroup defined in (2.7). Since (1 B R g n ) n 1 is a sequence of bounded functions converging almost everywhere to zero and the transition kernel P s has a smooth density for s > 0, it follows that (P s (g n 1 B R )) n 1 goes uniformly to zero over compact sets for any s > 0 as n → +∞, see e.g. [55,97]. Moreover, it can be shown which goes to zero when η → 0, uniformly in x ∈ K and n ∈ N. Therefore, for ε > 0, R > 0 and m 0 fixed as above, and choosing there exists n ∈ N such that for all n n and x ∈ K , Then, for any n n , x ∈ K , it holds (B ) ε.
Let us summarize the various approximations: for any ε > 0, we first fix m 0 so that (A) ε. Then, we choose R > 0 large enough so that (B ) ε. Finally, we take η small enough and n large enough in (6.10) so that (B ) ε for n n . As a result, for any ε > 0 there is n 0 such that for n n and x ∈ K , it holds (A) + (B) 3ε. In conclusion, the right hand side of (6.8) goes to zero uniformly as n → +∞ over any compact set K . Therefore P fn t (ϕ1 K ) is continuous and converges uniformly over K to P f t (ϕ1 K ), which is therefore continuous over K . Since the compact K ⊂ X is arbitrary, P f t (ϕ1 K ) is continuous over X , which concludes the proof.
Before presenting the main result concerning the spectral properties of the operator P f t and its consequences on the definition of the cumulant function λ(f ), we need the following "irreducibility" lemma, which relies on Assumption 2.6. Lemma 6.5. For any time t > 0, x ∈ X and any Borel set A ⊂ X with non-empty interior, it holds that Proof. Take x ∈ X and y ∈Å (which is possible since A has non-empty interior). By Assumption 2.6, there exists a C 1 -path (φ s ) s∈[0,t] solving (2.11) such that φ 0 = x and φ t = y. We can then use the proof of the Stroock-Varadhan support theorem, see [97, Moreover, since φ t = y ∈Å and upon reducing ε > 0 we may assume that B(y, ε) ⊂ A, where B(y, ε) denotes the ball of center y and radius ε > 0. Recalling that f ∈ B ∞ κ (X ), we then obtain where we denote by S φ,ε the ε-tube around the path (φ s ) s∈[0,t] , namely Since S φ,ε is a bounded set and κ is continuous over X , it holds sup S φ,ε κ < +∞.
At this stage, we follow the spectral analysis path developed in [47]. However, we have to prove that the assumptions used in [47] are fulfilled in our context. In particular the irreducibility is granted by Lemma 6.5. Lemma 6.6. For any $f \in B^\infty_\kappa(\mathcal{X})$ and any t > 0, the operator $P^f_t$ considered over $B^\infty_W(\mathcal{X})$ has a real largest eigenvalue $e^{t r(f)}$ with eigenspace of dimension one, and an associated continuous eigenvector $h_f \in B^\infty_W(\mathcal{X})$ such that $h_f(x) > 0$ for any $x \in \mathcal{X}$. Moreover, $h_f$ is the only positive eigenvector of $P^f_t$ (up to multiplication by a positive constant).
Finally, r(f) is equal to the cumulant function defined in (2.18):

$$r(f) = \lambda(f). \tag{6.14}$$

The result of Lemma 6.6 is twofold: it entails the well-posedness of the principal eigenproblem associated with $P^f_t$ for any $f \in B^\infty_\kappa(\mathcal{X})$ and t > 0, and then identifies this principal eigenvalue with the free energy function (2.18). Another consequence of this lemma is that $h_f$ is in fact the principal eigenvector of L + f, see Lemma 6.10 below for a more precise statement.
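A concrete case may help to see what the principal eigenvalue looks like. The sketch below assumes a hypothetical one-dimensional Ornstein-Uhlenbeck generator $L = -x\,\partial_x + \partial_x^2$ (that is, $V(x) = x^2/2$) and the linear weight $f(x) = a x$, for which $e^{ax}$ is a positive eigenfunction of $L + f$ with eigenvalue $a^2$, as a direct computation shows. Numerically, conjugating by $e^{V/2}$ turns $L + f$ into the Schrödinger-type operator $\partial_x^2 - (x^2/4 - 1/2) + a x$, which is discretized by finite differences on a truncated interval; its largest eigenvalue is then computed by inverse iteration. The truncation, the grid and the function name are choices made for the example.

```python
import math

def principal_eigenvalue(a, half_width=12.0, n=800, n_iter=100):
    """Largest eigenvalue of d^2/dx^2 - U on [-half_width, half_width] with
    Dirichlet conditions, where U(x) = x^2/4 - 1/2 - a*x, via inverse iteration
    on the shifted operator H + shift with H = -d^2/dx^2 + U."""
    h = 2.0 * half_width / (n + 1)
    xs = [-half_width + (i + 1) * h for i in range(n)]
    U = [x * x / 4.0 - 0.5 - a * x for x in xs]
    shift = 1.0 - min(U)  # makes H + shift positive definite
    diag = [2.0 / h**2 + u + shift for u in U]
    off = -1.0 / h**2

    def solve(rhs):
        # Thomas algorithm for the symmetric tridiagonal system (H + shift) y = rhs
        c = [0.0] * n
        d = [0.0] * n
        c[0] = off / diag[0]
        d[0] = rhs[0] / diag[0]
        for i in range(1, n):
            denom = diag[i] - off * c[i - 1]
            c[i] = off / denom
            d[i] = (rhs[i] - off * d[i - 1]) / denom
        y = [0.0] * n
        y[-1] = d[-1]
        for i in range(n - 2, -1, -1):
            y[i] = d[i] - c[i] * y[i + 1]
        return y

    u = [math.exp(-x * x / 4.0) for x in xs]  # positive initial guess
    for _ in range(n_iter):
        u = solve(u)
        norm = math.sqrt(sum(v * v for v in u))
        u = [v / norm for v in u]

    # Rayleigh quotient of H for the normalized vector u
    energy = 0.0
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        energy += u[i] * ((2.0 * u[i] - left - right) / h**2 + U[i] * u[i])
    return -energy

r_half = principal_eigenvalue(0.5)  # to be compared with a^2 = 0.25
r_zero = principal_eigenvalue(0.0)  # to be compared with 0
```

The symmetrization step relies on the reversibility of this particular example; for a non-symmetric generator one would need a non-symmetric eigenvalue solver instead.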
Proof. We follow the general strategy of [47] and split the proof into several steps.
Step 1: Compactness of the evolution operator We first show that, for given t > 0 and f ∈ B ∞ κ (X ), the operator P f t defined in Lemma 6.2 is compact when considered on B ∞ W (X ). For any compact set K ⊂ X we have the decomposition We first consider the compact sets K a from (6.4) for a > 0 and time t/3 (omitting the dependence on t in the notation since the time is fixed here) and note that 1 K c a P f t/3 converges to 0 in operator norm as a → +∞. Indeed, for any ϕ ∈ B ∞ W (X ), (6.4) leads to → 0 when a → +∞.
We next show that P f t/3 1 K P f t/3 1 K is compact over B ∞ W (X ) for any compact set K ⊂ X . Consider a sequence (ϕ k ) k∈N bounded in B ∞ W (X ). Following the first step of the proof of [47, Lemma 2] and using our strong Feller result, Lemma 6.4, we see that P f t/3 1 K is a strong Feller operator, so P f t/3 1 K P f t/3 1 K is ultra-Feller (see [47,Lemma 6]). This means that the operator P f t/3 1 K P f t/3 1 K is continuous in total variation norm, so that the family (P f t/3 1 K P f t/3 1 K ϕ k ) k∈N is uniformly equicontinuous. We used here that since ϕ ∈ B ∞ W (X ) and W is continuous, it holds 1 K ϕ ∈ B ∞ (X ). The sequence (P f t/3 1 K P f t/3 1 K ϕ k ) k∈N therefore converges in B ∞ (X ) up to extraction by the Ascoli theorem [102,Theorem 11.28], and in B ∞ W (X ) since W 1. Therefore, the operator P f t/3 1 K P f t/3 1 K sends a bounded sequence into a convergent one (up to extraction), so it is compact in B ∞ W (X ) [95]. The decomposition (6.15) and the bound (6.16) then show that P f t is the limit in operator norm of the compact operators P f t/3 1 Ka P f t/3 1 Ka P f t/3 as a → +∞, so it is compact in B ∞ W (X ) (see e.g. [95, Theorem VI.12]).
Step 2: Existence of the principal eigenvalue. We can now use the Krein-Rutman theorem on the (closed) total cone $K_W = \{\varphi \in B^\infty_W \mid \varphi \geq 0\}$ (see [27,47] for definitions). For t > 0, it is clear that $P^f_t$ leaves this cone invariant. We next show that $P^f_t$ has a non-zero spectral radius. To this end, fix a compact set K with non-empty interior. We have shown in Lemma 6.5 that $(P^f_t 1_K)(x) > 0$ for any $x \in \mathcal{X}$. Since $P^f_t 1_K$ is continuous by Lemma 6.4, this shows that

$$\alpha_K := \min_{x \in K} P^f_t 1_K(x) > 0. \tag{6.17}$$

Therefore, for any x ∈ K,

Iterating the procedure for any n ≥ 1 we get

As a result, since $1 \leq \inf_K W < +\infty$, we obtain in the large n limit the following lower bound for the spectral radius:

Step 3: Properties of $h_f$. For the remainder of the proof, we write for simplicity $r := r(f)$ and $h := h_f$ (the function f being fixed). We show here that h is continuous and positive. For any compact $K \subset \mathcal{X}$ and t > 0, (6.18) leads to

Using Lemma 6.2 we obtain that, for any a > 0, there exists a compact set $K_a$ such that

so that h is continuous as the uniform limit of continuous functions (since $P^f_t(1_{K_a} h)$ is continuous by Lemma 6.4). Finally, since h ≥ 0 and h is not identically equal to 0, there exists $x_0 \in \mathcal{X}$ such that $h(x_0) > 0$. Moreover h is continuous, so there is ε > 0 for which h > 0 on $B(x_0, \varepsilon)$. By (6.18) it holds, for any $x \in \mathcal{X}$,

Since $\big(P^f_t 1_{B(x_0, \varepsilon)}\big)(x) > 0$ for any $x \in \mathcal{X}$ by Lemma 6.5, the previous lower bound shows that h(x) > 0 for all $x \in \mathcal{X}$.
Step 4: Properties of eigenspaces and eigenfunctions. We now show that the eigenspace associated with $h$ is of dimension one, and that any other eigenvector vanishes somewhere in $\mathcal{X}$. For this, we introduce the so-called $h$-transform [76,101,23,47].
A key element here is the fact that $h(x) > 0$ for all $x \in \mathcal{X}$, which allows us to define the following Markov operator, for an arbitrary time $t > 0$:
$$Q_h = e^{-rt}\, h^{-1} P^f_t\, h, \tag{6.19}$$
where $h$ and $h^{-1}$ refer here to the multiplication operators by the functions $h$ and $h^{-1}$ respectively. We now prove that $Q_h$ is ergodic by first noting that $Q_h$ admits $W h^{-1}$ as a Lyapunov function (using (6.4) and the normalization $\|h\|_{B^\infty_W} = 1$, which implies that $W h^{-1} \geq 1$). Using Assumption 2.7, we can also show that $W h^{-1}$ has compact level sets, see [47, Appendix E] for details.
Moreover, we can prove that $Q_h$ satisfies a minorization condition on any compact set. For this, we first use that $P^f$ Then, for any $t > 0$ and $\alpha \geq 0$, the operator $P^{-\alpha\kappa}_t$ has a smooth transition density by hypoellipticity (because $\kappa$ and the coefficients of $\mathcal{L}$ belong to the class $\mathscr{S}$, see [43, Theorem 4.1]). We next rely on [86, Lemma 2.3] to guarantee the existence of a minorization measure for $P^{-\alpha\kappa}_t$. We can indeed use this result since Lemma 6.5 ensures that any open set can be reached with positive probability. Therefore, for any $K \subset \mathcal{X}$ compact with non-empty interior, there are $a_K > 0$ and a probability measure $\eta_K$ such that, for any measurable set $A \subset \mathcal{X}$, Since $h$ is continuous, this implies that, for any measurable $\varphi \geq 0$, where both the minimum and maximum above are finite and non-zero (recall that $|K| > 0$ is the Lebesgue measure of $K$). This shows that $Q_h$ satisfies a minorization condition [60] over any compact set.
Therefore, the Markovian dynamics with kernel $Q_h$ admits a unique invariant probability measure $\mu_h$, with respect to which it is ergodic in $B^\infty_{W h^{-1}}(\mathcal{X})$. By this we mean that (in view of [60, Theorem 1.2]) there exist $\bar{\alpha} > 0$ and $C > 0$ such that, for any $\varphi \in B^\infty_{W h^{-1}}(\mathcal{X})$, (6.20) and it holds $\mu_h(W/h) < +\infty$.
We can now use this ergodic behaviour to show that the eigenspace associated with $r$ has dimension one and that $P^f_t$ cannot have another positive eigenvector with norm 1 in $B^\infty_W(\mathcal{X})$. Indeed, if there were another eigenvector $\tilde{h} \in B^\infty_W(\mathcal{X})$ associated with $r$, then the fact that $\tilde{h}/h \in B^\infty_{W h^{-1}}(\mathcal{X})$ together with (6.20) ensures that This shows that $h$ and $\tilde{h}$ would be proportional, and proves the claim that the eigenspace associated with $r$ has dimension 1. Assume now that there is another real eigenvalue $\tilde{r} < r$ with real eigenvector $\tilde{h} \in B^\infty_W(\mathcal{X})$ such that $\tilde{h}(x) > 0$ for all $x \in \mathcal{X}$. Noting again that $\tilde{h}/h \in B^\infty_{W h^{-1}}(\mathcal{X})$ and since $\tilde{h} > 0$, (6.20) shows that, for any $x \in \mathcal{X}$, However it now holds, for any $x \in \mathcal{X}$, where we used that $h > 0$ and $\tilde{r} < r$. Combining the two equations above shows that which contradicts (6.21). As a result, there cannot be another eigenvalue with a positive eigenvector.
Step 5: The principal eigenvalue is the cumulant function. Proving (6.14) now follows by a simple rewriting. For $x \in \mathcal{X}$ and $t_0 > 0$ fixed, it holds, for any $n \in \mathbb{N}^*$, We have chosen to work with an arbitrary time $t_0 > 0$ for convenience, so a priori the above limit depends on $t_0$. To conclude the proof, it remains to show that the limit actually does not depend on the specific choice of $t_0$ and that This extension from $t_0 > 0$ fixed to any $t > 0$ follows by standard arguments not reproduced here (see e.g. [64,47]).
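The identification of the principal eigenvalue with the cumulant function can be checked by hand in a toy setting. The following sketch is a discrete-time, finite-state analogue and not the diffusion setting of the lemma: we assume a hypothetical transition matrix `P` on three states and a function `f`, and define the tilted operator $(P^f\varphi)(x) = e^{f(x)}\sum_y P(x,y)\varphi(y)$, so that $((P^f)^n \mathbb{1})(x) = \mathbb{E}_x\big[e^{f(X_0)+\dots+f(X_{n-1})}\big]$. All function names are ours.

```python
import math


def tilted_apply(P, f, phi):
    """Apply (P^f phi)(x) = exp(f[x]) * sum_y P[x][y] * phi[y]."""
    n = len(phi)
    return [math.exp(f[x]) * sum(P[x][y] * phi[y] for y in range(n))
            for x in range(n)]


def principal_pair(P, f, iters=2000):
    """Power iteration: returns (r, h) with P^f h ~ r h, h > 0 and max(h) = 1."""
    h = [1.0] * len(f)
    r = 1.0
    for _ in range(iters):
        g = tilted_apply(P, f, h)
        r = max(g)
        h = [v / r for v in g]
    return r, h


def scaled_log_moment(P, f, n, x):
    """(1/n) log E_x[exp(f(X_0) + ... + f(X_{n-1}))], computed exactly.

    The vector (P^f)^n 1 is renormalised at each step to avoid overflow,
    and the logarithms of the normalisation constants are accumulated.
    """
    phi = [1.0] * len(f)
    log_scale = 0.0
    for _ in range(n):
        phi = tilted_apply(P, f, phi)
        m = max(phi)
        phi = [v / m for v in phi]
        log_scale += math.log(m)
    return (log_scale + math.log(phi[x])) / n


if __name__ == "__main__":
    # Hypothetical example data (not taken from the paper).
    P = [[0.5, 0.3, 0.2], [0.2, 0.6, 0.2], [0.1, 0.4, 0.5]]
    f = [0.3, -0.2, 0.1]
    r, h = principal_pair(P, f)
    print("log of principal eigenvalue:", math.log(r))
    print("finite-n cumulant from x=0 :", scaled_log_moment(P, f, 5000, 0))
```

In this analogue the finite-$n$ quantity differs from the logarithm of the principal eigenvalue by a term of order $1/n$ that depends on the starting point through the eigenvector, which is the discrete counterpart of the rewriting used in Step 5.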
An important ingredient for the lower bound of the LDP is the Gateaux-differentiability of the cumulant functional, which we prove below.
Lemma 6.7. The cumulant functional $f \in B^\infty_\kappa(\mathcal{X}) \mapsto \lambda(f)$ is convex and Gateaux-differentiable.
Proof. The convexity of $\lambda$ is a standard consequence of Hölder's inequality. Concerning Gateaux-differentiability, we follow the strategy of [55, Section 3] for a compact state space, relying on results of Kato [72]. For this, we interpret the cumulant function (6.22) as the largest eigenvalue of the tilted generator, $r(f)$, as shown in Lemma 6.6. More precisely, for $f, g \in B^\infty_\kappa(\mathcal{X})$ and $\alpha \in \mathbb{R}$, $\lambda(f + \alpha g)$ is associated with the largest eigenvalue of the operator $P^{f+\alpha g}_t$, so that differentiability in $\alpha$ can be shown through the differentiability of the spectrum of a bounded operator. We thus show that the operator-valued function $\alpha \mapsto P^{f+\alpha g}_t$ is differentiable in operator norm.
To this end, we fix $C > 0$, and prove that for $|\alpha| \leq C$, there exists $K \in \mathbb{R}_+$ such that Note that the operator $Q^{f,g}_t$ is bounded on $B^\infty_W(\mathcal{X})$ by the same martingale estimate used to prove Lemma 6.3. In order to prove (6.23), we use the identity to obtain, for any $\varphi \in B^\infty_W(\mathcal{X})$ and $x \in \mathcal{X}$, where we used the inequality $z^2/2 \leq e^z$ for $z \geq 0$ in the last line. By manipulations similar to the ones used to prove Lemma 6.3, we can bound the latter expectation by $e^{ct} W(x)$ for some constant $c > 0$, which leads to (6.23) with $K = e^{ct}$. Equation (6.23) shows that $\alpha \mapsto P^{f+\alpha g}_t$ is differentiable in operator norm, and that Thus, the principal eigenvalue $\lambda(f + \alpha g)$, which is always isolated, is differentiable, see [72, Chapter II, Theorem 5.4] and [72, Chapter IV, Theorem 3.5]. This concludes the proof of Gateaux-differentiability.
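Convexity and differentiability of the cumulant function can be illustrated on a case where everything is explicit. The sketch below assumes a hypothetical two-state continuous-time Markov chain with generator $L = \begin{pmatrix} -a & a \\ b & -b \end{pmatrix}$, for which the cumulant is the largest eigenvalue of $L + \mathrm{diag}(f)$ and is available in closed form. This is a finite-state stand-in for the diffusion setting of the lemma; the function names are ours. At $f = 0$, the directional derivative in the direction $g$ is compared with the average of $g$ under the invariant law $(b, a)/(a+b)$.

```python
import math


def cumulant(a, b, f):
    """Largest eigenvalue of L + diag(f), with L = [[-a, a], [b, -b]]."""
    d0, d1 = f[0] - a, f[1] - b
    return 0.5 * (d0 + d1) + math.sqrt(0.25 * (d0 - d1) ** 2 + a * b)


def directional_derivative(a, b, f, g, eps=1e-5):
    """Central finite difference of alpha -> cumulant(f + alpha * g) at alpha = 0."""
    fp = [f[i] + eps * g[i] for i in range(2)]
    fm = [f[i] - eps * g[i] for i in range(2)]
    return (cumulant(a, b, fp) - cumulant(a, b, fm)) / (2.0 * eps)


def stationary_mean(a, b, g):
    """Average of g under the invariant law (b, a) / (a + b) of the chain."""
    return (b * g[0] + a * g[1]) / (a + b)


if __name__ == "__main__":
    a, b = 1.0, 2.0          # hypothetical jump rates
    g = [0.7, -1.3]          # hypothetical observable
    print(directional_derivative(a, b, [0.0, 0.0], g), stationary_mean(a, b, g))
```

The agreement of the two printed numbers reflects the usual first-order perturbation formula for a simple isolated eigenvalue, which is the mechanism invoked through the results of Kato in the proof.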
Remark 6.8. By pursuing further the Taylor expansion (6.23) in the proof of Lemma 6.7, we can actually show that, for any $f, g \in B^\infty_\kappa(\mathcal{X})$, the function $\alpha \in \mathbb{R} \mapsto \lambda(f + \alpha g)$ is analytic (this analyticity was already proven in [76] using a different argument that can be simplified with our tools). This relies on the simple inequality $a^n/n! \leq e^a$ for any $a \geq 0$, together with the series expansion of the exponential and martingale estimates as in the proof of Lemma 6.7. Indeed, our proof, based on martingales, shows that for any $t > 0$, is analytic. Moreover, it is finite on $\mathbb{R}$ and converges pointwise to a finite valued function as $t \to +\infty$, as shown in Lemma 6.6. Therefore, the convergence holds uniformly on any compact set as $t \to +\infty$ (see [45, Theorem VI.3.3]). Since a locally uniform limit of analytic functions is analytic (see [102, Theorem 10.28]), the function $\alpha \mapsto \lambda(f + \alpha g)$ is analytic.
The last step before proving the large deviations principle itself is an exponential tightness result, see [28, Section 1.2]. At this stage, the finiteness of $\lambda(f)$ together with the Gateaux-differentiability of $f \in B^\infty_\kappa(\mathcal{X}) \mapsto \lambda(f)$ already provides the upper bound over compact sets and the lower bound in (2.20). In order to extend the upper bound to all closed sets, we prove exponential tightness in the $\tau^\kappa$-topology, see Appendix A for some definitions (this exponential tightness is not explicitly stated in [76]).

Lemma 6.9. The family of probability measures $t \mapsto \mathbb{P}_x(L_t \in \cdot\,)$ over $\mathcal{P}(\mathcal{X})$ is exponentially tight in the $\tau^\kappa$-topology.
For $N > 0$, the sets $\Gamma_N$ are subsets of $\mathcal{P}_\kappa(\mathcal{X})$ since $\kappa \ll \Psi$. We show that they are actually precompact in the $\tau^\kappa$-topology.
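Let us first show that $\Gamma_N$ is precompact in the usual weak topology for any $N > 0$. Consider for this the compact sets $K_\beta = \{x \in \mathcal{X} \mid \Psi(x) \leq \beta\} \subset \mathcal{X}$ for $\beta > 0$ (recall that $\Psi$ has compact level sets). Then, for any $\nu \in \Gamma_N$, we have
$$\beta\, \nu(K_\beta^c) \leq \int_{\mathcal{X}} \Psi \, d\nu \leq N.$$
This shows that, for any $\beta > 0$ and any $\nu \in \Gamma_N$,
$$\nu(K_\beta^c) \leq \frac{N}{\beta},$$
hence (upon choosing $\beta$ sufficiently large) for any $N > 0$ the family of measures $\Gamma_N$ is tight, so it is precompact for the weak topology by the Prohorov theorem [12].

Now, if $\kappa$ is bounded, $\Gamma_N$ is tight for the $\tau^\kappa$-topology and the theorem is shown, so we may assume that $\kappa$ has compact level sets (see Assumption 2.7). For proving compactness in our weighted topology, we show that $\kappa$ is uniformly integrable over $\Gamma_N$ in order to use [110, Theorem 7.12]. Since $\kappa \ll \Psi$, the set $\{\Psi \leq n\kappa\}$ is compact for any $n \geq 1$. Moreover, since we assume $\kappa$ to be continuous with compact level sets, for any $n \geq 1$ there exists $m_n \geq n$ such that $\{\Psi \leq n\kappa\} \subset \{\kappa \leq m_n\}$, with $m_n \to +\infty$ when $n \to +\infty$. Therefore, for any $\nu \in \Gamma_N$ and $n \geq 1$,
$$\int_{\{\kappa > m_n\}} \kappa \, d\nu \leq \frac{1}{n} \int_{\mathcal{X}} \Psi \, d\nu \leq \frac{N}{n}.$$
Taking the supremum over $\nu \in \Gamma_N$ in the above equation and recalling that $m_n \to +\infty$ when $n \to +\infty$ we obtain
$$\lim_{m \to +\infty} \sup_{\nu \in \Gamma_N} \int_{\{\kappa > m\}} \kappa \, d\nu = 0. \tag{6.24}$$
We can then conclude that $\Gamma_N$ is precompact for the $\tau^\kappa$-topology. Consider indeed a sequence $(\nu_n)_{n\in\mathbb{N}} \subset \Gamma_N$. By Prohorov's theorem, $(\nu_n)_{n\in\mathbb{N}}$ has a subsequence weakly converging towards a measure $\nu$, i.e. $\nu_n(\varphi) \to \nu(\varphi)$ for any $\varphi \in C_b(\mathcal{X})$. Then, by [110, Theorem 7.12], (6.24) ensures that $\nu \in \mathcal{P}_\kappa(\mathcal{X})$ and, for any $f \in B^\infty_\kappa(\mathcal{X})$, $\nu_n(f) \to \nu(f)$ as $n \to +\infty$. In other words, $\Gamma_N$ is precompact for the $\tau^\kappa$-topology.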
We can now prove the $\tau^\kappa$-exponential tightness of the empirical distribution $(L_t)_{t \geq 0}$ in $\mathcal{P}(\mathcal{X})$. Indeed, for any $N, t > 0$, Chebyshev's inequality leads to Renormalizing at log scale leads to The right hand side of the above quantity may look infinite since $\Psi$ grows faster than $\kappa$. However, using again the martingale $M_t$ defined in Lemma 6.1 we obtain, for any $t > 0$, Thus it holds As a result, (6.25) becomes Since $\Gamma_N$ is precompact in the $\tau^\kappa$-topology for any $N > 0$, and $N$ can be chosen arbitrarily large, this proves the exponential tightness of the family of empirical distributions in the $\tau^\kappa$-topology.
We are now in position to prove Theorem 2.11.
Proof of Theorem 2.11. The previous lemmas make it possible to apply the Gärtner-Ellis theorem (recalled in Appendix A). The function $\Lambda$ in Theorem A.1 of Appendix A is the cumulant function is the set of measures over $\mathcal{X}$ integrating $\kappa$ (see [102,76] and [30, Lemma 3.3.8] for details). We have proved that $\lambda$ is well defined, Gateaux-differentiable, and that the family of measures $t \mapsto \pi_t(\,\cdot\,) := \mathbb{P}_x(L_t \in \cdot\,)$ is exponentially tight in the $\tau^\kappa$-topology. Therefore, $(\pi_t)_{t \geq 0}$ satisfies a large deviations principle in the $\tau^\kappa$-topology with good rate function given by
$$I(\nu) = \sup_{f \in B^\infty_\kappa(\mathcal{X})} \big\{ \nu(f) - \lambda(f) \big\}. \tag{6.26}$$
Note first that $I(\nu) \geq 0$. We next observe that $I(\nu) = +\infty$ if $\nu$ is not normalized to 1 (take $f$ to be constant in the supremum (6.26)), so we may consider $I$ over $\mathcal{P}(\mathcal{X})$.
Moreover, choosing $f = \kappa$ in (6.26) and noting that $\lambda(\kappa) < +\infty$ by Lemma 6.6, we get $I(\nu) = +\infty$ if $\nu \notin \mathcal{P}_\kappa(\mathcal{X})$. If $\nu$ is not absolutely continuous with respect to $\mu$, there exists a measurable set $A \subset \mathcal{X}$ such that $\mu(A) = 0$ and $\nu(A) > 0$. Since $\mu$ has a positive density with respect to the Lebesgue measure, this means that $A$ has zero Lebesgue measure. Consider then $f_a = a \mathbb{1}_A \in B^\infty_\kappa(\mathcal{X})$ for $a \in \mathbb{R}$. Since $A$ has zero Lebesgue measure and $(X_t)_{t \geq 0}$ has a smooth density for all $t > 0$ (as a consequence of Assumption 2.5) it holds, for all $t > 0$, $\mathbb{E}_x\big[f_a(X_t)\big] = a\, \mathbb{P}_x(X_t \in A) = 0$.
Therefore, the process satisfies $\mathbb{E}_x[Z_t] = 0$ for all $t > 0$. Since $Z_t \geq 0$, it holds $Z_t = 0$ almost surely, for any $t > 0$.
As a consequence we obtain This shows that $\lambda(f_a) = 0$, so that from (6.26) we obtain $I(\nu) \geq a \nu(A)$, with $\nu(A) > 0$. By letting $a \to +\infty$ we are led to $I(\nu) = +\infty$. Finally, we show that $I(\nu) = 0$ if and only if $\nu = \mu$, and that $(L_{t_n})_{n \geq 0}$ converges almost surely to $\mu$ in the $\tau^\kappa$-topology for any sequence $(t_n)_{n \geq 0}$ such that $t_n / \log(n) \to +\infty$ (see [28, Appendix B] for the definition of this almost-sure convergence). Define
$$\mathcal{I} = \Big\{ \nu \in \mathcal{P}(\mathcal{X}) \ \Big|\ I(\nu) = \inf_{\mathcal{P}(\mathcal{X})} I \Big\}.$$
Since $I$ has compact level sets (because it is a good rate function, see Theorem A.1), $\mathcal{I}$ is a non-empty closed subset of $\mathcal{P}(\mathcal{X})$ for the $\tau^\kappa$-topology. Moreover, in order for the LDP upper bound to make sense, it holds $\inf_{\mathcal{P}(\mathcal{X})} I = 0$. If $\mathcal{I}_\delta$ denotes an open neighborhood of $\mathcal{I}$, the lower semicontinuity of $I$ implies that $\inf_{\mathcal{I}_\delta^c} I > 0$.
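Therefore, by the large deviations upper bound we have, for any $t \geq 0$, (6.27) for some constant $C > 0$. Consider now a sequence $(t_n)_{n \geq 1}$ such that $t_n / \log(n) \to +\infty$ as $n \to +\infty$. In particular, there exists $n_0 \in \mathbb{N}$ such that $t_n \inf_{\mathcal{I}_\delta^c} I \geq 2 \log(n)$ for $n \geq n_0$, which implies
$$\sum_{n \geq 0} \mathbb{P}_x\big( L_{t_n} \notin \mathcal{I}_\delta \big) \leq n_0 + C \sum_{n \geq n_0} \frac{1}{n^2} < +\infty.$$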
This shows that $(L_{t_n})_{n \geq 0}$ converges almost surely to $\mathcal{I}$ in the $\tau^\kappa$-topology, by the Borel-Cantelli lemma (and by definition of convergence in a topological space [28, Appendix B]).
However, we know by Proposition 2.10 that the only possible limit for $(L_{t_n})_{n \geq 0}$ is $\mu$, hence $\mathcal{I} = \{\mu\}$ and $(L_{t_n})_{n \geq 0}$ almost surely converges to $\mu$. We finally show for completeness that $(L_t)_{t \geq 0}$ almost surely spends a finite Lebesgue time outside $\mathcal{I}_\delta$. For this we introduce the random subset of $\mathbb{R}_+$ of times $t \geq 0$ for which $L_t$ does not belong to $\mathcal{I}_\delta$, namely $\mathcal{T} = \{ t \geq 0 \mid L_t \notin \mathcal{I}_\delta \}$. We have, by Fubini's theorem, for any $t > 0$, By using (6.27) and the dominated convergence theorem, we obtain As a result, $|\mathcal{T}| < +\infty$ almost surely. This means that, for any neighborhood $\mathcal{I}_\delta$ of $\mathcal{I}$ in the $\tau^\kappa$-topology, the empirical measure $(L_t)_{t \geq 0}$ almost surely spends a finite Lebesgue measure time outside $\mathcal{I}_\delta$, and this concludes the proof.

Proofs of Section 3
We start by providing a preliminary technical result in Section 6.2.1, which shows that the eigenvectors $h_f$ considered in Lemma 6.6 belong to the generalized domain $\mathcal{D}^+(\mathcal{L})$ defined in (3.4). We then turn to the proofs of Proposition 3.1 (see Section 6.2.2) and Corollary 3.2 (see Section 6.2.3).

A preliminary technical result
Lemma 6.10. The function $h_f \in B^\infty_W(\mathcal{X})$ defined in Lemma 6.6 belongs to $\mathcal{D}^+(\mathcal{L})$ and satisfies
$$-\frac{\mathcal{L} h_f}{h_f} = f - \lambda(f). \tag{6.28}$$
Proof. We already know by Lemma 6.6 that $h_f \in C^0(\mathcal{X})$ and $h_f > 0$. It therefore suffices to show that $h_f \in \mathcal{D}(\mathcal{L})$ and to obtain the representation (6.28). Therefore, where the last equality comes from Fubini's theorem and Note that we can indeed apply Fubini's theorem since there exist $K, c > 0$ such that and (since we are integrating nonnegative functions) where the last expression is finite by manipulations similar to the ones performed in the proof of Lemma 6.1.
We can next use (6.29) at initial time $s \in [0, t]$ together with a conditioning argument to write This finally shows that (6.30) becomes as the product of functions in $B^\infty_W(\mathcal{X})$ and $B^\infty_\kappa(\mathcal{X})$) and $(P_t)_{t \geq 0}$ is a semigroup of bounded operators on $B^\infty_{\kappa W}(\mathcal{X})$ by (2.14), it holds

Remark 6.11. It is actually possible to make more general statements about the domains of the generators of $(P^f_t)_{t \geq 0}$ for $f \in B^\infty_\kappa(\mathcal{X})$, similarly to [114,115]. For this, one considers the (closed) subset of functions $\varphi \in B^\infty_W(\mathcal{X})$ for which $P^f_t \varphi \to \varphi$ in $B^\infty_W(\mathcal{X})$ when $t \to 0$, see [96, Exercise 1.16]. We can then define a generator $\mathcal{L}_f$ with domain $\mathcal{D}(\mathcal{L}_f)$ for this semigroup. By manipulations similar to those of Lemma 6.10, we can show that $\mathcal{D}(\mathcal{L}_f) \subset \mathcal{D}(\mathcal{L})$ when we define $\mathcal{D}(\mathcal{L})$ as in (3.2). In this case we obtain the representation $\mathcal{L}_f = \mathcal{L} - f$, which could be expected. This procedure allows one to define a common domain for the operators $\mathcal{L}_f$ with $f \in B^\infty_\kappa(\mathcal{X})$.
Here we bypass the approach sketched above because, for the proof of Proposition 3.1 given below, we can restrict our attention to the eigenvectors $h_f$ for $f \in B^\infty_\kappa(\mathcal{X})$. In this case, it is clear that

Proof of Proposition 3.1
For the proof, which is partly inspired by [30, Lemma 4.1.36], we denote by $I_F$ the rate function given by the Fenchel transform in (2.19) and by $I_V$ the Varadhan functional on the right hand side of (3.3). We repeatedly use the results of Lemmas 6.6 and 6.10.
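We first show that $I_V(\nu) = +\infty$ if $\nu$ is not absolutely continuous with respect to $\mu$ or does not belong to $\mathcal{P}_\kappa(\mathcal{X})$. Assume first that $\nu \ll \mu$ does not hold: there exists a set $A \subset \mathcal{X}$ such that $\nu(A) > 0$ and $\mu(A) = 0$. For any $a \in \mathbb{R}$ we introduce $f_a = a \mathbb{1}_A$ and denote by $h_a$ the eigenvector associated with the principal eigenvalue $e^{t\lambda(f_a)}$ of $P^{f_a}_t$ for some $t > 0$. Recall that $h_a \in \mathcal{D}^+(\mathcal{L})$ by Lemma 6.10. As shown in the proof of Theorem 2.11, it holds $\lambda(f_a) = 0$, so that (6.28) can be rewritten as $-\frac{\mathcal{L} h_a}{h_a} = a \mathbb{1}_A$. Therefore, $I_V(\nu) \geq a\, \nu(A)$. By letting $a \to +\infty$, we conclude that $I_V(\nu) = +\infty$ when $\nu$ is not absolutely continuous with respect to $\mu$. Next, if $\nu \notin \mathcal{P}_\kappa(\mathcal{X})$, since $\kappa \geq 1$ it holds $\nu(\kappa) = +\infty$. We may then choose $f = \kappa \in B^\infty_\kappa(\mathcal{X})$. By Lemma 6.10, the principal eigenvector $h_\kappa$ belongs to $\mathcal{D}^+(\mathcal{L})$ with $\lambda(\kappa) < +\infty$, so we have $I_V(\nu) \geq \nu(\kappa) - \lambda(\kappa) = +\infty$, i.e. $I_V(\nu) = +\infty$ if $\nu \notin \mathcal{P}_\kappa(\mathcal{X})$. This shows that $I_F(\nu) = I_V(\nu)$ when $\nu$ is not absolutely continuous with respect to $\mu$ or $\nu \notin \mathcal{P}_\kappa(\mathcal{X})$. We next show that $I_F = I_V$ when $\nu \ll \mu$ and $\nu \in \mathcal{P}_\kappa(\mathcal{X})$, which we assume until the end of the proof.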
Let us first show that $I_F \geq I_V$. For this, we consider $u \in \mathcal{D}^+(\mathcal{L})$ and introduce $f_u = -\frac{\mathcal{L} u}{u}$. Because of the definition (3.4) of $\mathcal{D}^+(\mathcal{L})$, we know that $f_u \in B^\infty_\kappa(\mathcal{X})$. We can then write, since $\nu \in \mathcal{P}_\kappa(\mathcal{X})$,
$$I_F(\nu) \geq \nu(f_u) - \lambda(f_u). \tag{6.31}$$
We now show that $\lambda(f_u) \leq 0$. By computations similar to the ones in the proof of Lemma 6.1, and using the continuity of $u \in \mathcal{D}^+(\mathcal{L})$ (see also [115, Corollary 2.2]), we obtain by the local martingale property that
$$0 \leq P^{f_u}_t u \leq u. \tag{6.32}$$
Therefore, recalling the definition (6.19) of the $h$-transformed evolution operator with a time $t > 0$ fixed (with $r(f_u) = \lambda(f_u)$ in view of Lemma 6.6), and denoting by $h_u > 0$ the eigenvector associated with $f_u$ in Lemma 6.6, (6.32) becomes where the limit $n \to +\infty$ follows from (6.20) (noting that $u/h_u \in B^\infty_{W h_u^{-1}}(\mathcal{X})$). The latter limit is positive since $u/h_u$ is continuous and positive, which implies that $\lambda(f_u) \leq 0$.
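Therefore, (6.31) leads to $I_F(\nu) \geq \nu(f_u)$. Since $u \in \mathcal{D}^+(\mathcal{L})$ is arbitrary, taking the supremum shows that $I_F(\nu) \geq I_V(\nu)$ for any $\nu \in \mathcal{P}_\kappa(\mathcal{X})$ with $\nu \ll \mu$. We finally turn to the inequality $I_F \leq I_V$. Consider for an arbitrary $f \in B^\infty_\kappa(\mathcal{X})$ the eigenvector $h_f \in B^\infty_W(\mathcal{X})$ defined in Lemma 6.6. By Lemma 6.10, this eigenvector belongs to $\mathcal{D}^+(\mathcal{L})$ and satisfies $\mathcal{L} h_f = (\lambda(f) - f) h_f$. Thus, since $\nu \in \mathcal{P}_\kappa(\mathcal{X})$, we have
$$\nu(f) - \lambda(f) = -\int_{\mathcal{X}} \frac{\mathcal{L} h_f}{h_f} \, d\nu \leq I_V(\nu).$$
Given that, in the above equation, $f$ is an arbitrary function belonging to $B^\infty_\kappa(\mathcal{X})$, taking the supremum leads to $I_F(\nu) \leq I_V(\nu)$. This finally shows that $I_F(\nu) = I_V(\nu)$ for all $\nu \in \mathcal{P}_\kappa(\mathcal{X})$ with $\nu \ll \mu$ and concludes the proof.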
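The equality between the Fenchel and Varadhan forms of the rate function can be checked numerically in the simplest possible situation. The sketch below assumes a hypothetical two-state continuous-time chain with generator $L = \begin{pmatrix} -a & a \\ b & -b \end{pmatrix}$, which is a finite-state stand-in and not the diffusion setting of Proposition 3.1. In this case the Fenchel form is $\sup_f \{\nu(f) - \lambda(f)\}$ with $\lambda(f)$ the largest eigenvalue of $L + \mathrm{diag}(f)$, the Varadhan form is $\sup_{u > 0} \{-\sum_x \nu(x) (Lu)(x)/u(x)\}$, and an elementary computation gives the common value $(\sqrt{\nu_0 a} - \sqrt{\nu_1 b})^2$. Both suprema are approximated by a plain grid search; all names and grid parameters are ours.

```python
import math


def cumulant(a, b, f):
    """Largest eigenvalue of L + diag(f), with L = [[-a, a], [b, -b]]."""
    d0, d1 = f[0] - a, f[1] - b
    return 0.5 * (d0 + d1) + math.sqrt(0.25 * (d0 - d1) ** 2 + a * b)


def rate_fenchel(a, b, nu, half_width=20.0, step=1e-3):
    """Grid approximation of sup_f nu(f) - cumulant(f).

    Adding a constant to f shifts both terms equally, so we fix f[0] = 0.
    """
    best = -math.inf
    k = int(round(2.0 * half_width / step))
    for i in range(k + 1):
        f1 = -half_width + i * step
        best = max(best, nu[1] * f1 - cumulant(a, b, [0.0, f1]))
    return best


def rate_varadhan(a, b, nu, half_width=10.0, step=1e-3):
    """Grid approximation of sup_{u>0} -sum_x nu[x] (Lu)(x) / u(x).

    The functional depends on u only through s = u[1] / u[0] > 0.
    """
    best = -math.inf
    k = int(round(2.0 * half_width / step))
    for i in range(k + 1):
        s = math.exp(-half_width + i * step)
        best = max(best, nu[0] * a * (1.0 - s) + nu[1] * b * (1.0 - 1.0 / s))
    return best


def rate_closed_form(a, b, nu):
    """Explicit value (sqrt(nu_0 a) - sqrt(nu_1 b))^2 for the two-state chain."""
    return (math.sqrt(nu[0] * a) - math.sqrt(nu[1] * b)) ** 2


if __name__ == "__main__":
    a, b, nu = 1.0, 2.0, [0.2, 0.8]   # hypothetical rates and measure
    print(rate_fenchel(a, b, nu), rate_varadhan(a, b, nu), rate_closed_form(a, b, nu))
```

The three values agree up to the grid resolution, and all vanish when `nu` is the invariant law $(b, a)/(a+b)$, in line with the fact that the rate function has a unique zero at the invariant measure.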

Proof of Corollary 3.2
Since $I$ is the Fenchel transform of $\lambda$, the result follows if we can show that the map $\lambda$ defined on $B^\infty_\kappa(\mathcal{X})$ is stable under bi-Fenchel conjugacy. The convexity and finiteness of $\lambda$ show that a (necessary and) sufficient condition for $\lambda$ to be bi-Fenchel stable is for the functional $f \mapsto \lambda(f)$ to be lower-semicontinuous (see [8, Theorem 2.22]). We show below that it is actually continuous: for any sequence $(f_n)_{n \geq 0}$ in $B^\infty_\kappa(\mathcal{X})$ such that $\|f_n - f\|_{B^\infty_\kappa} \to 0$ for some $f \in B^\infty_\kappa(\mathcal{X})$, it holds $\lambda(f_n) \to \lambda(f)$ as $n \to +\infty$. We shall use for this a stability result from [22].
Consider a sequence $(f_n)_{n \geq 0}$ converging to $f$ in $B^\infty_\kappa(\mathcal{X})$. Using Lemma 6.3, for any $\varphi \in B^\infty_W(\mathcal{X})$, $t > 0$, $x \in \mathcal{X}$ and $n \in \mathbb{N}$, it holds (using again the inequality $a \leq e^a$ for $a \geq 0$) (6.33)

A Tools for large deviations principles
In this section, we recall some large deviations concepts (using the abuse of notation discussed at the beginning of Section 6 for denoting expectations and probabilities).
For a Polish space $\mathcal{Y}$, we denote by $\mathcal{Y}'$ its topological dual (the set of continuous linear functionals over $\mathcal{Y}$). We first recall the definition of an exponentially tight family of measures. A family of measures $(\pi_t)_{t \geq 0}$ over a Polish space $\mathcal{Y}$ is called exponentially tight if for any $N < +\infty$, there exists a (pre)compact set $\Gamma_N \subset \mathcal{Y}$ such that
$$\limsup_{t \to +\infty} \frac{1}{t} \log \pi_t\big(\Gamma_N^c\big) < -N.$$
In words, exponential tightness means that the measures $(\pi_t)_{t \geq 0}$ concentrate exponentially fast over compact sets. This property is used in large deviations to turn an upper bound over compact sets into an upper bound over all closed sets.
We now define the cumulant function. Consider a family of measures $(\pi_t)_{t \geq 0}$ over a Polish space $\mathcal{Y}$. The logarithmic moment generating function is defined as in [28, Section 4.5]: for any $t \geq 0$, $f \in \mathcal{Y}'$ and a random variable $Z_t$ distributed according to $\pi_t$,
$$\Lambda_t(f) = \log \mathbb{E}\Big[ e^{\langle f, Z_t \rangle_{\mathcal{Y}', \mathcal{Y}}} \Big] = \log \int_{\mathcal{Y}} e^{\langle f, y \rangle_{\mathcal{Y}', \mathcal{Y}}} \, \pi_t(dy).$$
On the other hand, $f$ belongs to a space of functions, typically $\mathcal{Y}' = \mathcal{M}(\mathcal{X})' = B^\infty(\mathcal{X})$ when the $\tau$-topology is considered. In practice we may restrict ourselves to probability measures because the rate function is infinite otherwise. We see that considering $L_t \in \mathcal{P}_\kappa(\mathcal{X})$ leads to choosing $f \in B^\infty_\kappa(\mathcal{X})$. In any case the duality relation (A.1) reads in this case $\Lambda_t(f) = \log$ so that $\overline{\Lambda}_t(f)$ coincides with the argument of the limit in (2.18). With these preliminaries, we are in position to state the key theorem for the results in this work, which goes back to [55,45] and is presented for instance in [28, Corollary 4.6.14]. We recall that a rate function is said to be good if its level sets are compact for the considered topology.
Theorem A.1 (Projective limit - Gärtner-Ellis). Let $(\pi_t)_{t \geq 0}$ be an exponentially tight family of probability measures on a Polish space $\mathcal{Y}$. Assume that $\Lambda(\cdot) = \lim_{t \to +\infty} \overline{\Lambda}_t(\cdot)$ is finite valued over $\mathcal{Y}'$ and Gateaux-differentiable. Then $(\pi_t)_{t \geq 0}$ satisfies a large deviations principle over $\mathcal{Y}$ with good rate function $\Lambda^*$, the Legendre-Fenchel transform of $\Lambda$.
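To make the role of the Legendre-Fenchel transform concrete, the following sketch treats a scalar toy case rather than the measure-valued setting of Theorem A.1. We assume the quadratic cumulant $\Lambda(f) = \sigma^2 f^2 / 2$, which is the scaled cumulant of the empirical mean of independent centred Gaussian variables with variance $\sigma^2$, and approximate $\Lambda^*(x) = \sup_f \{ f x - \Lambda(f) \}$ on a grid. The function names and the grid bounds are ours.

```python
def legendre_fenchel(cumulant_fn, x, lo=-10.0, hi=10.0, steps=20001):
    """Grid approximation of sup over f in [lo, hi] of f * x - cumulant_fn(f)."""
    best = float("-inf")
    for i in range(steps):
        f = lo + (hi - lo) * i / (steps - 1)
        best = max(best, f * x - cumulant_fn(f))
    return best


def gaussian_cumulant(f, sigma2=1.0):
    """Scaled cumulant of a centred Gaussian empirical mean: sigma2 * f^2 / 2."""
    return 0.5 * sigma2 * f * f


if __name__ == "__main__":
    for x in (0.0, 0.5, 1.5):
        # For sigma2 = 1 the exact transform is x^2 / 2.
        print(x, legendre_fenchel(gaussian_cumulant, x), 0.5 * x * x)
```

The grid search is only meaningful when the maximiser lies inside `[lo, hi]`; for a differentiable cumulant this maximiser solves $\Lambda'(f) = x$, which is where the Gateaux-differentiability assumption of the theorem enters.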

B Proof of Proposition 2.9
The proposition is a consequence of the equality Since $|\sigma^T \nabla V|$ has compact level sets and $\Psi \sim |\sigma^T \nabla V|^2$ by (2.15), $\Psi$ has compact level sets. Since $V$ has compact level sets, for $\varepsilon < \theta/2$ it holds W W and W 2 C 1 W for some constant $C_1 > 0$. Moreover, outside a compact set, the function is bounded above and below since the numerator and denominator are both equivalent to $|\sigma^T \nabla V|^2$, so the second condition in (2.13) holds. Finally, Since $\Psi \sim |\sigma^T \nabla V|^2$, we may choose $\varepsilon$ small enough so as to obtain for some constant $C_2 \in \mathbb{R}$. This proves the third item of (2.13).
As a result, Assumption 4.4 leads to The claim follows for $\theta \in (0, 1)$ by choosing $\eta < 2c_V/\gamma$ and $\varepsilon > 0$ sufficiently small.

version of the manuscript as well as the first preprint, and providing useful comments; as well as the referees, whose suggestions helped us make various aspects of this work more precise. The authors are grateful to Ofer Zeitouni for an interesting discussion about scalings in large deviations theory, as well as to Jianfeng Lu for pointing out the work [15]. We also thank Julien Reygner for general discussions on large deviations. The PhD of Grégoire Ferré was supported by the Labex Bézout ANR-10-LABX-58-01. The work of Gabriel Stoltz was funded in part by the Agence Nationale de la Recherche, under grant ANR-14-CE23-0012 (COSMOS), and by the European Research Council under the European Union's Seventh Framework Programme (FP/2007-2013)/ERC Grant Agreement number 614492. We also benefited from the scientific environment of the Laboratoire International Associé between the Centre National de la Recherche Scientifique and the University of Illinois at Urbana-Champaign.