IDENTIFICATION OF THE RATE FUNCTION FOR LARGE DEVIATIONS OF AN IRREDUCIBLE MARKOV CHAIN

For an irreducible Markov chain $(X_n)_{n\ge 0}$ we identify the rate function governing the large deviation estimation of the empirical mean $\frac{1}{n}\sum_{k=0}^{n-1} f(X_k)$ by means of the Donsker-Varadhan entropy. This allows us to obtain the lower bound of large deviations for the empirical measure $\frac{1}{n}\sum_{k=0}^{n-1} \delta_{X_k}$ in full generality.


Introduction
The study of large deviations of Markov processes was initiated by Donsker and Varadhan [8] (1975-1983) under strong regularity conditions. Generalizations of their fundamental work are numerous and varied. In this paper we are interested in general irreducible Markov processes. In this direction the first general results were obtained by Ney and Nummelin [14] (1987). De Acosta [5] (1988) derived a universal lower bound for bounded additive functionals with values in a separable Banach space, and the boundedness condition was finally removed by de Acosta and Ney [6] (1998): that is a definitive work for the lower bound. For related works see also Jain [10] (1990), some recent works [17,18,20] by the second named author, Kontoyiannis-Meyn [11] and Meyn [12].
However the rate function, a basic object describing the exact exponential rate in a large deviation estimation, is expressed there by means of the convergence parameter, a quantity specific to the theory of irreducible Markov processes. It is not related to the Donsker-Varadhan entropy, unlike in the work of Deuschel-Stroock [7] in the more classical strong mixing case.
The main purpose of this note is to identify the rate function appearing in the large deviation results in [14,5,6] etc. by means of the Donsker-Varadhan entropy. The key is to prove the lower semi-continuity of the Cramer functional associated with the convergence parameter of the Feynman-Kac semigroup. This result is based mainly on the increasing continuity of the convergence parameter due to de Acosta [5].
This paper is organized as follows. In the next section we present the necessary background and calculate the Legendre transform of the Cramer functional. In section 3 we prove the lower semi-continuity of the Cramer functional, which gives us the desired identification of the rate function of de Acosta-Ney [6]; as a corollary we obtain the lower bound of large deviations for the occupation measures. Finally in section 4 we present some remarks about the upper bound of Ney-Nummelin [14] and provide a very simple counter-example to the upper bound of large deviations for unbounded additive functionals.

Preliminaries and Legendre transform of the Cramer functional

Preliminaries
We recall some known facts about irreducible Markov chains from Nummelin [15] and Meyn-Tweedie [13].
Let $K(x, dy)$ be a nonnegative kernel on a measurable space $(E, \mathcal{B})$, where $\mathcal{B}$ is countably generated. It is said to be irreducible if there is some non-zero nonnegative σ-finite measure $\mu$ such that
$$\forall A \in \mathcal{B} \text{ with } \mu(A) > 0: \quad \sum_{n=1}^{\infty} K^n(x, A) > 0 \quad \text{for all } x \in E.$$
Such a measure $\mu$ is said to be a maximal irreducible measure of $K$ if $\mu K \ll \mu$. All maximal irreducible measures of $K$ are equivalent. Below $\mu$ is some fixed maximal irreducible measure of $K$.
A couple $(s, \nu)$, where $s \ge 0$ with $\mu(s) := \int_E s(x)\,d\mu(x) > 0$ and $\nu$ is a probability measure on $E$, is said to be $K$-small if there are some $m_0 \ge 1$ and a constant $c > 0$ such that
$$K^{m_0}(x, A) \ge c\, s(x)\, \nu(A), \quad \forall x \in E,\ A \in \mathcal{B}.$$
A real measurable function $s \ge 0$ with $\mu(s) > 0$ is said to be $K$-small if there is some probability measure $\nu$ such that $(s, \nu)$ is $K$-small. A subset $A \in \mathcal{B}$ with $\mu(A) > 0$ is said to be $K$-small if $1_A$ is. By Nummelin [15, Theorem 2.1], any irreducible kernel $K$ always has such a small couple $(s, \nu)$. A non-empty set $F \in \mathcal{B}$ is said to be $K$-closed if $K(x, F^c) = 0$ for all $x \in F$. For every $K$-closed $F$, $\mu(F^c) = 0$ by the irreducibility of $K$.
According to Nummelin [15, Definition 3.2, Proposition 3.4 and 4.7], the convergence parameter of $K$, say $R(K)$, is given by: for every $K$-small couple $(s, \nu)$,
$$R(K) = \sup\Big\{ r \ge 0 : \sum_{n=0}^{\infty} r^n\, \nu K^n s < +\infty \Big\}.$$
Lemma 2.1. For every $K$-small couple $(s, \nu)$, Here and hereafter $\operatorname{esssup}_{x\in E}$ is always taken w.r.t. $\mu$.
Its proof is straightforward and is omitted.
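As a hedged illustration of the convergence parameter, consider the finite state space case, where a strictly positive matrix plays the role of the kernel. There $R(K)$ is the reciprocal of the Perron root of $K$, and $-\log R(K)$ is the exponential growth rate of $\nu K^n s$. The sketch below compares the two numerically. The matrix `K`, the function `s` and the measure `nu` are hypothetical choices made for this example only; they do not come from the paper.

```python
import math

def mat_vec(K, v):
    """Apply the kernel K (a square matrix) to the function v (a vector)."""
    return [sum(K[i][j] * v[j] for j in range(len(v))) for i in range(len(K))]

def growth_rate(K, nu, s, n=2000):
    """Return (1/n) * log(nu K^n s), rescaling at each step to avoid underflow."""
    v, log_scale = list(s), 0.0
    for _ in range(n):
        v = mat_vec(K, v)
        norm = max(v)
        v = [x / norm for x in v]
        log_scale += math.log(norm)
    return (log_scale + math.log(sum(a * b for a, b in zip(nu, v)))) / n

def perron_root_2x2(K):
    """Largest eigenvalue of a nonnegative 2x2 matrix, in closed form."""
    (a, b), (c, d) = K
    return (a + d + math.sqrt((a - d) ** 2 + 4 * b * c)) / 2

# Hypothetical strictly positive kernel on a two-point space (assumption).
K = [[0.5, 0.3], [0.2, 0.6]]
nu = [0.5, 0.5]   # probability measure of the small couple (assumption)
s = [1.0, 0.0]    # small function of the small couple (assumption)

rho = perron_root_2x2(K)       # Perron root of K
R = 1.0 / rho                  # convergence parameter in the finite case
rate = growth_rate(K, nu, s)   # (1/n) log(nu K^n s) for large n
print(R, -math.log(R), rate)
```

For large `n` the value `rate` should approach `-log R(K)`, with an error of order `1/n` coming from the prefactor of the leading eigenvalue.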

Cramer functional
Let $(X_n)_{n\ge 0}$ be a Markov chain with values in $E$, defined on $(\Omega, \mathcal{F}, (\mathcal{F}_n), (\mathbb{P}_x)_{x\in E})$. Throughout this paper we assume that its transition kernel $P(x, dy)$ is irreducible, and $\mu$ is a fixed maximal irreducible measure of $P$.
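For every $V \in r\mathcal{B}$ (the space of all real $\mathcal{B}$-measurable functions on $E$), consider the kernel $P^V(x, dy) := e^{V(x)} P(x, dy)$. We have the following Feynman-Kac formula: for every nonnegative measurable $g$,
$$(P^V)^n g(x) = \mathbb{E}^x\, g(X_n) \exp\Big(\sum_{k=0}^{n-1} V(X_k)\Big), \quad n \ge 0.$$
It is obvious that $P^V$ is irreducible with the maximal irreducible measure $\mu$. Define now our Cramer functional
$$\Lambda(V) := -\log R(P^V).$$
Since $R(P^V) \in [0, +\infty)$ by [15, Theorem 3.2], $\Lambda(V) > -\infty$. By (5) and Lemma 2.1, we have for every $P^V$-small couple $(s, \nu)$. From the above expression we see by Hölder's inequality that $\Lambda : r\mathcal{B} \to (-\infty, +\infty]$ is convex.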
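The properties of the Cramer functional stated above can be checked numerically in the simplest setting. The sketch below assumes a two-state chain and reads the Cramer functional as $\Lambda(V) = -\log R(P^V)$, which in the finite case is the logarithm of the Perron root of the matrix $e^{V(x)}P(x,y)$. The transition matrix `P` and the test functions are hypothetical values chosen for illustration.

```python
import math

def cramer_functional(P, V):
    """Lambda(V) = log of the Perron root of P^V, where P^V[i][j] = exp(V[i]) * P[i][j].

    Only the 2x2 case is handled, using the closed form for the largest eigenvalue.
    """
    a = math.exp(V[0]) * P[0][0]
    b = math.exp(V[0]) * P[0][1]
    c = math.exp(V[1]) * P[1][0]
    d = math.exp(V[1]) * P[1][1]
    return math.log((a + d + math.sqrt((a - d) ** 2 + 4 * b * c)) / 2)

# Hypothetical irreducible two-state transition matrix (assumption).
P = [[0.9, 0.1], [0.2, 0.8]]

V1 = (0.3, -1.2)
V2 = (-0.7, 0.5)
mid = tuple((x + y) / 2 for x, y in zip(V1, V2))

lam_zero = cramer_functional(P, (0.0, 0.0))            # P is stochastic, so this is 0
lam_shift = cramer_functional(P, (V1[0] + 2.0, V1[1] + 2.0))
lam_mid = cramer_functional(P, mid)
lam_avg = (cramer_functional(P, V1) + cramer_functional(P, V2)) / 2

print(lam_zero, lam_shift - cramer_functional(P, V1), lam_mid <= lam_avg)
```

The three printed quantities illustrate that $\Lambda(0) = 0$ for a stochastic kernel, that adding a constant $c$ to $V$ adds $c$ to $\Lambda(V)$, and that $\Lambda$ satisfies the midpoint convexity inequality on this pair of test functions.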
We can now recall the universal lower bound of large deviation in de Acosta [5] and de Acosta-Ney [6].
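Theorem 2.2. ([5,6]) Let $f : E \to \mathbb{B}$ be a measurable function with values in a separable Banach space $(\mathbb{B}, \|\cdot\|)$. Then for every open subset $G$ of $\mathbb{B}$ and every initial measure $\nu$,
$$\liminf_{n\to\infty} \frac{1}{n} \log \mathbb{P}_\nu\Big(\frac{1}{n}\sum_{k=0}^{n-1} f(X_k) \in G\Big) \ge -\inf_{z\in G} \Lambda_f^*(z),$$
where $\Lambda_f(y) := \Lambda(\langle y, f\rangle)$, $y \in \mathbb{B}'$ (the topological dual space of $\mathbb{B}$), and
$$\Lambda_f^*(z) := \sup_{y\in\mathbb{B}'} \big(\langle y, z\rangle - \Lambda_f(y)\big), \quad z \in \mathbb{B}.$$
Here $\langle y, z\rangle$ denotes the duality bilinear relation between $\mathbb{B}'$ and $\mathbb{B}$.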
The main objective is to identify $\Lambda_f^*$ by means of the Donsker-Varadhan entropy.

Legendre transform of the Cramer functional on a weighted space
First we recall the Donsker-Varadhan entropy $J : M_1(E) \to [0, +\infty]$, where $M_1(E)$ is the space of all probability measures on $(E, \mathcal{B})$. For any $\nu \in M_1(E)$,
$$J(\nu) := \sup_{u\in\mathcal{U}} \int_E \log\frac{u}{Pu}\,d\nu,$$
where $\mathcal{U} = \{u \in b\mathcal{B} : \inf_{x\in E} u(x) > 0\}$ ($b\mathcal{B}$ being the space of all real and bounded measurable functions on $(E, \mathcal{B})$).
Consider the modified Donsker-Varadhan entropy ([17])
Let us now fix some reference measurable function $\Phi : E \to \mathbb{R}$ such that $\Phi \ge 1$ everywhere. Introduce the weighted function space
The main result of this section is
Then we have for every $\nu \in M_{b,\Phi}(E)$,
We begin with
Lemma 2.4. (de Acosta [5, Lemma 6.1]) Define for $\nu \in M_1(E)$,
Proof of Theorem 2.3.
Taking the supremum over all $1 \le u \in b\mathcal{B}$ and recalling that Varadhan's formula), we get the desired result.
By de Acosta's Lemma 2.4 we have As $\lambda > \Lambda(V)$ is arbitrary we have so proved It yields the desired claim since Step 2, we obtain (13).
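The duality between the Cramer functional and the Donsker-Varadhan entropy can be sanity-checked on a two-state chain, where every probability measure is absolutely continuous w.r.t. the maximal irreducible measure. The sketch below is an illustration under stated assumptions, not the argument of the proof: it assumes the standard discrete-time form $J(\nu) = \sup_{u>0} \int \log(u/Pu)\,d\nu$, reads $\Lambda(V)$ as the logarithm of the Perron root of $e^{V}P$, and uses a hypothetical transition matrix `P`. Both suprema are invariant under the normalizations $u = (1, e^t)$ and $V = (0, v)$, so each reduces to a one-dimensional concave maximization.

```python
import math

# Hypothetical irreducible two-state transition matrix (assumption).
P = [[0.9, 0.1], [0.2, 0.8]]

def log_perron(V):
    """Lambda(V): log of the Perron root of the 2x2 matrix exp(V[i]) * P[i][j]."""
    a = math.exp(V[0]) * P[0][0]
    b = math.exp(V[0]) * P[0][1]
    c = math.exp(V[1]) * P[1][0]
    d = math.exp(V[1]) * P[1][1]
    return math.log((a + d + math.sqrt((a - d) ** 2 + 4 * b * c)) / 2)

def maximize_concave(g, lo=-30.0, hi=30.0, iters=200):
    """Ternary search for the maximal value of a concave function on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if g(m1) < g(m2):
            lo = m1
        else:
            hi = m2
    return g((lo + hi) / 2)

def dv_entropy(nu):
    """J(nu) = sup over u > 0 of sum_x nu(x) * log(u(x) / (Pu)(x)), with u = (1, e^t)."""
    def g(t):
        u = (1.0, math.exp(t))
        Pu = [P[i][0] * u[0] + P[i][1] * u[1] for i in range(2)]
        return sum(nu[i] * math.log(u[i] / Pu[i]) for i in range(2))
    return maximize_concave(g)

def legendre_of_lambda(nu):
    """Lambda^*(nu) = sup over V of (nu(V) - Lambda(V)), with V = (0, v)."""
    def g(v):
        return nu[1] * v - log_perron((0.0, v))
    return maximize_concave(g)

nu = (0.5, 0.5)
J_nu = dv_entropy(nu)
L_star_nu = legendre_of_lambda(nu)
J_invariant = dv_entropy((2.0 / 3.0, 1.0 / 3.0))  # (2/3, 1/3) is invariant for P
print(J_nu, L_star_nu, J_invariant)
```

On this example the two suprema should agree up to numerical error, and the entropy of the invariant measure should vanish.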
As an application, we have the following result.

Thus we have
Letting n → +∞, we get the result.

Identification of the rate function
That is not completely exact, but not far.
The main result of this paper is
Theorem 3.1. For any measurable function $f : E \to \mathbb{B}$, where $\mathbb{B}$ is a separable Banach space, the rate function $\Lambda_f^*$ given in Theorem 2.2 (due to de Acosta and Ney [6]) is exactly the lower semi-continuous regularization of $z \mapsto J^f_\mu(z)$.
It is based on the following lower semi-continuity of Λ which is of independent interest.
As $\Lambda(V)$ is the rate of exponential growth of the Feynman-Kac semigroup $(P^V)^n$, its identification is a fundamental subject in heat theory: a formula of type (16) is the counterpart of the famous Rayleigh principle (for the maximal eigenvalue of $\mathcal{L} + V$, where $\mathcal{L}$ is the generator of a symmetric Markov semigroup), and it follows from the LDP of Donsker-Varadhan (when the latter holds) by Varadhan's Laplace integral lemma.
Proof of Theorem 3.1 assuming Proposition 3.2. Let $\Phi(x) := \|f(x)\| + 1$. For every $y \in \mathbb{B}'$ (the dual space of $\mathbb{B}$), $V(x) := \langle y, f(x)\rangle \in b_\Phi\mathcal{B}$. Then by Proposition 3.2,
Let us prove that $J^f_\mu$ is convex on $\mathbb{B}$. We have only to prove that
Given any $\varepsilon > 0$, there exist $\nu_1$ and $\nu_2$ such that
which completes the proof of the convexity of $J^f_\mu$, for $\varepsilon > 0$ is arbitrary. Consequently by the famous Fenchel's theorem in convex analysis,
Proposition 3.2 is mainly based on the following continuity of the convergence parameter due to de Acosta.
Lemma 3.3. (de Acosta [5, Theorem 2.1]) For each $j \ge 1$ let $E_j \in \mathcal{B}$ and assume $E_j \uparrow E$. Let $\mathcal{B}_j = \{A \in \mathcal{B} : A \subset E_j\}$. Let $K$ be an irreducible kernel on $(E, \mathcal{B})$ and for $j \ge 1$, let $K_j$ be an irreducible kernel on $(E_j, \mathcal{B}_j)$. Assume that for all $x \in E$, $A \in \mathcal{B}$,
We also require the following general result.
Lemma 3.4. Let $\mu$ be a σ-finite measure on $(E, \mathcal{B})$ and $\Lambda : b_\Phi\mathcal{B} \to (-\infty, +\infty]$ be a convex function such that
Then $\Lambda$ is lower semi-continuous (l.s.c. in short) w.r.t. the weak topology $\sigma(b_\Phi\mathcal{B}, M_{b,\Phi}(E))$ iff for every non-decreasing sequence
In other words we may assume without loss of generality that $\Phi = 1$. Because of condition (i), $\Lambda$ is well defined on $L^\infty(\mu)$ (which is the dual of $L^1(\mu)$), and the l.s.c. of . By taking an equivalent measure if necessary we may assume that $\mu$ is a probability measure. For any $L \in \mathbb{R}$, since $[\Lambda \le L]$ is convex, by the Krein-Smulyan theorem (see [4] is closed for every $R > 0$, where $B(R) := \{V \in L^\infty(\mu);\ \|V\|_\infty \le R\}$. Since $B(R)$ equipped with the weak*-topology $\sigma(L^\infty, L^1)$ is metrizable (see [2, Chap. IV, p.111]), for the desired l.s.c. it remains to prove that if
Taking a subsequence if necessary, we may assume that $l$ :
As $\mu$ is a probability measure, $V_n \to V$ in the weak topology of $L^2(\mu)$. By the Mazur theorem ([22, Chap. V, §1, Theorem 2]), there exists a sequence
By the assumed increasing continuity of $\Lambda$ we get finally
Proof of Proposition 3.2. By Theorem 2.3, the right hand side (r.h.s.) of (16) is exactly $\Lambda^{**}$, the double Legendre transform of $\Lambda$ based on the duality between $b_\Phi\mathcal{B}$ and $M_{b,\Phi}(E)$. By Fenchel's theorem, (16) is equivalent to the l.s.c. of $\Lambda$ w.r.t. $\sigma(b_\Phi\mathcal{B}, M_{b,\Phi}(E))$, which holds true by Lemma 3.4 and de Acosta's Lemma 3.3.
We end this section with two corollaries. We begin with the rate function governing the lower bound of large deviations of the occupation measure
This corollary was proved in de Acosta [5, Theorem 6.3] under an extra assumption guaranteeing $J_\mu = J$. This last assumption is also crucial in Jain's work [10], but it is difficult to verify outside the classical absolutely continuous framework of Donsker-Varadhan [8].
Proof. Let $\beta_0$ be an arbitrary but fixed element of $G$. By the definition of the τ-topology, there is a bounded and measurable function $f : E \to \mathbb{R}^d$ (for some $d \ge 1$) and $\delta > 0$ such that
Here $|\cdot|$ is the Euclidean norm on $\mathbb{R}^d$. Hence by Theorem 2.2, lim inf where $z_0 = \beta_0(f)$. By Theorem 3.1, Therefore we get the l.h.s. of (17 where (17) follows for $\beta_0 \in G$ is arbitrary.
Example 3.7. Let $(B_t)_{t\ge 0}$ be a Brownian motion on a connected, complete and stochastically complete Riemannian manifold $M$ with sectional curvature less than $-K$ ($K > 0$), and let $P(x, dy) = \mathbb{P}_x(B_1 \in dy)$. $P$ is irreducible with a maximal irreducible measure given by the Riemannian volume measure $\mu = dx$. It is well known that $\|P\|_{L^2(M, dx)} < 1$ ([16]). It is obvious (from (3)) that $\frac{1}{R(P)} \le \|P\|_{L^2(M, dx)}$, and the converse inequality holds by [20, Lemma 5.3] and the symmetry of $P$ on

Some remarks on the upper bound of large deviations
for all compact subsets $F$ of $\mathbb{R}^d$, where (a) either $C$ is a $P$-small set if $f$ is bounded; (b) or for all $\lambda > 0$, there are some constant $c(\lambda) > 0$ and $\nu \in$ (19) holds for all closed subsets $F$ of $\mathbb{R}^d$.
Indeed for every $V = \langle f(x), y\rangle$ where $y \in \mathbb{R}^d$, if $f$ is bounded then every $P$-small set $C$ is also $P^V$-small; if $f$ is unbounded and $C$ satisfies (20), then $C$ is again $P^V$-small. Thus by Lemma 2.1, where the Theorem 4.1 follows by the Gärtner-Ellis theorem (see [9]).
Baxter et al. (1991) [1] found for the first time a Doeblin recurrent Markov chain which does not satisfy the level-2 LDP (but it satisfies the level-1 LDP for $L_n(f)$ for bounded $f$ by Theorems 2.2 and 4.1, since the whole space $E$ is $P$-small in that case). Furthermore Bryc and Smolenski (1993) [3] constructed an exponentially recurrent Markov chain for which even the level-1 LDP of $L_t(f)$ for some bounded and measurable $f$ fails.
We now present a counter-example for which (19) does not hold for $P$-small $C$ but unbounded $f$. Assume that $(\xi_k)_{k\ge 0}$ is a sequence of independent and identically distributed nonnegative random variables with $\mathbb{P}(\xi_0 > t) = e^{-h(t)}$, $t \ge 0$. Then $\{X_k = (\xi_k, \xi_{k+1})\}_{k\ge 0}$ is a Markov chain with values in $E = (\mathbb{R}^+)^2$, which is Doeblin recurrent, i.e., $E$ is $P$-small. Let us show that (19) may be wrong with $C = E$ for some unbounded $f$. Indeed let $f(X_k) := \xi_{k+1} - \xi_k$, $\forall k \ge 1$; then the sum telescopes and $\frac{1}{n} S_n(f) := \frac{1}{n}\sum_{k=1}^{n} f(X_k) = \frac{1}{n}(\xi_{n+1} - \xi_1)$. We have for all $r > 0$, $e^{-h((n+1)r)}(1 - e^{-h(r)}) = \mathbb{P}(\xi_{n+1} > (n+1)r,\ \xi_1 < r)$ Consequently for all $r > 0$, lim sup

Let $L_n := \frac{1}{n}\sum_{k=0}^{n-1} \delta_{X_k}$ be the empirical measure. Assuming the existence of an invariant probability measure $\mu$, then for every $\nu \ll \mu$, $\mathbb{P}_\nu(L_n \in \cdot)$ satisfies the weak* LDP on $(M_1(E), \tau)$ with rate function $J_\mu$ (by [17, Theorem B.1, Theorem B.5 and Proposition B.9]), where $\tau$ is the topology $\sigma(M_b(E), b\mathcal{B})$ restricted to $M_1(E)$. Then for a measurable function $f : E \to \mathbb{B}$, inspired by the contraction principle, the rate function governing the LDP of 1