The Principle of Large Deviations for Martingale Additive Functionals of Recurrent Markov Processes

We give a principle of large deviations for a generalized version of the strong central limit theorem. This generalized version deals with martingale additive functionals of a recurrent Markov process.


Introduction
This paper presents a natural extension of the Almost Sure Central Limit Theorem (ASCLT) due to Brosamler [1], [2] and Schatte [27]. In the last few years the (ASCLT) has emerged as an area of probability theory in which intensive research activity has taken place. In this context we should mention in particular the work of Lacey & Philipp [17], Berkes & Dehling [3], Csörgö & Horváth [6], Rodzik & Rychlik [26] and Touati [29].
The aim of this paper is to establish the Large Deviations Principle (LDP) for a generalized version of the (ASCLT) for Martingale Additive Functionals (MAF's). This result can be regarded as an extension of the (ASCLT) for (MAF's), proved by the second named author (see Maâouia [21]), as well as an extension of the (LDP) for the (ASCLT) for i.i.d. random variables, proved by the first named author (see Heck [14]). For a slightly weaker version of the (LDP) for the (ASCLT) for i.i.d. random variables see also March and Seppäläinen [22].

Notation, terminology and data
$X = \{\Omega, \mathcal{F}, (P_x)_{x\in E}, \mathbb{F} = (\mathcal{F}_k)_{k\in\mathbb{N}}, (X_k)_{k\in\mathbb{N}}\}$ denotes the canonical version of a homogeneous Markov process indexed by $\mathbb{N}$ (the nonnegative integers) with values in a measurable space $(E, \mathcal{E})$; $\mathbb{F}$ being its natural filtration and $P_x$ its law starting from x.
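We recall that an additive functional (AF) $A = (A_k)_{k\in\mathbb{N}}$ of X is an $\mathbb{F}$-adapted process, vanishing at 0, such that:
(1-7) $A_{k+l} = A_k + A_l \circ \theta_k$, $P_\nu$-a.s.,
for any initial law ν. Here $(\theta_k)_{k\in\mathbb{N}}$ are the standard translation operators on $(\Omega, \mathcal{F})$.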
A martingale additive functional (MAF) $M = (M_k)_{k\in\mathbb{N}}$ of X is an (AF) which is also an $(\mathbb{F}, P_\nu)$-martingale for any initial law ν, or equivalently
Next, we will use the following notation and terminology.
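For a ∈ ]0, 1] we introduce the function
Using these functions $\vartheta_a$ we call a measure
Furthermore, for two probability measures η, ρ on a measurable space, we denote by H(η|ρ) the relative entropy of η relative to ρ, i.e. $H(\eta|\rho) = \int \ln\frac{d\eta}{d\rho}\, d\eta$ if η is absolutely continuous with respect to ρ, and $H(\eta|\rho) = \infty$ otherwise.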
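On a finite set the relative entropy H(η|ρ) reduces to the sum of η_i ln(η_i/ρ_i), with the value ∞ as soon as η charges a point that ρ does not. The following is a minimal sketch of that finite case only; the function name `relative_entropy` and the representation of measures as lists of weights are assumptions made for illustration and do not come from the paper.

```python
import math

def relative_entropy(eta, rho):
    """H(eta | rho) for two probability vectors on the same finite set.

    Returns math.inf when eta is not absolutely continuous w.r.t. rho.
    """
    if len(eta) != len(rho):
        raise ValueError("eta and rho must have the same length")
    total = 0.0
    for e, r in zip(eta, rho):
        if e == 0.0:
            continue          # convention: 0 * ln 0 = 0
        if r == 0.0:
            return math.inf   # eta charges a rho-null point
        total += e * math.log(e / r)
    return total
```

The entropy vanishes exactly when the two vectors coincide, which is the property that makes a rate function built from H vanish only at the limiting measure.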

Now we define the rate function
where W is the Wiener measure on $C_0([0,1])$ and $|_{[a,1]}$ denotes the restriction operator. That H is well defined has already been shown in Heck [14, 15].

ASCLT for MAF of a recurrent Markov process
The second named author proved the following general version of the (ASCLT) (see Maâouia [21]).
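Theorem A Let $X = (\Omega, \mathcal{F}, (P_x)_{x\in E}, \mathbb{F} = (\mathcal{F}_k)_{k\in\mathbb{N}}, (X_k)_{k\in\mathbb{N}})$ be a positive recurrent Markov chain. Then every (MAF) M of X satisfying the assumption:
satisfies a functional ASCLT (FASCLT) under $P_x$ for all initial states x. More precisely, $P_x$-almost surely for every x, we have the following properties:
is defined by: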

Main results
Our results are stated for a (MAF) $M = (M_k)_{k\in\mathbb{N}}$ of the Markov process X which satisfies the assumption (2-1) below.
(2-1)
For every (MAF) M satisfying the assumption (2-1) we consider the processes $(\Psi_n)_n$ and the measures $(W_n)_n$, defined as in Theorem A.
The results we present could easily be generalized to the continuous time parameter case. However, for the proof of the continuous parameter case we would need rather technical oscillation estimates very similar to those used in Heck [15] in order to reduce the continuous case to the discrete time case. These lengthy technical estimates would increase the size of the paper considerably without presenting any new ideas. Therefore we decided to restrict ourselves to the discrete time parameter case.

The identification of an autoregressive process
In this section we shall apply our result Theorem 2.1 to autoregressive models. These models are of great interest in mathematical finance (for example: risk management, derivative securities like options, stochastic volatility, ...; see e.g. Hull [16], section 19.6).
On a probability space $(\Omega, \mathcal{F}, P)$ we consider a sequence $\beta = (\beta_n)_{n\in\mathbb{N}^*}$ of i.i.d. real random variables with mean 0 and variance $\sigma^2 > 0$, called white noise. To this sequence β and a given random variable $X_0$ we associate the first order autoregressive process (AR1):
where α and θ are unknown real parameters. These parameters α and θ are to be estimated.
In the following we shall assume that the random variables β satisfy the moment condition
For the (AR1), defined by (3-1), the least squares estimator of θ:
Under the hypothesis $E\beta_1^2 < \infty$, $\hat\theta_n$ has the following asymptotic properties (see [10] for more details).
(3-5) $(\hat\theta_n)_{n\in\mathbb{N}^*}$ is a strongly consistent estimator of the arbitrary unknown real parameter θ.
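The strong consistency stated in (3-5) can be observed numerically. The sketch below assumes the recursion X_k = θ X_{k-1} + β_k, which matches the transition probability Π(x, ·) = P(θx + β_1 ∈ ·) used later in the proofs, and the usual least squares formula, the sum of X_{k-1} X_k divided by the sum of X_{k-1}^2. Gaussian noise, the seed and the function names are illustrative choices only, not the paper's.

```python
import random

def simulate_ar1(theta, x0, n, rng, sigma=1.0):
    """Path X_0, ..., X_n of X_k = theta * X_{k-1} + beta_k with Gaussian noise."""
    path = [x0]
    for _ in range(n):
        path.append(theta * path[-1] + rng.gauss(0.0, sigma))
    return path

def least_squares_theta(path):
    """sum_k X_{k-1} X_k / sum_k X_{k-1}^2 over the observed path."""
    num = sum(prev * cur for prev, cur in zip(path, path[1:]))
    den = sum(prev * prev for prev in path[:-1])
    if den == 0.0:
        raise ValueError("estimator undefined: all X_{k-1} are zero")
    return num / den

rng = random.Random(1)
path = simulate_ar1(theta=0.5, x0=1.0, n=20_000, rng=rng)
theta_hat = least_squares_theta(path)  # close to 0.5 in the stable case
```

Starting from x0 different from 0 keeps the denominator strictly positive, which is the same reason the later statements are restricted to nonzero initial states.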
In the stable case (|θ| < 1), $\hat\theta_n$ satisfies:
Under the hypothesis (3-3) and in the stable case, the following result holds under $P_x$ for all starting states x:
and "=⇒" denotes weak convergence.
The property (3-8) is a consequence of the FASCLT for martingales obtained by Chaâbane [5]. It is also a consequence of Theorem A above, if we assume that the noise β satisfies (3-3) and that the distribution of $\beta_1$ has a non vanishing density part. In fact, under these hypotheses, we prove the existence of a small set for the AR(1) Markov chain X (see Lemma 4.8).
The next Proposition gives the LDP associated with the property (3-8).
For the (AR1) model, defined by (3-2), we can estimate α and θ by:
These estimators satisfy
and they have the following asymptotic properties:
(3-13) $(\hat\theta_n)_{n\ge 1}$ and $(\hat\alpha_n)_{n\ge 1}$ are strongly consistent estimators of the arbitrary unknown parameters θ and α.
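As an illustration of joint estimation of α and θ, the sketch below fits the regression X_k = α + θ X_{k-1} + noise by ordinary least squares. Both the form of the model (3-2) with an additive intercept and the closed-form estimators used here are assumptions, since they are the textbook choice rather than formulas taken from this text; the function name is hypothetical.

```python
def fit_ar1_with_intercept(path):
    """Ordinary least squares fit of X_k = alpha + theta * X_{k-1} + noise.

    Returns (alpha_hat, theta_hat).
    """
    prev, cur = path[:-1], path[1:]
    n = len(prev)
    if n < 2:
        raise ValueError("need at least three observations")
    mean_prev = sum(prev) / n
    mean_cur = sum(cur) / n
    sxx = sum((p - mean_prev) ** 2 for p in prev)
    if sxx == 0.0:
        raise ValueError("regressor is constant; theta is not identifiable")
    sxy = sum((p - mean_prev) * (c - mean_cur) for p, c in zip(prev, cur))
    theta_hat = sxy / sxx
    alpha_hat = mean_cur - theta_hat * mean_prev
    return alpha_hat, theta_hat

# Noise-free path generated with alpha = 1 and theta = 0.5, starting from 0.
path = [0.0, 1.0, 1.5, 1.75]
alpha_hat, theta_hat = fit_ar1_with_intercept(path)
```

On a noise-free path the fit recovers the generating parameters up to rounding, which is a convenient sanity check before applying it to simulated noisy data.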
In the stable case (|θ| < 1), $(\hat\theta_n)_{n\ge 1}$ and $(\hat\alpha_n)_{n\ge 1}$ satisfy:
Finally we define random measures
satisfies the (LDP) with constants $(\ln n)_{n\ge 1}$ and rate function H. ♦
Remark 4.2 We remark that in the special case $(\xi_k)_{k\in\mathbb{N}^*}$ i.i.d. and $\tau_k \equiv 1$, i.e. $N_n \equiv n$, the above proposition states the (LDP) for the (ASCLT) for i.i.d. random variables. This result is exactly the content of Theorem 1.2 in Heck [14].
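The logarithmic scale (ln n) in the i.i.d. case comes from averaging with weights 1/k. The sketch below shows only the classical one-dimensional form of this averaging, namely the weighted frequency of the events {S_k/√k ≤ x}; it is not the path-space measure W_n of the paper. Normalising by the exact sum of the weights (which grows like ln n) and the name `log_average_cdf` are choices made here for illustration.

```python
import math
import random

def log_average_cdf(steps, x):
    """Logarithmic average of the indicators 1{S_k / sqrt(k) <= x}, k = 1..n.

    steps: centred increments with unit variance (an assumption of the sketch).
    The weights 1/k are normalised by their sum, which behaves like ln n.
    """
    partial_sum = 0.0
    weighted = 0.0
    total_weight = 0.0
    for k, step in enumerate(steps, start=1):
        partial_sum += step
        weight = 1.0 / k
        total_weight += weight
        if partial_sum / math.sqrt(k) <= x:
            weighted += weight
    return weighted / total_weight

rng = random.Random(0)
steps = [rng.choice((-1.0, 1.0)) for _ in range(10_000)]
value = log_average_cdf(steps, 0.0)  # a number in [0, 1]
```

Convergence of such averages is only logarithmic in n, so a single simulated path can remain visibly far from the Gaussian limit even for large n.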
In order to prove Proposition 4.1 we shall recall for the reader's convenience some simple facts:
satisfies the (LDP) with constants $(\ln n)_{n\in\mathbb{N}^*}$ and the same rate function H. ♦
Lemma 4.3 is a minor modification of Lemma 2.7 in Heck [14]. Details shall be omitted.
a) For all β ≥ 1 and all p, q > 1 with 1/p + 1/q = 1 (βq > 1) there exists $C_1 > 0$ such that
Let p, q > 1 such that 1/p + 1/q = 1. By Hölder's inequality and Chebychev's inequality
In order to prove part a) we shall use the following inequalities
and r = 1 in eq. (4-3),
In order to prove part b) we shall take r = 1/2 in eq. (4-3), proceed as in a) and use the Burkholder-Davis-Gundy inequality to estimate
We remark that one can in particular choose for $(M_n)_{n\in\mathbb{N}}$ the partial sums of independent random variables with expectation 0.
b) For all α ∈ ]1/2, 1[ and γ > 0 there exists $C_5 > 0$ such that for all sufficiently large $n \in \mathbb{N}^*$,
In order to prove part a) we observe that by Lemma 4.4 and Chebychev's inequality for
For the proof of part b) we note that again Lemma 4.4 and Chebychev's inequality imply for sufficiently large n
where we used that for $n \in \mathbb{N}^*$ sufficiently large and k
Hence we conclude the proof of part b) by choosing β sufficiently large.
Proof of Proposition 4.1 For the special case τ ≡ 1 Proposition 4.1 has already been proved in Heck [14]
Here $W^X_n$ denotes the random measure $W_n$ constructed from the sequence X. Similarly we shall use the notations $S^{*X}_n$ and $\Psi^X_n$ to indicate that the functions are constructed from the sequence X. By Skorokhod's representation theorem there exists a probability space (Ω, F, P), a random variable B: Ω → $C_0([0, \infty[)$ and P-a.s. finite stopping times 0
(See e.g. Chapter 1, Theorem 117 in Freedman [11] and Brosamler [1], p. 570 regarding the moments for the stopping times.) Now let Ω = Ω × $\mathbb{R}^{\mathbb{N}^*}$, F the corresponding product-σ-field and
Here $\eta_i$ denotes the conditional distribution
, then (4-5) still holds for B and $R_i$ replaced by B and $R_i$.
Hence by scaling properties of Brownian motion, if we let
and finally $y_n \equiv 1$, then obviously it remains to prove (4-4) for this special choice for X and Y.
By Lemma 4.3 part b) the proof of (4-4) is complete if we show that for all ε > 0
Hence the definition of $\Psi_n$ via interpolation
Hence the proof of (4-6) is complete if we show that for all ε > 0
Using Lemma 4.4 part b) we conclude that for sufficiently large $n \in \mathbb{N}^*$ and $k \in \{1, \dots, [n^{3/4}]\}$
(4-11)
For the following we assume that $n \in \mathbb{N}^*$ is sufficiently large and $k \in \{[n^{1/4}], \dots, n\}$. Observing that
Obviously Lemma 4.5 part b) implies that
Keeping in mind that B is a Brownian motion, the symmetry properties of Brownian motion and

Proof of Theorem 2.1
The proof of Theorem 2.1 is divided into three steps. First we shall consider the case where the small set is also an atom for the Markov chain. Second, we shall prove the theorem under the additional assumption that (1-2) already holds for the transition function Π itself instead of $pR_p$. Finally, in the third and last step we shall prove the general case.
First of all we shall prove
Lemma 4.6 Let $X = (\Omega, \mathcal{F}, (P_x)_{x\in E}, \mathbb{F} = (\mathcal{F}_k)_{k\in\mathbb{N}}, (X_k)_{k\in\mathbb{N}})$ be a Riemannian recurrent Markov chain of order k, for each $k \in \mathbb{N}$, with invariant measure µ. We have:
To prove this lemma we shall use Theorem 2 of [24]. Indeed, in order to apply Theorem 2 we have to verify that under our assumptions X is a positive Harris recurrent chain with an irreducible kernel Π, a maximal irreducible measure µ and convergence parameter 1 (see e.g. [24]). By Theorem 2.1 of [25] we know that there exist an integer $m_0 \ge 1$, a positive function s (called small function) satisfying µ(s) ∈ ]0, ∞[ and a bounded positive measure η on (Ω, F) (called small measure) such that the following minoration condition holds
Then by Theorem 2 of [25] we can see that, letting h = g and using the fact that µ(g) < ∞ and µ is Π-invariant, we have
and then sup
So the first part of the Lemma is proved.
In order to prove the second part we simply apply part a) to the function

Case I: Atomic chains
For the following we shall assume that X not only has a small set A but also that A is an atom for the Markov chain X.
Let $T_A$ denote the first entry time into A, i.e. $T_A = \inf\{k > 0 : X_k \in A\}$, and let $T_0 \equiv 0$, and
Since the chain is positive recurrent, it is well known that the invariant distribution is given by
Further, since A is an atom, the Markov property implies that $\Xi = (\xi_k, \tau_k)_{k\in\mathbb{N}^*}$ is a sequence of independent random variables and $(\xi_k, \tau_k)_{k\ge 2}$ are identically distributed w.r.t. $P_x$ for all x ∈ E.
Keeping in mind that the chain is Riemannian recurrent of order k for all $k \in \mathbb{N}^*$, the Markov property shows that for x ∈ E and β > 0
By Proposition 8.3.23 in Duflo [10] we conclude that
x ∈ E. This together with the identical distribution for k ≥ 2 implies
Using Lemma 4.4 part a) we conclude for µ-a.a. x ∈ E,
This together with the identical distribution of the
We have that for µ-a.a. x ∈ E
Moreover by the martingale property of $M_n$, the Markov property of X and (4-18)
$|M_t - M_{T_n}|$, and as in the prior section $N_n = \inf\{k \ge 0 : T_{k+1} > n\}$, then it is easy to see that
By Doob's inequality and (4-21) for µ-a.a. x ∈ E and β > 1
So by Lemma 4.5 part a)
This together with Chebychev's inequality and (4-25) implies
This concludes the proof of Theorem 2.1 for the special case of atoms.
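The regeneration bookkeeping behind the atomic case can be made concrete on a finite path. The sketch below computes the successive entry times into an atom at times k > 0, the block lengths τ_k = T_k − T_{k−1} (with T_0 = 0, consistent with T_n being the partial sums of the τ_k), and N_n = inf{k ≥ 0 : T_{k+1} > n}. The toy path, the state labels and the function names are hypothetical.

```python
def return_times(path, atom):
    """Successive entry times T_1 < T_2 < ... into `atom` at times k > 0."""
    return [k for k, state in enumerate(path) if k > 0 and state in atom]

def blocks_completed(times, n):
    """N_n = inf{k >= 0 : T_{k+1} > n}, i.e. the number of T_k that are <= n."""
    count = 0
    for t in times:
        if t > n:
            break
        count += 1
    return count

path = ["a", "b", "a", "c", "c", "a", "b", "a"]
times = return_times(path, atom={"a"})                  # [2, 5, 7]
lengths = [t - s for s, t in zip([0] + times, times)]   # tau_k = T_k - T_{k-1}
```

Because the chain restarts afresh at each visit to the atom, the increments of an additive functional over these blocks are independent, which is what reduces the atomic case to the i.i.d. setting.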

Case II: Chains with minoration property
We shall prove in this section Theorem 2.1 under the additional assumption that there exist a set C ∈ E, b ∈ ]0, 1[ and a probability measure ν ∈ $M_1(E)$ with ν(C) = 1 such that
We remark that in particular C is a small set (see e.g. Duflo [10] p. 286). Using this small set we construct (as in [21] for example) a new chain called split chain, i.e. a canonical version of a homogeneous Markov process with values in E = E × {0, 1} and transition probability
b) If we denote the invariant distribution (which obviously exists by part a) of this remark) by µ, then µ is related to the invariant distribution µ of the original Markov chain through
For details on the above construction and the remark we refer to Duflo [10], section 8.2.4.
By Remark 4.7 b) we conclude
Since Theorem 2.1 has already been proved for chains with atoms, we conclude by (4-31) and
satisfies the (LDP) with constants $(\ln n)_{n>0}$ and rate function H w.r.t.
Here $W^M_n$ denotes the empirical measure defined as in (1-16) with $(M_k)_{k>0}$ replaced by $(M_k)_{k>0}$. It is not hard to see that $W^M_n$ is the lift of $W_n$. We therefore conclude by Remark 4.7 part c)
and hence $(W_n)_{n>0}$ satisfies the (LDP) with constants $(\ln n)_{n>0}$ and rate function H w.r.t. $P_x$ for µ-a.a. x ∈ E.

Case III: General case
In this section we shall finish the proof of Theorem 2.1.
By enlarging the space if necessary, we may assume without loss of generality that there exists a sequence of i.i.d. random variables $(\rho_k)_{k>0}$ with $P_x(\rho_1 = 0) = p_0$ and $P_x(\rho$
for $k \in \mathbb{N}^*$ and x ∈ E which in addition are independent of the Markov chain. Then
Hence, since C is a small set for X (i.e. (1-2) holds), C is a small set for X which satisfies in addition
Further µ is also the invariant distribution for X.

Moreover, if we let M
We shall show now that X and M satisfy the assumption of Theorem 2.1. By Lemma 4.4 part a) and the fact that
For β = 1 this is exactly part 4) of Proposition 8.2.13 in Duflo [10]. The general case is proved by a straightforward modification of the proof for β = 1 given in Duflo. Details shall be omitted.
We therefore obtain from the previous part of the proof of Theorem 2.
are equivalent w.r.t. the (LDP). The proof of the equivalence however is a straightforward modification of the proof of (4-4). For the reader's convenience we shall sketch the proof below.
This completes the proof of Theorem 2.1.

Proof of Proposition 3.1 and Proposition 3.2
We shall prove only Proposition 3.1 because the proof of Proposition 3.2 is a straightforward modification of the proof of Proposition 3.1 and contains no new ideas.
We shall denote by $X^x = (X^x_k)_{k\in\mathbb{N}}$ the (AR1) given through (3-1) with $X_0 \equiv x$. We observe first that if $X = (\Omega, \mathcal{F}, (P_x)_{x\in E}, \mathbb{F} = (\mathcal{F}_k)_{k\in\mathbb{N}}, (X_k)_{k\in\mathbb{N}})$ is a standard Markov chain on $\mathbb{R}$ with transition probability $\Pi(x, \cdot) = P(\theta x + \beta_1 \in \cdot)$, then:
(4-40) The distribution of $X^x$ under P is equal to that of X under $P_x$.
It is well known that in the stable case the Markov chain has an invariant measure µ, which is equal to the distribution of
We shall prove next
Proof. For the proof of part a) we may assume without loss of generality that $\delta \in 2\mathbb{N}^*$. Using Hölder's inequality and the identical distribution of the random variables $(\beta_n)_{n\in\mathbb{N}^*}$ and letting $\beta_0 = x$ we obtain
In order to prove part b) it obviously suffices to show that there exist $m \in \mathbb{N}^*$, q ∈ ]0, 1[ and $C_{31} > 0$ such that for all $n \in \mathbb{N}^*$
(4-41)
Using (3-1) we obtain inductively for $k, n \in \mathbb{N}^*$ with k < n
For $m \in \mathbb{N}^*$ with $4|\theta|^m(|a| + |b| + 1) < \varepsilon$ we conclude for $x \in C_\varepsilon$
We dropped the parameter x in $\alpha_n(0)$, since the distribution of $Z_{0,im}$, $i \in \mathbb{N}^*$, under $P_x$ is independent of x. In the following we fix $x \in \mathbb{R}$. Analogously to (4-42) we obtain
We observe next that $Z_{(i-1)m,im}$, $i \in \mathbb{N}^*$, are i.i.d. and that the distribution of $Z_{0,m}$ converges (for m → ∞) weakly to the invariant measure µ. Hence by the Portmanteau Lemma
lim sup
For the following fix r ∈ ]µ(R\C), 1[ and $m \in \mathbb{N}^*$ such that P
Indeed, using (4-43) and the independence of the $Z_{(i-1)m,im}$, $i \in \mathbb{N}^*$, we conclude
We used also the fact that (by the choice of
Using (4-44) an easy induction argument shows that
Observing that by part a) and Chebychev's inequality $\alpha_1(i) \le C_{32}(2|\theta|^m)^i$ for some $C_{32} > 0$, we conclude
Next we shall show that
A simple application of the Borel-Cantelli Lemma shows that there exists an $n_0 \in \mathbb{N}^*$ such that for all $n \ge n_0$ and $x \in \mathbb{R}$
and hence the distribution of $X_{2n}$ w.r.t. $P_x$ has a non vanishing density part with a continuous density, say $f^x_{2n}$, such that
Moreover, since the invariant measure µ is equal to the distribution of Z, it is easy to see that by (4-45) and (4-47) µ has a non vanishing density part, say g, with inf ]a
By (4-47) we see that [a
hence we conclude the proof by applying the Burkholder-Davis-Gundy inequality to the martingale
The proof of the equivalence is again very similar to that of (4-42), so that it suffices to give only a sketch of the proof. Fix x ≠ 0. A simple calculation shows that
Using again the same arguments as in the proof of (4-4) it remains to verify
(4-50) lim sup n→∞ P max k=1,...,n
Observing that $U^x_n \ge (X^x_0)^2 = x^2 > 0$ we obtain by Chebychev's inequality, Hölder's inequality and Lemma 4.10 and Lemma 4.4 analogously to (4-9) that for γ > 0, n sufficiently large and $k \le n^{1/8}$
(4-51)

Remarks
We shall conclude this paper with some remarks.
a) Theorem 2.1 implies the (LDP) for further a.s. limit theorems, like the (ASCLT) on the real line and a.s. versions of the arcsine law (compare Corollary 2.10 and Examples 2.11 in Heck [14]).
b) The (LDP) for the (ASCLT) in particular easily implies the (ASCLT) itself. Therefore, for random variables satisfying the assumption of Theorem 2.1, Theorem 2.1 can be regarded as a generalization of Theorem A (see Corollary 2.12 in Heck [14]).

variance $\sigma^2$ satisfying the hypothesis (3-3) for all δ > 1, such that the distribution of $\beta_1$ has a non vanishing density part, and unknown real parameters θ ∈ ]−1, 1[ and α. Then for all $X_0 \equiv x \ne 0$ the following result holds for the least squares estimators:
(3-18) $(W^{\hat\theta}_n)_n$ and $(W^{\hat\alpha}_n)_n$ satisfy the (LDP) with constants $(\ln n)_n$ and rate function H w.r.t. $P_x$. ♦

Proofs

An ASCLT for i.i.d. random variables
The proof of Theorem 2.1 is essentially based on a reduction to a version of the (ASCLT) for i.i.d. random variables. In order to formulate this version we introduce some notation. For random variables $(\xi_n, \tau_n)_{n\in\mathbb{N}^*}$ as in Proposition 4.1 below we denote by $S_n$ and $T_n$ the corresponding partial sums, i.e. $S_n = \sum_{k=1}^n \xi_k$ and $T_n = \sum_{k=1}^n \tau_k$, and let for t ≥ 0, $N_t = \inf\{k \ge 0 : T_{k+1} > t\}$. Further let $S^*_n = S_{N_n}$. As in the introduction we define random functions $\Psi_n \in C_0([0,1])$ by

Lemma 4.4
Let $(M_n)_{n\in\mathbb{N}}$ be random variables with $M_0 \equiv 0$ and let η be a random variable with values in $\mathbb{N}$.

Lemma 4.9
There exists a small set C such that the Markov chain is Riemannian of any order k. ♦
Proof. Since the distribution of $\beta_i$ has a non vanishing density part, the distribution of $\beta_i + \theta\beta_{i-1}$ has a non vanishing density part with a continuous density, say h. Hence there exist a < b such that $\inf_{]a,b[} h > 0$.

Proof of Proposition 3.1. We observe first that $M_n = \sum_{k=1}^n X_{k-1}\beta_k$, $n \in \mathbb{N}^*$, is a (MAF) with $\sigma_M^2 = E_\mu M_1^2 = \frac{\sigma^4}{1-\theta^2}$. Hence, by Lemma 4.9 we can apply Theorem 2.1 to M and the Markov chain X. Letting $M^x_n = \sum_{k=1}^n X^x_{k-1}\beta_k$ we conclude by Theorem 2.1 and (4-40) that $(\Psi^{M^x}_n)_{n\in\mathbb{N}^*}$ satisfies the (LDP) with constants $(\ln n)_{n\in\mathbb{N}^*}$ and rate function H w.r.t. $P_x$ for µ-a.a. $x \in \mathbb{R}$.
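The value of the variance of the martingale $M_n = \sum_{k=1}^n X_{k-1}\beta_k$ under the invariant measure can be checked in two lines. The computation below assumes the recursion $X_k = \theta X_{k-1} + \beta_k$ for (3-1), so that in the stable case µ is centred with variance $\sigma^2 \sum_{j\ge 0}\theta^{2j} = \sigma^2/(1-\theta^2)$.

```latex
\begin{align*}
\sigma_M^2 = E_\mu\bigl[M_1^2\bigr]
  &= E_\mu\bigl[X_0^2\,\beta_1^2\bigr]
   = E_\mu\bigl[X_0^2\bigr]\,E\bigl[\beta_1^2\bigr]
   && \text{($\beta_1$ is independent of $X_0$)} \\
  &= \frac{\sigma^2}{1-\theta^2}\cdot\sigma^2
   = \frac{\sigma^4}{1-\theta^2}
   && \text{(stationary variance of the stable AR(1))}
\end{align*}
```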

Furthermore by (4-42) $M^x_n - M^y_n = (x - y)\sum_{k=1}^n \theta^{k-1}\beta_k$. Since with $M^x_n$ and $M^y_n$ also $M^x_n - M^y_n$ is a martingale, Chebychev's inequality and Doob's maximal inequality imply easily for $\gamma \in \mathbb{N}^*$
$P\bigl(\max_{k=1,\dots,n} |M^x_k - M^y_k| > \varepsilon\sqrt{n}\bigr) \le C_{34}\,\varepsilon^{-2\gamma} n^{-\gamma} |x - y|^{2\gamma}\, E\bigl|\sum_{k=1}^n \theta^{k-1}\beta_k\bigr|^{2\gamma} \le C_{35}\, n^{-\gamma}$.
This in turn implies that $(W^{M^x}_n)_{n\in\mathbb{N}^*}$ and $(W^{M^y}_n)_{n\in\mathbb{N}^*}$ are equivalent w.r.t. the (LDP) (see the proof of (4-4) and in particular (4-6) and (4-8)). Therefore it remains to prove that for all initial states x ≠ 0, $(W^{\hat\theta}_n)_{n\in\mathbb{N}^*}$ and $(W^{M^x}_n)_{n\in\mathbb{N}^*}$ are equivalent w.r.t. the (LDP).
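The identity for the difference of the two martingales follows from the linearity of the recursion. The short derivation below assumes (3-1) in the form $X^x_k = \theta X^x_{k-1} + \beta_k$ with $X^x_0 = x$, both processes being driven by the same noise, and uses $M^x_n = \sum_{k=1}^n X^x_{k-1}\beta_k$.

```latex
\begin{align*}
X^x_k - X^y_k &= \theta\,\bigl(X^x_{k-1} - X^y_{k-1}\bigr) = \dots = \theta^{k}(x-y), \\
M^x_n - M^y_n &= \sum_{k=1}^{n} \bigl(X^x_{k-1} - X^y_{k-1}\bigr)\,\beta_k
               = (x-y)\sum_{k=1}^{n} \theta^{k-1}\beta_k .
\end{align*}
```

Since |θ| < 1, the sum on the right has moments bounded uniformly in n, which is what makes the dependence on the starting point negligible on the scale √n.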