MODERATE DEVIATIONS FOR STABLE MARKOV CHAINS AND REGRESSION MODELS

Abstract: We prove moderate deviations principles for 1) unbounded additive functionals of the form S_n = Σ_{j=1}^n g(X^{(p)}_{j−1}), where (X_n)_{n∈N} is a stable R^d-valued functional autoregressive model of order p with white noise, and g is an R^q-valued Lipschitz function of order (r, s); 2) the error of the least squares estimator (LSE) of the matrix θ in an R^d-valued regression model X_n = θ^t φ_{n−1} + ε_n, where (ε_n) is a "generalized Gaussian" noise. We apply these results to study the error of the LSE for a stable R^d-valued linear autoregressive model of order p.


Introduction
• This work is composed of two parts. In the first one we obtain Chernoff-type upper bounds and a moderate deviations principle (shortened to MDP) for unbounded continuous additive functionals of a specific class of Markov chains appearing mostly in statistics, namely for stable autoregressive models of order p ≥ 1 with white noise (the results are introduced and stated in Section 2, and the proofs are performed in Section 4).
Let (X_j)_{j>−p} be such an R^d-valued stable model, and denote by µ the unique invariant probability measure of the model (X_{j−p+1}, ..., X_{j−1}, X_j)_{j≥0}; let g : R^{dp} → R^q be a continuous function (with a growth rate related to moment assumptions on the noise), and let S_n denote the corresponding additive functional. Proposition 1 states the exponential convergence in probability of S_n/n to 0. In Theorem 2 we achieve, for any given speed (a_n) ↑ ∞ such that a_n = o(n), a large deviations principle (LDP) for the sequence ((S_{[nt]}/√(na_n))_{0≤t≤1})_{n∈N}, in the space of càdlàg functions from [0, 1] to R^q. In Section 2.1 we give references to works in which such MDPs are achieved for a large class of Markovian models, but always under the assumption that the function g is bounded.
• In the second part of this paper we provide a criterion for obtaining an MDP for the error of the least squares estimator (LSE) in multidimensional linear regression models (Theorem 3). In particular we deal with stable linear autoregressive models (Theorem 4), with noises which are not necessarily white but are assumed to be "generalized Gaussian" (Gaussian white noises and bounded noises satisfy this condition). Moreover (for the autoregressive case), an exponential rate of convergence in probability of the empirical covariance is obtained.
For such prediction errors, the only LDP the author is aware of concerns stable, unidimensional, linear, first-order autoregressive models with Gaussian white noise: it is obtained in [BerGamRou96], which uses results of [BrySmo93] and [BryDem95]. The proofs in [BerGamRou96] rely strongly on spectral-theoretic arguments, and difficulties had to be overcome concerning the steepness of the moment generating function of interest. It is not clear whether such an LDP can be extended to vector-valued models of order greater than 1, and therefore the moderate deviations results obtained in Section 3 can be viewed as a reasonable compromise in this framework.
Notations: * In this paper, the letter C denotes a generic positive constant whose exact value is unimportant. A vector x ∈ R^d will often be identified with the column matrix (x_1, ..., x_d)^t, and a vector u ∈ R^{dp} with (u_1^t, ..., u_p^t)^t (where each u_i is in R^d). If (x_n) is a sequence in R^d and p ∈ N*, then x^{(p)}_n denotes the element (x_n^t, x_{n−1}^t, ..., x_{n−p+1}^t)^t of R^{dp}.
* A noise (ε_n) is said to be white when (ε_n) is an independent identically distributed (i.i.d.) sequence such that ε_n is independent of F_{n−1}. Unlike those in Section 3, noises in Section 2 are always assumed to be white. Generalized Gaussian noises will be considered and defined in Section 3. * In the sequel, a speed is a sequence v = (v_n) ↑ ∞ of positive numbers. If E is a Polish space provided with its Borel sigma field E, a rate function is a lower semi-continuous function I : E → [0, +∞]. We say that (Z_n) satisfies an upper-LDP if only (2) is satisfied. As we will often deal with only one speed, n, we set the following notation, which is used in both Sections 2 and 3: for any sequence of events (A_n) ⊂ E, P[A_n] ld< 0 means lim sup (1/n) log P[A_n] < 0. Remark 1: Frequently, assumptions in this paper will be of the form lim (1/a_n) log P[A_n] = −∞, where (v_n) and (a_n) are speeds such that a_n = o(v_n) and (A_n) is a sequence of events. It may be worth noting beforehand that such a relation is obviously implied by the following stronger one: lim sup (1/v_n) log P[A_n] < 0.

2 Statement of the results for stable Markov chains

2.1 Introduction

a) Empirical measures of Markov chains
Before stating our results, let us have a look at previous related works. Let E be some Polish state space, provided with its Borel sigma field E, and let P(E) denote the set of probability measures on (E, E). Let (X_n) be a time-homogeneous discrete-time Markov chain with state space E and Fellerian transition kernel π. We say that the chain (X_n) is stable if there exists a unique π-invariant probability measure µ such that, for any initial distribution and almost every path, the sequence (Λ_n) of occupation measures converges weakly to µ. When π is strongly Fellerian, this property implies recurrence, but in the general case it does not imply irreducibility (see [MeyTwe], [Duf], or [Bor91] for details on stability).
Under various assumptions which almost always include the irreducibility of the chain, LDPs have been proved for the sequence (Λ_n) (see, among other references, [DonVar75a], [DonVar75b], [DonVar76], [Aco90], [DinNey95], [Lip96], [Jai90], [BryDem96], [Ell88]), but stability and the Feller property are sufficient conditions for the upper LDP alone. In this section we are concerned with large and moderate deviations results for empirical means of the type Λ_n(g) = ∫ g dΛ_n, where g is an R^q-valued function on E. Using the results cited above, one may achieve the LDP for (Λ_n(g)) if g is a bounded but not necessarily continuous function; a reference for the case of continuous and unbounded g is [GulLipLot94]. On the other hand, under various weaker conditions, CLTs of the form √n(Λ_n(g) − µ(g)) ⇒ N(0, S²(g)) are proved (where S²(g) is some q × q covariance matrix related to g). Given a speed (a_n) such that a_n = o(n), it is natural to associate to this CLT an LDP of speed (a_n) and rate I(x) = ½ x^t (S²(g))^{−1} x for the sequence (√(n/a_n)(Λ_n(g) − µ(g))).
In this section we achieve this type of MDP for a class of unbounded continuous functions g and a particular type of Markov chain, under stability assumptions on the chain. In recent years, several works have been devoted to the study of moderate deviations principles related to Markovian models, the main references being [Wu95], [Gao96], [Aco97], and [AcoChe98]: a large class of models is covered by these works, but (with the exception of [Wu95]) the boundedness of the function g is always assumed. See [Aco97] for a discussion of the various conditions proposed in these references.

b) Functional autoregressive models and Sanov theorem
In Section 2.2, where our results are stated, we study stable functional autoregressive models of order p ≥ 1, but in this introductory paragraph we consider the model of order 1 defined by X_n = f(X_{n−1}) + σ(X_{n−1}) ε_n (3). The transition π is such that π(x, ·) is the distribution of f(x) + σ(x)ε, and, if ν is the initial distribution and g : R^d → R is measurable, we will denote by Λ_{n,ν} the occupation measure defined in the previous paragraph, and set Λ_{n,ν}(g) = ∫ g dΛ_{n,ν}.
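As an illustration only (this is not part of the original argument), the following minimal Python sketch simulates such a stable functional autoregressive model of order 1 and the empirical mean Λ_{n,ν}(g); the particular choices of f, σ, g, the noise law, and the initial state are assumptions made for the example.

import numpy as np

# Order-1 functional autoregressive model X_n = f(X_{n-1}) + sigma(X_{n-1}) * eps_n (d = 1).
# f, sigma, g and the noise law below are illustrative assumptions, not the paper's objects.
rng = np.random.default_rng(0)

def f(x):            # contracting drift: |f(x)| <= c + alpha*|x| with alpha < 1
    return 0.5 * np.tanh(x)

def sigma(x):        # volatility bounded and bounded away from zero
    return 1.0 + 0.1 * np.cos(x)

def g(x):            # an unbounded test function of Lipschitz type
    return x ** 2

n = 100_000
X = np.empty(n + 1)
X[0] = 0.0                                   # deterministic initial state (nu = delta_0)
eps = rng.standard_normal(n)                 # white noise
for j in range(n):
    X[j + 1] = f(X[j]) + sigma(X[j]) * eps[j]

# Occupation-measure mean Lambda_{n,nu}(g) = (1/n) sum_{j=0}^{n-1} g(X_j);
# by stability it converges to mu(g), here with exponential speed in probability.
print(np.mean(g(X[:-1])))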
It is known that the stability of this model is achieved under one of the following two commonly encountered assumptions (see [Duf]): case (i), irreducibility of the transition kernel; case (ii), a Lipschitz condition (which implies (4)). With property (4), the results from [DupEll], Chapter 8, p. 299, which are very close to those of Donsker and Varadhan, apply (indeed, the "exponential tightness" Condition 8.2.2, p. 280, is satisfied; cf. Lemma 1, part 1, in paragraph 4.2.1). The following upper LDP for (Λ_{n,ν}) is thus valid.

Theorem 1 (Donsker-Varadhan, Dupuis-Ellis)
In one of the two cases stated above, and with assumptions (H) and (4), the Markov chain defined by (3) is stable. If µ denotes its unique invariant probability measure, and if we define, for any ξ ∈ P(R^d), the rate I(ξ), where R(·|·) denotes the usual relative entropy between two probability measures, then 1. I is a good rate function and I(ξ) = 0 ⇔ ξ = µ.
Consequently, for any bounded continuous g, relation (5) below holds.

Convergence with exponential speed
Relation (5) is the "convergence in probability with exponential speed" of Λ_{n,ν}(g) to µ(g) that we generalize, for some unbounded g, to the functional autoregressive model of order p defined by X_n = f(X^{(p)}_{n−1}) + σ(X^{(p)}_{n−1}) ε_n (6), in the following framework: the functions f : R^{dp} → R^d and σ : R^{dp} → ]0, +∞[ satisfy assumption (H) above, as well as one of the following two conditions, with α_1, ..., α_p being ≥ 0 and such that 0 < α_1 + ... + α_p < 1. Case (i): the noise's distribution has a strictly positive density w.r.t. the Lebesgue measure (irreducibility), and f satisfies, for some norm |.| on R^d, some c ≥ 0, and any x ∈ R^{dp}, the growth condition (R). Case (ii) (Lipschitz model): for some norm |.| on R^d, f and σ satisfy a Lipschitz-type relation (L), for some β ≥ 1 and any x, y ∈ R^{dp}. This assumption provides the existence of a norm |.|_* on R^{dp} such that, if X^{(p)x}_n denotes the vector X^{(p)}_n for the initial state x, we have, for any pair of initial states, the contraction relation (7), where 0 < α < 1 and c_p > 0 is a constant. The proof that (L) implies (7) is given in Section 4.1.
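For orientation (the displayed conditions themselves are not reproduced above, so the explicit forms below are assumptions consistent with the constants named in the text rather than quotations): condition (R) is a growth condition of the type
|f(x)| ≤ c + α_1 |x_1| + · · · + α_p |x_p| for every x = (x_1^t, ..., x_p^t)^t ∈ R^{dp};
condition (L) is a Lipschitz-mixing condition on f and σ jointly, with moment exponent β, in the spirit of the conditions of [Duf]; and relation (7) is a contraction of the type
( E |X^{(p)x}_n − X^{(p)y}_n|_*^β )^{1/β} ≤ c_p α^n |x − y|_* for all x, y ∈ R^{dp} and n ≥ 1.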
If ν denotes the distribution of X^{(p)}_0, we will denote by (X^{(p)}_{n,ν}) the corresponding Markov chain. Under either of these two sets of conditions, this Markov chain is stable, because (R) ensures the a.s. tightness of (Λ_n), whereas the uniqueness of limit points is given by the irreducibility of case (i) or by (L) (see e.g. [BenMetPri], [Duf], and [Bor91] for more details): hence Theorem 1 applies to this model.
With ‖.‖ denoting the Euclidean norm on R^{dp} and on R^q, |.| the norm on R^d for which f satisfies (R) or (L), and β in (8) matching the β in (L), we achieve the following result (proved in Sections 4.1 and 4.2).
Proposition 1 ("exponential convergence in probability") In one of these two frameworks, assume that τ > 0, β > 1, and a family Φ ⊂ P(R dp ) are such that (1 + X 2) If g : R dp → R q is a continuous function satisfying j,ν ) converges to µ(g) with exponential speed, uniformly for ν ∈ Φ.
3) For any speed (a_n) such that a_n = o(n), relation (10) holds. For instance, Φ can be taken as a set of deterministic initial states located in a compact subset of R^{dp}.

Moderate deviation principle
If (X_n) is a Markov chain with transition π and invariant distribution µ, and g is a µ-integrable function such that the Poisson equation G − πG = g − µ(g) admits a solution G, then the associated sequence (M_n) defines a martingale such that, when G is µ-square-integrable, its normalized bracket converges. Hence a CLT is at hand, as well as an MDP for (√n(Λ_n(g) − µ(g))) as soon as G is bounded (as in [Gao96]), by applying Dembo's result in [Dem96].
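For the reader's convenience, the standard decomposition behind this remark is the following routine computation (written with the convention Λ_n(g) = (1/n) Σ_{j=0}^{n−1} g(X_j), which is an assumption on the indexing):
n (Λ_n(g) − µ(g)) = Σ_{j=0}^{n−1} (G − πG)(X_j) = M_n + G(X_0) − G(X_n), where M_n = Σ_{j=1}^{n} ( G(X_j) − πG(X_{j−1}) );
(M_n) is a martingale since E[G(X_j) | F_{j−1}] = πG(X_{j−1}), and its increments inherit the integrability of G, so the central limit and moderate deviations behaviour of Λ_n(g) − µ(g) reduces to that of M_n/n, up to the boundary term (G(X_0) − G(X_n))/n.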
Let (X_n) denote the autoregressive model defined in 2.2.1, in the Lipschitz framework (ii). We adapt the method outlined above to achieve this MDP for a function g : R^{dp} → R^q in the set Li(r, s) of Lipschitz functions of order (r, s) (with s > 0 and r ≥ 0), i.e. functions satisfying the corresponding Lipschitz-growth bound (where ‖.‖ denotes at the same time the Euclidean norm on R^{dp} and on R^q). Indeed, under the "Lipschitz mixing" property (7) and some assumptions (which are largely satisfied in our framework) on r, s, and the moments of the noise, it is proved in [BenMetPri] (Part II, Chapter 2) and in [Duf] (Chapter 6) that the Poisson equation admits a solution G which is also in Li(r, s).
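The class Li(r, s) referred to here is, in [BenMetPri] and [Duf], defined by a bound of the following type (the exact normalization used above is assumed to be the same):
‖g(x) − g(y)‖ ≤ C ‖x − y‖^s ( 1 + ‖x‖^r + ‖y‖^r ) for all x, y ∈ R^{dp};
in particular, such a g satisfies a growth bound of order ‖x‖^{r+s}, which is the exponent γ = r + s appearing in the preliminary calculations of Section 4.3.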
Let D_1(R^q) denote the set of càdlàg functions from [0, 1] to R^q, provided with the usual Skorokhod topology. We achieve the following MDP, whose proof relies mainly on Puhalskii's criterion developed in [Puh94] and is performed in Section 4.3 (and 4.1).
Theorem 2 Let (X n ) denote the functional autoregressive model (6) satisfying the Lipschitz condition (L) above and the assumptions of Proposition 1, for some β > 2.
Let g ∈ Li(r, s). If 1 ≤ r + s < β/2, then, for any given speed (a_n) such that a_n = o(n), the sequence (√(n/a_n)(Λ_n(g) − µ(g)))_{n∈N} satisfies in R^q an LDP with speed (a_n) and the quadratic rate associated with S²(g). Moreover, the sequence of processes ((X^{(n)}_t)_{0≤t≤1})_{n∈N} satisfies in D_1(R^q) an LDP with speed (a_n) and the associated functional rate function defined by (11), where S²(g)^− is the "generalized inverse" of S²(g).

Preliminary definitions
• Generalized Gaussian noise: in this section, MDPs are achieved under the assumption that the noises (ε_n) handled are generalized Gaussian, i.e. dominated by a centered Gaussian distribution of covariance L: there exists a d × d covariance matrix L such that the domination relation below is satisfied. Examples of generalized Gaussian noises are centered bounded noises, and white centered noises with a Laplace transform φ(θ) finite for every θ ∈ R^d and with suitable behaviour at infinity; see [Pet], Lemma 2.5. Moreover, to achieve not only the upper MDP but the full MDP, we will be led to assume that the generalized Gaussian noises satisfy the following property (P_Γ), with ψ_±(0) = 0, ∇ψ_±(0) = 0, and D²ψ_±(0) = Γ, Γ being some symmetric positive definite (hence invertible) d × d matrix.
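Concretely, domination by a centered Gaussian distribution of covariance L is the standard Laplace-transform bound (stated here for reference; it matches the bound E[exp⟨u, ε⟩ | F_{j−1}] ≤ exp(Λ‖u‖²) used in Section 6, and only its normalization is an assumption):
E[ exp⟨u, ε_n⟩ | F_{n−1} ] ≤ exp( ½ u^t L u ) for every u ∈ R^d and every n ≥ 1,
the conditional expectation being an ordinary expectation when the noise is white.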
The map vec is one to one from M δ,d to R δd , and we denote by mat : R δd → M δ,d its inverse.
If C ∈ M_{δ,δ} and L ∈ M_{d,d}, we define the Kronecker product of C and L as the δd × δd matrix C ⊗ L given blockwise below.
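With the usual convention (recalled here for completeness), the (i, j)-th d × d block of C ⊗ L is c_{ij} L, i.e.
C ⊗ L = ( c_{ij} L )_{1≤i,j≤δ},
where c_{ij} denotes the (i, j)-th entry of C.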

General linear regression models
We first deal with the d-dimensional regression model (X_n) defined, on the space (Ω, A, P) provided with a filtration (F_n), by X_n = θ^t φ_{n−1} + ε_n (14), where the explicative variables φ_{n−1} are R^δ-valued and F_{n−1}-measurable, and θ ∈ M_{δ,d}. The error θ̃_n = θ̂_n − θ of the least squares estimator θ̂_n of θ is then expressed in terms of a matrix martingale (M_n) and of the sequences (C_n) and (Q_n); the sequences (C_n) and (Q_n) are both previsible. In Theorem 3 below, the results are stated for the vector sequence (vec θ̃_n) in order to avoid any confusion when handling MDPs in spaces such as M_{δ,d}.
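As a concrete illustration (a minimal sketch only: it uses the plain least squares formula θ̂_n = (Σ_j φ_{j−1} φ_{j−1}^t)^{−1} Σ_j φ_{j−1} X_j^t, without the regularizing identity matrix entering Q_n below; the dimensions, θ, and the noise law are assumptions made for the example):

import numpy as np

# Regression model X_j = theta^t phi_{j-1} + eps_j, with phi in R^delta and X, eps in R^d.
rng = np.random.default_rng(1)
delta, d, n = 3, 2, 50_000
theta = rng.normal(size=(delta, d))          # true parameter, a delta x d matrix
phi = rng.normal(size=(n, delta))            # explicative variables phi_0, ..., phi_{n-1}
eps = rng.normal(size=(n, d))                # centered Gaussian noise (hence "generalized Gaussian")
X = phi @ theta + eps                        # stacked observations X_1, ..., X_n

C_n = phi.T @ phi                            # empirical covariance of the explicative variables
theta_hat = np.linalg.solve(C_n, phi.T @ X)  # least squares estimator
error = theta_hat - theta                    # the error whose moderate deviations are studied
print(np.linalg.norm(error))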
Let (v n ) and (a n ) be speed sequences such that a n = o(v n ). The assumptions on the noise and the explicative variables are the following.
(N1) the noise (ε_n) is generalized Gaussian, dominated by a Gaussian distribution of invertible covariance L.
(N2) the noise (ε_n) satisfies property (P_Γ) (defined in Section 3.1). (C1) there exists some invertible C ∈ M_{δ,δ} such that, for every r > 0, the normalized empirical covariance C_n/v_n of the explicative variables satisfies lim (1/a_n) log P[‖C_n/v_n − C‖ > r] = −∞. (C2) (φ_n) satisfies the exponential Lindeberg condition. The following theorem relies on a moderate deviations result for regressive sequences such as (M_n) above, whose proof may be found in [Wor98a] (see also [Wor98b]) and is based on the method of cumulants developed by Puhalskii in [Puh94]. The proof is performed in Section 5.
Theorem 3 Let θ̃_n = θ̂_n − θ denote the error of the least squares estimator of θ in the regression model defined by (14), and let (a_n) be a speed satisfying a_n = o(v_n).
a) Under (N1) and (C1), (√(v_n/a_n) vec θ̃_n)_{n∈N} satisfies in R^{δd} an upper-LDP of speed (a_n) and rate (15).
b) In addition, if the noise is white with distribution N(0, Γ), or if (N2) and (C2) are also valid, then (√(v_n/a_n) vec θ̃_n)_{n∈N} satisfies an LDP of speed (a_n) and rate (16).
Corollary 1 The latter theorem applies with v_n = n when (φ_n) is an i.i.d. square-integrable sequence independent of the noise (ε_n), under the previous assumptions on this noise.
Remark 3: It is straightforward to check that the following "exponential Lyapunov condition" implies the "exponential Lindeberg condition" (C2): there exist a β > 2 and a C > 0 such that lim sup (1/v_n) Σ_{j=1}^n ‖φ_{j−1}‖^β ≤ C. Both conditions are naturally satisfied if (φ_j) is a bounded sequence.
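The implication rests on the usual truncation bound; as a sketch (the threshold sequence c_n ↑ ∞ below stands for the truncation level appearing in (C2), whose precise display is not reproduced above), for any r > 0,
‖φ_{j−1}‖² I{‖φ_{j−1}‖ > r c_n} ≤ (r c_n)^{−(β−2)} ‖φ_{j−1}‖^β,
hence (1/v_n) Σ_{j=1}^n ‖φ_{j−1}‖² I{‖φ_{j−1}‖ > r c_n} ≤ (r c_n)^{−(β−2)} (1/v_n) Σ_{j=1}^n ‖φ_{j−1}‖^β ≤ C (r c_n)^{−(β−2)}, which tends to 0, and does so at any exponential speed at which the Lyapunov bound itself holds.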

Stable linear autoregressive models
Let (X_n) be the stable autoregressive model of order p (AR(p)) and dimension d defined by X_n = A_1 X_{n−1} + · · · + A_p X_{n−p} + ε_n, with an initial state X^{(p)}_0 = (X_0, X_{−1}, . . . , X_{−p+1}) independent of the noise. By a stable model we mean that all roots of the polynomial z ↦ det(I − A_1 z − · · · − A_p z^p) have modulus > 1. For the noise (ε_n), the assumptions may be of two types, Γ below designating an invertible d × d covariance matrix.
Case 1: (ε_n) is a white generalized Gaussian noise with covariance Γ, satisfying an exponential moment condition for some τ > 0.
Case 2: (ε_n) is a generalized Gaussian noise such that E[ε_n ε_n^t | F_{n−1}] = Γ (∀ n ≥ 1), satisfying a conditional exponential moment condition for some β > 2, τ > 0, and E > 0; moreover, we assume an exponential concentration property, for every r > 0, of the empirical covariance of the noise around Γ.
Theorem 4
1. The normalized empirical covariance of the model converges with exponential speed in probability to its "stationary covariance" C (relation (17)); moreover, there exists R > 0 sufficiently large such that (18) holds and, in case 1, (19) holds for any r > 0.
2. We assume that C is invertible (e.g. if A_p is invertible). Let θ̃_n = θ̂_n − θ be the error of the least squares estimator θ̂_n of θ = [A_1 A_2 · · · A_p]^t for the regression model (I). For any given speed (a_n) such that a_n = o(n), the sequence (√(n/a_n) vec θ̃_n)_{n∈N} satisfies in R^{dp×d} an upper LDP with speed (a_n) and rate (15) (with δ = dp), and it obeys the full LDP with rate (16) if, in addition, (ε_n) satisfies property (P_Γ) (which is the case in case 1).
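The following Python sketch (illustration only; the matrices A_1, A_2, the dimensions, and the noise law are assumptions made for the example) shows the companion-matrix formulation underlying the associated AR(1) model, the stability check, and the least squares estimation of θ = [A_1 A_2]^t from a simulated trajectory:

import numpy as np

rng = np.random.default_rng(2)
d, p, n = 2, 2, 100_000
A = [0.5 * np.eye(d), np.array([[0.2, 0.1], [0.0, 0.2]])]   # A_1, A_2 (illustrative)

# Companion matrix of the associated AR(1) model for X^{(p)}_n:
# stability of the AR(p) model is equivalent to its spectral radius being < 1.
companion = np.zeros((d * p, d * p))
companion[:d, :d], companion[:d, d:] = A[0], A[1]
companion[d:, :d] = np.eye(d * (p - 1))
assert np.max(np.abs(np.linalg.eigvals(companion))) < 1, "model not stable"

X = np.zeros((n + p, d))                      # X_{-1}, X_0, then X_1, ..., X_n
eps = rng.standard_normal((n, d))
for j in range(p, n + p):
    X[j] = A[0] @ X[j - 1] + A[1] @ X[j - 2] + eps[j - p]

# Regression form X_j = theta^t phi_{j-1} + eps_j with phi_{j-1} = (X_{j-1}^t, X_{j-2}^t)^t.
phi = np.hstack([X[p - 1:n + p - 1], X[p - 2:n + p - 2]])    # n x (dp) design
theta_hat = np.linalg.solve(phi.T @ phi, phi.T @ X[p:])      # LSE of [A_1 A_2]^t
print(np.linalg.norm(theta_hat[:d].T - A[0]), np.linalg.norm(theta_hat[d:].T - A[1]))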
In order to establish Theorem 4, we shall prove the following result (which is a version of part 1 of Theorem 4 when p = 1).
Proposition 2 For the noise (ε_n) and the initial state X_0, we take the assumptions of Theorem 4 (case 1 or case 2). Then, for every r > 0, the normalized empirical covariance of the model converges with exponential speed in probability to C, where C is the stationary covariance of the model; moreover, there exists R > 0 sufficiently large such that the analogue of (18) holds and, in case 1, we have for every r > 0 a bound of the form lim sup_{R→∞} lim sup_{n→∞} (...) = −∞, the analogue of (19).
If we apply this proposition to model (II), we achieve part 1 of Theorem 4, and then part 2 comes easily by application of Theorem 3 to the regression model (X_n) with explicative variables sequence (X^{(p)}_n). We prove Proposition 2 (and therefore Theorem 4) in Section 6. The simplest framework in which Theorem 4 applies is when (ε_n) is a generalized Gaussian white noise with covariance Γ (i.e. case 1). Another simple application is when (ε_n) is a bounded martingale increment sequence with conditional covariance Γ.

4 Proof of Proposition 1 and Theorem 2

4.1 Restriction to the case p = 1
• We want to deduce the general case p ≥ 1 from the case p = 1. Let us consider the associated autoregressive model of order 1 for (X^{(p)}_n), driven by a function F built from f, and construct a norm on R^{dp} for which F satisfies the assumptions (either (i) or (ii)). As noted in 2.2.1, (L) implies (R), and we only need relation (R) to prove Proposition 1, so we will deal with (R) first. We will then turn to Theorem 2, which applies only if the model is "Lipschitz-mixing" (that is, in case (ii)), as it needs the existence of a solution (in Li(r, s)) of the Poisson equation associated to g, relying on relation (7).
• If |.| denotes the norm on R^d for which either (R) or (L) is satisfied, we adopt the notation |x| = (|x_1|, ..., |x_p|)^t ∈ R^p for x ∈ R^{dp}, and compare vectors of R^p coordinatewise. As α_1 + · · · + α_p < 1, it is known that the corresponding p × p matrix C (built from α_1, ..., α_p) is an irreducible positive matrix to which the Perron-Frobenius theorem applies, i.e. the spectral radius ρ(C) of C is an eigenvalue of C and there exists an eigenvector v > 0 associated to λ = ρ(C) < 1. We can assume v_1 = 1, and consequently v = (1, λ^{−1}, ..., λ^{−p+1}). Hence, for x ∈ R^{dp}, if we set |x|_* = sup_{i=1,...,p} λ^{i−1}|x_i|, then |x| ≤ |x|_* v, hence C|x| ≤ λ|x|_* v, and we get the desired bound. As 0 < λ < 1, property (R) is valid for F instead of f. Therefore, for Proposition 1, we can restrict the proof to the case p = 1.
• Now we have to prove that (L) implies (7) (and consequently that Theorem 2 is proved for p ≥ 1 if it is for p = 1). To the notations above we add a few more, from which we deduce, for every n ≥ 1, the corresponding contraction bound; relation (7) is then proved with c_p = p λ^{(p−1)(1−β)}.

4.2 Proof of Proposition 1 (with p = 1)
It is straightforward to see that the results of Proposition 1 extend to q > 1 if they are proved for q = 1. We will therefore assume that q = 1 in this proof. On the other hand, as we restricted the proof to the case p = 1, we will use the norm |.| for which (R) or (L) holds as the norm on R dp = R d (instead of the Euclidean norm appearing in each of the relations (8), (9), and (10)). Finally, to lighten the notations, in this and later proofs, we will often omit the subscript ν in X n,ν .

Proof of 1)
• First note that (L) implies relation (R) with c = |f(0)|. Hence, in either case, we may apply the Hölder inequality to the measure on {0, . . . , n} giving the weight α^{n−k} to k, and to the function ψ defined by ψ(0) = |X_0| and ψ(k) = η_k; we get (as 0 < α < 1) the announced bound.
• Let τ > 0. In [DonVar76], [GulLipLot94], and other papers related to the Sanov theorem for Markov chains (see 2.1.1), the following functions are introduced (with π(x, dy) denoting the transition probability of the Markovian model (X_n), and |.| the norm for which (R) holds). We state the following lemma, part 2 of which will also prove useful in Section 4.3.2.
b) Proof of 2. The first assertion is easy to check (by the definition of V) and, as 1 ≤ γ < β and R ≥ 1, we have, for every x, the stated bound.

Proof of 2)
It comes easily from relation (9) and from Sanov-type results such as Theorem 1 (part 3) stated in the introduction. Indeed, by the assumption on g, there exists some constant A > 0 controlling the growth of g. For every R > 0, we define a real-valued bounded continuous function g^{(R)} on R^d and bound |(g − g^{(R)})(x)| accordingly.
Remark 4: On the basis of the upper LDP stated in Theorem 1 and of relation (9), we could easily follow the approach of [GulLipLot94] and establish an upper LDP for the sequence (Λ n (g)), using Lemma 2.1.4 of [DeuStr].

Proof of 3)
Let (a_n) be such that a_n = o(n). By relation (20) and easy calculations on the initial state, we see that (10) will result from the following statement, which we prove in this paragraph: if R > 0,
lim sup (1/a_n) log P[ (1/√(na_n)) sup_{k≤n} |Σ_{j=1}^k η_j| ≥ R ] ≤ −R²/(2c²),
where c² denotes the variance of the white noise (η_k).
It is known that the sequence of normalized partial-sum processes built from the η_j satisfies an LDP of speed (a_n) and rate function J(φ) = (1/(2c²)) ∫_0^1 |φ̇(t)|² dt if φ is absolutely continuous with φ(0) = 0, and J(φ) = ∞ otherwise. Moreover, it is shown in [DesPic79] that the corresponding bounds hold for the supremum of the partial sums; consequently, (23) follows.

Preliminary calculations
• Let g ∈ Li(r, s), with r, s such that 1 ≤ r + s < β/2. Thanks to property (7), there exists a solution G to the Poisson equation associated to g, with G ∈ Li(r, s) as well. Thus, with ‖.‖ designating the Euclidean norm on R^q, there exist some positive constants A and A' bounding the growth of G. For every R > 0, we denote by G_R the corresponding truncated functional.
• We now prove the following relation: there exists some constant C > 0 such that (24) holds, where δ_R and ε_R = ε_R(R, G) are suitable real numbers. We set γ = r + s and introduce the quantity S(R, ·); the intermediate bounds involve positive constants C_1, C_2, C_3 independent of R and x, and the desired result comes with C = max{C_1 + C_2, C_3}.
• Associated with (9), relation (24) implies property (25), and the same is obviously true with the functional π(G − G_R) replaced by (G − G_R). Moreover, as G ∈ Li(r, s), the functions involved are all Lipschitz of order (r, 2s + r): as 2(r + s) < β, it is therefore easy to check that we also achieve the corresponding lim sup relations (26), as well as the analogous relations for Li(r, 2s + r). All these properties will prove useful in the next paragraph, and we will refer to them simply as "relation (25)" or "relation (26)". Note that these relations remain true if we remove "lim sup_{R→+∞}" and replace R by R(n), where R(n) stands for any sequence such that R(n) ↑ ∞.

Proof of the MDP
H(X j−1 ).
In order to achieve the required LDP for the sequence (X^{(n)}_•) introduced in the statement of the theorem, the first step is to prove the so-called C-exponential contiguity (see [LipPuh] for details on this subject, or [Puh97]) in D_1(R^q) between the sequence (X^{(n)}_•) and suitable martingale approximations.
• For any r > 0 we have the corresponding bound; but r + s < β, hence (10) implies that, for every r > 0,
lim sup_{n→∞} (1/a_n) log P[ (1/√(na_n)) sup_{k≤n} |η_k|^{r+s} > r ] = −∞,
and consequently, as n/a_n → +∞, the first C-exponential contiguity is achieved.
• The second contiguity we have to prove is relation (29). If we apply part 2 of Lemma 1 to v_n = √(na_n), γ = r + s, and R = R(n) = (n/a_n)^δ, we obtain
(1/a_n) log P[ (1/√(na_n)) Σ_{i=0}^{n−1} (1 + |X_{i,ν}|^{r+s}) I{|X_{i,ν}|^{r+s} > R(n)} > r ] ≤ −(rτ/2)(n/a_n)^{δ'} + (n/a_n) V* + (log E(Φ))/a_n,
with δ' = 1/2 + δ(β − (r+s))/(r+s); as r + s < β/2, we may choose δ such that (r+s)/(2(β − r − s)) < δ < 1/2, hence δ' > 1, and thus (29) results from (24), (30), and (31).
• Next we show that (27) extends to the martingale (N^{(n)}). If we set τ_n = Tr⟨P^{(n)}⟩_n, then, as (⟨M⟩_k)_k and (⟨P^{(n)}⟩_k)_k are both increasing sequences, the comparison of [Duf], Proposition 4.2.11, applies. Hence, for any r > 0, there exists a_r > 0 such that the required bound holds as soon as we prove that P[τ_n/n > r] ld< 0 (∀ r > 0). This is the case thanks to relation (26) (with R(n) instead of R).
• It finally remains to prove that the sequence of processes (N^{(n)}_{[n•]}/√n) satisfies the LDP stated in the theorem. The proof relies heavily on Puhalskii's criterion for obtaining an LDP for sequences of càdlàg processes (see [Puh94], Theorem 2.1), as in [Dem96], where the MDP for martingales with bounded jumps is proved.

5 Proof of Theorem 3
• Let us first prove the following result.
Lemma 2 Let (M_n) be an adapted sequence with values in M_{δ,d}, C an invertible δ × δ covariance matrix, and (C_n) a sequence of δ × δ symmetric random matrices such that, for some speed sequences (v_n) and (a_n) with a_n = o(v_n), we have, for every r > 0, lim (1/a_n) log P[‖C_n/v_n − C‖ ≥ r] = −∞.
a) If Q_n = I_δ + C_n, then, for any r > 0, lim (1/a_n) log P[‖v_n Q_n^{−1} − C^{−1}‖ ≥ r] = −∞.
b) Let I be a rate function on R^{δd} such that lim_{‖x‖→∞} I(x) = +∞. If (vec M_n/√(a_n v_n)) satisfies an LDP (resp. an upper LDP) of speed (a_n) and rate function I, then (√(v_n/a_n) vec(Q_n^{−1} M_n)) satisfies an LDP (resp. an upper LDP) with the same speed and with rate function J defined by J(x) = I(vec(C mat x)).

Proof:
a) Let A be a symmetric positive-definite matrix, with eigenvalues λ_1, ..., λ_δ and an orthonormal basis of eigenvectors e_1, ..., e_δ; we then have the corresponding spectral bound. Hence, as ‖AB‖ ≤ ‖A‖‖B‖, and setting R_n = C^{−1/2}(Q_n/v_n)C^{−1/2}, we get, for all r > 0, the announced estimate; as λ_min(R_n) ≤ 1/2 implies that ‖R_n − I‖ ≥ 1/2, we obtain a).
b) Let us study the case of a full MDP (as the following argument only relies on contiguity in large deviations, the case of an upper MDP follows easily). The function x ∈ R^{δd} ↦ vec(C^{−1} mat x) ∈ R^{δd} is continuous and one-to-one, with inverse y ↦ vec(C mat y): the contraction principle thus entails that the sequence (vec(C^{−1}M_n)/√(a_n v_n)) satisfies an LDP of speed (a_n) and rate function J(·). Therefore it remains to prove the exponential contiguity of speed (a_n) between (vec(C^{−1}M_n)/√(a_n v_n)) and (√(v_n/a_n) vec(Q_n^{−1}M_n)); in other words, we must prove that, for every ρ > 0,
lim (1/a_n) log P[ √(v_n/a_n) ‖Q_n^{−1}M_n − C^{−1}M_n/v_n‖ ≥ ρ ] = −∞.   (34)
If ρ > 0 and r > 0, we have the corresponding decomposition. As, by assumption, (vec M_n/√(a_n v_n)) satisfies an LDP of rate I(·) and speed (a_n), we have lim sup (1/a_n) log P[ ‖M_n‖/√(a_n v_n) ≥ ρ/r ] ≤ −I(A_r) for every given r > 0, where A_r = {x ∈ R^{δd} : ‖x‖ ≥ ρ/r}. By assumption, lim_{r→0} I(A_r) = +∞, hence, by a), letting r → 0, (34) comes and the proof is complete.
• The following result is the central result of [Wor98a].
Theorem 5 Let (Y_n) be an adapted sequence with values in R^δ, and (ε_n) a subGaussian noise of dimension d, dominated by a centered Gaussian distribution of covariance L. We suppose that (Y_n) satisfies, for some δ × δ covariance matrix C ≠ 0 and some speed sequences (v_n) and (a_n) such that a_n = o(v_n), the exponential convergence of its normalized empirical covariance to C. We consider the regressive series M_n = Σ_{j=1}^n Y_{j−1} ε_j^t.
a) The sequence (vec M_n/√(a_n v_n))_{n∈N} satisfies in R^{δd} an upper-LDP of speed (a_n) and rate I_{C,L} defined in (36), which takes an explicit quadratic form if C and L are invertible.
b) In addition, if the noise is white with distribution N(0, Γ), or if the noise satisfies property (P_Γ) and (Y_n) satisfies the exponential Lindeberg condition (C2), then the sequence (vec M_n/√(a_n v_n))_{n∈N} satisfies an LDP of speed (a_n) and rate I_{C,Γ}.

• Theorem 3 is now ready to be proved. Part a) of Theorem 5 above applies with Y_j = φ_j, hence the sequence (vec M_n/√(a_n v_n)) satisfies an upper-LDP of speed (a_n) and rate I_{C,L}(·) defined in (36): the application of part b) of Lemma 2 entails that (√(v_n/a_n) vec(Q_n^{−1}M_n)) satisfies the upper-LDP of speed (a_n) and rate (15). In order to transfer this LDP to the sequence (√(v_n/a_n) vec θ̃_n), it thus suffices to prove the exponential contiguity of speed (a_n) between this sequence and (√(v_n/a_n) vec(Q_n^{−1}M_n)). This contiguity is provided by part a) of Lemma 2: indeed, for ρ > 0 and r > 0, there exists some n_0 = n_0(ρ, r) such that, for n ≥ n_0, the difference is bounded by the corresponding quantity, and the latter tends "(a_n)-exponentially fast" to 0 thanks to part a) of Lemma 2.
The proof of the full MDP (part b)) follows the same scheme as above, simply replacing L by Γ at each occurrence, and using part b) of Theorem 5.

6 Proof of Theorem 4
As was outlined at the end of 3.2.2, it suffices to prove Proposition 2. As a matter of fact, when the latter is applied to the AR(1) model (II), part 1) of Theorem 4 is achieved, and part 2) results from the application of Theorem 3 to the regression model (I), whose sequence of explicative variables is (X^{(p)}_n).

6.1 Preliminaries
• As the spectral radius ρ(A) of A is such that 0 < ρ(A) < a < 1 for some a, there exists a norm |.| that satisfies |Ax| ≤ |A||x| ≤ a|x|.
This norm will be much more convenient to use here, and we have the usual norm-equivalence relations with the Euclidean norm ‖.‖ (and the corresponding matrix norms). Note that, for some k, ‖A^k‖ < a < 1, where ‖.‖ is the usual Euclidean matrix norm.
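One standard construction of such a norm (a sketch only; any a with ρ(A) < a < 1 works) is
|x| = Σ_{k=0}^{∞} a^{−k} ‖A^k x‖,
which is finite because ‖A^k‖ ≤ C_ε (ρ(A) + ε)^k for every ε > 0 such that ρ(A) + ε < a; it satisfies |Ax| = a Σ_{k≥1} a^{−k} ‖A^k x‖ ≤ a|x|, and the equivalence with the Euclidean norm follows from ‖x‖ ≤ |x| ≤ (Σ_k a^{−k}‖A^k‖) ‖x‖.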
• Let C denote the stationary covariance of this stable AR(1) model, i.e. C = Σ_{k≥0} A^k Γ (A^t)^k, the unique solution of the Lyapunov equation C = A C A^t + Γ
(in case 1, Γ is the noise's covariance and C the covariance of the stationary distribution of the Markov model, whereas in case 2, Γ = E[ε_n ε_n^t | F_{n−1}]). Here we set C_n = Σ_{j=0}^n X_j X_j^t and have to show that P[‖C_n/n − C‖ > r] ld< 0 for every r > 0. In 6.2 we prove the latter relation (which is a rewriting of (17)) and relation (18), the proof being valid in both case 1 and case 2. At the end of the section, we will finally prove, in case 1 (the white noise setting with β = 2), that relation (19) is a consequence of (17).
Remark 5: If, in case 1, we had assumed an exponential moment of order β strictly greater than 2, then (17) would have resulted from the application of Proposition 1 proved in Section 4.2 (with g(x) = xx t , Λ n (g) = C n /n, and µ(g) = C).

6.2 Proof of Proposition 2
• Exponential upper bound for X_n/√n: we have a bound for every θ and n. The noise being generalized Gaussian, there exists Λ > 0 such that E[exp⟨u, ε⟩ | F_{j−1}] ≤ exp(Λ‖u‖²), and the bound holds for every θ. Therefore the Gärtner-Ellis theorem implies that the sequence (1/√n) Σ_{j=1}^n A^{n−j} ε_j satisfies the upper LDP of speed (n) and rate defined by J(x) = ‖x‖²/(4Λ); hence, for every r > 0, the corresponding exponential bound holds. We thus have established the following result.
Hence (18) is proved, as well as the following relation (39), obtained by taking β = 2 in the previous calculations.
• A splitting of C_n/n − C: let R > 0 be such that (39) is verified, and let ε > 0 be given. We have the corresponding splitting of C_n/n − C into a finite number of terms. If we choose k sufficiently large such that ‖C(k) − C‖ < ε/3 and ‖A^k‖² < ε/(3R), then, by (39), all we need to prove to obtain (17) is that, for every r > 0, relation (40) holds.
• Exponential upper bound of the distance from (C_n − A C_n A^t)/n to Γ: we have X_j X_j^t = A X_{j−1} X_{j−1}^t A^t + ε_j ε_j^t + A X_{j−1} ε_j^t + ε_j (A X_{j−1})^t, hence, taking the sum from j = 1 to n,
C_n − X_0 X_0^t = A C_{n−1} A^t + Σ_{j=1}^n ε_j ε_j^t + D_n, where D_n = Σ_{j=1}^n ( A X_{j−1} ε_j^t + ε_j (A X_{j−1})^t ).
By our assumption on (ε_n ε_n^t − Γ) in case 2 (which is automatically satisfied in case 1, by Cramér's theorem), and by (38), we have the corresponding exponential bound for any r > 0. Let us show that it is also true for D_n. If u, v ∈ R^d, we have
u^t D_n v = Σ_{j=1}^n ( ⟨u, A X_{j−1}⟩⟨v, ε_j⟩ + ⟨u, ε_j⟩⟨v, A X_{j−1}⟩ );
but ‖⟨v, A X_{j−1}⟩ u‖² ≤ ‖u‖² ‖v‖² ‖A‖² Tr(X_{j−1} X_{j−1}^t), and the noise is generalized Gaussian, hence, if Λ > 0 is such that E[exp⟨u, ε⟩ | F_{j−1}] ≤ exp(Λ‖u‖²) (∀ u ∈ R^d), then (exp(u^t D_k v − 2‖u‖²‖v‖²‖A‖²Λ Tr(C_{k−1})))_{k≥1} is a positive supermartingale for every u, v ∈ R^d. We take u = e_i and v = t e_j, where t > 0 and (e_i)_{i=1}^d denotes the canonical basis of R^d, and consider the supermartingale (Y_k)_{k≥1} = (exp(t e_i^t D_k e_j − 2 t² ‖A‖² Λ Tr(C_{k−1})))_{k≥1} and the stopping time τ(n) = inf{ k ≥ 1 ; (1/n) Tr C_k ≥ R }.
It is indeed a stable autoregressive model with noise (η^{(k,r)}_n)_{n≥1}, still generalized Gaussian, with C(k) as its stationary covariance, and to which we can apply relation (41): in other words, we have
P[ ‖ (C^{k,r}_n − A^k C^{k,r}_n (A^k)^t)/n − C(k) ‖ > ρ ] ld< 0 (∀ ρ > 0).
But relation (38) of Proposition 3 implies the corresponding bound, and [n/k]k/n → 1 as n → ∞; hence, putting (42) and (43) together, we finally achieve relation (40), which ends the proof of (17). The proof is thus complete in case 2.

• Proof of (19) in case 1
We assume that the noise is white, and denote by µ the stationary distribution of the stable Markov model (X_n), which satisfies the conditions detailed in 2.2.1 (case (ii)). We end the proof of the theorem in case 1 by proving the following statement.
We have 0 ≤ F(x) I{F(x) ≥ R} ≤ F̄^{(R)}(x) ≤ F(x), hence 0 ≤ Λ_n(F I{F ≥ R}) ≤ Λ_n(F̄^{(R)} − F^{(R)}) + Λ_n(F^{(R)}) − µ(F^{(R)}) + µ(F̄^{(R)}), with the notations introduced above. Let r > 0. The functions F̄^{(R)} and F − (F ∧ R), being both continuous, bounded by F (hence µ-integrable), and with pointwise limit 0 as R → +∞, by Lebesgue's theorem there exists some R > 0 sufficiently large such that µ(F̄^{(R)}) + µ(F − (F ∧ R)) < r/4, and consequently the required bound comes from the ones above.

Acknowledgments. I would like to thank the referee for his careful reading of the paper, as well as for drawing my attention to an error which appeared in the first version of the paper. Furthermore, I am grateful to Professor Marie Duflo for very helpful discussions and for suggestions leading to a substantially improved presentation.