Strong approximation for additive functionals of geometrically ergodic Markov chains

Let $(\xi_i)_{i \in \mathbb{Z}}$ be a stationary Harris recurrent geometrically ergodic Markov chain on a countably generated state space $(E, \mathcal{B})$. Let $f$ be a bounded measurable function from $E$ into $\mathbb{R}$ satisfying $\mathbb{E}(f(\xi_0)) = 0$. In this paper, we obtain the almost sure strong approximation of the partial sums $S_n(f) = \sum_{i=1}^n f(\xi_i)$ by the partial sums of a sequence of independent and identically distributed Gaussian random variables, with the optimal rate $O(\log n)$.


1 Introduction and main result
This paper focuses on a Komlós-Major-Tusnády type strong approximation for additive functionals of Markov chains. We first recall the famous Komlós-Major-Tusnády theorem (1975 and 1976): let $(X_i)_{i>0}$ be a sequence of independent and identically distributed (iid) centered real-valued random variables with a finite moment generating function in a neighborhood of $0$. Set $\sigma^2 = \operatorname{Var} X_1$ and $S_n = X_1 + X_2 + \cdots + X_n$. Then one can construct a standard Brownian motion $(B_t)_{t \ge 0}$ in such a way that, for any positive real $x$ and any integer $n \ge 2$,
$$\mathbb{P}\Big( \sup_{k \le n} |S_k - \sigma B_k| \ge a \log n + x \Big) \le b \exp(-cx) \,, \quad (1.1)$$
where $a$, $b$ and $c$ are positive constants depending only on the law of $X_1$. From this result, the almost sure approximation of the partial sum process by a Brownian motion holds with the rate $O(\log n)$. It follows from the Erdös-Rényi law that this result cannot be improved. This result was later extended to the multivariate case by Einmahl (1989), who obtained the rate $O((\log n)^2)$ in the almost sure approximation of partial sums of random vectors with a finite moment generating function in a neighborhood of $0$ by Gaussian partial sums. Next, Zaitsev (1998) removed the extra logarithmic factor and obtained (1.1) in the case of random vectors. We refer to Götze and Zaitsev (2009) for a detailed review of the results on this subject.
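Let us recall why the rate $O(\log n)$ cannot be improved; the following is a standard sketch of the Erdös-Rényi argument (our summary, not a quotation of the sources cited above). The Erdös-Rényi law of large numbers states that, for iid centered random variables with a finite moment generating function and for suitable $c > 0$,
$$\lim_{n\to\infty}\ \max_{0 \le k \le n - \lfloor c\log n\rfloor} \frac{S_{k + \lfloor c\log n\rfloor} - S_k}{\lfloor c\log n\rfloor} = \alpha(c) \quad \text{a.s.} \,,$$
where the constant $\alpha(c)$ is determined by the large deviation rate function of $X_1$, hence by the whole law of $X_1$ and not only by $\sigma^2$. The analogous functional of $\sigma B$ converges to a constant depending only on $\sigma^2$, so an approximation of $(S_k)_{k \le n}$ by $(\sigma B_k)_{k \le n}$ with error $o(\log n)$ is impossible unless the two limits coincide.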
We now come to the framework of this paper. Let $(\xi_n)$ be an irreducible and aperiodic Harris recurrent Markov chain on a countably generated measurable state space $(E, \mathcal{B})$. We will consider only chains which are positive recurrent, and $\pi$ will exclusively denote the (unique) invariant probability measure of $(\xi_n)$. In that case the transition probability $P(x, \cdot)$ of the Markov chain satisfies the following minorization condition: there exist some positive integer $m$, some measurable function $h$ with values in $[0, 1]$ with $\pi(h) > 0$, and some probability measure $\nu$ on $E$, such that
$$P^m(x, A) \ge h(x)\,\nu(A) \quad \text{for any } x \in E \text{ and } A \in \mathcal{B} \,. \quad (1.2)$$
In order to avoid additional difficulties, we will assume throughout the paper that the above condition holds true with $m = 1$. Let then $Q(x, \cdot)$ be the sub-stochastic kernel defined by
$$Q(x, A) = P(x, A) - h(x)\,\nu(A) \,. \quad (1.3)$$
Under assumption (1.2), proceeding as in Nummelin (1984), we can define an extended chain $(\tilde\xi_n, U_n)$ in $E \times [0, 1]$ as follows. At time $0$, $U_0$ is independent of $\tilde\xi_0$ and has the uniform distribution over $[0, 1]$; for any nonnegative integer $n$,
$$\mathbb{P}\big( \tilde\xi_{n+1} \in A \mid \tilde\xi_n = x, U_n = y \big) = \mathbb{1}_{y \le h(x)}\, \nu(A) + \mathbb{1}_{y > h(x)}\, \frac{Q(x, A)}{1 - h(x)} \,,$$
and $U_{n+1}$ is independent of $(\tilde\xi_{n+1}, \tilde\xi_n, U_n)$ and has the uniform distribution over $[0, 1]$. The kernel of the extended chain is then equal to $\tilde P \otimes \lambda$, where $\tilde P((x, y), \cdot)$ denotes the conditional law displayed above and $\lambda$ denotes the Lebesgue measure on $[0, 1]$. This extended chain is also an irreducible and aperiodic Harris recurrent chain, with unique invariant probability measure $\pi \otimes \lambda$. It can easily be seen that $(\tilde\xi_n)$ is a homogeneous Markov chain with transition probability $P(x, \cdot)$. Define now the set $C$ in $E \times [0, 1]$ by
$$C = \{ (x, y) \in E \times [0, 1] : y \le h(x) \} \,.$$
For any $(x, y)$ in $C$, $\mathbb{P}(\tilde\xi_{n+1} \in A \mid \tilde\xi_n = x, U_n = y) = \nu(A)$. Since $\pi \otimes \lambda(C) = \pi(h) > 0$, the set $C$ is an atom of the extended chain, and it can be proven that this atom is recurrent. Everywhere in the paper, we shall use the following notations: $\mathbb{P}_\pi$ (respectively $\mathbb{P}_C$) will denote the probability measure on the underlying space such that $\tilde\xi_0 \sim \pi$ (resp. $(\tilde\xi_0, U_0) \in C$), and $\mathbb{E}_\pi(\cdot)$ will denote the $\mathbb{P}_\pi$-expectation (resp. $\mathbb{E}_C(\cdot)$ the $\mathbb{P}_C$-expectation).
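As a simple illustration of (1.2) (our example): if the chain satisfies a Doeblin condition, the minorization holds with a constant function $h$,
$$P(x, A) \ge \varepsilon\,\nu(A) \quad \text{for all } x \in E \text{ and } A \in \mathcal{B} \quad \Longrightarrow \quad \text{(1.2) holds with } m = 1 \text{ and } h \equiv \varepsilon \,.$$
In that case the split chain regenerates at each step with probability $\varepsilon$, independently of the current state, and the return times $\tau_k$ are geometrically distributed.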
Define now the stopping times $(T_k)_{k \ge 0}$ by
$$T_0 = \inf\{ n \ge 0 : (\tilde\xi_n, U_n) \in C \} \quad \text{and} \quad T_k = \inf\{ n > T_{k-1} : (\tilde\xi_n, U_n) \in C \} \ \text{ for } k \ge 1 \,,$$
and the return times $(\tau_k)_{k>0}$ by
$$\tau_k = T_k - T_{k-1} \,. \quad (1.7)$$
Then $T_0$ is almost surely finite and the return times $\tau_k$ are iid and integrable. Moreover, from the strong Markov property, it is well known that the random vectors $(\tilde\xi_{T_k+1}, \ldots, \tilde\xi_{T_{k+1}})$ ($k \ge 0$) are identically distributed and independent. Their common law is the law of $(\tilde\xi_1, \ldots, \tilde\xi_{T_0})$ under the probability $\mathbb{P}_C$. Let then
$$S_n(f) = \sum_{i=1}^n f(\tilde\xi_i) \,.$$
From the above property, for any measurable function $f$ from $E$ into $\mathbb{R}$, the random vectors $(\tau_k, S_{T_k}(f) - S_{T_{k-1}}(f))_{k>0}$ are independent and identically distributed. This fact was used in Csáki and Csörgö (1995) to get strong approximation results for the partial sums $S_n(f)$ under moment assumptions on the return times $\tau_k$. Let us recall their result. Assume that the chain satisfies (1.2) with $m = 1$. If the random variables $S_{T_k}(|f|) - S_{T_{k-1}}(|f|)$ have a finite moment of order $p$ for some $p$ in $]2, 4]$ and if the return times $\tau_k$ satisfy $\mathbb{E}(\tau_1^p) < \infty$, then one can construct a standard Wiener process $(W_t)_{t\ge 0}$ such that
$$S_n(f) - \sigma(f)\, W_n = O(a_n) \ \text{ a.s., where } \sigma^2(f) = \lim_{n\to\infty} n^{-1} \operatorname{Var} S_n(f) \text{ and } a_n = n^{1/p}\log n \,. \quad (1.9)$$
Note that the above result holds true for any bounded function $f$ only if the return times have a finite moment of order $p$. The proof of Csáki and Csörgö (1995) is based on the regeneration properties of the chain, on the Skorohod embedding and on an application of the results of Komlós, Major and Tusnády (1975) to the partial sums of the iid random variables $S_{T_{k+1}}(f) - S_{T_k}(f)$, $k > 0$. Since the moments of the return times essentially play the same role as the moments of the random variables in the case of iid random variables, it seems clear that such a result is optimal, up to a possible power of $\log n$. However this result has not been extended to the case $p > 4$. By contrast, the strong approximation of the renewal process associated with the chain holds with the optimal rate $O(n^{1/p})$ if $\mathbb{E}(\tau_1^p) < \infty$, for any $p > 2$. Furthermore, if the chain is geometrically ergodic, then the strong approximation of the renewal process holds with the rate $O(\log n)$ (see Corollaries 3.1 and 4.2 in Csörgö, Horváth, and Steinebach (1987) for these results).
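To fix ideas, recall how the regeneration structure is exploited (a standard decomposition, stated here for the reader's convenience): with $K_n = \max\{k \ge 0 : T_k \le n\}$,
$$S_n(f) = S_{T_0}(f) + \sum_{k=1}^{K_n} \big( S_{T_k}(f) - S_{T_{k-1}}(f) \big) + \big( S_n(f) - S_{T_{K_n}}(f) \big) \,,$$
so that, up to the two boundary terms, $S_n(f)$ is a randomly stopped sum of the iid block variables, and results for iid sequences become available.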
We now recall some possible methods to get strong approximation results. Some of these methods are based on the ergodicity properties of the Markov chain. For positive measures $\mu$ and $\nu$, let $\|\mu - \nu\|$ denote the total variation of $\mu - \nu$. Set
$$\beta_n = \int_E \| P^n(x, \cdot) - \pi \| \, \pi(dx) \,. \quad (1.10)$$
The coefficients $\beta_n$ are called absolute regularity (or $\beta$-mixing) coefficients of the chain. Then, as proved by Bolthausen (1980 and 1982), for any $p > 1$,
$$\mathbb{E}(\tau_1^p) < \infty \quad \text{if and only if} \quad \sum_{n > 0} n^{p-2} \beta_n < \infty \,. \quad (1.11)$$
The second part of (1.11) is also called a weak dependence condition. Under a mixing condition which is more restrictive than (1.11) in the context of Markov chains, Shao and Lu (1987) obtained (1.9) with the rate $a_n = O(n^{1/p}(\log n)^c)$ for some $c > 1$, for $p$ in $]2, 4]$. Their proof was based on the so-called Skorohod embedding. Recently, using a direct method based on constructions via quantile transformations, as in Major (1976), Merlevède and Rio (2012) improved the results of Shao and Lu (1987). For $p$ in $]2, 3[$, they obtained (1.9) under the ergodicity condition (1.11) with the better rate $a_n = n^{1/p}(\log n)^{(p-2)/(2p)}$. The results of Merlevède and Rio (2012) involve more general weak dependence coefficients than the coefficients $\beta_n$, so that their result also applies to non irreducible Markov chains and to some dynamical systems. In the context of dynamical systems, Gouëzel (2010) used spectral methods to construct couplings with independent random variables, and then applied strong approximation results for partial sums of independent random vectors to get rates of order $n^{1/p}$ for $p$ in $]2, 4[$ in (1.9). The techniques used in these papers are suitable for Markov chains or non trivial dynamical systems, including the Liverani-Saussol-Vaienti map. Nevertheless the applied tools limit the accuracy to the rate $O(n^{1/4})$.
Recently, for stationary processes that are functions of iid innovations, Berkes, Liu and Wu (2014) obtained (1.9) with the rate $O(n^{1/p})$ for any $p > 2$, provided that the innovations have finite moments of order $p$ and that certain coupling coefficients of the process decay arithmetically fast enough. Moreover, they give an application to nonlinear time series (see their Example 2.2). However, their condition (2.15) is too restrictive (even for functional autoregressive processes), and they do not give estimates of their coupling coefficients for more general Markov chains.
In this paper we are interested in general Harris recurrent Markov chains. Our aim is to obtain the optimal rate $O(\log n)$. Recall that, in the dependent case, the rate $o(n^{1/p})$ has never been surpassed. In order to get better rates of approximation, we will assume throughout the paper that the Markov chain is geometrically ergodic, which means that (see Theorem 2.1 in Nummelin and Tuominen (1982))
$$\beta_n = O(\rho^n) \quad \text{for some } \rho \in \,]0, 1[ \,, \quad (1.12)$$
where $\beta_n$ is defined in (1.10). Note now that $\mathbb{P}(\tau_1 > n) = \nu Q^n(E)$, where $Q$ is defined by (1.3). Therefore, condition (1.12) together with Corollary 2.4 and Lemma 2.8 in Nummelin and Tuominen (1982) imply that both $\mathbb{P}(\tau_1 > n)$ and $\mathbb{P}_\pi(T_0 > n)$ decrease exponentially fast. Hence, if (1.12) holds, there exists a positive real $\delta$ such that
$$\mathbb{E}\big( e^{t\tau_1} \big) < \infty \quad \text{and} \quad \mathbb{E}_\pi\big( e^{tT_0} \big) < \infty \quad \text{for any } |t| \le \delta \,. \quad (1.13)$$
We will use this fact together with a strategy inherited from the papers of Bolthausen (1980 and 1982) to get the optimal rates of strong approximation in that case: we will apply a strong approximation result of Zaitsev (1998) to the multidimensional partial sum process $(T_n - T_0, S_{T_n}(f) - S_{T_0}(f))$ rather than the initial theorems of Komlós, Major and Tusnády (1975 and 1976). This method enables us to get the optimal rate of convergence.
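The passage from exponentially decreasing tails to (1.13) is elementary; we record the one-line computation (our addition), assuming $\mathbb{P}(\tau_1 > n) \le K\rho_1^n$ with $K > 0$ and $\rho_1 \in\, ]0, 1[$:
$$\mathbb{E}\big( e^{t\tau_1} \big) = 1 + (e^t - 1)\sum_{n \ge 0} e^{tn}\,\mathbb{P}(\tau_1 > n) \le 1 + K\,(e^t - 1)\sum_{n \ge 0} (e^t \rho_1)^n < \infty \quad \text{whenever } 0 < t < \log(1/\rho_1) \,,$$
while for $t \le 0$ one has $e^{t\tau_1} \le 1$; the same argument applies to $T_0$ under $\mathbb{P}_\pi$, whence (1.13).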
Let us now give our main result.

Theorem 1.1. Assume that the chain $(\xi_n)$ satisfies (1.2) with $m = 1$ and the geometric ergodicity condition (1.12), and let $(\tilde\xi_n, U_n)_{n \ge 0}$ be the stationary extended chain defined above. Let $g$ be a bounded measurable function from $E \times [0, 1]$ into $\mathbb{R}$ such that $\pi \otimes \lambda(g) = 0$, and set $\tilde S_n(g) = \sum_{i=1}^n g(\tilde\xi_i, U_i)$ and $\sigma^2(g) = \lim_{n\to\infty} n^{-1} \operatorname{Var} \tilde S_n(g)$. Then there exists a standard Wiener process $(W_t)_{t\ge 0}$ and positive constants $a$, $b$ and $c$ depending on $g$ and on the transition probability $P(x, \cdot)$ such that, for any positive real $x$ and any integer $n \ge 2$,
$$\mathbb{P}\Big( \sup_{k \le n} \big| \tilde S_k(g) - \sigma(g)\, W_k \big| \ge a \log n + x \Big) \le b \exp(-cx) \,. \quad (1.15)$$
We now give in a separate corollary the application of this result to additive functionals of the initial chain. The proof, being immediate, will be omitted.
Corollary 1.1. Let $(\xi_n)$ be a stationary, irreducible and aperiodic Harris positive recurrent Markov chain on $E$, with invariant probability measure $\pi$. Assume that the chain satisfies (1.2) with $m = 1$ and the geometric ergodicity condition (1.12). Let $f$ be any bounded measurable function from $E$ to $\mathbb{R}$ such that $\pi(f) = 0$, and let $\sigma^2(f) = \lim_{n\to\infty} n^{-1} \operatorname{Var} S_n(f)$. Then there exists a standard Wiener process $(W_t)_{t\ge 0}$ and positive constants $a$, $b$ and $c$ depending on $f$ and on the transition probability $P(x, \cdot)$ such that, for any positive real $x$ and any integer $n \ge 2$,
$$\mathbb{P}\Big( \sup_{k \le n} \big| S_k(f) - \sigma(f)\, W_k \big| \ge a \log n + x \Big) \le b \exp(-cx) \,.$$

2 Proof of Theorem 1.1

Before proving our main result, we give an idea of the proof. The constants $v$, $\tilde v$, $\lambda$ and $\gamma$ appearing below will be specified in Subsection 2.3. For any $i \ge 1$, let
$$X_i = \sum_{\ell = T_{i-1}+1}^{T_i} g(\tilde\xi_\ell, U_\ell) \,.$$
The random variables $(X_i, \tau_i)_{i>0}$ are independent and identically distributed. Let then $\alpha$ be the unique real number such that $\operatorname{Cov}(X_k - \alpha\tau_k, \tau_k) = 0$. Applying the multidimensional extension of the results of Komlós, Major and Tusnády (1976), which is due to Zaitsev (1998), we obtain that there exist two independent standard Brownian motions $(B_t)_t$ and $(\tilde B_t)_t$ such that
$$\sup_{k \le n} \Big| \sum_{i=1}^k \big( X_i - \alpha(\tau_i - \mathbb{E}(\tau_i)) \big) - \sqrt{v}\, B_k \Big| + \sup_{k \le n} \big| T_k - T_0 - k\,\mathbb{E}(\tau_1) - \sqrt{\tilde v}\, \tilde B_k \big| = O(\log n) \quad \text{a.s.} \quad (2.1)$$
Next, using the Komlós-Major-Tusnády strong approximation theorem, one can construct a Poisson process $N$ with parameter $\lambda$ from $\tilde B$ in such a way that (2.2) holds. The processes $(B_t)_t$ and $(N_t)_t$ appearing here are independent. From the above results one can deduce a strong approximation of $\tilde S_n(g)$ in terms of $B$, $N$ and $\alpha$. If $v = 0$, which corresponds to the case of renewal processes, only the Poisson part remains; up to some multiplicative constant, the process on the right-hand side is then a partial sum process associated with iid random variables with exponential law. Hence, using the Komlós-Major-Tusnády strong approximation theorem again, one can construct a Brownian motion $\tilde W$ satisfying the desired approximation, which leads to the expected result. Notice that the Brownian motion $\tilde W$ depends only on the Poisson process $N$ and on some auxiliary atomless random variables independent of the $\sigma$-field generated by the processes $B$ and $N$.
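Returning to the choice of $\alpha$: since one may assume $\operatorname{Var}(\tau_1) > 0$ without loss of generality (see Subsection 2.3), the defining property of $\alpha$ can be solved explicitly (an immediate computation, added here for clarity):
$$\operatorname{Cov}(X_k - \alpha\tau_k, \tau_k) = 0 \iff \alpha = \frac{\operatorname{Cov}(X_1, \tau_1)}{\operatorname{Var}(\tau_1)} \,,$$
that is, $\alpha$ is the regression coefficient of $X_1$ on $\tau_1$. This choice decorrelates the two coordinates of the vectors to which Zaitsev's theorem is applied, and it is precisely this decorrelation that makes the approximating Brownian motions $B$ and $\tilde B$ independent.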
If $v \neq 0$ and $\alpha = 0$, (2.1) ensures that $\tilde S_n(g)$ is approximated by the compound process $\sqrt{v}\, B_{N(\cdot)}$, up to an error $O(\log n)$. As noted by Csörgö, Deheuvels and Horváth (1987), since the renewal process of the Poisson process is the partial sum process associated with independent random variables with exponential law, the above compound process is a partial sum process associated with iid random variables having a finite Laplace transform, and consequently one can construct a Brownian motion $W$ such that the corresponding strong approximation holds, which leads to the expected result. However the Brownian motion $W$ depends on $N$. It follows that, in the case $\alpha \neq 0$ and $v \neq 0$, the so constructed processes $W$ and $\tilde W$ are not independent.
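To make the "finite Laplace transform" claim concrete, here is the computation for the increments of the compound process (our verification): since $B$ and $N$ are independent and $B_{N(k)} - B_{N(k-1)}$ is distributed as $B_{N(1)}$,
$$\mathbb{E}\big( e^{t B_{N(1)}} \big) = \mathbb{E}\Big( \mathbb{E}\big( e^{t B_{N(1)}} \mid N(1) \big) \Big) = \mathbb{E}\big( e^{t^2 N(1)/2} \big) = \exp\big( \lambda (e^{t^2/2} - 1) \big) < \infty \quad \text{for every real } t \,,$$
so the Komlós-Major-Tusnády theorem indeed applies to these increments.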
Because of this lack of independence, the construction of Csörgö, Deheuvels and Horváth (1987) cannot be used to prove our theorem.
In order to achieve the exact rate in the case $\alpha \neq 0$, it will be necessary to construct a Brownian motion $W^*$ independent of $N$ in such a way that (2.3) holds. Since $W^*$ is independent of $N$, it will also be independent of $\tilde W$. Then, using (2.1) and (2.2), we will get that $\tilde S_n(g) = W^*_n + \tilde W_n + O(\log n)$ a.s., which will imply our strong approximation theorem. The proof of (2.3) will be done in Subsection 2.2. Then, starting from this fundamental result, we will prove the main theorem.

2.1 Some technical lemmas
The lemma below follows from the classical Cramér-Chernoff calculation (see also, for instance, Lemma 1 in Bretagnolle and Massart (1989)).

Lemma 2.1. Let $Z$ be a real-valued random variable with Poisson distribution of parameter $m$. Then, for any positive $x$ and any sign $\varepsilon \in \{-1, +1\}$, we have
$$\mathbb{P}\big( \varepsilon(Z - m) \ge x \big) \le \exp\big( - m\, h(x/m) \big) \,, \quad \text{where} \quad h(u) = (1+u)\log(1+u) - u \,. \quad (2.4)$$

The next lemma follows once again from the classical Cramér-Chernoff calculation, together with the Doob maximal inequality.

Lemma 2.2. Let $(N(t) : t \ge 0)$ be a real-valued homogeneous Poisson process of parameter $m$. Then, for any positive reals $x$ and $s$ and any sign $\varepsilon$, we have
$$\mathbb{P}\Big( \sup_{t \in [0, s]} \varepsilon\big( N(t) - mt \big) \ge x \Big) \le \exp\Big( - ms\, h\big( x/(ms) \big) \Big) \,,$$
where $h(\cdot)$ is defined by (2.4).
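For the reader's convenience, here is the routine Chernoff calculation behind Lemma 2.1 in the case $\varepsilon = +1$ (a standard derivation, reproduced rather than quoted): since $\mathbb{E}(e^{tZ}) = \exp(m(e^t - 1))$ for every real $t$, the Markov inequality gives, for any $t > 0$,
$$\mathbb{P}(Z - m \ge x) \le e^{-t(m+x)}\,\mathbb{E}\big( e^{tZ} \big) = \exp\big( m(e^t - 1) - t(m + x) \big) \,,$$
and the choice $t = \log(1 + x/m)$, which minimizes the exponent, yields exactly $\exp(-m\,h(x/m))$ with $h(u) = (1+u)\log(1+u) - u$. The case $\varepsilon = -1$ is handled in the same way with $t < 0$ and gives an even smaller bound, since $(1-u)\log(1-u) + u \ge h(u)$ for $u \in [0, 1]$. Lemma 2.2 follows from the same optimization, combined with Doob's maximal inequality applied to the exponential submartingale of the compensated Poisson process.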
Lemma 2.3 below is due to Tusnády in his PhD thesis (see Bretagnolle and Massart (1989) for a complete proof of it).

Lemma 2.3. Let $\xi$ be a random variable with law $N(0, 1)$, $\Phi$ its distribution function and $\Phi_m$ the distribution function of the binomial law $B(m, 1/2)$. Then the random variable $\xi_m = \Phi_m^{-1}(\Phi(\xi))$ has the law $B(m, 1/2)$ and the following inequality holds:
$$\Big| \xi_m - \frac{m}{2} - \frac{\sqrt{m}}{2}\, \xi \Big| \le 1 + \frac{\xi^2}{8} \,.$$

2.2 A fundamental lemma
The main new tool for proving Theorem 1.1 is the lemma below.
Lemma 2.4. Let $(B_t)_{t\ge 0}$ be a standard Brownian motion on the line and $\{N(t) : t \ge 0\}$ be a Poisson process with parameter $\lambda > 0$, independent of $(B_t)_{t\ge 0}$. Then one can construct a standard Brownian motion $(W_t)_{t\ge 0}$ independent of the Poisson process $N(\cdot)$ and such that, for any integer $n \ge 2$ and any positive real $x$,
$$\mathbb{P}\Big( \sup_{k \le n} \big| B_{N(k)} - \sqrt{\lambda}\, W_k \big| \ge A \log n + x \Big) \le B \exp(-Cx) \,,$$
where $A$, $B$ and $C$ are positive constants depending only on $\lambda$. Furthermore $(W_t)_{t\ge 0}$ may be constructed from the processes $(B_t)_{t\ge 0}$, $N(\cdot)$ and some auxiliary atomless random variable $\delta$ independent of the $\sigma$-field generated by the processes $(B_t)_{t\ge 0}$ and $N(\cdot)$.
Note that $(\tilde e_{j,k})_{j\in\mathbb{Z}, k\ge 0}$ is a total orthonormal system of $L^2(\mathbb{R}^+)$. Hence, for any $t \in \mathbb{R}^+$, $B_t$ can be written as in (2.5). For each $j$, let $E_j$ denote the corresponding set of indices, and notice that $(\tilde f_{j,k})_{j\in\mathbb{Z}, k\in E_j}$ is an orthonormal system whose closed linear span contains the vectors $\mathbb{1}_{]0, N(t)]}$ for $t \in \mathbb{R}^+$, and then the vectors appearing in (2.6). Since, conditionally on $N(\cdot)$, $(\tilde f_{j,k})_{j\in\mathbb{Z}, k\in E_j}$ is an orthonormal system and $(Y_{j,k})$ is a sequence of iid standard Gaussian random variables independent of $N(\cdot)$, one can easily check that, conditionally on $N(\cdot)$, $(W_\ell)_{\ell\ge 0}$ is a Gaussian sequence such that $\operatorname{Cov}(W_\ell, W_m) = \ell \wedge m$. Since this conditional law does not depend on $N(\cdot)$, the Gaussian sequence is independent of the Poisson process $N(\cdot)$. By the Skorohod embedding theorem, there exists a standard Wiener process $(W_t)_t$ which coincides with the Gaussian sequence $(W_\ell)$ at integer values. Furthermore this Wiener process can be constructed from the Gaussian sequence and an auxiliary atomless random variable $\delta$ independent of the $\sigma$-field generated by the processes $(B_t)_{t\ge 0}$ and $N(\cdot)$.
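The explicit form of the system $(\tilde e_{j,k})$ is fixed by (2.5); for orientation, the dyadic bookkeeping used below is the one naturally attached to a Haar-type system (this is our reading of the construction, not a quotation of (2.5)):
$$e_{j,k} = 2^{-j/2}\left( \mathbb{1}_{]k2^j,\, k2^j + 2^{j-1}]} - \mathbb{1}_{]k2^j + 2^{j-1},\, (k+1)2^j]} \right) \,, \qquad j \in \mathbb{Z}, \ k \ge 0 \,,$$
a function supported on $]k2^j, (k+1)2^j]$ with unit $L^2$ norm. In this reading, $a_{j,k}$ and $b_{j,k}$ below are the numbers of Poisson points in the left and right halves of $]k2^j, (k+1)2^j]$: then $\Pi_{j,k} = a_{j,k} + b_{j,k}$ is Poisson distributed with mean $\lambda 2^j$, and, conditionally on $\Pi_{j,k}$, the count $a_{j,k}$ has the binomial law $B(\Pi_{j,k}, 1/2)$; this is exactly where the binomial-Gaussian quantile coupling of Lemma 2.3 enters the proof.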
Let $c_1$ and $c_2$ be two positive reals satisfying (2.7) and (2.8) respectively. Let $n_0$ be the smallest integer such that $n_0 \ge c_1$ and (2.9) holds. The lemma will be proven if we can show that there exist positive constants $a$ and $b$, depending only on $\lambda$, such that (2.10) holds for any $n \ge \max(2^5, n_0)$. Indeed, for any integer $n$ in $[2, \max(2^5, n_0)]$, it can easily be shown that the conclusion of the lemma holds. From now on, $n$ is a positive integer such that $n \ge \max(2^5, n_0)$. To prove (2.10), we first define $j_0$ as the smallest integer satisfying (2.11), where $c_1$ and $c_2$ are the positive reals satisfying (2.7) and (2.8) respectively. Now, let $K$ be the integer such that $2^K \le 2n < 2^{K+1}$, and consider the decomposition (2.12). By Lévy's inequality and the definition (2.11) of $j_0$, we obtain (2.13). Using once again Lévy's inequality and the definition (2.11) of $j_0$, we get an analogous bound. On the other hand, by Lemma 2.2 and by using (2.11), we obtain (2.14). Therefore, starting from (2.12) and taking into account the upper bounds (2.13) and (2.14), we derive that, to prove the lemma, it suffices to show that there exist positive constants $A_1$ and $B_1$, depending only on $\lambda$, such that (2.15) holds for any $n \ge \max(2^5, n_0)$. In the rest of the proof, we shall prove this inequality. Taking into account (2.5) and (2.6), we first decompose $W_\ell$, for any $\ell \in \mathbb{N}^*$, along the levels $j$ of the orthonormal system, and we notice that if $\ell 2^{j_0} \notin\, ]k2^j, (k+1)2^j[$, then the corresponding term vanishes. Therefore, introducing the random variables $U_{j,\ell_j}$ and $V_{j,\ell_j}$ defined in (2.16) and (2.17), it follows that the quantity to be controlled splits into the sums over $j \ge j_0$ of the $U_{j,\ell_j}$ and of the $V_{j,\ell_j}$. Recall now that $(Y_{j,k})_{j>0, k\ge 1}$ is a sequence of standard centered Gaussian random variables that are mutually independent. In addition, this sequence is independent of $(N(t), t \ge 0)$. Therefore, using (2.11) and the fact that $2^K \le 2n$, we obtain (2.18), where
$$\Theta_{a,\ell,j_0} = \big\{ a_{j,\ell_j} \le \tfrac{3}{2}\,\lambda 2^{j-1} \ \text{for all } j \ge j_0 \big\} \cap \big\{ a_{j,\ell_j} \ge \tfrac{1}{2}\,\lambda 2^{j-1} \ \text{for all } j \ge j_0 \big\}$$
and
$$\Theta_{b,\ell,j_0} = \big\{ b_{j,\ell_j} \le \tfrac{3}{2}\,\lambda 2^{j-1} \ \text{for all } j \ge j_0 \big\} \cap \big\{ b_{j,\ell_j} \ge \tfrac{1}{2}\,\lambda 2^{j-1} \ \text{for all } j \ge j_0 \big\} \,.$$
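For reference, the form of Lévy's inequality used repeatedly above can be stated as follows (recorded for completeness; for the Brownian motion it is the reflection principle):
$$\mathbb{P}\Big( \sup_{0 \le t \le s} B_t \ge x \Big) = 2\,\mathbb{P}(B_s \ge x) = \mathbb{P}\big( |B_s| \ge x \big) \,, \qquad x > 0 \,,$$
and, for partial sums $S_k$ of independent symmetric random variables, $\mathbb{P}\big( \max_{k \le n} S_k \ge x \big) \le 2\,\mathbb{P}(S_n \ge x)$.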
We first bound $\mathbb{P}(\Theta_{a,\ell,j_0}^c)$. The variable $a_{j,\ell_j}$ is Poisson distributed with mean $\lambda 2^{j-1}$, so, by Lemma 2.1, each event $\big\{ |a_{j,\ell_j} - \lambda 2^{j-1}| \ge \tfrac{1}{2}\lambda 2^{j-1} \big\}$ has probability at most $2\exp(-\lambda 2^{j-1} h(1/2))$, and summing over $j \ge j_0$ gives the desired control. A similar bound is valid for $\mathbb{P}(\Theta_{b,\ell,j_0}^c)$. Hence, by (2.11), we obtain (2.20). Starting from (2.18) and taking into account the upper bound (2.20), we infer that, to prove (2.15) and then the lemma, it suffices to show that there exist two positive constants $A_2$ and $B_2$ such that (2.21) holds for any $n \ge \max(2^5, n_0)$, any $c_1 \ge \bar c_1$ and any $c_2 \ge \bar c_2$, where $\bar c_1$ and $\bar c_2$ are defined in (2.7) and (2.8) respectively.
To prove the inequality (2.21), we first notice that, by definition of $U_{j,\ell_j}$ and $V_{j,\ell_j}$, if $k \in \{j_0, \ldots, K\}$ and $\ell \in [2^{k-j_0}, 2^{k+1-j_0}[\, \cap\, \mathbb{N}$, then $\ell_j = 0$ for any $j \ge k+1$ and $t_j \le 1/2$ for any $j \ge k+2$. Therefore the bound (2.22) holds. In the rest of the proof, unless otherwise specified, $k$ and $\ell$ are two integers such that $k \in \{j_0, \ldots, K\}$ and $\ell \in [2^{k-j_0}, 2^{k+1-j_0}[$. On the set $\Theta_{\ell,j_0}$ the summands can be controlled; hence, for any $y > 2^{-4}$, and taking into account (2.11), the corresponding deviation probabilities are exponentially small. So, overall, starting from (2.22) and taking into account the considerations above, we get (2.23) for any $n \ge 2^5$. We now prove that, for any $n \ge 2^5$ and any $c_1 \ge \bar c_1$, where $\bar c_1$ is defined in (2.7), the bound (2.24) holds. Clearly, taking into account the restriction on $c_1$ and the fact that $c_2 \ge 1765\,(2+\sqrt{2})^2(\sqrt{2}-1)^2$, the inequality (2.21), and then the lemma, will follow from (2.23) and (2.24). To prove (2.24), we first write an appropriate decomposition. Set, for any $j > 0$ and $k \ge 0$, the variables $\Pi_{j,k}$ as in (2.26). Recalling the definition (2.16) of $U_{j,k}$ and noticing that $a_{j,\ell_j} + b_{j,\ell_j} = \Pi_{j,\ell_j}$, we then get the identity (2.25). Whence, using the fact that, for $t_j \in\, ]0, 1/2]$, $N(\ell 2^{j_0}) - N(\ell_j 2^j) \le a_{j,\ell_j}$, we infer an upper bound for the first part. Moreover, using the fact that, on the set $\Theta_{\ell,j_0}$, $a_{j,\ell_j} \ge \lambda 2^{j-2}$ and $\Pi_{j,\ell_j} = a_{j,\ell_j} + b_{j,\ell_j} \ge \lambda 2^{j-1}$, this bound can be made explicit on $\Theta_{\ell,j_0}$. On the other hand, permuting the roles of $a_{j,\ell_j}$ and $b_{j,\ell_j}$ in (2.25), and using the fact that, on the set $\Theta_{\ell,j_0}$, $b_{j,\ell_j} \ge \lambda 2^{j-2}$ and $\Pi_{j,\ell_j} \ge \lambda 2^{j-1}$, we get the companion bound on $\Theta_{\ell,j_0}$.
To handle the terms in the inequality above, we introduce the following doubly indexed sequence $(\xi_{j,k})_{j>0, k\ge 0}$ of Gaussian random variables. Let $\Phi$ be the distribution function of a standard real-valued Gaussian random variable and $\Phi_n$ be the distribution function of the binomial law $B(n, 1/2)$. Let $(\delta_{j,k})_{j>0, k\ge 0}$ be a sequence of iid random variables with uniform law on $[0, 1]$, independent of the Poisson process $N(\cdot)$. For any $j \in \mathbb{N}^*$ and $k \in \mathbb{N}$, define $\xi_{j,k}$ from $\Pi_{j,k}$ and $\delta_{j,k}$ by a quantile transformation based on $\Phi$ and $\Phi_{\Pi_{j,k}}$, where we recall that the $\Pi_{j,k}$'s have been defined in (2.26). Note that, conditionally on the $\sigma$-algebra, say $\mathcal{F}_j$, generated by the random variables $\{\Pi_{j,k} : k \ge 0\}$ and $\{\delta_{i,k} : i < j, k \ge 0\}$, the random variables $(\xi_{j,k})_{k\ge 0}$ are independent with law $N(0, 1)$. By induction, it follows that for any positive integer $m_0$, $(\xi_{j,k})_{j \le m_0, k\ge 0}$ is a sequence of independent random variables with law $N(0, 1)$, and therefore $(\xi_{j,k})_{j>0, k\ge 0}$ is a sequence of iid standard real-valued Gaussian random variables. Moreover, according to Lemma 2.3, the coupling error between the counts and their Gaussian counterparts is controlled. Since $\lim_{m\to\infty} 2^{-m}\,\Pi_{m,\ell_m} = \lambda$ almost surely, the conclusion follows.
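The mechanism behind the construction of the $\xi_{j,k}$ is the classical smoothed quantile transform, which we record in a self-contained form (a standard fact, stated with generic notation $F$, $B$, $\delta$): if $B$ is an integer-valued random variable with distribution function $F$, and $\delta$ is uniform on $[0, 1]$ and independent of $B$, then
$$V = F(B-1) + \delta\,\big( F(B) - F(B-1) \big) \ \text{ is uniform on } [0, 1] \,, \quad \text{so that} \quad \xi = \Phi^{-1}(V) \sim N(0, 1)$$
and $B = F^{-1}(V) = F^{-1}(\Phi(\xi))$ almost surely. In the construction above, this is applied conditionally on $\mathcal{F}_j$, with $F = \Phi_{\Pi_{j,k}}$ and $\delta = \delta_{j,k}$; Lemma 2.3 then converts the closeness of the binomial and Gaussian quantile functions into an almost sure bound relating the counts to the $\xi_{j,k}$.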

2.3 Proof of Theorem 1.1
Notice first that it suffices to prove the result for any positive real $x$ such that $x \le 2n\|g\|_\infty$. Indeed, since $|\tilde S_k(g)| \le k\|g\|_\infty$ for any positive integer $k$, it follows from Lévy's inequality that, for any standard Wiener process $(W_t)_{t\ge 0}$ and any real $x > 2n\|g\|_\infty$, the probability appearing in (1.15) is bounded by a quantity of the form $b\exp(-cx)$. Therefore, to prove the theorem, it suffices to show that there exists a standard Wiener process $(W_t)_{t\ge 0}$ such that (1.15) holds for any positive real $x$ satisfying $x \le 2n\|g\|_\infty$. From now on, $x$ will be a positive real satisfying the latter condition.
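This reduction can be sketched as follows (our reconstruction of the routine bound, assuming $\sigma = \sigma(g) > 0$): for $x > 2n\|g\|_\infty$, since $\sup_{k\le n}|\tilde S_k(g)| \le n\|g\|_\infty < x/2$,
$$\mathbb{P}\Big( \sup_{k \le n} \big| \tilde S_k(g) - \sigma W_k \big| \ge x \Big) \le \mathbb{P}\Big( \sigma \sup_{t \le n} |W_t| \ge x/2 \Big) \le 4\,\mathbb{P}\big( \sigma W_n \ge x/2 \big) \le 4\exp\Big( -\frac{x^2}{8\sigma^2 n} \Big) \le 4\exp\Big( -\frac{\|g\|_\infty\, x}{4\sigma^2} \Big) \,,$$
where the last step uses $x > 2n\|g\|_\infty$, that is $x^2/n \ge 2\|g\|_\infty x$; this is indeed of the form $b\exp(-cx)$.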
For any $i \in \mathbb{N}^*$, let $X_i = \sum_{\ell = T_{i-1}+1}^{T_i} g(\tilde\xi_\ell, U_\ell)$. With this notation, $\sum_{i=1}^k X_i = \tilde S_{T_k}(g) - \tilde S_{T_0}(g)$. Let $\tau_k$ be defined by (1.7). Notice that $(\tau_k, X_k)_{k\ge 1}$ forms a sequence of iid random vectors. In addition, $\mathbb{E}(X_k) = 0$ for any $k$, since $\pi \otimes \lambda(g) = 0$. We can assume without loss of generality that $\operatorname{Var}(\tau_1) > 0$. Indeed, if $\operatorname{Var}(\tau_1) = 0$, then $\tau_1$ is almost surely equal to some positive integer $d$. Then $\tau_i = d$ almost surely for any positive integer $i$, which implies that $T_k = kd + T_0$ almost surely. The result then follows easily from the Komlós-Major-Tusnády theorem applied to the above sequence $(X_i)_{i>0}$, together with the fact that $T_0$ has a finite Laplace transform in a neighborhood of $0$.
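In the degenerate case just mentioned, the reduction to the iid setting can be spelled out in one display (our elaboration of the "follows easily" step): for $T_k = kd + T_0$ and $n = T_k + r$ with $0 \le r < d$,
$$\Big| \tilde S_n(g) - \tilde S_{T_0}(g) - \sum_{i=1}^{k} X_i \Big| \le d\,\|g\|_\infty \quad \text{and} \quad |\tilde S_{T_0}(g)| \le T_0\,\|g\|_\infty \,,$$
so the fluctuations of $\tilde S_n(g)$ are those of the iid partial sums $\sum_{i \le k} X_i$, up to an error which is $O(1)$ plus a multiple of $T_0$. The first term is handled by the Komlós-Major-Tusnády theorem (each $X_i$ is bounded by $d\|g\|_\infty$, hence has a finite moment generating function), and the second has exponential moments by (1.13).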
Taking into account all of the above considerations, we can apply Theorem 1.3 in Zaitsev (1998) to the multivariate sequence of iid random variables $(\tau_k, X_k - \alpha(\tau_k - \mathbb{E}(\tau_k)))_{k>0}$ to conclude that there exists a sequence of iid Gaussian random vectors satisfying, for some positive constants $C_1$, $A_1$ and $B_1$ depending on $g$ and on the transition probability $P(x, \cdot)$, the corresponding strong approximation inequalities for any integer $n \ge 2$. Note now that the random variables $\Gamma_k = N^{-1}(k) - N^{-1}(k-1)$, $k \ge 1$, form a sequence of iid random variables with exponential law of parameter $\lambda$. Therefore, according to Theorem 1(i) in Komlós, Major and Tusnády (1975), there exists a standard Wiener process $(\tilde W_t)_{t\ge 0}$ such that, for any integer $n \ge 2$, (2.53) holds, where $C_5$, $A_5$ and $B_5$ are positive constants depending on $\lambda$. Notice that the so constructed Wiener process $\tilde W$ depends only on the process $N^{-1}$ and on some auxiliary atomless random variable $U$ independent of the $\sigma$-field generated by the processes $B$, $N$ and the auxiliary random variable $\delta$ of Lemma 2.4. On the other hand, since $(B_t)_{t\ge 0}$ is independent of $(N(t) : t \ge 0)$, according to Lemma 2.4 there exists a standard Brownian motion $(W^*_t)_{t\ge 0}$, independent of the Poisson process $N(\cdot)$, such that for any integer $n \ge 2$ the approximation of Lemma 2.4 holds with positive constants $C_6$, $A_6$ and $B_6$ depending on $\lambda$. Moreover, $(W^*_t)_t$ is measurable with respect to the $\sigma$-field generated by the processes $B$, $N$ and the auxiliary random variable $\delta$ of Lemma 2.4, which ensures that $(W^*_t)_t$ is independent of the $\sigma$-field generated by $N(\cdot)$ and $U$. Hence the Wiener processes $\tilde W$ and $W^*$ are independent.
In what follows, we shall prove that (1.15) holds true with a standard Wiener process $W$ constructed from the two independent processes $\tilde W$ and $W^*$.
