The notion of $\psi$-weak dependence and its applications to bootstrapping time series

We give an introduction to a notion of weak dependence which is more general than mixing and allows one to treat, for example, processes driven by discrete innovations as they appear in the time series bootstrap. As a typical example, we analyze autoregressive processes and their bootstrap analogues in detail and show how weak dependence can be easily derived from a contraction property of the process. Furthermore, we provide an overview of classes of processes possessing the property of weak dependence and describe important probabilistic results under such an assumption.


Mixing vs. weak dependence
For a long time, mixing conditions have been the dominating type of condition for restricting the dependence between time series data. Such conditions were introduced at the end of the fifties by Rosenblatt (1956) and by the Saint Petersburg school, due to Ibragimov (1962). The notion was mainly used in a systematic way by statisticians since it fits quite well with nonparametric techniques; see for example Rosenblatt (1985) for details. The monograph edited by Eberlein and Taqqu (1986) describes the state of the many processes which are not mixing. Borovkova, Burton and Dehling (2001) is also an interesting advance in this area; those authors deal with functions of mixing sequences. A slightly simplified version of the definition of Doukhan and Louhichi (1999) is given here:

Definition 1.1. A process $(X_t)_{t\in\mathbb{Z}}$ is called $\psi$-weakly dependent if there exists a universal null sequence $(\epsilon(r))_{r\in\mathbb{N}}$ such that, for any $k$-tuple $(s_1,\dots,s_k)$ and any $l$-tuple $(t_1,\dots,t_l)$ with $s_1 \le \cdots \le s_k < s_k + r = t_1 \le \cdots \le t_l$ and arbitrary measurable functions $g : \mathbb{R}^k \to \mathbb{R}$, $h : \mathbb{R}^l \to \mathbb{R}$ with $\|g\|_\infty \le 1$ and $\|h\|_\infty \le 1$, the following inequality is fulfilled:

$$\left|\operatorname{cov}\bigl(g(X_{s_1},\dots,X_{s_k}),\, h(X_{t_1},\dots,X_{t_l})\bigr)\right| \le \psi(k, l, \operatorname{Lip} g, \operatorname{Lip} h)\,\epsilon(r).$$
Remark 1.
(i) In Bickel and Bühlmann (1999), another type of weak dependence, called ν-mixing, was introduced. Similarly to Definition 1.1, uniform covariance bounds over classes of functions with smooth averaged modulus of continuity are required. Typically, the standard examples of processes satisfy both notions of weak dependence. We think that it is sometimes easier to verify a condition of weak dependence as in Definition 1.1, which we prefer for this reason.
(ii) Dedecker and Prieur (2005) introduced another related notion, called ϕ-weak dependence, which is particularly adapted to expanding dynamical systems.
(iii) For the special case of causal Bernoulli shifts with i.i.d. innovations, that is $X_t = g(\varepsilon_t, \varepsilon_{t-1}, \dots)$, Wu (2005) introduced other measures of dependence which are somewhat connected to the coupling idea below. The notion of weak dependence considered here is more general and also seems to include processes which cannot be represented as Bernoulli shifts (associated processes, for example, may not admit such a representation).
Remark 2. (Some classes of weak dependence) Specific functions $\psi$ yield variants of weak dependence appropriate to describe various examples of models:
• κ-weak dependence, for which $\psi(u, v, a, b) = uvab$; in this case we simply denote $\epsilon(r)$ as $\kappa(r)$.
• κ′ (causal) weak dependence, for which $\psi(u, v, a, b) = vab$; in this case we denote $\epsilon(r)$ as $\kappa'(r)$. This is the causal counterpart of the κ coefficients, which we recall only for completeness.
It turns out that the notion of weak dependence is more general than mixing and allows one to treat, for example, Markovian processes driven by discrete innovations as they appear in the time series bootstrap. In the next section we consider, as an instructive example, linear autoregressive processes of finite order and a corresponding bootstrap version thereof. We will demonstrate that the desired property of weak dependence readily follows from a contraction property which is typical for such models under standard conditions on the parameters. The approach described there is also applicable to proving weak dependence for many other classes of processes. Section 3 contains further examples of processes for which some sort of weak dependence has been proved. In Section 4 we give an overview of available tools under weak dependence. In particular, we provide a Donsker invariance principle and asymptotics for the empirical process. Furthermore, we also give Lindeberg-Feller central limit theorems for triangular arrays. Finally, we provide probability and moment inequalities of Rosenthal and Bernstein type. Proofs of the new results in Section 2 are deferred to a final Section 5.

Autoregressive processes and their bootstrap analogues
In this section we intend to give a brief introduction to the basic ideas commonly used for verifying weak dependence. Most parts in this section are specialized to autoregressive processes of finite order and their bootstrap analogues.
We consider first a general real-valued stationary process $(X_t)_{t\in\mathbb{Z}}$. A simple and in many cases the most promising way of proving a property of weak dependence is via contraction arguments. For probability distributions $P$ and $Q$ on $(\mathbb{R}^d, \mathcal{B}^d)$ with finite mean, we define the metric

$$d(P, Q) = \inf_{X \sim P,\ Y \sim Q} E\|X - Y\|_{l_1}.$$

For $d = 1$ and the $L_2$ instead of the $L_1$ distance, we obtain Mallows distance; see Mallows (1972). It is well known that such distances are suitable for metrizing weak convergence, that is, $d(P_n, P) \to 0$ as $n \to \infty$ implies $P_n \Rightarrow P$; see e.g. Bickel and Freedman (1981). Similar distances have also been used in the context of Markov processes to derive convergence of stationary distributions from convergence of the conditional distributions; see e.g. Dobrushin (1970) and Neumann and Paparoditis (2007). The following lemma shows that closeness of the conditional distributions in the above metric gives rise to estimates for covariances.
Lemma 2.1. Suppose that $(X_t)_{t\in\mathbb{Z}}$ is a real-valued stationary process. Furthermore, let $s_1 \le \cdots \le s_k \le t_1 \le \cdots \le t_l$ be arbitrary and let $g$ : …

This lemma shows that a property of weak dependence follows from a convergence of the conditional distributions as the time gap to the lagged variables tends to infinity. The latter property can often be shown by appropriate coupling arguments. To get reasonably tight bounds for the covariances, one has to construct versions of the process, $(X_t)_{t\in\mathbb{Z}}$ and $(X'_t)_{t\in\mathbb{Z}}$, where the '$s_k$-histories' $\mathbf{X}_{s_k}$ and $\mathbf{X}'_{s_k}$ are independent but where the corresponding next process values $X_t$ and $X'_t$ are close; see the example below. Note that there is a close connection to the notion of τ-dependence introduced by Dedecker and Prieur (2004). According to their Lemma 5, the infimum of $E\,d(P^{X_t|\mathbf{X}_s}, P^{X'_t|\mathbf{X}'_s})$ is actually equal to their coefficient $\tau(t - s)$. Dedecker and Prieur (2004) used such coupling arguments to derive exponential inequalities and other interesting results, with applications to density estimation.
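In the rest of this section we restrict our attention to a real-valued autoregressive process $(X_t)_{t\in\mathbb{Z}}$ which obeys the equation

$$X_t = \theta_1 X_{t-1} + \cdots + \theta_p X_{t-p} + \varepsilon_t. \qquad (2.1)$$

The innovations $(\varepsilon_t)_{t\in\mathbb{Z}}$ are assumed to be independent and identically distributed with $E\varepsilon_t = 0$. Furthermore, we make the standard assumption that the characteristic polynomial $\theta(z) = 1 - \theta_1 z - \cdots - \theta_p z^p$ has no zero in the closed unit disc $\{z : |z| \le 1\}$. It is well known that there then exists a stationary solution to the model equation (2.1). We will assume that the process $(X_t)_{t\in\mathbb{Z}}$ is in the stationary regime. Then this process can be represented as a causal linear process,

$$X_t = \sum_{k=0}^{\infty} \alpha_k \varepsilon_{t-k}, \qquad (2.2)$$

where $\alpha_k = \sum_{j \le k} \sum_{k_1 + \cdots + k_j = k} \theta_{k_1} \cdots \theta_{k_j}$. Denote by $\xi_1, \dots, \xi_p$ the roots of the characteristic polynomial $\theta$ and let $\rho = \min\{|\xi_1|, \dots, |\xi_p|\}$. Then, for any $\epsilon > 0$, there exists a $K_\epsilon < \infty$ such that, with $\rho_\epsilon = (1 + \epsilon)/\rho$,

$$|\alpha_k| \le K_\epsilon\, \rho_\epsilon^{k} \quad \text{for all } k \in \mathbb{N}; \qquad (2.3)$$

see e.g. Brockwell and Davis (1991, p. 85).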
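The geometric decay of the coefficients $\alpha_k$ can be checked numerically. The following is a minimal sketch, not part of the original argument: it computes the $\alpha_k$ through the recursion $\alpha_0 = 1$, $\alpha_k = \sum_{i=1}^{\min(p,k)} \theta_i \alpha_{k-i}$ (the power-series expansion of $1/\theta(z)$, which agrees with the composition sum above) and compares them with a bound assumed to be of the form $|\alpha_k| \le K_\epsilon \rho_\epsilon^k$. The function names and the AR(2) parameter values are hypothetical.

```python
import numpy as np

def causal_coefficients(theta, n_coef):
    """Return alpha_0, ..., alpha_{n_coef-1} of X_t = sum_k alpha_k eps_{t-k}.

    Uses alpha_0 = 1 and alpha_k = sum_{i=1}^{min(p,k)} theta_i * alpha_{k-i}.
    """
    theta = np.asarray(theta, dtype=float)
    p = len(theta)
    alpha = np.zeros(n_coef)
    alpha[0] = 1.0
    for k in range(1, n_coef):
        m = min(p, k)
        # theta_1..theta_m pair with alpha_{k-1}, ..., alpha_{k-m}
        alpha[k] = np.dot(theta[:m], alpha[k - m:k][::-1])
    return alpha

def min_root_modulus(theta):
    """Smallest modulus among the roots of theta(z) = 1 - theta_1 z - ... - theta_p z^p."""
    theta = np.asarray(theta, dtype=float)
    # np.roots expects coefficients from the highest degree down to the constant term
    coeffs = np.concatenate((-theta[::-1], [1.0]))
    return np.min(np.abs(np.roots(coeffs)))

theta = [0.5, 0.3]                     # hypothetical AR(2) parameters
alpha = causal_coefficients(theta, 40)
rho = min_root_modulus(theta)          # must exceed 1 for a causal stationary solution
eps = 0.05                             # hypothetical choice of the small constant
rho_eps = (1 + eps) / rho
# If the ratios stay bounded, their maximum is a valid K_eps on this range of k
ratios = np.abs(alpha) / rho_eps ** np.arange(len(alpha))
print(rho, ratios.max())
```

Any finite value of `ratios.max()` serves as a constant $K_\epsilon$ for the inspected range of $k$; the sketch does not prove the bound for all $k$.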
Convergence of $E\,d(P^{X_{t_1},\dots,X_{t_l}|\mathbf{X}_{s_k}}, P^{X'_{t_1},\dots,X'_{t_l}|\mathbf{X}'_{s_k}})$ as $t_1 - s_k \to \infty$ can now be shown by a simple coupling argument. For this purpose, we consider a second (stationary) version of the autoregressive process, $(X'_t)_{t\in\mathbb{Z}}$, where $\mathbf{X}'_{s_k}$ is independent of $\mathbf{X}_{s_k}$. Note that $(X'_t)_{t\in\mathbb{Z}}$ can also be written as a linear process,

$$X'_t = \sum_{k=0}^{\infty} \alpha_k \varepsilon'_{t-k}.$$

Independence of $\mathbf{X}'_{s_k}$ and $\mathbf{X}_{s_k}$ is equivalent to the fact that $\varepsilon_{s_k}, \varepsilon_{s_k-1}, \dots$ and $\varepsilon'_{s_k}, \varepsilon'_{s_k-1}, \dots$ are independent. On the other hand, we have some freedom to couple the innovations after time $s_k$. Here we only have to take care that both sequences $(\varepsilon_t)_{t\in\mathbb{Z}}$ and $(\varepsilon'_t)_{t\in\mathbb{Z}}$ consist of independent random variables. A reasonably good coupling is obtained by feeding both processes after time $s_k$ with one and the same sequence of innovations, that is, $\varepsilon'_t = \varepsilon_t$ for all $t > s_k$. Together with (2.3), this yields the following assertion: …

Lemma 2.1 and Lemma 2.2 imply the following weak dependence property.
Corollary 2.1. Suppose that $(X_t)_{t\in\mathbb{Z}}$ is a stationary process satisfying the above conditions. Furthermore, let $s_1 \le \cdots \le s_k \le t_1 \le \cdots \le t_l$ be arbitrary and let $g$ : …

Now we define the autoregressive bootstrap. We assume that observations $X_{1-p}, \dots, X_n$ are available. Let $\hat\theta_n = (\hat\theta_{n,1}, \dots, \hat\theta_{n,p})'$ be any consistent estimator of $\theta = (\theta_1, \dots, \theta_p)'$, that is, $\hat\theta_n \xrightarrow{P} \theta$ as $n \to \infty$. (The least squares and the Yule-Walker estimators are even $\sqrt{n}$-consistent.) Let $\mathbf{X}_t = (X_{t-1}, \dots, X_{t-p})'$ be the vector of the $p$ lagged observations at time $t$. We define residuals $\hat\varepsilon_t = X_t - \mathbf{X}_t' \hat\theta_n$ and re-center them as

$$\tilde\varepsilon_t = \hat\varepsilon_t - \frac{1}{n} \sum_{s=1}^{n} \hat\varepsilon_s.$$

Now we draw independent bootstrap innovations $\varepsilon^*_t$ from the empirical distribution $P_n$ given by the $\tilde\varepsilon_t$. A bootstrap version of the autoregressive process is now obtained as

$$X^*_t = \hat\theta_{n,1} X^*_{t-1} + \cdots + \hat\theta_{n,p} X^*_{t-p} + \varepsilon^*_t.$$

For simplicity, we assume that $(X^*_t)_{t\in\mathbb{Z}}$ is in its stationary regime. (This will be justified by (i) of the next lemma.) Before we state weak dependence of the bootstrap process, we show that it inherits those properties from the initial process which were used for proving weak dependence.
Lemma 2.3. …
(i) With a probability tending to 1, $(X^*_t)_{t\in\mathbb{Z}}$ can be written as a stationary causal linear process, …

Armed with the basic properties stated in Lemma 2.3, we can now easily derive properties of weak dependence of the bootstrap process just by imitating the proof for the initial process. In complete analogy to Lemma 2.2 above, we can state the following result.
Lemma 2.4. Suppose that the initial process $(X_t)_{t\in\mathbb{Z}}$ satisfies the above conditions and that the bootstrap process $(X^*_t)_{t\in\mathbb{Z}}$ is in its stationary regime. Let $(X^{*\prime}_t)_{t\in\mathbb{Z}}$ be another version of the bootstrap process, where … For any $\epsilon > 0$, let $\rho_\epsilon = (1 + \epsilon)/\rho$ and $K_\epsilon < \infty$ be an appropriate constant. Then there exists a sequence of events $\Omega_n$ such that …

From Lemma 2.1 and Lemma 2.4 we can now derive the desired property of $\psi$-weak dependence for the bootstrap process.
Corollary 2.2. Suppose that the conditions of Lemma 2.4 are fulfilled. If the event $\Omega_n$ occurs, then the following assertion is true: Let $s_1 \le \cdots \le s_k \le t_1 \le \cdots \le t_l$ be arbitrary and let $g : \mathbb{R}^k \to \mathbb{R}$ and $h$ : …

Besides the useful property of weak dependence of the bootstrap process, asymptotic validity of a bootstrap approximation requires that the (multivariate) stationary distributions of the bootstrap process converge to those of the initial process. Often, and in the case of the autoregressive bootstrap in particular, one has no direct access to these stationary distributions. However, according to Lemma 4.2 in Neumann and Paparoditis (2007), convergence of the stationary distributions can be derived from an appropriate convergence of conditional distributions. The latter, however, follows directly from $\hat\theta_n \xrightarrow{P} \theta$ and $\varepsilon^*_t \xrightarrow{d} \varepsilon_t$. Therefore, consistency of the autoregressive bootstrap can be shown by simple arguments which were already used for proving weak dependence of the bootstrap; for details see Section 4.2 in Neumann and Paparoditis (2007).
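The steps of the autoregressive bootstrap described in this section (estimate the parameters, form and re-center residuals, resample innovations, regenerate the process) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the least squares estimator is one admissible choice of consistent estimator, the stationary regime is only approximated by discarding a burn-in period, and the names `ar_bootstrap`, `burn_in` and the simulated AR(2) data are hypothetical.

```python
import numpy as np

def ar_bootstrap(x, p, n_boot, burn_in=200, rng=None):
    """Generate one bootstrap path of length n_boot from an AR(p) fit to x."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    n = len(x) - p                        # x holds X_{1-p}, ..., X_n
    # Row t of the design matrix is the lag vector (X_{t-1}, ..., X_{t-p})
    design = np.column_stack([x[p - j:p - j + n] for j in range(1, p + 1)])
    target = x[p:]
    theta_hat, *_ = np.linalg.lstsq(design, target, rcond=None)  # least squares
    resid = target - design @ theta_hat
    resid_centered = resid - resid.mean()                        # re-centering
    # i.i.d. draws from the empirical distribution of the centered residuals
    eps_star = rng.choice(resid_centered, size=n_boot + burn_in, replace=True)
    x_star = np.zeros(n_boot + burn_in + p)
    for t in range(p, len(x_star)):
        lags = x_star[t - p:t][::-1]      # (X*_{t-1}, ..., X*_{t-p})
        x_star[t] = lags @ theta_hat + eps_star[t - p]
    # Dropping the burn-in only approximates the stationary regime (assumption)
    return x_star[-n_boot:], theta_hat, resid_centered

# Hypothetical data: a simulated AR(2) series with Gaussian innovations
rng = np.random.default_rng(0)
true_theta = np.array([0.5, -0.2])
raw = np.zeros(600)
noise = rng.standard_normal(600)
for t in range(2, 600):
    raw[t] = true_theta[0] * raw[t - 1] + true_theta[1] * raw[t - 2] + noise[t]
x_obs = raw[100:]

x_star, theta_hat, resid_centered = ar_bootstrap(x_obs, p=2, n_boot=500, rng=rng)
print(theta_hat)
```

Note that the bootstrap innovations take at most $n$ distinct values, so the bootstrap process is driven by a discrete innovation distribution, which is the situation that motivates working with weak dependence rather than mixing.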
Remark 3. Motivated by the desire to have some sort of mixing for a smoothed sieve bootstrap for linear processes, Bickel and Bühlmann (1999) considered a condition called ν-mixing which is similar to the notion of weak dependence in our Definition 1.1. Although strong mixing follows for linear processes from a result of Gorodetskii (1977), it seems to be unclear whether even a smoothed version of the bootstrap process has such a property. However, it was shown in Theorems 3.2 and 3.4 in Bickel and Bühlmann (1999) that the smoothed bootstrap process is ν-mixing, with polynomial or exponential bounds on the corresponding coefficients holding in probability. In the proofs of these theorems, however, they make use of the property of decaying strong mixing coefficients which holds at least for sufficiently large time lags; see in particular their Lemma 5.3.
In contrast, the approach described here is fundamentally different. We intend to prove weak dependence for processes driven by innovations with a possibly discrete distribution and achieve this goal by exploiting a contraction property of the initial and the bootstrap process.
Remark 4. Arguing in the same way as above, we could also establish the property of $\psi$-weak dependence for nonlinear autoregressive processes, $X_t = m(X_{t-1}) + \varepsilon_t$, where $(\varepsilon_t)_{t\in\mathbb{Z}}$ is a sequence of independent and identically distributed innovations. If $\operatorname{Lip} m < 1$, then a contraction property is obviously fulfilled, which immediately yields $\psi$-weak dependence.
It is interesting to note that such a contraction property can still be proved if $\operatorname{Lip} m < 1$ is not fulfilled. To this end, define the local Lipschitz modulus of continuity … which implies weak dependence by Lemma 2.1.
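The coupling idea used throughout this section can be illustrated numerically. The sketch below assumes a first-order nonlinear autoregression $X_t = m(X_{t-1}) + \varepsilon_t$ with a globally Lipschitz function $m$; the particular $m$, the starting values and the two-point innovation distribution are hypothetical choices. Two versions of the process are started from different values and then fed with one and the same innovations, so that the gap after $r$ steps is at most $(\operatorname{Lip} m)^r$ times the initial gap.

```python
import numpy as np

def coupled_gaps(m, x0, x0_prime, innovations):
    """Run X_t = m(X_{t-1}) + eps_t from two starting values with shared innovations.

    Returns |X_t - X'_t| for t = 0, 1, ..., len(innovations).
    """
    x, xp = x0, x0_prime
    gaps = [abs(x - xp)]
    for e in innovations:
        # Both versions receive the same innovation: this is the coupling
        x, xp = m(x) + e, m(xp) + e
        gaps.append(abs(x - xp))
    return np.array(gaps)

lip = 0.6                                  # hypothetical Lipschitz constant of m

def m(x):
    # tanh has Lipschitz constant 1, so Lip m = lip < 1
    return lip * np.tanh(x)

rng = np.random.default_rng(1)
# Discrete innovations on {-1, +1}, mimicking a bootstrap-type innovation law
innovations = rng.choice([-1.0, 1.0], size=30)
gaps = coupled_gaps(m, x0=2.0, x0_prime=-1.5, innovations=innovations)
bound = gaps[0] * lip ** np.arange(len(gaps))
print(gaps[-1], bound[-1])
```

The bound does not depend on the innovation distribution, which is why the same argument applies to processes driven by discrete innovations.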

Some examples of weakly dependent sequences
Note first that sums of independent weakly dependent processes admit the common weak dependence property where dependence coefficients are the sums of the initial ones. We now provide a non-exhaustive list of weakly dependent sequences with their weak dependence properties. Further examples may be found in Doukhan and Louhichi (1999). Let X = (X t ) t∈Z be a stationary process.
1. If this process is either a Gaussian process or an associated process and $\lim_{t\to\infty} |\operatorname{cov}(X_0, X_t)| = 0$, then it is a κ-weakly dependent process such …
2. ARMA(p, q) processes and more generally causal or non-causal linear processes: $X = (X_t)_{t\in\mathbb{Z}}$ is defined by the model equation

$$X_t = \sum_{k\in\mathbb{Z}} a_k \xi_{t-k},$$

where $(a_k)_{k\in\mathbb{Z}} \in \mathbb{R}^{\mathbb{Z}}$ and $(\xi_t)_{t\in\mathbb{Z}}$ is a sequence of independent and identically distributed random variables with $E\xi_t = 0$. If $a_k = O(|k|^{-\mu})$ with $\mu > 1/2$, then $X$ is an η-weakly dependent process with $\eta(r) = O\bigl(1/r^{\mu - 1/2}\bigr)$. In the general case of dependent innovations, properties of weak dependence are proved in Doukhan and Wintenberger (2007).
3. GARCH(p, q) processes and more generally ARCH(∞) processes: $X = (X_t)_{t\in\mathbb{Z}}$ is such that … with a sequence $(b_k)_k$ depending on the initial parameters in the case of a GARCH(p, q) process and a sequence $(\xi_t)_{t\in\mathbb{Z}}$ of independent and identically distributed innovations. Then, if $E(|\xi_0|^m) < \infty$, with the condition of stationarity $\|\xi_0\|_m^2 \cdot \sum_{j=1}^{\infty} |b_j| < 1$, and if:
• there exist $C > 0$ and $\mu \in\, ]0, 1[$ such that, for all $j \in \mathbb{N}$, $0 \le b_j \le C \cdot \mu^{-j}$, then $X$ is a θ-weakly dependent process with $\theta(r) = O(e^{-c\sqrt{r}})$ and $c > 0$ (this is the case of GARCH(p, q) processes).

Causal and non-causal Volterra processes can be written as …

Assume $\sum_{p=0}^{\infty} \sum_{j_1 < j_2 < \cdots < j_p,\ j_1, \dots, j_p \in \mathbb{Z}} |a_{j_1,\dots,j_p}|^m\, \|\xi_0\|_m^p < \infty$, with $m > 0$, and that there exists $p_0 \in \mathbb{N} \setminus \{0\}$ such that $a_{j_1,\dots,j_p} = 0$ for $p > p_0$. If $a_{j_1,\dots,j_p} = O\bigl(\max_{1\le i\le p} \{|j_i|^{-\mu}\}\bigr)$ with $\mu > 0$, then $X$ is an η-weakly dependent process with $\eta(r) = O\bigl(1/r^{\mu+1}\bigr)$ (see Doukhan (2002)). Finite order Volterra processes with dependent inputs are also considered in Doukhan and Wintenberger (2007): again, η-weakly dependent innovations yield η-weak dependence and λ-weakly dependent innovations yield λ-weak dependence of the process.
7. Very general models are the causal or non-causal infinite memory processes $X = (X_t)_{t\in\mathbb{Z}}$ such that … where the functions $F$, defined either on $\mathbb{R}^{\mathbb{N}\setminus\{0\}} \times \mathbb{R}$ or $\mathbb{R}^{\mathbb{Z}\setminus\{0\}} \times \mathbb{R}$, satisfy … with $a = \sum_{j \ne 0} a_j < 1$. Then, works in progress by Doukhan and Wintenberger as well as Doukhan and Truquet, respectively, prove that a solution of the previous equations is stationary in $L^m$ and either θ-weakly dependent or η-weakly dependent with the following decay rate for the coefficients: $\inf_{p \ge 1} \bigl\{ a^{r/p} + \sum_{|j| > p} a_j \bigr\}$.
This provides the same rates as those already mentioned for the cases of ARCH(∞) or LARCH(∞) models.

Some probabilistic results
In this section, we present results derived under weak dependence which are of interest in probability and statistics (see also Dedecker et al. (2007) for reference). This collection clearly shows that this notion of weak dependence, although being more general than mixing, allows one to prove results very similar to those in the mixing case.

Donsker invariance principle
We consider a stationary, zero mean, and real valued sequence $(X_t)_{t\in\mathbb{Z}}$ such that

$$\mu = E|X_0|^m < \infty, \quad \text{for a real number } m > 2. \qquad (4.1)$$

We also set … (4.2)

$W$ denotes standard Brownian motion and … We now present versions of the Donsker weak invariance principle under weak dependence assumptions.
Remark 5. The result for κ′-weak dependence is obtained in Bulinski and Shashkin (2005). Results under κ- and λ-weak dependence are proved in Doukhan and Wintenberger (2007); note that η-weak dependence implies λ-weak dependence and the Donsker principle then holds under the same decay rate for the coefficients. The result for θ-weak dependence is due to Dedecker and Doukhan (2003). A few comments on these results are now in order:
• The difference between the above conditions under the κ and κ′ assumptions is natural. The observed loss under κ-dependence is explained by the fact that κ′-weakly dependent sequences satisfy $\kappa'(r) \ge \sum_{s \ge r} \kappa(s)$. This simple bound directly follows from the definitions.
• Actually, it is enough to assume the θ-weak dependence inequality for any positive integer $u$ and only for $v = 1$. Hence, for any 1-bounded function $g$ from $\mathbb{R}^u$ to $\mathbb{R}$ and any 1-bounded Lipschitz function $h$ from $\mathbb{R}$ to $\mathbb{R}$ with Lipschitz coefficient $\operatorname{Lip}(h)$, it is enough to assume that the following inequality is fulfilled: $\bigl|\operatorname{cov}\bigl(g(X_{i_1}, \dots, X_{i_u}),\, h(X_{i_u + i})\bigr)\bigr| \le \theta(i) \operatorname{Lip}(h)$, for any $u$-tuple $i_1 \le i_2 \le \cdots \le i_u$.
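A small simulation can make the invariance principle concrete. The sketch below assumes the usual normalization $W_n(t) = n^{-1/2} \sum_{i \le [nt]} X_i$ and the long-run variance $\sigma^2 = \sum_{k\in\mathbb{Z}} E X_0 X_k$; both are assumptions of this illustration, as are the AR(1) model with innovations uniform on $\{-1, +1\}$ and all variable names. For this model $\sigma^2 = 1/(1-\phi)^2$, and the variance of $W_n(t)$ should be close to $\sigma^2 t$ for large $n$.

```python
import numpy as np

def partial_sum_process(x, t_grid):
    """W_n(t) = n^{-1/2} * sum_{i <= floor(n t)} x_i along the last axis of x."""
    x = np.asarray(x, dtype=float)
    n = x.shape[-1]
    zeros = np.zeros(x.shape[:-1] + (1,))
    csum = np.concatenate((zeros, np.cumsum(x, axis=-1)), axis=-1)
    idx = np.floor(n * np.asarray(t_grid)).astype(int)
    return csum[..., idx] / np.sqrt(n)

# Hypothetical example: AR(1) driven by discrete innovations on {-1, +1}
phi, n, burn, reps = 0.5, 400, 100, 2000
rng = np.random.default_rng(3)
eps = rng.choice([-1.0, 1.0], size=(reps, n + burn))
x = np.zeros((reps, n + burn))
for t in range(1, n + burn):
    x[:, t] = phi * x[:, t - 1] + eps[:, t]
x = x[:, burn:]                               # burn-in approximates stationarity

t_grid = np.array([0.25, 0.5, 1.0])
w = partial_sum_process(x, t_grid)            # shape (reps, len(t_grid))
sigma2 = 1.0 / (1.0 - phi) ** 2               # long-run variance of this AR(1)
ratios = w.var(axis=0) / sigma2               # Monte Carlo estimate of Var W_n(t) / sigma^2
print(ratios)
```

The printed ratios should be roughly equal to the entries of `t_grid`, in line with $\sigma W(t)$ having variance $\sigma^2 t$; the simulation is only a plausibility check and says nothing about the functional convergence itself.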

Empirical process
Let $(X_t)_{t\in\mathbb{Z}}$ be a real-valued stationary process. We use a quantile transform to obtain that the marginal distribution of this sequence is the uniform law on $[0, 1]$. The empirical process of the sequence $(X_t)_{t\in\mathbb{Z}}$ at time $n$ is defined as … Note that $E_n = \sqrt{n}\,(F_n - F)$ if $F_n$ and $F$ denote the empirical distribution function and the marginal distribution function, respectively. We consider the following convergence result in the Skorokhod space $D([0, 1])$ when the sample size $n$ tends to infinity: … Here $(\tilde{B}(x))_{x\in[0,1]}$ is the dependent analogue of a Brownian bridge, that is, $\tilde{B}$ denotes a centered Gaussian process with covariance given by … Note that for independent sequences with a marginal distribution function $F$, this turns into $\tilde{B}(x) = B(x)$ for some standard Brownian bridge $B$; this justifies the name of generalized Brownian bridge. We have: …

Remark 6. Under strong mixing, the condition $\sum_{r=0}^{\infty} \alpha(r) < \infty$ implies convergence of the finite-dimensional distributions. The empirical functional convergence holds if, in addition, for some $a > 1$, $\alpha(r) = O(r^{-a})$ (see Rio (2000)). In an absolutely regular framework, Doukhan, Massart and Rio (1995) obtain the empirical functional convergence when, for some $a > 2$, $\beta(r) = O(r^{-1} (\log r)^{-a})$. Shao and Yu (1996) and Shao (1995) obtain the empirical functional convergence theorem when the maximal correlation coefficients satisfy the condition $\sum_{n=0}^{\infty} \rho(2^n) < \infty$.
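The empirical process for uniform marginals, $E_n(x) = \sqrt{n}\,(F_n(x) - x)$, is straightforward to evaluate on a grid. The following sketch is illustrative only: the Gaussian AR(1) model, whose marginal distribution function is known in closed form so that the quantile transform can be applied exactly, and the names `empirical_process`, `u` and `grid` are hypothetical choices.

```python
import numpy as np
from math import erf, sqrt

def empirical_process(u, grid):
    """E_n(x) = sqrt(n) * (F_n(x) - x) for a sample u with uniform marginals on [0, 1]."""
    u = np.sort(np.asarray(u, dtype=float))
    grid = np.asarray(grid, dtype=float)
    n = len(u)
    # F_n(x) = (number of observations <= x) / n
    F_n = np.searchsorted(u, grid, side="right") / n
    return np.sqrt(n) * (F_n - grid)

# Hypothetical example: stationary Gaussian AR(1), transformed to uniform marginals
phi, n, burn = 0.5, 1000, 200
rng = np.random.default_rng(2)
eps = rng.standard_normal(n + burn)
x = np.zeros(n + burn)
for t in range(1, n + burn):
    x[t] = phi * x[t - 1] + eps[t]
x = x[burn:]
sd = 1.0 / sqrt(1.0 - phi ** 2)               # stationary standard deviation
# Quantile transform: U_t = Phi(X_t / sd), with Phi the standard normal cdf
u = 0.5 * (1.0 + np.vectorize(erf)(x / (sd * sqrt(2.0))))

grid = np.linspace(0.0, 1.0, 101)
E_n = empirical_process(u, grid)
small = empirical_process([0.1, 0.4, 0.7], [0.5])
print(np.abs(E_n).max())
```

The process vanishes at both endpoints of $[0, 1]$, as a bridge-type limit requires; under dependence the fluctuations in between are governed by the covariance of the generalized Brownian bridge rather than that of the standard one.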
To prove the result, we introduce the following dependence condition for a stationary sequence $(X_t)_{t\in\mathbb{Z}}$: … where $\mathcal{F} = \{x \mapsto 1_{s < x \le t} : s, t \in [0, 1]\}$, $0 \le t_1 \le t_2 \le t_3 \le t_4$ and $r = t_3 - t_2$ (in this case a weak dependence condition holds for a class of functions $\mathbb{R}^u \to \mathbb{R}$ working only with the values $u = 1$ or $2$).
Proposition 4.1. Let $(X_n)$ be a stationary sequence such that (4.5) holds. Assume that there exists $\nu > 0$ such that … (4.6) Then the sequence of processes …

Central limit theorems
First central limit theorems for weakly dependent sequences were given by Corollary A in Doukhan and Louhichi (1999) and Theorem 1 in Coulon-Prieur and Doukhan (2000). While the former result is for sequences of stationary random variables, the latter one is tailor-made for triangular arrays of asymptotically sparse random variables as they appear with kernel density estimators. Using their notion of ν-mixing, Bickel and Bühlmann (1999) proved a CLT for linear processes of infinite order and their (smoothed) bootstrap counterparts. Below we state a central limit theorem for general triangular schemes of weakly dependent random variables. Note that the applicability of a central limit theorem to bootstrap processes requires some robustness in the parameters of the underlying process, since these parameters have to be estimated when it comes to the bootstrap. A result for a triangular scheme is therefore appropriate since the involved random variables have themselves random properties. An interesting aspect of the following results is that no moment condition beyond Lindeberg's is required. The proof of the next theorem uses Rio's variant of the classical Lindeberg method.
(iii) Condition (4.9) is also related to Gordin (1969)'s condition under which central limit theorems are often proved for stationary processes. Such a theorem for a sequence of stationary ergodic random variables was proved by Hall and Heyde (1980, pp. 136-138); see also Esseen and Janson (1985) for the correction of a detail.
The following very simple multivariate central limit theorem, easily applicable to triangular schemes of weakly dependent random vectors, was derived in Bardet, Doukhan, Lang and Ragache (2007). In view of condition (4.11), it is applicable in cases where dependence between the observations declines as n → ∞. This is a common situation in nonparametric curve estimation where the so-called "whitening-by-windowing" principle applies.
Theorem 4.4. (Theorem 1 in Bardet, Doukhan, Lang and Ragache (2007)) Suppose that $(X_{n,k})_{k\in\mathbb{N}}$, $n \in \mathbb{N}$, is a triangular scheme of zero mean random vectors with values in $\mathbb{R}^d$. Assume that there exists a positive definite matrix $\Sigma$ such that $\sum_{k=1}^{n} \operatorname{Cov}(X_{n,k}) \to \Sigma$ as $n \to \infty$ and that, for each $\epsilon > 0$, … where $\|\cdot\|$ denotes the Euclidean norm. Furthermore, we assume the following condition is satisfied:

$$\sum_{k=2}^{n} \operatorname{cov}\bigl(e^{it'(X_{n,1} + \cdots + X_{n,k-1})},\, e^{it'X_{n,k}}\bigr) \to 0 \quad \text{as } n \to \infty. \qquad (4.11)$$

Then, as $n \to \infty$, …

Remark 8. One common point of these two results is the use of the classical Lindeberg assumption. Note that this assumption is often verified by using a higher order moment condition. A main difference between the two results is that the first one yields direct applications to partial sums while the second one is more adapted to triangular arrays where the limit cannot be written as a sum. In this setting Doukhan and Wintenberger (2007) use Bernstein blocks to prove a CLT for partial sums.

Probability and moment inequalities
In this section we state inequalities of Bernstein and Rosenthal type. In the case of mixing, such inequalities can be easily derived by the well-known technique of replacing dependent blocks of random variables (separated by an appropriate time gap) by independent ones and then using the classical inequalities from the independent case; see for example Doukhan (1994) and Rio (2000). The notion of ψ-weak dependence is particularly suitable for deriving upper estimates for the cumulants of sums of random variables which give rise to rather sharp inequalities of Bernstein and Rosenthal type which are analogous to those in the independent case. Based on a Rosenthal-type inequality, a first inequality of Bernstein-type was obtained by Doukhan and Louhichi (1999), however, with √ t instead of t 2 in the exponent. Dedecker and Prieur (2004) proved a Bennett inequality which can possibly be used to derive also a Bernstein inequality. A first Bernstein inequality with var(X 1 + · · · + X n ) in the asymptotically leading term of the denominator of the exponent has been derived in Kallabis and Neumann (2006), under a weak dependence condition tailor-made for causal processes with an exponential decay of the coefficients of weak dependence. The following result is a generalization which is also applicable to possibly non-causal processes with a not necessarily exponential decay of the coefficients of weak dependence.
We assume that there exist constants $K, M, L_1, L_2 < \infty$, $\mu, \nu \ge 0$, and a nonincreasing sequence of real coefficients $(\rho(n))_{n\ge 0}$ such that, for all $u$-tuples $(s_1, \dots, s_u)$ and all $v$-tuples $(t_1, \dots, t_v)$ with $1 \le s_1 \le \cdots \le s_u \le t_1 \le \cdots \le t_v \le n$, the following inequalities are fulfilled: … and … (4.14) Then, for all $t \ge 0$, … where $A_n$ can be chosen as any number greater than or equal to $\sigma_n^2$ and …

A first Rosenthal-type inequality for weakly dependent random variables was derived by Doukhan and Louhichi (1999) via direct expansions of the moments of even order. Unfortunately, the variance of the sum did not explicitly show up in their bound. Using cumulant bounds in conjunction with Leonov and Shiryaev's formula, the following tighter moment inequality was obtained in Doukhan and Neumann (2007).
Theorem 4.6. (Theorem 3 in Doukhan and Neumann (2007)) Suppose that $X_1, \dots, X_n$ are real-valued random variables on a probability space $(\Omega, \mathcal{A}, P)$ with zero mean and let $p$ be a positive integer. We assume that there exist constants $K, M < \infty$, and a non-increasing sequence of real coefficients $(\rho(n))_{n\ge 0}$ such that, for all $u$-tuples $(s_1, \dots, s_u)$ and all $v$-tuples $(t_1, \dots, t_v)$ with $1 \le s_1 \le \cdots \le s_u \le t_1 \le \cdots \le t_v \le n$ and $u + v \le p$, condition (4.12) is fulfilled. Furthermore, we assume that … Then, with $Z \sim N(0, 1)$, … where $B_{p,n} = (p!)^2\, 2^p \max_{2 \le k \le p} \{\rho_{k,n}^{p/k}\}$, $\rho_{k,n} = \sum_{s=0}^{n-1} (s + 1)^{k-2} \rho(s)$ and …

To avoid any misinterpretation, we note that condition (4.12) with $u + v \le p$ and $E|X_i|^{p-2} \le M^{p-2}$ only requires finiteness of moments of order $p$. This is in contrast to the conditions imposed in Theorem 4.5, where in particular all moments of the involved random variables have to be finite.
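The constants $\rho_{k,n}$ and $B_{p,n}$ appearing in Theorem 4.6 are explicit and can be evaluated for a given coefficient sequence. The sketch below does this for a hypothetical geometric sequence $\rho(s) = 2^{-s}$; the function names and the choices $p = 4$, $n = 50$ are illustrative assumptions, not values taken from the theorem.

```python
import numpy as np
from math import factorial

def rho_kn(rho, k, n):
    """rho_{k,n} = sum_{s=0}^{n-1} (s + 1)^(k - 2) * rho(s)."""
    s = np.arange(n)
    return float(np.sum((s + 1.0) ** (k - 2) * rho(s)))

def B_pn(rho, p, n):
    """B_{p,n} = (p!)^2 * 2^p * max_{2 <= k <= p} rho_{k,n}^(p/k)."""
    largest = max(rho_kn(rho, k, n) ** (p / k) for k in range(2, p + 1))
    return factorial(p) ** 2 * 2 ** p * largest

def rho(s):
    # Hypothetical weak dependence coefficients with exponential decay
    return 0.5 ** s

p, n = 4, 50
values = {k: rho_kn(rho, k, n) for k in range(2, p + 1)}
B = B_pn(rho, p, n)
print(values, B)
```

For exponentially decaying coefficients the sums $\rho_{k,n}$ remain bounded in $n$, so $B_{p,n}$ does not grow with the sample size; for polynomially decaying $\rho(s)$ the factor $(s+1)^{k-2}$ can make $\rho_{k,n}$ diverge for larger $k$.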
Using the Markov property we see that which yields the assertion.