Adaptive semiparametric wavelet estimator and goodness-of-fit test for long memory linear processes

This paper is first devoted to the study of an adaptive wavelet-based estimator of the long-memory parameter for linear processes in a general semiparametric frame. It extends Bardet et al. (2008), which only concerned Gaussian processes. Moreover, the definition of the long-memory parameter estimator is modified and the asymptotic results are improved, even in the Gaussian case. Finally, an adaptive goodness-of-fit test is also built and is easy to use: it is a chi-square type test. Simulations confirm the appealing consistency and robustness properties of the adaptive estimator and test.


Introduction
Long-memory processes are now a well-studied and often-applied subject area (see for instance the book edited by Doukhan et al., 2003). The most famous long-memory stationary time series are the fractional Gaussian noise (fGn) with Hurst parameter H and the FARIMA(p, d, q) processes. For both these time series, the spectral density f follows a power law at 0: f(λ) ∼ C λ^{−2d}, where H = d + 1/2 in the case of the fGn. This behavior of the spectral density is generally adopted as a definition of a stationary long-memory (or long-range dependent) process, even if this definition requires the existence of a second-order moment. There are many statistical results on the estimation of the long-memory parameter d. The first and main results in this direction were obtained for parametric models, with the essential papers of Fox and Taqqu (1986) and Dahlhaus (1989) for Gaussian time series, Giraitis and Surgailis (1990) for linear processes and Giraitis and Taqqu (1999) for non-linear functions of Gaussian processes. However, and especially for numerical applications, parametric estimators are not really robust and can yield inconsistent estimates. Thus, research is now rather focused on semiparametric estimators of d. Different approaches have been considered: the famous and seminal R/S statistic (see Hurst, 1951), the log-periodogram estimator (see Moulines and Soulier, 2003), the local Whittle estimator (see Robinson, 1995) and the wavelet-based estimator (see Veitch et al., 2003, Moulines et al., 2007, or Bardet et al., 2008). All these estimators require the choice of an auxiliary parameter (frequency bandwidth, scales, ...), but adaptive versions of the original estimators are generally built to avoid this choice. In a general semiparametric frame, Giraitis et al. (1997) obtained the asymptotic lower bound for the minimax risk of estimating d, expressed as a function of the second-order parameter of the spectral density expansion around 0.
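To fix ideas on the power law f(λ) ∼ C λ^{−2d}, the following sketch simulates a FARIMA(0, d, 0) path and reads the exponent off a naive log-periodogram regression. This is an illustration only: the truncation length, the number of regression frequencies and the function names are our choices, not the paper's.

```python
import numpy as np

def farima0d0(n, d, trunc=5000, seed=None):
    """Simulate a FARIMA(0, d, 0) path by truncating the MA(infinity) expansion
    X_t = sum_{j>=0} psi_j xi_{t-j}, with psi_j = Gamma(j+d) / (Gamma(d) Gamma(j+1))."""
    rng = np.random.default_rng(seed)
    j = np.arange(1, trunc)
    # recursion psi_j = psi_{j-1} (j - 1 + d) / j avoids Gamma overflow
    psi = np.concatenate(([1.0], np.cumprod((j - 1 + d) / j)))
    xi = rng.standard_normal(n + trunc)
    return np.convolve(xi, psi, mode="valid")[:n]

def log_periodogram_slope(x, m):
    """Slope of the log-periodogram on the first m Fourier frequencies;
    under f(lam) ~ C lam^{-2d} it estimates -2d."""
    n = len(x)
    I = np.abs(np.fft.fft(x)) ** 2 / (2 * np.pi * n)
    lam = 2 * np.pi * np.arange(1, m + 1) / n
    return np.polyfit(np.log(lam), np.log(I[1 : m + 1]), 1)[0]

slope = log_periodogram_slope(farima0d0(20000, d=0.3, seed=0), m=200)
print(slope)  # close to -2d = -0.6
```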
Thus, several adaptive semiparametric estimators have been proved to satisfy an oracle property up to a multiplicative logarithm term. But simulations (see for instance Bardet et al., 2003, or 2008) show that the most accurate estimators are the local Whittle, global log-periodogram and wavelet-based estimators.
The use of a wavelet-based estimator for estimating d was first proposed in Abry et al. (1998), after preliminary studies devoted to self-similar processes. Then Bardet et al. (2000) provided proofs of the consistency of such an estimator in a Gaussian semiparametric frame. Moulines et al. (2007) improved these results, proved a central limit theorem for the estimator of d and showed that this estimator is rate optimal for the minimax criterion. Finally, Roueff and Taqqu (2009a) established similar results in a semiparametric frame for linear processes. All these papers rely on a wavelet analysis based on a discrete multi-resolution wavelet transform, which notably allows the wavelet coefficients to be computed with the fast Mallat algorithm. However, there remains a gap in these papers: in a semiparametric frame, the "optimal" scale used for the wavelet analysis depends on the second-order expansion of the spectral density around the zero frequency, and these papers assume that the power of this second-order expansion is known, while it is unknown in practice. Two papers proposed a method for automatically selecting this "optimal" scale in the Gaussian semiparametric frame. Firstly, Veitch et al. (2003), using a kind of chi-square test, which provides convincing numerical results, but the consistency of this procedure is not established. Secondly, Bardet et al. (2008) proved the consistency of a procedure for choosing optimal scales based on the detection of the "most linear part" of the log-variogram graph. In this latter article, the "mother" wavelet is not necessarily associated with a multi-resolution analysis: the computation time is clearly longer, but a large class of continuous wavelet transforms can be used and the scales are not restricted to powers of 2.
The present article is devoted to an extension of Bardet et al. (2008). Three main improvements are obtained:

1. The semiparametric Gaussian framework of Bardet et al. (2008) is extended to a semiparametric framework for linear processes. The same automatic procedure for the selection of the optimal scale is applied, and this leads to adaptive estimators.

2. As in Bardet et al. (2008), the "mother" wavelet is not restricted to be associated with a discrete multiresolution transform. Moreover, we slightly modified the definition of the sample variance of the wavelet coefficients (variogram). The result of both these choices is a multidimensional central limit theorem satisfied by the logarithms of the variograms, with an extremely simple asymptotic covariance matrix (see (9)) depending only on d and the Fourier transform of the wavelet function. Hence it is easy to compute an adaptive pseudo-generalized least squares estimator (PGLSE) of d, which is proved to satisfy a CLT with an asymptotic variance smaller than that of the adaptive (respectively non-adaptive) ordinary least squares estimator of d considered in Bardet et al. (2008) and Roueff and Taqqu (2009), respectively. Simulations confirm the good performance of this PGLSE.

3. Finally, an adaptive goodness-of-fit test can be built from this PGLSE. It consists of a normalized sum of the squared PGLS distances between the PGLS regression line and the points. We prove that this test statistic converges in distribution to a chi-square distribution. Thus it is a very simple test to compute, since the asymptotic covariance matrix is easy to approximate. When d > 0 this test is a long-memory test. Moreover, simulations show that this test has good consistency properties under H_0 and reasonable robustness properties under H_1.

For all these reasons, we can say that this paper is a completion of the article Bardet et al. (2008). Moreover, the adaptive PGLS estimator and test represent an interesting extension of the paper by Roueff and Taqqu (2009).
The paper is organized as follows. Section 2 contains the assumptions, definitions and a first multidimensional central limit theorem, while Section 3 is devoted to the construction and consistency of the adaptive PGLS estimator and goodness-of-fit test. In Section 4 we illustrate with Monte Carlo simulations the convergence of the adaptive estimator and compare the results with those obtained with other efficient semiparametric estimators; moreover, we study the consistency and robustness properties of the adaptive goodness-of-fit test.
The proofs are provided in Section 5.

A central limit theorem for the sample variance of wavelet coefficients
For d < 1/2 and d′ > 0, this paper deals with the following semiparametric framework: X = (X_t)_{t∈Z} is a zero-mean stationary linear process, i.e.
As a consequence, the spectral density f of X exists and satisfies the same kind of expansion as (1). Thus, if d ∈ (0, 1/2) the process X is a long-memory process, and if d ≤ 0 a short-memory process (see Doukhan et al., 2003). Now define the wavelet function ψ : R → R. Let k ∈ N*. We consider the following assumption on ψ: Straightforward implications of these assumptions are ψ^{(j)}(0) = ψ^{(j)}(1) = 0 for any 0 ≤ j ≤ k, and ψ(u) ∼ C u^k (u → 0), with C a real number not depending on u. Define also the Fourier transform of ψ, ψ̂(u) := ∫_R e^{−iuλ} ψ(λ) dλ. For a continuous-time process Y, the wavelet coefficient at scale a and shift b would be a^{−1/2} ∫ ψ(t/a − b) Y_t dt. However, a process X satisfying Assumption A(d, d′) is a discrete-time process, and from a path (X_1, ..., X_N) we define the wavelet coefficients e(a, b) of X for (a, b) ∈ N* × Z. Then:

Property 1. Under Assumption A(d, d′) with d < 1/2 and d′ > 0, and if ψ satisfies Assumption Ψ(k) with k > d′, then (e(a, b))_{b∈{1,...,N−a}} is a zero-mean stationary linear process satisfying (4), with the constant K_{(ψ,d)} given by (5).

The proof of this property, like all the other proofs, is provided in Section 5. Let (X_1, ..., X_N) be a sampled path of X satisfying Assumption A(d, d′). Property 1 allows an estimation of 2d from a log-log regression, as soon as a consistent estimator of E(e²(a, 0)) is provided. For this, and with 1 ≤ a < N, consider the sample variance of the wavelet coefficients, defined in (6).

Remark 1. In Bardet et al. (2000, 2008), Moulines et al. (2007) and Roueff and Taqqu (2009), the considered sample variance of the wavelet coefficients is (7) (with a = 2^j in the case of a multiresolution analysis). The definition (6) has one drawback and two advantages with respect to the usual definition (7): it is not adapted to the fast Mallat algorithm and is therefore more time consuming, but it leads to a simpler expression of the asymptotic variance, and simulations show that this asymptotic variance is smaller than the one obtained with (7).
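The sample variance of the wavelet coefficients in the spirit of (6), with its average over all integer shifts b, can be sketched as follows. The function psi below is a zero-mean illustrative choice supported on [0, 1] (not the paper's ψ, which is specified in Section 4), and the discretization convention for e(a, b) is an assumption on our part.

```python
import numpy as np

def psi(u):
    """Illustrative zero-mean 'mother' function supported on [0, 1] (NOT the paper's)."""
    u = np.asarray(u, dtype=float)
    return np.where((u > 0) & (u < 1), np.sin(np.pi * u) ** 2 * np.sin(2 * np.pi * u), 0.0)

def sample_wavelet_variance(x, a):
    """T_N(a) in the spirit of definition (6): empirical mean of e(a, b)^2 over
    ALL integer shifts b = 1, ..., N - a (not only shifts that are multiples of a).
    Convention assumed here: e(a, b) = a^{-1/2} sum_{t=1..a} psi(t/a) X_{b+t}."""
    n = len(x)
    kernel = psi(np.arange(1, a + 1) / a) / np.sqrt(a)
    # reversed kernel + 'valid' convolution = sliding inner products
    e = np.convolve(x, kernel[::-1], mode="valid")[: n - a]
    return float(np.mean(e ** 2))

rng = np.random.default_rng(3)
x = rng.standard_normal(5000)          # white noise: d = 0
t10, t40 = sample_wavelet_variance(x, 10), sample_wavelet_variance(x, 40)
print(t40 / t10)                        # T_N(a) scales like a^{2d}, so the ratio is near 1 here
```

For a process with memory parameter d, log T_N(a) should then grow approximately linearly in log a with slope 2d, which is the basis of the log-log regression below.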
The following proposition specifies a central limit theorem satisfied by log T_N(a), which provides the first step for obtaining the asymptotic properties of the estimator by log-log regression. More generally, the following multidimensional central limit theorem for a vector (log T_N(a_i))_i can be established.

An adaptive estimator of the memory parameter and an adaptive goodness-of-fit test

The CLT of Proposition 1 is very interesting because it has several consequences. We will see that the (quite) simple expression of the asymptotic covariance matrix is an important advantage compared to the complicated expression of the asymptotic covariance obtained in the case of a multiresolution analysis (see Roueff and Taqqu, 2009a). First, it allows us to obtain an estimator d_N of d by ordinary least squares estimation. Hence, define:

Remark 2. From Proposition 1, it is not possible to choose (r_1, ..., r_ℓ) minimizing the asymptotic covariance matrix Γ(r_1, ..., r_ℓ, ψ, d) without knowing the value of d. Hence, in the sequel we will only consider the following choice.

Then it is clear from Proposition 1 that d_N(a_N) converges to d following a central limit theorem, with a convergence rate depending on d′. However, in practice, d′ is unknown. In Bardet et al. (2008), an automatic procedure for choosing an "optimal" scale a_N was proposed. We apply this procedure again, after recalling its principle: for α ∈ (0, 1), define Q_N(α, c, d), which corresponds to a squared distance between the ℓ points (log(i N^α), log T_N(i N^α))_i and a line. It can be minimized first over (c, d) for each α ∈ (0, 1), and then α_N is defined as the argmin over α.

Remark 3. As was also noted in Bardet et al. (2008), in the definition of the set A_N, log N can be replaced by any sequence negligible with respect to any power law of N. Hence, in numerical applications we will use 10 log N, which significantly increases the precision of α_N.

Under the assumptions of Proposition 1, one obtains (see the proof in Bardet et al., 2008) the consistency of this data-driven choice. Then define: It is clear that this estimator is consistent (a convergence rate can also be found in Bardet et al., 2008), and therefore, from the expression of Γ in (9) and its smoothness with respect to the variable d, the covariance matrix can be consistently estimated. Thus it is possible to define a (pseudo-)generalized least squares estimator (PGLSE) of d. Before this, define: For technical reasons (a probability condition which is not satisfied by α_N; see Bardet et al., 2008), in the sequel we prefer to use a slightly modified version of α_N. Finally, using the usual expression of the PGLSE, the adaptive estimators of c and d can be defined as follows: The following theorem provides the asymptotic behavior of the PGLS estimator of d. Remark 4.
1. From the Gauss-Markov theorem, it is clear that the asymptotic variance of the PGLS estimator is smaller than or equal to that of the OLS estimator d_N. Moreover, the PGLS estimator satisfies the CLT (13), which provides confidence intervals that are simple to compute.
2. In the Gaussian case, the adaptive estimator d_N converges to d with a rate of convergence equal to the minimax rate N^{d′/(1+2d′)}, up to a logarithm factor (see Giraitis et al., 1997). Thus, this estimator is comparable to the adaptive log-periodogram and local Whittle estimators (see respectively Moulines and Soulier, 2003, and Robinson, 1995).
3. Under additional assumptions on ψ (ψ is supposed to have its first m moments vanishing), the estimator d_N can also be applied to a process X with an additive polynomial trend of degree ≤ m − 1. The trend is then cancelled by the wavelet function, and the value of d_N is the same as without this additive trend. Such a robustness property is not available for the adaptive log-periodogram or local Whittle estimators.
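The mechanism behind this robustness (a filter with m vanishing moments annihilates polynomials of degree ≤ m − 1) can be seen with a toy second-difference filter, which is not the paper's ψ but has two vanishing moments:

```python
import numpy as np

# The second-difference filter [1, -2, 1] has two vanishing moments
# (sum h_t = 0 and sum t * h_t = 0), so it annihilates any affine trend exactly.
h = np.array([1.0, -2.0, 1.0])

t = np.arange(200, dtype=float)
trend = 3.0 + 0.5 * t                     # affine trend, degree <= 1
rng = np.random.default_rng(1)
x = rng.standard_normal(200)

print(np.max(np.abs(np.convolve(trend, h, mode="valid"))))  # 0 up to rounding
# the filtered series with and without the trend coincide:
print(np.allclose(np.convolve(x + trend, h, mode="valid"),
                  np.convolve(x, h, mode="valid")))          # True
```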
Finally, it is easy to deduce from the previous pseudo-generalized least squares regression an adaptive goodness-of-fit test. It consists of a sum of the squared PGLS distances between the PGLS regression line and the points. More precisely, consider the statistic: Then, using the previous results, one obtains: Theorem 2. Under the assumptions of Proposition 1, the statistic converges in distribution to a chi-square distribution. This (adaptive) goodness-of-fit test is therefore very simple to compute and use. In the case where d > 0, which can be tested easily from Theorem 1, this test can also be seen as a test of long memory for linear processes.
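A minimal sketch of the PGLS regression and of the chi-square-type statistic it yields. The plug-in covariance Gamma, the normalization and the function names are our simplifications; the exact statistic and its normalization are those given in the paper.

```python
import numpy as np
from scipy.stats import chi2

def pgls_fit(scales, TN, Gamma):
    """PGLS regression of log T_N(a_i) on (1, 2 log a_i) with plug-in covariance
    Gamma for the vector (log T_N(a_i))_i. Returns the estimate of d and a
    chi-square-type statistic built from the weighted squared residuals,
    asymptotically chi2(ell - 2) under the null."""
    y = np.log(np.asarray(TN, dtype=float))
    Z = np.column_stack([np.ones_like(y), 2.0 * np.log(scales)])
    Gi = np.linalg.inv(Gamma)
    beta = np.linalg.solve(Z.T @ Gi @ Z, Z.T @ Gi @ y)   # (c-hat, d-hat)
    r = y - Z @ beta                                      # PGLS residuals
    stat = float(r @ Gi @ r)
    return beta[1], stat, float(chi2.sf(stat, df=len(y) - 2))

# noiseless sanity check: T_N(a) = c * a^{2d} with d = 0.3 is fitted exactly
scales = np.arange(4, 12, dtype=float)
d_hat, stat, pval = pgls_fit(scales, 2.0 * scales ** 0.6, np.eye(len(scales)))
print(d_hat, stat, pval)   # d_hat ~ 0.3, stat ~ 0, p-value ~ 1
```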

Simulations
In the sequel, the numerical consistency and robustness of d_N are first investigated. Simulations are carried out, and the results obtained with the estimator d_N are compared to those obtained with the best known semiparametric long-memory estimators. Finally, the numerical properties of the test statistic T_N are also studied.
Remark 5. Note that all the software (in Matlab language) used in this section is freely available at http://samm.univ-paris1.fr/-Jean-Marc-Bardet.
To begin with, the simulation conditions have to be specified. The results are obtained from 100 generated independent samples of each process belonging to the following "benchmark". The processes are generated by the circulant matrix method in the case of Gaussian processes, and by truncation of an infinite sum in the case of non-Gaussian processes (see Doukhan et al., 2003). The simulations are carried out for d = 0, 0.1, 0.2, 0.3 and 0.4, for N = 10^3 and 10^4, and for the following processes, which satisfy Assumption A(d, d′):

1. the fractional Gaussian noise (fGn) with parameter H = d + 1/2 (for 0 ≤ d < 0.5) and σ² = 1. A fGn satisfies Assumption A(d, 2), even if a fGn is generally not studied as a Gaussian linear process;

2. a FARIMA[p, d, q] process with parameter d such that d ∈ [0, 0.5) and p, q ∈ N. A FARIMA[p, d, q] process satisfies Assumption A(d, 2) when E ξ_0^4 < ∞, where (ξ_t) is the innovation process;

3. the Gaussian stationary process X^{(d,d′)} whose spectral density is f_3, with d ∈ [0, 0.5) and d′ ∈ (0, ∞). The spectral density f_3 satisfies Assumption A(d, d′), and since X^{(d,d′)} is a Gaussian process, by the Wold decomposition it is also a linear process.
Note that the processes X_4 and X_5 do not satisfy the condition E ξ_0^4 < ∞ required in Theorems 1 and 2. However, since we consider the logarithm of the wavelet coefficient sample variance and not only the sample variance itself, it should be possible to prove the consistency of d_N under a condition such as E |ξ_0|^r < ∞ with r ≥ 2, and perhaps only r > 0.
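The circulant matrix method mentioned above for the Gaussian processes can be sketched as follows. This is a generic Davies-Harte implementation for the fGn, not the Matlab code used by the authors.

```python
import numpy as np

def fgn(n, H, seed=None):
    """Fractional Gaussian noise via circulant embedding (Davies-Harte):
    exact in distribution when the circulant eigenvalues are nonnegative,
    which is known to hold for the fGn autocovariance."""
    rng = np.random.default_rng(seed)
    k = np.arange(n)
    # autocovariance gamma(k) = 0.5 (|k-1|^{2H} - 2|k|^{2H} + |k+1|^{2H})
    gam = 0.5 * (np.abs(k - 1) ** (2 * H) - 2 * k ** (2.0 * H) + (k + 1) ** (2 * H))
    row = np.concatenate([gam, gam[-2:0:-1]])     # first row of the 2n-2 circulant
    lam = np.fft.fft(row).real                    # its eigenvalues
    if lam.min() < -1e-8:
        raise ValueError("embedding is not nonnegative definite")
    m = len(row)
    w = rng.standard_normal(m) + 1j * rng.standard_normal(m)
    x = np.fft.ifft(np.sqrt(np.maximum(lam, 0.0)) * w)
    return np.sqrt(m) * x.real[:n]

x = fgn(4096, H=0.7, seed=0)
# variance close to 1, lag-1 correlation close to 2^{2H-1} - 1 ~ 0.32
print(x.var(), np.corrcoef(x[:-1], x[1:])[0, 1])
```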

Comparison of the wavelet based estimator and other estimators
First let us specify the different choices concerning the wavelet-based estimator.

Choice of the function ψ: as said previously, it is not mandatory to use a wavelet function associated with a multi-resolution analysis. We use here the function ψ defined by:

Choice of the parameter ℓ: this parameter is important for estimating the "beginning" of the linear part of the graph drawn by the points (log(i a_N), log T_N(i a_N))_{1≤i≤ℓ}, and therefore the data-driven a_N. Moreover, this parameter is used as the number of regression points in the computation of d_N. We chose a two-step procedure:

1. following a numerical study (not detailed here), ℓ = [2 log N] (therefore ℓ = 13 for N = 1000 and ℓ = 18 for N = 10000) seems to be a good choice for the first step: the computation of α_N.
2. for the computation of d_N, we first remark that with the chosen function ψ, Γ_N does not seem to depend on d. As a consequence, we computed σ²_d(ℓ) for several values of d and ℓ, using classical approximations of the integrals defining Γ. The results of these numerical experiments are reported in Figure 2. The conclusion of this numerical experiment is the following: for any d ∈ [0, 0.5), σ²_d(ℓ) hardly depends on d and decreases when ℓ increases. Therefore, for this second step we chose ℓ = [N^{1−α_N} (log N)^{−1}]: in this way the largest considered scale is N (log N)^{−1}, which is negligible with respect to N, so that the CLT (8) holds. Simulation results are reported in Table 1.
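The scale-selection principle described above (retain the zone where the log-log points are the most linear) can be sketched as follows. This is a simplified illustration: the grid of α values, the toy variance toy_TN and all names are our illustrative choices, not the paper's procedure.

```python
import numpy as np

def choose_alpha(x, TN_fn, ell, alphas):
    """For each alpha, compute the points (log(i N^alpha), log T_N(i N^alpha)),
    i = 1..ell, and keep the alpha whose points are closest (least squares
    distance) to a straight line."""
    N = len(x)
    best, best_q = None, np.inf
    for alpha in alphas:
        a = max(2, int(N ** alpha))
        scales = a * np.arange(1, ell + 1)
        if scales[-1] >= N:                 # largest scale must stay below N
            continue
        logs = np.log(scales)
        logT = np.log([TN_fn(x, s) for s in scales])
        resid = logT - np.polyval(np.polyfit(logs, logT, 1), logs)
        q = float(resid @ resid)            # squared distance to the fitted line
        if q < best_q:
            best, best_q = alpha, q
    return best

def toy_TN(x, s):
    """Toy stand-in for T_N(a): variance of sliding Haar-like differences at scale s."""
    c = (np.cumsum(x)[s:] - np.cumsum(x)[:-s]) / np.sqrt(s)   # normalized block sums
    d = c[s:] - c[:-s]
    return float(np.mean(d ** 2) / 2)

rng = np.random.default_rng(7)
alpha = choose_alpha(rng.standard_normal(4000), toy_TN, ell=8,
                     alphas=np.linspace(0.1, 0.5, 9))
print(alpha)
```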

Conclusions from Table 1:
Comparison of the robustness of the different semiparametric estimators: to conclude on the numerical properties of the estimators, 3 different processes not satisfying Assumption A(d, d′) are considered:

• a Gaussian stationary process whose spectral density is such that the smoothness condition on f in Assumption A(0, 2) is not satisfied;
• a trended Gaussian FARIMA(0, d, 0), i.e. with an additive linear trend;

• a Gaussian FARIMA(0, d, 0) with an additive linear trend and an additive sinusoidal seasonal component.

The results of these simulations are given in Table 2.

Conclusions from Table 2:
The main advantages of d_N with respect to d_MS and d_R are exhibited in this table: d_N is robust with respect to smooth trends (and seasonalities). Note that the sample mean of d_MS and d_R in the case of processes with a trend, or with a trend and a seasonality, is almost 0.5.

Consistency and robustness of the adaptive goodness-of-fit test:
Tables 1 and 2 also provide information concerning the adaptive goodness-of-fit test. A general conclusion is that the consistency properties of this test are clearly satisfactory when N is large enough (N = 1000 seems to be too small for using this goodness-of-fit test).
We would also like to know the behavior of the test statistic under the alternative hypothesis H_1. We study the case of processes which satisfy neither the stationarity condition nor the relation (1) for the spectral density. Hence 3 particular cases are considered:

1. a process X, denoted MFARIMA, defined as a succession of two independent Gaussian FARIMA processes;

2. a process X, denoted MGN, defined by the increments of a multifractional Brownian motion (introduced in Peltier and Lévy-Véhel, 1995). Using the harmonizable representation, define Y = (Y_t)_t such that, where W(dx) is a complex-valued Gaussian noise with variance dx, H(•) is a function (the case H(•) ≡ H with H ∈ (0, 1) is the case of the fBm) and C(•) is a function. Here we consider the functions H(t) = 0.5 + 0.4 sin(t/10) and C(t) = 1. Then X_t = Y_{t+1} − Y_t for t ∈ Z. X is not a stationary process, but it "locally" behaves as a fGn with parameter H(t) (therefore depending on t).
Table 3: Robustness of the adaptive goodness-of-fit test, where p_n = (1/n) #{T_N < q_{χ²(ℓ−2)}(0.95)} is the frequency of acceptance of the adaptive goodness-of-fit test.
3. a process X, denoted MFGN, defined by the increments of a multiscale fractional Brownian motion (introduced in Bardet and Bertrand, 2007). Let Z = (Z_t)_t be such that, where W(dx) is a complex-valued Gaussian noise with variance dx, and H(•) and σ(•) are piecewise constant functions. Here we consider H(x) = 0.9 for 0.001 ≤ x ≤ 0.04 and H(x) = 0.1 for 0.04 ≤ x ≤ 3. Then X_t = Z_{t+1} − Z_t for t ∈ Z, and X is a Gaussian stationary process which can be written as a linear process, behaving as a fGn of parameter 0.9 at low frequencies (large time scales) and as a fGn of parameter 0.1 at high frequencies (small time scales).
We applied the test statistic to 100 independent replications of each of these processes. The results of these simulations are reported in Table 3. We observe that the processes MGN and MFGN are clearly rejected by the adaptive goodness-of-fit test. However, the test is not able to reject the process MFARIMA, which does not satisfy the assumptions of Theorem 2. The reason is that the test averages the behavior of the sample, and in case of changes (as for MFARIMA) it is the average LRD parameter which is estimated (a sample mean of 0.30 for d_N and a standard deviation of 0.03 are obtained).

Proofs
First, we will use the following lemma several times:

Proof of Lemma 1. The proof follows easily by mathematical induction on k for λ ∈ [−π, π]. Then, since we consider 2π-periodic functions (of λ), the result extends to R.
Proof of Property 1. First, it is clear that for a ∈ N*, (e(a, b))_{1≤b≤N−a} is a centered linear process. It is a stationary process because X is a stationary process and the coefficients are clearly square-summable. Now, following computations similar to those performed in Bardet et al. (2008), we obtain for a ∈ N* the expression of E(e²(a, 0)), and thus there exists C > 0 (not depending on a) such that the stated bound holds for a large enough. Following the same reasoning, for any n ≥ 0 there exists C(n) > 0 (not depending on a) such that the corresponding bound holds for a large enough. Finally, from Assumption A(d, d′), we obtain the following expansion, using the definition (5) of K_{(ψ,d)}, because lim_{λ→0} ε(λ) = 0, and applying the Lebesgue theorem. Then, using (18), (19) and (20), we obtain the stated expansion; when k > d′, it implies (4).
Proof of Theorem 1. We decompose this proof into 4 steps. First define the normalized wavelet coefficients of X by: and the normalized sample variance of the wavelet coefficients:

Step 1. We prove in this part that N Cov(T̃_N(r a_N), T̃_N(r′ a_N)) converges to the corresponding entry of the asymptotic covariance matrix Γ(r_1, ..., r_ℓ, ψ, d) defined in (9). First, for λ ∈ R, denote: Thus, since there are only two non-vanishing cases, we use the relation (24). From the usual asymptotic behavior of the Dirichlet kernel, for g ∈ C¹_{2π}((−π, π)), the stated limit holds. Therefore:

* Case 2: in this case, with s_1 ≠ s_2, Cov(ξ_{s_1} ξ_{s_2}, ξ_{s_1} ξ_{s_2}) = 1, and we use the asymptotic behaviors of the two Dirichlet kernels.

Now we have to compute
In both cases (C_1 and C_2), one again obtains a function of a Dirichlet kernel. For a continuous function h, the limit follows thanks to the Lebesgue theorem and since a/N → 0 (N → ∞). Then, from (27), we use the same arguments as in Property 1, since a_N → ∞ (and therefore a → ∞ and a′ → ∞).

Then for
with the asymptotic behavior of the Dirichlet kernel. Now, as in Lemma 2 of Giraitis (1985), consider the diagram V_1 = {(1, 1), (2, 1), (3, 1)} and assume the stated condition for the rows L_j of the array T. Then the inequality (39) can be repeated, and on the hyperplane x_{V_1} a part of the integral (34) yields the same expressions of α_i as in Giraitis (1985). It remains to bound α_i(u). With the same approximations as in the proof of Property 1, this is obtained for a and N large enough. For the k − 3 other terms, a result corresponding to Lemma 1 of Giraitis (1985) can also be obtained: indeed, for a and N large enough, the functions g_{N,j} can be bounded. Therefore the corresponding term is of order N log³ N + N log N, and hence h(u_1, u_2)² = o(N a_N). Finally, (34) holds for all γ, and it implies (32). If ℓ > 1, the same proof can be repeated using the linearity properties of cumulants. Thus, (T̃_N(r_i a_N))_{1≤i≤ℓ} satisfies the following central limit theorem, with Γ(r_1, ..., r_ℓ, ψ, d) = (γ_{ij})_{1≤i,j≤ℓ} given in (9).
Step 3. We now extend the central limit theorem obtained in Step 2, valid for linear processes whose innovation distribution satisfies a Cramér condition (E e^{r ξ_0} < ∞), to the weaker condition E ξ_0^4 < ∞, using a truncation procedure. Thus, assume now that E ξ_0^4 < ∞. We have E(e^+(r_i a_N, b))² = Σ_{s∈Z} β_a²(s) E(ξ_0^+)² = E(ξ_0^+)², from the previous arguments and since we assume that the distribution of ξ_0 is symmetric. Then, using Hölder's and Markov's inequalities, E(ξ_0^+)² can be suitably bounded.

The estimator d_N numerically shows a convincing convergence rate with respect to the other estimators. Both the "spectral" estimators d_R and d_MS provide more stable results, almost insensitive to d and to the flatness of the spectral density of the process, while the convergence rate of the wavelet-based estimator d_N depends more on the spectral density of the process. But, especially in the cases of "smooth" spectral densities (fGn and FARIMA(0, d, 0)), d_N is a very accurate semiparametric estimator and is globally more efficient than the other estimators.

Remark 6. In Bardet et al. (2008) we also compared two adaptive wavelet-based estimators (the one defined in Veitch et al., 2003, and the one defined in Bardet et al., 2008) with d_MS and d_R (and also with two others, defined in Giraitis et al., 2000, and Giraitis et al., 2006, which exhibit worse numerical consistency properties). We observe that the √MSE of d_N obtained in Table 1 is generally smaller than the one obtained with the estimator defined in Bardet et al. (2008), for two reasons: the choice of the definition (6) of the wavelet coefficient sample variance instead of (7), and the choice of a PGLS regression instead of an LS regression.

Table 1: Comparison of the different long-memory parameter estimators for the processes of the benchmark. For each process and each value of d and N, the √MSE is computed from 100 independent generated samples.

Table 2: Robustness of the different long-memory parameter estimators. For each process and each value of d and N, the √MSE is computed from 100 independent generated samples.