Data-driven semi-parametric detection of multiple changes in long-range dependent processes

This paper is devoted to the offline detection of multiple changes in long-range dependent processes. The observations are assumed to satisfy a semi-parametric long-range dependence assumption with distinct memory parameters on each stage. A penalized local Whittle contrast is used to estimate all the parameters, notably the number of changes. Consistency as well as convergence rates are obtained. Monte-Carlo experiments exhibit the accuracy of the estimators; they also show that the estimation of the number of breaks is improved by using a data-driven slope heuristic procedure for choosing the penalization parameter.


Introduction
There now exists a very large literature devoted to long-range dependent processes. The most commonly used definition of long-range dependence requires a second order stationary process X = (X_n)_{n∈Z} with spectral density f such that

f(λ) = L(|λ|) |λ|^{-2d}, λ ∈ [−π, π] \ {0}, with d ∈ (0, 1/2),   (1.1)

where L is a positive slowly varying function at 0, satisfying for any c > 0, lim_{h→0} L(c|h|)/L(|h|) = 1; typically L is a function with a positive limit, or a logarithm. From an observed trajectory (X_1, ..., X_n) of a long-range dependent process, the estimation of the parameter d is an interesting statistical question. The parametric case, for which the explicit expression of the spectral density f is known, was successively solved in many settings using maximum likelihood estimators (see for instance Dahlhaus, 1989) or Whittle estimators (see for instance Fox and Taqqu, 1987, Giraitis and Surgailis, 1990, or Giraitis and Taqqu, 1999). However, with numerical applications in view, knowing the explicit form of the spectral density is not a realistic framework. A semi-parametric estimation of d, where only the behaviour (1.1) is assumed, should be preferred. Thus, numerous semi-parametric estimators of d have been defined and studied, the main ones being the log-periodogram estimator (see Geweke and Porter-Hudak, 1983, or Robinson, 1995a), wavelet-based estimators (see Bardet et al., 2000) and the local Whittle estimator (see Robinson, 1995b). This last one is a version of the Whittle estimator for which only asymptotically small frequencies are considered. It certainly provides the best trade-off between computation time and accuracy of the estimation (see for instance Bardet et al., 2003b). Its asymptotic normality was extended to numerous kinds of long-memory processes (see Dalla et al., 2006) and also to non-stationary processes (see Abadir et al., 2006). However, there is still no satisfactory adaptive method for choosing the bandwidth parameter, even if several interesting attempts have been developed (see for instance Henry and Robinson, 1998, or Henry, 2007). Hence, the usual choice, valid for FARIMA processes or fractional Gaussian noise, is commonly adopted.
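As a rough illustration of the estimator discussed above, a minimal local Whittle sketch can be written in a few lines. This follows the standard definition of Robinson (1995b) with whole-trajectory Fourier frequencies; the function name `local_whittle` is ours:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def local_whittle(x, m):
    """Local Whittle estimate of the memory parameter d (sketch).

    Minimizes Robinson's (1995b) concentrated objective over the first m
    Fourier frequencies lambda_j = 2*pi*j/n of the trajectory x.
    """
    n = len(x)
    lam = 2.0 * np.pi * np.arange(1, m + 1) / n
    # Periodogram at the first m Fourier frequencies
    I = np.abs(np.fft.fft(x)[1:m + 1]) ** 2 / (2.0 * np.pi * n)

    def R(d):
        # Concentrated local Whittle objective
        return np.log(np.mean(lam ** (2.0 * d) * I)) - 2.0 * d * np.mean(np.log(lam))

    return minimize_scalar(R, bounds=(0.0, 0.499), method="bounded").x
```

For a white noise trajectory the estimate should be close to 0; the bandwidth m = n^0.65 is the choice used later in Section 4.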
In this paper we consider the classical framework of offline multiple change detection. It consists in considering an observed trajectory (X_1, ..., X_n) of a process X that is partitioned into K* + 1 subtrajectories, on each of which it is a linear long memory process whose long memory parameter differs from one area to another (see the precise definition in (2.4)). Thus there is dependence between two subtrajectories, since all the different linear processes are constructed from the same white noise. The aim of this paper is to present a method for estimating from (X_1, ..., X_n) the number K* of abrupt changes, the K* change times (t*_1, ..., t*_{K*}) and the K* + 1 different long-memory parameters (d*_1, ..., d*_{K*+1}), which are unknown. The "offline" multiple changes framework we chose has to be distinguished from the "online" one, for which a monitoring procedure is adopted and tests of change detection are applied successively (such as the CUSUM procedure). The book of Basseville and Nikiforov (1993) is a good reference for an introduction to both online and offline methods. There exist several methods for building a sequential detector of changes in long-range memory; see for instance Giraitis et al. (2001), Kokoszka and Leipus (2003) or Lavancier et al.
(2013). For our offline framework, following the previous purposes, we chose to build a penalized contrast based on a sum of successive local Whittle contrasts, and to minimize it. The principle of this method, minimizing a penalized contrast, provides very convincing results in many frameworks: in the case of mean changes with a least squares contrast (see Bai, 1998), of linear model changes with a least squares contrast (see Bai and Perron, 1998, generalized by Lavielle, 1999, and Lavielle and Moulines, 2000) or with least absolute deviations (see Bai, 1998), of spectral density changes with usual Whittle contrasts (see Lavielle and Ludena, 2000), or of time series changes with quasi-maximum likelihood (see Bardet et al., 2012). Clearly, the remarkable paper of Lavielle and Ludena (2000) was the template for this article, except that we used a semi-parametric version of their Whittle contrast, namely the local Whittle contrast, and this engenders additional difficulties. Restricting our paper to long-memory linear processes, we obtained several asymptotic results. First, the consistency of the estimator has been established under assumptions on the second order term of the expansion of the spectral density close to 0.
A convergence rate of the change time estimators is also provided, but we are not able to reach the usual O_P(1) convergence rate, which is obtained for instance in the parametric case (see Lavielle and Ludena, 2000). Monte-Carlo experiments illustrate the consistency of the estimators. When the number of changes is known, the estimators behave satisfactorily: n = 5000 provides very convincing results, while they are still mediocre for n = 2000 and poor for n = 500. This is not surprising since we considered a semi-parametric statistical framework. When the number of changes is unknown, although we chose an asymptotically consistent penalization sequence, the results are not satisfying even for samples as large as n = 5000. The accuracy of the estimator of the number of changes is extremely dependent on the precise choice of the penalization sequence, even if this choice should not matter asymptotically. Hence we chose to use a data-driven procedure for computing an "optimal" penalty, the so-called "slope heuristic" procedure defined in Arlot and Massart (2009). It provides more accurate results than a fixed penalization sequence, and it leads to very convincing results when n = 5000.
The following Section 2 is devoted to defining the framework and the estimator. Its asymptotic properties are studied in Section 3. The concrete estimation procedure and numerical applications are presented in Section 4. Finally, Section 5 contains the main proofs.

The multiple changes framework
We consider in the sequel the case of multiple change long-range dependent linear processes. First we define a class L(d, β, c) of real sequences, where d ∈ [0, 1/2), β ∈ (0, 2] and c > 0, gathering the sequences (a_i)_{i∈N} that behave asymptotically as c i^{d−1}, with a second order term controlled by the exponent β. Note that the class L(d, β, c) is included in ℓ²(R), the Hilbert space of square summable sequences. Now, for (a_i)_{i∈N} a sequence of the class L(d, β, c), it is possible to define a second order linear long-range dependent process. Indeed, with (ε_t)_{t∈Z} a sequence of independent and identically distributed random variables (iidrv) with zero mean and unit variance, we can define Y = (Y_k)_{k∈Z} such that

Y_k = Σ_{i≥0} a_i ε_{k−i}, k ∈ Z.

Note that Y is a zero mean stationary process whose autocovariance r(k) decays as a constant times k^{2d−1}, the constant involving the usual Beta function B(u, v) (see for instance Inoue, 1997). It is also possible to define the spectral density f of Y on [−π, 0) ∪ (0, π] and, using the Tauberian theorem in Zygmund (1968), it satisfies for d ∈ (0, 1/2) an expansion of the form

f(λ) = c_f |λ|^{−2d} (1 + O(|λ|^β)) when λ → 0,   (2.2)

where the constant c_f involves the usual Gamma function Γ(u). In particular, we can also write that there exists c > 0 such that f(λ) ∼ c |λ|^{−2d} when λ → 0, which is (2.3), the classical assumption required for instance in Robinson (1995b).
Using these definitions, we now give the assumption satisfied by the trajectory (X_1, ..., X_n) of the process X from which we study the changes. Assumption A: Let (ε_t)_{t∈Z} be a sequence of iidrv with zero mean and unit variance, and let 0 = t*_0 < t*_1 < ... < t*_{K*} < t*_{K*+1} = n denote the change dates. Define the process X = (X_t)_{1≤t≤n} such that, on each stage {t*_i + 1, ..., t*_{i+1}}, X_t is the linear process built from (ε_t) with a coefficient sequence belonging to a class L(d*_{i+1}, β*, c) (condition (2.4)), with d*_i ≠ d*_{i+1} for all i (condition (2.5)). The first condition (2.4) is relative to the behavior of (X_t) on each stage: it is a stationary linear long-range dependent process with a spectral density satisfying (2.3) (where d = d*_i). Moreover, there also exists a dependence of (X_t) from one stage to another (see for instance the proof of Lemma 5.2, where the covariance between two subtrajectories of X is computed in (5.13)), which makes the model much more realistic than if the independence of successive regimes had been assumed. The second condition (2.5) is the key condition ensuring that the framework is the one of multiple long-range dependence changes.
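To fix ideas, a trajectory in the spirit of Assumption A can be simulated by filtering one single white noise with stage-dependent FARIMA(0, d, 0) coefficients. This is only an illustrative sketch using a truncated MA(∞) representation (the truncation level `trunc` and the function name are our own choices), not the exact construction of the paper:

```python
import numpy as np

def piecewise_farima(n, d_list, t_list, trunc=500, rng=None):
    """Piecewise long-memory linear process built from one single white noise.

    On each stage, X_t is a truncated FARIMA(0, d, 0) moving average of the
    SAME innovations (eps_t), which reproduces the cross-stage dependence
    discussed after Assumption A.
    """
    rng = np.random.default_rng(rng)
    eps = rng.standard_normal(n + trunc)
    bounds = [0] + list(t_list) + [n]
    x = np.empty(n)
    for i, d in enumerate(d_list):
        # FARIMA(0, d, 0) MA coefficients: a_0 = 1, a_k = a_{k-1} (k - 1 + d) / k
        a = np.empty(trunc)
        a[0] = 1.0
        for k in range(1, trunc):
            a[k] = a[k - 1] * (k - 1 + d) / k
        for t in range(bounds[i], bounds[i + 1]):
            # X_t = sum_k a_k eps_{t-k}, with the noise shared across stages
            x[t] = a @ eps[t + trunc:t:-1]
    return x
```

For d = 0 the coefficients reduce to a_0 = 1 and a_k = 0 for k ≥ 1, i.e. white noise, so the same routine covers the short-memory stages as well.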

Definition of the estimator
First we introduce some additional notation. For X satisfying Assumption A, denote by T_1, ..., T_{K+1} the stages associated with a configuration of K change dates; for K ∈ {0, ..., n}, we will also use the corresponding multidimensional notation. From Assumption A, denote by I_T the periodogram of X on the set T, where T ⊂ {1, ..., n}, and denote |T| = #{T}:

I_T(λ) = (2π|T|)^{−1} |Σ_{t∈T} X_t e^{−itλ}|².

Using the seminal papers of Künsch (1987), Robinson (1995b) and Robinson and Henry (2003), we define a local Whittle contrast W_n(T, d, m) of the memory parameter d on the set T, built from S_n(T, d, m), the average of λ_j^{2d} I_T(λ_j) over the first m Fourier frequencies (see (2.8)).

Remark 1. Note that we use the Fourier frequencies λ_j = 2πj/n, while the common definition (see for instance Robinson, 1995b) considers the Fourier frequencies λ_k = 2πk/|T|. The explanation of this choice stems from the fact that, in the definition of the following contrast L_n(K, t, d, m) on the whole trajectory (X_1, ..., X_n), we will sum the local contrasts W_n(T_k, d_k, m). This choice is required to allow some simplifications in the proofs. But, as we assume that the length of each stage is proportional to n, we asymptotically use almost the usual frequencies.
Under Assumption A, we expect to estimate the distinct d*_i on the different stages {t*_i + 1, ..., t*_{i+1}} by using several local Whittle contrasts. In addition, we will obtain an M-estimator of the d*_i but also of the t*_i and even of K*. Hence, for m ∈ {1, ..., n}, we consider now a penalized local Whittle contrast defined by:

J_n(K, t, d, m) = Σ_{k=1}^{K+1} W_n(T_k, d_k, m) + K z_n,   (2.9)

where K ∈ N is a number of changes, d ∈ [0, 0.5)^{K+1}, t ∈ T_K(0) and (z_n) is a sequence of positive real numbers that will be specified in the sequel. This contrast is therefore a sum of local Whittle objective functions on the K + 1 different stages T_k, k = 1, ..., K + 1, plus a penalty term that is a linear function of the number of changes (and therefore of the number of estimated parameters). Then, with K_max ∈ N* a chosen integer, the estimators of (K*, t*, d*) are defined as the minimizers of J_n(K, t, d, m) over K ∈ {0, ..., K_max}, t and d.

Asymptotic behaviors of the estimators

Case of a known number of changes
We study first the case of a known number K* of changes. In such a framework, let us define two particular cases of the minimization of the contrast: the change date estimators obtained when the number of changes is known, and the memory parameter estimators (d_i)_{1≤i≤K*+1} obtained when both the number of changes and the change dates are known. Then we can prove the consistency of these estimators (Theorem 3.1). This first theorem, whose proof, as well as all other proofs, can be found in Section 5, can be improved by specifying the rate of convergence of the estimators (Theorem 3.2). This result provides a bound on the convergence rate of the change date estimators, which is at best of order n^{(1+β*)/(1+2β*)}; equivalently, the convergence rate for the change fraction estimators τ̂_i is at best of order n^{−β*/(1+2β*)}.
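For instance, reading the exponents off this bound in the FARIMA case, where β* = 2:

```latex
% Rates of Theorem 3.2 evaluated at beta* = 2 (FARIMA case)
n^{(1+\beta^*)/(1+2\beta^*)} = n^{3/5},
\qquad
\widehat{\tau}_i - \tau_i^* = O_{\mathbb{P}}\bigl(n^{-\beta^*/(1+2\beta^*)}\bigr)
                            = O_{\mathbb{P}}\bigl(n^{-2/5}\bigr).
```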
Remark 2. This rate of convergence can be compared with the result obtained in the parametric framework of Lavielle and Ludena (2000), where the respective convergence rates (in probability) of the change date and change fraction estimators are O_P(1) and O_P(n^{−1}). This is the price to pay for going from the parametric to the semi-parametric framework, but also the price of the definition of the local Whittle estimator, which does not allow the simplifications used in the proof of Theorem 3.4 of Lavielle and Ludena (2000, p. 860). Indeed, the random term of the classical Whittle contrast they use contains no logarithm, while the logarithm term of the local Whittle contrast makes their simplifications impossible.
Another consequence of this result is that there is asymptotically a small loss in the convergence rates of the long memory parameter local Whittle estimators d̂_i when the change dates are estimated instead of being known. More formally, using the results of Robinson (1995b), improved by Dalla et al. (2006), we know that under the conditions of Theorem 3.2 the estimators d̂_i computed on the true stages are asymptotically normal. Unfortunately, the rate of convergence obtained for the change date estimators in Theorem 3.2 does not allow us to keep this limit theorem when d̂_i is computed from the estimated change dates. We rather obtain the weaker result of Theorem 3.3, which holds under the assumptions of Theorem 3.2.

Case of an unknown number of changes
Here we consider the case where K* is unknown. For estimating K*, the penalty term of the penalized local Whittle contrast J_n is now essential. Indeed, we obtain the consistency of the estimator of K* (Theorem 3.4) under the assumptions and notations of Theorem 3.1, together with suitable conditions on m and z_n. Note that these conditions on m and z_n imply that n^{−β*/(1+2β*)} = o(z_n), which depends on β*, generally unknown. However, z_n = n^{−1/2} is a possible choice solving this problem. The provided proof does not allow us to establish the consistency of a typical BIC criterion, which would correspond to z_n = 2 log n/n (and the forthcoming numerical results obtained using this BIC penalty are, not surprisingly, a disaster). Moreover, the convergence rates of the estimators obtained when the number of changes is unknown are the same as when the number of changes is known.
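To make the role of the penalty concrete, the contrast J_n can be sketched numerically as a sum of per-segment local Whittle objectives plus z_n times the number of changes. Two deliberate simplifications are flagged in the comments: the sketch uses the common per-segment Fourier frequencies rather than the whole-trajectory frequencies of Remark 1, and the minimization over break positions (done by dynamic programming in Section 4) is not shown; the function names are ours:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def lw_objective(seg, m):
    """Minimized local Whittle objective on one segment.

    NOTE: uses per-segment Fourier frequencies 2*pi*j/|T| for simplicity,
    not the whole-trajectory frequencies of Remark 1.
    """
    n = len(seg)
    mm = min(m, n // 2 - 1)
    lam = 2.0 * np.pi * np.arange(1, mm + 1) / n
    I = np.abs(np.fft.fft(seg)[1:mm + 1]) ** 2 / (2.0 * np.pi * n)

    def R(d):
        return np.log(np.mean(lam ** (2.0 * d) * I)) - 2.0 * d * np.mean(np.log(lam))

    return minimize_scalar(R, bounds=(0.0, 0.499), method="bounded").fun

def penalized_contrast(x, breaks, m, z_n):
    """J_n: sum of local Whittle contrasts on the K + 1 stages plus z_n * K."""
    t = [0] + list(breaks) + [len(x)]
    stages = (x[t[k]:t[k + 1]] for k in range(len(t) - 1))
    return sum(lw_objective(s, m) for s in stages) + z_n * len(breaks)
```

By construction, changing z_n shifts the contrast of a configuration with K breaks by exactly K times the change in z_n, which is what makes the choice of the penalization sequence so critical in practice.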

Numerical experiments
In the sequel we first describe the concrete procedure for applying the new multiple changes estimator, then we present the numerical results of Monte-Carlo experiments.

Concrete procedure of estimation
Several details need to be specified in order to apply the multiple changes estimator concretely:
1. The choice of meta-parameters: 1/ as we mainly studied the cases of FARIMA processes, for which β = 2, we chose m = n^{0.65}; 2/ the number K_max ≥ K* is crucial for the slope heuristic procedure (see below) and was chosen as large as reasonable (its values are given below). For K > K*, the decrease of the unpenalized contrast with respect to K is almost linear with a slope s (see Figure 1, where this linearity can be observed for K > K* = 4), which can be estimated, for instance, by a least squares estimator ŝ. Then K̂_H is obtained by minimizing the penalized contrast J_n using z_n = 2ŝ.
By construction, the procedure is sensitive to the choice of K_max, since a least squares regression is realized on the largest values of K, and we preferred to choose the largest reasonable value of K_max.
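The slope heuristic step described above can be sketched as follows (a minimal sketch: the slope is fitted by least squares on the largest values of K, and its absolute value is used in the penalty since the contrast decreases; the function name is ours):

```python
import numpy as np

def slope_heuristic(contrasts, n_fit):
    """Select K by the slope heuristic: fit the almost-linear decrease of the
    minimized contrast over the n_fit largest K, then penalize with 2*|slope|."""
    c = np.asarray(contrasts, dtype=float)  # c[K] for K = 0, ..., K_max
    K = np.arange(len(c))
    # Least squares slope of the contrast over the largest values of K
    s = np.polyfit(K[-n_fit:], c[-n_fit:], 1)[0]
    # Minimize the penalized contrast J(K) = c[K] + 2 * |s| * K
    return int(np.argmin(c + 2.0 * np.abs(s) * K))
```

With a contrast curve that drops sharply up to the true K* and then decreases almost linearly, the selected K is the "elbow" of the curve.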
A software was written with Octave (also executable with Matlab) and is available at http://samm.univ-paris1.fr/IMG/zip/detectchange.zip.

Monte-Carlo experiments in case of known number of changes
In the sequel we first exhibit the consistency of the multiple breaks estimator when the number of changes is known. Monte-Carlo experiments are realized in the following framework. Three kinds of processes are considered: a FARIMA(0, d, 0) process; a FARIMA(1, d, 1) process with an AR coefficient ψ = −0.7 and a MA coefficient θ = 0.3 (in the familiar representation where B is the backward shift operator); and a linear stationary process denoted X^{(d,1)}, belonging to the class L(d, 1, 1), obtained by choosing a sequence (a_k)_{k∈N} with the required asymptotic behavior. Note that both FARIMA processes belong to the class L(d, 2, c_0). The results of the Monte-Carlo experiments are detailed in Table 1.

Monte-Carlo experiments in case of unknown number of changes
In this subsection, we consider the result of the model selection using the penalized contrast for estimating the number of changes K*. We replicate exactly the same framework as in the previous subsection and report the frequencies of the event 'K̂ = K*' for:
• K̂ = K̂_n, obtained directly by minimizing J_n with z_n = 1/√n;
• K̂ = K̂_BIC, obtained directly by minimizing J_n with z_n = 2 log n/n, following the usual BIC procedure;
• K̂ = K̂_H, obtained from the "slope heuristic" procedure described previously.
We obtained the results detailed in Table 2. From Tables 1 and 2, we may conclude the following:
1. Even using the local Whittle estimator, which is probably the most accurate in this framework, it is easy to verify that if the behaviour of the spectral density at 0 is not smooth, then even with a trajectory of size 5000 we keep a quadratic risk greater than 0.1 (see the case K* = 0 for a FARIMA(1, d, 1) or for the X^{(d,1)} process). We should not forget that the parameter d is relative to the long memory behaviour of the process, in a semi-parametric setting.
2. If the number of changes is known, the estimators of τ_i and d_i are consistent, but their rates of convergence are slightly impacted by the number of changes: as might be expected, the larger K*, the larger the RMSE of the estimators. Finally, the case n = 5000 provides extremely convincing results in the FARIMA framework concerning the estimation of τ_i, while the convergence rates for the process X^{(d,1)} are slow (since the asymptotic behavior of the spectral density around 0 is clearly rougher than in the FARIMA framework).
3. The estimators K̂_n and K̂_H of the number of changes have a satisfying behavior, meaning that they seem to converge to K* when the sample length increases in the FARIMA framework. Once again, the consistencies are slightly better for small K* than for large K*. The results obtained with the "slope heuristic" estimator K̂_H are overall the most accurate and provide very convincing results for n = 5000. Note also that the usual BIC penalty is not at all consistent, which can be explained by the use of the local Whittle contrast, which is not an approximation of the Gaussian likelihood as the usual Whittle contrast is. In the case of the process X^{(d,1)}, only K̂_H seems to be consistent, while K̂_n is not able to detect the number of changes: this is due to the fact that the bandwidth parameter m cannot be chosen as n^{0.65} for obtaining consistent estimators of the long memory parameters.
Finally, we underline that our detector, based on a local Whittle contrast combined with a "slope heuristic" data-driven penalization, provides convincing results when n = 5000 and reasonable ones when n = 2000 (the case n = 500 does not give significant estimations).

Proofs
Following the expansion (2.2), we denote in the sequel, for i = 1, ..., K* + 1, the corresponding second order terms. We first provide the statements and proofs of two useful lemmas.
Lemma 5.1. Under the assumptions of Theorem 3.1 and with S_n(T, d, m) defined in (2.8), a uniform bound holds for any i ∈ {1, ..., K* + 1} and T ⊂ T*_i.
Proof. In the sequel, we will intensively use the notation and numerous proofs of Dalla et al. (2006). However, the results obtained in that paper have to be established again since we consider the frequencies λ_j = 2πj/n instead of 2πj/|T| (see Remark 1). We first define η*_j from the normalized periodogram I_T(λ_j) and bound its moments by a constant C > 0. For this, we go back to the proof of Proposition 5 in Dalla et al. (2006): with the same notation, and as in that proposition, we obtain (5.4). Now, following again Proposition 5 of Dalla et al. (2006) and Robinson (1995b, Relation (3.17)), adapted to our problem, i.e. with j ↔ jT/n, we obtain (5.5). Finally, we go back to the proof of (4.9) in Theorem 2 of Robinson (1995b) for bounding p_{|T|,1}(m); in this proof, and using its notation, we deduce (5.6). Using (5.4), (5.5) and (5.6), we deduce the announced bound on the η*_j, and therefore a bound on their weighted sums for any N. For small N, for instance such that N = o(n/m), the random right-hand side term is not bounded. However, for any T ⊂ T*_i, the expectation E I_T(λ_j) can be controlled; thus there exists C_i > 0 such that, for any δ > 0, the probability bound (5.9) holds. Thus we deduce (5.2), and this achieves the proof of Lemma 5.1.
In the sequel, we define the remainder term R_n(T, T', d, m); note that S_n(T, d, m), defined in (2.8), can also be written as in (5.11). The following lemma establishes an asymptotic bound (5.12) for R_n when T and T' are included in distinct stages of the process.
Proof. First, we bound the covariance Cov(X_t, X_{t'}) with t ∈ T ⊂ T_j and t' ∈ T' ⊂ T_{j'}, where j ≠ j', using the fact that (ε_i) is supposed to be a white noise with unit variance. As a consequence, there exist C > 0 and C' > 0 such that for t' > t the bound (5.13) holds. Now, using (5.10) and (5.13), the right-hand side term of the previous equality depends only on (t' − t). From usual calculations, for any d ∈ [−1/2, 1/2) there exists C(d) > 0 such that, if µ + ν ≤ n/m, we obtain a first bound, while when µ ≥ n/m we obtain a bound of order log(µ). Finally, by performing the same type of calculations several times, we obtain a bound of order log(µ) for the expectation of R_n. We now bound Var(R_n(T, T', d, m)). Without loss of generality, set t ≤ s < t' ≤ s'. Using the Cauchy-Schwarz inequality and the same trick as in (5.13) (since s' > t'), we obtain bounds for the corresponding covariance terms, and more generally for their sums.
As a consequence, we can easily see that J 1 is negligible with respect to J 2 since 2d − 3/2 < 2d − 1.
Concerning J_2, we use the same arguments as in Lavielle and Ludena (2000). As a consequence, using the previous bounds and after classical computations, from (5.19) and (5.20) we obtain (5.21). Using the same decomposition of J_2, but beginning with s', t' ∈ T' instead of s, t ∈ T, we can also replace T by T' in the previous bound. Finally, by symmetry, we obtain the same bound for the remaining term, and therefore the announced inequality with C > 0, which achieves the proof of (5.12) using Lemmas 2.2 and 2.4 in Lavielle and Ludena (2000).
Now the proof of the consistency of the change fraction estimators can be established.
Proof of Theorem 3.1. Mutatis mutandis, we follow a proof similar to that of Lavielle and Ludena (2000). With J_n defined in (2.9) and using (5.11), we can write the contrast for any d and t; then, using a decomposition of each S_n on the 'true' stages, we can write it with the remainder R_n defined in (5.10). As a consequence, using the concavity of x → log(x), and with n_k = Σ_{j=1}^{K*+1} n_{kj}, we apply Lemmas 5.1 and 5.2, leading to (5.28). Simple computations also yield a control in terms of ‖t − t*‖_∞ = max_{1≤k≤K*} |t_k − t*_k|. Therefore, it is also possible to write a bound for any δ > 0, whose remainder is negligible with respect to O_P(m^{−1/2}); therefore (5.32) becomes (5.34). Then, from (5.33), we obtain the required inequality after computations. Assume K < K*. Then, for any t and d, using (5.27), the contrast cannot be asymptotically minimal. Assume K* < K ≤ K_max. With t̂ = (t̂_1, ..., t̂_K), there exists some subset {k_j, 1 ≤ j ≤ K*} of {1, ..., K} such that the t̂_{k_j} are close to the true change dates: to see this, consider the t̂_{k_j} as the closest times among (t̂_1, ..., t̂_K) to (t*_1, ..., t*_{K*}). The other K − K* change dates t̂_i can be considered exactly as additional "false" changes (since the parameters d do not change at these times), and therefore the t̂_{k_j} minimize J_n(K, t, d, m) conditionally on those t̂_i with i ∉ {k_1, ..., k_{K*}}, as if the number of changes were known and equal to K*. Therefore Theorem 3.2 holds for those t̂_{k_j}.
Then, using the expansions detailed in the previous proofs, we obtain the announced result, which achieves the proof.
Proof of Corollary 1. The results are easily obtained by considering conditional probabilities with respect to the event K̂ = K*.

The local Whittle objective function d → W_n(T, d, m) can be minimized for estimating d on the set T, providing the local Whittle estimator d̂ = arg min_{d∈[0,0.5)} W_n(T, d, m) on T.
(The values K_max = 10, 12 and 14 were used respectively for n = 500, 2000 and 5000.)
2. The choice of the sequence (z_n) of the penalty term is not exactly specified by the theory; it only has to satisfy the conditions of Theorem 3.4, and we chose z_n = 1/√n.
3. The dynamic programming procedure is implemented, allowing a significant decrease of the computation time. Such a procedure is very common in the offline multiple change context and has been described in detail in Kay (1998).
4. For improving the procedure of selection of the number of changes K̂, following Bardet et al. (2012), we implemented a data-driven procedure, the so-called "slope heuristic" procedure. This procedure was introduced by Arlot and Massart (2009) in the framework of least squares estimation with fixed design, but it can be extended to many statistical fields (see Baudry et al., 2012). Applications to the multiple changes detection problem were already successfully carried out in Baudry et al. (2012) in an i.i.d. context, and also for dependent time series in Bardet et al. (2012). In a general framework, it consists in computing −2 log(LIK(K)), where LIK(K) is the maximized likelihood for any K ∈ {0, 1, ..., K_max}. Here, −2 log(LIK(K)) is replaced by the minimized (unpenalized) contrast.

Table 1: RMSE of the estimators from 500 independent replications of processes, when the number K* of changes is known.

Table 2: Frequencies of recognition of the true number of changes with several criteria, from 500 independent replications of processes.