A distribution free test for changes in the trend function of locally stationary processes

In the common time series model $X_{i,n} = \mu (i/n) + \varepsilon_{i,n}$ with non-stationary errors we consider the problem of detecting a significant deviation of the mean function $\mu$ from a benchmark $g (\mu )$ (such as the initial value $\mu (0)$ or the average trend $\int_{0}^{1} \mu (t) dt$). The problem is motivated by a more realistic modelling of change point analysis, where one is interested in identifying relevant deviations in a smoothly varying sequence of means $ (\mu (i/n))_{i =1,\ldots ,n }$ and cannot assume that the sequence is piecewise constant. A test for hypotheses of this type is developed using an appropriate estimator of the integrated squared deviation of the mean function from the benchmark. Using a new concept of self-normalization adapted to non-stationary processes, we construct an asymptotically pivotal test for the hypothesis of a relevant deviation. The results are illustrated by means of a simulation study and a data example.


Introduction
Within the last decades, the detection of structural breaks in time series has become a very active area of research with many applications in fields like climatology, economics, engineering, genomics and hydrology (see Aue and Horváth, 2013; Jandhyala et al., 2013; Woodall and Montgomery, 2014; Sharma et al., 2016; Chakraborti and Graham, 2019; Truong et al., 2020, among many others). In the simplest case, one is interested in detecting structural breaks in the sequence of means $(\mu_i)_{i=1,\ldots,n} = (\mu(i/n))_{i=1,\ldots,n}$ of a time series $(X_{i,n})_{i=1,\ldots,n}$ corresponding to a location model of the form
$$X_{i,n} = \mu(i/n) + \varepsilon_{i,n}, \qquad i = 1, \ldots, n. \tag{1.1}$$
A large part of the literature considers the problem of detecting changes in a piecewise constant mean function $\mu : [0,1] \to \mathbb R$, where early references assume the existence of at most one change point (see, e.g., Priestley and Subba Rao, 1969; Wolfe and Schechtman, 1984; Horváth et al., 1999, among others) and more recent literature investigates multiple change points (see, e.g., Frick et al., 2014; Fryzlewicz, 2018; Dette et al., 2020; Baranowski et al., 2019, among many others). The errors $(\varepsilon_{i,n})_{i=1,\ldots,n}$ in model (1.1) are usually assumed to be at least stationary, and many theoretical results for detecting multiple change points are only available for independent identically distributed error processes. These assumptions simplify the statistical analysis of structural breaks substantially: after removing the piecewise constant trend, one can work under the assumption of a stationary or an independent identically distributed error process, and smoothing is not necessary to estimate the trend function.
On the other hand, the assumption of a strictly piecewise constant mean function might not be realistic in many situations, and it might be more reasonable to assume that $\mu$ varies smoothly rather than abruptly. A typical example is temperature data (see, e.g., Karl et al., 1995; Collins et al., 2000), where it might be of more interest to investigate whether the mean function deviates fundamentally from a given benchmark, denoted by $g(\mu)$. Here $g$ is a functional of the mean function, such as the value at the point 0, that is $g(\mu) = \mu(0)$, or an average over a certain time period, that is
$$g(\mu) = \frac{1}{\tau([t_0, t_1])} \int_{t_0}^{t_1} \mu(t)\, \tau(dt) \tag{1.2}$$
(see Section 2 for more details). Moreover, there also exist many time series exhibiting non-stationary behaviour in the higher-order moments and the dependence structure (see Stărică and Granger, 2005; Elsner et al., 2008; Guillaumin et al., 2017, among others), and the detection of fundamental deviations from a benchmark in a sequence of gradually changing means under the assumption of a location model with a stationary error process might be misleading.
In this paper we propose a distribution free test for relevant deviations of the mean function $\mu$ from a given benchmark $g(\mu)$ in a location model of the form (1.1) with a non-stationary error process. More precisely, for some pre-specified threshold $\Delta > 0$ we are interested in testing the hypotheses
$$H_0 : d_0 \le \Delta \quad \text{versus} \quad H_1 : d_0 > \Delta, \qquad \text{where } d_0 = \Big( \int_0^1 \big( \mu(t) - g(\mu) \big)^2 \, \tau(dt) \Big)^{1/2} \tag{1.3}$$
and $\tau$ is an appropriate measure on the interval [0, 1] chosen by the statistician. This means that we are looking for "substantial" deviations of the mean function from a given benchmark $g(\mu)$ in an $L^2$-sense. The choice of the threshold depends on the particular application and is related to a balance between bias and variance, as the detection of deviations from a (constant) mean often results in an adaptation of the statistical analysis (for example in forecasting). As such an analysis is performed "locally", the resulting estimators will have a smaller bias but a larger variance. However, if the changes in the signal are only weak, such an adaptation might not be necessary, because a potential decrease in bias might be outweighed by an increase in variance.
In principle, a test for the hypotheses in (1.3) could be developed using a nonparametric estimate of the mean function $\mu$ to obtain an estimate, say $\hat d_0$, of the distance $d_0$. The null hypothesis in (1.3) is then rejected for large values of $\hat d_0$. However, the distribution of the test statistic will depend in an intricate way on the dependence structure of the non-stationary error process in model (1.1), which is difficult to estimate. To address this problem we will introduce a new concept of self-normalization and construct an (asymptotically) pivotal test statistic for the hypotheses in (1.3). The basic idea of our approach is to permute the data and consider the partial sum process of this permutation, thus taking into account observations over the whole interval rather than only the first observations. The new concept and the asymptotic properties of the standardized statistic can be found in Section 3, while some details on the testing problem and mathematical background on locally stationary processes are introduced in Section 2. In Section 4 we investigate the finite sample properties of the proposed testing procedure by means of a simulation study and provide an application to temperature data. Finally, in Section A, the proofs of the theoretical results in Section 3 are presented.
1.1. Related literature.
Despite its importance, the problem of detecting relevant deviations in a sequence of gradually changing means has only been considered by a few authors. Dette and Wu (2019) investigate a mass excess approach for this problem. More precisely, these authors measure deviations from the benchmark by the Lebesgue measure of the set $\{t \in [0,1] : |\mu(t) - g(\mu)| > \Delta\}$ and test whether this quantity exceeds a certain threshold $c > 0$. Their approach requires estimation of the local long-run variance and a multiplier bootstrap. More recently, Bücher et al. (2020) propose the maximal distance to measure relevant deviations from the benchmark and consider the null hypothesis $H_0 : \sup_{t\in[0,1]} |\mu(t) - g(\mu)| \le \Delta$. While the maximum deviation might be easy to interpret for practitioners, the asymptotic analysis of a corresponding estimate is challenging. In particular, it requires an estimation of the long-run variance and additionally the estimation of the sets where the absolute difference $|\mu(t) - g(\mu)|$ attains its sup-norm. The methodology proposed here avoids the problem of estimating tuning parameters of this type by using an $L^2$-norm in combination with a new concept of self-normalization.
Ratio statistics or self-normalization have been introduced by Horváth et al. (2008) and Shao (2010) in the context of change point detection in stationary processes and avoid a direct estimation of the long-run variance through a convenient rescaling of the test statistic. The currently available self-normalization procedures are based on partial sum processes (see Shao, 2015, for a recent review), which usually (under the assumption of stationarity) have a limiting process of the form $\{\sigma W(\lambda)\}_{\lambda \in [0,1]}$, where $\{W(\lambda)\}_{\lambda \in [0,1]}$ is a known stochastic process and $\sigma$ an unknown factor encapsulating the dependency structure of the underlying process. In this case the factorisation of the limit into the long-run variance and a probabilistic term is used to construct a pivotal test statistic by forming a ratio such that the factor $\sigma$ in the numerator and denominator cancels. However, in the case of non-stationarity the situation is more complicated, because the limiting process is of the form $\{\int_0^\lambda \sigma(u)\, dW(u)\}_{\lambda \in [0,1]}$, so that the probabilistic part and the part representing the dependence structure cannot be separated. Zhao and Li (2013) and Rho and Shao (2015) do in fact discuss these problems in the context of locally stationary time series, but the proposed self-normalizations need to be combined with a wild bootstrap. In this paper, we present a full self-normalization procedure for non-stationary time series, which might also be useful for testing classical hypotheses.

Relevant deviations in a sequence of gradually changing means.
Recall the definition of model (1.1) and the hypotheses (1.3). Different benchmarks may be of interest in applications. For example, if one is interested in deviations from the value of the mean function at a given time, say $t \in [0,1]$, one could choose $g(\mu) = \mu(t)$, while relevant deviations from an average over a certain time period are obtained for the choice (1.2). In particular, if $t_0$, $t_1$ and $\tau$ are chosen as 0, 1 and the Lebesgue measure, respectively, one compares the local mean $\mu(x)$ with the overall mean $g(\mu) = \bar\mu = \int_0^1 \mu(y)\, dy$, and the hypotheses in (1.3) read as follows:
$$H_0 : \|\mu - \bar\mu\|_{2,\tau} \le \Delta \quad \text{versus} \quad H_1 : \|\mu - \bar\mu\|_{2,\tau} > \Delta.$$
The tests which will be developed in this paper are based on an appropriate estimate of the quantity
$$d_0 = \|\mu - g(\mu)\|_{2,\tau} = \Big( \int_0^1 \big( \mu(t) - g(\mu) \big)^2 \, \tau(dt) \Big)^{1/2}, \tag{2.1}$$
for which we require precise estimates of the mean function $\mu$ and the benchmark $g(\mu)$.
Note that the measure τ in (2.1) is chosen by the statistician and therefore known.
Throughout this paper, we assume that $\tau$ is absolutely continuous with respect to the Lebesgue measure and has a piecewise continuous density, say $f_\tau$. Further, we assume that the mean function $\mu$ is sufficiently smooth, as specified in Assumption 2.1.
A natural idea for the construction of a test of the hypotheses (1.3) is to estimate the $L^2$-distance $d_0$ as defined in (2.1) and to reject the null hypothesis for large values of the corresponding estimate. For this purpose one can use the local linear estimator, which is defined as the first coordinate of the vector
$$\big(\hat\mu_{h_n}(t), \hat\mu'_{h_n}(t)\big) = \operatorname*{arg\,min}_{b_0, b_1 \in \mathbb R} \sum_{i=1}^n \big( X_{i,n} - b_0 - b_1 (i/n - t) \big)^2 K_{h_n}(i/n - t), \tag{2.2}$$
to estimate the mean function $\mu$ locally (see, for example, Fan and Gijbels, 1996). In order to reduce the bias we consider the Jackknife estimator
$$\check\mu_{h_n}(t) = 2\, \hat\mu_{h_n/\sqrt 2}(t) - \hat\mu_{h_n}(t) \tag{2.3}$$
as proposed by Schucany and Sommers (1977) and obtain an estimate $\check g_n = g(\check\mu_{h_n})$ of the benchmark $g(\mu)$ (other estimates could be used as well). Here $h_n$ is a positive bandwidth satisfying $h_n = o(1)$ as $n \to \infty$, $K_h(\cdot) = K(\cdot/h)$, and $K$ denotes a kernel function satisfying Assumption 2.2. The estimate of $d_0$ can then be defined as
$$\hat d_{2,n} = \|\check\mu_{h_n} - \check g_n\|_{2,\tau} = \Big( \int_0^1 \big( \check\mu_{h_n}(t) - \check g_n \big)^2 \, \tau(dt) \Big)^{1/2}. \tag{2.4}$$
To study the asymptotic properties of the statistic defined in (2.4) and alternative estimates proposed in this paper (see Section 3 for more details) we require several assumptions regarding the dependency structure of the error process in model (1.1), which will be discussed next.
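To fix ideas, the following Python sketch implements the local linear fit and the jackknife combination. It is a minimal illustration under our reconstruction of (2.2) and (2.3): all function names are ours, and the guard against nearly singular weight matrices is an implementation choice rather than part of the method.

```python
import numpy as np

def quartic_kernel(x):
    """Quartic kernel K(x) = 15/16 (1 - x^2)^2 on [-1, 1]."""
    return np.where(np.abs(x) <= 1, 15 / 16 * (1 - x ** 2) ** 2, 0.0)

def local_linear(x, y, t, h, kernel=quartic_kernel):
    """First coordinate of the weighted least squares problem (2.2):
    responses y observed at design points x, evaluated at the point t."""
    u = x - t
    w = kernel(u / h)
    s0, s1, s2 = w.sum(), (w * u).sum(), (w * u ** 2).sum()
    t0, t1 = (w * y).sum(), (w * u * y).sum()
    den = s0 * s2 - s1 ** 2
    if den <= 1e-12:       # smoothing window contains too few observations
        return np.nan
    return (s2 * t0 - s1 * t1) / den

def jackknife(x, y, t, h):
    """Bias-corrected estimator (2.3), combining bandwidths h/sqrt(2) and h."""
    return 2 * local_linear(x, y, t, h / np.sqrt(2)) - local_linear(x, y, t, h)

# Usage: with X observed at design points i/n, estimate mu at t = 0.3:
# mu_hat = jackknife(np.arange(1, n + 1) / n, X, 0.3, 0.1)
```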

Locally stationary processes.
For the proofs of our main results we require several assumptions on the dependence structure of the non-stationary time series defined in (1.1). In the following, we work with the notion of local stationarity as introduced by Zhou and Wu (2009). To be precise, let $\eta = (\eta_i)_{i\in\mathbb Z}$ be a sequence of independent identically distributed random variables and let $\eta' = (\eta'_i)_{i\in\mathbb Z}$ be an independent copy of $\eta$. Further, define $\mathcal F_i = (\eta_k : k \le i)$ and $\mathcal F_i^* = (\ldots, \eta_{-2}, \eta_{-1}, \eta'_0, \eta_1, \ldots, \eta_i)$, where $\eta_0$ is replaced by its independent copy $\eta'_0$. Let $G : [0,1] \times \mathbb R^\infty \to \mathbb R$ denote a filter such that $G(t, \mathcal F_i)$ is a properly defined random variable for all $t \in [0,1]$.
A triangular array $\{(\varepsilon_{i,n})_{1\le i\le n}\}_{n\in\mathbb N}$ is called locally stationary if there exists a filter $G$, continuous in its first argument, such that $\varepsilon_{i,n} = G(i/n, \mathcal F_i)$ for all $i \in \{1,\ldots,n\}$, $n \in \mathbb N$. The physical dependence measure of a filter $G$ with $\sup_{t\in[0,1]} \|G(t, \mathcal F_i)\|_{q,\Omega} < \infty$ with respect to $\|\cdot\|_{q,\Omega}$ is defined by
$$\delta_q(G, i) = \sup_{t\in[0,1]} \big\| G(t, \mathcal F_i) - G(t, \mathcal F_i^*) \big\|_{q,\Omega}.$$
The filter $G$ models the non-stationarity of $(\varepsilon_{i,n})$; the quantity $\delta_q(G, i)$ measures the dependence of $(\varepsilon_{i,n})$ and plays a similar role as mixing coefficients. We now state some assumptions regarding the error terms in model (1.1).
Assumption 2.3.
(2) The filter $G$ is Lipschitz continuous with respect to $\|\cdot\|_{4,\Omega}$, that is, there exists a constant $C$ such that $\|G(t, \mathcal F_0) - G(s, \mathcal F_0)\|_{4,\Omega} \le C |t - s|$ for all $s, t \in [0,1]$.
(3) The (local) long-run variance of $G$ is defined as
$$\sigma^2(t) = \sum_{i \in \mathbb Z} \operatorname{Cov}\big( G(t, \mathcal F_0), G(t, \mathcal F_i) \big). \tag{2.5}$$
(4) The moments of order 8 are uniformly bounded, i.e. $\max_{1 \le i \le n} \mathbb E \varepsilon_{i,n}^8 < \infty$.
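As an illustration of the filter notation (an arbitrary example, not a model used in this paper), the following sketch simulates a locally stationary process where $G(t, \mathcal F_i)$ is a time-varying AR(1) filter; for $\sup_t |a(t)| < 1$ its physical dependence measure decays geometrically.

```python
import numpy as np

def tv_ar1(n, a=lambda t: 0.4 * np.cos(2 * np.pi * t), rng=0):
    """Simulate a locally stationary array eps_{i,n} = G(i/n, F_i), where
    G(t, F_i) = sum_{k>=0} a(t)^k eta_{i-k} is the filter of a time-varying
    AR(1) model; the recursion below realises it up to O(1/n) terms.
    For sup_t |a(t)| < 1, delta_q(G, i) decays geometrically in i."""
    rng = np.random.default_rng(rng)
    eta = rng.standard_normal(n)
    eps = np.zeros(n)
    eps[0] = eta[0]
    for i in range(1, n):
        eps[i] = a(i / n) * eps[i - 1] + eta[i]
    return eps
```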

Testing for relevant differences: the problem of estimating the variance.
Continuing the discussion in Section 2.1, it follows from the results given in Section 3 that the estimator (2.4) is asymptotically normally distributed if Assumptions 2.1, 2.2, 2.3 and an additional assumption on the consistency of the statistic $\check g_n$ are satisfied. More precisely, it can be shown (see Remark 3.7) that
$$\sqrt n \, \big( \hat d_{2,n}^{\,2} - d_0^2 \big) \rightsquigarrow \mathcal N \Big( 0, \int_0^1 \sigma^2(t)\, d_\omega^2(t)\, dt \Big), \tag{2.6}$$
where the symbol $\rightsquigarrow$ denotes weak convergence, $\sigma^2(\cdot)$ is the local long-run variance defined in (2.5) and $d_\omega(\cdot)$ denotes an unknown function that depends on the function $\mu$ and the error process. In principle, if $\hat\sigma_n^2$ and $\hat d_\omega^2$ are estimators of the local long-run variance and the function $d_\omega$, respectively, a reasonable strategy would be to reject the null hypothesis in (1.3) if
$$\sqrt n \, \big( \hat d_{2,n}^{\,2} - \Delta^2 \big) > z_{1-\alpha} \Big( \int_0^1 \hat\sigma_n^2(t)\, \hat d_\omega^2(t)\, dt \Big)^{1/2}, \tag{2.7}$$
where $z_{1-\alpha}$ denotes the $(1-\alpha)$-quantile of the standard normal distribution. It will be shown in Remark 3.7 below that this decision rule provides a consistent and asymptotic level $\alpha$ test for the hypotheses in (1.3). However, it turns out that this decision rule does not provide a stable test, because local estimators of the long-run variance have a rather large variability.
In order to avoid the intricate estimation of the local long-run variance, we will re-define the local linear estimator in (2.2) by permuting the data and considering the partial sum process of the new estimators in the following section. This approach will enable us to construct an (asymptotically) pivotal test statistic for the hypotheses in (1.3).

A pivotal test statistic
3.1. Self-normalization. A common technique to avoid estimating the long-run variance is the use of ratio statistics or self-normalization, as first introduced by Horváth et al. (2008) and Shao (2010), which are based on a convenient rescaling of the test statistic. However, these concepts are not easy to transfer to non-stationary time series, as they rely on the asymptotic properties of a corresponding partial sum process. To illustrate the problems of these concepts in non-stationary time series, consider the simplest case of model (1.1), where the mean function is constant and the error process is stationary. In this case the estimate of the constant mean function $\mu$ from the partial sample $X_{1,n}, \ldots, X_{\lfloor \lambda n \rfloor, n}$ is its mean, and under the assumptions stated in Section 2 we have the weak convergence
$$\Big\{ \frac{1}{\sqrt n} \sum_{i=1}^{\lfloor \lambda n \rfloor} (X_{i,n} - \mu) \Big\}_{\lambda\in[0,1]} \rightsquigarrow \{ \sigma W(\lambda) \}_{\lambda\in[0,1]},$$
where $\{W(\lambda)\}_{\lambda\in[0,1]}$ denotes a standard Brownian motion and the long-run variance $\sigma^2$ is defined in (2.5) and does not depend on $t$ (because of the stationarity assumption). In this case, the factorisation of the limit into the long-run variance and a probabilistic term is used to construct a test statistic in the form of a ratio, such that $\sigma$ occurs in the numerator and the denominator and therefore cancels out. On the other hand, if the error process in model (1.1) is non-stationary (but the mean function is still constant), we have the weak convergence
$$\Big\{ \frac{1}{\sqrt n} \sum_{i=1}^{\lfloor \lambda n \rfloor} (X_{i,n} - \mu) \Big\}_{\lambda\in[0,1]} \rightsquigarrow \Big\{ \int_0^\lambda \sigma(u)\, dW(u) \Big\}_{\lambda\in[0,1]}.$$
In this case, the limiting distribution does not factorise and it is no longer possible to use the common self-normalization approach. Zhao and Li (2013) and Rho and Shao (2015) discuss locally stationary time series, but the proposed self-normalization procedures have to be combined with a wild bootstrap.
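To make the cancellation explicit in the stationary case, write $U_n(\lambda) = n^{-1/2} \sum_{i=1}^{\lfloor \lambda n \rfloor} (X_{i,n} - \mu)$ and normalize by a functional of the same process (one simple variant of the constructions in Shao (2010), used here only for illustration); the continuous mapping theorem then gives
\[
\frac{U_n(1)}{\big( \int_0^1 (U_n(\lambda) - \lambda U_n(1))^2 \, d\lambda \big)^{1/2}}
\;\rightsquigarrow\;
\frac{\sigma W(1)}{\big( \int_0^1 \sigma^2 (W(\lambda) - \lambda W(1))^2 \, d\lambda \big)^{1/2}}
\;=\;
\frac{W(1)}{\big( \int_0^1 (W(\lambda) - \lambda W(1))^2 \, d\lambda \big)^{1/2}},
\]
so that the unknown factor $\sigma$ cancels. In the non-stationary case the limit is $\int_0^\lambda \sigma(u)\, dW(u)$, the factor $\sigma(\cdot)$ cannot be pulled out of the stochastic integral, and the same ratio is no longer pivotal.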
In this work, we present an alternative self-normalization procedure for non-stationary time series which does not require resampling to obtain (asymptotically) pivotal statistics. Our approach is based on the idea that in a locally stationary setting, observations from the whole interval [0, 1] need to be taken into account. Therefore, let $b_n$ denote a sequence with $b_n \to \infty$ and $b_n/n \to 0$ as $n \to \infty$, and let $n' = n/b_n$ (assumed to be an integer for simplicity). We define a (fixed) permutation $T$ of the set $\{1, \ldots, n\}$ by
$$T_{(r-1) n' + j} = (j - 1)\, b_n + r, \qquad r = 1, \ldots, b_n, \; j = 1, \ldots, n'.$$
Roughly speaking, the mapping $T$ splits the set $\{1, \ldots, n\}$ into $n'$ blocks with block length $b_n$, that is
$$\begin{pmatrix} 1 & b_n + 1 & \cdots & (n' - 1) b_n + 1 \\ 2 & b_n + 2 & \cdots & (n' - 1) b_n + 2 \\ \vdots & \vdots & & \vdots \\ b_n & 2 b_n & \cdots & n \end{pmatrix},$$
where the blocks correspond to the columns in the above display, and then traverses this array row by row, so that any initial segment $T_1, \ldots, T_{\lfloor \lambda n \rfloor}$ contains observations from all blocks and hence from the whole interval [0, 1].
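A minimal sketch of this permutation (0-indexed and assuming for simplicity that $b_n$ divides $n$; the helper name is ours):

```python
import numpy as np

def block_permutation(n, b):
    """Permutation T of {0, ..., n-1}: arrange the indices into n' = n // b
    blocks (columns) of length b and read the array row by row, so that the
    first n' entries visit every block once. Assumes b divides n."""
    n_blocks = n // b
    return np.arange(n).reshape(n_blocks, b).T.ravel()

# Example with n = 12, b = 3: the blocks are {0,1,2}, {3,4,5}, {6,7,8},
# {9,10,11} and T = [0, 3, 6, 9, 1, 4, 7, 10, 2, 5, 8, 11].
```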
With this notation, for $\zeta > 0$ and $\lambda \in [\zeta, 1]$, we define the sequential local linear estimator of the mean function $\mu$ from the sample $X_{T_1,n}, \ldots, X_{T_{\lfloor \lambda n \rfloor},n}$ as the first coordinate of the vector
$$\big( \hat\mu_{h_n}(\lambda, t), \hat\mu'_{h_n}(\lambda, t) \big) = \operatorname*{arg\,min}_{b_0, b_1 \in \mathbb R} \sum_{i=1}^{\lfloor \lambda n \rfloor} \big( X_{T_i,n} - b_0 - b_1 (T_i/n - t) \big)^2 K_{h_n}(T_i/n - t). \tag{3.1}$$
In the following we will work with a bias-corrected version of $\hat\mu_{h_n}(\lambda, t)$ and consider the sequential Jackknife estimator
$$\check\mu_{h_n}(\lambda, t) = 2\, \hat\mu_{h_n/\sqrt 2}(\lambda, t) - \hat\mu_{h_n}(\lambda, t). \tag{3.2}$$
Writing $d(t) = \mu(t) - g(\mu)$, we can rewrite the distance in (2.1) as $d_0 = \|d\|_{2,\tau}$. In order to estimate $d_0$, let $\hat g_n(\lambda)$ be a suitable sequential estimator of the benchmark $g(\mu)$ from the sample $X_{T_1,n}, \ldots, X_{T_{\lfloor \lambda n \rfloor},n}$ and define
$$\hat d_{2,n}(\lambda) = \big\| \check\mu_{h_n}(\lambda, \cdot) - \hat g_n(\lambda) \big\|_{2,\tau}. \tag{3.3}$$
Note that all estimates are calculated from a part of the permuted sample and that the statistic $\hat d_{2,n}(1)$ estimates $d_0$ from the full sample $X_{1,n}, \ldots, X_{n,n}$ and therefore coincides with the estimator defined in (2.4). For the proofs of our main results we need an assumption regarding the precision of the estimator $\hat g_n(\cdot)$ of the benchmark, as well as conditions on the bandwidth $h_n$ and the block length $b_n$, which are given next.
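Before turning to these assumptions, the following sketch illustrates how $\hat d_{2,n}(\lambda)$ can be computed when $\tau$ is the Lebesgue measure, reusing the `jackknife` helper and the permutation sketched above; the grid-based integration and the argument `g_hat` (the sequential benchmark estimate, whose form depends on the functional $g$) are our simplifications.

```python
import numpy as np

def sequential_distance(X, T, lam, h, g_hat, grid_size=200):
    """Sketch of the sequential estimator in (3.3) for tau equal to the
    Lebesgue measure: fit the jackknife estimator (3.2) on the first
    floor(lam * n) permuted observations and integrate numerically."""
    n = len(X)
    idx = T[: int(np.floor(lam * n))]          # initial permuted segment
    design, data = (idx + 1) / n, X[idx]       # design points T_i / n
    t_grid = np.linspace(h, 1 - h, grid_size)  # stay away from the boundary
    mu_hat = np.array([jackknife(design, data, t, h) for t in t_grid])
    return np.sqrt(np.trapz((mu_hat - g_hat) ** 2, t_grid))

# Usage: T = block_permutation(len(X), b_n)
#        d_hat = sequential_distance(X, T, 0.5, h_n, g_hat)
```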

Assumption 3.2.
There exist constants $\alpha, \beta > 0$ such that the sequence of bandwidths satisfies $h_n \asymp n^{-\alpha}$ and the sequence of block lengths satisfies $b_n \asymp n^{\beta}$.

Remark 3.3.
(1) Assumption 3.1 is rather mild and satisfied for many functionals, as explained below. Proofs of the following statements can be found in Section A.3 of the Appendix. The assumption holds for $g(\mu) = c$ with some known $c \in \mathbb R$ and the estimator $\hat g_n(\lambda) = g(\check\mu_{h_n}(\lambda, \cdot))$. Moreover, Assumption 3.1 is satisfied for the functional defined in (1.2) and the estimator $\hat g_n(\lambda) = g(\check\mu_{h_n}(\lambda, \cdot))$. More generally, if there exists a continuous function $h_g$ in the equivalence class corresponding to $\bar h_g$, the estimator $\hat g_n(\lambda) = g(\check\mu_{h_n}(\lambda, \cdot))$ satisfies Assumption 3.1.
(2) For $g(\mu) = \mu(t)$ with some fixed $t \in [0,1]$, the benchmark has to be estimated locally and Assumption 3.1 is not satisfied. Nevertheless, a corresponding pivotal test can be developed as well; see Remark 3.5 for more details.
Remark 3.5. If $g(\mu) = \mu(t)$ for some fixed $t \in [0,1]$, the benchmark $g(\mu)$ needs to be estimated locally and there is no estimator satisfying Assumption 3.1. However, a result analogous to Theorem 3.4 can be shown with the same arguments as given in the proof of the latter theorem. More precisely, if $g(\mu) = \mu(t)$ we can use $\hat g_n(\lambda) = \check\mu_{h_n}(\lambda, t)$, and under Assumptions 2.1, 2.2, 2.3 and 3.2 the corresponding process converges weakly, where the limit now involves the moments $\int x^j K^*(x)\, dx$ for $j \in \{0, 1, 2\}$ and boundary corrections for $t \in \{0, 1\}$, and $K^*$ is the kernel corresponding to the Jackknife estimator (3.2), defined by
$$K^*(x) = 2 \sqrt 2\, K(\sqrt 2\, x) - K(x).$$

In the following, we will develop a pivotal test for the hypotheses (1.3) on the basis of Theorem 3.4 or Remark 3.5. For this purpose let $\nu$ be a probability measure on the interval $[\zeta, 1]$ with $\nu(\{1\}) = 0$. We propose to reject the null hypothesis if
$$\frac{\hat d_{2,n}^{\,2}(1) - \Delta^2}{\Big( \int_\zeta^1 \lambda^2 \big( \hat d_{2,n}^{\,2}(\lambda) - \hat d_{2,n}^{\,2}(1) \big)^2 \, \nu(d\lambda) \Big)^{1/2}} > q_{1-\alpha}, \tag{3.6}$$
where $q_{1-\alpha}$ denotes the $(1-\alpha)$-quantile of the distribution of the random variable
$$\mathcal W = \frac{W(1)}{\Big( \int_\zeta^1 \big( W(\lambda) - \lambda W(1) \big)^2 \, \nu(d\lambda) \Big)^{1/2}}$$
(a simulation sketch for this quantile is given below).

Corollary 3.6. Let the assumptions of either Theorem 3.4 or of Remark 3.5 be satisfied. If $\Delta > 0$, the decision rule (3.6) defines a consistent and asymptotic level $\alpha$ test for the hypotheses (1.3) of a relevant deviation of the mean function $\mu$ from the benchmark $g(\mu)$, that is
$$\lim_{n\to\infty} \mathbb P\big( \text{reject } H_0 \big) = \begin{cases} 0 & \text{if } d_0 < \Delta, \\ \alpha & \text{if } d_0 = \Delta, \\ 1 & \text{if } d_0 > \Delta. \end{cases}$$

Remark 3.7. Note that the Jackknife estimator defined in (2.3) coincides with $\check\mu_{h_n}(1, \cdot)$ and $\check g_n = g(\check\mu_{h_n}(1, \cdot))$. Consequently, the continuous mapping theorem and Theorem 3.4 yield the weak convergence stated in equation (2.6) of Section 2.3. If, in addition, the estimators $\hat\sigma_n^2$ and $\hat d_\omega^2$ are consistent, the decision rule in (2.7) defines a consistent and asymptotic level $\alpha$ test for the hypotheses (1.3) as well.
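Since the limiting random variable is pivotal, the quantile $q_{1-\alpha}$ can be computed once and for all by Monte Carlo simulation. The following sketch does this for $\nu$ equal to the uniform distribution on four points; the discretisation of $W$ and this choice of $\nu$ mirror Section 4 and are our assumptions.

```python
import numpy as np

def sn_quantile(alpha=0.05, nu_points=(0.2, 0.4, 0.6, 0.8),
                n_steps=1000, n_sim=50_000, rng=0):
    """Monte Carlo approximation of the (1 - alpha)-quantile of
    W(1) / ( mean_j (W(l_j) - l_j W(1))^2 )^{1/2}, the pivotal limit of the
    self-normalized statistic in (3.6) when nu is the uniform distribution
    on the points l_j in nu_points."""
    rng = np.random.default_rng(rng)
    stats = np.empty(n_sim)
    for s in range(n_sim):
        # discretised Brownian motion on {1/n_steps, ..., 1}
        W = np.cumsum(rng.standard_normal(n_steps)) / np.sqrt(n_steps)
        W1 = W[-1]
        bridge = np.array([W[int(l * n_steps) - 1] - l * W1
                           for l in nu_points])
        stats[s] = W1 / np.sqrt(np.mean(bridge ** 2))
    return np.quantile(stats, 1 - alpha)
```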

Monte Carlo simulation study.
A large-scale Monte Carlo simulation study was performed to analyse the finite-sample properties of the proposed test (3.6). The local linear estimator in (3.1) requires the specification of the kernel $K$ and the bandwidth $h_n$. We used the quartic kernel $K(x) = \frac{15}{16}(1 - x^2)^2\, \mathbf 1\{|x| \le 1\}$, but other kernels yield similar results. The choice of the bandwidth $h_n$ for the estimator $\check\mu_{h_n}$ is crucial to avoid both overfitting and oversmoothing, and we employ the following k-fold cross-validation procedure with k = 10 (as recommended by Hastie et al., 2009, page 242).
(1) Split the observed data randomly into k = 10 sets $S_1, \ldots, S_{10}$ of equal size.
(2) For $h_n = 1/n$ and each set $S_i$, calculate the Jackknife estimator $\check\mu_{h_n}^{(-i)}$ from all observations not contained in $S_i$.
(3) Using the estimators from Step (2), compute the mean squared prediction error
$$\mathrm{MSE}_{h_n} = \frac{1}{n} \sum_{i=1}^{10} \sum_{j \in S_i} \big( X_{j,n} - \check\mu_{h_n}^{(-i)}(j/n) \big)^2.$$
(4) Repeat Steps (2) and (3) for the bandwidths $h_n = 2/n, \ldots, \lfloor n/2 \rfloor / n$.
(5) Choose the bandwidth $h_n$ that minimises the mean squared prediction error $\mathrm{MSE}_{h_n}$ (a code sketch of this procedure is given below).
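A sketch of this selection procedure, reusing the `jackknife` helper from Section 2; starting the candidate grid at $h_n = 2/n$ avoids smoothing windows that contain no observations and is our choice.

```python
import numpy as np

def cv_bandwidth(X, k=10, rng=0):
    """Steps (1)-(5): k-fold cross-validation for the bandwidth h_n."""
    n = len(X)
    rng = np.random.default_rng(rng)
    folds = np.array_split(rng.permutation(n), k)            # step (1)
    design = np.arange(1, n + 1) / n
    best_h, best_mse = None, np.inf
    for j in range(2, n // 2 + 1):                           # steps (2), (4)
        h = j / n
        mse = 0.0
        for fold in folds:
            train = np.setdiff1d(np.arange(n), fold)
            pred = np.array([jackknife(design[train], X[train],
                                       design[i], h) for i in fold])
            mse += np.sum((X[fold] - pred) ** 2)             # step (3)
        mse /= n
        if not np.isnan(mse) and mse < best_mse:             # step (5)
            best_h, best_mse = h, mse
    return best_h
```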
As block length we chose $b_n = 20$, and as measure $\nu$ in (3.6) we used the uniform distribution on the set $\{1/5, 2/5, 3/5, 4/5\}$. Preliminary simulation studies showed that different choices of $b_n$ and the measure $\nu$ lead to similar results.
We considered two types of mean functions $\mu$, three different error processes and four different choices of the time-dependent variance. The first class of models is based on the mean function
$$\mu_a^{(1)}(x) = 10 + \tfrac12 \sin(8\pi x) + a \big( x - \tfrac14 \big)^2 \, \mathbf 1\big\{ x > \tfrac14 \big\}, \tag{4.1}$$
which is displayed in the left part of Figure 1 for various choices of the parameter $a$. We considered the testing problem
$$H_0 : \big\| \mu_a^{(1)} - g(\mu_a^{(1)}) \big\|_{2,\tau} \le \tfrac12 \quad \text{versus} \quad H_1 : \big\| \mu_a^{(1)} - g(\mu_a^{(1)}) \big\|_{2,\tau} > \tfrac12, \tag{4.2}$$
where $g(\mu) = 2 \int_0^{1/2} \mu(t)\, dt$ is the average of the mean function over the period [0, 1/2] and $\tau(\cdot) = 2\lambda_{[1/2,1]}(\cdot)$ is twice the Lebesgue measure on the interval [1/2, 1] (that is, the uniform distribution on this interval). Such a scenario might, for instance, be of interest in the context of analyzing climate data, where measurements from a recent period are compared with an average from previous years.

Note that $\|\mu_a^{(1)} - g(\mu_a^{(1)})\|_{2,\tau} = 1/2$ for $a^* \approx 1.43$. We call this situation (i.e., when there is equality in (4.2)) the boundary of the hypotheses. For $a < a^*$ the null hypothesis in (4.2) is satisfied, while for $a > a^*$ the alternative holds.
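The boundary value can be verified numerically. The following sketch computes $d_0(a)$ under the reconstruction of (4.2) given above (benchmark $g(\mu) = 2\int_0^{1/2} \mu(t)\, dt$ and $\tau$ uniform on [1/2, 1]); if the original benchmark differs, the constants change accordingly.

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq

def d0(a):
    """d_0(a) = || mu_a^(1) - g(mu_a^(1)) ||_{2,tau} for the model (4.1)."""
    mu = lambda x: (10 + 0.5 * np.sin(8 * np.pi * x)
                    + a * (x - 0.25) ** 2 * (x > 0.25))
    # benchmark: average of mu over [0, 1/2], split at the kink x = 1/4
    g = 2 * (quad(mu, 0, 0.25)[0] + quad(mu, 0.25, 0.5)[0])
    # tau has density 2 on [1/2, 1]
    return np.sqrt(2 * quad(lambda x: (mu(x) - g) ** 2, 0.5, 1)[0])

a_star = brentq(lambda a: d0(a) - 0.5, 0.0, 3.0)
print(round(a_star, 2))   # prints 1.43
```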
The second model is based on the mean function $\mu^{(2)}$ defined in (4.3), which is displayed in the right part of Figure 1. For models involving this mean function, we considered the testing problem
$$H_0 : \|\mu^{(2)} - 10\|_{2,\tau} \le \Delta \quad \text{versus} \quad H_1 : \|\mu^{(2)} - 10\|_{2,\tau} > \Delta \tag{4.4}$$
for various choices of the threshold $\Delta > 0$, where $g(\mu) \equiv 10$ and $\tau(\cdot) = \lambda_{[0,1]}(\cdot)$ is the Lebesgue measure on the interval [0, 1]. Such a setting might be encountered in quality control, where deviations from a target value might occur gradually due to wear and tear (and eventual failure) of a component of a complex system. Note that $\|\mu^{(2)} - 10\|_{2,\tau} \le \Delta$ for $\Delta \ge 1.392$, whereas $\|\mu^{(2)} - 10\|_{2,\tau} > \Delta$ for $\Delta < 1.392$.
We consider four different choices of the time-dependent variance $\bar\sigma^2(t) = \mathbb E[G^2(t, \mathcal F_0)]$, that is
$$\bar\sigma_0^2(t) = 1, \quad \bar\sigma_1^2(t) = \tfrac12 + t, \quad \bar\sigma_2^2(t) = 1 - \tfrac12 \cos(2\pi t), \quad \bar\sigma_3^2(t) = \tfrac12 + \mathbf 1(t \ge 1/2),$$
and three classes of error processes $\{\varepsilon_{i,n} : 1 \le i \le n\}_{n\in\mathbb N}$ in model (1.1), namely independent, moving-average and autoregressive errors, the latter of the form $\varepsilon_{i,n} = \bar\sigma_k(i/n)\, \eta_i + \varepsilon_{i-1,n}/2$, for $k \in \{0, 1, 2, 3\}$, where $(\eta_i)_{i\in\mathbb Z}$ is an i.i.d. sequence of standard normally distributed random variables (a data-generating sketch is given below). In Table 1 we display the empirical rejection probabilities of the test (3.6) for the mean function $\mu_a^{(1)}$ defined in (4.1), where different choices of $a$ yield different values of $d_0$ in the hypotheses (4.2). In Table 2, on the other hand, the function $\mu^{(2)}$ and therefore the value $d_0$ is fixed and the threshold $\Delta$ in the hypotheses is varied. The lines marked in boldface indicate the boundary of the null hypothesis, that is, the configuration where $d_0 = \Delta$. More precisely, note that the null hypothesis in (4.2) holds if and only if $d_0 \le 0.5$, and we display exemplary results for the cases $d_0 = 0.35$, $0.4$, $0.45$ and $0.5$ in Table 1, where the last case corresponds to the boundary of the null hypothesis. The remaining cases $d_0 = 0.6$, $0.7$ and $0.8$ represent three scenarios of the alternative in (4.2). Similarly, in Table 2 the null hypothesis in (4.4) holds if and only if the threshold satisfies $\Delta \ge 1.39$. We observe in most cases a good approximation of the nominal level at the boundary of the hypotheses, and the test is also able to detect alternatives with reasonable power. These empirical findings correspond to the theoretical results derived in Section 3.
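For completeness, a data-generating sketch for the error processes: the autoregressive recursion follows the display above, while the precise scaling of the independent and moving-average variants is our assumption.

```python
import numpy as np

SIGMA2 = [lambda t: np.ones_like(t),                  # sigma_0^2(t) = 1
          lambda t: 0.5 + t,                          # sigma_1^2
          lambda t: 1 - 0.5 * np.cos(2 * np.pi * t),  # sigma_2^2
          lambda t: 0.5 + (t >= 0.5)]                 # sigma_3^2

def make_errors(n, k=0, kind="ar", rng=0):
    """One path of the error process in model (1.1); `kind` selects the
    independent, moving-average or autoregressive variant."""
    rng = np.random.default_rng(rng)
    t = np.arange(1, n + 1) / n
    s = np.sqrt(SIGMA2[k](t))
    eta = rng.standard_normal(n + 1)
    if kind == "iid":
        return s * eta[1:]
    if kind == "ma":
        return s * (eta[1:] + eta[:-1]) / 2
    eps = np.zeros(n)                                 # kind == "ar"
    for i in range(n):
        eps[i] = s[i] * eta[i + 1] + (eps[i - 1] / 2 if i else 0.0)
    return eps
```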
We conclude this section with a comparison of the new test (3.6) with the test (2.7), which relies on the estimation of the (local) long-run variance. For this purpose we use the long-run variance estimator as proposed in equation (4.7) of , with bandwidths as suggested in this reference. In Tables 3 and 4 we display the rejection probabilities of both tests for some of the models considered in Tables 1 and 2, where we use the Lebesgue measure on the interval [0, 1] for the calculation of the $L^2$-distances and the benchmark is given by $g(\mu) = \int_0^1 \mu(x)\, dx$. For the sake of brevity we restrict ourselves to the sample size n = 500 and the variance function $\bar\sigma_0^2(t) = 1$.

Table 3. Empirical rejection rates of the tests (3.6) and (2.7) for the hypotheses (4.2). The mean function is given by (4.1), where different values of the parameter $a$ and different error processes are considered. The variance is $\bar\sigma_0^2(t) = 1$, the sample size is n = 500 and the line in boldface corresponds to the boundary of the hypotheses.

Table 4. Empirical rejection rates of the tests (3.6) and (2.7) for the hypotheses (4.4). The mean function is given by (4.3), where different values of the threshold $\Delta$ and different error processes are considered. The variance is $\bar\sigma_0^2(t) = 1$, the sample size is n = 500 and the line in boldface corresponds to the boundary of the hypotheses.
We observe that the test (2.7) is conservative at the boundary of the hypotheses. As a consequence the proposed test (3.6) based on self-normalization is usually more powerful.

Case Study.
Time series with a possibly smoothly varying mean naturally arise in the field of meteorology. To illustrate the proposed methodology, we consider the mean of the daily minimal temperatures (in degrees Celsius) over the month of July for a period of approximately 120 years at eight different places in Australia. At each station we tested for relevant deviations of the temperature from the mean temperature calculated for a historic reference period ranging from the late 19th century to 1925 at that station. As thresholds $\Delta$ we chose 0.25, 0.5 and 0.75 degrees Celsius. As examples, the observed temperature curves at the weather stations in Cape Otway, Gayndah and Melbourne and the mean over all weather stations are plotted in Figure 2, together with their estimated smooth mean curves $\check\mu$ and the estimated benchmarks $\hat g$.
The results for all stations under consideration can be found in Table 5. For the test (3.6), most p-values are significant for $\Delta = 0.25$ degrees Celsius, and two further p-values are significant for $\Delta = 0.5$. The test (2.7) does not yield a p-value below 0.05 at any station. The test (3.6) based on the proposed self-normalization procedure thus appears to be more powerful than (2.7), which confirms the numerical findings of the simulation study.

Appendix A: Proofs

If this result is true for $j \in \{0, 1, 2\}$, the proof follows by arguments similar to those used in the proof of Lemma B.1 of . To be precise, note that $S_{n,0}(\lambda, t)\, S_{n,2}(\lambda, t) - S_{n,1}^2(\lambda, t) > 0$ for any $\lambda \in [\zeta, 1]$ and almost every $n \in \mathbb N$. This means that the Hessian matrix $H_f$ is positive definite and both partial derivatives vanish if and only if $(b_0, b_1)$ solves the corresponding normal equations. The statement of Lemma A.1 now follows from the definition of the Jackknife estimator $\check\mu_{h_n}$ in (3.2).
To finish the proof, we now show the remaining estimate (A.3). For this purpose let $\lambda \in [\zeta, 1]$ and define $B(\lambda) = \bigcup_{j=1}^{n} B_j(\lambda)$, where the sets $B_j(\lambda)$ are chosen such that the summands can be controlled on each of them. Note that the limiting behaviour of the weights is governed by the integrals
$$\int_{-t/h_n}^{(1-t)/h_n} x^j K(x)\, dx, \qquad j \in \mathbb N_0.$$

Proof. By similar arguments as given for the approximation in (A.3), the analogous expansions hold for $j \in \{0, 1, 2\}$. Note that $S_{n,0}(\lambda, t)\, S_{n,2}(\lambda, t) - S_{n,1}^2(\lambda, t) > 0$ for any $\lambda \in [\zeta, 1]$ and almost every $n \in \mathbb N$ since, by Assumption 2.2, the kernel $K$ is non-negative and positive on a set of positive Lebesgue measure.

Proof. First observe that
$$G_n(\lambda) = \lambda \sqrt n \Big( \big\| \hat d_n(\lambda, \cdot) - d \big\|_{2,\tau}^2 + 2 \big\langle d,\, \hat d_n(\lambda, \cdot) - d \big\rangle_\tau \Big),$$
and note that, by Assumptions 2.3 and 3.1, the estimator $\hat g_n(\lambda)$ can be replaced by $g(\mu)$ in the following calculations. Recall the definition of the interval $I_n = [h_n, 1 - h_n]$ and denote $\langle f, g \rangle_{I_n} = \int_{I_n} f(x)\, g(x)\, \tau(dx)$ and $\|f\|_{2,I_n} = \langle f, f \rangle_{I_n}^{1/2}$ for any $f, g \in L^2([0,1], \tau)$. The assertion then follows from the statements (A.9) and (A.10). For a proof of (A.9) note that, by the Cramér–Wold device, it is sufficient to prove (A.13). Observe that, by Assumption 3.1, (A.7), (A.12) and (A.13), the corresponding approximation holds uniformly with respect to $\lambda \in [\zeta, 1]$, where the random variables $\tilde\varepsilon_{T_j,n}$ are defined in (A.11) and are $m_n$-dependent in the sense that $\tilde\varepsilon_{T_{j_1},n}$ and $\tilde\varepsilon_{T_{j_2},n}$ are independent if $|T_{j_1} - T_{j_2}| > m_n$. For the estimation of the first term, let $K_{i,j}$ denote the corresponding kernel integral. By the Cauchy–Schwarz inequality and the absolute continuity of $\tau$, $K_{i,j}$ can be bounded from above by $C h_n$. This implies that the expectation of the supremum over $\lambda \in [\zeta, 1]$ is of the required order, where the last estimate follows observing that $\max_{1\le i\le n} \mathbb E \varepsilon_{i,n}^8 < \infty$ by Assumption 2.3 (4) and the fact that only $O(n^4 m_n^4)$ summands of the inner sum are non-zero due to the $m_n$-dependence of the random variables $\tilde\varepsilon_{j,n}$. Thus, by (A.14), the estimate holds uniformly with respect to $\lambda \in [\zeta, 1]$. If $d_0 = 0$, it follows that $G_n(\lambda) = o_P(1)$, and therefore we assume $d_0 > 0$ in the following discussion.
In this case we obtain from (A.8) and (A.13) the corresponding expansion of $G_n$. By the Lipschitz continuity of $d$ and $\operatorname{supp}(K) = [-1, 1]$ the approximation error is negligible, and for any point of continuity $y$ of the piecewise continuous density $f_\tau$ of the measure $\tau$ the corresponding Riemann sums converge. Therefore, for $T_j \in \{2 n h_n, \ldots, n - 2 n h_n\}$, the summands can be expressed in terms of random variables $Y_1, \ldots, Y_n$, which are centred and $m_n$-dependent in the sense that $Y_{j_1}$ and $Y_{j_2}$ are independent if $|T_{j_1} - T_{j_2}| > m_n$. Define the big blocks $B_j = \{k \in \mathbb N : (j-1) b_n + 1 \le k \le j b_n - m_n\}$ and the small blocks $S_j = \{k \in \mathbb N : j b_n - m_n + 1 \le k \le j b_n\}$, for $j = 1, \ldots, n'$, and the remainder $R = \{k \in \mathbb N : n' b_n + 1 \le k \le n\}$. In the following, we will show that the small blocks and the remainder are negligible and that the asymptotic behaviour of $Z_n$ is determined by the big blocks. First observe that $\tilde\varepsilon_{T_{k_1},n}$ with $T_{k_1} \in S_{j_1}$ and $\tilde\varepsilon_{T_{k_2},n}$ with $T_{k_2} \in S_{j_2}$ are independent for $j_1 \neq j_2$; thus (A.20) holds. Further, $T_k \in S_j$ for some $k \le \lfloor \lambda n \rfloor$ if and only if $k = r n' + j$ for some $r \ge b_n - m_n$ with $r \le \frac{\lambda n - j}{n'}$, and we obtain a bound in terms of the covariances $\mathbb E\big[ \tilde\varepsilon_{(j-1) b_n + r_1 + 1, n}\, \tilde\varepsilon_{(j-1) b_n + r_2 + 1, n} \big]$.
For $\lambda < 1$ it holds that $\lfloor \lambda n' \rfloor b_n / n \to \lambda$ and $1 - m_n/b_n \to 1$, so that for almost every $n \in \mathbb N$ we have $b_n - m_n \ge \frac{\lambda n - j}{n'}$. Thus, if $\lambda_{i_1} < 1$ or $\lambda_{i_2} < 1$, the sums indexed by $r_1$ and $r_2$ on the right-hand side of the previous display are empty for almost every $n \in \mathbb N$. For $\lambda = 1$, there are $m_n$ summands in both sums; thus the right-hand side of the previous display is of order $O(m_n^2 / b_n)$, which vanishes by assumption. Hence the small blocks are asymptotically negligible and, analogously, so is the remainder. The sums over the big blocks are independent, and we have, analogously to (A.20),
$$\sum_{j=1}^{n'} \sum_{r_1 \in B_{i_1,j}} \sum_{r_2 \in B_{i_2,j}} \mathbb E\big[ \tilde\varepsilon_{r_1,n}\, \tilde\varepsilon_{r_2,n} \big] = (\lambda_{i_1} \wedge \lambda_{i_2})\, \frac{b_n}{n} \sum_{j=1}^{n'} \sigma^2\Big( \frac{j b_n}{n} \Big) + O\Big( \frac{b_n^3}{n} + b_n \gamma^{b_n} + \Big( 1 + \frac{b_n^2}{n} \Big)^{1/2} m_n^c \Big).$$
Plugging this into (A.21) and observing $n' = n/b_n$ leads to the desired limit. Finally, observe that by Jensen's inequality and Assumption 2.3 (4), there exists a constant $C$ such that $\mathbb E \tilde\varepsilon_{T_k,n}^4 \le C\, b_n^3 / n = O(b_n^3/n)$.