On detecting changes in the jumps of arbitrary size of a time-continuous stochastic process

This paper introduces test and estimation procedures for abrupt and gradual changes in the entire jump behaviour of a discretely observed Itō semimartingale. In contrast to existing work we analyse jumps of arbitrary size which are not restricted to a minimum height. Our methods are based on the weak convergence of a truncated sequential empirical distribution function of the jump characteristic of the underlying Itō semimartingale. Critical values for the new tests are obtained by a multiplier bootstrap approach, and we also investigate the performance of the tests under local alternatives. An extensive simulation study illustrates the finite-sample properties of the new procedures.


Introduction
Stochastic processes are widely used in science nowadays, as they allow for a flexible modelling of time-dependent phenomena. For example, in physics stochastic processes are used to explain the behaviour of quantum systems (see van Kampen, 2007), but they are also well suited for financial modelling. The seminal paper by Delbaen and Schachermayer (1994) suggests to use the special class of Itō semimartingales in continuous time. Financial models based on Itō semimartingales satisfy a certain condition on the absence of arbitrage and are moreover still rich enough to accommodate stylized facts such as volatility clustering, leverage effects and jumps. As a consequence, in recent years much research has focused on the development of statistical procedures for characteristics of Itō semimartingales based on discrete observations. In particular, the importance of the jump component has been reinforced by recent research (see Jacod, 2009a and Jacod, 2009b), and common methods in this field are gathered in the recent monographs by Jacod and Protter (2012) and Aït-Sahalia and Jacod (2014).
A fundamental topic in statistics for stochastic processes is the analysis of structural breaks. Corresponding test procedures, commonly referred to as change point tests, have their origin in quality control (see Page, 1954; Page, 1955), and nowadays these techniques are widely used in many fields of science such as economics (Perron, 2006), finance (Andreou and Ghysels, 2009), climatology (Reeves et al., 2007) and engineering (Stoumbos et al., 2000). The contributions of the present paper to this field of research are new statistical procedures for the detection of changes in the jump behaviour of an Itō semimartingale. In contrast to the existing works Bücher et al. (2017) and Hoffmann et al. (2017), this paper introduces methods of inference on the jump behaviour of the underlying process in general, while in the previously mentioned references the authors restrict the analysis to jumps which exceed a minimum size ε > 0.
Throughout this work we assume that we observe high-frequency data $X_{i\Delta_n}$ $(i = 0, 1, \ldots, n)$ with $\Delta_n \to 0$, where the process $(X_t)_{t\in\mathbb R_+}$ is an Itō semimartingale with the decomposition
$$X_t = X_0 + \int_0^t b_s\,ds + \int_0^t \sigma_s\,dW_s + \int_0^t\int_{\mathbb R} z\,\mathbf 1_{\{|z|\le 1\}}\,(\mu - \bar\mu)(ds,dz) + \int_0^t\int_{\mathbb R} z\,\mathbf 1_{\{|z|> 1\}}\,\mu(ds,dz).$$
Here $W$ is a standard Brownian motion and $\mu$ is a Poisson random measure on $\mathbb R_+\times\mathbb R$ with predictable compensator $\bar\mu$ satisfying $\bar\mu(ds,du) = ds\,\nu_s(du)$. Our approach is completely nonparametric, that is, we only impose structural assumptions on the characteristic triplet $(b_s, \sigma_s, \nu_s)$ of $(X_t)_{t\in\mathbb R_+}$. The crucial quantity here is the transition kernel $\nu_s$, which controls the number and the size of the jumps around time $s \in \mathbb R_+$. Our aim is to test the null hypothesis
$$H_0: \nu_s(dz) = \nu_0(dz) \qquad (1.1)$$
against various alternatives involving the non-constancy of $\nu_s$. In particular, the detection of abrupt changes in a stochastic feature has been discussed extensively in the literature (see Horváth, 2013 and Jandhyala et al., 2013 for an overview in a time series context). The first part of this paper belongs to this area of research and introduces tests for $H_0$ versus alternatives of an abrupt change of the form
$$H_1^{(ab)}: \nu_s^{(n)}(dz) = \mathbf 1_{\{s < n\theta_0\Delta_n\}}\,\nu_1(dz) + \mathbf 1_{\{s \ge n\theta_0\Delta_n\}}\,\nu_2(dz)$$
for some unknown $\theta_0 \in (0,1)$ and two distinct Lévy measures $\nu_1 \neq \nu_2$. Similar to the classical setup of detecting changes in the mean of a time series, the change point can only be defined relative to the length of the data set, which in our case is the time horizon $n\Delta_n$. However, for inference on the jump behaviour the time horizon has to tend to infinity ($n\Delta_n \to \infty$), since there are only finitely many jumps of a certain size on every compact interval. Furthermore, we also discuss how to estimate the unknown change point $\theta_0$ if the alternative $H_1^{(ab)}$ is true. A more difficult problem is the detection of gradual (smooth, continuous) changes in a stochastic feature. As a consequence, the setup in most papers on this topic is restricted to nonparametric location or parametric models with independently distributed observations (see e.g. Bissell, 1984, Gan, 1991, Siegmund and Zhang, 1994, Husková, 1999, Husková and Steinebach, 2002 and Mallik et al., 2013). Gradual changes in a time series context are for instance discussed in Aue and Steinebach (2002) and Vogt and Dette (2015). In the second part of this paper we contribute to this development by introducing new procedures for gradual changes in the kernel $\nu_s$, where we basically test $H_0$ against the general alternative
$$H_1^{(gra)}: \nu_s(dz) \text{ is not Lebesgue-almost everywhere constant in } s \in [0, n\Delta_n].$$
Moreover, we introduce an estimator for the first time point where the jump behaviour deviates from the null hypothesis.
The remainder of the paper is organized as follows: In Section 2 we state the basic assumptions on the characteristics of the underlying process and the observation scheme. Section 3 introduces test and estimation procedures for abrupt changes in the entire jump behaviour by using CUSUM processes. In Section 4 we discuss how to detect and estimate gradual changes in the entire jump behaviour. Section 5 contains an extensive simulation study investigating the finite-sample performance of the new procedures. Finally, all proofs are deferred to Section 6 and the technical appendices A, B and C.

The basic assumptions
In order to accommodate both abrupt and gradual changes in our approach we follow Hoffmann et al. (2017) and assume that there is a driving law behind the evolution of the jump behaviour in time which is common for all $n \in \mathbb N$. That is, we assume that at step $n \in \mathbb N$ we observe an Itō semimartingale $X^{(n)}$ with characteristics $(b_s^{(n)}, \sigma_s^{(n)}, \nu_s^{(n)})$ at the equidistant time points $i\Delta_n$ with $i = 0, 1, \ldots, n$, which satisfies the rescaling assumption
$$\nu_s^{(n)}(dz) = g\Big(\frac{s}{n\Delta_n}, dz\Big) \qquad (2.1)$$
for a transition kernel $g(y,dz)$ from $([0,1], \mathcal B([0,1]))$ into $(\mathbb R, \mathcal B)$, where here and below $\mathcal B(A)$ denotes the trace $\sigma$-algebra on $A \subset \mathbb R$ of the Borel $\sigma$-algebra $\mathcal B$ of $\mathbb R$. In order to detect changes in the jump behaviour of the underlying Itō semimartingale in general, we have to draw inference on the kernel $g(y,B)$ for sets $B \in \mathcal B$ containing the origin. However, $g$ has locally the properties of a Lévy measure. Thus, if we deviate from the (simple) case of finite activity jumps, the total mass of $g$ on every neighbourhood of the origin is infinite and we cannot estimate $g(y,\cdot)$ on sets containing $0$ directly. We address this problem by weighting the kernel $g$ according to an auxiliary function $\rho$; precisely, for change point detection we consider the Lévy distribution function
$$N_\rho(g; \theta, t) = \int_0^\theta \int_{(-\infty, t]} \rho(z)\, g(y, dz)\, dy \qquad (2.2)$$
for $(\theta, t) \in [0,1] \times \mathbb R$, where $\rho$ is chosen appropriately such that the integral is always defined. Under weak conditions on $\rho$, this so-called Lévy distribution function $N_\rho$ determines the entire kernel $g$ and therefore the evolution of the jump behaviour in time. The natural approach to draw inference on $N_\rho$ is the following sequential generalization of an estimator in Nickl et al. (2016):
$$\tilde N_\rho^{(n)}(\theta, t) = \frac{1}{n\Delta_n} \sum_{i=1}^{\lfloor n\theta \rfloor} \rho(\Delta_i^n X^{(n)})\, \mathbf 1_{(-\infty,t]}(\Delta_i^n X^{(n)}),$$
for $(\theta, t) \in [0,1] \times \mathbb R$, where $\Delta_i^n X^{(n)} = X^{(n)}_{i\Delta_n} - X^{(n)}_{(i-1)\Delta_n}$. Using a spectral approach similar to Nickl and Reiß (2012), these authors prove weak convergence of $\sqrt{n\Delta_n}\,\big(\tilde N_\rho^{(n)}(1,t) - N_\rho(g; 1, t)\big)$ in $\ell^\infty(\mathbb R)$ to a tight Gaussian process, but only for Lévy processes without a diffusion component, i.e. in particular for constant $g(y,\cdot) \equiv \nu(\cdot)$. The main difficulty in generalizing this result is the superposition of the small jumps with the roughly fluctuating Brownian component of the process. We solve this problem by a truncation approach, which was originally used by Mancini (2009) to cut off jumps in order to draw inference on the integrated volatility. More precisely, we follow Hoffmann and Vetter (2017) and identify jumps by inverting the truncation technique of Mancini (2009), i.e. all test statistics and estimators investigated below are functionals of the sequential truncated empirical Lévy distribution function
$$N_\rho^{(n)}(\theta, t) = \frac{1}{n\Delta_n} \sum_{i=1}^{\lfloor n\theta \rfloor} \rho(\Delta_i^n X^{(n)})\, \mathbf 1_{(-\infty,t]}(\Delta_i^n X^{(n)})\, \mathbf 1_{\{|\Delta_i^n X^{(n)}| > v_n\}} \qquad (2.3)$$
for some suitable null sequence $v_n \to 0$. As a further improvement on previous studies, we analyse the asymptotic behaviour of our tests under local alternatives. That is, in the rescaling assumption (2.1) we let $g = g^{(n)}$ depend on $n \in \mathbb N$, where there exist transition kernels $g_0, g_1, g_2$ satisfying some additional regularity assumptions such that for each $y \in [0,1]$
$$g^{(n)}(y, dz) = g_0(y, dz) + \frac{1}{\sqrt{n\Delta_n}}\, g_1(y, dz) + R_n(y, dz) \qquad (2.4)$$
and for each $y \in [0,1]$, $B \in \mathcal B$ and $n \in \mathbb N$ the remainder kernel $R_n$ satisfies $R_n(y, B) \le a_n g_2(y, B)$ for a sequence $a_n = o((n\Delta_n)^{-1/2})$ of non-negative real numbers. For constant $g_0(y,\cdot) \equiv \nu_0(\cdot)$, assumption (2.4) is exactly the local alternative where the jump behaviour converges to the null hypothesis $g_0(y,\cdot) \equiv \nu_0(\cdot)$ from the direction defined by $g_1$ at rate $(n\Delta_n)^{-1/2}$.
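To make the estimator (2.3) concrete, the following minimal Python sketch evaluates the sequential truncated empirical Lévy distribution function from a vector of observations; the weight function rho and all parameter values are illustrative assumptions, not prescriptions of the theory.

import numpy as np

def truncated_levy_cdf(X, Delta_n, rho, v_n, theta, t):
    # Sequential truncated empirical Levy distribution function N_rho^(n)(theta, t):
    # sum of rho(increment) over the first floor(n*theta) increments that exceed
    # the truncation level v_n in absolute value and lie in (-infty, t],
    # normalized by the time horizon n*Delta_n.
    inc = np.diff(X)                      # increments Delta_i^n X^(n)
    n = inc.size
    k = int(np.floor(n * theta))
    sel = inc[:k]
    keep = (np.abs(sel) > v_n) & (sel <= t)
    return np.sum(rho(sel[keep])) / (n * Delta_n)

# illustrative choices: rho(z) = min(1, z^2) and v_n = Delta_n^(3/4)
rho = lambda z: np.minimum(1.0, z ** 2)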
In this sense, Theorem 6.1, in which we prove weak convergence of the stochastic process
$$G_\rho^{(n)}(\theta, t) = \sqrt{n\Delta_n}\,\big(N_\rho^{(n)}(\theta, t) - N_\rho(g^{(n)}; \theta, t)\big), \qquad (\theta, t) \in [0,1] \times \mathbb R,$$
to a tight Gaussian process in $\ell^\infty([0,1] \times \mathbb R)$, is a generalization of the results in Hoffmann and Vetter (2017) to sequential processes for time-dependent, variable jump behaviour as in (2.4).
Critical values for the test procedures introduced below, as well as the optimal choice of a regularization parameter of the new estimator for gradual change points, are obtained by a multiplier bootstrap approach. Precisely, Theorem 6.8, in which we prove conditional weak convergence in a suitable sense of a bootstrapped version $\hat G_\rho^{(n)}$ of this process to a Gaussian process, where $(\xi_i)_{i\in\mathbb N}$ is a sequence of i.i.d. multipliers with mean 0 and variance 1, complements the paper Hoffmann and Vetter (2017).
For the rescaling assumptions (2.1) and (2.4) we consider transition kernels $g_i(y, dz)$ from the set $\mathcal G(\beta, p)$, depending on parameters $\beta \in (0,2)$ and $p > 0$. In order to define this set we denote by $\lambda$ the one-dimensional Lebesgue measure defined on the Lebesgue $\sigma$-algebra $\mathcal L^1$ of $\mathbb R$, and we denote by $\lambda_1$ the restriction of $\lambda$ to the trace $\sigma$-algebra $[0,1] \cap \mathcal L^1$.
(2) For $n \in \mathbb N$ let $C_n := \{z \in \mathbb R \mid \tfrac 1 n \le |z| \le n\}$. Then for each $n \in \mathbb N$ there exists a $K_n > 0$ with $h_y(z) \le K_n$ for each $z \in C_n$ and all $y \in [0,1] \setminus L$.
The items above basically say that the densities $h_y$ are bounded by a continuous Lévy density of a Lévy measure which behaves near zero like that of a $\beta$-stable process, while this density has to decay sufficiently fast at infinity. Such conditions are well known in the literature and often used in similar works on high-frequency statistics; see e.g. Aït-Sahalia and Jacod (2009a) or Aït-Sahalia and Jacod (2010). From Assumption 6.12 and Proposition 6.13 in Section 6 it can be seen that it is even possible to work with a wider class of transition kernels $g(y, dz)$ which does not require Lebesgue densities. Nevertheless, we stick to the set $\mathcal G(\beta, p)$ defined above, which is much simpler to interpret. The following example shows that alternatives of abrupt changes in the jump behaviour can be described by transition kernels in the set $\mathcal G(\beta, p)$.
Example 2.2. (abrupt changes) In Section 3 we introduce statistical procedures for inference on abrupt changes in the jump behaviour. In this case the kernel $g_0$ is typically of the form
$$g_0(y, dz) = \mathbf 1_{[0, \theta_0)}(y)\,\nu_1(dz) + \mathbf 1_{[\theta_0, 1]}(y)\,\nu_2(dz) \qquad (2.5)$$
for some $\theta_0 \in (0,1)$ and two Lévy measures $\nu_1, \nu_2 \in \mathcal M(\beta, p)$, where for $\beta \in (0,2)$ and $p > 0$ we denote by $\mathcal M(\beta, p)$ the set of all Lévy measures $\nu$ such that the constant transition kernel $g(y, dz) = \nu(dz)$ belongs to $\mathcal G(\beta, p)$.
The variance gamma process is a common model for the log stock price in finance (see for instance Madan et al. (1998)). Moreover, the Lévy measure of a variance gamma process has the form
$$\nu(dz) = \big(a_1 z^{-1} e^{-b_1 z}\,\mathbf 1_{(0,\infty)}(z) - a_2 z^{-1} e^{b_2 z}\,\mathbf 1_{(-\infty,0)}(z)\big)\, dz$$
for $a_1, a_2, b_1, b_2 > 0$. Thus, the transition kernel $g_0(y, dz)$ belongs to $\mathcal G(\beta, p)$ for all $\beta \in (0,2)$ and $p > 0$ if, similarly to (2.5), $g_0$ is piecewise constant in $y \in [0,1]$ and on the domains of constancy it is equal to the Lévy measure of a variance gamma process. Below we give the main assumptions which are sufficient for the convergence results in this paper.
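Since a variance gamma process is the difference of two independent gamma processes whose Lévy measures are the two summands above, its increments can be sampled exactly; a short sketch with illustrative parameter values (all parameters are assumptions for this example):

import numpy as np

def vg_increments(n, delta, a1, b1, a2, b2, rng=None):
    # Increments of a variance gamma process over time steps of length delta,
    # realized as the difference of two independent gamma processes: a gamma
    # process with Levy density a * z^(-1) * exp(-b*z) on (0, infty) has
    # Gamma(shape=a*delta, scale=1/b)-distributed increments.
    rng = np.random.default_rng() if rng is None else rng
    up = rng.gamma(shape=a1 * delta, scale=1.0 / b1, size=n)
    down = rng.gamma(shape=a2 * delta, scale=1.0 / b2, size=n)
    return up - down

X = np.cumsum(vg_increments(n=10000, delta=0.01, a1=1.0, b1=2.0, a2=1.0, b2=3.0))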
Obviously, the measure $M(dy, dz) := \rho(z)\, g_0(y, dz)\, dy$ is completely determined from knowledge of the entire function $N_\rho(g_0; \cdot, \cdot)$ and does not charge $[0,1] \times \{0\}$. Therefore, due to Assumption 2.3(a3), $\rho(z)^{-1} M(dy, dz) = g_0(y, dz)\, dy$, and consequently the jump behaviour corresponding to $g_0$ is known as well. Furthermore, Assumption 2.3(a4) ensures that a characteristic quantity for a gradual change, which we introduce in Section 4, is zero if and only if the jump behaviour corresponding to $g_0$ is constant in time. All convergence results in this paper also hold without Assumption 2.3(a3) and (a4). Moreover, a function $\rho$ with exponential decay at zero is suitable for any choice of the constants $\beta$ and $\tau$. In practice, however, one would like to work with a polynomial decay at zero, in which case the condition on $p$ comes into play. Here, the smaller the parameter $\beta$, the smaller $p$ can be chosen. For example, for $\beta < 3/5$ and $\tau > 3/35$ even a choice $p < 2$ is possible. Furthermore, it is also important to choose the observation scheme suitably. Obviously, we have $\Delta_n \to 0$ and $n\Delta_n \to \infty$ because of $0 < t_1 < t_2 \le 1$, and a typical choice is $\Delta_n = O(n^{-y})$ and $n^{-y} = O(\Delta_n)$ for some $t_1 < y < t_2$. Finally, Assumption 2.3(c) requires only a bound on the moments of the remaining characteristics and is therefore extremely mild.
In the remaining part of this section we give an example of a kernel $g_0 \in \mathcal G(\beta, p)$ for suitable $\beta, p$ and a function $\rho$ satisfying Assumption 2.3(a2) and (a3).

Statistical inference for abrupt changes
In this section we derive test and estimation procedures for abrupt changes in the jump behaviour of the underlying process, that is, we investigate the situation of Example 2.2. To this end, we test the null hypothesis of no change in the jump behaviour
$H_0$: Assumption 2.3 is satisfied for $g_1 = g_2 = 0$ and there exists a Lévy measure $\nu_0$ such that $g_0(y, dz) = \nu_0(dz)$ for Lebesgue-almost every $y \in [0,1]$
against the alternative that the jump behaviour is constant on two intervals
$H_1$: Assumption 2.3 is satisfied for $g_1 = g_2 = 0$ and there exist some $\theta_0 \in (0,1)$ and two Lévy measures $\nu_1 \neq \nu_2$ such that $g_0$ has the form (2.5).
The corresponding alternative for fixed $t_0 \in \mathbb R$ is given by
$H_1^{(\rho, t_0)}$: Assumption 2.3 is satisfied for $g_1 = g_2 = 0$ and $g_0$ has the form (2.5) with two Lévy measures satisfying $N_\rho(\nu_1; t_0) \neq N_\rho(\nu_2; t_0)$.
Moreover, we investigate the behaviour of the tests introduced in this section under local alternatives which tend to the null hypothesis as $n \to \infty$, that is, under (2.4) with constant $g_0(y, \cdot) \equiv \nu_0(\cdot)$ and a non-vanishing direction $g_1$.

Weak convergence of test statistics
Following Inoue (2001), a suitable approach to construct tests for the hypotheses above is to investigate the convergence behaviour of the CUSUM process
$$T_\rho^{(n)}(\theta, t) = \sqrt{n\Delta_n}\,\Big(N_\rho^{(n)}(\theta, t) - \frac{\lfloor n\theta \rfloor}{n}\, N_\rho^{(n)}(1, t)\Big), \qquad (\theta, t) \in [0,1] \times \mathbb R.$$
The corresponding test rejects the null hypothesis $H_0$ for large values of the Kolmogorov-Smirnov-type statistic
$$T_\rho^{(n)} = \sup_{(\theta, t) \in [0,1] \times \mathbb R} |T_\rho^{(n)}(\theta, t)|.$$
The theorem below establishes functional weak convergence of $T_\rho^{(n)}$ in the general case of local alternatives.
Theorem 3.1. If Assumption 2.3 is satisfied, the process $T_\rho^{(n)}$ converges weakly in $\ell^\infty([0,1] \times \mathbb R)$ to the process $T_\rho + T_{\rho, g_1}$, where the tight mean zero Gaussian process $T_\rho$ has the covariance structure given in (3.3) and the deterministic function $T_{\rho, g_1} \in \ell^\infty([0,1] \times \mathbb R)$ is given by
$$T_{\rho, g_1}(\theta, t) = N_\rho(g_1; \theta, t) - \theta\, N_\rho(g_1; 1, t), \qquad (3.4)$$
where $N_\rho(g_1; \cdot, \cdot)$ is defined in (2.2).
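A minimal Python sketch of how the CUSUM process and the Kolmogorov-Smirnov-type statistic can be evaluated on finite grids (as done in the simulation study of Section 5), assuming the CUSUM form stated above; the grids are assumptions made for the illustration.

import numpy as np

def cusum_sup_statistic(X, Delta_n, rho, v_n, theta_grid, t_grid):
    # sup over (theta, t) of |T_rho^(n)(theta, t)|, where
    # T_rho^(n)(theta, t) = sqrt(n*Delta_n) * (N^(n)(theta, t) - (floor(n*theta)/n) * N^(n)(1, t)).
    inc = np.diff(X)
    n = inc.size
    big = np.abs(inc) > v_n
    stat = 0.0
    for t in t_grid:
        contrib = np.where(big & (inc <= t), rho(inc), 0.0)
        S = np.cumsum(contrib)        # N^(n)(k/n, t) equals S[k-1] / (n*Delta_n)
        for theta in theta_grid:
            k = int(np.floor(n * theta))
            if k == 0:
                continue
            val = abs(S[k - 1] - (k / n) * S[-1]) / np.sqrt(n * Delta_n)
            stat = max(stat, val)
    return stat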
As an immediate consequence of the previous result and the continuous mapping theorem we obtain weak convergence of the statistic $T_\rho^{(n)}$.
Corollary 3.2. If Assumption 2.3 is satisfied, we have $T_\rho^{(n)} \rightsquigarrow \sup_{(\theta,t) \in [0,1] \times \mathbb R} |(T_\rho + T_{\rho, g_1})(\theta, t)|$ in $(\mathbb R, \mathcal B)$, with $T_\rho + T_{\rho, g_1}$ the limit process in Theorem 3.1.
In applications the Lévy measure $\nu_0$, which describes the limiting jump behaviour of the underlying process, is usually unknown. If one is only interested in the detection of changes in the distribution function $N_\rho(\nu_0; t_0)$ for a fixed $t_0 \in \mathbb R$, the processes $(T_\rho^{(n)}(\theta, t_0))_{\theta \in [0,1]}$ converge weakly to a shifted version of a pivotal limit process.
Proposition 3.3 makes this precise, where $K$ denotes a standard Brownian bridge and the deterministic shift function $\bar V$ is expressed in terms of $N_{\rho^2}(\nu_0; \cdot)$ defined in (3.1).
Quantiles of functionals of the limit process $T_\rho + T_{\rho, g_1}$ in Theorem 3.1 are not easily accessible, since the distribution of such functionals usually depends in a complicated way on the unknown quantities $\nu_0$ and $g_1$ in the jump characteristic of the underlying process. In order to obtain reasonable approximations for these quantiles we use a multiplier bootstrap approach. That is, in the following we consider bootstrapped processes $\hat Y_n = \hat Y_n(X_1, \ldots, X_n, \xi_1, \ldots, \xi_n)$, which depend on random variables $X_1, \ldots, X_n$ defined on a probability space $(\Omega_X, \mathcal A_X, P_X)$ and on random weights $\xi_1, \ldots, \xi_n$ which are defined on a distinct probability space $(\Omega_\xi, \mathcal A_\xi, P_\xi)$. Thus, the processes $\hat Y_n$ live on the product space $(\Omega, \mathcal A, P) := (\Omega_X, \mathcal A_X, P_X) \otimes (\Omega_\xi, \mathcal A_\xi, P_\xi)$. Below we use the notion of weak convergence conditional on the sequence $(X_i)_{i \in \mathbb N}$ in probability; it can be found in Kosorok (2008) on pp. 19-20.
Definition 3.4. Let $\hat Y_n = \hat Y_n(X_1, \ldots, X_n; \xi_1, \ldots, \xi_n) : (\Omega, \mathcal A, P) \to \mathbb D$ be a random element taking values in some metric space $\mathbb D$, depending on some random variables $X_1, \ldots, X_n$ and some random weights $\xi_1, \ldots, \xi_n$. Moreover, let $Y$ be a tight, Borel measurable random variable into $\mathbb D$. Then $\hat Y_n$ converges weakly to $Y$ conditional on the data $X_1, X_2, \ldots$ in probability, if and only if
(a) $\sup_{f \in \mathrm{BL}_1(\mathbb D)} |E_\xi f(\hat Y_n) - E f(Y)| \to 0$ in outer probability,
(b) $E_\xi f(\hat Y_n)^* - E_\xi f(\hat Y_n)_* \to 0$ in probability for every $f \in \mathrm{BL}_1(\mathbb D)$.
Here, $E_\xi$ denotes the conditional expectation over the weights $\xi$ given the data $X_1, \ldots, X_n$, whereas $\mathrm{BL}_1(\mathbb D)$ is the space of all real-valued Lipschitz continuous functions $f$ on $\mathbb D$ with sup-norm $\|f\|_{\mathbb D} \le 1$ and Lipschitz constant 1. Here and below we denote the sup-norm of a real-valued function $f$ on a set $M$ by $\|f\|_M$. Furthermore, in item (b), $f(\hat Y_n)^*$ and $f(\hat Y_n)_*$ denote a minimal measurable majorant and a maximal measurable minorant with respect to the joint probability space $(\Omega, \mathcal A, P)$. The type of convergence defined above is denoted by $\hat Y_n \rightsquigarrow_\xi Y$.
(i) Throughout this work all expressions $f(\hat Y_n)$, with a bootstrapped statistic $\hat Y_n$ and a Lipschitz continuous function $f$, are measurable functions of the random weights. For this reason we do not need a measurable majorant or minorant in item (a) of the definition above.
(ii) The implication "(ii) ⇒ (i)" in the proof of Theorem 2.9.6 in Van der Vaart and Wellner (1996) shows that conditional weak convergence $\rightsquigarrow_\xi$ implies unconditional weak convergence with respect to the product measure $P$.
For the results on conditional weak convergence of the bootstrapped processes below we require a rather mild additional assumption on the sequence of multipliers, which is satisfied by many common distributions such as the Gaussian, the Poisson or the binomial distribution.
Assumption 3.6. The sequence $(\xi_i)_{i \in \mathbb N}$ is defined on a distinct probability space from the one generating the data $\{X^{(n)}_{i\Delta_n} \mid i = 0, 1, \ldots, n\}$ as described above, is i.i.d. with mean zero and variance one, and there exists an $M > 0$ such that $E|\xi_1|^m \le m!\, M^m$ for each integer $m \ge 2$.
Reasonable bootstrap counterparts $\hat T_\rho^{(n)}$ of the processes $T_\rho^{(n)}$ are given by
$$\hat T_\rho^{(n)}(\theta, t) = \frac{1}{\sqrt{n\Delta_n}} \Big( \sum_{i=1}^{\lfloor n\theta \rfloor} - \frac{\lfloor n\theta \rfloor}{n} \sum_{i=1}^{n} \Big)\, \xi_i\, \rho(\Delta_i^n X^{(n)})\, \mathbf 1_{(-\infty,t]}(\Delta_i^n X^{(n)})\, \mathbf 1_{\{|\Delta_i^n X^{(n)}| > v_n\}}.$$
In the following theorem we establish conditional weak convergence of $\hat T_\rho^{(n)}$ under the general assumptions of Section 2.
Theorem 3.7. Let Assumption 2.3 be valid and let the multipliers $(\xi_j)_{j \in \mathbb N}$ satisfy Assumption 3.6. Then we have $\hat T_\rho^{(n)} \rightsquigarrow_\xi \hat T_\rho$ in $\ell^\infty([0,1] \times \mathbb R)$, where $\hat T_\rho$ is a tight mean zero Gaussian process whose covariance structure is given in (3.7).
Remark 3.8. The aim of our bootstrap procedure is to mimic the convergence behaviour of $T_\rho^{(n)}$. The covariance function of the limiting process in Theorem 3.7 differs from (3.3), because Theorem 3.7 holds under the general conditions introduced in Assumption 2.3, i.e. for an arbitrary kernel $g_0 \in \mathcal G(\beta, p)$. Under the null hypothesis $H_0$, where we have $g_0(\cdot, dz) = \nu_0(dz)$, the covariance function (3.7) coincides with (3.3).
The limit distribution of the Kolmogorov-Smirnov-type test statistic $T_\rho^{(n)}$ in Corollary 3.2 can be approximated under $H_0$ by the bootstrap statistics in the following corollary, which is an immediate consequence of Proposition 10.7 in Kosorok (2008).
Corollary 3.9. If Assumption 2.3 and Assumption 3.6 are satisfied, we have
$$\sup_{(\theta, t) \in [0,1] \times \mathbb R} |\hat T_\rho^{(n)}(\theta, t)| \rightsquigarrow_\xi \sup_{(\theta, t) \in [0,1] \times \mathbb R} |\hat T_\rho(\theta, t)|,$$
with $\hat T_\rho$ the limit process in Theorem 3.7.

Test procedures for abrupt changes
The weak convergence results of the previous section make it possible to define test procedures for abrupt changes in the jump behaviour of the underlying process based on Lévy distribution functions of type (2.2). In the following let $B \in \mathbb N$ be some large number and let $(\xi^{(b)})_{b=1,\ldots,B}$ be independent vectors of i.i.d. random variables $\xi^{(b)} = (\xi_j^{(b)})_{j=1,\ldots,n}$ with mean zero and variance one which satisfy Assumption 3.6. With $\hat T_{\rho, \xi^{(b)}}^{(n)}(\cdot,\cdot)$ and $\hat T_{\rho, \xi^{(b)}}^{(n)}$ we denote the corresponding bootstrapped process and its supremum statistic, calculated with respect to the data and the $b$-th multiplier sequence $\xi^{(b)}$. For a given level $\alpha \in (0,1)$, we propose to reject $H_0$ in favor of $H_1$ if $T_\rho^{(n)}$ exceeds the empirical $(1-\alpha)$-quantile of the bootstrap sample $\{\hat T_{\rho, \xi^{(b)}}^{(n)} \mid b = 1, \ldots, B\}$, (3.8)
and analogously we reject $H_0$ in favor of the point-wise alternative $H_1^{(\rho, t_0)}$ if the corresponding statistic at $t_0$ exceeds the empirical $(1-\alpha)$-quantile of its bootstrap counterparts. (3.9)
Furthermore, according to Proposition 3.3 we define an exact test procedure, that is, $H_0$ is rejected in favor of the point-wise alternative $H_1^{(\rho, t_0)}$ if the suitably standardized statistic exceeds $q_{1-\alpha}^K$, (3.10)
where $q_{1-\alpha}^K$ is the $(1-\alpha)$-quantile of the Kolmogorov-Smirnov distribution, that is, the distribution of the supremum $\sup_{\theta \in [0,1]} |K(\theta)|$ of a standard Brownian bridge $K$. The following results show the behaviour of the previously introduced tests under the null hypothesis, local alternatives and the alternatives of an abrupt change. In particular, these tests are consistent asymptotic level $\alpha$ tests. First, recall the tight centered Gaussian process $T_\rho$ in $\ell^\infty([0,1] \times \mathbb R)$ with covariance function (3.3), let $L_\rho$ be the distribution function of the supremum variable $\sup_{(\theta,t) \in [0,1] \times \mathbb R} |T_\rho(\theta, t)|$ and let $L_\rho^{(t_0)}$ be the distribution function of $\sup_{\theta \in [0,1]} |T_\rho(\theta, t_0)|$. Furthermore, recall the random variable defined in (3.5) with its deterministic shift function. Then the results on the consistency of the tests are as follows.
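The following sketch implements the decision rule of the global test (3.8), assuming the recentred multiplier form of the bootstrap process stated above; Gaussian multipliers are one admissible choice under Assumption 3.6.

import numpy as np

def bootstrap_cusum_test(X, Delta_n, rho, v_n, theta_grid, t_grid, B=200, alpha=0.05, rng=None):
    # Reject H0 if the CUSUM statistic exceeds the empirical (1 - alpha)-quantile
    # of B multiplier-bootstrap statistics (test (3.8)).
    rng = np.random.default_rng() if rng is None else rng
    inc = np.diff(X)
    n = inc.size
    big = np.abs(inc) > v_n
    # a_i(t) = rho(inc_i) * 1{inc_i <= t} * 1{|inc_i| > v_n}, tabulated on the t-grid
    A = np.array([np.where(big & (inc <= t), rho(inc), 0.0) for t in t_grid])
    ks = [int(np.floor(n * th)) for th in theta_grid if int(np.floor(n * th)) > 0]
    root = np.sqrt(n * Delta_n)

    def sup_stat(S):   # S: cumulative sums of the rows of A (possibly weighted)
        return max(np.max(np.abs(S[:, k - 1] - (k / n) * S[:, -1])) for k in ks) / root

    stat = sup_stat(np.cumsum(A, axis=1))
    boot = np.empty(B)
    for b in range(B):
        xi = rng.standard_normal(n)     # i.i.d. multipliers with mean 0, variance 1
        boot[b] = sup_stat(np.cumsum(xi * A, axis=1))
    return stat > np.quantile(boot, 1.0 - alpha)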
Theorem 3.10 shows that (3.11) holds for each $\alpha \in (0,1)$, and that additionally, if $N_{\rho^2}(\nu_0, t_0) > 0$, then (3.12) and (3.13) hold for all $\alpha \in (0,1)$.
Remark 3.11. According to Corollary 1.3 and Remark 4.1 in Gaenssler et al. (2007) the distribution function $L_\rho$ is continuous on $\mathbb R$ and strictly increasing on $\mathbb R_+$. Thus, (3.11) basically states that under the local alternative, for large $B, n \in \mathbb N$, the probability that the test (3.8) rejects the null hypothesis is approximately equal to the probability that the supremum of the shifted process $T_\rho + T_{\rho, g_1}$ exceeds the $(1-\alpha)$-quantile of the supremum of the non-shifted process $T_\rho$. An analysis of the latter probability, which is beyond the scope of this paper, then shows in which direction, i.e. for which $g_1$, it is harder to distinguish the null hypothesis from the alternative. The assertions (3.12) and (3.13) can be interpreted in the same way.

The argmax-estimators
If one of the aforementioned tests rejects the null hypothesis in favor of an abrupt alternative, the natural question arises of how to estimate the unknown break point $\theta_0$. A typical approach to this estimation problem in change-point analysis is the so-called argmax-estimator, that is, we basically take the argmax of the function $\theta \mapsto \sup_{t \in \mathbb R} |T_\rho^{(n)}(\theta, t)|$ as an estimate for $\theta_0$. Consistency of our estimators follows with the argmax continuous mapping theorem of Kim and Pollard (1990), using the following auxiliary result.
For the test problem $H_0$ versus $H_1$ we consider the estimator
$$\hat\theta_\rho^{(n)} = \operatorname*{arg\,max}_{\theta \in [0,1]} \sup_{t \in \mathbb R} |T_\rho^{(n)}(\theta, t)|,$$
and in the setup $H_0$ versus $H_1^{(\rho, t_0)}$ a suitable estimator for the change point is given by
$$\tilde\theta_\rho^{(n)} = \operatorname*{arg\,max}_{\theta \in [0,1]} |T_\rho^{(n)}(\theta, t_0)|.$$
The following proposition establishes the consistency of these estimators.
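A sketch of the argmax-estimator over a finite θ-grid, reusing the CUSUM quantities from the previous sketches (the grids are assumptions made for the illustration):

import numpy as np

def argmax_change_point(X, Delta_n, rho, v_n, theta_grid, t_grid):
    # Returns the grid point theta maximizing sup_t |T_rho^(n)(theta, t)|.
    inc = np.diff(X)
    n = inc.size
    big = np.abs(inc) > v_n
    A = np.array([np.where(big & (inc <= t), rho(inc), 0.0) for t in t_grid])
    S = np.cumsum(A, axis=1)
    best_theta, best_val = None, -np.inf
    for theta in theta_grid:
        k = int(np.floor(n * theta))
        if k == 0:
            continue
        val = np.max(np.abs(S[:, k - 1] - (k / n) * S[:, -1]))
        if val > best_val:
            best_val, best_theta = val, theta
    return best_theta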
Remark 3.16. For the sake of convenience we have focused on the case of a single break. The results on the tests in Section 3.2 also hold for alternatives with finitely many abrupt changes. Moreover, the estimation methods described above can easily be extended to detect multiple change points by a standard binary segmentation algorithm dating back to Vostrikova (1981).

Statistical inference for gradual changes
As a generalization of Proposition 3.14 one can show that $k_n^{-1/2} T_\rho^{(n)}(\theta, t)$ with $k_n = n\Delta_n$ converges in $\ell^\infty([0,1] \times \mathbb R)$ in outer probability to the function $T_{\rho, g_0}$ defined in (3.4), whenever Assumption 2.3 is satisfied. Thus, under some minor regularity conditions, $\operatorname*{arg\,max}_{\theta \in [0,1]} |T_\rho^{(n)}(\theta, t)|$ is a consistent estimator of $\operatorname*{arg\,max}_{\theta \in [0,1]} |T_{\rho, g_0}(\theta, t)|$. However, if the jump behaviour changes gradually at $\theta_0$, the function $\theta \mapsto |T_{\rho, g_0}(\theta, t)|$ is usually maximal at a point $\theta_1 > \theta_0$. As a consequence, the argmax-estimators investigated in Section 3.3 usually overestimate a change point if the change is not abrupt. Therefore, in this section we introduce test and estimation procedures which are tailored to gradual changes in the entire jump behaviour.

A measure of time variation for the entire jump behaviour
If the jump behaviour is given by (2.1) for some suitable transition kernel $g = g_0$ from $([0,1], \mathcal B([0,1]))$ into $(\mathbb R, \mathcal B)$, we follow Vogt and Dette (2015) and base our analysis of gradual changes on the quantity
$$D_\rho^{(g_0)}(\zeta, \theta, t) = N_\rho(g_0; \zeta, t) - \frac{\zeta}{\theta}\, N_\rho(g_0; \theta, t), \qquad 0 \le \zeta \le \theta \le 1,\ t \in \mathbb R, \qquad (4.1)$$
where $N_\rho(g_0; \cdot, \cdot)$ is defined in (2.2). Here and throughout this paper we use the convention $\frac 0 0 := 1$. We will refer to $D_\rho^{(g_0)}$ as the measure of time variation (with respect to $\rho$) of the entire jump behaviour of the underlying process, because the following lemma shows that $D_\rho^{(g_0)}(\zeta, \theta, t) = 0$ for all $0 \le \zeta \le \theta$ and $t \in \mathbb R$ if and only if the kernel $g_0(\cdot, dz)$ is Lebesgue almost everywhere constant on $[0, \theta]$.
According to the preceding lemma, there exists a (gradual) change in the jump behaviour given by $g_0$ if and only if
$$\sup_{(\zeta, \theta, t) \in \mathcal C \times \mathbb R} |D_\rho^{(g_0)}(\zeta, \theta, t)| > 0, \qquad \mathcal C := \{(\zeta, \theta) \in [0,1]^2 \mid \zeta \le \theta\}. \qquad (4.2)$$
As a consequence, the first point of a change in the jump behaviour is given by
$$\theta_0 := \inf\Big\{\theta \in [0,1] \,\Big|\, \sup_{(\zeta, t) \in [0,\theta] \times \mathbb R} |D_\rho^{(g_0)}(\zeta, \theta, t)| > 0\Big\}, \qquad (4.3)$$
where we set $\inf \emptyset := 1$. We call $\theta_0$ the change point of the jump behaviour of the underlying process. Notice that by the discussion after (4.2) the definition in (4.3) is independent of $\rho$. In Section 4.3 we construct an estimator for $\theta_0$, where we only consider the monotone quantity
$$D_\rho^{(g_0)}(\theta) := \sup_{(\zeta, \theta', t) \in \mathcal C \times \mathbb R,\ \theta' \le \theta} |D_\rho^{(g_0)}(\zeta, \theta', t)|. \qquad (4.4)$$
On the one hand the monotonicity of $D_\rho^{(g_0)}$ simplifies our entire presentation, and on the other hand the first time point where $D_\rho^{(g_0)}$ deviates from 0 is also given by $\theta_0$, so it is equivalent to consider $D_\rho^{(g_0)}$ instead. Our analysis of gradual changes is based on a consistent estimator $D_\rho^{(n)}$ of $D_\rho^{(g_0)}$, which we construct in Section 4.2. Before that we illustrate the quantities introduced in (4.3) and (4.4) in the situations of Example 2.2 and Example 2.5.

The empirical measure of time variation and its convergence behaviour
Suppose we have established that $N_\rho^{(n)}(\cdot, \cdot)$ is a consistent estimator for $N_\rho(g_0; \cdot, \cdot)$. Then, with the set $\mathcal C$ defined in (4.2), it is reasonable to consider
$$D_\rho^{(n)}(\zeta, \theta, t) = N_\rho^{(n)}(\zeta, t) - \frac{\zeta}{\theta}\, N_\rho^{(n)}(\theta, t), \qquad (\zeta, \theta, t) \in \mathcal C \times \mathbb R, \qquad (4.9)$$
as an estimate for the measure of time variation of the entire jump behaviour $D_\rho^{(g_0)}$ defined in (4.1). In the following we want to establish consistency of the empirical measure of time variation $D_\rho^{(n)}$. To be precise, the following two theorems show that the process
$$H_\rho^{(n)}(\zeta, \theta, t) = \sqrt{n\Delta_n}\,\big(D_\rho^{(n)}(\zeta, \theta, t) - D_\rho^{(g_0)}(\zeta, \theta, t)\big) \qquad (4.10)$$
and its bootstrapped counterpart converge weakly, respectively weakly conditional on the data in probability, to a suitable tight mean zero Gaussian process.
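Based on (4.9), the empirical measure of time variation can be tabulated on finite grids as in the following sketch; the ratio form of (4.9) is the reconstruction used above, and the convention 0/0 := 1 is respected by skipping θ = 0.

import numpy as np

def empirical_time_variation(X, Delta_n, rho, v_n, grid, t_grid):
    # D_rho^(n)(zeta, theta, t) = N^(n)(zeta, t) - (zeta/theta) * N^(n)(theta, t)
    # for all grid points with zeta <= theta; returned as a dictionary.
    inc = np.diff(X)
    n = inc.size
    big = np.abs(inc) > v_n
    D = {}
    for t in t_grid:
        contrib = np.where(big & (inc <= t), rho(inc), 0.0)
        N = np.concatenate(([0.0], np.cumsum(contrib))) / (n * Delta_n)  # N[k] = N^(n)(k/n, t)
        for theta in grid:
            if theta == 0:
                continue
            k_th = int(np.floor(n * theta))
            for zeta in grid:
                if zeta <= theta:
                    k_ze = int(np.floor(n * zeta))
                    D[(zeta, theta, t)] = N[k_ze] - (zeta / theta) * N[k_th]
    return D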
Theorem 4.4. If Assumption 2.3 is satisfied, then the process $H_\rho^{(n)}$ defined in (4.10) converges weakly in $\ell^\infty(\mathcal C \times \mathbb R)$ to a tight mean zero Gaussian process $H_\rho$ with covariance function (4.11).
For the statistical change-point inference proposed in the following sections we require quantiles of functionals of the limiting distribution in Theorem 4.4. (4.11) shows that this distribution depends in a complicated way on the unknown underlying kernel $g_0$, and therefore corresponding quantiles are difficult to estimate. In order to solve this problem we use a multiplier bootstrap approach similar to Section 3. To this end, we define the bootstrap counterpart $\hat H_\rho^{(n)} = \hat H_\rho^{(n)}(X_{\Delta_n}, \ldots, X_{n\Delta_n}; \xi_1, \ldots, \xi_n; \zeta, \theta, t)$ in analogy to (4.9) and (4.10). The result below establishes consistency of $\hat H_\rho^{(n)}$.
Theorem 4.5. Let Assumption 2.3 be valid and let the multiplier sequence $(\xi_i)_{i \in \mathbb N}$ satisfy Assumption 3.6. Then we have $\hat H_\rho^{(n)} \rightsquigarrow_\xi H_\rho$ in $\ell^\infty(\mathcal C \times \mathbb R)$, where the tight mean zero Gaussian process $H_\rho$ has the covariance structure (4.11).

Estimating the gradual change point
For the sake of a unique definition of the (gradual) change point $\theta_0$ in (4.3) we suppose throughout this section that Assumption 2.3 holds with $g_1 = g_2 = 0$. Recall the definition in (4.4); by Theorem 4.4 the process $D_\rho^{(n)}(\zeta, \theta, t)$ from (4.9) is a consistent estimator of $D_\rho^{(g_0)}(\zeta, \theta, t)$. Therefore, we set
$$D_{\rho,*}^{(n)}(\theta) := \sup_{(\zeta, \theta', t) \in \mathcal C \times \mathbb R,\ \theta' \le \theta} |D_\rho^{(n)}(\zeta, \theta', t)|.$$
Corollary 4.6. If Assumption 2.3 is satisfied with $g_1 = g_2 = 0$, then the corresponding weak convergence result holds for $\sqrt{n\Delta_n}\,(D_{\rho,*}^{(n)} - D_\rho^{(g_0)})$, with the limit expressed through the centered Gaussian process $H_\rho$ defined in Theorem 4.4.
Below we will see that the rate of convergence of an estimator for $\theta_0$ depends on the smoothness of the curve $\theta \mapsto D_\rho^{(g_0)}(\theta)$ at $\theta_0$. Thus, we impose a kind of Taylor expansion of the function $D_\rho^{(g_0)}$. More precisely, we assume throughout this section that $\theta_0 < 1$ and that there exist constants $\iota, \eta, \varkappa, c > 0$ such that $D_\rho^{(g_0)}$ admits an expansion of the form
$$D_\rho^{(g_0)}(\theta) = c\,(\theta - \theta_0)^\varkappa + \aleph(\theta) \qquad (4.13)$$
for all $\theta \in [\theta_0, \theta_0 + \iota]$, where the remainder term satisfies $|\aleph(\theta)| \le K (\theta - \theta_0)^{\varkappa + \eta}$ for some $K > 0$.
According to Theorem 4.4 we have $(n\Delta_n)^{1/2} D_{\rho,*}^{(n)}(\theta) \to \infty$ in probability for any $\theta \in (\theta_0, 1]$. Consequently, if the deterministic sequence $\kappa_n \to \infty$ is chosen appropriately, the statistic $(n\Delta_n)^{1/2} D_{\rho,*}^{(n)}(\theta)$ should eventually stay below $\kappa_n$ for $\theta < \theta_0$ and exceed $\kappa_n$ for $\theta > \theta_0$. Thus, we define the estimator for the change point by
$$\hat\theta_\rho^{(n)} = \hat\theta_\rho^{(n)}(\kappa_n) := \inf\big\{\theta \in [0,1] \,\big|\, (n\Delta_n)^{1/2} D_{\rho,*}^{(n)}(\theta) > \kappa_n\big\}, \qquad \inf\emptyset := 1. \qquad (4.14)$$
The theorem below establishes consistency of the estimator $\hat\theta_\rho^{(n)}$ under mild additional assumptions on the sequence $(\kappa_n)_{n \in \mathbb N}$.
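A sketch of the threshold rule (4.14) on a finite grid; D_star is assumed to evaluate the monotone statistic D^(n)_{ρ,*}(θ), for instance as a running maximum over the table produced by the previous sketch.

import numpy as np

def threshold_change_point(D_star, theta_grid, n, Delta_n, kappa_n):
    # First grid point where sqrt(n*Delta_n) * D_star(theta) exceeds kappa_n;
    # by the convention inf(empty set) := 1 we return 1.0 if no point qualifies.
    root = np.sqrt(n * Delta_n)
    for theta in theta_grid:
        if root * D_star(theta) > kappa_n:
            return theta
    return 1.0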
Theorem 4.7 describes how the curvature of $D_\rho^{(g_0)}$ at $\theta_0$ determines the rate of convergence. However, the estimator depends on the choice of the threshold level $\kappa_n$, and we explain below how to choose this sequence with bootstrap methods in order to control the probability of over- and underestimation. But before that, the following theorem investigates the mean squared error of the estimator. Recall the definition of $H_\rho^{(n)}$ in (4.10) and define $H_{\rho,*}^{(n)} := \sup_{(\zeta, \theta, t) \in \mathcal C \times \mathbb R} |H_\rho^{(n)}(\zeta, \theta, t)|$, so that $(n\Delta_n)^{-1/2} H_{\rho,*}^{(n)}$ is an upper bound for the distance between the estimator $D_{\rho,*}^{(n)}(\theta)$ and the true value $D_\rho^{(g_0)}(\theta)$. For a sequence $\alpha_n \to \infty$ with $\alpha_n = o(\kappa_n)$ we decompose the MSE into two parts, which can be considered as the MSE due to small and large estimation errors.
In the following we discuss the choice of the regularizing sequence $\kappa_n$ for the estimator $\hat\theta_\rho^{(n)}$ in order to control the probability of over- and underestimation of the change point $\theta_0 \in (0,1)$. Let $\hat\theta_n^*$ be a preliminary consistent estimate of $\theta_0$. For example, if (4.13) holds for some $\varkappa > 0$, one can take $\hat\theta_n^* = \hat\theta_\rho^{(n)}(\kappa_n)$ for a sequence $\kappa_n \to \infty$ satisfying the assumptions of Theorem 4.7. In the sequel, let $B \in \mathbb N$ be some large number and let $(\xi^{(b)})_{b=1,\ldots,B}$ denote independent sequences of random variables $\xi^{(b)} := (\xi_j^{(b)})_{j \in \mathbb N}$ satisfying Assumption 3.6. We denote by $\hat H_{\rho,*}^{(n,b)}(\theta)$, $\theta \in [0,1]$, the particular bootstrap statistics calculated with respect to the data and the bootstrap multipliers. With these notations, for $B, n \in \mathbb N$ and $0 < r \le 1$ we define the empirical distribution function
$$\hat F_{n,B}(x) := \frac 1 B \sum_{b=1}^B \mathbf 1\big\{\hat H_{\rho,*}^{(n,b)}(r\,\hat\theta_n^*) \le x\big\},$$
and we denote by $\hat F_{n,B}^-(y) := \inf\{x \in \mathbb R \mid \hat F_{n,B}(x) \ge y\}$ its pseudo-inverse. Then, in the sense of the theorems below, the optimal choice of the threshold is given by $\hat\kappa_n := \hat F_{n,B}^-(1-\alpha)$ for a confidence level $\alpha \in (0,1)$.
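The data-driven threshold is simply the empirical (1 − α)-quantile of the B bootstrapped sup-statistics, i.e. the pseudo-inverse of their empirical distribution function evaluated at 1 − α; a one-line sketch (boot_sups is assumed to contain the B values of the bootstrapped statistic):

import numpy as np

def bootstrap_threshold(boot_sups, alpha):
    # Pseudo-inverse of the empirical distribution function of the bootstrapped
    # statistics, evaluated at 1 - alpha.
    return np.quantile(np.asarray(boot_sups), 1.0 - alpha)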
We conclude this section with an example which shows that the expansion (4.13) and the additional assumption $N_{\rho^2}(g_0; \theta_0, t_0) > 0$ of the preceding theorems are satisfied in the situations of Example 2.2 and Example 2.5. A proof for this example can be found in Section 6.7.
(1) Recall the situation of an abrupt change considered in Example 4.2. In this case it follows from (4.6) that the expansion (4.13) holds with $\varkappa = 1$, i.e. $D_\rho^{(g_0)}$ grows linearly beyond $\theta_0$. Moreover, if $\nu_1 \neq 0$ and the function $\rho$ meets Assumption 2.3(a3), the transition kernel given by (4.5) satisfies the additional assumption $N_{\rho^2}(g_0; \theta_0, t_0) > 0$ in Theorem 4.9 and Theorem 4.10 for some $t_0 \in \mathbb R$.
(2) In the situation discussed in Example 4.3, define the function $\tilde N(y, t)$ accordingly for $y \in U$ and $t \in \mathbb R$. Then the expansion (4.13) holds with constants determined by the first non-vanishing $k$-th partial derivative of $\tilde N$ with respect to $y$ at $(\theta_0, t)$, which is a bounded function on $\mathbb R$; furthermore, there exists an $\iota > 0$ such that the remainder bound in (4.13) is satisfied on $[\theta_0, \theta_0 + \iota]$.

Testing for a gradual change
In Section 3 we introduced change point tests for the situation of an abrupt change as in Example 2.2, where the jump behaviour is assumed to be constant before and after the change point. In this section we illustrate a reasonable way to derive test procedures for the existence of a gradual change in the data. In order to formulate suitable hypotheses for a gradual change point, recall the definition of the measure of time variation for the entire jump behaviour $D_\rho^{(g_0)}$ in (4.1), and define for $t_0 \in \mathbb R$ and $\theta \in [0,1]$ the corresponding quantities with $t$ fixed at $t_0$. We test the null hypothesis
$H_0$: Assumption 2.3 is satisfied with $g_1 = g_2 = 0$ and there exists a Lévy measure $\nu_0$ such that $g_0(y, dz) = \nu_0(dz)$ holds for Lebesgue almost every $y \in [0,1]$
versus the general alternative of a non-constant jump behaviour
$H_1^*$: Assumption 2.3 holds with $g_1 = g_2 = 0$ and we have $D_\rho^{(g_0)}(1) > 0$.
If one is interested in gradual changes in $N_\rho(\nu_s^{(n)}; t_0)$ for a fixed $t_0 \in \mathbb R$, one can consider the corresponding alternative
$H_1^*(t_0)$: Assumption 2.3 is satisfied with $g_1 = g_2 = 0$ and we have $D_\rho^{(g_0)}(1; t_0) > 0$.
Furthermore, we investigate the behaviour of the tests introduced below under local alternatives of the form
$H_1^{(loc)}$: Assumption 2.3 holds with $g_0(y, dz) = \nu_0(dz)$ for Lebesgue-a.e. $y \in [0,1]$ for some Lévy measure $\nu_0$ and some transition kernels $g_1, g_2 \in \mathcal G(\beta, p)$.
Corollary 4.14. The tests (4.19) and (4.20) are asymptotic level $\alpha$ tests. Moreover, they are consistent under the fixed alternatives $H_1^*$ and $H_1^*(t_0)$ in the sense of the following proposition.

Finite-sample properties
In this section we present the results of an extensive simulation study assessing the finite-sample properties of the new statistical procedures. The design of this study is as follows:
• We apply our estimators and test statistics to $n$ data points $\{X_{\Delta_n}, \ldots, X_{n\Delta_n}\}$ as realizations of an Itō semimartingale $(X_t)_{t \in \mathbb R_+}$ with characteristics $(b, \sigma, \nu_s)$. For the sample size we choose either $n = 10000$ or $n = 22500$. For the effective sample size in the case $n = 10000$ we consider the choices $k_n := n\Delta_n = 50, 100, 200$, resulting in frequencies $\Delta_n^{-1} = 200, 100, 50$, and in the case $n = 22500$ we consider $k_n = n\Delta_n = 50, 75, 100, 150, 250$, resulting in $\Delta_n^{-1} = 450, 300, 225, 150, 90$.
• Corresponding to our basic rescaling assumption (2.1), the jump characteristic satisfies $\nu_s(dz) = g(s/(n\Delta_n), dz)$ as in (5.1), where the transition kernel $g(y, dz)$ is concentrated on the positive half-line, i.e. $g(y, (-\infty, z]) = 0$ for all $z < 0$.
• In order to simulate data points $\{X_{\Delta_n}, \ldots, X_{n\Delta_n}\}$ including an abrupt change, we choose in (5.1) the jump-size factor $\eta(y)$ from (5.2), parameterized by $\theta_0 \in (0,1)$ and $\psi \ge 1$, and we use a modification of Algorithm 6.13 in Cont and Tankov (2004) to simulate pure jump Itō semimartingales under $H_0$, i.e. for $\psi = 1$ (a sketch of this compound Poisson approximation is given after this list). Under the alternative of an abrupt change, i.e. for $\psi > 1$, we merge two paths of independent semimartingales together.
• A gradual change in the jump characteristic is realized by choosing in (5.1) a time-varying component as in (5.3), for some $\theta_0 \in [0,1]$, $A > 0$ and $w > 0$. In order to obtain pure jump Itō semimartingale data according to this model we sample 15 times more frequently, i.e. for $j \in \{1, \ldots, 15n\}$ we use a modification of Algorithm 6.13 in Cont and Tankov (2004) to simulate an increment $Z_j$ with $\nu^{(j)}(dz) = g(j/(15n), dz)$. The resulting data vector $\{X_{\Delta_n}, \ldots, X_{n\Delta_n}\}$ is then obtained by aggregating 15 consecutive fine-scale increments.
Table 1: Test procedures (3.8), (3.9) and (3.10) under $H_0$. Simulated rejection probabilities in the application of the test (3.8), the test (3.9) and the test (3.10), using 500 pure jump subordinator data vectors under the null hypothesis.
• In order to investigate the performance of our truncation method we either use the plain pure jump data vector $\{X_{\Delta_n}, \ldots, X_{n\Delta_n}\}$ as described above, resulting in the characteristics $b = \sigma = 0$ for the continuous part, or we add a drift and a Brownian component with $b = \sigma = 1$. In the graphics depicted below, the results for pure jump data are presented on the left-hand side, while the results including a continuous component are always placed on the right-hand side.
• For the truncation sequence $v_n = \gamma \Delta_n^w$ we choose $\gamma = 1$ and $w = 3/4$ in each run, resulting in the parameter $\tau = 2/15$ in Assumption 2.3.
• For computational reasons we approximate the supremum in $t \in \mathbb R$ by taking the maximum either over the finite grid $T_1 := \{0.1 \cdot j \mid j = 1, \ldots, 30\}$ or over the finite grid $T_2 := \{0.1 + j \cdot 0.3 \mid j = 0, 1, \ldots, 9\}$.
• For the function $\rho$ we use $\rho_{L,p}$ from (2.8) in Example 2.5 with parameters $L = 1$ and $p = 2$.
• Each combination of parameters we present below is run 500 times, and if the statistical procedure includes a bootstrap method we always use $B = 200$ bootstrap replications. In order to illustrate the power of our test procedures we display simulated rejection probabilities, i.e. the mean of the 500 test results. Furthermore, we measure the performance of our estimators by the mean absolute deviation, i.e. if $\hat\Theta = \{\hat\theta_1, \ldots, \hat\theta_{500}\}$ is the set of obtained estimation results we depict
$$\frac{1}{500} \sum_{i=1}^{500} |\hat\theta_i - \theta_0|,$$
where $\theta_0$ is the location of the change point.
Table 2: Test procedures (3.8), (3.9) and (3.10) under $H_0$. Simulated rejection probabilities in the application of the test (3.8), the test (3.9) and the test (3.10), using 500 pure jump subordinator data vectors plus a drift and a Brownian motion under the null hypothesis.
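For reference, a minimal sketch of the compound Poisson approximation underlying Algorithm 6.13 in Cont and Tankov (2004): jumps below a small cut-off are neglected, and the remaining jumps form a compound Poisson process. The Lévy density used here is an illustrative assumption and not the kernel g of the simulation study.

import numpy as np

def subordinator_increments(n, Delta_n, levy_density, eps, z_max=50.0, m=20000, rng=None):
    # Jumps with sizes in [eps, z_max] arrive with intensity lam equal to the
    # integral of the Levy density over that range; their sizes are drawn by
    # inverse-CDF sampling from the normalized restriction of the density.
    rng = np.random.default_rng() if rng is None else rng
    z = np.linspace(eps, z_max, m)
    dens = levy_density(z)
    lam = np.sum(dens) * (z[1] - z[0])           # Riemann approximation of the intensity
    cdf = np.cumsum(dens)
    cdf /= cdf[-1]
    counts = rng.poisson(lam * Delta_n, size=n)  # number of jumps per increment
    inc = np.zeros(n)
    for i, c in enumerate(counts):
        if c:
            inc[i] = np.sum(np.interp(rng.random(c), cdf, z))
    return inc

# illustrative gamma-type Levy density (an assumption for this sketch)
inc = subordinator_increments(10000, 0.01, lambda z: np.exp(-z) / z, eps=1e-3)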

Finite-sample performance of the procedures in Section 3
In order to demonstrate the performance of the procedures introduced in Section 3 we choose the sample size $n = 22500$ and the grid $T_1 = \{0.1 \cdot j \mid j = 1, \ldots, 30\}$ to approximate the supremum in $t \in \mathbb R$. The confidence level of the test procedures is $\alpha = 5\%$ in each run.
Simulated rejection probabilities for the tests (3.8), (3.9) and (3.10)
Table 1 and Table 2 show a reasonable approximation of the nominal level $\alpha = 0.05$ of the tests (3.8), (3.9) and (3.10) under $H_0$. Test (3.10) appears to be slightly more conservative than the test (3.9). Moreover, in Table 2 the underlying process includes a continuous component with $b = \sigma = 1$, and the results are similar to Table 1.
In Figure 1 we depict rejection probabilities for the test (3.8) for different effective sample sizes $k_n = n\Delta_n$. The factor of jump size corresponds to $\psi$ in (5.2), and the dashed red line indicates the nominal level $\alpha = 5\%$. The change point is located at $\theta_0 = 0.5$. The results are as expected: large differences of the factor of jump size before and after the change yield higher rejection probabilities. Moreover, due to better approximations, the relative frequencies of rejection increase with the effective sample size $k_n = n\Delta_n$. Notice also that the results for pure jump Itō semimartingales and for data including a continuous component are almost the same. This fact indicates a reasonable performance of the proposed truncation technique for an ordinary sample size $n = 22500$. Figure 2 shows rejection probabilities for varying locations of the change point $\theta_0$, where $\psi = 4$ in (5.2). Our results illustrate that an abrupt change can be detected best if it is located close to the middle of the data set, i.e. for $\theta_0 \approx 0.5$. Furthermore, in this case the power of the test is increasing in the effective sample size $k_n = n\Delta_n$ as well, and the performance for data including a continuous component is nearly the same.
In Figure 3 we display relative frequencies of rejection for different values of the parameter $p \in [2, 20]$ of the function $\rho_{1,p}$ defined in (2.8). This function is used to calculate the process $T_\rho^{(n)}$ for certain values of $\theta \in [0,1]$ and $t \in T_1$. In each run the change point is located at $\theta_0 = 0.5$ and we have $\psi = 3$ in (5.2). The results suggest to use the lowest possible value of the parameter $p$ in order to obtain the maximum power of the test. Again, the rejection probabilities of the test are nearly unaffected by the presence of a Brownian component.
In Figure 4 we depict rejection probabilities of the tests (3.9) and (3.10) for different values of $t_0 \in [0.1, 50]$. In the underlying model (5.1) we use $\eta(y)$ defined in (5.2) with $\theta_0 = 0.5$ and $\psi = 3$. We observe that test (3.9) has slightly more power than test (3.10), and the power of both tests is increasing for small values of $t_0$. The latter can be explained by the fact that fewer increments of the underlying Itō semimartingale which take values in the interval $(v_n, t_0]$ are used to calculate the test statistics. The effect is even more significant when a Brownian component is present (right panel). In this case it is more difficult to detect a change because of the superposition of the small increments with an i.i.d. sequence of random variables following a normal distribution with variance $\Delta_n$ (see also Figure 3 in Bücher et al. (2017)). Furthermore, one can show (see, for instance, Lemma 6.3 in Hoffmann et al. (2017)) that in the case of a pure jump Itō semimartingale the probability of the event that $m$ increments exceed the value $t_0$ is bounded by $K t_0^{-m/2}$. As a consequence, for large $t_0$ the power of both tests reaches a saturation, because only a negligible proportion of increments exceeds $t_0$.
Finite-sample performance of the argmax-estimator (3.16)
In Figure 5 we display mean absolute deviations of the estimator (3.16) for different values $\psi \in [1, 5]$ in (5.2), where the true change point is located at $\theta_0 = 0.5$. The results correspond to Figure 1 in the sense that large values of $\psi$ yield a better performance of the statistical procedure. Because of better approximations, the mean absolute deviation is also decreasing in the effective sample size $k_n = n\Delta_n$. Additionally, we also observe in the estimation results that due to the truncation approach the mean absolute deviation is nearly unaffected by the presence of a Brownian component. Figure 6 shows mean absolute deviations of the estimator (3.16) for different locations of the change point $\theta_0 \in (0,1)$, where we choose $\psi = 3$ in (5.2). Similar to Figure 2, the results suggest that a change point can be detected best if it is located around the middle of the data set, i.e. if $\theta_0 \approx 0.5$. Furthermore, as in Figure 5, the estimation error is decreasing in the effective sample size $k_n$.

Finite-sample performance of the procedures in Section 4
In this section we investigate the finite-sample performance of the statistical procedures introduced in Section 4.
For computational reasons, the results on the tests (4.19) and (4.20) depicted below are obtained for a sample size $n = 10000$ and effective sample sizes $k_n \in \{50, 100, 200\}$. In each run we choose the confidence level $\alpha = 5\%$, and the supremum in $t \in \mathbb R$ is approximated by the maximum over the finite grid $T_1 = \{0.1 \cdot j \mid j = 1, \ldots, 30\}$.

Figure 7 shows simulated rejection probabilities of the test (4.19) for different degrees of smoothness of the change $w$ in (5.3). The change is located at $\theta_0 = 0.4$, and $A$ is chosen such that the characteristic quantity for a gradual change satisfies $D_\rho^{(g)}(1) = 3$ in each scenario. As expected, it is more difficult to distinguish a very smooth change from the null hypothesis, and therefore the rejection probability is decreasing in $w$. Similar to the CUSUM test investigated in Section 5.1, the power of the test is increasing in $k_n = n\Delta_n$ as well. Furthermore, all simulation results on the tests (4.19) and (4.20) are similar for pure jump processes and processes including a Brownian component. This indicates that our truncation approach also works in this setup.
In Figure 8 we depict rejection rates of the test (4.19) for different locations of the change point $\theta_0 \in (0,1)$. We simulate a linear change, i.e. we have $w = 1$ in (5.3), and $A$ is chosen such that $D_\rho^{(g)}(1) = 0.3$ holds in each run. As before, the power of the test is increasing in the effective sample size $k_n = n\Delta_n$, and moreover it is decreasing in $\theta_0$. The latter can be explained by the shape of the model (5.3): for large $\theta_0$ the jump characteristic is "close" to the null hypothesis.
From the theoretical standpoint of Section 4.3 we have to ensure that the preliminary estimate $\hat\theta^{(pr)}$ in Step 1 is consistent in order to guarantee consistency of the final estimate $\hat\theta$. If not declared otherwise, we always make the "arbitrary" choice $\hat\theta^{(pr)} = 0.1$, for two reasons: First, a simulation study which is not included in this paper, where the estimation procedure is started in Step 2 with the choice $\hat\kappa^{(in)} = 3\sqrt{n\Delta_n}$ (which yields consistency according to Theorem 4.7), has shown similar results to the ones depicted below. Secondly, with the small choice of $\hat\theta^{(pr)} = 0.1$ in Step 1 we obtain smaller values of the thresholds $\hat\kappa^{(in)}, \hat\kappa^{(fi)}$, and this reduces the calculation time. Furthermore, in the following simulation study we choose the sample size $n = 22500$ and we vary the effective sample size in $k_n = n\Delta_n \in \{50, 100, 250\}$. For the evaluation of (4.16) we always use $\alpha = 10\%$, and for computational reasons suprema in $t \in \mathbb R$ are approximated by maxima over $t \in T_2 = \{0.1 + j \cdot 0.3 \mid j = 0, 1, \ldots, 9\}$. If not declared otherwise, we simulate a linear change, i.e. $w = 1$ in (5.3), which is located at $\theta_0 = 0.4$. $A$ is always chosen such that the characteristic quantity for a gradual change satisfies $D_\rho^{(g)}(1) = 3$ in all scenarios. Figure 9 shows mean absolute deviations for different choices of $r \in (0,1]$ in Step 1. The graphics suggest that in all cases the mean absolute deviation for $r = 0.3$ is close to its overall minimum. Thus, we choose $r = 0.3$ in Step 1 in all following investigations. In Figure 10 we depict mean absolute deviations of the estimator (4.14) for different choices of the preliminary estimate $\hat\theta^{(pr)} \in (0,1)$ in Step 1. The estimation error is smallest if the preliminary estimate is chosen close to 1. This finding corresponds to another simulation study, which is not included below, and which has shown that the estimation procedure (4.14) tends to underestimate the change point. As a consequence, $\hat\theta^{(pr)}$ close to 1 induces larger values of the quantities $\hat\kappa^{(in)}, \hat\theta^{(in)}, \hat\kappa^{(fi)}$ in Steps 2-4 and prevents the underestimation error. Figure 11 shows simulated mean absolute deviations of the estimator (4.14) for different degrees of smoothness of the change $w$ in (5.3). The results correspond to Figure 7 and confirm the intuitive idea that a smooth change is more difficult to detect. Moreover, better approximations for large effective sample sizes $k_n = n\Delta_n$ reduce the estimation error.
In Figure 12 we display simulated mean absolute deviations of the estimator $\hat\theta_\rho^{(n)}$ for different locations of the change point $\theta_0 \in (0,1)$ in (5.3). The results correspond to Figure 8 and show that the change point can be detected best for small values of $\theta_0$. This is a consequence of the model (5.3): for large $\theta_0 \in (0,1)$ the jump behaviour is nearly constant.

Proofs and technical details
Before we prove the results of Section 3 and Section 4 we begin with a generalization of the results in Hoffmann and Vetter (2017).
Additionally, the sample paths of $G_\rho$ are almost surely uniformly continuous with respect to a suitable semimetric. Theorem 6.1 is a generalization of Hoffmann and Vetter (2017) in the sense that their Theorem 3.1 can be obtained if we set $g_0(y, dz) = \nu(dz)$ for a Lévy measure $\nu$ and $g_1 = g_2 = 0$. In order to prove Theorem 6.1 we divide the process $G_\rho^{(n)}$ into two parts which correspond to the large and the small jumps of the underlying process $X^{(n)}$, respectively. To this end we choose an auxiliary function $\Psi$ which smoothly separates small and large values, set $\rho_\alpha(z) = \rho(z)\Psi(z/\alpha)$ and $\rho_\alpha^\bullet = \rho - \rho_\alpha$ for $t, z \in \mathbb R$, and define the corresponding empirical processes $G_{\rho,n}^{(\alpha)}$ and $\tilde G_{\rho,n}^{(\alpha)}$. Then, of course, we have $G_\rho^{(n)}(\theta, t) = G_{\rho,n}^{(\alpha)}(\theta, t) + \tilde G_{\rho,n}^{(\alpha)}(\theta, t)$. In Section 6.4 we show that it suffices to prove three auxiliary lemmas in order to establish Theorem 6.1. The first one is concerned with the behaviour of the large jumps, i.e. it holds for $G_{\rho,n}^{(\alpha)}$ and a fixed $\alpha > 0$.
Lemma 6.2. If Assumption 2.3 is satisfied, we have weak convergence $G_{\rho,n}^{(\alpha)} \rightsquigarrow G_{\rho_\alpha}$ in $\ell^\infty([0,1] \times \mathbb R)$, where $G_{\rho_\alpha}$ denotes a tight centered Gaussian process with the corresponding covariance function. The sample paths of $G_{\rho_\alpha}$ are almost surely uniformly continuous with respect to a corresponding semimetric.
The general idea behind the proof of Lemma 6.2 is to replace the increments of the underlying process $X^{(n)}$ by increments of pure jump Itō semimartingales. Precisely, let $\mu^{(n)}$ be the Poisson random measure associated with the jumps of $X^{(n)}$. Then we consider
$$L^{(n)} = (z\, \mathbf 1_{\{|z| > v_n\}}) * \mu^{(n)} \qquad (6.4)$$
with the truncation $v_n = \gamma \Delta_n^w$ as above. The main advantage of the processes $L^{(n)}$ is that they have deterministic characteristics. Therefore, their increments are independent (see Theorem II.4.15 in Jacod and Shiryaev (2002)), and we can use a central limit theorem tailored for triangular arrays of independent stochastic processes from Kosorok (2008) to prove weak convergence of
$$Y_f^{(n)}(\theta, t) = \frac{1}{\sqrt{n\Delta_n}} \sum_{i=1}^{\lfloor n\theta \rfloor} \Big( f(\Delta_i^n L^{(n)})\, \mathbf 1_{(-\infty,t]}(\Delta_i^n L^{(n)}) - E\big[f(\Delta_i^n L^{(n)})\, \mathbf 1_{(-\infty,t]}(\Delta_i^n L^{(n)})\big] \Big) \qquad (6.5)$$
to $G_{\rho_\alpha}$, where $(\theta, t) \in [0,1] \times \mathbb R$ and where $f : \mathbb R \to \mathbb R$ is a bounded continuous function, for which we plug in $\rho_\alpha$ and $\rho_\alpha^\bullet$ later. In order to prove Lemma 6.2 we need to ensure that the distance between $Y_{\rho_\alpha}^{(n)}$ and $G_{\rho,n}^{(\alpha)}$ is small. To this end, our next claim shows that the corresponding bias is small compared to the rate of convergence. In order to state the result, recall that for a real-valued non-negative function $f : \Xi \to \mathbb R_+$ on a measure space $(\Xi, \mathcal B, \vartheta)$ the essential supremum with respect to $\vartheta$ is given by $\vartheta\text{-}\operatorname{ess\,sup} f = \inf\{K \ge 0 \mid \vartheta(\{f > K\}) = 0\}$. Moreover, recall that $\lambda_1$ denotes the restriction of the one-dimensional Lebesgue measure to $[0,1]$.
Then, if $v_n > 0$ is a null sequence and $L^{(n)}$ is defined as in (6.4), the bound (6.6) holds.
The following proposition establishes the desired weak convergence of $Y_f^{(n)}$.
Proposition 6.4. Suppose Assumption 2.3 is satisfied and let $f : \mathbb R \to \mathbb R$ be a bounded continuous function. Then $Y_f^{(n)}$ converges weakly to a tight mean zero Gaussian process $G_f$ in $\ell^\infty([0,1] \times \mathbb R)$.
In order to obtain the result of Theorem 6.1, the following lemma ensures that the limiting process $G_{\rho_\alpha}$ converges in a suitable sense as $\alpha \to 0$.
Lemma 6.5. Under Assumption 2.3 the weak convergence $G_{\rho_\alpha} \rightsquigarrow G_\rho$ in $\ell^\infty([0,1] \times \mathbb R)$ holds as $\alpha \to 0$.
Its proof is a direct consequence of the following result.
Proposition 6.6. Suppose Assumption 2.3 is satisfied and let $f_n : \mathbb R \to \mathbb R$ $(n \in \mathbb N_0)$ be Borel measurable functions with $|f_n(z)| \le K(1 \wedge |z|^p)$ for a constant $K > 0$ and all $n \in \mathbb N_0$, $z \in \mathbb R$.
Assume further that $f_n(z) \to f_0(z)$ for all $z$ outside a set $B \in \mathcal B$ such that $[0,1] \times B$ is a $g_0(y, dz)dy$-null set. Then we have weak convergence $G_{f_n} \rightsquigarrow G_{f_0}$ in $\ell^\infty([0,1] \times \mathbb R)$.
Our final lemma shows that the contribution due to the small jumps is uniformly small as $\alpha$ tends to zero.
Lemma 6.7. Suppose Assumption 2.3 is satisfied. Then for each $\eta > 0$ we have
$$\lim_{\alpha \to 0}\ \limsup_{n \to \infty}\ P\Big(\sup_{(\theta, t) \in [0,1] \times \mathbb R} |\tilde G_{\rho,n}^{(\alpha)}(\theta, t)| > \eta\Big) = 0.$$

The multiplier bootstrap approach
In this section we investigate a bootstrapped version of $G_\rho^{(n)}$ given by
$$\hat G_\rho^{(n)}(\theta, t) = \frac{1}{\sqrt{n\Delta_n}} \sum_{i=1}^{\lfloor n\theta \rfloor} \xi_i \Big( \rho(\Delta_i^n X^{(n)})\, \mathbf 1_{(-\infty,t]}(\Delta_i^n X^{(n)})\, \mathbf 1_{\{|\Delta_i^n X^{(n)}| > v_n\}} - \frac 1 n \sum_{j=1}^n \rho(\Delta_j^n X^{(n)})\, \mathbf 1_{(-\infty,t]}(\Delta_j^n X^{(n)})\, \mathbf 1_{\{|\Delta_j^n X^{(n)}| > v_n\}} \Big)$$
for $(\theta, t) \in [0,1] \times \mathbb R$, where the sequence of multipliers $(\xi_i)_{i \in \mathbb N}$ satisfies Assumption 3.6. The bootstrap results in Section 3 and Section 4 are a simple consequence of Proposition 10.7 in Kosorok (2008) and the following theorem, which establishes weak convergence, conditional on the data in probability, of $\hat G_\rho^{(n)}$.
Theorem 6.8. Let Assumption 2.3 be valid and let the multipliers $(\xi_i)_{i \in \mathbb N}$ satisfy Assumption 3.6. Then we have $\hat G_\rho^{(n)} \rightsquigarrow_\xi G_\rho$ in $\ell^\infty([0,1] \times \mathbb R)$, where $G_\rho$ is the tight mean zero Gaussian process of Theorem 6.1.
Similar to the proof of Theorem 6.1, we show Theorem 6.8 by treating small and large increments of $X^{(n)}$ separately. Therefore, with the quantities defined prior to (6.3), we consider the bootstrapped processes $\hat G_{\rho,n}^{(\alpha)}$ and $\hat{\tilde G}_{\rho,n}^{(\alpha)}$. Section 6.5 reveals that the claim of Theorem 6.8 can be obtained using the following two lemmas, which state weak convergence conditional on the data in probability of the previously established processes for fixed $\alpha > 0$.
Lemma 6.9. If Assumption 2.3 and Assumption 3.6 are satisfied, we have $\hat G_{\rho,n}^{(\alpha)} \rightsquigarrow_\xi G_{\rho_\alpha}$ in $\ell^\infty([0,1] \times \mathbb R)$ for each fixed $\alpha > 0$.
Lemma 6.10. Suppose Assumption 2.3 and Assumption 3.6 are valid. Then the corresponding conditional weak convergence holds for $\hat{\tilde G}_{\rho,n}^{(\alpha)}$ for each $\alpha > 0$ in a neighbourhood of 0.
The lemmas above will be verified by approximating the truncated increments of the underlying processes by the increments of the pure jump Itō semimartingales from (6.4) with the usual truncation $v_n = \gamma \Delta_n^w$. The main advantage of the processes $L^{(n)}$ is the fact that they have deterministic characteristics and therefore independent increments. As a consequence, we can use a result from Kosorok (2008) for triangular arrays of processes which are independent within rows to prove weak convergence conditional on the data in probability of the bootstrapped analogues $\hat Y_f^{(n)}$ of $Y_f^{(n)}$ from (6.5), for $(\theta, t) \in [0,1] \times \mathbb R$ and a bounded continuous function $f : \mathbb R \to \mathbb R$. Precisely, the following proposition is the main tool in order to obtain Lemma 6.9 and Lemma 6.10.
Proposition 6.11. Suppose Assumption 2.3 and Assumption 3.6 are satisfied. Then for a bounded continuous function $f : \mathbb R \to \mathbb R$ we have $\hat Y_f^{(n)} \rightsquigarrow_\xi G_f$ in $\ell^\infty([0,1] \times \mathbb R)$, where $G_f$ is the tight mean zero Gaussian process defined in Theorem 6.1.

Alternative Assumptions
All results in this paper also hold under the weaker assumptions given below. Here and throughout the following proofs, $K$ or $K(\delta)$ denote generic constants which sometimes depend on an auxiliary quantity $\delta$ and may change from place to place.
Assumption 6.12. At step $n \in \mathbb N$ we observe an Itō semimartingale $X^{(n)}$, adapted to the filtration of some filtered probability space $(\Omega, \mathcal F, (\mathcal F_t)_{t \in \mathbb R_+}, P)$, with characteristics $(b_s^{(n)}, \sigma_s^{(n)}, \nu_s^{(n)})$ at the equidistant time points $\{i\Delta_n \mid i = 0, 1, \ldots, n\}$. Furthermore, the following assumptions are satisfied:
(a) Assumptions on the jump characteristic and the function $\rho$: For each $n \in \mathbb N$ and $s \in [0, n\Delta_n]$ we have (2.1) with $g = g^{(n)}$ as in (2.4), and for each $y \in [0,1]$, $B \in \mathcal B$ and $n \in \mathbb N$ the kernel $R_n$ satisfies $R_n(y, B) \le a_n g_2(y, B)$ for a sequence $a_n = o((n\Delta_n)^{-1/2})$ of non-negative real numbers. Furthermore, we have:
(2) $\rho : \mathbb R \to \mathbb R$ is a bounded $C^1$-function with $\rho(0) = 0$. Furthermore, there exists some $p > \beta + (\beta \vee 1)$ such that the derivative satisfies $|\rho'(z)| \le K|z|^{p-1}$ for all $z \in \mathbb R$ and some $K > 0$.
(4)(I) The corresponding bound holds for $n \in \mathbb N$ sufficiently large, where $\lambda_2$ denotes the restriction of the two-dimensional Lebesgue measure to the measure space $([0,1]^2, [0,1]^2 \cap \mathcal L^2)$ with the two-dimensional Lebesgue $\sigma$-algebra $\mathcal L^2$ on $\mathbb R^2$.
(II) For each $\alpha > 0$ there is a $K(\alpha) > 0$ such that for every choice $m_1, m_2 \in \{g_0, g_1, g_2\}$ the corresponding bound holds for $n \in \mathbb N$ large enough, with the constants from (a(4)I).
(b) Assumptions on the truncation sequence $v_n$ and the observation scheme: We have $v_n = \gamma \Delta_n^w$ for some $\gamma > 0$ and a suitable exponent $w$. Furthermore, the observation scheme satisfies, with the constants from the previous assumptions: (1) $\Delta_n \to 0$, (2) $n\Delta_n \to \infty$, together with further rate conditions.
(c) Assumptions on the drift and the diffusion coefficient: suitable moment bounds on $b^{(n)}$ and $\sigma^{(n)}$.
In the sequel, we will work with Assumption 6.12 without further mention. This is due to the following result, which proves that Assumption 2.3 implies the set of conditions above.
Proof of Proposition 6.3. Let $F_n = \{z : |z| > v_n\}$, $\tilde N^{(n)} = \mathbf 1_{F_n}(z) * \mu^{(n)}$, and let $i \in \{1, \ldots, n\}$ be fixed in the entire proof. According to Proposition II.1.14 in Jacod and Shiryaev (2002), for each $n \in \mathbb N$ there exist a thin random set $D_n$ with an exhausting sequence of stopping times $(T_m^{(n)})_{m \in \mathbb N}$ and an $\mathbb R$-valued optional process $\xi^{(n)}$ such that $\mu^{(n)}$ is a sum of Dirac measures $\delta_{(s,x)}$ with mass at $(s, x) \in \mathbb R_+ \times \mathbb R$. Furthermore, due to Lemma A.13, $\tilde N_{t_2}^{(n)} - \tilde N_{t_1}^{(n)}$ follows a Poisson distribution for $0 \le t_1 \le t_2$, where $M^C$ denotes the complement of a set $M$. Thus, we calculate for $n \in \mathbb N$ large enough a bound in which the inequality follows for two reasons: first, (6.19) as well as the fact that $f$ is bounded lead to the term $K\Delta_n^2 v_n^{-2\beta}$, and secondly, for each $\omega \in \{\tilde N_{i\Delta_n}^{(n)} - \tilde N_{(i-1)\Delta_n}^{(n)} = 0\}$ and $m \in \mathbb N$ we have $(T_m^{(n)}(\omega), \xi^{(n)}(\omega)) \notin ((i-1)\Delta_n, i\Delta_n] \times F_n$, such that $\Delta_i^n L^{(n)}(\omega) = 0$ and thus $f(\Delta_i^n L^{(n)}(\omega)) = 0$ by the assumptions on $f$. However, $(T_m^{(n)}(\omega), \xi^{(n)}(\omega)) \in ((i-1)\Delta_n, i\Delta_n] \times F_n$ holds for exactly one $m \in \mathbb N$ if $\omega \in \{\tilde N_{i\Delta_n}^{(n)} - \tilde N_{(i-1)\Delta_n}^{(n)} = 1\}$. This observation yields the bound in (6.20). We apply the defining relation of the predictable compensator of an optional $\mathcal P$-$\sigma$-finite random measure. But notice that it cannot be guaranteed that the integrand in the stochastic integral with respect to $\mu^{(n)}$ in the first line of (6.20) is $\mathcal P$-measurable. Therefore, we treat the leading term after the last inequality sign in (6.20) and $\delta_n^{(i,t)}$ separately. However, the integrand $f(z)\, \mathbf 1_{(-\infty,t]}(z)\, \mathbf 1_{\{|z| > v_n\}}\, \mathbf 1_{((i-1)\Delta_n, i\Delta_n]}(s)$ on the right-hand side of (6.20) is $\mathcal P$-measurable. Thus, Theorem II.1.8 in Jacod and Shiryaev (2002) yields the corresponding identity. Now, because of $|f(z)| \le K|z|^p$ on a neighbourhood of 0, the above display yields the required bound for $n \in \mathbb N$ large enough. Finally, (6.18) and the assumption that $f$ is bounded by some constant $K > 0$ give an estimate for $\delta_n^{(i,t)}$ from (6.21), where $\# M$ denotes the cardinality of a set $M$. With Lemma A.13 and the previous inequality we obtain the desired estimate, and (6.22) yields (6.6), because neither of the bounds for $\gamma_n^{(i,t)}$ or $\delta_n^{(i,t)}$ depends on $i$ or $t$.
with $s_{i_1,n}(\omega) = \frac{1}{\sqrt{n\Delta_n}} f(\Delta_{i_1}^n L^{(n)}(\omega))$ and $s_{i_2,n}(\omega) = \frac{1}{\sqrt{n\Delta_n}} f(\Delta_{i_2}^n L^{(n)}(\omega))$. Consequently, in the sense of Definition 4.2 in Pollard (1990), for every $s \in \mathbb R^2$ no proper coordinate projection of $\mathcal G_{n\omega}$ can surround $s$, and therefore $\mathcal G_{n\omega}$ has a pseudo-dimension of at most 1 (Definition 4.3 in Pollard (1990)). Thus, by Corollary 4.10 in the same reference, there exist constants $A$ and $W$ which depend only on the pseudo-dimension such that the packing number bound holds for all $0 < x \le 1$, $n \in \mathbb N$, $\omega \in \Omega$ and each rescaling vector $\alpha \in \mathbb R^n$ with non-negative entries, where $\|\cdot\|_2$ denotes the Euclidean distance on $\mathbb R^n$, $D_2$ denotes the packing number with respect to the Euclidean distance and $\odot$ denotes coordinate-wise multiplication. Then, for any $i_1, i_2 \in \{1, \ldots, n\}$, the projection $p_{i_1,i_2}(\mathcal H_{n\omega})$ of $\mathcal H_{n\omega}$ onto the $i_1$-th and the $i_2$-th coordinate is either $\{(0,0), (1,0), (1,1)\}$ or $\{(0,0), (0,1), (1,1)\}$. Therefore, the same reasoning as above shows that $\mathcal H_{n\omega}$ is a set of pseudo-dimension at most one, whence the triangular array $\{h_{ni}\}$ is manageable with envelopes $\{H_{ni}\}$.
Proof of (E). We have $n\Delta_n \to \infty$. Thus, for $\epsilon > 0$ we can choose an $N \in \mathbb{N}$ such that for $n \ge N$ the integrand satisfies $G_{ni}^2\, 1_{\{G_{ni} > \epsilon\}} = 0$ for all $1 \le i \le n$, and this yields the assertion.
Proof of (F). Due to the symmetry of the semimetrics we may assume $\theta_1 \le \theta_2$ without loss of generality.
The decomposition below is similar to Step 5 in the proof of Theorem 13.1.1 in Jacod and Protter (2012), and it will occur frequently in the sequel. With the constants from Assumption 6.12 let $\varrho \in \mathbb{R}$ satisfy
$$1 < \varrho < \frac{1}{2\beta w} \wedge (1 + \epsilon) \qquad\text{and also}\qquad \varrho < \frac{2(p-1)w - 1}{2(\beta - 1)w} \;\text{ if } \beta > 1,$$
with an $\epsilon > 0$ for which Assumption 6.12(b6) holds. Then we set $u_n = (v_n)^\varrho$ and $F_n = \{z : |z| > u_n\}$, as well as the processes $\bar X_n$ and $\bar X^{(\alpha)}_n$ from the decomposition of $X^{(n)}$. In the following proofs it is necessary to ensure that, with high probability, at most one large jump occurs and the increments of the remaining part, that is the quantities $\Delta^n_i \bar X_n$, are small.
To this end, we show in Lemma A.4 that $P(Q_n) \to 1$ as $n \to \infty$ for the sets $Q_n$ defined in (6.24).

Proof of Lemma 6.2. Let $\alpha > 0$ be fixed and recall the definition of the processes $L^{(n)} = (z\, 1_{\{|z| > v_n\}}) * \mu^{(n)}$ in (6.4). Due to Proposition 6.3 and Proposition 6.4 the processes in question converge weakly to $G_{\rho_\alpha}$ in $\ell^\infty([0,1] \times \mathbb{R})$, because by Assumption 6.12(a1) we have
$$\lambda_1\text{-}\operatorname*{ess\,sup}_{y \in [0,1]} \int_{-\infty}^t \rho_\alpha(z)\, g^{(n)}(y, dz) \le K$$
for all $n \in \mathbb{N}$, and thus the corresponding bound holds uniformly over $(\theta, t) \in [0,1] \times \mathbb{R}$ for some small $\delta > 0$. The final equality in the display above follows using $1 - 2\beta w > 0$ and $p - \beta > 1$, as well as Assumption 6.12(b4) and (b6). As a consequence, it suffices to show (6.26). According to Lemma A.9 we have, for $n \in \mathbb{N}$ large enough such that $v_n \le \alpha/4$, a bound in terms of the processes defined in (6.23), where $K > 0$ denotes a bound for $\rho$. Therefore, due to $P(Q_n) \to 1$, it is enough to show $C_n(\alpha) = o_P(1)$ and $D_n(\alpha) = o_P(1)$ in order to verify (6.26) and to complete the proof of Lemma 6.2.

First, we consider $D_n(\alpha)$. For later use we let $f$ be either $\rho_\alpha$ or $\rho^\bullet_\alpha$. Then there exists a constant $K > 0$, depending only on $\alpha$, such that the following holds for $x, z \in \mathbb{R}$ and $v > 0$: for $|x+z| > v$ and $|x| > v$ we use the mean value theorem together with $|z| \le |x|$ and $|\frac{df}{dx}(x)| \le K|x|^{p-1}$ for all $x \in \mathbb{R}$, by the assumptions on $\rho$ and because the derivatives of $\Psi_\alpha$ and $\Psi^\bullet_\alpha$ have compact support bounded away from 0. In all other cases in which the left-hand side does not vanish we have $|z| \le |x| \le 2v$ as well as $|f(x)| \le K|x|^p$ for all $x \in \mathbb{R}$, by another application of the mean value theorem and the assumptions on $\rho$. Consequently,
$$E D_n(\alpha) \le a_n(\alpha) + b_n(\alpha) \qquad (6.29)$$
holds, and we conclude $D_n(\alpha) = o_P(1)$ because of Lemma A.17.

Finally, we show $C_n(\alpha) = o_P(1)$. To this end, we define for $1 \le i, j \le n$ with $i \ne j$, and with the constant $r$ from Assumption 6.12, the sets $J^{(1)}_n(\alpha)$ as in (6.31). Then, according to Lemma A.6, we have $P(J^{(1)}_n(\alpha)) \to 1$. Moreover, Lemma A.8 shows that for all $n \in \mathbb{N}$, $\omega \in J^{(1)}_n(\alpha) \cap Q_n$ and $t \in \mathbb{R}$ the relevant random set has at most $c_n := \lfloor v_n/\Delta^r_n \rfloor + 1$ elements. Consequently, on $J^{(1)}_n(\alpha) \cap Q_n$, for each $t \in \mathbb{R}$ at most $c_n$ summands in the sum defining $C_n(\alpha)$ can be equal to 1, and we conclude
$$C_n(\alpha) \le K \Big/ \sqrt{n \Delta_n^{(1 + 2(r-w)) \vee 1}}$$
on $J^{(1)}_n(\alpha) \cap Q_n$. Thus, $C_n(\alpha) = o_P(1)$ follows using Assumption 6.12(b7) as well as $P(J^{(1)}_n(\alpha) \cap Q_n) \to 1$.
As a consequence, the triangular array $\{\bar\mu_{ni}\}$ is separable and therefore AMS by Lemma 11.15 in Kosorok (2008).

Proof of (H). Simple calculations show (6.42), (6.43) and (6.44).
Thus, (6.42), (6.43) and (6.44) give that $G^{(n)}_{|f|}$ converges weakly to the tight process $G_{|f|}$ in $\ell^\infty([0,1] \times \mathbb{R})$ by Proposition 6.4. Proof of (I). According to Theorem 11.17 in Kosorok (2008), it suffices to verify that the triangular arrays in question are manageable with envelope vector $(\bar F_{n1}(\omega), \dots, \bar F_{nn}(\omega))$. But $\bar\mu_{ni}$ does not depend on $i$, so every coordinate projection of $F_{n\omega}$ onto two coordinates $i_1, i_2 \in \{1, \dots, n\}$ is a subset of the straight line $\{(x,y) \in \mathbb{R}^2 \mid x = y\}$. Consequently, in the sense of Definition 4.2 in Pollard (1990), for every $s \in \mathbb{R}^2$ no proper coordinate projection of $F_{n\omega}$ can surround $s$, and therefore $F_{n\omega}$ has pseudodimension at most 1 (Definition 4.3 in Pollard (1990)). Now the manageability of the triangular array $\{\bar\mu_{ni}\}$ follows by the same reasoning as in the verification of (B) in the proof of Proposition 6.4.
Proof of (J). The envelopes $\{\bar F_{ni}\}$ are independent of $i$ as well. Therefore, with Lemma A.20 we obtain the required convergence, because the processes $L^{(n)}$ have independent increments. As a consequence, we have in fact $\sum_{i=1}^n [\bar F_{ni}]^2 = o_P(1)$, which proves the claim.

Proof of Step (b). We have the decomposition stated above. As an immediate consequence of Lemma A.20 we obtain $\sup_{t \in \mathbb{R}} |U_n(t)| = o_P(1)$. Furthermore, the $(\xi_i)_{i \in \mathbb{N}}$ are i.i.d. with mean zero and variance one, so it is well known from empirical process theory (see for instance Theorem 2.5.2 and Theorem 2.12.1 in Van der Vaart and Wellner (1996)) that $n^{-1/2} \sum_{i=1}^{\lfloor n\theta \rfloor} \xi_i$ converges weakly to a Brownian motion in $\ell^\infty([0,1])$. The law of a Brownian motion is tight in $\ell^\infty([0,1])$ (see for example Section 8 in Billingsley (1999)), and thus the convergence also holds in outer probability.

Proof of Lemma 6.9. By Proposition 6.11 we have $\hat Y^{(n)}_{\rho_\alpha} \rightsquigarrow G_{\rho_\alpha}$ conditional on the data in probability for each fixed $\alpha > 0$, and therefore, due to Lemma C.1, it only remains to show that the remainder term, indexed by $(\theta, t) \in [0,1] \times \mathbb{R}$, converges to 0 in $\ell^\infty([0,1] \times \mathbb{R})$ in outer probability. Consequently, it suffices to show the corresponding convergence for each fixed $\alpha > 0$. To this end, Lemma A.11 yields an estimate on $J^{(1)}_n(\alpha) \cap Q_n$ for $n \in \mathbb{N}$ large enough such that $v_n \le \alpha/4$, where $Q_n$ is defined in (6.24), $J^{(1)}_n(\alpha)$ is defined in (6.31), and where the bound involves the supremum $\sup_{A \in S_n} |\sum_{i \in A} \xi_i\, a^n_i(\alpha)|$; here the quantities are introduced in (6.23) and $S_n = \{M \subset \{1, \dots, n\} \mid \#M \le c_n\}$ with $c_n = \lfloor v_n/\Delta^r_n \rfloor + 1$. Lemma A.4 and Lemma A.6 show $P(J^{(1)}_n(\alpha) \cap Q_n) \to 1$, and thus it is further enough to verify (6.46)-(6.48) for each $\alpha > 0$ as $n \to \infty$.
Recall the quantity $D_n(\alpha)$ introduced in (6.27). (6.29) and Lemma A.17 yield $E D_n(\alpha) \to 0$. Moreover, the bootstrap multipliers have variance 1 and therefore satisfy $E|\xi_i| \le 1$ for all $i \in \mathbb{N}$. Thus, because of the independence of the multipliers and the other processes involved, we obtain $0 \le E \hat D_n(\alpha) \le E D_n(\alpha) \to 0$, which proves (6.46). Concerning (6.47), we have for $n \in \mathbb{N}$ large enough a bound with some $K(\alpha) > 0$, valid for all $m \in \mathbb{N}$ and all $i = 1, \dots, n$, by virtue of Lemma A.14. Thus, using Assumption 3.6 as well as the independence of $\xi_i$ and $a^n_i(\alpha)$, we obtain for every integer $m \ge 2$ and $n \in \mathbb{N}$ large enough an estimate with constants $C_1, C_2 > 0$, where the variables $Z^n_i(\alpha)$ are defined for $n \in \mathbb{N}$, $\alpha > 0$ and $i = 1, \dots, n$. Furthermore, due to the definition of $\bar X^{(\alpha)}_n$ in (6.23), the variables $(Z^n_i(\alpha))_{i=1,\dots,n}$ are independent with mean zero. Consequently, Lemma A.16 yields (6.47).

In order to show (6.48), observe first that for $n \in \mathbb{N}$ large enough the corresponding identity holds for each $i = 1, \dots, n$ on the set $Q_n$, because by (6.24) we have $|\Delta^n_i \bar X_n| \le v_n/2$ on $Q_n$. Therefore, we obtain from the mean value theorem, for large $n$ on the set $Q_n$, that $b^n_i(\alpha)$ equals $a^n_i(\alpha)$ plus a remainder involving some $\zeta^n_i(\alpha)$ between $\Delta^n_i \bar X^{(\alpha)}_n$ and $\Delta^n_i \bar X_n + \Delta^n_i \bar X^{(\alpha)}_n$. Thus, the indicators and Assumption 6.12(a2) show the bound (6.49) for large $n \in \mathbb{N}$. The bootstrap multipliers are defined on a distinct probability space and satisfy $E|\xi_i| \le 1$ for all $i \in \mathbb{N}$. As a consequence, (6.49) gives the desired estimate, and (6.48) follows from (6.47), Lemma A.4 and Lemma A.17.
Proof of Lemma 6.10. Due to Proposition 6.11 we have $\hat Y^{(n)}_{\rho^\bullet_\alpha} \rightsquigarrow G_{\rho^\bullet_\alpha}$ conditional on the data in probability for each $\alpha > 0$ in a neighbourhood of zero. Simple manipulations give the corresponding representation for $(\theta, t) \in [0,1] \times \mathbb{R}$. As a consequence, it only remains to show the analogous negligibility statement for each $\alpha > 0$ in a neighbourhood of 0. Due to Lemma A.12, for each $\alpha > 0$ and $n \in \mathbb{N}$ large enough such that $v_n \le \alpha$, we have a bound on $J^{(2)}_n(\alpha) \cap Q_n$, where $Q_n$ is defined in (6.24), $J^{(2)}_n(\alpha)$ is defined in (6.37), the processes involved are introduced in (6.23), $v > 0$ is the constant from Assumption 6.12(a4)(I), $\varsigma^n_i(\alpha) = \Delta^n_i \bar X_n + \Delta^n_i \bar X^{(8\alpha)}_n$, and $S_n = \{M \subset \{1, \dots, n\} \mid \#M \le c_n\}$ for $c_n = \lfloor v_n/\Delta^r_n \rfloor + 1$. Lemma A.4 and Lemma A.7 show $P(J^{(2)}_n(\alpha) \cap Q_n) \to 1$ for each $\alpha > 0$ small enough, and consequently it suffices to verify (6.51)-(6.54) for all $\alpha > 0$ as $n \to \infty$. Concerning (6.51), due to the triangle inequality and the boundedness properties of $\rho^\bullet_\alpha$ we have, for fixed $\alpha > 0$ on the set $Q_n$, a bound in which $\varsigma^n_i(\alpha) = \Delta^n_i \bar X_n + \Delta^n_i \bar X^{(8\alpha)}_n$. Consequently, because of $E|\xi_i| \le 1$ for each $i \in \mathbb{N}$ and the fact that the multipliers are defined on a distinct probability space, we obtain the corresponding expectation bound. Thus, (6.51) follows using $P(Q_n) \to 1$ and Lemma A.19.
Recall the quantity $D^\bullet_n(\alpha)$ introduced in (6.33). Because of (6.38), Lemma A.18 and the fact that the multipliers $(\xi_i)_{i \in \mathbb{N}}$ are independent of the other quantities involved and satisfy $E|\xi_i| \le 1$, we obtain $0 \le E \hat D^\bullet_n(\alpha) \le E D^\bullet_n(\alpha) \to 0$, which proves (6.52). In order to show (6.53), for $\alpha > 0$ and $n \in \mathbb{N}$ large enough we have a bound with some $K > 0$, valid for all $m \in \mathbb{N}$ and all $i = 1, \dots, n$, using Lemma A.15. Thus, with Assumption 3.6 as well as the independence of $\xi_i$ and $\bar a^n_i(\alpha)$, we obtain for every integer $m \ge 2$ and $n \in \mathbb{N}$ large enough an estimate with constants $C_1, C_2 > 0$, where $\bar Z^n_i(\alpha)$ is defined accordingly for $n \in \mathbb{N}$, $\alpha > 0$ and $i = 1, \dots, n$. Furthermore, due to the definition of $\bar X^{(8\alpha)}_n$ in (6.23), the variables $(\bar Z^n_i(\alpha))_{i=1,\dots,n}$ are independent with mean zero. Consequently, Lemma A.16 yields (6.53).

Concerning (6.54), notice first of all that we have $\bar a^n_i(\alpha) = \bar b^n_i(\alpha) = 0$ on the set $\{|\Delta^n_i \bar X^{(8\alpha)}_n| \le v_n/2\} \cap Q_n$, because of the indicators in the definition of these terms and the fact that $|\Delta^n_i \bar X_n| \le v_n/2$ holds for each $i = 1, \dots, n$ on $Q_n$ according to (6.24). Therefore, we have the decomposition (6.55) for arbitrary $i \in \{1, \dots, n\}$. For the first summand on the right-hand side of (6.55) the triangle inequality gives the corresponding estimate. Due to Assumption 6.12(a2) we have $|\rho(z)| \le K|z|^p$ for all $z \in \mathbb{R}$ and some $K > 0$. Thus, because $|\Delta^n_i \bar X_n| \le |\Delta^n_i \bar X^{(8\alpha)}_n|$ holds on $\{v_n/2 < |\Delta^n_i \bar X^{(8\alpha)}_n|\} \cap Q_n$, we further obtain (6.56). The fact that $|\Delta^n_i \bar X_n| \le v_n/2$ for all $i = 1, \dots, n$ on $Q_n$ yields the bound for the second summand in (6.55). The derivative of the function $\Psi^\bullet_\alpha$ from (6.3) is supported on a compact set which is bounded away from the origin. Therefore, by Assumption 6.12(a2) there exists a constant $K > 0$, which may depend on $\alpha$, such that the derivative satisfies $|\frac{d}{dz}\rho^\bullet_\alpha(z)| \le K|z|^{p-1}$. As a consequence, we obtain (6.57) from the mean value theorem and $|\Delta^n_i \bar X_n| \le |\Delta^n_i \bar X^{(8\alpha)}_n|$. Finally we conclude with (6.55), (6.56) and (6.57), where $h(\hat G^{(n)}_\rho)^*$ and $h(\hat G^{(n)}_\rho)_*$ denote a minimal measurable majorant and a maximal measurable minorant with respect to the joint data, respectively.

In order to show (6.59), observe that by the properties of bounded Lipschitz functions we have, for every $\alpha > 0$, the decomposition with the quantities defined there. Thus, Lemma 1.2.2(i) in Van der Vaart and Wellner (1996) applies. Notice that the supremum in the definition of $Y^{(\alpha)}_n$ is measurable, because the process $\hat G^{\bullet(\alpha)}_{\rho,n}$ depends on $\theta \in [0,1]$ only through $\lfloor n\theta \rfloor$ and is right-continuous in $t \in \mathbb{R}$. Let $\varepsilon > 0$ be arbitrary. Then, due to Proposition B.4 and the monotonicity of the integral, we obtain
$$E Y^{(\alpha)} \le \varepsilon/4 \qquad (6.62)$$
for all $\alpha > 0$ in a neighbourhood of 0. Moreover, because of Lemma 6.5 and Theorem 1.12.1 in Van der Vaart and Wellner (1996) we have
$$p(\alpha) \le \varepsilon/4 \qquad (6.63)$$
for $\alpha > 0$ small enough. Thus, choose an $\alpha > 0$ such that (6.62), (6.63) and Lemma 6.10 hold. Then Lemma 6.9 yields $q^{(\alpha)}_n \overset{P}{\to} 0$ as $n \to \infty$, and due to Lemma 6.10 we have the analogous statement. As a consequence, we obtain (6.59) with (6.61). In order to show (6.60), we have for each $h \in \mathrm{BL}_1(\ell^\infty([0,1] \times \mathbb{R}))$ and each $\alpha > 0$ the corresponding decomposition. Therefore, applying Lemma 1.2.2(i) in Van der Vaart and Wellner (1996) and the relation $(-Z)^* = -(Z_*)$ between the minimal measurable majorant and the maximal measurable minorant of a random element $Z$ several times yields the stated bound for every $h \in \mathrm{BL}_1(\ell^\infty([0,1] \times \mathbb{R}))$ and each $\alpha > 0$. For arbitrary $\varepsilon > 0$ we choose $\alpha > 0$ such that Lemma 6.10 holds and $E Y^{(\alpha)} \le \varepsilon/8$. Then, as above, we see that the remaining terms converge to 0 by Lemma 6.9 as $n \to \infty$. These facts together with (6.64) give (6.60).

Proofs of the results in Section 3
Proof of Theorem 3.1. For each $(\theta, t) \in [0,1] \times \mathbb{R}$ and $n \in \mathbb{N}$ we have under $H^{(\mathrm{loc})}_1$ a representation with the mappings $h_n$ defined accordingly. Thus, by Assumption 2.3(a) we obtain an expansion in which the $o$-term is deterministic. By the same reasoning as in the proof of Theorem 2.6 in Bücher et al. (2017) it can be seen that $h_n(G^{(n)})$ converges weakly to the appropriate limit. As a consequence, Slutsky's lemma (Example 1.4.7 in Van der Vaart and Wellner (1996)) yields the assertion, since the tight process $T_\rho$ is separable (see Lemma 1.3.2 in the previously mentioned reference).
Thus, (6.68) is also true for $\alpha \in (0,1) \cap \mathbb{Q}$. Finally, (3.13) can be shown by exactly the same steps as above, and (3.12) is an immediate consequence of the Portmanteau theorem.
holds for every $t \in \mathbb{Q}$ and each $y \in [0, \theta]$ outside the Lebesgue null set $\bigcup_{t \in \mathbb{Q}} M^{(t)}$. According to Assumption 2.3 the function $y \mapsto \int (1 \wedge |z|^p)\, g_0(y, dz)$ is bounded on $[0,1]$. Hence, by Lebesgue's dominated convergence theorem and the assumptions on $\rho$, the quantities on both sides of (6.69) are right-continuous in $t \in \mathbb{R}$. As a consequence, (6.69) holds for every $t \in \mathbb{R}$ and each $y \in [0, \theta]$ outside the Lebesgue null set $\bigcup_{t \in \mathbb{Q}} M^{(t)}$. Thus, by the uniqueness theorem for measures, the kernel $\rho(z)\, g_0(y, dz)$ is Lebesgue-almost everywhere on $[0, \theta]$ equal to the finite signed measure $\eta_\theta$ with measure-generating function $t \mapsto A_\theta(t)$ of bounded variation. Now recall that $g_0(y, dz)$ does not charge $\{0\}$, so by Assumption 2.3(a3) the kernel $g_0(y, dz)$ is Lebesgue-almost everywhere on $[0, \theta]$ equal to the measure with density $(1/\rho(z))\, 1_{\{\rho(z) \ne 0\}}$ with respect to $\eta_\theta(dz)$.
Proof of Theorem 4.4. We consider the functional $\Lambda: \ell^\infty([0,1] \times \mathbb{R}) \to \ell^\infty(\mathcal{C} \times \mathbb{R})$; on $\mathcal{C} \times \mathbb{R}$ the mapping $\Lambda$ is Lipschitz continuous. Thus, due to Theorem 6.1 and the continuous mapping theorem, $\Lambda(G^{(n)}_\rho)$ converges weakly in $\ell^\infty(\mathcal{C} \times \mathbb{R})$ to the tight mean zero Gaussian process $H_\rho := \Lambda(G_\rho)$, where a simple calculation shows that $H_\rho$ has the covariance structure (4.11). Furthermore, we have an expansion in which the $o$-term is deterministic and uniform in $(\zeta, \theta, t) \in \mathcal{C} \times \mathbb{R}$ by Assumption 2.3. Finally, the desired weak convergence follows using Slutsky's lemma (Example 1.4.7 in Van der Vaart and Wellner (1996)) and the fact that $H_\rho$ is separable as it is tight (see Lemma 1.3.2 in the previously mentioned reference).
Proof of Theorem 4.5. We have $\hat H^{(n)}_\rho = \Lambda(\hat G^{(n)}_\rho)$ and $H_\rho = \Lambda(G_\rho)$ with the Lipschitz continuous mapping $\Lambda$ defined in (6.70). Thus, the assertion follows from Proposition 10.7 in Kosorok (2008).

Proof of Theorem 4.10. We start with a proof of $\varphi^*_n \overset{P}{\to} 0$, which is equivalent to $\hat\kappa^{(\alpha_n,\rho)}_{n,B_n}(r)/\sqrt{n\Delta_n} \overset{P}{\to} 0$. Therefore, by the definition of $\hat\kappa^{(\alpha_n,\rho)}_{n,B_n}(r)$ in (4.16), we have to show (6.71) for arbitrary $x > 0$. Since the centred indicator variables involved are pairwise uncorrelated with mean zero and bounded by 1, we have (6.72). Therefore, in order to prove (6.71), it suffices to verify (6.73), with $Q_n$ the set defined in (6.24). The first inequality in (6.73) follows with the Markov inequality, and the last inequality is a consequence of the fact that $\hat H^{(n)}_{\rho,*}(\hat\theta^*_n) \le \hat H^{(n)}_{\rho,*}(1) \le 2 \sup_{t \in \mathbb{R}} \sup_{\theta \in [0,1]} |\hat G^{(n)}_\rho(\theta, t)|$. Due to Lemma A.5 we obtain $P(Q^C_n) \le K n \Delta_n^{1+\tau}$, and consequently $\alpha_n^{-1} P(Q^C_n) \to 0$. For the second summand on the right-hand side of (6.73), the definition of $\hat G^{(n)}_\rho$ in (6.7) gives a bound whose final estimate follows using Lemma A.21, $E|\xi_i| \le 1$ for every $i = 1, \dots, n$, and the independence of the multipliers and the other quantities involved. Therefore, with the Markov inequality we obtain the claim by the assumptions on the sequences involved. Thus, we conclude $\beta^*_n \overset{P}{\to} 0$. Next we show $\hat\kappa^{(\alpha_n,\rho)}_{n,B_n}(r) \overset{P}{\to} \infty$, which is equivalent to the corresponding divergence statement for each $x > 0$. With the same considerations as for (6.72) it is sufficient to show
$$P\Big( P_\xi\big( \hat H^{(n)}_{\rho,*}(\hat\theta^*_n) > x^{1/r} \big) \le 2\alpha_n \Big) \to 0.$$
Let $t_0 \in \mathbb{R}$ with $N_{\rho^2}(\theta_0, t_0) > 0$. By continuity of the function $\zeta \mapsto N_{\rho^2}(\zeta, t_0)$ we can find $0 < \bar\zeta < \bar\theta < \theta_0$ with
$$N_{\rho^2}(\bar\zeta, t_0) > 0 \qquad (6.74)$$
and, because of the inequality valid on the set $\{\bar\theta < \hat\theta^*_n\}$ and the consistency of the preliminary estimate, it further suffices to prove
$$P\Big( P_\xi\big( \hat H^{(n)}_{\rho,*}(\hat\theta^*_n) > x^{1/r} \big) \le 2\alpha_n \text{ and } \bar\theta < \hat\theta^*_n \Big) \le P\Big( P_\xi\big( \hat H^{(n)}_\rho(\bar\zeta, \bar\theta, t_0) > x^{1/r} \big) \le 2\alpha_n \Big) \to 0. \qquad (6.75)$$
In order to show (6.75) we want to use a Berry-Esseen type result. By the assumptions on the multiplier sequence it is immediate to see that the relevant summands are properly standardized. Thus, Theorem 2.1 in Chen and Shao (2001) yields (6.76), where $\Phi$ denotes the standard normal distribution function. Before we proceed further in the proof of (6.75), we first show (6.77). Let $M > 0$. Then a straightforward calculation, combined with (6.74), yields (6.77), because Theorem 6.1 also holds for $\rho^2$, since Assumption 6.12 is also valid with $2p$ in place of $p$ (cf. (6.12) in the proof of Proposition 6.13). Recall that our main objective is to show (6.75), and thus we consider the Berry-Esseen bound on the right-hand side of (6.76). For the first summand we distinguish two cases, according to the assumptions on the multiplier sequence.
Let us discuss the case of bounded multipliers first. For $M > 0$ we have, for large $n \in \mathbb{N}$, the stated bound on the set $\{1/\hat W^2_n \le M\}$. In the situation of normal multipliers, recall that there exist constants $K_1, K_2 > 0$ such that for $\xi \sim N(0,1)$ and $y > 0$ large enough the tail $P(\xi > y)$ is bounded above and below by constant multiples of $y^{-1} e^{-y^2/2}$ (see the estimate displayed at the end of this proof). Thus, we can calculate for $n \in \mathbb{N}$ large enough on the set $\{1/\hat W^2_n \le M\}$, where $K_1$ and $K_2$ depend on $M$; the first inequality in the resulting display uses the boundedness of $|B^n_i|$ again, and the last one follows with (6.79). Now, according to Assumption 2.3(b), let $0 < t_2 \le 1$ and $\delta > 0$ with $n^{-t_2+\delta} = o(\Delta_n)$. Furthermore, define $\bar\delta > 0$ via $1 + \bar\delta = 1/(t_2 - \delta)$ and $q := 1/\bar\delta$. Then we have $n\Delta_n^{1+\bar\delta} \to \infty$, and for $n \ge N(M) \in \mathbb{N}$ on the set $\{1/\hat W^2_n \le M\}$, using $\exp(-K_2 n\Delta_n) \le (n\Delta_n)^{-q}$, we conclude (6.80). We now consider the second term on the right-hand side of (6.76), for which the analogous estimate holds; consequently, (6.81) follows. Thus, from (6.78), (6.80) and (6.81) we see that, with $K > 0$ from (6.76), for each $M > 0$ there exists a $K_3 > 0$ such that (6.82) holds. Now we can show (6.75). Let $\eta > 0$ and, according to (6.77), choose an $M > 0$ with $P(1/\hat W^2_n > M) < \eta/2$ for all $n \in \mathbb{N}$. For this $M > 0$ choose a $K_3 > 0$ such that the probability in (6.82) is smaller than $\eta/2$ for large $n$. Then for $n \in \mathbb{N}$ large enough we obtain the desired conclusion, using (6.76) and the fact that on $\{1/\hat W^2_n \le M\}$ there exists a $c > 0$ with $1 - \Phi(x^{1/r}/\hat W_n) > c$. Thus, we have shown $\hat\kappa^{(\alpha_n,\rho)}_{n,B_n}(r) \overset{P}{\to} \infty$, and we are only left with proving (4.17). To this end, consider $\hat\kappa^{(\alpha_n,\rho)}_{n,B_n}(r)$ for some $\theta > \theta_0 + K\varphi^*_n$.
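Spelled out, the classical two-sided tail estimate behind constants of the type $K_1$ and $K_2$ reads: for $\xi \sim N(0,1)$ and every $y > 0$,
$$\frac{y}{1 + y^2} \cdot \frac{e^{-y^2/2}}{\sqrt{2\pi}} \;\le\; P(\xi > y) \;\le\; \frac{1}{y} \cdot \frac{e^{-y^2/2}}{\sqrt{2\pi}},$$
so that for $y$ bounded away from zero both tails are of exact order $y^{-1} e^{-y^2/2}$.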
Proof of Proposition 4.13, Corollary 4.14 and Proposition 4.15. The assertions can be obtained by reasoning similar to that in the proofs of Proposition 3.10, Corollary 3.12 and Proposition 3.13, and we omit the details.
Proofs of the results in Example 2.5, Example 4.3 and Example 4.11(2).
(2) Now we show that if additionally (4.7) and (4.8) are satisfied, then $k_0 < \infty$ holds and $N_k(t)$ is a bounded function on $\mathbb{R}$ for each $k \in \mathbb{N}_0$, as stated in Example 4.11(2). To this end, elementary calculations show that the function $\bar N$ is given by $\bar N(y, t) = \Upsilon_{L,p}(\bar A(y), \bar\beta(y), \bar p(y), t)$ with (6.83). Furthermore, it is well known from complex analysis that there is a domain $U \subset U^* \subset \mathbb{C}$ with holomorphic functions $A^*: U^* \to \mathbb{C}$, $\beta^*: U^* \to \mathbb{C}_{\bar p-} := \{u \in \mathbb{C} \mid \mathrm{Re}(u) < \bar p\}$ and $p^*: U^* \to \mathbb{C}_{1+} := \{u \in \mathbb{C} \mid \mathrm{Re}(u) > 1\}$ such that $\bar A$, $\bar\beta$ and $\bar p$ are the restrictions of $A^*$, $\beta^*$ and $p^*$ to $U$. Moreover, it can be seen from (6.83) that for fixed $t \in \mathbb{R}$ the mapping $(a, \beta, p) \mapsto \Upsilon_{L,p}(a, \beta, p, t)$ is partially holomorphic on $\mathbb{C} \times \mathbb{C}_{\bar p-} \times \mathbb{C}_{1+}$, that is, it is holomorphic in each of the variables $a$, $\beta$ and $p$ when the remaining variables are fixed. By a deep result of complex analysis in several variables, which dates back to Hartogs (1906), this implies that $(a, \beta, p) \mapsto \Upsilon_{L,p}(a, \beta, p, t)$ is holomorphic on $\mathbb{C} \times \mathbb{C}_{\bar p-} \times \mathbb{C}_{1+}$ for fixed $t \in \mathbb{R}$ (see also Remark 1.2.28 in Scheidemann (2005)). Additionally, by Proposition 1.2.2(5) in Scheidemann (2005) the function $\Xi: U^* \to \mathbb{C} \times \mathbb{C}_{\bar p-} \times \mathbb{C}_{1+}$ with $\Xi(y) := (A^*(y), \beta^*(y), p^*(y))$ is holomorphic, and thus for each fixed $t \in \mathbb{R}$ the mapping $y \mapsto \bar N(y, t)$ is real analytic, because it is the restriction of the holomorphic function $y \mapsto \Upsilon_{L,p}(\Xi(y), t)$ to $U$. Consequently, by shrinking the set $U$ if necessary, we have the power series expansion (6.84) for every $y \in U$ and $t \in \mathbb{R}$.

If $k_0 = \infty$, then for any $k \in \mathbb{N}$ and $t \in \mathbb{R}$ we have $N_k(t) = 0$. Thus, we obtain, for some constant $K > 0$, the identity (6.85) for each $t \ge 2$ and $y \in U$, where $\Psi_{L,p}$ is defined in (6.86). Taking the derivative with respect to $y \in U$ on both sides of (6.85) yields
$$\Psi_{L,p}'(y) + K\,\frac{\bar A'(y)(1 - \bar p(y)) + \bar A(y)\bar p'(y)}{(1 - \bar p(y))^2}\, t^{1 - \bar p(y)} - \bar p'(y)\, \frac{K \bar A(y)}{1 - \bar p(y)}\, \log(t)\, t^{1 - \bar p(y)} = 0 \qquad (6.87)$$
for each $y \in U$ and $t \ge 2$. Hence $\bar p'(y)$ is equal to zero for each $y \in U$, because otherwise the display above cannot be valid for every $t \ge 2$. This fact together with (6.87) gives
$$\Psi_{L,p}'(y) + \frac{K \bar A'(y)}{1 - \bar p(y)}\, t^{1 - \bar p(y)} = 0$$
for all $y \in U$ and $t \ge 2$. Consequently, $\bar A'(y) = 0$ holds for every $y \in U$, and with (6.86) we obtain $\Psi_{L,p}'(y) = 4L \bar A(\theta_0) \bar\beta'(y) (\bar p - \bar\beta(y))^{-2} = 0$ for $y \in U$, which implies $\bar\beta'(y) = 0$ for all $y \in U$. Thus, $k_0 = \infty$ contradicts the assumption that at least one of the functions $\bar A$, $\bar\beta$ and $\bar p$ is non-constant.
The following consideration will be helpful in order to show that $N_k(t)$ is bounded in $t \in \mathbb{R}$ for each $k \in \mathbb{N}_0$. Let $f_1, f_2: U \times \mathbb{R} \to \mathbb{R}$ be functions which are arbitrarily often differentiable with respect to $y \in U$ for fixed $t \in \mathbb{R}$, such that for each $\ell \in \mathbb{N}_0$ the $\ell$-th derivatives with respect to $y$ satisfy a uniform bound of the type
$$\sup_t \Big| \frac{\partial^\ell f_j}{\partial y^\ell}(y, t) \Big| \le K^{\ell+1}\, \ell^\ell, \qquad j = 1, 2,$$
for some constant $K > 0$ which does not depend on $\ell$. (Here we set $0^0 := 1$.) Then, by the product formula for higher derivatives, we obtain a bound of the same type for the $\ell$-th derivative with respect to $y$ of the product $f_1 f_2$, for each $\ell \in \mathbb{N}_0$, as soon as we can show that there exists a $K > 0$ such that for every $\ell \in \mathbb{N}_0$ the bounds (6.90), (6.91) and (6.92) for the derivatives hold. Let $\bar A(y) = \sum_{\ell=0}^\infty A_\ell (y - \theta_0)^\ell$ be the power series expansion of the real analytic function $\bar A$ around $\theta_0$. By the definition of real analytic functions this power series has a positive radius of convergence, and due to the Cauchy-Hadamard formula this is equivalent to the existence of a constant $K > 0$ with $|A_\ell| \le K^{\ell+1}$ for each $\ell \in \mathbb{N}_0$. Thus, because of $\bar A^{(\ell)}(\theta_0) = \ell!\, A_\ell$ for each $\ell \in \mathbb{N}_0$, (6.90) follows. By assumption in Example 4.3 we have $\bar\beta(y) \le \beta^+$ and $1 \vee \beta^+ < p^- \le \bar p(y)$ for each $y \in U$, with constants $\beta^+$ and $p^-$. As a consequence, the functions $y \mapsto \frac{1}{\bar p(y) - 1}$ and $y \mapsto \frac{1}{p^- - \bar\beta(y)}$ are real analytic on $U$ as compositions of real analytic functions. So the same reasoning as above yields (6.91) and (6.92). Let the affine linear functions $\bar\beta$ and $\bar p$ be given by $\bar\beta(y) = \beta_0 + \beta_1(y - \theta_0)$ and $\bar p(y) = p_0 + p_1(y - \theta_0)$. Then for $\ell \in \mathbb{N}_0$ and $t > 0$ we have the corresponding identities, and for $\ell \in \mathbb{N}_0$ let $h^{(1)}_\ell: (0, \infty) \to \mathbb{R}$ be defined by $h^{(1)}_\ell(t) = t^{1-p_0} (\log t)^\ell$. The function $h^{(1)}_0$ is clearly bounded in $t \ge 2$ due to $p_0 > 1$, and for $\ell \in \mathbb{N}$ the only possible roots of the derivative of $h^{(1)}_\ell$ in $t \in (0, \infty)$ are $t = 1$ and $t = \exp\{\ell/(p_0 - 1)\}$. Thus, we obtain the bound for the supremum in (6.93); and for $\ell \in \mathbb{N}_0$ let $h^{(2)}_\ell: (0, 1] \to \mathbb{R}$ be defined by $h^{(2)}_\ell(t) = t^{p^- - \beta_0} (\log t)^\ell$. For $\ell \in \mathbb{N}$ the only possible roots in $(0, 1]$ of the derivative of $h^{(2)}_\ell$ are $t = 1$ and $t = \exp\{-\ell/(p^- - \beta_0)\}$. As a consequence, we obtain the bound for the supremum in (6.94) for each $\ell \in \mathbb{N}_0$, because $\lim_{t \to 0} h^{(2)}_\ell(t) = 0$. Notice that for $t = 0$ the function $y \mapsto t^{p^- - \bar\beta(y)}$ is constantly zero, and for $\ell = 0$ the function $t \mapsto t^{p^- - \beta_0}$ is bounded by 1 on $[0, 1]$ due to $p^- > \beta_0$.
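For completeness, the root claim for $h^{(1)}_\ell$ follows from the explicit derivative
$$\frac{d}{dt}\, t^{1-p_0} (\log t)^\ell = t^{-p_0} (\log t)^{\ell-1} \big( \ell + (1 - p_0) \log t \big), \qquad t > 0,$$
which vanishes only at $\log t = 0$ and $\log t = \ell/(p_0 - 1)$; evaluating at the latter point gives $h^{(1)}_\ell\big(e^{\ell/(p_0-1)}\big) = e^{-\ell} \big(\ell/(p_0-1)\big)^\ell$, a bound of the form $K^{\ell+1} \ell^\ell$ (with the convention $0^0 := 1$). The computation for $h^{(2)}_\ell$ is completely analogous.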
(3) The expansion (4.18) can be deduced along the same lines as in step (3).

A Technical details in the proofs of Theorem 6.1 and Theorem 6.8

In this appendix we give the details of the proofs of Theorem 6.1 and Theorem 6.8. Here, and also in Appendices B and C, $K$ or $K(\alpha)$ denote generic constants which sometimes depend on a further quantity $\alpha$ and may change from place to place.

A.1 Moments of functionals of integer-valued random measures
Hoffmann and Vetter (2017) used Lemma 2.1.5 and Lemma 2.1.7 of Jacod and Protter (2012) frequently in order to obtain their weak convergence results. However, in Jacod and Protter (2012) these results are only proved for Poisson random measures with a predictable compensator of the form $ds \otimes F(dz)$ for a Lévy measure $F$. Therefore, using tools from Jacod (1979), we prove the generalized versions stated below. First, we introduce some notation. Let $\mathfrak{p}$ be an integer-valued random measure on $\mathbb{R}_+ \times \mathbb{R}$ with predictable compensator $\mathfrak{q}(\omega; ds, dz) = \nu_s(\omega; dz)\, ds$ for a transition kernel $\nu_s(\omega; dz)$ from $(\Omega \times \mathbb{R}_+, \mathcal{P})$ into $(\mathbb{R}, \mathcal{B})$, where $\mathcal{P}$ is the predictable $\sigma$-algebra on $\Omega \times \mathbb{R}_+$ (with respect to some prespecified filtration $(\mathcal{F}_t)_{t \in \mathbb{R}_+}$) and where an integer-valued random measure is a random measure which satisfies the requirements of Definition II.1.3 and Definition II.1.13 in Jacod and Shiryaev (2002). Furthermore, we set $\Omega' = \Omega \times \mathbb{R}_+ \times \mathbb{R}$, and $\mathcal{P}' = \mathcal{P} \otimes \mathcal{B}$ is the predictable $\sigma$-algebra on $\Omega'$. Then, for a real-valued $\mathcal{P}'$-measurable function $\delta$ on $\Omega'$ and $p, t \in \mathbb{R}_+$, $u > 0$, let
$$\hat\delta(p)_{t,u}(\omega) = \frac{1}{u} \int_t^{t+u} \int_{\mathbb{R}} |\delta(\omega, s, z)|^p\, \nu_s(\omega; dz)\, ds.$$
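Throughout this subsection, the defining property of the predictable compensator (Theorem II.1.8 in Jacod and Shiryaev (2002)) is used in the following elementary form: for every non-negative $\mathcal{P}'$-measurable function $\delta$,
$$E\Big[ \int_0^\infty \!\! \int_{\mathbb{R}} \delta(s, z)\, \mathfrak{p}(ds, dz) \Big] = E\Big[ \int_0^\infty \!\! \int_{\mathbb{R}} \delta(s, z)\, \nu_s(dz)\, ds \Big].$$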
Lemma A.1. Suppose that $\hat\delta(2)_{0,u} < \infty$ almost surely for all $u > 0$. Then the process $Y = \delta * (\mathfrak{p} - \mathfrak{q})$ is a locally square integrable martingale, and the stated moment bounds hold for all finite stopping times $T$ and $u > 0$, and also for $p \ge 2$.

Lemma A.2. Suppose that $\hat\delta(1)_{0,u} < \infty$ almost surely for all $u > 0$. Then the process $Y = \delta * \mathfrak{p}$ is of locally integrable variation. Furthermore, for all finite stopping times $T$ and $u > 0$ the stated bounds hold for $p \in (0, 1]$.

Proof of Lemma A.1. $\delta^2 * \mathfrak{q}$ is a continuous increasing process and we have $(\delta^2 * \mathfrak{q})_t = t\, \hat\delta(2)_{0,t}$ for all $t > 0$. Thus, for $n \in \mathbb{N}$ let $M_n$ be a null set such that $(\delta^2 * \mathfrak{q})_n$ is finite on $M_n^C$. Such a set exists by the assumption on $\delta$. Then the increasing process $(\delta^2 * \mathfrak{q})_t$ is finite for all $t \in \mathbb{R}_+$ on $M^C$ with $M = \bigcup_{n \in \mathbb{N}} M_n$. Therefore, $(T_n)_{n \in \mathbb{N}}$ defined via $T_n = \inf\{t > 0 \mid (\delta^2 * \mathfrak{q})_t \ge n\}$ is a localizing sequence of stopping times, and the stopped continuous processes satisfy $(\delta^2 * \mathfrak{q})^{T_n}_t \le n$ for all $t \in \mathbb{R}_+$. Consequently, $\delta^2 * \mathfrak{q}$ is locally bounded and in particular locally integrable. Thus, by Theorem II.1.33(a) in Jacod and Shiryaev (2002), the process $Y$ is well-defined and a locally square integrable martingale. In order to show the claimed inequalities we want to reduce our setup to the situation of Lemma 2.1.5 in Jacod and Protter (2012). To this end, let $F$ be a Lévy measure on $(\mathbb{R}, \mathcal{B})$ without atoms and with $F(\mathbb{R}) = \infty$. Furthermore, let $x_0 \notin \mathbb{R}$ be an exterior point of $\mathbb{R}$ and let $(\mathbb{R}_{x_0}, \mathcal{B}_{x_0})$ denote the measurable one-point extension of $(\mathbb{R}, \mathcal{B})$, that is, $\mathbb{R}_{x_0} = \mathbb{R} \cup \{x_0\}$ with $\mathcal{B}_{x_0} = \sigma(\mathcal{B} \cup \{\{x_0\}\})$. Then, according to Theorem 14.53 in Jacod (1979), there exist a measurable function $h: (\Omega', \mathcal{P}') \to (\mathbb{R}_{x_0}, \mathcal{B}_{x_0})$ and a $P$-null set $N$ such that the transformation identity holds for each $A \in \mathcal{B}(\mathbb{R}_+) \otimes \mathcal{B}$ and $\omega \notin N$. Additionally, by Theorem 14.56 in Jacod (1979) there exist a filtered measurable space $(\Omega^\bullet, \mathcal{F}^\bullet, (\mathcal{F}^\bullet_t)_{t \in \mathbb{R}_+})$ and a transition probability $Q(\omega, d\omega^\bullet)$ from $(\Omega, \mathcal{F})$ into $(\Omega^\bullet, \mathcal{F}^\bullet)$ such that, on the extended filtered probability space $(\bar\Omega, \bar{\mathcal{F}}, (\bar{\mathcal{F}}_t)_{t \in \mathbb{R}_+}, \bar P)$ constructed from $Q$, there exists a Poisson random measure $\bar{\mathfrak{p}}$ with predictable compensator $\bar{\mathfrak{q}}(ds, dz) = F(dz)\, ds$ such that for $\bar P$-almost every $\bar\omega = (\omega, \omega^\bullet)$ the corresponding identification holds for all $A \in \mathcal{B}(\mathbb{R}_+) \otimes \mathcal{B}$. Furthermore, we identify the filtration $(\mathcal{F}_t)_{t \in \mathbb{R}_+}$ on $(\Omega, \mathcal{F})$ with the induced filtration $\mathcal{F}_t \otimes \{\emptyset, \Omega^\bullet\}$ on $(\bar\Omega, \bar{\mathcal{F}})$, which we denote by $(\mathcal{F}_t)_{t \in \mathbb{R}_+}$ as well. Any random variable $X$ on $(\Omega, \mathcal{F})$ will be identified with the induced mapping $X(\omega, \omega^\bullet) = X(\omega)$. Then, for every $A \in \bar{\mathcal{F}}$ and every stopping time $T$ on $(\Omega, \mathcal{F})$, the corresponding identity holds, where for the sake of a clear notation we denote by $\mathcal{F}_T(\Omega)$ and $\mathcal{F}_T(\bar\Omega)$, respectively, the $\sigma$-algebra of events up to time $T$ with respect to $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t \in \mathbb{R}_+})$ and $(\bar\Omega, \bar{\mathcal{F}}, (\mathcal{F}_t)_{t \in \mathbb{R}_+})$. Consequently, for $A = A_1 \times \Omega^\bullet \in \mathcal{F}_T(\bar\Omega)$ with $A_1 \in \mathcal{F}_T(\Omega)$ and a random variable $X$ on $(\Omega, \mathcal{F})$, the equality of conditional expectations follows by the definition of the conditional expectation. Let $\mathcal{O}$ be the optional $\sigma$-algebra on $\Omega \times \mathbb{R}_+$ and let $\bar{\mathcal{O}}$ denote the optional $\sigma$-algebra on $\bar\Omega \times \mathbb{R}_+$. Then by Proposition II.1.14 there exist a thin random set $D \in \mathcal{O}$, an optional process $(\beta_s)_{s \in \mathbb{R}_+}$ on $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t \in \mathbb{R}_+}, P)$, a thin random set $\bar D \in \bar{\mathcal{O}}$ and an optional process $(\bar\beta_s)_{s \in \mathbb{R}_+}$ on $(\bar\Omega, \bar{\mathcal{F}}, (\mathcal{F}_t)_{t \in \mathbb{R}_+}, \bar P)$ such that the corresponding representations hold for every $(\omega, \omega^\bullet) \in \bar\Omega$, where $\varepsilon_{(x,y)}$ is the Dirac measure on $\mathbb{R}_+ \times \mathbb{R}$ with mass in $(x, y)$. As a consequence, we obtain from (A.2), for every $t \ge 0$ and $\bar P$-almost every $(\omega, \omega^\bullet)$, the corresponding identity, where we set $f(\omega, s, h(\omega, s, z)) = 0$ if $h(\omega, s, z) = x_0$, for a real-valued predictable function $f$ on $\Omega'$.
Thus, the processes $\delta(\omega, t, \beta_t(\omega))\, 1_D(\omega, t)$ and $\delta(\omega, t, h(\omega, t, \bar\beta_t(\omega, \omega^\bullet)))\, 1_{\bar D}((\omega, \omega^\bullet), t)$ are $\bar P$-indistinguishable on $(\bar\Omega, \bar{\mathcal{F}}, (\mathcal{F}_t)_{t \in \mathbb{R}_+}, \bar P)$, and the stochastic integrals $\delta * (\mathfrak{p} - \mathfrak{q})$ and $(\delta \circ h) * (\bar{\mathfrak{p}} - \bar{\mathfrak{q}})$ are $\bar P$-indistinguishable as well (cf. Definition II.1.27 in Jacod and Shiryaev (2002)), where for the sake of brevity $(\delta \circ h)$ denotes the predictable map $(\omega, s, z) \mapsto \delta(\omega, s, h(\omega, s, z))$ on $\Omega'$. Notice that $\delta^2 * \mathfrak{q} = (\delta^2 \circ h) * \bar{\mathfrak{q}}$ outside a null set due to (A.1). Thus, the same reasoning as at the beginning of the proof shows that $\tilde Y_t := ((\delta \circ h) * (\bar{\mathfrak{p}} - \bar{\mathfrak{q}}))_t$ is well-defined and a locally square integrable martingale. Finally, for every finite stopping time $T$ and all $u > 0$, $p \ge 1$, the variables $\sup_{0 \le v \le u} |Y_{T+v} - Y_T|^p$ and $\sup_{0 \le v \le u} |\tilde Y_{T+v} - \tilde Y_T|^p$ coincide $\bar P$-almost surely. Consequently, using (A.3) we obtain the first asserted inequality; the second follows by exactly the same reasoning.
Proof of Lemma A.2. In the same way as at the beginning of the proof of Lemma A.1 we see that the increasing, continuous and finite-valued process $|\delta| * \mathfrak{q}$ is locally bounded. Hence, by the definition of the predictable compensator (Theorem II.1.8 in Jacod and Shiryaev (2002)), the process $|\delta| * \mathfrak{p}$ is locally integrable and thus $Y$ is of locally integrable variation.
With the same quantities as in the proof of Lemma A.1 we obtain from (A.2), for all $t \in \mathbb{R}_+$, the corresponding identity $\bar P(d(\omega, \omega^\bullet))$-almost surely. Thus, $\sup_{0 \le v \le u} |Y_{T+v} - Y_T|^p$ and $\sup_{0 \le v \le u} |\tilde Y_{T+v} - \tilde Y_T|^p$ coincide $\bar P$-almost surely for all finite stopping times $T$ and $p, u > 0$. Now, the same reasoning as in (A.4) and (A.5), but using Lemma 2.1.7 of Jacod and Protter (2012) instead, yields the desired inequalities.

A.2 Results on the crucial decomposition
Recall the quantities defined in (6.23) which are used frequently in the proofs of Theorem 6.1 and Theorem 6.8. With the constants from Assumption 6.12 let $\varrho \in \mathbb{R}$ have the properties
$$1 < \varrho < \frac{1}{2\beta w} \wedge (1 + \epsilon) \qquad\text{and also}\qquad \varrho < \frac{2(p-1)w - 1}{2(\beta - 1)w} \;\text{ if } \beta > 1, \qquad (A.6)$$
with an $\epsilon > 0$ for which Assumption 6.12(b6) holds. Then we have
$$u_n = (v_n)^\varrho \quad\text{and}\quad F_n = \{z : |z| > u_n\} \qquad (A.7)$$
as well as the processes $\bar X_n$ and $\bar X^{(\alpha)}_n$ from (6.23). The following lemma ensures that with high probability at most one large jump occurs and the remaining part is appropriately small.
Lemma A.4. Let Assumption 6.12 be satisfied. Then for the sets $Q_n$ defined in (A.9) we have $P(Q_n) \to 1$ as $n \to \infty$.
Then, by Lemma A.1 and Assumption 6.12(a1), we obtain for $1 \le i \le n$ and any sufficiently small $0 < \delta < 1$ the corresponding moment estimate; note that $m > 2$ always holds, so the lemma quoted above can be applied. Furthermore, $\bar\mu^{(n)}(ds, dz) = \nu^{(n)}_s(dz)\, ds$ yields for $1 \le i \le n$ and arbitrary $\delta > 0$ small enough the analogous bound. Let $m_b, m_\sigma \in \mathbb{R}$ be the constants in Assumption 6.12(c). Because of $m_b > 1$ and $m_\sigma > 2$ we can apply Hölder's inequality and the Burkholder-Davis-Gundy inequalities (see page 39 in Jacod and Protter (2012)) to obtain, due to Assumption 6.12(c) and for $1 \le i \le n$, the stated moment bounds, where the equalities in the corresponding displays hold by Fubini's theorem. Additionally, according to Lemma A.13, we have for any $1 \le i \le n$ and some $K(\delta) > 0$ the respective estimate for $n \in \mathbb{N}$ large enough. Let us now choose $\delta > 0$ in such a way that $1 - w(\beta + \delta - 1)_+ > w$. Then, for $n$ large enough, we have $\Delta_n^{1 - w(\beta + \delta - 1)_+} \le K v_n$, and the Markov inequality gives the required bound, again for $\delta > 0$ small enough. Thus, the right-hand side of (A.10) converges to zero for this choice of $\delta$, using Assumption 6.12(b4) and (b6).
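For orientation, the Burkholder-Davis-Gundy estimate used for the diffusion part reads, in the standard form relevant here (with $\sigma$ the diffusion coefficient):
$$E\Big[ \sup_{0 \le s \le \Delta_n} \Big| \int_{(i-1)\Delta_n}^{(i-1)\Delta_n + s} \sigma_u\, dW_u \Big|^{m_\sigma} \Big] \le K_{m_\sigma}\, E\Big[ \Big( \int_{(i-1)\Delta_n}^{i\Delta_n} \sigma_u^2\, du \Big)^{m_\sigma/2} \Big].$$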
If moreover Assumption 2.3 is valid, we can even give a rate for the convergence P(Q n ) → 1.
Proof. Let $x$ be arbitrary and either $z = 0$ or $|z| > \alpha/4$. Then, for $n$ large enough, we have the stated inequality. Furthermore, using the fact that for large $n \in \mathbb{N}$ on $Q_n$ there is at most one jump of $\bar X^{(\alpha)}_n$ on an interval $((k-1)\Delta_n, k\Delta_n]$ with $1 \le k \le n$, we thus obtain (A.15). Now, disregard the indicator involving $Q_n$ and assume $j < i$. If $(\mathcal{F}_t)_{t \in \mathbb{R}_+}$ denotes the underlying filtration, the inner integral in (A.15) with respect to $\mu^{(n)}(\omega; dt, dz)$ is an $(\mathcal{F}_{j\Delta_n} \otimes \mathcal{B})$-measurable function in $(\omega, x)$. Accordingly, the integrand in the integral with respect to $\mu^{(n)}(\omega; ds, dx)$ is in fact $\mathcal{P}'$-measurable. Therefore, Fubini's theorem and the definition of the predictable compensator of an optional $\mathcal{P}'$-$\sigma$-finite random measure (see Theorem II.1.8 in Jacod and Shiryaev (2002)) yield, for $n$ large enough, the stated estimate. Thus, we have $P(J^{(1)}_n(\alpha)) \to 1$, because (A.16), Assumption 6.12(a4)(II) and Assumption 6.12(b3) show that there is a constant $K > 0$ such that the corresponding bound holds. Similarly to (A.13), for $\alpha > 0$, $1 \le i, j \le n$ with $i \ne j$ and the constants $v < r$ in Assumption 6.12, define the sets $J^{(2)}_n(\alpha)$ by (A.17). Lemma A.7. Grant Assumption 6.12. Then for each $\alpha \in (0, \alpha_0/2)$, with $\alpha_0$ the constant in Assumption 6.12(a4)(I), the sets $J^{(2)}_n(\alpha)$ defined in (A.17) satisfy $\lim_{n \to \infty} P(J^{(2)}_n(\alpha)) = 1$.
In the following we gather auxiliary lemmas which have similar proofs and give bounds for crucial quantities in the proofs of Theorem 6.1 and Theorem 6.8. The first one is concerned with the quantity $V^{(n)}_\alpha$ from (6.26), where $\alpha > 0$, $\chi^{(\alpha)}_t$ is defined in (6.3) and $L^{(n)} = (z\, 1_{\{|z| > v_n\}}) * \mu^{(n)}$ is the pure-jump Itō semimartingale defined in (6.4).
Lemma A.9. Let Assumption 6.12 be satisfied. Then for $\alpha > 0$, $\omega \in Q_n$ and $n \in \mathbb{N}$ large enough such that $v_n \le \alpha/4$ the bound (A.20) holds, with $K > 0$ a bound for $\rho$, with $\rho_\alpha$ defined prior to (6.3), and where the particular processes are defined in (A.8).
Proof. On $Q_n$, and with $n$ large enough such that $v_n \le \alpha/4$, one of the following mutually exclusive possibilities holds for $1 \le i \le n$: (i) $\Delta^n_i N_n = 0$. Then we have $|\Delta^n_i X^{(n)}| = |\Delta^n_i \bar X_n| \le v_n/2$ and there is no jump larger than $u_n$ (and hence none larger than $v_n$) on the interval $((i-1)\Delta_n, i\Delta_n]$. Thus, $\chi^{(\alpha)}_t(\Delta^n_i X^{(n)})\, 1_{\{|\Delta^n_i X^{(n)}| > v_n\}} = 0 = \chi^{(\alpha)}_t(\Delta^n_i L^{(n)})$ holds for all $t \in \mathbb{R}$ and the corresponding summand in (A.20) vanishes.
(ii) $\Delta^n_i N_n = 1$ and $\Delta^n_i \bar X^{(\alpha)}_n \ne 0$. So the only jump in $((i-1)\Delta_n, i\Delta_n]$ (of absolute size) larger than $u_n$ is in fact not larger than $\alpha/4$, and because of $v_n \le \alpha/4$ we have $|\Delta^n_i X^{(n)}| \le \alpha/2$. Thus, as in the first case, the asserted equality is true for all $t \in \mathbb{R}$ and the corresponding summand in (A.20) is equal to zero.
Thus, we obtain an upper bound for $V^{(n)}_\alpha$ on $Q_n$, as soon as $v_n \le \alpha/4$, where we can substitute $\Delta^n_i X^{(n)} = \Delta^n_i \bar X_n + \Delta^n_i \bar X^{(\alpha)}_n$ in the first line.
With a similar reasoning as above we deduce, in the next lemma, a bound for the corresponding quantity from (6.32), where $\alpha > 0$ and $\chi^{\bullet(\alpha)}_t$ is defined in (6.3).
Lemma A.10. Let Assumption 6.12 be satisfied. Then for $\alpha > 0$, $\omega \in Q_n$ and $n \in \mathbb{N}$ large enough such that $v_n \le \alpha$ the bound (A.21) holds, with $K > 0$ a bound for $\rho$, where $v > 0$ is the constant from Assumption 6.12(a4)(I) and the involved processes are defined in (A.8).
Proof. On the set $Q_n$, and if $v_n \le \alpha$, we have three mutually exclusive possibilities for $1 \le i \le n$: (i) $\Delta^n_i N_n = 0$. Then we have $|\Delta^n_i X^{(n)}| = |\Delta^n_i \bar X_n| \le v_n/2$ and there is no jump larger than $u_n$ (and hence none larger than $v_n$) on the interval $((i-1)\Delta_n, i\Delta_n]$. Thus, $\chi^{\bullet(\alpha)}_t(\Delta^n_i X^{(n)})\, 1_{\{|\Delta^n_i X^{(n)}| > v_n\}} = 0 = \chi^{\bullet(\alpha)}_t(\Delta^n_i L^{(n)})$ holds for all $t \in \mathbb{R}$ and the $i$-th summand in (A.21) vanishes.
(ii) $\Delta^n_i N_n = 1$ and $\Delta^n_i \bar X^{(8\alpha)}_n = 0$. So the only jump in $((i-1)\Delta_n, i\Delta_n]$ larger than $u_n$ is also larger than $2\alpha$. Because $|\Delta^n_i \bar X_n| \le v_n/2 \le \alpha/2$ holds, we have $|\Delta^n_i X^{(n)}| \ge \alpha$, and the $i$-th summand in (A.21) can be bounded directly. (iii) $\Delta^n_i N_n = 1$ and $\Delta^n_i \bar X^{(8\alpha)}_n \ne 0$. Here we can write $\Delta^n_i X^{(n)} = \Delta^n_i \bar X_n + \Delta^n_i \bar X^{(8\alpha)}_n =: \varsigma^n_i(\alpha)$ and bound the $i$-th summand accordingly. Therefore, on $Q_n$ and as soon as $v_n \le \alpha$, we obtain the asserted estimate.

In the following lemma we obtain a bound for the quantity from (6.45), where $\alpha > 0$, $\chi^{(\alpha)}_t$ is defined in (6.3), $L^{(n)} = (z\, 1_{\{|z| > v_n\}}) * \mu^{(n)}$ is the pure-jump Itō semimartingale defined in (6.4), and $(\xi_i)_{i \in \mathbb{N}}$ is a sequence of multipliers with mean zero and variance one defined on a probability space distinct from that of the remaining processes. Furthermore, for the claim of the lemma below recall the definition of the sets $Q_n$ and $J^{(1)}_n(\alpha)$ in (A.9) and (A.14), respectively, as well as the definition of $\rho_\alpha$ prior to (6.3) and the quantities defined in (A.8).
In the next lemma we use a similar reasoning to obtain a bound for the corresponding bootstrap quantity, where $\chi^{\bullet(\alpha)}_t$ is defined in (6.3), $L^{(n)} = (z\, 1_{\{|z| > v_n\}}) * \mu^{(n)}$ is the pure-jump Itō semimartingale defined in (6.4), and $(\xi_i)_{i \in \mathbb{N}}$ is a sequence of multipliers with mean zero and variance one defined on a probability space distinct from that of the remaining processes. Furthermore, for the claim of the lemma below recall the definition of the sets $Q_n$ and $J^{(2)}_n(\alpha)$ in (A.9) and (A.17), respectively, as well as the definition of $\rho^\bullet_\alpha$ prior to (6.3) and the quantities defined in (A.8).
Lemma A.12. Let Assumption 6.12 be satisfied. Then for $\alpha > 0$, $\omega \in J^{(2)}_n(\alpha) \cap Q_n$ and $n \in \mathbb{N}$ large enough such that $v_n \le \alpha$ the bound (A.25) holds, where $v > 0$ is the constant from Assumption 6.12(a4)(I), $\varsigma^n_i(\alpha) = \Delta^n_i \bar X_n + \Delta^n_i \bar X^{(8\alpha)}_n$, and $S_n = \{M \subset \{1, \dots, n\} \mid \#M \le c_n\}$ for $c_n = \lfloor v_n/\Delta^r_n \rfloor + 1$.

Proof. Recall the cases distinguished in the proof of Lemma A.10: on the set $Q_n$, and if $v_n \le \alpha$, exactly one of three mutually exclusive possibilities holds for $1 \le i \le n$. (i) $\Delta^n_i N_n = 0$. Then we have $|\Delta^n_i X^{(n)}| = |\Delta^n_i \bar X_n| \le v_n/2$ and there is no jump larger than $u_n$ (and hence none larger than $v_n$) on the interval $((i-1)\Delta_n, i\Delta_n]$. Thus, $\chi^{\bullet(\alpha)}_t(\Delta^n_i X^{(n)})\, 1_{\{|\Delta^n_i X^{(n)}| > v_n\}} = 0 = \chi^{\bullet(\alpha)}_t(\Delta^n_i L^{(n)})$ holds for all $t \in \mathbb{R}$ and the $i$-th summand in (A.25) vanishes.
The proof of the following lemma requires the notion of Orlicz norms. Recall from Section 2.2 in Van der Vaart and Wellner (1996) that, for a non-decreasing convex function $\Lambda: \mathbb{R}_+ \to \mathbb{R}$ with $\Lambda(0) = 0$ and a random variable $Z$, the Orlicz norm is defined as
$$\|Z\|_\Lambda = \inf\big\{ c > 0 : E\, \Lambda(|Z|/c) \le 1 \big\},$$
where we set $\inf \emptyset = \infty$. It is easy to check that if $\Lambda$ equals the function $x \mapsto x^p$ for some $p \ge 1$, the corresponding Orlicz norm is the well-known $L_p$-norm $\|Z\|_p = (E|Z|^p)^{1/p}$. Furthermore, for $\Lambda_1(x) := e^x - 1$ a straightforward calculation gives
$$\|Z\|_p \le p!\, \|Z\|_{\Lambda_1} \quad\text{for all } p \in \mathbb{N}, \qquad (A.30)$$
because $x^p \le p!(e^x - 1)$ for all $x \in \mathbb{R}_+$ by the series expansion of the exponential function.
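Indeed, (A.30) follows in one line: with $c = \|Z\|_{\Lambda_1}$, the pointwise inequality $x^p \le p!(e^x - 1)$ gives
$$E|Z/c|^p \le p!\, E\big( e^{|Z|/c} - 1 \big) \le p!,$$
whence $\|Z\|_p \le (p!)^{1/p}\, \|Z\|_{\Lambda_1} \le p!\, \|Z\|_{\Lambda_1}$.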
Lemma A.16. Let Assumption 6.12 be satisfied and for $n \in \mathbb{N}$ let $(Z^n_i)_{i=1,\dots,n}$ be independent random variables with mean zero such that there exist constants $C_1, C_2 > 0$ with (A.31). Then the conclusion (A.32) holds.

Proof. The modified Bernstein inequality (Lemma 2.2.11 in Van der Vaart and Wellner (1996)) and (A.31) yield
$$P\Big( \Big| \sum_{i \in A} Z^n_i \Big| > x \Big) \le 2 \exp\Big( -\frac{1}{2}\, \frac{x^2}{b_n + d_n x} \Big)$$
for every $x \in \mathbb{R}_+$ and $A \in S_n$, with $b_n = 2 C_2 c_n/n$ and $d_n = C_1/\sqrt{n\Delta_n}$, because each $A \in S_n$ consists of at most $c_n$ elements. Therefore, by Lemma 2.2.10 in the previously mentioned reference, the fact that $\#S_n \le (n+1)^{c_n}$ and (A.30), we obtain for a universal constant $C$ and $n \ge 2$
$$\Big\| \max_{A \in S_n} \Big| \sum_{i \in A} Z^n_i \Big| \Big\|_{\Lambda_1} \le C\Big( d_n \log\big(1 + (n+1)^{c_n}\big) + \sqrt{b_n \log\big(1 + (n+1)^{c_n}\big)} \Big) \le K\Big( \frac{(v_n/\Delta^r_n + 2)\log(2n)}{\sqrt{n\Delta_n}} + \big(v_n/\Delta^r_n + 2\big) \sqrt{\frac{\log(2n)}{n}} \Big),$$
and the right-hand side converges to 0 upon choosing some $\delta > 0$ such that Assumption 6.12(b7) is satisfied. The final convergence in (A.32) holds because, by Assumption 6.12(b4), we have $\Delta_n = o(n^{-u})$ for some $0 < u < 1$.
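The counting bound $\#S_n \le (n+1)^{c_n}$ used in the proof above follows from an elementary injection: listing the elements of a set $A \in S_n$ in increasing order and padding with zeros identifies $A$ with a tuple in $\{0, 1, \dots, n\}^{c_n}$, whence $\#S_n \le (n+1)^{c_n}$.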
Proof. First we consider $y^{(\alpha)}_n$. The mean value theorem yields the corresponding representation for some $\xi_i$ between $|\Delta^n_i \bar X^{(8\alpha)}_n|$ and $|\Delta^n_i \bar X_n + \Delta^n_i \bar X^{(8\alpha)}_n|$. Next, using the fact that due to the indicators $|\Delta^n_i \bar X_n| \le |\Delta^n_i \bar X^{(8\alpha)}_n|$ holds, we obtain the stated bound. Note that on $Q_n$ the sum $\Delta^n_i \bar X^{(8\alpha)}_n$ consists of at most one jump. Therefore, we can calculate with the definition of the predictable compensator of the random measure associated with the jumps of $X^{(n)}$ and conclude by Assumption 6.12(a1) and (b4), because $p - 1 > \beta$.

Now we show the claim $z^{(\alpha)}_n = o(1)$. This can be seen again by the definition of the predictable compensator of the random measure associated with the jumps of $X^{(n)}$. By the fact that on $Q_n$ the increment $\Delta^n_i \bar X^{(8\alpha)}_n$ is either 0 or equal to the only jump of absolute size in $(u_n, 2\alpha]$ on the interval $((i-1)\Delta_n, i\Delta_n]$, we obtain the corresponding identity for some $\xi^n_i$ between $\Delta^n_i \bar X_n$ and $\Delta^n_i \bar X_n + \Delta^n_i \hat X_n$, using the mean value theorem and the fact that $|\Delta^n_i \bar X_n| \le v_n/2$ on $Q_n$; here $\hat X_n$ denotes the process collecting the jumps of absolute size larger than $u_n$. Notice furthermore that due to $|\Delta^n_i \bar X_n| \le v_n/2$ on $Q_n$ the condition $|\Delta^n_i X^{(n)}| > v_n$ implies $|\Delta^n_i \hat X_n| > v_n/2$ and consequently $|\Delta^n_i \hat X_n| > |\Delta^n_i \bar X_n|$. The final inequality in (A.37) follows with the assumptions on $\rho$ and the definition of $u_n = (v_n)^\varrho$ with $\varrho > 1$ in (A.7), so that $u_n < v_n/2$ holds for large $n \in \mathbb{N}$. On $Q_n$ the increment $\Delta^n_i \hat X_n$ is either zero or equal to the only jump in $((i-1)\Delta_n, i\Delta_n]$ of absolute size larger than $u_n$. Thus, the definition of the predictable compensator of an optional $\mathcal{P}'$-$\sigma$-finite random measure (Theorem II.1.8 in Jacod and Shiryaev (2002)) gives the desired bound. In order to prove (A.36), we use the mean value theorem, the definition of $Q_n$ and the assumptions on $\rho$ to bound, for $i \ne j$ and similarly to (A.37), the expectation
$$E\big[ \rho(\Delta^n_i X^{(n)})\, \rho(\Delta^n_j X^{(n)})\, 1_{\{|\Delta^n_i X^{(n)}| > v_n\}}\, 1_{\{|\Delta^n_j X^{(n)}| > v_n\}}\, 1_{Q_n} \big].$$

B Results on the limiting process from Theorem 6.1

B.1 Useful properties of the Gaussian limit and its covariance semimetric

In the following we collect several lemmas which are useful to obtain bounds for the expectation of sup-functionals of the process $G_f$. In particular, we apply them in the proof of Proposition 6.6 in order to show asymptotic uniform $d$-equicontinuity in probability of a sequence of processes $G_{f_n}$ for some suitable semimetric $d$.

Lemma B.1. Grant Assumption 6.12 and let $f: \mathbb{R} \to \mathbb{R}$ be Borel measurable with $|f(z)| \le K(1 \wedge |z|^p)$ for all $z \in \mathbb{R}$ and some $K > 0$. Furthermore, let $G_f$ be the tight centered Gaussian process in $\ell^\infty([0,1] \times \mathbb{R})$ defined in Theorem 6.1. Then for $(\theta_1, t_1), (\theta_2, t_2) \in [0,1] \times \mathbb{R}$ the $L_8$-norm satisfies
$$\| G_f(\theta_1, t_1) - G_f(\theta_2, t_2) \|_8 = 105^{1/8}\, d_f\big( (\theta_1, t_1); (\theta_2, t_2) \big),$$
with $d_f$ the semimetric defined in (6.2).

C Auxiliary Results
The following lemma shows that two bootstrapped random elements with values in some metric space (D, d) which differ only by a term of order o P (1) converge simultaneously weakly conditional on the data in probability.
The next auxiliary result is useful in order to show consistency of the test procedures in this paper. In the assertion of this proposition, $(\xi^{(b)})_{b=1,\dots,B}$ for some $B \in \mathbb{N}$ denote independent sequences $\xi^{(b)} = (\xi^{(b)}_i)_{i \in \mathbb{N}}$ of random variables satisfying Assumption 3.6. Furthermore, $\hat G^{(n)}_{\rho,\xi^{(b)}}$ denotes the process defined in (6.7) calculated with respect to the $b$-th multiplier sequence $\xi^{(b)}$.
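To illustrate how such independent multiplier sequences are used in practice, the following sketch computes a bootstrap critical value for a generic sup-type CUSUM statistic. It is a simplified illustration of the multiplier principle under hypothetical inputs, not the exact statistic of this paper.

```python
import numpy as np

def bootstrap_critical_value(summands, B=500, level=0.05, seed=None):
    """Multiplier-bootstrap critical value for a sup-type CUSUM statistic.

    summands: array of shape (n, m); row i holds the truncated summands for
    increment i evaluated on a grid of m points t (hypothetical input format).
    """
    rng = np.random.default_rng(seed)
    n = summands.shape[0]
    sups = np.empty(B)
    for b in range(B):
        xi = rng.standard_normal(n)                    # multipliers: mean 0, variance 1
        partial = np.cumsum(xi[:, None] * summands, axis=0)
        # centred partial-sum (CUSUM) process over theta = i/n and the grid in t
        proc = partial - (np.arange(1, n + 1)[:, None] / n) * partial[-1]
        sups[b] = np.abs(proc).max() / np.sqrt(n)
    return np.quantile(sups, 1 - level)
```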