Nonparametric inference on Lévy measures of compound Poisson-driven Ornstein-Uhlenbeck processes under macroscopic discrete observations

Abstract: This study examines nonparametric inference on a stationary Lévy-driven Ornstein-Uhlenbeck (OU) process X = (X_t)_{t≥0} with a compound Poisson subordinator. We propose a new spectral estimator for the Lévy measure of the Lévy-driven OU process X under macroscopic observations. We also derive, for the estimator, multivariate central limit theorems over a finite number of design points, and high-dimensional central limit theorems in the case wherein the number of design points increases with the sample size. Building on these asymptotic results, we develop methods to construct confidence bands for the Lévy measure and propose a practical method for bandwidth selection.


Introduction
Given a positive number λ and an increasing Lévy process J = (J_t)_{t≥0} without drift component, an Ornstein-Uhlenbeck (OU) process X = (X_t)_{t≥0} driven by J is defined as a solution to the following stochastic differential equation (SDE):

dX_t = −λX_t dt + dJ_{λt},  t ≥ 0.  (1.1)
We assume that X is stationary. If ∫_{(2,∞)} log x ν(dx) < ∞, then the unique stationary solution of (1.1) exists (see Theorem 17.5 and Corollary 17.9 in [70]), and the stationary distribution π of X is self-decomposable with characteristic function

ϕ(u) = ∫_ℝ e^{iux} π(dx) = exp( ∫_0^∞ (e^{iux} − 1) x^{−1} k(x) dx ),  (1.2)

where k(x) = ν((x, ∞)) 1_{[0,∞)}(x). This study focuses on the case wherein the Lévy process J in (1.1) is a compound Poisson process; in other words, J is of the form J_t = Σ_{j=1}^{N_t} U_j, where N = (N_t)_{t≥0} is a Poisson process with intensity α > 0 and {U_j}_{j≥1} is a sequence of independent and identically distributed (i.i.d.) positive-valued random variables with common distribution F. In this case, J_t has a characteristic function of the form ϕ_{J_t}(u) = E[e^{iuJ_t}] = exp( tα ∫_0^∞ (e^{iux} − 1) F(dx) ), and the Lévy measure is given by ν(dx) = αF(dx). We also work with the macroscopic observation setup; that is, we have discrete observations X_Δ, X_{2Δ}, …, X_{nΔ} at frequency 1/Δ > 0 with Δ = Δ_n → ∞ and Δ_n/n → 0 as n → ∞. This is a technical condition that makes the dependence among the observations {X_{jΔ}}_{j=1}^n asymptotically negligible.
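As a concrete illustration of the model (1.1), the following sketch simulates the discrete observations X_Δ, …, X_{nΔ} exactly, using the transition X_{t+D} = e^{−λD} X_t + Σ_i e^{−(λ(t+D) − τ_i)} U_i over the jump times τ_i of the driving Poisson process on the λ-rescaled clock. The function name and the burn-in device are our own illustrative choices, not part of the paper.

```python
import numpy as np

def simulate_cpou(n, Delta, lam, alpha, jump_sampler, rng, burn_in=200.0):
    """Simulate X_Delta, ..., X_{n*Delta} for dX_t = -lam*X_t dt + dJ_{lam*t},
    where J is a compound Poisson process with intensity alpha and jump sizes
    drawn by jump_sampler(rng, m).  The transition over a step of length D is
    exact: X_{t+D} = e^{-lam*D} X_t + sum_i e^{-(lam*(t+D) - tau_i)} U_i,
    summing over jump times tau_i of the Poisson process N on (lam*t, lam*(t+D)]."""
    obs_times = burn_in + np.arange(1, n + 1) * Delta
    x, prev, out = 0.0, 0.0, []
    for s in obs_times:
        D = s - prev
        x *= np.exp(-lam * D)              # exact mean reversion over the step
        m = rng.poisson(alpha * lam * D)   # number of jumps of N on (lam*prev, lam*s]
        if m > 0:
            tau = rng.uniform(lam * prev, lam * s, size=m)  # jump times (lam-time)
            U = jump_sampler(rng, m)                        # i.i.d. jump sizes
            x += np.sum(np.exp(-(lam * s - tau)) * U)
        prev = s
        out.append(x)
    return np.array(out)
```

In stationarity, E[X_t] = αE[U_1] and Var(X_t) = αE[U_1²]/2, independently of λ; with Exp(1) jumps and α = 2, a long simulated path should therefore have sample mean and variance close to 2.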
This study aims to develop nonparametric inference on the Lévy measure of a Lévy-driven OU process. Therefore, we first propose a spectral (or Fourier-based) estimator for the k-function and derive a multivariate central limit theorem for the estimator over finitely many design points. As an extension of this result, we also derive high-dimensional central limit theorems for the estimator in the case wherein the number of design points over a compact interval included in (0, ∞) increases as the sample size n goes to infinity. Second, building on those limit theorems, we develop methods for implementing confidence bands for the k-function. Similar methods to construct "asymptotic" uniform confidence bands are also proposed in [44]. Since confidence bands provide a simple graphical description of the accuracy of a nonparametric curve estimator, quantifying uncertainties of the estimator simultaneously over design points, they are practically important in statistical analysis. Third, we propose a practical method for bandwidth selection inspired by the idea developed by [9] on bandwidth selection in density deconvolution.
To the best of our knowledge, this is the first paper to establish limit theorems for nonparametric estimators for the Lévy measure of compound Poisson-driven OU processes.
Lévy-driven OU processes are widely used in modeling phenomena where random events occur at random discrete times. For example, refer to [1], [54], and [67] for applications of these processes to insurance, dam theory, and rainfall models. Several authors investigate the parametric inference on Lévy-driven OU processes driven by subordinators. We refer to [45], [61], and [56] under the high-frequency set up (i.e., Δ = Δ n → 0 and nΔ n → ∞ as n → ∞) and [10] under the low-frequency set up (i.e., Δ > 0 is fixed and n → ∞). There are several studies on parametric and nonparametric estimations and inferences on Lévy processes. We refer to recent contributions by [77], [52,53], and [11] on parametric inference on Lévy processes. We also find an overview of recent developments on the parametric inference on Lévy processes in [62]. Some authors have studied statistical inference on Lévy process under macroscopic observations. [29] investigates statistical inference on a compound Poisson process under three kinds of time scales-high-frequency, low-frequency, and macroscopic. [31] studies statistical inference on compound Poisson processes under macroscopic observations. [32] is another recent study on nonparametric estimation on compound Poisson processes under macroscopic observations. [22] discusses the robustness of spectral estimation of Lévy measures of compound Poisson processes to Δ n , and it includes the consistency of the estimator under the macroscopic set up. Concerning recent contributions to nonparametric inference on Lévy measures (or densities) under the high-frequency set up, we refer to [36,38,39], [76], [55], [66], and [50]. Recent studies on nonparametric estimation of Lévy densities under the high-frequency scale are [71], [33], [23,24,25], [37], [40,41], [64], [49], [3,4], [30], [48], [5], and [6]. 
Concerning literature on the low-frequency setup, we refer to [65] for inference on Lévy measures, and [68], [14], and [21] for nonparametric inference on compound Poisson processes. Further, [15] and [75] investigate nonparametric estimation of a class of Lévy processes under the low-frequency setup. [7] studies nonparametric estimation of Lévy measures of moving average Lévy processes under low-frequency observations. [13], [12], and [43] study nonparametric inference on Lévy measures of Itô semimartingales with Lévy jumps under high-frequency observations. [47] and [46] investigate nonparametric estimation of Lévy-driven OU processes. [47] derives consistency of their estimator for a class of Lévy-driven OU processes, which includes compound Poisson-driven OU processes. [46] establishes consistency of their estimator of the Lévy density of (1.1) with a compound Poisson subordinator in the uniform norm at a polynomial rate. However, they do not derive limit distributions of their estimators.
The analysis of the present study is related to deconvolution problems for mixing sequences. [57,58,59] investigate probability density deconvolution problems for α-mixing sequences and derive convergence rates and asymptotic distributions of deconvolution estimators. Since the Lévy-driven OU process (1.1) is β-mixing under some conditions (see [60] for details), our analysis can be interpreted as a deconvolution problem for a β-mixing sequence. However, we need a non-trivial analysis since we are considering additional structures emerging from the properties of the compound Poisson-driven OU process. To be more precise, [59] assumes that, for a mixing sequence {X_j}_{j≥0}, the joint densities p(x_1, x_{j+1}) of X_1 and X_{j+1} are uniformly bounded for any j ≥ 1 and x_1, x_{j+1} ∈ ℝ to show the asymptotic independence of their estimators at different design points. Although we also observe a β-mixing sequence {X_{jΔ}} (see Remark 3.1 for details on the β-mixing property of {X_{jΔ}}), we cannot assume such a condition directly in this study's context. Indeed, since the transition probability P_t(x, dy) of X has a point mass at y = e^{−λt}x, P_t(x, ·) does not have a transition density function ([78], Corollary 2). Therefore, to avoid this problem, we consider the macroscopic regime in this study.
The estimation problem of Lévy measures is generally ill-posed in the sense of inverse problems, and the ill-posedness is induced by the decay of the characteristic function of a Lévy process. We refer to [64] as the seminal work in which such an explanation is given for the first time. In our case, the ill-posedness is induced by the decay of the characteristic function of the stationary distribution π of the Lévy-driven OU process (1.1). In this sense, the problem in this study is a (nonlinear) inverse problem. [73] investigates conditions under which a self-decomposable distribution is nearly ordinary smooth, that is, the characteristic function of the self-decomposable distribution decays polynomially at infinity up to a logarithmic factor. [74] applies those results to the nonparametric calibration of self-decomposable Lévy option pricing models. Refining the result for a special case in [73], we will show that the characteristic function of a self-decomposable distribution is regularly varying at infinity with some index α > 0. This enables us to derive asymptotic distributions of the spectral estimator proposed in this study.
Our analysis is also related to [51] and [50]. [51] is a recent contribution to the literature on the construction of uniform confidence bands in probability density deconvolution problems for i.i.d. observations. That study formulates methods for constructing uniform confidence bands built on applications of intermediate Gaussian approximation theorems developed in [17,18,19,20] and provides multiplier bootstrap methods for implementing uniform confidence bands. [50] also develops confidence bands for Lévy densities based on intermediate Gaussian and multiplier bootstrap approximation theorems. However, we adopt different methods for the construction of confidence bands. We derive high-dimensional central limit theorems based on the intermediate Gaussian approximation for β-mixing processes. Additionally, we can show that the variance-covariance matrix of the Gaussian random vector appearing in the multivariate and high-dimensional central limit theorems is the identity matrix. Therefore, we do not need bootstrap methods to compute critical values for the confidence bands.
The rest of the paper is organized as follows. In Section 2, we define a spectral estimator for the k-function. We give a multivariate central limit theorem for the spectral estimator in Section 3. In Section 4, we describe high-dimensional central limit theorems for the estimator and procedures for implementing confidence bands. In Section 5, we propose a practical method for bandwidth selection and report simulation results to study the finite sample performance of the spectral estimator. Discussions on our results and the proposed confidence bands are presented in Section 6. All proofs are collated in Appendices A and B.

Notation
For any non-empty set T and any (complex-valued) function f on T, let ‖f‖_T = sup_{t∈T} |f(t)|, and, for T = ℝ, let ‖f‖_{L^p} = (∫_ℝ |f(x)|^p dx)^{1/p} for p > 0. For any positive sequences a_n, b_n, we write a_n ≲ b_n if there is a constant C > 0 independent of n such that a_n ≤ Cb_n for all n, a_n ∼ b_n if a_n ≲ b_n and b_n ≲ a_n, and a_n ≪ b_n if a_n/b_n → 0 as n → ∞. For a, b ∈ ℝ, let a ∨ b = max(a, b). For a ∈ ℝ and b > 0, we use the shorthand notation [a ± b] = [a − b, a + b]. The transpose of a vector x is denoted by x^⊤. We use the notation →^d for convergence in distribution. For random variables X and Y, we write X =^d Y if they have the same distribution. N(μ, Σ) denotes a (multivariate) normal distribution with mean μ and variance(-covariance matrix) Σ.

Estimation of the k-function
In this section, we introduce a spectral estimator for the Lévy measure (k-function) of the Lévy-driven OU process (1.1). First, we consider a symmetrized version of the k-function. Inverting the corresponding Fourier relation formally yields the estimator defined below.

D. Kurisu
Here, θ_n is a sequence of constants such that θ_n → ∞ as n → ∞ (in the rest of this study, we set θ_n ∼ n^{1/2}(log n)^{−3}). Let W: ℝ → ℝ be an integrable (kernel) function such that ∫_ℝ W(x) dx = 1 and whose Fourier transform ϕ_W is supported in [−1, 1] (i.e., ϕ_W(u) = 0 for all |u| > 1). Then, the spectral estimator k̂ for k at x > 0 is defined via smoothed Fourier inversion, where h = h_n is a sequence of positive constants (bandwidths) such that h_n → 0 as n → ∞. In the following sections, we develop central limit theorems for k̂.
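The following sketch implements a spectral estimator of this type. It relies on the identity (log ϕ)′(u) = i ∫_0^∞ e^{iux} k(x) dx implied by (1.2), estimates ϕ and ϕ′ by empirical counterparts, and inverts with the smoothing ϕ_W(uh); the flat-top form of ϕ_W and the simple 1/√n guard on the empirical characteristic function are illustrative stand-ins for the paper's exact construction (which involves θ_n).

```python
import numpy as np

def phi_W_flat_top(t, c=0.05, b=1.0):
    """A flat-top function supported on [-1, 1] (McMurry-Politis type),
    used here as a stand-in for phi_W in (5.2): equal to 1 on [-c, c]
    and decaying smoothly to 0 at |t| = 1."""
    t = np.abs(np.asarray(t, dtype=float))
    out = np.zeros_like(t)
    out[t <= c] = 1.0
    mid = (t > c) & (t < 1.0)
    tm = t[mid]
    out[mid] = np.exp(-b * np.exp(-b / (tm - c) ** 2) / (tm - 1.0) ** 2)
    return out

def spectral_k_estimator(X, x_grid, h, trunc=None, n_u=601):
    """Sketch of a spectral estimate of k(x) = nu((x, infinity)) from
    observations X, using (log phi)'(u) = i int_0^inf e^{iux} k(x) dx
    and kernel-smoothed Fourier inversion.  `trunc` keeps the empirical
    characteristic function away from zero (here 1/sqrt(n), a common
    simple choice; the paper's truncation is driven by theta_n)."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    if trunc is None:
        trunc = 1.0 / np.sqrt(n)
    u = np.linspace(-1.0 / h, 1.0 / h, n_u)
    phi_hat = np.empty(n_u, dtype=complex)
    dphi_hat = np.empty(n_u, dtype=complex)
    for j, uj in enumerate(u):
        e = np.exp(1j * uj * X)
        phi_hat[j] = e.mean()              # empirical characteristic function
        dphi_hat[j] = (1j * X * e).mean()  # its derivative
    small = np.abs(phi_hat) < trunc
    denom = phi_hat.copy()
    denom[small] = trunc * phi_hat[small] / np.abs(phi_hat[small])
    integrand = phi_W_flat_top(u * h) * dphi_hat / (1j * denom)
    du = u[1] - u[0]
    return np.array([np.real(np.sum(np.exp(-1j * u * x) * integrand)) * du
                     for x in x_grid]) / (2.0 * np.pi)
```

As a sanity check (our example, not the paper's): for Exp(1) jumps with intensity α = 2, the stationary law determined by (1.2) is Gamma(2, 1) and k(x) = 2e^{−x}, so the estimate evaluated on i.i.d. Gamma(2, 1) draws should decrease in x.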
Remark 2.1. We need the truncation in ϕ̂_{θ_n} to show Lemma A.2 in Appendix A by applying an exponential inequality for bounded mixing sequences. Additionally, refer to Remark 3.2 and the proof of Proposition 9.4 in [2].

Remark 2.2.
For a complex number a, let ā denote its complex conjugate. We observe that k̂ is real-valued. In fact, since ℑϕ̂(t) = −ℑϕ̂(−t) and ℜϕ̂(t) = ℜϕ̂(−t), a change of variables shows that the imaginary part of the defining integral vanishes. Additionally, refer to Section 6 for detailed comments on the construction of the estimator k̂ and an alternative estimator.

Multivariate central limit theorem
In this section, we present a multivariate central limit theorem for k̂.

Assumption 3.1. We assume the following conditions.
(iii) Let r > 1/2, and let p be the integer such that p < r ≤ p + 1. The function k is p-times differentiable, and k^{(p)} is (r − p)-Hölder continuous, that is, |k^{(p)}(x) − k^{(p)}(y)| ≤ C_0 |x − y|^{r−p} for all x, y, for some positive constant C_0 and δ ∈ (0, 1/12) as n → ∞. Here, β_1 is a positive constant; it appears in the mixing coefficient of X = (X_t)_{t≥0} (Conditions (i) and (ii) imply that X is exponentially β-mixing with β-mixing coefficient β_X(t) = O(e^{−β_1 t}) for some β_1 > 0; refer to the following remark).

Condition (v) concerns the kernel function W. We assume that W is a (p + 1)-th order kernel; however, we allow for the possibility that ∫_ℝ x^{p+1} W(x) dx = 0. It must be noted that, since the Fourier transform of W has compact support, the support of the kernel function W is necessarily unbounded (see Theorem 4.1 in [72]).
Condition (vi) concerns the sampling frequency, the bandwidth, and the sample size. The lower-bound condition relating Δ and log n implies that we work with a macroscopic observation scheme; this is a technical condition for the inference on k, which we assume to guarantee that the dependence among {X_{jΔ}}_{j=1}^n can be ignored asymptotically. We note that, to estimate k uniformly on an interval I ⊂ (0, ∞), we do not need this condition and can work with the low-frequency setup (i.e., Δ > 0 is fixed). From a practical viewpoint, our methods could thus be applied to low-frequency data; they would work effectively if we suitably rescale the time scale of the data and if the sample size n is sufficiently large. In our simulation study, we consider the case (n, Δ) = (500, 1), and our method functions effectively in this case. We also need Condition (vi) to derive the lower bound on h for the uniform consistency of k̂(x) at x = x_ℓ, ℓ = 1, …, N, with 0 < x_1 < · · · < x_N < ∞. We need the upper bound on h as an undersmoothing condition. Refer to Remark 3.4 for comments on the condition on h.
To state a multivariate central limit theorem for k , we introduce the notion of regularly varying functions.

Definition 3.1 (Regularly varying function). A measurable function U: (0, ∞) → (0, ∞) is said to be regularly varying at infinity with index ρ ∈ ℝ, written U ∈ RV_ρ, if lim_{u→∞} U(tu)/U(u) = t^ρ for every t > 0.
We say that a function U is slowly varying if U ∈ RV_0. We refer to [69] for details on regularly varying functions. The following lemma plays an important role in the proof of Theorem 3.1.
Remark 3.2. In Assumption 3.1, Condition (ii) concerns the smoothness of the stationary distribution π of the Lévy-driven OU process. Condition (ii) implies that the stationary distribution π is nearly ordinary smooth, that is, the characteristic function (1.2) decays polynomially fast as |u| → ∞, up to a slowly varying function (Lemma 3.1). Since k(x) = ν((x, ∞)), the finiteness of k(0) is equivalent to the finiteness of the total mass of the Lévy measure of the Lévy process J. This means that J has finite activity, that is, only finitely many jumps in any bounded time interval; it is known that a Lévy process with a finite Lévy measure is a compound Poisson process. If k(0) = ∞, then the Lévy process J has infinite activity, that is, infinitely many jumps in any bounded time interval. In this case, the characteristic function (1.2) decays faster than any polynomial. In particular, it decays exponentially fast as |u| → ∞ if the Blumenthal-Getoor (BG) index of J is positive. For example, this case includes inverse Gaussian, tempered stable, and normal inverse Gaussian processes. Condition (ii) rules out these examples since we could not construct confidence bands based on Gaussian approximation under our observation scheme (see the comments after Assumption 10 in [51]). [51] develops methods to construct uniform confidence bands for the density deconvolution problem by using the intermediate Gaussian approximation. In that study, when the density of the measurement error is super smooth (this corresponds, in our framework, to the case wherein the BG index is positive), it is assumed that the effect of estimating the characteristic function of the measurement error from m = m_n auxiliary independent observations is asymptotically negligible, that is, m_n/n → ∞ as n → ∞.
However, we can use all n observations to estimate ϕ (which corresponds to the characteristic function of the measurement error in deconvolution problems); hence, in our situation, m = n. In this case, we can apply the results on the intermediate Gaussian approximation in [16] for the case wherein the density of the measurement error is ordinary smooth (or the BG index is 0). However, to the best of our knowledge, no such result has been achieved in the literature on deconvolution problems when the density of the measurement error is super smooth (or the BG index is positive). Therefore, we assume nearly ordinary smoothness of π to obtain practical asymptotic theorems for the inference on k.
Remark 3.3. Lemma 3.1 implies that |ϕ(u)| is a regularly varying function at ∞ with index α. A slowly varying function L(u) may go to ∞ as u → ∞, but it does not grow faster than any power function; that is, L(u)/u^δ → 0 as u → ∞ for any δ > 0. Such tail behavior of ϕ is related to Condition (vi) in Assumption 3.1. If the stationary distribution π is ordinary smooth, that is, ϕ satisfies |ϕ(u)| ∼ |u|^{−α} for some α > 0, then we can set δ = 0 in Condition (vi). However, we must introduce δ > 0 to account for the effect of the slowly varying function L.
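A quick numeric illustration of this tail behavior (our example, not from the paper): for Exp(1) jump sizes with intensity α = 2, the stationary law determined by (1.2) is Gamma(2, 1), whose characteristic function is ϕ(u) = (1 − iu)^{−2}, so |ϕ(u)| = (1 + u²)^{−1}.

```python
import numpy as np

# For Exp(1) jumps with intensity alpha = 2 (illustrative example),
# phi(u) = (1 - iu)^(-2), so |phi(u)| = (1 + u^2)^(-1):
# u^2 * |phi(u)| -> 1, i.e. |phi| has a polynomial tail with L(u) -> 1
# slowly varying, matching the "ordinary smooth" case delta = 0.
def abs_phi(u):
    return np.abs((1.0 - 1j * np.asarray(u)) ** (-2.0))

for u in [10.0, 100.0, 1000.0]:
    print(u, u ** 2 * abs_phi(u))          # tends to 1
print(abs_phi(2.0e4) / abs_phi(1.0e4))     # regular variation: tends to 2^(-2)
```

The ratio check in the last line is the defining property of regular variation: |ϕ(tu)|/|ϕ(u)| → t^{−2} for each fixed t.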

Remark 3.4. As shown in (A.7) and the comments below it, if we do not assume the undersmoothing condition, we obtain a bound whose second term on the right-hand side comes from the deterministic bias. For the central limit theorems to hold and for constructing the confidence bands, we have to choose a bandwidth ensuring that the bias term is asymptotically negligible relative to the first, "variance," term. The right-hand side is optimized by taking h ∼ (log n/n)^{1/(1+2r+2α−δ)}.
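A small arithmetic check of the optimizing exponent; the parameter values r = 1, α = 2, δ = 0 below are hypothetical choices for illustration.

```python
import math
from fractions import Fraction

def bandwidth_exponent(r, alpha, delta=Fraction(0)):
    """Exponent e in h ~ (log n / n)^e, with e = 1/(1 + 2r + 2*alpha - delta)."""
    return 1 / Fraction(1 + 2 * r + 2 * alpha - delta)

# r = 1, alpha = 2, delta = 0 gives e = 1/7
for n in (500, 5000, 50000):
    print(n, (math.log(n) / n) ** float(bandwidth_exponent(1, 2)))
```

Note that the exponent shrinks as the smoothness r or the index α grows, so the optimizing bandwidth vanishes more slowly in those cases.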
Under Assumption 3.1, we can show that k̂(x) − k(x) admits an asymptotically linear representation (3.1). By a change of variables, we may rewrite the first term in (3.1) as Z_n(x), where K_n is a suitably defined function. It must be noted that K_n is well-defined and real-valued. To construct a confidence interval for k(x), we estimate the variance of √(nh) Z_n(x), which is σ_n²(x), by a sample analogue σ̂_n²(x).

Remark 3.5. We use Conditions (ii), (iv), and (v) in Assumption 3.1 to show that h^α |K_n(x)| ≲ min(1, 1/x²). Refer to the proof of Lemma A.5 in Appendix A for details. Combining this bound on K_n with Condition (vi) in Assumption 3.1, we can show that the asymptotic variance-covariance matrix appearing in Theorem 3.1 is diagonal.
Remark 3.6. Propositions A.1 and A.2 and Lemma A.6 (see Appendix A) yield an expression for σ_n²(x). Then, we can estimate σ_n²(x) by σ̂_n²(x) (see Lemma 4.1 and its proof in Appendix A for details). We now present the multivariate central limit theorem.

Theorem 3.1. Assume Assumption 3.1. Then, for any 0 < x_1 < · · · < x_N < ∞, the vector of standardized estimation errors (√(nh)(k̂(x_ℓ) − k(x_ℓ))/σ_n(x_ℓ))_{ℓ=1}^N converges in distribution to N(0, I_N), where I_N is the N × N identity matrix and σ_n(x) = √(σ_n²(x)).

High-dimensional central limit theorems
In Section 3, we presented a multivariate (or finite-dimensional) central limit theorem for k̂. In this section, we present high-dimensional central limit theorems as refinements of Theorem 3.1. Moreover, as an application of those results, we propose methods for constructing confidence bands for the k-function in Section 4.2.

High-dimensional central limit theorems for k
Let I ⊂ (0, ∞) be an interval with finite Lebesgue measure |I|, and let 0 < x_1 < · · · < x_N < ∞ with x_ℓ ∈ I, ℓ = 1, …, N. We assume condition (4.1), which implies that N ≲ h^{2δ−1}. Therefore, N is allowed to go to infinity as n → ∞.

Remark 4.1.
Remark 4.2. Theorem 4.1 can be shown in two steps. In the first step, we approximate the distribution of max_{1≤ℓ≤N} |W_n(x_ℓ)| by that of max_{1≤ℓ≤N} |Y_{n,ℓ}|.
Here, Y_n = (Y_{n,1}, …, Y_{n,N})^⊤ is a centered normal random vector whose covariance matrix is defined in terms of a sequence of integers q = q_n with q_n → ∞ and q_n = o(n) as n → ∞. In the second step, we approximate the distribution of max_{1≤ℓ≤N} |Y_{n,ℓ}| by that of max_{1≤ℓ≤N} |Y_ℓ|. For this, we compare the corresponding variance-covariance matrices. Refer to the proofs of Theorem A.1 and Proposition A.4 in Appendix A.
A well-known result in extreme value theory shows that max_{1≤ℓ≤N} |Y_ℓ| = O_P(√(log N)) for independent standard normal random variables Y_ℓ, ℓ = 1, …, N (see Example 1.1.7 in [27]). Theorem 4.1 then implies the corresponding bound for the estimator, which yields the following theorem.

Confidence bands for the k-function
In this section, we discuss methods for constructing confidence bands for the k-function. Theorem 4.2 implies that we can construct confidence bands by linear interpolation of simultaneous confidence intervals. If the sample size n is sufficiently large, we can take a sufficiently large number of design points N; therefore, the proposed confidence bands can be arbitrarily close to uniform confidence bands in such cases. We comment on the asymptotic validity of the confidence bands in Section 6.
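Because the limiting covariance matrix is the identity (so the standardized coordinates are asymptotically independent), the simultaneous critical value has a closed form and no bootstrap is needed: c solves (2Φ(c) − 1)^N = 1 − τ. A sketch follows; the function names are ours, and k̂(x_ℓ), σ̂_n(x_ℓ) stand for the paper's estimates at the design points.

```python
import math
from statistics import NormalDist

def critical_value(N, tau):
    """c with P(max_{l<=N} |Y_l| <= c) = 1 - tau for N independent
    standard normals, i.e. (2*Phi(c) - 1)^N = 1 - tau."""
    p = (1.0 + (1.0 - tau) ** (1.0 / N)) / 2.0
    return NormalDist().inv_cdf(p)

def confidence_band(k_hat, sigma_hat, n, h, tau=0.05):
    """Simultaneous intervals k_hat(x_l) +/- c * sigma_hat(x_l)/sqrt(n*h),
    to be joined by linear interpolation over the design points."""
    c = critical_value(len(k_hat), tau)
    half = [c * s / math.sqrt(n * h) for s in sigma_hat]
    lower = [k - w for k, w in zip(k_hat, half)]
    upper = [k + w for k, w in zip(k_hat, half)]
    return lower, upper
```

For N = 1 and τ = 0.05 this reduces to the familiar 1.96; the critical value then grows slowly, on the order of √(2 log N), as the number of design points increases.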

Simulation framework
In this section, we present simulation results to examine the finite-sample performance of the central limit theorems and the proposed confidence bands of Sections 3 and 4. We consider the following data generating process.
As a kernel function, we use a flat-top kernel, defined by its Fourier transform (5.2), where 0 < c < 1 and b > 0. It must be noted that ϕ_W is infinitely differentiable with ϕ_W^{(ℓ)}(0) = 0 for all ℓ ≥ 1. This ensures that its inverse Fourier transform W is of infinite order, that is, ∫_ℝ x^ℓ W(x) dx = 0 for all integers ℓ ≥ 1 (cf. [63]). In our simulation study, we set b = 1 and c = 0.05. We also set the sample size n and the time span Δ as n = 500 and Δ = 1.

Now, we discuss bandwidth selection. We use a method similar to that proposed in [50], which adopts an idea of [9] on bandwidth selection in density deconvolution. From a theoretical perspective, for our confidence bands to work, we have to choose bandwidths of a smaller order than the rate that is optimal for estimation under the sup-norm loss (or a "discretized version" of it). At the same time, choosing a very small bandwidth results in an extremely wide confidence band. Therefore, we should choose a bandwidth "slightly" smaller than the optimal one that minimizes max_{1≤ℓ≤N} |k̂(x_ℓ) − k(x_ℓ)|. We employ the following rule for bandwidth selection. Let k̂_h be the spectral estimate with bandwidth h.
1. Set a pilot bandwidth h P > 0 and make a list of candidate bandwidths h j = jh P /J for j = 1, . . . , J.

2. Choose the smallest bandwidth h_j, j ∈ {2, …, J}, whose adjacent-bandwidth distance max_{1≤ℓ≤N} |k̂_{h_j}(x_ℓ) − k̂_{h_{j−1}}(x_ℓ)| falls below a threshold determined by some κ > 1.

In our simulation study, we set h_P = 1, J = 20, and κ = 1.5. This rule would choose a bandwidth "slightly" smaller than the intuitively optimal bandwidth for the estimation of k (as long as the threshold value κ is reasonably chosen). Figure 1 shows five realizations of the discretized L^∞-distance between the true k-function and the estimates k̂ for different bandwidth values (left) and between estimates of k with adjacent bandwidth values (right) when (α, λ) = (2.1, 0.5). We find that the discretized L^∞-distance between estimates with adjacent bandwidth values behaves similarly to that between the true k-function and the estimates for different bandwidth values. Hence, we can expect that the proposed method for bandwidth selection chooses a "good" bandwidth for the construction of confidence bands.
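The rule in steps 1-2 can be sketched as follows. The specific selection criterion, the smallest h_j whose adjacent-bandwidth distance is within κ times the smallest such distance, is one plausible reading of step 2 (a hypothetical reconstruction), and `estimator` stands for any implementation of k̂_h.

```python
def select_bandwidth(estimator, x_grid, h_pilot=1.0, J=20, kappa=1.5):
    """estimator(h, x) returns the spectral estimate k_hat_h(x).
    Candidates are h_j = j * h_pilot / J; d_j is the discretized
    L-infinity distance between estimates at adjacent bandwidths."""
    hs = [j * h_pilot / J for j in range(1, J + 1)]
    ests = [[estimator(h, x) for x in x_grid] for h in hs]
    dists = [max(abs(a - b) for a, b in zip(ests[j], ests[j - 1]))
             for j in range(1, len(hs))]
    return choose_from_distances(hs[1:], dists, kappa)

def choose_from_distances(hs, dists, kappa):
    """Smallest h whose adjacent distance is <= kappa * (minimal distance);
    this thresholding is our hypothetical reading of step 2."""
    thresh = kappa * min(dists)
    for h, d in zip(hs, dists):
        if d <= thresh:
            return h
    return hs[-1]
```

In practice one would pair this with the visual diagnostics of Remark 5.1, inspecting how the adjacent distances behave as j increases before committing to the selected value.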
Remark 5.1. In practice, it is also recommended to use visual information on how max_{1≤ℓ≤N} |k̂_{h_j}(x_ℓ) − k̂_{h_{j−1}}(x_ℓ)| behaves as j increases when determining the bandwidth.

Figure 2 shows the normalized empirical distributions of k̂(x) at x = 1.5 (left), x = 2 (center), and x = 2.5 (right) when (α, λ) = (2.1, 0.5). The number of Monte Carlo iterations is 1,000 for each case. As seen from these figures, the central limit theorem implied by Theorem 3.1 holds true.

Table 1 presents simulation results for the cases (α, λ) = (2.1, 0.5), (3, 0.5), and (3, 0.75). We find that more accurate results are achieved when α = 3 than when α = 2.1. In general, the empirical coverage probabilities become more accurate as the intensity of the Poisson process increases (see the comments on Figure 3). Overall, the empirical coverage probabilities are reasonably close to the nominal coverage probabilities.

Figure 3 shows the 85% (dark gray), 95% (gray), and 99% (light gray) confidence bands for the k-function when (α, λ) = (2.1, 0.5). We find that the proposed confidence bands capture the monotonicity of the k-function and that the width of the confidence bands tends to increase as the design point moves away from the origin. The latter point can be partially attributed to a property of the Lévy measure ν: since k(x) = ν((x, ∞)), for any (Borel) set A ⊂ [0, ∞), ν(A) coincides with the expected number of jumps falling in A per unit time, where J_{t−} = lim_{s↑t} J_s. Therefore, since ν([0, ∞)) < ∞ in our simulation study, jumps of larger size are observed less frequently. The results also correspond to a well-known fact in nonparametric density estimation: since few observations fall in the tail regions, nonparametric estimation of a density tends to be less accurate in the tail area than in regions where the probability mass is concentrated.

Discussions
In this section, we discuss (1) the regularity condition on the k-function (Condition (iii) in Assumption 3.1) and its relationship with the construction of our estimator, and (2) asymptotic properties of the proposed confidence bands.

Discussion on Condition (iii) in Assumption 3.1
We considered a symmetrized version of the k-function and presented asymptotic properties of its estimator k̂. We also assumed a "global" regularity condition on k (Condition (iii) in Assumption 3.1) to obtain a suitable bound on the deterministic bias of k̂. It must be noted that the symmetrized k-function is continuous at the origin, and if it has a bounded r-th derivative on ℝ for some r ≥ 0, then the deterministic bias of k̂ can be bounded suitably (see Appendix A). However, if we restrict the class of kernel functions satisfying Condition (v) in Assumption 3.1, then we can relax the "global" Hölder continuity.
(i) When 1/2 < r ≤ 2, we can use symmetric second-order kernel functions. In this case, we can replace Condition (iii) in Assumption 3.1 with a "local" Hölder continuity of k on a neighborhood I_{ε_0} = {y ∈ ℝ : |x − y| < ε_0 for some x ∈ I} that does not include the origin. In fact, by taking a symmetric second-order kernel function W_2, we obtain, for any x ∈ I, a bias expansion (with the convention 0! = 1). We note that the symmetrized k-function coincides with k on I_{ε_0}; hence, we can bound the bias accordingly. (ii) When r > 2, it would be difficult to weaken the global Hölder continuity assumption on k since symmetric "finite-order" kernel functions do not satisfy the required higher-order properties. However, we can use the flat-top kernel function W_∞, which is of "infinite order" and is defined by its Fourier transform ϕ_{W_∞}, to relax Condition (iii) in Assumption 3.1; refer to (5.2) for the definition. Indeed, ϕ_{W_∞} is infinitely differentiable and supported in [−1, 1]. This implies that |W_∞(x)| = o(|x|^{−ℓ}) as |x| → ∞ for all ℓ ≥ 1 (this follows from changes of variables) and that |x|^r |W_∞(x)| is integrable, which yields the corresponding bias bounds. Based on the discussion above, if we take the flat-top kernel W_∞ as the kernel function, then we can replace the global Hölder continuity (Condition (iii) in Assumption 3.1) with the following local Hölder continuity.

Condition (iii)'
Let r > 1/2, and let p be the integer such that p < r ≤ p + 1. The function k is p-times differentiable on I_{ε_0}, which does not include the origin. Additionally, k^{(p)} is (r − p)-Hölder continuous on I_{ε_0}. Now, we set the kernel function W = W_∞. In this case, we can also use another natural (and simple) estimator k̂_0 for k at x > 0. Theorems 3.1 and 4.2 then hold with k̂ replaced by k̂_0. We summarize the discussion so far in the following theorem.

Theorem 6.1. Suppose that Conditions (i), (ii), (iv), (v), and (vi) in Assumption 3.1 and Condition (iii)' hold true. Set the kernel function W = W_∞.
where I_N is the N × N identity matrix and σ_n(x) = √(σ_n²(x)).

(ii) Additionally, suppose that (4.1) holds. Then, we have
We omit the proofs of Theorem 6.1 (i) and (ii) since they are specializations of the proofs of Theorems 3.1 and 4.2.

Discussion on the confidence bands
Our method can be seen as an alternative to constructing confidence bands based on a functional central limit theorem (FCLT) when an FCLT for the Lévy measure ν is available (but, to the best of our knowledge, such a result has not been achieved in the literature on nonparametric inference for Lévy-driven SDEs). Moreover, the proofs clarify that if we strengthen the condition in Assumption 3.1 (vi) to h^r √(nh^{2α+1−δ}) log n = o(n^{−c}) for some (sufficiently small) constant c > 0, then there would exist a positive constant c such that the approximation in the high-dimensional central limit theorem holds at the rate n^{−c}. This shows an advantage of our method, which constructs confidence bands based on the intermediate Gaussian approximation, over a method based on the Gumbel approximation: the coverage error of the latter is known to be logarithmically slow because of the slow convergence of normal extrema; refer to [42]. The proposed method is inspired by the idea developed in [44]. If we take x_ℓ ∈ I, ℓ = 1, …, N, to satisfy min_{1≤k≠ℓ≤N} |x_k − x_ℓ| = O(h^{1/2}) (in this case, condition (4.1) is satisfied), then |x_ℓ − x_{ℓ−1}| → 0 uniformly over ℓ = 2, …, N. Therefore, the band obtained by interpolating the simultaneous confidence intervals over x_ℓ ∈ I, ℓ = 1, …, N, can be interpreted as an "asymptotic" 100(1 − τ)% uniform confidence band for k on I. In fact, we can show that the corresponding coverage probability converges as n → ∞. The same comments apply even if we replace k̂ with k̂_0. See Appendix B for the asymptotic validity of the proposed confidence bands.

Acknowledgements

I am grateful to the Editor Domenico Marinucci, an associate editor, and anonymous referees for their constructive comments that helped improve the quality of the paper. One referee kindly pointed out some relevant references that I had overlooked. I am also grateful to Kengo Kato for carefully reading the manuscript and for his helpful suggestions and encouragement. In addition, I thank Hiroki Masuda for his useful comments.
Since ∫_1^u cos(y) y^{−1} dy converges as u → ∞ and k is a monotone decreasing function, we have the stated bound. This completes the proof.
For the proof of Theorem 3.1, we prepare some auxiliary results.

Lemma A.1. Assume Conditions (i), (ii), and (iv) in Assumption 3.1. Then the measures π and x³π(dx) have bounded Lebesgue densities on ℝ.
Proof. By Theorem 28.4 in [70], π has a bounded continuous Lebesgue density on ℝ. Also, from the stated relation, we see that x²π has a Lebesgue density x²π(x) with finite L¹ norm. Here, ‖f‖_{L^p} = (∫_ℝ |f(x)|^p dx)^{1/p}. Moreover, x³π has a Lebesgue density.

Proof. The first result follows from Proposition 9.4 in [3]. For the second result, we have the stated bound; the remaining term can be evaluated in a similar way.

Lemma A.3. Assume Condition (ii) in Assumption 3.1. Then we have
Proof. This result immediately follows from Remark 3.3.
If we take h sufficiently small, then Lemmas A.2 and A.3 imply that, with probability approaching one, inf_{|u|<h^{−1}} |ϕ̂(u)| ≳ h^α.

Lemma A.4. Assume Conditions (i), (iv) and (v) in Assumption 3.1. Then we have that
Proof. (Step 1): First, we show the claimed bound. Consider the following decomposition.
We have that

In the rest of the proof, we write ‖·‖_{[−h^{−1},h^{−1}]} as ‖·‖ for simplicity. Observe that In fact, since we have that we obtain the second inequality. By Lemma A.2, we also have that We observe that Then we have that Together with (A.1), (A.2), and (A.3), we have that (Step 2): Next we show that Observe that Moreover, we have that Together with (A.4) and (A.5), we see that ϕ can be replaced with ϕ_{θ_n} in (A.6), which completes the proof.
By almost the same argument as in the proof of Lemma A.4, we can show that

Therefore, together with the result of Lemma A.4, we have the stated bound. Proof. We first show that h^α |K_n(x)| ≲ min(1, 1/x^2). We follow the proof of Lemma 3 in [57]. By integration by parts, we have that We also observe that Since ϕ_W is supported in [−1, 1] and twice differentiable, we can show
We can show that h^{α+1} ∫_R |I_{j,n}(t)| dt ≲ 1 for j = 1, 2, 3, and therefore we have the desired result. Lemma A.5 then implies that each term on the right-hand side is bounded (as a function of y) uniformly in n and x ∈ {x_1, …, x_N}.

Lemma A.6. Assume Conditions (i), (ii), (iv) and (v) in Assumption 3.1. For any compact set
Since |t|^{2α} |ϕ_W(t)|^2 is integrable and the integrand converges pointwise for every |t| > 0, the dominated convergence theorem yields the desired result.
Proof. Let Z_{n,j}(x) = X_{jΔ} K_n((x − X_{jΔ})/h). By Fubini's theorem, we have that Therefore, we have that Lemmas A.6 and A.7 yield the following result on a lower bound for the variance of Z_{n,1}(x).
Since min_{1≤ℓ≤N} E[Z^2_{n,1}(x_ℓ)] ≳ h^{−2α+δ+1} by Lemma A.6, we have that Proof. Since x^3 π has a bounded Lebesgue density on R by Lemma A.1 and h^{2α} |K_n|^2 is integrable by Lemma A.5, we first observe that Therefore, by Proposition 2.5 in [35], we obtain Then we have the desired result.
Proposition A.2. Let S_n(x) = Σ_{j=1}^n Z_{n,j}(x). Then for any δ ∈ (0, 1/12), we have that Proof. It is easy to show that (1/n) Var(S_n(x)) = Var(Z_{n,1}(x)) + 2 Σ_{j=2}^{n} (1 − (j−1)/n) Cov(Z_{n,1}(x), Z_{n,j}(x)). By Lemma A.8, we have that Since log(1/h) ≤ (C_0/(2+2α−δ)) log n for sufficiently large n and For the first term, we have that I_n ≲ h^r (by Lemma A.9). For the second term II_n, Lemma A.4 yields that
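The stationary variance identity used at the start of this proof can be checked numerically. The following is a generic illustration with a toy stationary AR(1) sequence standing in for Z_{n,j}(x) (our own construction, not the paper's process):

```python
import numpy as np

# Check (1/n) Var(S_n) = gamma(0) + 2 * sum_{k=1}^{n-1} (1 - k/n) gamma(k)
# by Monte Carlo, using a stationary AR(1) sequence as the summands.
rng = np.random.default_rng(1)
n, reps, rho = 5, 200_000, 0.6
Z = np.empty((reps, n))
Z[:, 0] = rng.standard_normal(reps) / np.sqrt(1 - rho**2)  # stationary start
for j in range(1, n):
    Z[:, j] = rho * Z[:, j - 1] + rng.standard_normal(reps)
S = Z.sum(axis=1)
lhs = S.var() / n
gamma = [(Z[:, 0] * Z[:, k]).mean() for k in range(n)]  # autocovariances
rhs = gamma[0] + 2 * sum((1 - k / n) * gamma[k] for k in range(1, n))
print(lhs, rhs)  # the two sides agree up to Monte Carlo error
```

The identity is exact for any stationary square-integrable sequence, so the two Monte Carlo estimates differ only by simulation noise.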

Proof. Observe that, by a change of variables,
If p ≥ 1, then by Taylor's theorem, for any x, y ∈ R, where 0! = 1 by convention. This completes the proof.
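For reference, the omitted display is presumably a standard Taylor expansion; one common form with integral remainder (under the assumption that the function f is p-times continuously differentiable, and with the convention 0! = 1 for the m = 0 term) reads:

```latex
f(y)=\sum_{m=0}^{p-1}\frac{f^{(m)}(x)}{m!}\,(y-x)^m
+\frac{(y-x)^p}{(p-1)!}\int_0^1 (1-s)^{p-1}\, f^{(p)}\bigl(x+s(y-x)\bigr)\,ds .
```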
We use the following result to show that the asymptotic variance matrix appearing in Theorem 3.1 is diagonal. Proposition A.3. For any δ ∈ (0, 1/12), we have that Proof. Almost the same argument as in the proof of Proposition A.2 yields that Hence it is sufficient to show that If |z| ≤ h^{−2δ} and h is sufficiently small, then we have that Moreover, Therefore we have that

Proof of Theorem 3.1. First we will show that the stated convergence holds for each fixed 0 < x < ∞. We consider the following decomposition of S_n(x).
(Step 1): In this step, we show that the contributions of the small blocks and the remainder term are asymptotically negligible. Note that, since the β-mixing coefficients satisfy n^6 β(n) → 0 as n → ∞, we have k_n β(s_n) → 0 as n → ∞. By the definition of η_{n,1}(x), we have that Since |η_{n,j}(x)|/(s_n h^{−(1+δ)/2} σ_n(x)) is bounded (see the comment after the proof of Lemma A.5), by Proposition 2.6 in [35], Then we have that Therefore, we have that as n → ∞. Likewise, we have that Var(ζ̄_n(x)) ≤ ((l_n + s_n)/n) Var(ζ_n(x)) → 0 as n → ∞, since n − k_n(l_n + s_n) ≲ l_n + s_n. (Step 2): We set T_n(x) = Σ_{j=1}^{k_n} ξ_{n,j}(x). In this step we show the asymptotic normality of T_n(x). It is then sufficient to show that, for any ε > 0, lim_{n→∞} M_n < ε. Note that By Lemma 2.4 in [34] and k_n β(s_n) → 0 as n → ∞, we have that A_{n,1} ≲ k_n β(s_n) → 0 as n → ∞. Finally we show lim_{n→∞} A_{n,2} = 0. This is equivalent to showing the same convergence for T̃_n(x) = Σ_{j=1}^{k_n} ξ̃_{n,j}(x), where {ξ̃_{n,j}(x)} are independent random variables such that ξ̃_{n,j}(x) =_d ξ_{n,j}(x)/σ_n(x). It is easy to show that {ξ_{n,j}(x)/σ_n(x)} is a sequence of bounded random variables. To show (A.9), it is sufficient to check the following Lindeberg condition.
for any ω > 0. By Hölder's inequality, Markov's inequality and Proposition 2.7 in [35], we have that (Step 3): In this step, we complete the proof. Considering (A.8), Condition (vi) in Assumption 3.1 and Lemma A.9 yield that the bias term I_n is asymptotically negligible, since h^r √(n h^{2α+1−δ} log n) → 0 as n → ∞. This implies that the asymptotic distribution of √n (k̂(x) − k(x)) is the same as that of S_n(x)/√n. Moreover, Proposition A.3 implies that the asymptotic covariance between S_n(x_1)/√n and S_n(x_2)/√n for distinct design points 0 < x_1 < x_2 < ∞ is negligible. Therefore, we finally obtain the desired result.
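The big-block/small-block structure underlying Steps 1 and 2 is Bernstein's classical blocking scheme. As a sketch under standard conventions (the paper's exact definitions of ξ_{n,j}, η_{n,j}, ζ_n are in its displays; here Z̄_{n,i}(x) denotes the centered summands and l_n, s_n the big- and small-block lengths):

```latex
S_n(x)=\sum_{j=1}^{k_n}\xi_{n,j}(x)+\sum_{j=1}^{k_n}\eta_{n,j}(x)+\zeta_n(x),
\qquad k_n=\Bigl\lfloor\frac{n}{l_n+s_n}\Bigr\rfloor,
```

```latex
\xi_{n,j}(x)=\sum_{i=(j-1)(l_n+s_n)+1}^{(j-1)(l_n+s_n)+l_n}\bar Z_{n,i}(x),
\qquad
\eta_{n,j}(x)=\sum_{i=(j-1)(l_n+s_n)+l_n+1}^{j(l_n+s_n)}\bar Z_{n,i}(x).
```

The small blocks and the remainder are negligible when s_n/l_n → 0 (Step 1), while β-mixing allows the big blocks to be replaced by independent copies at a coupling cost of order k_n β(s_n), after which the Lindeberg condition delivers the normal limit (Step 2).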

A.2. Proofs for Section 4
We note that the Lemmas and Propositions in Section A.1 also hold when 0 < x_1 < · · · < x_N < ∞, x_ℓ ∈ I for ℓ = 1, …, N, and min_{1≤k≠ℓ≤N} |x_k − x_ℓ| ≳ h^{1−2δ}. In particular, we need to take into account the effect of the separation between points in the proofs of Lemmas 4.1 and A.10, and Theorem A.1. In the proof of Theorem A.1, we use the lower bound on min_{1≤ℓ≤N} σ_n(x_ℓ) to obtain an intermediate Gaussian approximation result. We also need to take care of the effect of the discretization of the compact set I to obtain the consistency of σ̂²_n(x) over the discrete points in Lemma 4.1, that is, max_{1≤ℓ≤N} |σ̂²_n(x_ℓ)/σ²_n(x_ℓ) − 1| →_P 0. Moreover, in the proof of Lemma A.10, we use the condition min_{1≤k≠ℓ≤N} |x_k − x_ℓ| ≳ h^{1−2δ} to show that the variance-covariance matrix of the random vector (W_n(x_1), …, W_n(x_N)) can be approximated by the N × N identity matrix, and this yields a Gaussian comparison result (Proposition A.4).
Proof of Lemma 4.1. Since ‖K_n‖_R ≲ h^{−α} and we can show ‖K̂_n − K_n‖_R = O_P(h^{−2α} n^{−1/2} log n), we have that Therefore, we have that Then we have that and likewise, Since (h^{−2α} n^{−1/2} log n)² / (h^{−3α} n^{−1/2} log n) = h^{−α} n^{−1/2} log n ≲ 1, we have that Therefore, to complete the proof, it suffices to prove that To prove (A.10), we use Theorem 2.18 in [35], applied with ε = ε_0 (log n)^{−1} for any ε_0 > 0 in their notation. Here, [a] is the integer part of a ∈ R. In this case we have that as n → ∞, and likewise, we can show (A.11). This completes the proof.
Next we show that the distribution of max_{1≤ℓ≤N} |Y_{n,ℓ}| can be approximated by that of max_{1≤ℓ≤N} |Y_ℓ|, where Y = (Y_1, …, Y_N) is a normal random vector in R^N. For this, we prepare two lemmas.

Proof. Since the covariance between Z_{n,j}(x_ℓ) and Z_{n,k}(x_ℓ) for j ≠ k is asymptotically negligible relative to the variances of each term by the proof of Proposition A.3, it is sufficient to bound max_{1≤k,ℓ≤N} Cov(Z_{n,1}(x_k), Z_{n,1}(x_ℓ)). Since 1/min_{1≤ℓ≤N} σ²_n(x_ℓ) ≲ h^{2α−δ−1}, the same argument as in the proof of Proposition A.3 shows that Cov(Z_{n,k}(x_{m_1}), Z_{n,ℓ}(x_{m_2}))/(σ_n(x_{m_1}) σ_n(x_{m_2})) is asymptotically negligible for 1 ≤ m_1, m_2 ≤ N. Therefore, the proof of Lemma A.10 yields that max_{1≤k,ℓ≤N} This completes the proof.

This also implies that there exists a sequence of constants ε_n ↓ 0 such that P(|U_n − V_n| > ε_n (log n)^{−1/2}) ≤ ε_n (which follows from the fact that convergence in probability is metrized by the Ky Fan metric; see Theorem 9.2.2 in [28]). Then we have that P(U_n ≤ t) ≤ P({U_n ≤ t} ∩ {|U_n − V_n| ≤ ε_n (log n)^{−1/2}}) + P({U_n ≤ t} ∩ {|U_n − V_n| > ε_n (log n)^{−1/2}}) ≤ P(V_n ≤ t + ε_n (log n)^{−1/2}) + ε_n for any t ∈ R. Theorem 4.1 yields that (enlarging ε_n if necessary) P(V_n ≤ t + ε_n (log n)^{−1/2}) ≤ P(G_n ≤ t + ε_n (log n)^{−1/2}) + ε_n for any t ∈ R, where G_n = max_{1≤ℓ≤N} |Y_ℓ|. From the anti-concentration inequality for the maxima of a Gaussian random vector (Theorem 3 in [19]), the right-hand side is bounded from above by P(G_n ≤ t) + 8 ε_n (log n)^{−1/2} E[G_n] + ε_n. Since E[G_n] ≤ D √(log n) for some positive constant D which does not depend on n, we have that P(U_n ≤ t) ≤ P(G_n ≤ t) + (8D + 2) ε_n = P(G_n ≤ t) + o(1) (A.12) for any t ∈ R. By a similar argument, we also have that P(G_n ≤ t) ≤ P(U_n ≤ t) + o(1) for any t ∈ R (A.13). Combining (A.12) with (A.13), we obtain the desired result.

Appendix B: On asymptotic validity of confidence bands
We use the notation from the proof of Theorem 4.2 here. Let q^{U_n}_τ denote the (1 − τ)-quantile of U_n. Theorem 4.2 implies that there exists a sequence ε_n ↓ 0 such that sup_{t∈R} |P(U_n ≤ t) − P(G_n ≤ t)| ≤ ε_n.
Then we have that where the last inequality holds because G_n has a continuous distribution, by the anti-concentration inequality (see Theorem 3 in [19]). This yields the inequality q^{U_n}_τ ≤ q_{τ−ε_n}. Therefore, we have that Likewise, we have the inequality q_{τ+ε_n} ≤ q^{U_n}_τ. This yields that Then we obtain P(U_n ≤ q_τ) → 1 − τ as n → ∞.
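The quantile sandwich behind this conclusion can be spelled out as follows (a standard argument; ε_n is the approximation error from Theorem 4.2 and q_τ denotes the (1 − τ)-quantile of G_n):

```latex
P\bigl(U_n \le q_{\tau-\varepsilon_n}\bigr)
\;\ge\; P\bigl(G_n \le q_{\tau-\varepsilon_n}\bigr) - \varepsilon_n
\;=\; \bigl(1-(\tau-\varepsilon_n)\bigr) - \varepsilon_n
\;=\; 1-\tau ,
```

so q^{U_n}_τ ≤ q_{τ−ε_n}, and symmetrically q_{τ+ε_n} ≤ q^{U_n}_τ. Since τ ↦ q_τ is continuous by the anti-concentration inequality, both bounds converge to q_τ as ε_n ↓ 0, which gives P(U_n ≤ q_τ) → 1 − τ.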