Minimax rates for the covariance estimation of multi-dimensional L\'evy processes with high-frequency data

This article studies nonparametric methods to estimate the co-integrated volatility for multi-dimensional L\'evy processes with high-frequency data. We construct a spectral estimator for the co-integrated volatility and prove minimax rates for an appropriate bounded nonparametric class of semimartingales. Given $ n $ observations of increments over intervals of length $1/n$, the rates of convergence are $1 / \sqrt{n} $ if $ r \leq 1 $ and $ (n\log n)^{(r-2)/2} $ if $ r>1 $, which are optimal in a minimax sense. We bound the co-jump activity index from below by the harmonic mean of the jump activity indices of the components. Finally, we assess the efficiency of our estimator by comparing it with estimators in the existing literature.


Introduction
Lévy processes are the main building blocks for stochastic continuous-time jump models. Whenever the modeling of a stochastic process in finance requires the inclusion of jumps, Lévy processes are those to be considered. They play an instrumental role, for example, in the modeling of financial data, see Carr et al. [2002], Shephard [2004, 2006], Wu [2007], Eberlein and Papapantoleon [2005], Geman [2002].
Consequently, the large number of applications has given rise to a great demand for statistical methods in the study of Lévy processes, especially nonparametric methods, which relax the dependence on a particular model. The problem of estimating the characteristics of a Lévy process has received considerable attention over the past decade. Starting with the work by Belomestny and Reiß [2006], a number of articles have considered nonparametric [...] serves to fill this gap. Jacod and Reiß [2014] proposed a spectral estimator for integrated volatility achieving minimax rates. In the present work, we generalize their work to finite dimensions. For simplicity, we concentrate primarily on the two-dimensional case, but extensions to the general multidimensional setting are straightforward to obtain as well.
Here, we are interested in investigating the optimal rates for the estimation of co-integrated volatility when the model falls in a class $S^{r,L}_M$ of two-dimensional Lévy processes whose co-jump activity $r$ is either of finite or infinite variation. The sizes of co-jumps $(x_1, x_2)$ satisfy $\int_{\mathbb{R}^2} (1 \wedge |x_1 x_2|^{r/2}) F(dx_1, dx_2) \leq M$ with $r \in (1, 2)$, $M \in \mathbb{R}$ and $F$ the co-jump measure. Let $r_1, r_2$ be the indices of jump activity for the small jump components. We find that $r$ is bounded from below by the harmonic mean of $r_1, r_2$, even in the case of infinite variation jumps. This was not known up to now. Under this assumption for co-jumps we show that our spectral estimate for co-integrated volatility converges at a rate $(n \log n)^{(r-2)/2}$ if $r > 1$ and $1/\sqrt{n}$ if $r \leq 1$. For a 2-dimensional Itô semimartingale, a truncated covariance estimator has been proposed that estimates co-integrated volatility at the rate $1/\sqrt{n}$ when $r_1$ is small and $r_2$ is close to 1, at the rate $n^{-\frac{1}{2}\left(1 + \frac{r_2}{r_1} - r_2\right)}$ when $r_1, r_2$ are much bigger than 1 and close to 2, and at the rate $n^{\frac{r_2}{2} - 1}$ when $r_1$ is small and $r_2$ is much bigger than $r_1$ or in the case of independent small jump components. However, these rates are sub-optimal for the class described above.
Let us describe the outline of this paper. In section 2 we state the underlying model. In section 3 we give the assumptions needed to prove the minimax rates. In section 4 we construct our spectral estimator for the co-integrated volatility and present the upper bound for the family of our estimators. In section 5 we prove the lower bound in a minimax sense. In section 6 we compare our estimator with existing estimators in the literature. In the last section we provide a simulation study.

The underlying model
We assume equidistant discrete observations at the times $i\Delta_n$, $i = 0, \dots, n$, for a mesh $\Delta_n \to 0$. Here, we use the mesh $\Delta_n = 1/n$ with $n \to \infty$. The process is observed on the finite time span $[0,1]$. Let $X = (X^{(1)}, X^{(2)})$ be a two-dimensional Lévy process with Lévy-Itô decomposition $X_t = bt + W_t + \int_0^t \int_{\|x\| \leq 1} x(\mu - \bar\mu)(ds, dx) + \int_0^t \int_{\|x\| > 1} x\mu(ds, dx)$. (2.1) Unless stated otherwise, from now on $b$ is a drift vector in $\mathbb{R}^2$, $W = (W^{(1)}, W^{(2)})$ denotes a bivariate Brownian motion with covariance matrix $\Sigma\Sigma^\top$, and $\mu, \bar\mu$ are the jump measure and its compensator, respectively. The compensator takes the form $\bar\mu = ds\,F(dx)$, where $F$ is the Lévy measure of $X$.
Due to the independence of the continuous part and the discontinuous (jump) part of a Lévy process, the analysis of $X$ canonically splits into the inference on the covariance matrix and the inference on the jump measure $F$. Our focus in this paper is to investigate an estimator for the co-integrated volatility of $X$.
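We assume a filtered space $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t \geq 0}, P)$ supporting two independent standard Brownian motions $W^{(1)}, W^{(3)}$ and two Poisson random measures $\mu^{(j)}$ for $j = 1, 2$ on $\mathbb{R}^2 \times [0, 1]$. Recall that $W^{(1)}, W^{(2)}$ are correlated with $d\langle W^{(1)}, W^{(2)} \rangle_t = \rho\,dt$, where $\rho$ is a constant in $[0, 1]$. We construct $W^{(2)}$ as a linear combination of the two independent Brownian motions, so $W^{(2)}_t = \rho W^{(1)}_t + \sqrt{1 - \rho^2}\, W^{(3)}_t$. Next we calculate the variances and covariance of $W^{(1)}, W^{(2)}$. We see that $Var(W^{(1)}_t) = t$ and $Var(W^{(2)}_t) = \rho^2 t + (1 - \rho^2) t = t$. For the covariance we obtain $Cov(W^{(1)}_t, W^{(2)}_t) = \rho\, Var(W^{(1)}_t) + \sqrt{1 - \rho^2}\, E[W^{(1)}_t W^{(3)}_t] = \rho t$, where the last equality holds because $W^{(1)}, W^{(3)}$ are independent. So, without loss of generality we assume that the continuous part of each component $X^{(i)}$ is driven by $\sigma^{(i)} W^{(i)}$, where $\sigma^{(i)}$, $i = 1, 2$, are deterministic. Therefore, the global quadratic variation of $X$ is given by $[X^{(1)}, X^{(2)}]_t = \int_0^t \rho\, \sigma^{(1)} \sigma^{(2)}\, ds + \sum_{s \leq t} \Delta X^{(1)}_s \Delta X^{(2)}_s$, where the first term is the co-integrated volatility and the second term is the sum of products of simultaneous jumps (called co-jumps). Our target of inference, the co-integrated volatility at time $0 \leq t \leq 1$, is $C^{12}_t = \int_0^t \rho\, \sigma^{(1)} \sigma^{(2)}\, ds$.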
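The construction of the correlated Brownian motion can be checked numerically. The following is a minimal sketch, assuming the standard linear combination $W^{(2)} = \rho W^{(1)} + \sqrt{1-\rho^2}\, W^{(3)}$; the function name, the sample size and the value of `rho` are illustrative choices, not taken from the paper.

```python
import numpy as np

def correlated_brownian_increments(n, rho, seed=0):
    """Simulate n increments over [0, 1] of two standard Brownian motions
    with correlation rho, built from two independent ones (assumed form:
    W2 = rho * W1 + sqrt(1 - rho^2) * W3)."""
    rng = np.random.default_rng(seed)
    dt = 1.0 / n
    dw1 = rng.normal(0.0, np.sqrt(dt), size=n)  # increments of W1
    dw3 = rng.normal(0.0, np.sqrt(dt), size=n)  # increments of W3, independent of W1
    dw2 = rho * dw1 + np.sqrt(1.0 - rho**2) * dw3
    return dw1, dw2

dw1, dw2 = correlated_brownian_increments(n=200_000, rho=0.6)
var_w2 = float(np.sum(dw2**2))        # realized variance of W2 on [0, 1], close to 1
cov_w1_w2 = float(np.sum(dw1 * dw2))  # realized covariance on [0, 1], close to rho
```

The realized variance of the constructed component stays close to 1 and the realized covariance close to `rho`, in line with the variance and covariance computation in the text.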

Assumptions
To derive an estimator for co-integrated volatility and then prove minimax bounds for this estimator, we need to establish some assumptions regarding the behavior of small jumps and the class of processes under consideration. In particular, our setup is intrinsically nonparametric and related to the properties of the observed path.
The quantity to estimate, namely $C^{12}_t$, depends on the space $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t \geq 0}, P)$ and on $X$. We consider a family which satisfies the following assumption.
Assumption 2. For any $M > 0$ and for any $t \in [0, 1]$, we consider the class $S^{r,L}_M$ = the set of all Lévy processes satisfying (3.1), where $r \in [0, 2)$ and the covariance matrix $\Sigma\Sigma^\top$ is uniformly bounded from above, component-wise.
The second term of (3.1) concerns the behavior of small jumps. With the above assumption we control the activity of co-jumps, i.e. common jumps. Below, a co-jump, say at time $t$, means that both components jump at this time but their jump sizes are not the same. Ultimately, we are asking whether the small jump components are of finite or infinite variation. This question concerns the behavior of the Lévy measure $F$ near 0. There is always a finite number of big jumps; the question is whether there is a finite or infinite number of small jumps, and this is controlled by the behavior of $F$ near 0. The major difficulty here comes from the possibly erratic behavior of $F$ near 0. The second part of (3.1) aims to control this behavior in a mild and not very restrictive way.
Our interest in identifying the Blumenthal-Getoor (BG) index lies in the fact that this index allows us to classify the processes from least active to most active, according to the above description. We denote by $r$ the BG index for co-jumps, which satisfies (3.2). The problem of BG index estimation from discrete observations of a Lévy process has drawn much attention in the literature. In the case of high-frequency data, Aït-Sahalia and Jacod [2011] studied the problem of estimating the jump activity index that is defined for any Itô semimartingale. A consistent estimator for the BG index based on low-frequency data was obtained in Belomestny et al. [2010]. The interested reader should refer to Belomestny and Reiß [2015], section 7, for a detailed review of these results. An extension to a more general framework and models can be found in Belomestny and Panov [2012], Belomestny et al. [2013].

Remark 3.1. Notice that the second term of (3.1) in Assumption 2 follows from the classical condition to control the activity of small jumps in two dimensions. Through a trivial calculation, for instance using $|x_1 x_2| \leq \|x\|^2 / 2$, we have $1 \wedge |x_1 x_2|^{r/2} \leq 1 \wedge \|x\|^r$. We assume that $\int_{\mathbb{R}^2} (1 \wedge |x_1 x_2|^{r/2}) F(dx_1, dx_2) \leq M$. (3.3) By using this unconventional condition (3.3), we relax the classical condition for small jumps and make our results stronger. Now we will test this assumption about the boundedness of small co-jumps with some simple examples. Despite its simple nature, the following example offers significant insight into co-jumps with infinite variation.
Example 3.2. Suppose we have independent jumps in the coordinates and F (dx) is a Lévy measure on R 2 and x is a vector in R 2 . Then, if the marginals of a two-dimensional Lévy process are finite in the one dimensional case, see the assumption section in Jacod and Reiß [2014].
In this example, we notice that $\int_{\mathbb{R}^2} |x_1 x_2|^{r/2} F(dx_1, dx_2) = 0$, because independent components do not jump together and $F$ is therefore supported on the coordinate axes. This means that the deterministic error of our estimator is equal to zero, see subsection 4.2 for further details. This example shows us something more: whenever we have independent jumps, no matter the choice of $F$, we can always control the activity of small co-jumps, even if the jumps are of infinite activity or infinite variation.
Example 3.3. We assume $X$ to be an $a$-stable Lévy process in $\mathbb{R}^2$, $F(dx) = \|x\|^{-2-a} dx$ for $a \in (0, 2)$. Then it satisfies (3) if $a < r$. Indeed, this follows from a direct computation in which we substitute $x$ with its polar coordinates.
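The polar-coordinate computation behind Example 3.3 can be sketched as follows. This is a reconstruction under the assumption that the relevant quantity is the small-jump integral of $|x_1 x_2|^{r/2}$ against $F$; the symbol $c_r$ is introduced here for the angular constant and does not appear in the original.

```latex
\[
\int_{\|x\| \le 1} |x_1 x_2|^{r/2} \, \|x\|^{-2-a} \, dx
= \int_0^{2\pi} |\cos\theta \sin\theta|^{r/2} \, d\theta
  \int_0^1 \rho^{r} \, \rho^{-2-a} \, \rho \, d\rho
= c_r \int_0^1 \rho^{\,r-a-1} \, d\rho ,
\qquad c_r = \int_0^{2\pi} |\cos\theta \sin\theta|^{r/2} \, d\theta .
\]
```

The radial integral equals $1/(r-a)$ when $a < r$ and diverges otherwise, while the contribution of $\|x\| > 1$ is bounded by the finite mass $F(\|x\| > 1)$.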
We are interested in studying the co-jump activity in the case that at least one of the jump components is of infinite variation; we assume without loss of generality $r_2 \geq 1$.
We observe that the activity index of common jumps is bounded from below by the harmonic mean. Each component $X^{(i)}$ has its own index $r_i$, different from the others. In the following we describe the method for obtaining the activity index of co-jumps. The BG index of a Lévy process depends only on the Lévy measure $F$. In general, $r$ is an index accounting for both positive and negative jumps; for simplicity, but without loss of generality, we develop our method for the case in which the Lévy measure is one-sided, i.e. $X^{(i)}$ only makes positive jumps. Thus, $r$ will be influenced by the dependence structure between the jump components.
We will use a Lévy copula to describe this dependency. The concept of Lévy copula allows us to characterize in a time-independent scheme the dependence structure of the pure jump part of a Lévy process. Here, we use the Lévy copula, which permits a range from a dependent to a total independent framework. For the definition and concepts of independence and total positive dependence copula we refer to Kallsen and Tankov [2006]. The next definition is taken from .
The small jumps of each $X^{(i)}$ follow an $r_i$-stable Lévy process, and we denote by $U_i$ the tail integral of the marginal Lévy measure $F^{(i)}$ of the small jumps of $X^{(i)}$. Note that $r_i$ is the BG index of $X^{(i)}$. We can have only two types of jumps: disjoint jumps or joint ones. The disjoint jumps have sizes of either $(x_1, 0)$ or $(0, x_2)$. This means that we have jumps only on the Cartesian axes. The independent copula regulates such disjoint jumps. On the other hand, the joint jumps are regulated by the dependent copula; their size falls into the point $(x_1, x_2)$. The joint jumps are completely positively monotonic, i.e. there exists a strictly increasing and positive function $f$ such that $\forall t > 0$, $\Delta X^{(2)}_t = f(\Delta X^{(1)}_t)$. This means that when $x_1$ is a jump realisation for which there is a realisation $x_2$ such that $x_2 = f(x_1)$, then $x_1$ is interpreted as the first component of the joint jump. In fact, the sizes $(x_1, x_2)$ are supported by the graph $x_2 = f(x_1)$. For the dependent copula we need the minimum between $U_1(x_1)$ and $U_2(x_2)$, which is attained when $U_1(x_1) = U_2(x_2)$. Hence, the graph $x_2 = U_2^{-1}(U_1(x_1))$ supports the joint jumps. In our case we assume one-sided $r_i$-stable processes, for which the graph of the joint jumps can be written explicitly. We denote by $F_\gamma$ the Lévy measure in terms of the Lévy copula, which leads to (3.5). In view of Example 3.2, the integral with respect to the independent copula will be equal to zero. Then it turns out that (3.2) will take the form (3.6). Remember that $r_1 \leq r_2$ and $c_1 \leq c_2$; then for sufficiently small $\epsilon > 0$ we obtain (3.7). In light of the above calculations, in order for the integral in (3.7) not to be divergent we need $\left(\frac{r_1}{r_2} + 1\right) \frac{r}{2} - 1 - r_1 > -1$, which means that $r > \frac{2 r_1 r_2}{r_1 + r_2}$. We observe that $r$, the activity index of co-jumps, is at least the harmonic mean of the indices $r_1, r_2$. In addition, $\frac{2 r_1 r_2}{r_1 + r_2} \geq r_1$, since we assume $r_1 \leq r_2$.
To conclude, the Blumenthal-Getoor (BG) index of the co-jumps will be [...]. We see here that the higher the activity of one jump component, the higher the activity of co-jumps. If at least one of the jump components has an activity index of more than 1, then the co-jump activity follows the jump component with the most active small jumps.
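The harmonic-mean threshold can be illustrated with a small numerical check. This is a sketch under the assumption, consistent with the convergence condition stated above, that near the origin the co-jump integrand behaves like $x_1^{(r/2)(1 + r_1/r_2) - 1 - r_1}$; the function names are hypothetical.

```python
def harmonic_mean(r1, r2):
    """Harmonic mean of the two marginal jump activity indices."""
    return 2.0 * r1 * r2 / (r1 + r2)

def cojump_integrand_exponent(r, r1, r2):
    """Exponent e such that the co-jump integrand is assumed to behave
    like x1**e near 0 (taken from the convergence condition in the text)."""
    return (r / 2.0) * (1.0 + r1 / r2) - 1.0 - r1

def integral_converges(r, r1, r2):
    """The integral of x1**e over (0, eps) is finite iff e > -1."""
    return cojump_integrand_exponent(r, r1, r2) > -1.0

r1, r2 = 0.8, 1.6
threshold = harmonic_mean(r1, r2)                # about 1.0667
above = integral_converges(1.1, r1, r2)          # r above the harmonic mean
below = integral_converges(1.0, r1, r2)          # r below the harmonic mean
```

For these illustrative indices the integral is finite for $r$ slightly above the harmonic mean and diverges for $r$ below it, and the harmonic mean itself lies above $r_1$.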
Next we proceed to the construction of a spectral estimate for the co-integrated volatility. Given that we know an estimate for the integrated volatility $IV$, we could consider a straightforward estimate for co-integrated volatility. By polarization, $\frac{1}{2} IV(X^{(1)} + X^{(2)}) - \frac{1}{2} IV(X^{(1)}) - \frac{1}{2} IV(X^{(2)})$ is a possible estimator for the co-integrated volatility. However, we refrain from using this estimate because its rates of convergence are slower than those obtained with the procedure of section 4. Let us illustrate this argument with an example.
Example 3.6. Let $X^{(1)}, X^{(2)}$ be independent $r_i$-stable Lévy processes for $i = 1, 2$ such that $0 \leq r_1 \leq r_2 < 2$ and $r_2 \geq 1$, with triplets $(0, 0, F_1(dx_1))$ and $(0, 0, F_2(dx_2))$ respectively. The Lévy measure of the two-dimensional Lévy process will be given by $F(dx) = F_1(dx_1) + F_2(dx_2)$. Now let us describe the dependence structure between the jump components. Recall the dependence construction from Assumption 3. We assume $x_2 = f(x_1) = -x_1$ and [...]. The Blumenthal-Getoor index of the co-jumps will be given by [...]. As a consequence, we find that $\rho > r_2$, so the Blumenthal-Getoor index is $r = r_2$.
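The polarization identity mentioned before Example 3.6 can be illustrated numerically. The sketch below uses the plain realized variance as a stand-in for an integrated-volatility estimate; this is an assumption made for illustration, since the paper's $IV$ refers to a spectral estimate, and the function names are hypothetical.

```python
import numpy as np

def realized_variance(increments):
    """Sum of squared increments, used here as a stand-in for IV."""
    return float(np.sum(np.asarray(increments) ** 2))

def covariance_by_polarization(dx1, dx2):
    """(IV(X1 + X2) - IV(X1) - IV(X2)) / 2 with IV replaced by realized variance."""
    dx1, dx2 = np.asarray(dx1), np.asarray(dx2)
    return 0.5 * (realized_variance(dx1 + dx2)
                  - realized_variance(dx1)
                  - realized_variance(dx2))

rng = np.random.default_rng(1)
n = 1000
dx1 = rng.normal(size=n) / np.sqrt(n)
dx2 = 0.5 * dx1 + rng.normal(size=n) / np.sqrt(n)

direct = float(np.sum(dx1 * dx2))                 # realized covariance
polarized = covariance_by_polarization(dx1, dx2)  # same value via polarization
```

With realized variances the identity is purely algebraic, so both numbers agree up to floating-point error. The slower rates mentioned in the text concern the case where each $IV$ term is itself an estimate with its own jump-induced error.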

Upper bound
We are in a nonparametric setting in which the process $X$ satisfies Assumption 2. We prove that a sequence of estimators $\hat C^{12}_n$ achieves the uniform rate $w_n$ for estimating $C^{12}_t$ at $t = 1$ on $S^{r,L}_M$.

Defining the spectral estimator
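We adapt an estimator proposed by Jacod and Reiß [2014]. Specifically, we let $X$ be a two-dimensional Lévy process with characteristic triplet $(b, C, F)$. The characteristic function of $X_{1/n}$ is given by $\varphi_n(u_n) = E\left[e^{i \langle u_n, X_{1/n} \rangle}\right] = \exp\left(\frac{1}{n}\left(i \langle b, u_n \rangle - \frac{1}{2} \langle C u_n, u_n \rangle + \int_{\mathbb{R}^2} \left(e^{i \langle u_n, x \rangle} - 1 - i \langle u_n, x \rangle 1_{\{\|x\| \leq 1\}}\right) F(dx)\right)\right)$, (4.1) where $C = \Sigma\Sigma^\top$ is the covariance matrix and $u_n = (U_n, U_n)$. In the same vein, we define the characteristic function $\varphi_n(\tilde u_n)$, where $\tilde u_n = (U_n, -U_n)$.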
Let us remember that we are in a high-frequency setting and the time between two consecutive observations is $1/n$. Following a trivial calculation we get that [...]. We consider, based on the observations, the empirical characteristic function of the increments, at each stage $n$: $\hat\varphi_n(u_n) = \frac{1}{n} \sum_{j=1}^{n} e^{i \langle u_n, \Delta^n_j X \rangle}$. We now define the spectral estimator [...]. The first main result of this paper is the theorem which gives an upper bound for the family of our estimators under Assumption 2. Note that the argumentation is in line with the standard nonparametric error decomposition into a bias and a variance part. Precisely, recalling the form of our estimator, we can write the estimation error $\hat C^{12}_n(U_n) - C^{12}$ as the sum of the deterministic and stochastic errors. Specifically, we define the deterministic error as [...] and the stochastic error as [...]. As a result, the estimation error is $\hat C^{12}_n(U_n) - C^{12} = D_n + H_n$.
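The spectral approach can be sketched in code. The displayed definition of the estimator is not available in this text, so the sketch below assumes the form that follows from the Lévy-Khintchine formula with $u_n = (U_n, U_n)$ and $\tilde u_n = (U_n, -U_n)$, namely $\frac{n}{2 U_n^2}\left(\log|\hat\varphi_n(\tilde u_n)| - \log|\hat\varphi_n(u_n)|\right)$. The function names, the simulated pure-Gaussian data and the choice $U_n = \sqrt{n/2}$ are illustrative assumptions, not the paper's tuning.

```python
import numpy as np

def empirical_cf(increments, u):
    """Empirical characteristic function of the (n, 2) array of increments at u."""
    return np.mean(np.exp(1j * (increments @ u)))

def spectral_covariance(increments, U):
    """Assumed form of the spectral estimate of the co-integrated volatility:
    n / (2 U^2) * (log|phi_hat(U, -U)| - log|phi_hat(U, U)|)."""
    n = increments.shape[0]
    u = np.array([U, U])
    u_tilde = np.array([U, -U])
    log_ratio = (np.log(np.abs(empirical_cf(increments, u_tilde)))
                 - np.log(np.abs(empirical_cf(increments, u))))
    return n * log_ratio / (2.0 * U**2)

# Illustration on a process without jumps, with covariance matrix C (C12 = 1).
rng = np.random.default_rng(0)
n = 100_000
C = np.array([[2.0, 1.0], [1.0, 1.0]])
increments = rng.multivariate_normal(np.zeros(2), C / n, size=n)
estimate = spectral_covariance(increments, U=np.sqrt(0.5 * n))
```

In the Gaussian case the Gaussian exponents at $u_n$ and $\tilde u_n$ differ exactly by $2 U_n^2 C^{12} / n$, which is why the log-ratio isolates the off-diagonal entry. With jumps, the same difference also picks up the integral term that the text calls the deterministic error.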

Bounding the deterministic error
Lemma 4.1. Under Assumption 2 the deterministic error satisfies $D_n \leq M 2 U_n^{r-2} + K U_n^{-2}$, where $K$ is a positive constant. Proof. Recall the characteristic function of $X_{1/n}$ in (4.1). We define [...]. Notice that here we use an argument of complex analysis: after taking the absolute value of the characteristic function, the imaginary part of the exponent vanishes. Summing up, we obtain (4.6). By (4.6), we have [...], where we used the fact that $\cos x - \cos y \leq 2 \wedge |x^2 - y^2|$. Using the inequality $a \wedge b \leq a^r b^{1-r}$ for $r \in [0, 1]$, the last term can be bounded as follows: [...]. In the last line the dominant part is the first term of the sum, because the second integral vanishes faster. By Assumption 2 and for some constant $K > 0$, the bound follows as required.

Bounding the stochastic error
We want to investigate how close the empirical characteristic function is to the characteristic function of a two-dimensional Lévy process. The variables $e^{i \langle u_n, \Delta^n_j X \rangle}$ are i.i.d. as $j$ varies, with expectation $\varphi_n(u_n)$. The same statement holds true for $e^{i \langle \tilde u_n, \Delta^n_j X \rangle}$ as well. First, we study the variance of the absolute value of the empirical characteristic function.
Lemma 4.2. Let $\hat\varphi_n(u_n)$ and $\hat\varphi_n(\tilde u_n)$ be empirical characteristic functions of $X_{1/n}$. Then $Var(|\hat\varphi_n(u_n)|) \leq \frac{2}{n}$ and $Var(|\hat\varphi_n(\tilde u_n)|) \leq \frac{2}{n}$. Proof. Indeed, we get the chain of inequalities (4.8), which ends with the bound $\frac{2}{n}$. Here we used the fact that the $e^{i \langle u_n, \Delta^n_j X \rangle}$ are i.i.d. for the second inequality and the fact that $Var|\cos \langle u_n, \Delta^n_j X \rangle| \leq 1$ and $Var|\sin \langle u_n, \Delta^n_j X \rangle| \leq 1$ for the last inequality. Using the same arguments, we prove that $Var(|\hat\varphi_n(\tilde u_n)|) \leq \frac{2}{n}$ and the claim holds.
[...] where $\tilde Y_{n,j} = e^{i \langle u_n, \Delta^n_j X \rangle} - E[e^{i \langle u_n, \Delta^n_j X \rangle}]$. Therefore, we obtain (4.10). Here, we refer again to the fact that the variables $e^{i \langle u_n, \Delta^n_j X \rangle}$ are i.i.d. as $j$ varies and $|e^{i \langle u_n, \Delta^n_1 X \rangle}|^2 \leq 1$. Thus, the claim holds.
We choose $u_n = (U_n, U_n)$ and $\tilde u_n = (U_n, -U_n)$. In particular, for $M > 0$, $r \in [0, 2)$ and $n$ large enough, we choose $U_n$ as in (4.11). Lemma 4.5. By Assumption 2, for some constant $\Gamma > 0$ and on the event $|V_n| \leq n^{-r/4}$, the stochastic error satisfies [...]. Proof. The stochastic error is defined as follows: [...]. The first quantity we need to bound is [...] (4.13). The last inequality holds because [...], as well as because of Assumption 3 and the result of the one-dimensional case, according to which the marginals $C^{11}, C^{22}$ are bounded by some constant $A$, see Jacod and Reiß [2014] (L-r). We get $\frac{1}{|\varphi_n(u_n)|} \leq e^{\frac{1}{n} U_n^2 (A + M)}$.
Next, the form of $U_n$ in (4.11) implies that [...], where $\Gamma = e^{A+M}$. Let us now argue that [...] as soon as $n \geq n_0 = (2\Gamma)^{\frac{4}{(2-r) \wedge r}}$, by (4.14), on the set where $|V_n| \leq n^{-r/4}$ and $|\tilde V_n| \leq n^{-r/4}$. Therefore, $\left|\frac{V_n}{\varphi_n}\right| \leq \frac{1}{2}$. Accordingly, for the stochastic error on the events $|V_n| \leq n^{-r/4}$ and $|\tilde V_n| \leq n^{-r/4}$ we obtain (4.15). The last inequality in (4.15) is true due to the basic inequality $\log(1 + x) \leq x$ and the fact that $\frac{\tilde V_n}{\varphi_n(\tilde u_n)}$ and $\frac{V_n}{\varphi_n(u_n)}$ are small enough by (4.14). Therefore, we obtain (4.16). Henceforth, for $n \geq n_0$, and for some universal $\Gamma > 0$, we have [...]. The third inequality holds because we applied the Cauchy-Schwarz inequality.
To sum up, [...] as required.
Remark 4.6. Here, we are interested in the events $|V_n| \leq n^{-r/4}$ and $|\tilde V_n| \leq n^{-r/4}$ because the probabilities of the events $|V_n| > n^{-r/4}$ and $|\tilde V_n| > n^{-r/4}$ are negligible. Indeed, applying the Chebyshev inequality we get a bound on $P(|V_n| > n^{-r/4})$ which tends towards zero as $n \to \infty$. Likewise, the probability of the event $|\tilde V_n| > n^{-r/4}$ tends towards zero as $n \to \infty$.
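The Chebyshev step of Remark 4.6 can be sketched as follows. This assumes that $V_n = \hat\varphi_n(u_n) - \varphi_n(u_n)$ is the centred empirical characteristic function (its definition is not visible in this text), so that $E|V_n|^2 \leq 1/n$ by the i.i.d. property and $|e^{i \langle u_n, \Delta^n_1 X \rangle}| \leq 1$.

```latex
\[
P\bigl(|V_n| > n^{-r/4}\bigr)
\le n^{r/2} \, E|V_n|^2
\le n^{r/2} \cdot \frac{1}{n}
= n^{\frac{r}{2} - 1} \longrightarrow 0
\quad \text{as } n \to \infty, \qquad r \in [0, 2).
\]
```

The same bound applies to $\tilde V_n$ with $u_n$ replaced by $\tilde u_n$.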
We are now ready to state that the family of our estimators $\hat C^{12}_n(X)$ is uniformly tight on $S^{r,L}_M$ and thus establish an upper bound for our estimator. Theorem 4.7. Let Assumptions 1-3 hold. Assume $M > 0$ and $r \in [0, 2)$, then the family of estimators $\hat C^{12}_n(U_n)$ with [...]. In particular, the family of estimators $\hat C^{12}_n$ is consistent for the theoretical co-integrated volatility $C^{12}$ with the exact rates of convergence $w_n$.
Proof. First, we prove that the family of our estimators is asymptotically consistent. Applying the Markov inequality, we get for every $\epsilon > 0$ the bound (4.19). Further applying Lemmas 4.5 and 4.1, we see that the right-hand side tends to zero as $n \to \infty$. Let us now argue that [...], which means that $|\hat C^{12}_n(U_n) - C^{12}| = O_P(w_n^{-1})$ as $n \to \infty$. Indeed, following from (4.20), the family of our estimators $\hat C^{12}_n(U_n)$ converges in probability to $C^{12}$ at rates $\frac{1}{\sqrt{n}}$ when $r \leq 1$ and $(n \log n)^{\frac{r-2}{2}}$ when $r > 1$. So for any fixed $\epsilon > 0$ our family of estimators is uniformly tight in $S^{r,L}_M$ with the exact rates $w_n$, and the claim follows.
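The two regimes of the rate can be made concrete with a short helper. This is a minimal sketch of the rates stated in the abstract and in the proof above; the function name `minimax_rate` is a hypothetical choice.

```python
import math

def minimax_rate(n, r):
    """Rate stated in the text: n^(-1/2) for r <= 1 and
    (n log n)^((r - 2) / 2) for 1 < r < 2."""
    if not 0 <= r < 2:
        raise ValueError("r must lie in [0, 2)")
    if n < 2:
        raise ValueError("n must be at least 2")
    if r <= 1:
        return n ** -0.5
    return (n * math.log(n)) ** ((r - 2) / 2)

rate_finite_variation = minimax_rate(100, 0.5)     # 1 / sqrt(100)
rate_infinite_variation = minimax_rate(100, 1.5)   # (100 log 100)^(-1/4)
rate_more_data = minimax_rate(10**6, 1.5)          # smaller with more observations
```

For fixed $r > 1$ the rate decays more slowly than $1/\sqrt{n}$ as $n$ grows, which reflects the cost of highly active co-jumps.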

Lower Bound
In nonparametric statistics it is common to use a minimax approach in order to prove that estimators are optimal. In the previous section, Theorem 4.7 gave us an upper bound for the estimation of co-integrated volatility using a spectral approach and established the rates (5.1) on the class $S^{r,L}_M$. In this section, we want to prove that these rates are also a lower bound. The existence of a lower bound on the class $S^r_M$ establishes the exact minimax rates for the estimation of co-integrated volatility. Indeed, we have something more for the lower bound, namely that any estimator on the class $S^r_M$ of Itô semimartingales is subject to a lower bound with rates (5.1). So far, we do not know whether the spectral approach for the upper bound yields the same optimal rate on the larger class $S^r_M$. We refer to Chapter 2 in Tsybakov [2009] for the techniques to prove lower bounds. We establish the lower bound by a two-hypothesis test argument. Next, we introduce a distance between probability measures that will be useful for the lower bound.
To sum up, in order to prove a lower bound on the minimax probability of error for two hypotheses we use the theorem [2.2] in ?. The lower bound is obtained when the following two properties are satisfied. First, we choose the parameters for the co-integrated volatility to be close enough but distinguishable. Second, we bound the total variation distance between the probability laws associated with our two parameters.
Let us illustrate the above procedure with a simple lemma, showing that the above arguments are adequate to obtain the lower bound corresponding to our class $S^{r,L}_M$. The interested reader may refer to Lehmann and Romano [2006], who explore many examples of hypothesis testing and distances between Gaussian random variables.
Lemma 5.2. (No jumps for a two-dimensional Lévy process). Assume $X$ is a Lévy process in $S^{r,L}_M$ with Lévy-Khintchine triplet $(0, \Sigma\Sigma^\top, 0)$. This means that there are no jumps. Then any uniform rate $w_n$ within the class $S^{r,L}_M$ for estimating the co-integrated volatility $\sigma_1$, at time $t = 1$, satisfies [...], where $A, K$ are constants, $\hat\sigma_n$ is any estimator for the co-integrated volatility, $d$ is the Euclidean distance on $\mathbb{R}^2$ and $w_n = 1/n$. Proof. Let $X$ and $Y$ be Lévy processes in $S^{r,L}_M$. We also assume that no jumps occur, so the Lévy-Khintchine triplets of the two processes are $(0, \Sigma_X\Sigma_X^\top, 0)$ and $(0, \Sigma_Y\Sigma_Y^\top, 0)$ respectively. As a result, $X$ will evolve as follows: [...]. Here $X^{(1)}_t$ is normally distributed with mean 0 and its variance is given by the Itô isometry, which translates to [...]. Therefore, $X$ follows the parametric model [...]. We will prove the lower bound using the two-hypothesis test, as mentioned in the beginning of this section. We observe that [...]. This is enough to prove a lower bound for the rate $w_n = 1/n$ for the class of all Brownian motions.
Intuitively, we perturb the off-diagonal elements, namely the covariance, by the rate we want to achieve. Following the argumentation of the two-hypothesis test, it is sufficient to prove that the total variation distance is bounded. To do so, we use the Pinsker inequality, by which we have that [...], where $KL(P_Y, P_X)$ is the Kullback-Leibler divergence. Next, we show that the Kullback-Leibler divergence is bounded. We use the Kullback-Leibler divergence between two multivariate normal distributions. Here, we denote $\Sigma_1 = \Sigma_X\Sigma_X^\top$ and $\Sigma_2 = \Sigma_Y\Sigma_Y^\top$. Therefore, [...], where $|\cdot|$ denotes the determinant of a matrix. Calculating the appropriate quantities, we obtain $|\Sigma_1| = \left|\begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}\right| = 1$, [...]. Consequently, we obtain that $KL(P_Y, P_X)$ tends to zero. By the Pinsker inequality, the total variation distance tends to zero. Upon using the two-hypothesis bound, the minimax probability of error is bounded from below by 1/2 and the claim follows.
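The Gaussian part of this argument can be checked numerically. The sketch below uses the standard closed form of the Kullback-Leibler divergence between two centred multivariate normal laws together with the Pinsker inequality $TV \leq \sqrt{KL/2}$. The matrix $\begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}$ comes from the text, while the perturbation size `delta`, the number of observations and all function names are illustrative assumptions.

```python
import numpy as np

def gaussian_kl(cov_p, cov_q):
    """KL(N(0, cov_p) || N(0, cov_q)) for centred multivariate normal laws."""
    k = cov_p.shape[0]
    trace_term = np.trace(np.linalg.solve(cov_q, cov_p))
    logdet_p = np.linalg.slogdet(cov_p)[1]
    logdet_q = np.linalg.slogdet(cov_q)[1]
    return 0.5 * (trace_term - k + logdet_q - logdet_p)

def pinsker_tv_bound(kl):
    """Pinsker inequality: total variation distance <= sqrt(KL / 2)."""
    return float(np.sqrt(kl / 2.0))

def perturbed_covariance(delta):
    """Covariance matrix with the off-diagonal entries shifted by delta."""
    return np.array([[2.0, 1.0 + delta], [1.0 + delta, 1.0]])

C_X = perturbed_covariance(0.0)

kl_same = gaussian_kl(C_X, C_X)                          # zero up to rounding
kl_small = gaussian_kl(perturbed_covariance(1e-3), C_X)  # about 1.5 * delta^2

# n i.i.d. increments: KL is additive and invariant under the common 1/n scaling.
n = 10_000
total_kl = n * gaussian_kl(perturbed_covariance(1.0 / n), C_X)
tv_bound = pinsker_tv_bound(total_kl)
```

For a small perturbation the divergence is quadratic in `delta`, so a perturbation of order $1/n$ over $n$ increments gives a total divergence of order $1/n$, and the Pinsker bound on the total variation distance tends to zero.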
Next we state the theorem for the bigger class $X \in S^{r,L}_M$ and a fortiori in the class $S^r_M$ of Itô semimartingales. Theorem 5.3. Let $X$ be a Lévy process in $S^{r,L}_M$, $r \in [0, 2)$ and $M > 0$. Then any uniform rate $w_n$ for estimating $\sigma_1$, namely the co-integrated volatility at time $t = 1$ within the class $S^{r,L}_M$, satisfies the optimal rates $w_n$ [...], where $A, K$ are constants, $\hat\sigma_n$ is any estimator for the co-integrated volatility and $d$ is the Euclidean distance on $\mathbb{R}^2$.
To prove this theorem we need to construct the two-hypothesis test in order to bound from below the minimax probability of error.

Two-hypothesis test
We let $X, Y$ be two-dimensional Lévy processes with respective triplets $(0, \Sigma_X\Sigma_X^\top, F_n)$, [...], where $x = (x_1, x_2)$ is a vector in $\mathbb{R}^2$ representing the size of small jumps for each process and $M$ is a constant (below $M$ changes from line to line and may depend on $r$, but all constants are denoted by $M$). We set $\Sigma_X\Sigma_X^\top = \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}$ and $\Sigma_Y\Sigma_Y^\top = \begin{pmatrix} 2 & 1 + 2w_n \\ 1 + 2w_n & 1 \end{pmatrix}$ to be the parameters of our two-hypothesis test. Under this setting, we perturb the off-diagonal elements by the rate for which we want to establish the lower bound. The quantity which we want to recover is the co-integrated volatility, so we need the off-diagonal elements. We use these forms of matrices in order for the Gaussian part to be non-degenerate, namely for the eigenvalues of the matrices to be positive. As we discussed in the beginning of this section, it is sufficient to construct two sequences $X_n, Y_n$ of Lévy processes belonging to the class $S^{r,L}_M$, with the following two properties: Property 1. The two parameters, namely the two covariance matrices, are close enough but distinguishable.
Note that for this property the object of our study is the distance between matrices. In this case we take the Frobenius norm as the distance, and everything still holds. By construction, the Frobenius norm of the difference satisfies [...], which means that the parameters are close enough but distinguishable.
Property 2. The total variation distance between P X and P Y tends towards zero.
As far as the second property is concerned, showing that the total variation distance tends towards zero is not trivial. In fact, achieving the second property is quite demanding and we prove several lemmas to establish it.

Construction of the co-jump measure in R 2
First, we have to construct a measure to satisfy property (5.2). Before we proceed with the technical part of this construction, let us highlight the idea behind it.
Note that we are studying a two-dimensional Lévy process, so it is reasonable to include the possibility of dependence between the two jump components, more specifically the common jumps, i.e. the co-jumps.
Observe here that co-jumps are one-dimensional objects. Co-jumps are the jumps on the diagonal, due to the fact that the two processes jump at the same time with the same jump size. Mathematically speaking, this can be formalized as follows: Definition 5.4. (Co-jump measure) Let $X = (X^{(1)}, X^{(2)})$ be a Lévy process, with $\Delta X^{(j)}_t \neq 0$ for $j = 1, 2$. Here, $\Delta X_t$ denotes the possible jump at time $t$. The measure on $\mathbb{R}^2$ is defined by [...], where $B = \{(x_1, x_2) \in \mathbb{R}^2 : x_1 = x_2\}$. This is called the Lévy measure in $\mathbb{R}^2$ of co-jumps for $X$. $F_n(B)$ is the expected number of joint jumps, per unit of time, whose size falls into $B$, and $\mu$ is the Poisson random measure of co-jumps, where $\mu^X(\omega; t, B) = \sum_{s \leq t} 1_B(\Delta X^{(1)}_s, \Delta X^{(2)}_s)$. Because the jump dynamics of the co-jump measure are dictated by its density, say $f_n$, we can write the measure of the co-jumps as follows: [...]. The support of the co-jump measure is one-dimensional, but the co-jump measure lives on $\mathbb{R}^2$. We focus on the case of co-jumps, i.e., when $X^{(1)}$ and $X^{(2)}$ jump at the same time with the same jump size. We are interested in the jumps on the diagonal.
Further, we do not integrate with respect to the Lebesgue measure, since it is equal to zero on the diagonal. In this case we integrate with respect to a measure that is not absolutely continuous with respect to the Lebesgue measure, which we call the co-jump measure. By (5.3) we want to show that [...] without being equal to zero. Being interested in the set of co-jumps, we pass from two dimensions to one dimension. Co-jumps represent total dependence between the small jump components. Indeed, we use the argument of dependence in order to reduce dimensionality. In order to prove this argument, we need the following lemma.
Lemma 5.5. Let $g : \mathbb{R}^2 \to \mathbb{R}$ be a measurable function and $F_n$ be the co-jump measure on $\mathbb{R}^2$. Then [...], where $f_n(x)\,dx$ is the measure of co-jumps, $f_n$ the density function of the co-jump measure $F_n$, $A \subset B$, and $B = \{(x_1, x_2) \in \mathbb{R}^2 : x_1 = x_2\}$.
Proof. First we use step functions to prove the lemma. This extends by linearity and by taking limits to all measurable functions $g$. Indeed, we only need to show the lemma for the case of step functions. Let $g(x_1, x_2)$ be a step function; then we obtain (5.5) and the claim follows.
Furthermore, we need to find a measure whose mass is bounded away from the origin but may explode around 0 and which integrates $x^2$. In order to construct the co-jump measure with the above properties, we need to find an appropriate density function for the co-jump measure $F_n(A)$ so as to satisfy the following condition for $r \in (1, 2)$ and $x = (x, x)$: [...]. Indeed, the following proposition implies condition (5.2) by a proper choice of the density function of the co-jumps.
Proposition 5.6. Let $w_n$ be defined by (5.1) and $r \in (0, 2)$. Assume the even functions $h_n$: [...] and $U_n = 2 w_n^{1/(r-2)}$. Then, for any $A \in \mathcal{B}(\mathbb{R}^2)$, [...], where $H_n$ is the Fourier transform of $h_n$ and $F_n(A) =$ [...]. Proof. The mathematical tool used for the formation of the density function is the Fourier transform. Intuitively, we use a function $h_n$ that is constant inside a fixed interval and decays exponentially outside this interval. Also, notice that in the exponential we used the power of 3 because we need to differentiate two times, as we shall see later.
Notice that $h_n$ is defined on $\mathbb{R}$, which is why we use the Fourier transform on $\mathbb{R}$. The Fourier transform pair takes the following form: [...], and the respective first derivatives will have the form [...]. For a thorough analysis of the Fourier transform the interested reader should refer to Bracewell [1986].
In the next step, the Fourier transform pair will provide us with a proper and well-defined density function for the co-jump measure. First, we note that the $L^2$-norm of $h_n$ is bounded. Indeed, we obtain (5.7). In the last inequality of (5.7) we used the fact that $\int_0^\infty e^{-2K^3} dK$ is bounded by a constant $C$. In addition, $h_n$ is an $L^2$-function. Applying the Plancherel theorem we deduce that $\|H_n\|_{L^2} = \|h_n\|_{L^2} \leq C w_n^{\frac{r-1}{r-2}}$. (5.8) Similarly, we get a bound for the first derivative of $H_n$: $\|\partial_1 H_n\|_{L^2} \leq \|\partial_1 h_n\|_{L^2} \leq C w_n$. (5.9) Moreover, $\|H_n\|_{L^1}$ is also bounded, see (5.10). We get the first inequality in (5.10) because of the Cauchy-Schwarz inequality. By means of (5.8) and (5.9) the $L^1$-norm of $H_n$ is bounded. At this point we are ready to define the co-jump measures $F_n(A)$ and $G_n(A)$ in terms of the Fourier transform $H_n(x)$.
These measures satisfy the basic properties of Lévy measures. They are nonnegative, integrate $x^2$, and may explode around zero since $H_n(0) \to \infty$.
$|H_n(x)| \leq \int |e^{iUx} h_n(U)| \, dU \leq \int |h_n(U) \cos(Ux)| \, dU + i \int |h_n(U) \sin(Ux)| \, dU = \int |h_n(U) \cos(Ux)| \, dU$. (5.13) In the first line the second term is equal to zero since it is the integral of the product of an even and an odd function. This leads to (5.14). Note that in the second inequality of (5.14) the integral is always bounded from above by a constant $C$.

Characteristic functions of X 1/n and Y 1/n
At this point, we study the processes X, Y for one observation at the moment t = 1 n . We denote by ψ n (u), φ n (u) the characteristic functions of X 1/n , Y 1/n respectively, and by η n (u) their difference. The characteristic triplet for each process is where Σ X Σ X = ( 2 1 1 1 ) and Σ Y Σ Y = 2 1+2wn 1+2wn 1 . Denote by C X = Σ X Σ X and C Y = Σ Y Σ Y . By (5.11) and the fact that H n is an even function, its Fourier transform will be a real function. The characteristic functions will be defined as follows and ψ n (u) = exp − 1 2n C X u, u + 2φ n (u) + 2ψ n (u) . (5.17) We denote byφ since the Fourier transform of the co-jump measure has support on R and A is a subset of the diagonal. Moreover, C X u, u = 5U 2 and C Y u, u = 5U 2 + w n U 2 .
Next we bound (5.18) and (5.19) from above. First, observe that since $H_n$ is an even function. We consider the following two cases: (5.20) Now, concerning $\tilde{\varphi}_n(U)$, we exploit the same arguments as before and by (5.10) we obtain (5.22)

Total variation distance
In order to establish a lower bound for our class with the rates (5.1), the last ingredient to be shown is that the total variation distance between $P_X$ and $P_Y$ tends to zero, that is, Property 2. Mathematically speaking, this formulates as As we discussed in the first step, $X$ and $Y$ have a nonvanishing Gaussian part, so that the variables $X_{1/n}$ and $Y_{1/n}$ have densities, denoted by $f_{1/n}$ and $g_{1/n}$ respectively. Also, $k_n = f_{1/n} - g_{1/n}$ denotes the difference between the densities. One would be tempted to claim the following (5.24) In the second equality we wrote the density function as the inverse Fourier transform of its characteristic function. But the last integral is infinite, so this approach does not work for our purpose. Since we want to prove that $2n \int |\varphi_n(U) - \psi_n(U)|\,dU \to 0$, (5.25) we know that the total variation distance between $P_X$ and $P_Y$ is at most $2n$ times $\int |k_n(x)|\,dx$. By using the same argument as in Jacod and Reiß [2014], Theorem 3.1, together with the Cauchy-Schwarz inequality and the Plancherel theorem, we obtain $\int |k_n(x)|\,dx = 1\,\ldots$ where $\partial_1 \eta_n$ is the first derivative of $\eta_n(U)$. In the second inequality we used the Cauchy-Schwarz inequality, and in the last one we used the Plancherel identity. For simplicity, recall that we use the same coordinates for the vector $u = (U, U)$. Thus, the only ingredient which remains to be shown in order to satisfy Property 2 is the following lemma.
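The Cauchy-Schwarz and Plancherel steps referred to above follow a standard pattern, which can be sketched in one dimension as follows. This is a sketch under assumptions rather than a reconstruction of the original display: we assume the convention $\eta_n(U) = \int e^{iUx} k_n(x)\,dx$, and the numerical constant depends on that convention.
\[
\int |k_n(x)|\,dx
= \int (1+x^2)^{-1/2}\,(1+x^2)^{1/2}\,|k_n(x)|\,dx
\leq \left(\int \frac{dx}{1+x^2}\right)^{1/2} \left(\int (1+x^2)\,|k_n(x)|^2\,dx\right)^{1/2}
= \frac{1}{\sqrt{2}} \left(\|\eta_n\|_{L^2}^2 + \|\partial_1 \eta_n\|_{L^2}^2\right)^{1/2}.
\]
The inequality is Cauchy-Schwarz, with $\int (1+x^2)^{-1}\,dx = \pi$. The last equality is the Plancherel identity, which under the assumed convention reads $\|\eta_n\|_{L^2}^2 = 2\pi \|k_n\|_{L^2}^2$ and $\|\partial_1 \eta_n\|_{L^2}^2 = 2\pi \|x\,k_n\|_{L^2}^2$. In this form it suffices to control the $L^2$-norms of $\eta_n$ and of its first derivative.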
Proof. First, we study the convergence of $\eta_n(U)$: (5.27) The last inequality holds due to the fact that $1 - e^{-x} \leq x$.
Observe that when $|U| \leq U_n$, $\eta_n(U) = 0$ because $h_n$ is constant inside this interval. Thus the difference of the characteristic functions vanishes for $|U| \leq U_n$, since $2\tilde{\psi}_n(U) = w_n U^2$ by (5.19).
Therefore by (5.20), (5.21), and the fact that $\psi_n(U) \leq e^{-\frac{U^2}{2n}}$, $\varphi_n(U) \leq e^{-\frac{U^2}{2n}}$ we get that (5.31) Therefore, (5.32) Now, $\frac{1 + w_n^{\frac{r-1}{r-2}}}{n^2}$ and $\frac{1 + w_n^{\frac{r-1}{r-2}}}{n}$ tend to zero. Additionally, the integrals on the right-hand side can again be bounded using the Cauchy-Schwarz inequality, like integral A. After this, the integrals can be computed exactly by basic calculus. As a result, which also goes to zero as $n \to \infty$ and the proof is completed.
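The two limits used in the last step can be checked directly. The computation below assumes that $w_n = (n \log n)^{\frac{r-2}{2}}$ with $r \in (1, 2)$ and that the exponent of $w_n$ is read as $\frac{r-1}{r-2}$; both are our reading of the text, not a restatement of the displays (5.31) and (5.32).
\[
w_n^{\frac{r-1}{r-2}} = (n \log n)^{\frac{r-2}{2} \cdot \frac{r-1}{r-2}} = (n \log n)^{\frac{r-1}{2}},
\qquad
\frac{1 + w_n^{\frac{r-1}{r-2}}}{n} = \frac{1}{n} + n^{\frac{r-3}{2}} (\log n)^{\frac{r-1}{2}} \longrightarrow 0 .
\]
Since $r < 2$, the exponent $\frac{r-3}{2}$ is smaller than $-\frac{1}{2}$, so the power of $n$ dominates the logarithmic factor. The same computation with $n^2$ in the denominator gives an even faster decay.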

End of the proof of Theorem 5.3
Lower bound for the rate $w_n = 1/\sqrt{n}$ when $r \in (0, 2)$. To prove this bound, it is enough to show that it holds on the subclass of all Brownian motions, since $S^{r,L}_M \supset S^{BM}_M$. Taken together with Lemma 5.2, this bound is achieved.
Lower bound for the rate $w_n = 1/(n \log n)^{\frac{2-r}{2}}$ when $r \in (1, 2)$. Our goal is to prove by a two-hypothesis test that the rate $w_n$ is a lower bound for the class of estimators on $S^{r,L}_M$, hence a fortiori on $S^{r}_M$. Indeed, the main steps of this proof are to show that Property 1 and Property 2 are satisfied. Now, with reference to Lemma 5.5 for the construction of the co-jump measure, Proposition 4.20 and Lemma 5.7, we conclude the proof of Theorem 5.3.
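To get a feeling for the size of the two rates, the following minimal sketch evaluates the piecewise rate stated in the abstract: $1/\sqrt{n}$ for $r \leq 1$ and $(n \log n)^{(r-2)/2}$ for $r > 1$. The function name `minimax_rate` and the input checks are ours and purely illustrative.

```python
import math


def minimax_rate(n: int, r: float) -> float:
    """Piecewise rate from the abstract (hypothetical helper).

    Returns 1/sqrt(n) for r <= 1 and (n log n)^((r-2)/2) for r > 1.
    """
    if n < 2:
        raise ValueError("n must be at least 2 so that log(n) > 0")
    if not 0.0 < r < 2.0:
        raise ValueError("r is expected in the open interval (0, 2)")
    if r <= 1.0:
        return 1.0 / math.sqrt(n)
    return (n * math.log(n)) ** ((r - 2.0) / 2.0)


# Example: the rate deteriorates as the activity index r approaches 2.
rates = {r: minimax_rate(10_000, r) for r in (0.5, 1.2, 1.5, 1.8)}
```

For fixed $n$ the second branch increases with $r$, which reflects the fact that more active jumps make the covariance harder to separate from the jump part.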

Discussion
In this section we make some important remarks concerning the upper bound and the rates of convergence. First, we want to compare the efficiency of our estimator with the work of  in which she considered at least one jump component of a two-dimensional Itô semimartingale with infinite variation.  introduced the truncated realized covariance (TRC) as an estimator for co-integrated volatility. The proposed estimator is where $r_h = h^{2u}$ is the truncation level with $h = 1/n$, $u \in (0, 2)$ and $n \to \infty$. It is clear that, when $r_h \to 0$ then no jumps occurred. It is assumed that the

6.1. Numerical experiments.
In this section we test our estimates with Monte-Carlo experiments. This means that we first have to simulate the sample paths of a bivariate Lévy process on $[0, 1]$.
Section 6 of Tankov [2003] suggested various simulation algorithms for Lévy processes. We extend here Algorithms 6.6, 6.5, 6.3 to a bivariate setting. In addition, we use the generalized shot noise method for series representation of a two-dimensional Lévy process of infinite variation introduced by Rosinski [1990].
We now perform Monte-Carlo tests of our spectral estimate $C^{12}_N(U_N)$, comparing it to the truncated realized covariance (TRC) estimate $IC_T$ of  for a two-dimensional Itô semimartingale. To provide a balanced comparison, we draw our observations from a process $X_t = B_t + J_t$, where $B_t$ is a two-dimensional Brownian motion and $J_t$ is a two-dimensional jump process whose jumps are driven by a two-dimensional $r$-stable process. $X_t$ thus models a process with both diffusion and jump components. In each run of our simulation, we generate $N = 1000$ observations, corresponding to observations taken every $1/1000$ over the time interval $[0, 1]$.
The estimates $C^{12}_N(U_N)$ and $IC_T$ depend on a number of parameters. We begin by considering the covariance matrix $C = \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}$ for two correlated Brownian motions. In our simulations, the co-integrated volatility of $X_t$ is equal to 1, and so we may choose the parameters accordingly. In our tests, we found that the value $M = 4.229$ worked well for bounding from above the jump activity in the case of infinite variation jumps. In the case of $IC_T$ we chose $h = 1/1000$, $u = 0.387$, and as truncation level $r_h = (1/1000)^{2 \cdot 0.387}$. We found that this truncation level cuts the jumps bigger than $(1/1000)^{2 \cdot 0.387}$, which means that almost all jumps were eliminated.
Figure 1 plots the simulated distributions of the estimates $C^{12}_N(U_N)$ and $IC_T$ together with the density of a standard Gaussian distribution, shown as a solid line. We can see that for every choice of $r_1$, $r_2$, the estimates are centered around 1, which is the expected theoretical co-integrated volatility. Figure 2 plots the RMSEs of the estimates $C^{12}_N(U_N)$ and $IC_T$ against different choices for the index activity of the co-jumps. We study the performance of the estimates under finite, moderate, and infinite activity of co-jumps. We can see that, as $N$ grows, the RMSE of the spectral estimate $C^{12}_N(U_N)$ becomes smaller than that of the truncated estimate.
However, we observe that the RMSEs of the truncated estimate $IC_T$ are smaller than those of the spectral estimate when $N = 1000$. We observe this behavior in Figure 2 for the truncated estimate $IC_T$ because of our choice of truncation level, which is not optimal. While the threshold $r_h = (1/1000)^{2 \cdot 0.387}$ works well for $N = 1000$, it does not work well when the number of observations is larger, for example when $N = 10000$.
Figures 3 and 4 give violin plots for the spectral estimate $C^{12}_N(U_N)$ under a number of choices for the number of observations $N$ and the index activity of the co-jumps, whilst Figures 5 and 6 show violin plots for the truncated estimate $IC_T$ under the same settings. The number of observations varies from 1000 to 10000 in steps of 1000. In Figure 3, we used as index activity for the jumps $r_1 = 0.5$, $r_2 = 0.8$, while in Figure 4 we set $r_1 = 1.2$ and $r_2 = 1.8$. In the case of $r_1 = 1.2$ and $r_2 = 1.8$, we see that the estimate of the covariance deviates slightly from the center as $N$ grows. Furthermore, the effect can be expected to disappear as $N$ tends to infinity. Figures 5 and 6 show again that the truncated estimate $IC_T$ deviates strongly from the center as $N$ grows, an expected effect due to the choice of truncation level. The chosen threshold works well for $N = 1000$ but not when $N$ grows. We expect this effect to disappear once the optimal choice for the threshold $r_h$ is established.
Finally, $U_N$ is the parameter which controls the frequency for our spectral estimate $C^{12}_N(U_N)$. In view of the form (4.11) for $U_N$, we can find a multiplicative constant that gives the optimal choice of $U_N$. The results will still hold.
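The diffusion part of the experiment and a truncated realized covariance can be sketched as follows. This is an illustration under stated assumptions, not the code used for the figures: the function names are ours, the truncation is assumed to act on the squared increments of each component (one common convention, since the defining display is not reproduced in the text), and a single artificial co-jump replaces the $r$-stable jump part.

```python
import numpy as np


def brownian_increments(n, C, rng):
    """Increments over steps of length 1/n of a Brownian motion with covariance C."""
    L = np.linalg.cholesky(C)            # C = L L^T
    Z = rng.standard_normal((n, 2))      # independent standard normal rows
    return (Z @ L.T) / np.sqrt(n)        # each row has covariance C / n


def realized_covariance(dX):
    """Sum of products of the increments of the two components."""
    return float(np.sum(dX[:, 0] * dX[:, 1]))


def truncated_realized_covariance(dX, r_h):
    """Realized covariance restricted to steps where both squared increments
    are at most r_h (assumed truncation rule, see the lead-in)."""
    keep = (dX[:, 0] ** 2 <= r_h) & (dX[:, 1] ** 2 <= r_h)
    return float(np.sum(dX[keep, 0] * dX[keep, 1]))


rng = np.random.default_rng(0)
n = 1000
C = np.array([[2.0, 1.0],
              [1.0, 1.0]])               # covariance matrix from the text
dB = brownian_increments(n, C, rng)

# One artificial co-jump of size (1, 1); illustration only, not a stable process.
dX = dB.copy()
dX[n // 2] += np.array([1.0, 1.0])

h, u = 1.0 / n, 0.387                    # parameters quoted in the text
r_h = h ** (2 * u)                       # truncation level r_h = h^(2u)

rc_plain = realized_covariance(dX)       # contaminated by the co-jump
rc_trunc = truncated_realized_covariance(dX, r_h)
```

In this sketch the untruncated sum absorbs the product of the two jump sizes, whereas the truncated sum discards that step. How well the truncated sum recovers the Brownian covariance depends on the threshold relative to the size of the Brownian increments, which is the sensitivity discussed above.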