Bayesian Inference on Volatility in the Presence of Infinite Jump Activity and Microstructure Noise

Volatility estimation based on high-frequency data is key to accurately measuring and controlling the risk of financial assets. A Lévy process with infinite jump activity and microstructure noise is considered one of the simplest, yet accurate enough, models for financial data at high frequency. Utilizing this model, we propose a "purposely misspecified" posterior of the volatility obtained by ignoring the jump component of the process. The misspecified posterior is further corrected by a simple estimate of the location shift and a rescaling of the log likelihood. Our main result establishes a Bernstein-von Mises (BvM) theorem, which states that the proposed adjusted posterior is asymptotically Gaussian, centered at a consistent estimator, and with variance equal to the inverse of the Fisher information. In the absence of microstructure noise, our approach can be extended to inference on the integrated variance of a general Itô semimartingale. Simulations are provided to demonstrate the accuracy of the resulting credible intervals and the frequentist properties of the approximate Bayesian inference based on the adjusted posterior.


Introduction
In the past decade, jumps have played an increasingly important role in asset price modeling. The necessity of jumps is supported by both empirical and realistic considerations such as (i) sudden and relatively large changes observed in real stock prices; (ii) the implied volatility smile phenomenon, which is more pronounced for short maturity options; and (iii) the proper management of risk [43,9]. When jumps were first incorporated in the literature (e.g., Merton's model), attention centered on finite-jump activity models (i.e., those exhibiting finitely many jumps in finite time intervals). However, infinite-activity models are now considered more realistic, as suggested by many studies based on real asset returns [4,36,42,47,46]. Here we consider a one-dimensional Lévy process $X = \{X_t\}_{t\ge 0}$ defined on some probability space $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t\ge 0}, P)$ over a fixed time horizon $t \in [0, T]$, which is a fundamental and widely used tool to model jump processes with infinite activity. Concretely,
\[ X_t = \mu t + \theta^{1/2} W_t + J_t, \tag{1} \]
where $\mu \in \mathbb{R}$ and $\theta \in [0, \infty)$ are the drift and the variance parameters, respectively, $W = \{W_t\}_{t\ge 0}$ is a Wiener process, and $J = \{J_t\}_{t\ge 0}$ is an independent pure-jump Lévy process. In financial applications, $X_t$ typically represents the log-return or log-price process $\log(S_t/S_0)$ of an asset with price process $\{S_t\}_{t\ge 0}$. In that case, the parameter $\sigma = \theta^{1/2}$ is called the volatility of the process and contributes to the total "variability" of the process $X$. The constant-volatility assumption can be relaxed to a general Itô semimartingale (see § 7). Further details about the model and its components are given in § 2.
With improvements in computational power and the advent of electronic-based financial markets, intraday high-frequency data (sampled every minute, second, or even nanosecond) have become widely available. While exploiting the convenience of massive data, analyses must also deal with market microstructure frictions (e.g., serial autocorrelation, price discreteness, and temporary demand-supply imbalance) caused by the nature of trading at high frequency. In an attempt to explain the nature of tick-by-tick data, [51] and [48] suggested the concept of microstructure noise, in which the observed transaction log-price $Y_t$ at time $t$ is a noisy measure of an underlying "efficient" log-price $X_t$:
\[ Y_t = X_t + \varepsilon_t. \tag{2} \]
Our purpose is to estimate the variance parameter $\theta$ based on high-frequency observations $Y_{t_0}, Y_{t_1}, \dots, Y_{t_n}$ ($0 = t_0 < \cdots < t_n = T$) of the process over a fixed period of time $[0, T]$. From the perspective of frequentist point estimation, when there is no microstructure noise, [37] proposed a consistent estimator by eliminating those increments of the process, $\Delta_i Y := Y_{t_i} - Y_{t_{i-1}}$, which are larger in absolute value than a suitably chosen threshold. The asymptotic efficiency of this estimator, under the restriction that the jump process $J$ has bounded variation, was later proved in [10]. For a jump component of unbounded variation, there exists a rate-efficient estimator introduced by [27] based on the empirical characteristic function. [33] also introduced a closely related estimator and established a central limit theorem for it. When the microstructure noise is taken into account but jumps are not present, several estimators have been proposed. The two-scale estimator in [48] considered two different estimation scales of the process to estimate and eliminate the effect of the noise. It was generalized by [49] to achieve the optimal convergence rate $n^{-1/4}$. The preaveraging approach in [25] replaced the increments $\Delta_i Y$ with a weighted summation over a small window.
The realized kernel (RK) method in [2] utilized weighted realized autocovariances. A quasi-maximum likelihood estimator (QMLE) approach was proposed by [1,18,45]. The stochastic volatility is first misspecified as locally constant, independent of the previous state, or even constant, which allows construction of a quasi-likelihood and derivation of the maximum likelihood estimator (MLE). [11] showed that the estimator is robust to a microstructure noise following an MA(∞) process, and proposed a tuning-free procedure to select the order of the noise using the Akaike information criterion (AIC). Then, [8] achieved efficient asymptotic variance reduction for non-constant volatility by aggregating a local version of an RK estimator or a QMLE. [19] and [34] proposed bootstrap methods to approximate the distribution of realized volatility, achieving a second-order refinement over the limiting normal approximation. When both noise and jumps are present, [40,41] introduced the modulated bipower variation estimator, which uses the bipower variation of the weighted average of the increments. The estimator is consistent, but cannot achieve the efficient convergence rate $n^{-1/4}$, which represents the best rate that can be achieved for this estimation problem in the presence of noise and jumps. [7] proposed two quantile-based realized volatility estimators by employing empirical quantiles of the averaged returns. The estimators are both consistent and asymptotically efficient, but are only applicable to processes with finite jump activity. More recently, [30] combined the preaveraging method of [25] and the thresholding ideas of [37] to construct a consistent estimator of the integrated variance that is robust to both noise and infinite jump activity. The estimator proposed by [5] is the same as that of [30] under our settings.
[5] considers the estimation of a functional of volatility under a general setting where the interaction of the noise and the underlying process is not merely additive. Consistency and efficiency are also proved. The details of this estimator are explained in § 6.
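The thresholding idea attributed to [37] above can be illustrated with a minimal sketch in the noise-free case. The simulation uses a compound Poisson jump process with U(−1, 1) jump sizes as a simple stand-in for $J$. The threshold rule (a multiple `c` of a robust scale estimate) and all function names are illustrative assumptions, not the construction used in [37] or in this paper.

```python
import numpy as np

def simulate_increments(n, T, mu, theta, lam, rng):
    """Increments of X_t = mu*t + sqrt(theta)*W_t + J_t on an equally spaced grid.

    J is a compound Poisson process with rate lam and U(-1, 1) jump sizes
    (an illustrative finite-activity stand-in for a general Levy jump part).
    """
    dt = T / n
    continuous = mu * dt + np.sqrt(theta * dt) * rng.standard_normal(n)
    counts = rng.poisson(lam * dt, size=n)          # number of jumps per interval
    jumps = np.array([rng.uniform(-1.0, 1.0, k).sum() for k in counts])
    return continuous + jumps

def realized_variance(dx, T):
    """Scaled realized quadratic variation: sum of squared increments over T."""
    return np.sum(dx ** 2) / T

def truncated_realized_variance(dx, T, c=4.0):
    """Discard increments larger in absolute value than a threshold.

    Assumption: the threshold is c times a robust scale estimate
    (median absolute increment / 0.6745); this rule is hypothetical.
    """
    scale = np.median(np.abs(dx)) / 0.6745
    keep = np.abs(dx) <= c * scale
    return np.sum(dx[keep] ** 2) / T

rng = np.random.default_rng(0)
dx = simulate_increments(n=5000, T=1.0, mu=1.0, theta=10.0, lam=5.0, rng=rng)
print("realized variance:          ", realized_variance(dx, 1.0))
print("truncated realized variance:", truncated_realized_variance(dx, 1.0))
```

The plain realized variance also picks up the squared jumps, whereas the truncated version should stay close to the variance parameter used in the simulation.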
Whereas there are numerous frequentist estimators available, the development of an explicit and efficient Bayesian approach which can accommodate high-frequency data remains a largely open problem. Genuine Bayesian inference for a parameter of interest can only be based on the corresponding marginal posterior distribution, which is a conditional distribution on the parameter space of the interest parameter, and the conditioning is on the observed data. A fully Bayesian approach requires that the joint posterior of all parameters must be constructed based on the full likelihood function and a joint prior distribution over all the parameters. Then, integrating the joint posterior over the nuisance parameters (in our case, the parameters related to the jump component J and microstructure noise ε) yields a marginal posterior distribution for the parameter of interest. Since analytic derivation of the joint or marginal posterior is often intractable, Markov Chain Monte Carlo (MCMC) methods are typically used to sample from the joint posterior, and then numerical integration over the nuisance parameters is achieved by simply ignoring the corresponding MCMC output for those parameters. MCMC-based Bayesian methods have been applied to the volatility estimation problem by several previous authors. [6] and [13] used MCMC for a diffusion process augmented by a Poisson jump process. More recently, additional model complexity has been accommodated by taking infinite activity into consideration. [47] proposed an MCMC estimation method using both spot and option prices. Their jumps are assumed to follow either a variance gamma process or an α-stable process. [28] developed an automated sequential Monte Carlo algorithm by adding an additional re-sampling step for variance gamma jumps. [22] applied a slice sampling approach with a similar variance gamma assumption. 
[20] incorporated realized variation and realized power variation into an MCMC procedure, and analyzed a generalized variance gamma process. [46] considered both returns and the Chicago Board of Options Exchange (CBOE) Volatility Index (VIX) to obtain the posterior for the jump part. The variance gamma process and normal inverse gamma process were considered.
Although the papers mentioned above considered Bayesian inference derived from the joint posterior, they all require strong assumptions about the structure of the jumps, which severely limit the practical value of these methods. Without these simplifying assumptions, it is quite challenging to write down the full likelihood function under the semi-parametric setting (1), which means that it is also difficult to obtain the full joint posterior without such assumptions. One of these assumptions is the choice of a particular specification of the jump process J, among many possible jump processes. However, empirical results in [36], [47], and [35] suggested that different jump assumptions lead to different estimation results for volatility. The posterior depends heavily on the structure of the jumps. Thus, sticking to just one jump type increases the possibility of misspecification and, therefore, can lead to inaccurate estimation and inference.
Moreover, specifying and calculating the distribution of the jump component may incur heavy computational costs, especially when working with high-frequency data. For this reason, nearly all of the aforementioned studies consider only daily returns data. Some authors, such as [28] and [20], did apply their methods to hourly data and 5-minute data, respectively. However, both studies held one of the parameters of the jump process fixed in order to reduce the computational load.
The difficulties of deriving the posterior and the associated heavy computational costs are primarily caused by the jumps, which are only related to the nuisance parameters in the present context of inference for volatility. Our target of estimation, the variance or volatility, is not affected by the jumps, and is modeled by a simple Gaussian process, for which Bayesian inference can be more easily obtained. Based on this observation, one plausible idea to tackle the problem is to ignore the nuisance parameters in the nonparametric part of the process, replace the nuisance parameters in the parametric part by their consistent estimators, and construct a posterior only for the parameter of interest. The advantages of such an approach are that one need not specify a prior on the jump process, and it is not necessary to obtain samples from the full joint posterior. By contrast, we will directly obtain an approximation to the marginal posterior for the volatility, which we will show can be used for accurate Bayesian inference. This approach was recently used by [38]. They derived a 'purposely misspecified' posterior for a jump-diffusion model with constant volatility, finite jump activity and without microstructure noise, which targets the parameter of interest, the volatility, directly. Using a misspecified model on purpose, the inherent difficulty of specifying the likelihood function in a nonparametric model is tackled by omitting the complicated nuisance component of the model. The bias and the inaccurate variance caused by the misspecification are later corrected by applying a location shift and rescaling the likelihood using a Gibbs posterior. They showed that the adjusted posterior possesses good asymptotic properties, as guaranteed by a Bernstein-von Mises theorem.
In this paper, we study a 'purposely misspecified' posterior for the variance $\theta$ of the model (2), either with or without microstructure noise, which is a considerably more difficult and realistic setting in comparison to the finite jump activity model without microstructure noise that was studied by [38]. Our main result is a Bernstein-von Mises theorem for the adjusted posterior for the volatility parameter, which shows that the proposed posterior is asymptotically normal and centered at a consistent estimator, with variance shrinking at rates $n^{-1/2}$ and $n^{-1}$, respectively, depending on whether microstructure noise is incorporated in the model or not.
The novel contributions of this paper can be summarized as follows. First, we allow the jump process to be any Lévy process, i.e., there is no parametric assumption about the nuisance component, and no assumption of finite jump activity. We also allow for an additive microstructure noise in the data. These relaxations of the stronger assumptions made in the existing literature help to alleviate inaccuracies introduced by model misinterpretation, and also avoid expensive computational costs. In fact, we also show that in the situations when the microstructure noise can be ignored (e.g., when working with medium-range frequencies), our approach can be extended to the estimation of the integrated variance of a general Itô semimartingale X. In particular, we allow stochastic volatility and a general pure-jump semimartingale component J.
It is important to remark that our proposed inference procedure is among the first Bayesian approaches that can accommodate truly high-frequency data; due to high computational costs and the lack of theoretical performance guarantees, most of the existing literature involves methods which are only applicable to low frequency data, such as daily observations. Finally, our results suggest that, under certain circumstances, misspecification on purpose can serve as a vehicle for accurate approximate Bayesian inference about low-dimensional interest parameters in complex models with possibly infinite-dimensional nuisance parameters.
The paper is organized as follows. A detailed description of the setting and model is provided in § 2. Differences between finite and infinite activity when deriving the 'purposely misspecified' posterior are highlighted in § 3. This analysis reveals the importance of proposing a modified version of the Bernstein-von Mises theorem, which is stated in § 4. The misspecified model is presented in § 5, and further extended in § 7. The main results are stated in § 5.2 and § 6. Simulation results given in § 8 illustrate the performance of our procedures. Discussion and concluding remarks are in § 9. The proofs and further technical details appear in the Appendix.

Model setup
As mentioned in the introduction, we consider a one-dimensional continuous-time process of the form (1), $X = \{X_t;\ t \in [0, T]\}$, defined on some probability space $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t\ge 0}, P)$. It consists of constant drift and diffusion coefficients $\mu \in \mathbb{R}$ and $\theta \in \mathbb{R}_+$, respectively, as well as a pure jump part $J = \{J_t\}_{t\ge 0}$. The parameter space for $\theta$, denoted as $\Theta$, is assumed to be a bounded and open subset of $(0, +\infty)$ such that $0 \notin \overline{\Theta}$. The jump component $J$ is assumed to be a pure-jump Lévy process, a class of processes used in many fields of science. In mathematical finance, a Lévy process is widely recognized to provide a better fit to intraday returns than plain Brownian motion or even some stochastic volatility models. A comprehensive overview of the applications of Lévy processes can be found in [3] and [9]. A Lévy process is defined as a càdlàg, real-valued stochastic process which has independent and stationary increments, and is stochastically continuous. It is known that a Lévy process $X$ takes the general form (1) with $J$ defined as
\[ J_t := \int_0^t \int_{\mathbb{R}\setminus\{0\}} x\, \mu(dx, ds), \tag{3} \]
where $\mu$ is a Poisson random measure on $\mathbb{R}_+ \times \mathbb{R}\setminus\{0\}$ with mean measure $\nu(dx)\,dt$ such that $\int_{\mathbb{R}\setminus\{0\}} (|x|^2 \wedge 1)\, \nu(dx) < \infty$. This is the so-called Lévy-Itô decomposition of $X$, and $\nu$ is called the Lévy measure of $X$.
We also consider the possibility that the observations of the process may be contaminated by random errors. Specifically, we assume that our observations take the form
\[ Y_{t_j} = X_{t_j} + \varepsilon_{t_j}, \tag{4} \]
with equally spaced discrete times $0 = t_0 < t_1 < \dots < t_n = T$ such that $t_j - t_{j-1} = \Delta_n = T/n$. To summarize, the data are assumed to be generated by the model (2)-(4) with true volatility value $\theta^*$, which is the target to be estimated. The Lévy model with microstructure noise (2) is considered one of the simplest models for financial data at high frequency (see [15] for an empirical assessment of the model and [39] for a survey on parametric inference for Lévy models). Constant volatility is a strong assumption but, as shown in § 7, it can be relaxed to stochastic volatility when the microstructure noise can be ignored. The process $Y$ satisfies the following assumptions (see Remark 2.1 below for further comments about these):
1. The microstructure noise components, $\varepsilon = \{\varepsilon_{t_j}\}_{j=1}^n$, are independent and identically distributed (i.i.d.), and follow a $N(0, \sigma_\varepsilon^2)$ distribution. In the Bayesian framework, we assume the i.i.d. property holds conditionally on the unknown parameter $\sigma_\varepsilon$.
2. The processes $\varepsilon$ and $X$ are independent.
Remark 2.1.
1. [21] suggested that the independence assumption for $\varepsilon$ and $X$ is reasonable for moderate intraday frequencies (e.g., 1 minute). The i.i.d. normal assumption is used in § 5.1 in order to give an explicit representation of the likelihood function, which will allow us to prove a local asymptotic normality (LAN) property. It may be possible to relax it as in the quasi-likelihood method of [1], but this is beyond the scope of this work.
2. For a Lévy process, the Blumenthal-Getoor index $\alpha$ controls the small-jump activity of the process: the larger the index, the more intense the activity of the small jumps. The assumption that $\alpha < 1$ means that the paths of the process $J$ are of bounded variation, almost surely. This assumption is widely used in the literature (see, e.g., [7], [10], [27], and [30]), and is used later in § 6 to apply a central limit theorem (CLT) for the realized quadratic threshold estimator of the volatility. When $\alpha \ge 1$, [27] concluded that there is no CLT in general for such an estimator and that its rate of convergence to the integrated variance is much slower than $n^{-1/2}$. The characteristic-function-based estimator of [27] fills this gap and motivates scenario 2 of Assumption (JD). The restrictions on $\alpha$ and the Lévy measure therein are in fact inherited from [27] (see also [33]). The estimator is actually robust to $\alpha = 1$, but its asymptotic variance when $\alpha = 1$ is not the same as when $\alpha \ne 1$ (see Remark 4 of [27]). Therefore, for consistency and simplicity, we omit the scenario $\alpha = 1$.
3. In the absence of microstructure noise, we can accommodate a stochastic volatility model and much more general pure-jump semimartingales $J$ (see § 7). We also do not require Assumption (JF).
For future reference, recall the following common notation for the increments and jumps of an arbitrary continuous-time càdlàg process $\{U_t\}_{t\ge 0}$:
\[ \Delta_i U := U_{t_i} - U_{t_{i-1}}, \qquad \Delta U_t := U_t - U_{t-}. \]

Comparison with finite jump activity models
In this section, we present a motivating example using a simple finite jump activity model, in order to illustrate the usefulness of the approximate Bayesian inference obtained via purposeful misspecification. [38] proposed this approach, but did not make comparisons to the true marginal posterior for the volatility parameter. We then explain what issues will arise when considering the more complicated and realistic setting of infinite jump activity.

An illustration through simulation
We first empirically compare the "purposely misspecified" posterior from [38] with a genuine marginal posterior derived from the full joint posterior. The goals of this comparison are to assess the accuracy of the former method in a situation where the full joint posterior and genuine marginal posterior are tractable, and also to motivate our proposed procedures. Model (1) is used with a compound Poisson jump process $J_t = \sum_{i=1}^{N_t} \xi_i$. Here, $N = \{N_t\}_{t\ge 0}$ is a Poisson process with rate $\lambda$, and $\{\xi_i\}_{i\ge 1}$ are i.i.d. random variables independent of $N$ and $W$. We assume that $\{\xi_i\}_{i\ge 1}$ follow a uniform distribution $U(-1, 1)$. This assumption enables us to derive a joint posterior and perform Gibbs sampling for the parameters $(\mu, \theta, \lambda)$. The other parameter values are taken from [38]: $\lambda = 5$, $\mu = 1$, $\theta = 10$, $n = 5000$, and $T = 1$. In what follows, we approximate the Poisson process by a Bernoulli process; namely, $N$ is assumed to be a point process with at most one jump in each sampling interval. The joint posterior density based on the data $\mathbf{X}^{(n)} = (\Delta_1 X, \dots, \Delta_n X)$ can then be written explicitly. The priors chosen for $\mu$, $\theta$, and $\lambda$ are a standard Gaussian distribution, an inverse gamma distribution, and a beta distribution, respectively. The posterior for $\theta$ is estimated by two methods: (i) Gibbs sampling from the full joint posterior, followed by numerical integration to yield the marginal posterior (i.e., we simply ignore the MCMC output for the nuisance parameters $\mu$ and $\lambda$); and (ii) a direct posterior for $\theta$ obtained by misspecification on purpose. We emphasize that the Gibbs sampling approach, which is exact modulo finite simulation error, is only available here because of the very strong assumptions made regarding the jump process. This method is not available for the more complicated and realistic settings we consider in this paper, whereas the second method works quite well under those settings (as shown later).
The direct posterior proposed by [38] is an approximation that uses a misspecified model to obtain a posterior for $\theta$ directly, without first obtaining the full joint posterior and then marginalizing. The similarity of the direct posterior to the genuine marginal posterior, and of the corresponding credible intervals, demonstrates the accuracy of the 'purposely misspecified' posterior and, therefore, its validity for approximate Bayesian inference.
In general, it is quite complicated to perform fully Bayesian analysis for infinite jump activity models based on high-frequency data because the joint posterior is analytically intractable, and even MCMC-based procedures can be computationally demanding and may require additional assumptions about the underlying process in order to possess good properties. These additional assumptions can limit the flexibility of the analysis or lead to greater risk of misspecification. Moreover, when inference is only required for a low-dimensional interest parameter, it may be wasteful or cumbersome to construct the computationally demanding full joint posterior. To perform MCMC sampling from the joint posterior, some studies (e.g., [36]) treat the unobserved jump increments $\Delta_i J$ as latent parameters. However, with high-frequency data, this may cause numerical difficulties. Indeed, most of the previous Bayesian studies on high-frequency data that incorporate an infinite jump activity component in the model impose strong parametric restrictions in order to conduct MCMC sampling. [38] applied their purposely misspecified approach to the simpler setting of an uncontaminated jump-diffusion model with constant volatility and finite jump activity. They first constructed a misspecified model by omitting the jump part $J$. Under this misspecified model, the resulting misspecified posterior was shown to be asymptotically normal conditionally on a given path of $J$. Since the result holds for all possible paths of $J$, it can be generalized to a version that does not depend on $J$.
Even though such an asymptotic normality does hold for a suitably centered and scaled misspecified posterior for the volatility, the misspecification of the model has the adverse effect of causing this misspecified posterior to center in the wrong place and to have an inefficient variance compared to the true marginal posterior obtained by marginalizing the full joint posterior over the drift and jump parts of the model. Therefore, [38] proposed to correct for the bias and inefficiency of the misspecified posterior by, respectively, shifting the center by an estimate of the bias, and rescaling the log likelihood using a properly chosen temperature parameter. Since the Bernstein-von Mises theorem involves convergence in total variation norm, and this norm is invariant with respect to location shifts, the resulting corrected posterior for volatility still admits a Bernstein-von Mises theorem but with a correct center and efficient variance equal to the Cramér-Rao lower bound.

Theoretical challenges
In a model with infinite jump activity, we can similarly ignore the jump part and consider a misspecified model, but it is impossible to conclude an unconditional Bernstein-von Mises theorem from the analogous result for the conditional posterior given a fixed path of the jump process $J$. The main reason is that, for a jump process with infinite activity, the realized quadratic variation $[J]_n := \sum_{i=1}^n (\Delta_i J)^2$ does not converge to the quadratic variation $[J] := \sum_{0 \le s < T} (\Delta J_s)^2$ for almost every path of $J$ (i.e., a.s. convergence does not hold but merely convergence in probability). Almost sure convergence is necessary to prove local asymptotic normality of the likelihood and an optimal convergence rate for the posterior mean. The latter two conditions are required to apply the Bernstein-von Mises theorem under misspecification established in [31], which is the main tool behind the result of [38].
For a general semimartingale, it is well known that $[J]_n$ does converge to $[J]$ in probability (cf. [26]). Furthermore, for Lévy processes, a rather good rate of convergence of $O_p(n^{-1/2})$ can be obtained (see Lemma A.2). We find that this weaker convergence is enough to prove the desired properties of the posterior by applying an unconditional version of the Bernstein-von Mises theorem and skipping the intermediate results under the conditional probability measure given the jump part. In addition to the issues created by the presence of infinite jump activity, the parametric part is also affected by the presence of the noise $\varepsilon$. To deal with this, we treat the variance of the noise, $\sigma_\varepsilon^2$, as an additional nuisance parameter and prove a semiparametric type of Bernstein-von Mises theorem under misspecification. The adjusted posterior for volatility, and the associated Bernstein-von Mises theorem, must now include corrections for deliberately ignoring the presence of microstructure noise.

A semiparametric version of the misspecified BvM Theorem
As explained in the previous section, the misspecified Bernstein-von-Mises Theorem of [31] plays a crucial role in proving the asymptotic properties of the purposely misspecified posterior. To accommodate the more complicated setting of our model, the result needs to be generalized to a semiparametric version, which is stated next.
the Skorokhod space of all càdlàg $\mathbb{R}$-valued functions) and a collection of semiparametric models on $\Omega^{(n)}$, Suppose there are purposely misspecified models for $X^{(n)}$, denoted as $\tilde{P}_\vartheta(\cdot) := \tilde{P}(\cdot \mid \vartheta)$, $\vartheta \in \Theta$, which are distributions on $\mathbb{R}^n$ parameterized by $\vartheta$ with densities $\tilde{p}_\vartheta$. Let $\Pi$ be a prior distribution with a density $\pi$ that is continuous and positive on $\Theta$. Define the misspecified posterior distribution based on $\Pi$ and $\tilde{P}_\vartheta(\cdot)$ as Assume $\{\tilde{P}_\vartheta,\ \vartheta \in \Theta\}$ satisfy a stochastic local asymptotic normality (LAN) condition relative to a given sequence $\delta_n \to 0$ as norming rate, i.e., there exist some random quantities $\Delta_n$ and $V_n$ such that for every compact set $K \subset \mathbb{R}$ and $\epsilon > 0$, Also, for any sequence of constants $M_n \to \infty$, the posterior $\Pi_n$ is assumed to satisfy Then, $\Pi_n$ converges to a sequence of normal distributions in total variation:
The proof of the above result follows the original proof in [31]. The main modifications are changing the almost sure convergence to convergence in probability, and adding a nuisance parameter, which does not affect the proof. The nuisance parameters, both in the parametric and nonparametric parts, can be omitted on purpose by using a misspecified model $\tilde{P}$. Thus, the theorem can be used with high flexibility. The theorem has two main assumptions: the LAN property (6), which describes the local smoothness of the model around a given point, and the posterior concentration property (7), which, in particular, determines the rate of convergence of the posterior distribution. Sufficient conditions for (7) can be found in Section 3 of [31]. Under these assumptions, we conclude that the posterior distribution of the parameter of interest can be approximated by a normal distribution. As the sample size grows, the posterior shrinks to a point which minimizes the Kullback-Leibler divergence within the model. It shares the same consistency property as the random quantity $\Delta_n$ in the LAN assumption.
We will see later that Δ n can be taken as the MLE of the misspecified model.
for all ζ, > 0. This is weaker than the misspecified Bernstein-von-Mises Theorem in [31] when applying their theorem with P

The misspecified model
Our methodology starts with a misspecified model that ignores the drift and the jump component. Namely, $Y_t$ is assumed to follow
\[ Y_t = \theta^{1/2} W_t + \varepsilon_t. \tag{8} \]
This means that we first misinterpret the increments of the underlying process $X$ as independent Gaussian variables with mean zero and variance $\theta \Delta_n$. Under the misspecified model (8), our target of estimation is still $\theta$, but what it represents changes because of the misspecification. In the absence of jumps, $\theta$ measures the total variation of the underlying process $X$ per unit time and, hence, it can be efficiently estimated by the scaled realized quadratic variation, which coincides with the MLE of the parameter $\theta$ in the underlying misspecified model:
\[ \frac{1}{T} \sum_{i=1}^n (\Delta_i X)^2. \tag{9} \]
However, under the model $X_t = \theta^{1/2} W_t + J_t$, $\theta$ merely controls the variation of the continuous component and, in the infill limit, the realized quadratic variation (9) will aggregate both the true volatility, $\theta^*$, and the scaled variation introduced by the jump process $J$, $T^{-1}[J]$, recalling that $[J] = \sum_{s \le T} (\Delta J_s)^2$. Throughout, this total variation is denoted as
\[ \theta^\dagger := \theta^* + T^{-1}[J], \tag{10} \]
which takes values on the random parameter domain
\[ \Theta^\dagger := \{\theta + T^{-1}[J] : \theta \in \Theta\}. \]
For any sample path of $J$, $\Theta^\dagger$ is an open set in $(0, +\infty)$, and $0 \notin \overline{\Theta^\dagger}$. Furthermore, there exists some deterministic constant $\delta_0 > 0$ such that $\Theta \subset (\delta_0, +\infty)$ and, hence, $\Theta^\dagger \subset (\delta_0, +\infty)$.
In § 5.1, we explicitly write the misspecified likelihood function and the corresponding MLE for θ under the model (8). Bayesian inference under this misspecified model is proposed in § 5.2. We will show that, given that the data Y is misinterpreted by the model (8), the posterior of θ can be approximated by a normal distribution. Further extensions are subsequently considered.
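The claim that the realized quadratic variation (9) aggregates both the true volatility and the scaled jump variation, as in (10), can be checked numerically with a minimal sketch. The compound Poisson jump process below is only a finite-activity stand-in for $J$, the drift and noise are omitted, and the parameter values are borrowed from the simulation in § 3; none of this is the paper's simulation design.

```python
import numpy as np

rng = np.random.default_rng(1)
n, T, theta_star, lam = 5000, 1.0, 10.0, 5.0
dt = T / n

# Continuous part: independent N(0, theta* dt) increments (drift omitted).
dW = np.sqrt(theta_star * dt) * rng.standard_normal(n)

# Jump part (assumption): compound Poisson with U(-1, 1) sizes at uniform times.
num_jumps = rng.poisson(lam * T)
sizes = rng.uniform(-1.0, 1.0, num_jumps)
cells = rng.integers(0, n, num_jumps)            # sampling interval of each jump
dJ = np.bincount(cells, weights=sizes, minlength=n)

scaled_rv = np.sum((dW + dJ) ** 2) / T           # scaled realized quadratic variation
theta_dagger = theta_star + np.sum(sizes ** 2) / T   # theta* + [J]/T

print("scaled realized QV:", scaled_rv)
print("theta* + [J]/T:    ", theta_dagger)
```

The two printed values should be close to each other and, whenever jumps occur, both exceed `theta_star`, which is the bias that the location shift is meant to remove.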

Misspecified likelihood function and MLE
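Let us first note that, because of the presence of the noise $\varepsilon$, the increments $\Delta_j Y = \Delta_j X + \varepsilon_{t_j} - \varepsilon_{t_{j-1}}$ are no longer independent. To deal with the dependence and write an explicit likelihood function, we follow [17] (see also [1]) and transform the observed data as
\[ (R_1, \dots, R_n)^\top := P_n (\Delta_1 Y, \dots, \Delta_n Y)^\top, \]
where $P_n$ is a symmetric orthogonal matrix with entries
\[ p^n_{ij} := \sqrt{\frac{2}{n+1}}\, \sin\!\left(\frac{ij\pi}{n+1}\right), \qquad i, j = 1, 2, \dots, n. \]
[17] showed that, under the misspecified model (8), $R_j$ is Gaussian distributed, with mean zero and variance equal to
\[ \lambda^n_j(\theta) := \frac{\theta}{n} + 2\sigma_\varepsilon^2 \left(1 - \cos\frac{j\pi}{n+1}\right). \]
For future reference, note that under the true model (2), the conditional distribution of $R_j$ given $J$ is Gaussian with mean $\sum_{i=1}^n p^n_{ij} (\mu \Delta_n + \Delta_i J)$ and variance $\lambda^n_j(\theta^*)$. Based on these Gaussian variables, the likelihood function of the parameters $\theta$ and $\sigma_\varepsilon^2$ given the data $\{\Delta_j Y\}_{j \le n}$ can be written explicitly under the misspecified model. However, only $\theta$ is the parameter of interest, while $\sigma_\varepsilon^2$ is merely a nuisance parameter. Instead of writing the likelihood function based on $\lambda^n_j(\theta)$ and maximizing it over a two-dimensional space, we replace the nuisance parameter $\sigma_\varepsilon^2$ with its consistent estimator $\hat{\sigma}_\varepsilon^2 = \frac{1}{2n} \sum_{j=1}^n (\Delta_j Y)^2$, and thereby obtain a pseudo-likelihood function for $\theta$. The properties of $\hat{\sigma}_\varepsilon^2$ and the rationale for the replacement are further demonstrated in Lemmas B.3 and B.5. Then, it is natural to consider the following misspecified log likelihood function $\tilde{l}_n$ of $\theta$ given the data $\Delta_1 Y, \Delta_2 Y, \dots, \Delta_n Y$ (written up to an additive constant):
\[ \tilde{l}_n(\theta) := -\frac{1}{2} \sum_{j=1}^n \log \lambda_j(\theta, \hat{\sigma}_\varepsilon^2) - \frac{1}{2} \sum_{j=1}^n \frac{R_j^2}{\lambda_j(\theta, \hat{\sigma}_\varepsilon^2)}, \tag{12} \]
where we set $\lambda_j(\theta, x) := \frac{\theta}{n} + 2x \left(1 - \cos\frac{j\pi}{n+1}\right)$. The corresponding MLE $\hat{\theta}_n$ is the root of the score function
\[ \dot{\tilde{l}}_n(\theta) = -\frac{1}{2n} \sum_{j=1}^n \frac{1}{\lambda_j(\theta, \hat{\sigma}_\varepsilon^2)} + \frac{1}{2n} \sum_{j=1}^n \frac{R_j^2}{\lambda_j(\theta, \hat{\sigma}_\varepsilon^2)^2}. \tag{13} \]
We further assume that the MLE $\hat{\theta}_n$ is unique.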
Remark 5.1. The misspecified likelihood function (12) can be simplified and directly applied to a model without microstructure noise (i.e., $Y = X$ in (2)) by taking $\sigma_\varepsilon^2 = 0$ and $\hat\sigma_\varepsilon^2 = 0$. The log likelihood then reduces to the form (14), and the MLE is available in closed form as the scaled realized quadratic variation (15). Thus, the misspecified model is consistent with the one in [38] and, hence, the model with finite jump activity can be viewed as a special case of our results.
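The closed-form MLE in the no-noise case can be checked numerically. The sketch below assumes that, under the misspecified model, the $n$ increments over $[0, T]$ are treated as i.i.d. $N(0, \theta T / n)$, so that the maximizer is the sum of squared increments divided by $T$. The jump component (a handful of Gaussian jumps) and all names are hypothetical and serve only to show that the MLE picks up the jump variation.

```python
import numpy as np

def misspecified_loglik(theta, dX, T):
    """Gaussian log-likelihood with increments treated as i.i.d. N(0, theta*T/n), constants dropped."""
    n = len(dX)
    v = theta * T / n
    return -0.5 * n * np.log(v) - np.sum(dX**2) / (2.0 * v)

def mle_no_noise(dX, T):
    """Closed-form maximizer: the scaled realized quadratic variation."""
    return np.sum(dX**2) / T

rng = np.random.default_rng(1)
n, T, theta_star = 5000, 1.0, 0.3
dW = np.sqrt(theta_star * T / n) * rng.standard_normal(n)

# Hypothetical jump component: five Gaussian jumps at random grid points.
dJ = np.zeros(n)
jump_idx = rng.choice(n, size=5, replace=False)
dJ[jump_idx] = 0.2 * rng.standard_normal(5)
dX = dW + dJ

theta_tilde = mle_no_noise(dX, T)      # estimates theta* + [J]/T, not theta*
realized_jump_qv = np.sum(dJ**2)       # discrete proxy for [J]
```

In this toy setting `theta_tilde` is close to the realized variance of the Brownian part plus `realized_jump_qv / T`, which is the upward shift that the correction step is meant to remove.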

Bernstein-von Mises Theorems
We assume that the prior distribution $\Pi$ for $\theta$ possesses a continuous and positive density $\pi$ on $(\delta_0, +\infty)$. Denote by $P^*$ the distribution of the process $\{Y_t\}_{t \ge 0}$ under the true model (2), and by $E^*$ the corresponding expectation. Based on the prior $\Pi$ and the likelihood function (12), we introduce the Gibbs posterior $\Pi_n$ with temperature parameters $\kappa_n$ ([50,29]) as in (16), where $A$ is a Borel set of $\mathbb{R}_+$. The Gibbs posterior increases the flexibility of the Bayesian procedure, which allows us to further correct for misspecification.
Specifically, the type of misspecification we deliberately utilize is to assume a model with only the interest parameter, ignoring the high-dimensional nuisance parameter. This causes the posterior for volatility to contract too quickly (relative to the correct, high-dimensional parameter model), making the Bayes estimator (e.g., the posterior mean) superefficient. Rescaling the likelihood flattens out the likelihood and also the posterior, slowing down the contraction of the posterior. Choosing the temperature parameter optimally will make the posterior contract at the efficient rate established by frequentist asymptotic analysis. We assume that κ n converges in probability to a random variable κ † as n → ∞ under the true measure P * . Note that κ n may be data-dependent, and therefore it is possible that the random variable κ † also depends on the data under P * .
Our main result states that, as the sample size $n$ increases, the misspecified posterior based on the prior $\Pi$ and the misinterpreted data $\{\Delta_i Y\}$ is, under the true measure $P^*$, approximately normal and centered at the MLE $\tilde\theta_n$ obtained from the misspecified likelihood (12). The asymptotic variance is equal to the temperature parameter $\kappa^\dagger$ times the inverse of the Fisher information of the misspecified likelihood. We establish our results in two broad settings. The first result covers situations where the microstructure noise can be ignored. This is the case, for instance, with intermediate frequencies such as 5-minute or daily observations. In that setting, the asymptotic variance is of the standard order $n^{-1}$. The second result covers the more realistic setting where the microstructure noise is explicitly incorporated. This is needed when working with ultra-high frequencies, and comes at the cost of a slower order $n^{-1/2}$.

Theorem 5.1. Suppose that the data $Y_{t_0}, \dots, Y_{t_n}$ are generated according to (2)-(4) with $\varepsilon_t \equiv 0$. Then, the misspecified posterior defined in (16), with $\tilde l_n$ given as in (14) and $\kappa_n \to \kappa^\dagger$ in $P^*$-probability for some positive random variable $\kappa^\dagger$, can be approximated by a normal distribution in the sense that $\mathrm{TV}\left(\Pi_n, N(\tilde\theta_n, 2\kappa^\dagger \theta^{\dagger 2} n^{-1})\right) \to 0$ in $P^*$-probability as $n \to \infty$, where $\mathrm{TV}$ represents the total variation distance, $\tilde\theta_n$ is the MLE (15), and $\theta^\dagger$ is defined in (10).
Proofs of these theorems are given in the Appendix.

Remark 5.2.
It is worth noting that Theorem 5.1 holds without any restriction on the Blumenthal-Getoor index α. In fact, this result holds for a large class of pure-jump semimartingales J and even quite general stochastic volatility models (see §7). The restrictions on α and ν stated in Assumption (JD) are only needed when correcting the posterior as shown in the following section.

Correcting for misspecification
The main conclusions of Theorems 5.1 and 5.2, namely that, as $n \to \infty$, the misspecified posterior $\Pi_n$ is within vanishing total variation distance of $N(\tilde\theta_n, V_{\mathrm{asy}} n^{2\beta})$, where $V_{\mathrm{asy}} n^{2\beta} = 8\kappa^\dagger \theta^{\dagger 3/2} \sigma_\varepsilon n^{-1/2} 1_{\{\sigma_\varepsilon \neq 0\}} + 2\kappa^\dagger \theta^{\dagger 2} n^{-1} 1_{\{\sigma_\varepsilon = 0\}}$, state that $\Pi_n$ is approximately normally distributed and centered at $\tilde\theta_n$, which is a biased estimator of $\theta^*$ in the presence of jumps. Furthermore, the asymptotic variance may not be the most efficient either, since we ignored the drift and the jump components on purpose. To adjust the bias and variance, we need a consistent estimator of the true parameter $\theta^*$ that admits a feasible central limit theorem with the right rate of convergence. In what follows, we first propose a general correction procedure and the corresponding Bernstein-von Mises theorem for any estimator with these two properties. Concrete instances of these estimators for both the no-noise and the general cases are presented thereafter.

Suppose we have an estimator $\hat\theta_n$ of $\theta^*$ such that $n^{-\beta}(\hat\theta_n - \theta^*) \xrightarrow{d} N(0, V)$ (17), where, in accordance with Theorems 5.1 and 5.2, the rate exponent $\beta$ is $-\frac{1}{4}$ when $\sigma_\varepsilon \neq 0$, and $-\frac{1}{2}$ when $\sigma_\varepsilon = 0$. Our goal is to adjust the posterior so that it centers at $\hat\theta_n$ and matches the asymptotic variance of $\hat\theta_n$. For the center, we simply shift the posterior by the right amount, while for the asymptotic variance, we adjust the temperature parameter. Concretely, define the estimator $\widehat{[J]}_n := T(\tilde\theta_n - \hat\theta_n)$. The notation $\widehat{[J]}_n$ comes from the fact that this is a consistent estimator of the quadratic variation of the jump component $J$ because, as shown in the Appendix (see (30) and (60)), $\tilde\theta_n$ converges to $\theta^\dagger = \theta^* + T^{-1}[J]$ and $\hat\theta_n$ is a consistent estimator of $\theta^*$ by construction. We then adjust the location of the posterior by subtracting $T^{-1}\widehat{[J]}_n$ (this operation necessarily centers the posterior at $\tilde\theta_n - T^{-1}\widehat{[J]}_n = \hat\theta_n$). To adjust the variance, we adopt a sequence of temperature parameters $\kappa_n$ and its limit $\kappa^\dagger$ of the form (19), where $\hat V_n$ and $\hat V_{\mathrm{asy},n}$ are suitable consistent estimators of $V$ and $V_{\mathrm{asy}}$, respectively.
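The choice of these estimators is specified below in § 6.1-§ 6.2. Finally, we define the adjusted misspecified posterior $\tilde\Pi_n$ as the one having the density function $\tilde\pi_n(\vartheta) = \pi_n\left(\vartheta + T^{-1}\widehat{[J]}_n\right)$ (20), where $\pi_n$ is the density of the misspecified posterior obtained in Theorems 5.1 and 5.2 with $\kappa_n$ and $\kappa^\dagger$ defined in (19). Asymptotic normality of the adjusted posterior is established by the following result.

Theorem 6.1. Under the conditions of Theorem 5.1 or Theorem 5.2, with an estimator $\hat\theta_n$ satisfying (17) and the temperature parameters defined in (19), the adjusted posterior $\tilde\Pi_n$ defined above can be approximated by a normal distribution in the sense that $\mathrm{TV}\left(\tilde\Pi_n, N(\hat\theta_n, V n^{2\beta})\right) \to 0$ in $P^*$-probability (21).

The proof of Theorem 6.1 follows from a location shift in Theorem 5.1 or Theorem 5.2 with $\kappa_n$ defined in (19).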
This theorem implies that any type of $1 - \alpha$ credible interval ($\mathrm{CI}_{B,\alpha}$) of $\tilde\Pi_n$ is asymptotically the same as a $1 - \alpha$ confidence interval for $\theta^*$ based on $N(\hat\theta_n, V n^{2\beta})$. The upper and lower bounds of $\mathrm{CI}_{B,\alpha}$ can then be approximated by $\hat\theta_n \pm \sqrt{V n^{2\beta}}\, z_{\alpha/2}$ as $n \to \infty$, where $z_{\alpha/2}$ is the $\alpha/2$ quantile of the standard normal distribution. Since $\hat\theta_n$ satisfies a central limit theorem with asymptotic variance $V$, we have that $P^*\left(\theta^* \in \mathrm{CI}_{B,\alpha}\right) \to 1 - \alpha$ as $n \to \infty$. Therefore, the $1 - \alpha$ credible interval has approximately the correct repeated sampling coverage under $P^*$, which indicates frequentist validity of the Bayesian inference based on the adjusted posterior.
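The normal approximation to the credible interval can be computed directly. The following minimal sketch evaluates $\hat\theta_n \pm z \sqrt{V n^{2\beta}}$ for given inputs; the function name and the numerical values are illustrative, and $V$ is replaced by a plug-in value, which is an assumption made here for the example.

```python
from math import sqrt
from statistics import NormalDist

def normal_interval(theta_hat, V, n, beta, alpha=0.05):
    """Approximate 1 - alpha interval theta_hat +/- z * sqrt(V * n**(2*beta)).

    z is taken as the upper alpha/2 quantile; by symmetry of the normal law this
    gives the same pair of end points as using the lower alpha/2 quantile.
    """
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half_width = z * sqrt(V * n ** (2.0 * beta))
    return theta_hat - half_width, theta_hat + half_width

# No-noise case: beta = -1/2 and V = 2 * theta^2, with theta replaced by its estimate.
theta_hat, n = 0.3, 5000
lower, upper = normal_interval(theta_hat, 2.0 * theta_hat**2, n, beta=-0.5)
```

With noise present, the same function would be called with `beta=-0.25` and the corresponding asymptotic variance.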

Correction for a model without microstructure noise
When the variance $\sigma_\varepsilon^2$ of the noise is 0 and (JD)-1 holds, we can use the thresholded realized quadratic variation of [37], defined in (22), to correct the misspecified posterior. In (22), $\eta_n$ is a threshold proportional to $n^{-w}$ for some suitable exponent $w$. Consistency of $\hat\theta_n$ is established in [37] for any $w \in (0, 1/2)$ when $J$ consists of the superposition of a general finite-jump activity process and an independent Lévy process. [10] showed that $\hat\theta_n$ satisfies a central limit theorem (CLT) with asymptotic variance $2\theta^{*2} n^{-1}$ under Assumption (JD)-1, provided that $w$ lies between $\frac{1}{4 - 2\alpha}$ and $\frac{1}{2}$.
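A thresholded realized variance can be sketched in a few lines. Display (22) is not reproduced in this section, so the truncation rule used below (keep an increment when its absolute value is at most $\eta_n = n^{-w}$) is an assumption, and the three large jumps added to the Brownian increments are hypothetical.

```python
import numpy as np

def thresholded_rv(dX, T, w=0.39):
    """Sum of squared increments with |dX_i| <= eta_n = n**(-w), scaled by 1/T (assumed form)."""
    n = len(dX)
    eta = n ** (-w)
    keep = np.abs(dX) <= eta
    return np.sum(dX[keep] ** 2) / T

rng = np.random.default_rng(2)
n, T, theta_star = 5000, 1.0, 0.3
dX = np.sqrt(theta_star * T / n) * rng.standard_normal(n)
dX[[100, 2500, 4000]] += np.array([0.5, -0.4, 0.3])   # hypothetical large jumps

theta_hat = thresholded_rv(dX, T)             # targets theta*
theta_tilde = np.sum(dX**2) / T               # untruncated realized variance
jump_qv_hat = T * (theta_tilde - theta_hat)   # implied estimate of [J]
```

Here the three added jumps have squared sizes summing to 0.5, so `jump_qv_hat` should land near that value while `theta_hat` stays near 0.3.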
Under Assumption (JD)-2, we can adopt the estimator proposed by [27] (see also [33] for a closely related estimator), which is based on the empirical characteristic function. The corresponding CLT is established in Theorem 5 of [27] with asymptotic variance $2\theta^{*2} n^{-1}$ and rate $n^{-1/2}$. The data are divided into $k_n$ nonoverlapping blocks, each of length $v_n$. In addition, we need a scaling sequence $u_n$, and $k_n$ and $u_n$ are assumed to satisfy suitable rate conditions as $n \to \infty$. The estimator is defined in (23), where $\lfloor a \rfloor$ denotes the integer part of $a$, and $\zeta$ can be taken as any fixed value larger than 1. In fact, according to Theorem 3 of [27], the intermediate statistic $\hat C(u)_n$ is a consistent and efficient estimator when $r < 1$. It is asymptotically equivalent to the threshold estimator (22), and can therefore be used under Assumption (JD)-1 instead of (22). With the estimator $\hat\theta_n$ described in the different scenarios above, we apply Theorem 6.1 with $\beta = -1/2$, $V = 2\theta^{*2}$, and the temperature parameters given in (24). By Slutsky's theorem, $\kappa_n \to \kappa^\dagger$ in $P^*$-probability. We then obtain the following.

Corollary 6.2.
Under the same conditions as in Theorem 5.1, except that the temperature parameter $\kappa_n$ is defined as in (24), and assuming (JD), the adjusted posterior $\tilde\Pi_n$ with density (20) can be approximated by a normal distribution in the sense that $\mathrm{TV}\left(\tilde\Pi_n, N(\hat\theta_n, 2\theta^{*2} n^{-1})\right) \to 0$ in $P^*$-probability.

Remark 6.1. As we will show in §7 below, the result above also holds for stochastic volatility models and more general pure-jump processes $J$.

Correction for the general model
When the variance of the noise, σ 2 ε , is not zero (the noise is present), one possible solution is to adopt the estimatorΣ n proposed in [30] (see also [5]), which combines the thresholding approach of [37] with the pre-averaging method of [25] (see also [26] for a detailed exposition of the theory). The pre-averaging method is used to mitigate the effect of the noise ε. Utilizing this method, we formulate several overlapping blocks of increments, and calculate proxies of the increments of the uncontaminated process X by taking the weighted average of the increments of Y within each block. Then, the estimator is defined as the sum of the squares of those new quasi-increments that are less than some threshold, and is further debiased using an estimator of the variance of the noise. This estimator meets our requirements, when we include both infinitely many jumps with bounded variation and normally distributed microstructure noise. For completeness, we describe the key aspects of this estimator below.
The estimator depends on two parameters: the block length $k_n$ and the weight function $g$. The latter satisfies the following regularity conditions:
• $g$ is continuous on $[0, 1]$, piecewise $C^1$ with a piecewise Lipschitz derivative $g'$, and
• $g(0) = g(1) = 0$, and $\bar g = \int_0^1 g^2(s)\, ds < \infty$.
One simple and common choice is $g(s) = s \wedge (1 - s)$. Next, for some constant $c$, let $k_n = c n^{1/2}$, $c_1 = c \bar g$, $c_2 = \int_0^1 (g'(s))^2\, ds / c$, and define the estimator $\hat\Sigma_n$ as in [30], where we recall that $\hat\sigma_\varepsilon^2 = \frac{1}{2n} \sum_{j=1}^n (\Delta_j Y)^2$ and the threshold $u_n$ satisfies $u_n n^{w_1} \to 0$ and $u_n n^{w_2} \to \infty$ as $n \to \infty$, for some $0 \le w_1 < w_2 < 1/4$ with $w_1 > 1/(8 - 4\beta)$. The estimator is consistent and admits a central limit theorem. More specifically, by Theorems 1 and 3 in [30], (17) holds with $\hat\theta_n = \hat\Sigma_n$, $\beta = -1/4$, and an explicit asymptotic variance $V$. The temperature parameters in (19) can then be defined as in (25). The convergence of $\kappa_n$ to $\kappa^\dagger$ can be established through the consistency of $\hat\Sigma_n$ and $\hat\sigma_\varepsilon^2$ for $\theta^*$ and $\sigma_\varepsilon^2$, respectively, as well as the property that $X_n Y_n \to 0$ in $P^*$-probability whenever $X_n = O_{P^*}(1)$ and $Y_n \to 0$ in $P^*$-probability. Then, we have the following corollary of Theorem 6.1.
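For the common choice $g(s) = s \wedge (1 - s)$, the two constants attached to the weight function can be verified numerically: $\bar g = 1/12$ and $\int_0^1 (g'(s))^2\, ds = 1$. The sketch below also forms pre-averaged increments as weighted sums over overlapping blocks; the exact pre-averaging formula of [30] is not reproduced in this section, so the form $\sum_{j=1}^{k-1} g(j/k)\, \Delta_{i+j} Y$ is an assumption (it is a standard one in the pre-averaging literature), and the values of `n` and `c` are illustrative.

```python
import numpy as np

def g(s):
    """Weight function g(s) = min(s, 1 - s) on [0, 1]."""
    return np.minimum(s, 1.0 - s)

def preaverage(dY, k):
    """Overlapping weighted sums: out[i] = sum_{j=1}^{k-1} g(j/k) * dY[i + j - 1] (assumed form)."""
    weights = g(np.arange(1, k) / k)
    # Reversing the weights turns numpy's convolution into a sliding weighted sum.
    return np.convolve(dY, weights[::-1], mode="valid")

# Midpoint-rule check of the constants attached to g.
m = 200_000
s = (np.arange(m) + 0.5) / m
g_bar = np.mean(g(s) ** 2)                                # integral of g^2, equals 1/12
g_prime_sq = np.mean(np.where(s < 0.5, 1.0, -1.0) ** 2)   # integral of (g')^2, equals 1

n, c = 10_000, 1.0 / 3.0
k_n = int(np.floor(c * np.sqrt(n)))    # block length, rounded down to an integer
c1 = c * g_bar
c2 = g_prime_sq / c
```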

Corollary 6.3.
Under the same conditions as in Theorem 5.2 and with the temperature parameter $\kappa_n$ defined as in (25), the adjusted posterior $\tilde\Pi_n$ defined above can be approximated by a normal distribution in the sense that (21) holds with $\hat\theta_n = \hat\Sigma_n$ and $\beta = -1/4$.

Extension to more general semimartingales without noise
Thus far, we have assumed constant parameters for both the drift and diffusion components and a Lévy process for the jump component $J$. In this section, we show that, when the microstructure noise can be ignored, the purposely misspecified posterior approach can also be applied to stochastic volatility models and more general jump processes $J$. As mentioned before, it is generally believed that the microstructure noise is relatively negligible when using medium-range frequencies such as 5-minute or daily observations. We consider the model $X_t = \int_0^t \beta_s\, ds + \int_0^t \sigma_s\, dW_s + J_t$, where $W$ is a Wiener process, $J$ is a suitable pure-jump semimartingale, and $\beta = \{\beta_t\}_{t \ge 0}$ and $\sigma = \{\sigma_t\}_{t \ge 0}$ are càdlàg adapted processes. The parameter of interest is the scaled integrated variance $\theta^* = T^{-1} \int_0^T \sigma_t^2\, dt$. We again use the misspecified model (8) for $X$ with $\varepsilon = 0$. The corresponding log likelihood function is then the same as in Remark 5.1, with the MLE $\tilde\theta_n$ given by the scaled realized quadratic variation. An analysis of the proof of Theorem 5.1 shows that its conclusion remains valid in this setting (cf. Remark 5.2). Here $\sigma$ and $J$ are assumed to be Itô semimartingales whose dynamics involve $W$, a Wiener process $W'$ independent of $W$, and a Poisson random measure $\mu$ on $\mathbb{R}_+ \times \mathbb{R}$ with predictable compensator $\nu(ds, dx) = ds\, dx$, independent of $(W, W')$. The coefficients of $\sigma$ and $J$ (including $\delta : \Omega \times \mathbb{R}_+ \times \mathbb{R} \to \mathbb{R} \setminus \{0\}$ and $\tilde\delta : \Omega \times \mathbb{R}_+ \times \mathbb{R} \to \mathbb{R} \setminus \{0\}$) are random processes satisfying standard conditions for the integrals therein to be well defined.
As explained in §6, correcting the center and variance of the misspecified posterior Π n requires an estimatorθ n of θ * enjoying a CLT with a rate of n −1/2 . For bounded variation jumps (Assumption (JD)-1), it turns out that the thresholded realized quadratic variation of [37], defined in (22), once again does the job. Specifically, [24] (see Theorems 2.4 and 2.11 therein) establishes a feasible CLT for (22) under the same framework as above.
For the extension to unbounded variation jumps (Assumption (JD)-2), more regularity conditions are required. We summarize them below.

Assumption (JI).
• When α ≥ 3 2 , we assume J is symmetric in the sense that J t and −J t have the same law.
• We have a sequence $\tau_n$ of stopping times increasing to infinity, a sequence $a_n$ of numbers, and a nonnegative Lebesgue-integrable function $H$ on $\mathbb{R}$, such that the processes $\beta$, $\sigma$, and $\delta$ are càdlàg adapted, the coefficients $\delta, \tilde\delta$ are predictable, the processes $\tilde b, \tilde\sigma$ are progressively measurable, and $t < \tau_n \Rightarrow |\delta(t, z)|^2 \wedge 1 \le a_n H(z)$.
Theorem 5 of [27] established a robust CLT for the characteristic-function-based estimator (23) under Assumption (JI). Hence, the misspecified posterior for jumps of unbounded variation can again be corrected by (23).
When the microstructure noise is taken into account, the extension is not as direct as in the no-noise case. After applying an orthonormal transformation to remove the autocovariance introduced by the noise, similar to that at the beginning of Section 5.1, the distribution of the transformed data no longer depends only on the target parameter $\theta^* = T^{-1} \int_0^T \sigma_t^2\, dt$. Instead, the variance of each transformed observation depends on a weighted sum of the 'volatility' of each increment. Analyzing the transformed data with the same procedure as before can only provide an estimator of some value larger than the integrated volatility, not of the exact parameter $\theta^*$.

Simulation
This section discusses the finite sample performance of the adjusted posterior defined in Theorem 6.1. We aim to show the plausibility of the limit (21) at a large sample size. This is demonstrated by comparing the empirical coverage probability of the credible interval derived from the adjusted posterior and the confidence interval from its corresponding asymptotic normal distribution in the theorem. We also aim to compare the "purposely misspecified" method with the frequentist central limit theorem (CLT) (17).

Infinite jump activity without noise
The jump component is set to be a variance gamma process with $c = 0.23$, where $\{B_t\}_{t \ge 0}$ is a Wiener process independent of the Wiener process $W$. For the drift and diffusion components, let $\mu = 0.1$ and $\theta = 0.3$.
All parameter values are taken from [37]. For simplicity, we adopt the widely used threshold $\eta_n = n^{-w}$, where $w \in (0, 0.5)$ and $n$ is the sample size. This is a conventional choice in terms of consistency and efficiency. The threshold $\eta_n$ can also be calibrated using one of the iterative schemes proposed in [16]. These schemes were applied to the same model considered here (i.e., a Lévy process with variance gamma jump component) and produced good results. For simplicity, in what follows we fix $\eta_n = n^{-w}$ with $w = 0.39$. For the prior of $\theta$, an inverse gamma distribution is assumed with shape and scale both equal to one. Since the temperature parameters do not affect conjugacy, the misspecified posterior and the adjusted posterior both follow inverse gamma distributions.
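The conjugacy noted above can be made concrete with a short sketch. It assumes that the Gibbs posterior is proportional to the prior times the misspecified Gaussian likelihood raised to the power $1/\kappa_n$ (display (16) is not reproduced in this section), that the threshold estimator truncates on absolute increments, and that the temperature is $\kappa_n = (\hat\theta_n / \tilde\theta_n)^2$, which is the value obtained by matching the misspecified variance $2\kappa\tilde\theta_n^2 n^{-1}$ to the target $2\hat\theta_n^2 n^{-1}$. The simulated jumps are hypothetical and do not follow the variance gamma design.

```python
import numpy as np

def tempered_ig_posterior(dX, T, kappa, a=1.0, b=1.0):
    """Shape and scale of the inverse-gamma Gibbs posterior (assumed form: prior * likelihood**(1/kappa))."""
    n = len(dX)
    shape = a + n / (2.0 * kappa)
    scale = b + n * np.sum(dX**2) / (2.0 * kappa * T)
    return shape, scale

def ig_mean_var(shape, scale):
    """Mean and variance of an inverse-gamma law (requires shape > 2)."""
    mean = scale / (shape - 1.0)
    var = scale**2 / ((shape - 1.0) ** 2 * (shape - 2.0))
    return mean, var

rng = np.random.default_rng(3)
n, T, theta_star = 5000, 1.0, 0.3
dX = np.sqrt(theta_star * T / n) * rng.standard_normal(n)
dX[[500, 3000]] += np.array([0.4, -0.3])             # hypothetical jumps

theta_tilde = np.sum(dX**2) / T                       # misspecified MLE
eta = n ** (-0.39)
theta_hat = np.sum(dX[np.abs(dX) <= eta] ** 2) / T    # thresholded estimator (assumed form)
kappa_n = (theta_hat / theta_tilde) ** 2              # variance-matching temperature
shift = theta_tilde - theta_hat                       # T^{-1} times the implied estimate of [J]

shape, scale = tempered_ig_posterior(dX, T, kappa_n)
mean, var = ig_mean_var(shape, scale)
adjusted_mean = mean - shift                          # adjusted density is pi_n(. + shift)
```

In this toy run the adjusted posterior mean sits next to `theta_hat` and the posterior variance is close to `2 * theta_hat**2 / n`, in line with the normal approximation of Corollary 6.2.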
First, 5000 equally spaced observations are simulated based on the parameters defined above (sample size n = 5000). The adjusted posterior Π n is constructed as in Corollary 6.2. The results are shown in Figure 2. The adjusted posterior for one possible sample path is plotted as the dashed line and compared with the corresponding asymptotic normal distribution N (θ n , 2θ * 2 n −1 ) (the solid line). These two lines can hardly be distinguished from one another. Moreover, they are both roughly centered at the true volatility 0.3. This true volatility also lies between the dashed vertical lines which correspond to the 95% highest posterior density (HPD) interval of the adjusted posterior. That the adjusted posterior is well-approximated by the asymptotic normal distribution is an illustration of Corollary 6.2. This indicates that the adjusted posterior will be centered at an efficient estimator with optimal variance when the sample size is large enough.
Next, we consider Bayesian point estimators associated with the adjusted direct posteriors for volatility. The biases of the means of the two distributions in Corollary 6.2 are compared: the mean of the adjusted posterior $\tilde\Pi_n$, and the mean of the asymptotic normal distribution, $\hat\theta_n = \tilde\theta_n - T^{-1}\widehat{[J]}_n$, which is also the threshold estimator of [37]. We further consider the misspecified posterior adjusted by the latent realized quadratic variation of the jump component, $[J]_n$. The corresponding asymptotic normal distribution has mean $\hat\theta^*_n = \tilde\theta_n - T^{-1}[J]_n$. The analysis of these four point estimators is based on 1000 simulations. For each simulation, 5000 equally spaced observations are generated and used to calculate the biases.
The distribution of the biases is plotted in Figure 3. The solid line is formed by the biases of the threshold estimator, while the dashed line is formed by the biases of the mean of the adjusted posterior Π n . The dotted line represents the bias ofθ * n . The bias of the mean of the adjusted posterior using the realized quadratic variation is represented by the dashed-dotted line.
The similarity of the solid and the dashed lines, as well as the similarity of the dotted and the dashed-dotted lines, suggests that the posterior mean and the mean of the asymptotic normal distribution behave similarly in terms of their difference from the true volatility. The biases are relatively small: the volatility is 0.3, while most of the biases are within ±0.01 of zero.

Remark 8.1. It may be possible to improve the accuracy of the adjusted posterior $\tilde\Pi_n$ by using a better estimator of the quadratic variation $[J]$ of the jump component to correct the misspecified posterior $\Pi_n$ defined in Theorem 5.1. While the dashed and the solid lines put higher probability on positive values, the dotted and the dashed-dotted lines are more symmetric. This suggests that the right-skewed tendency of our posterior mean might be due to poor estimates of the jump component.
We next consider frequentist properties of the posterior credible intervals obtained from, respectively, our proposed direct posterior and its large-sample normal approximation established by our theoretical results. Specifically, we study the frequentist accuracy of these credible intervals, in the sense of nominal 95% credible intervals achieving the same repeated sampling coverage probability (0.95). This accuracy property is important because, while our Bernstein-von Mises theorems indicate large-sample frequentist validity of the posterior credible intervals (as $n \to \infty$), it is important to assess whether this validity property is approximately true for finite samples, and also whether the posterior credible intervals tend to have frequentist coverage larger than the nominal level, which could indicate a lack of precision in the approximate Bayesian inference. In addition to assessing frequentist accuracy for these credible intervals, we also include a comparison with the frequentist coverage of nominal 95% confidence intervals derived using the CLT for the threshold estimator. For notational convenience, in this section, we use "CI" to represent both a credible interval and a confidence interval, with the meaning being clear from the context. For this simulation study, we increase the sample size $n$ to 105,000, which is approximately the sample size corresponding to 5-minute observations during a 1-year time horizon. The empirical coverage probabilities of the 95% CIs based on 1000 repetitions are listed in Table 1. For each repetition, we simulate a sample path with 105,000 observations.

Table 1. Empirical coverage probabilities of Bayesian and frequentist interval estimates.

Empirical coverage probability    Distribution used to obtain 95% interval estimate
0.943                             Asymptotic normal approximation $N(\hat\theta_n, 2\theta^{*2} n^{-1})$ from BvM Theorem
0.944                             HPD interval based on posterior $\tilde\pi_n(\theta) = \pi_n(\theta + \widehat{[J]}_n)$
0.940                             Equal-tail credible interval based on posterior $\tilde\pi_n(\theta) = \pi_n(\theta + \widehat{[J]}_n)$
0.940                             Frequentist CLT for threshold estimator and variance from [37]

In our simulations, the HPD interval has the highest coverage probability among all the CIs derived from the distributions defined above. However, all of the empirical coverage probabilities are slightly less than 0.95. Further studies are needed to ascertain whether this undercoverage phenomenon occurs in general, and to investigate the possible causes. Based on preliminary results, we conjecture that using a more refined frequentist estimator $\hat\theta_n$ to serve as the center of the posterior, or utilizing a less-misspecified model, may reduce the observed undercoverage. A complete analysis is beyond the scope of this paper.
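Empirical coverage of the kind reported in Table 1 is simply the fraction of repetitions whose interval contains the true value. The sketch below computes it in a deliberately simplified toy setting, Brownian increments with no jumps and no noise and far fewer observations, so it does not reproduce the variance gamma design or the numbers of Table 1; all names and values are illustrative.

```python
import numpy as np
from statistics import NormalDist

def empirical_coverage(lower, upper, truth):
    """Fraction of intervals [lower_i, upper_i] that contain the true value."""
    lower, upper = np.asarray(lower), np.asarray(upper)
    return float(np.mean((lower <= truth) & (truth <= upper)))

# Toy check without jumps or noise (not the simulation design of this section).
rng = np.random.default_rng(4)
reps, n, theta_star = 500, 1000, 0.3
dX = np.sqrt(theta_star / n) * rng.standard_normal((reps, n))
theta_hat = np.sum(dX**2, axis=1)                     # realized variance per repetition, T = 1
half = NormalDist().inv_cdf(0.975) * np.sqrt(2.0 * theta_hat**2 / n)
coverage = empirical_coverage(theta_hat - half, theta_hat + half, theta_star)
```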

Remark 8.2.
Another issue which can affect the accuracy of the adjusted posterior or its asymptotic normal approximation is that the MLE is not actually approximating the posterior mean, and the Fisher information is not approximating the posterior variance. Asymptotically the effect of this non-Bayesian centering and scaling is negligible for finite-dimensional parameters in regular models, when considering errors of order smaller than the error incurred in the first-order normal approximation given by the Bernstein-von Mises Theorem (BvM). This is essentially because the BvM Theorem and asymptotic normality of the MLE are first-order approximations having the same order of approximation error. However, [32] show that centering the random parameter at the MLE instead of the posterior mean, and scaling by the Fisher information rather than the posterior standard deviation, can have substantial finite-sample effects on the properties of the centered and scaled posterior (e.g. the posterior cumulants), and can also make the first-order normal approximation given by the Bernstein-von Mises theorem less accurate in finite samples.
It is not necessary to use a conjugate prior as we have done in our simulations. Using certain non-informative priors (e.g. uniform) or an exponential distribution can be easily implemented in the simulations based on the same model. The results for these other priors are comparable with those we have reported for the inverse-gamma prior.

Lévy Model with microstructure noise
In order to illustrate the results in the presence of both infinitely many jumps and microstructure noise, we conduct simulations based on the model from [30], with observations indexed by $i = 1, 2, \dots, n$. The jump part $J_t$ is a trimmed symmetric $\beta$-stable process with $\beta = 0.5$. Trimming means that, after we simulate the increments of all the jumps, the largest 2% of them (ranked by absolute value) are discarded to match the behavior of high-frequency tick-by-tick data. To allow a comparison, simulations are conducted with exactly the same parameters as in [30], except for the constant $c$, which determines the length of the pre-averaging blocks according to $k_n = c\Delta_n^{-1/2}$. The choice of $c$ is not clearly stated in [30]. Therefore, we choose $c = 1/3$, as in the original work [25]. The sample size is set to $n = 15{,}600$ and we have $\Delta_n = 1/7800$, as in [30].
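The trimming step can be sketched as follows. The sampler is the Chambers-Mallows-Stuck construction for a standard symmetric stable law, used here as one possible way to generate the increments. The scale of the jump increments and the choice to set discarded increments to zero (rather than to resample them) are assumptions, since the section does not specify either.

```python
import numpy as np

def symmetric_stable(alpha, size, rng):
    """Chambers-Mallows-Stuck sampler for a standard symmetric alpha-stable law (alpha != 1)."""
    U = rng.uniform(-np.pi / 2, np.pi / 2, size)
    W = rng.exponential(1.0, size)
    return (np.sin(alpha * U) / np.cos(U) ** (1.0 / alpha)
            * (np.cos((1.0 - alpha) * U) / W) ** ((1.0 - alpha) / alpha))

def trim_largest(increments, frac=0.02):
    """Zero out the fraction `frac` of increments with the largest absolute values."""
    out = np.array(increments, dtype=float)
    k = int(np.floor(frac * out.size))
    if k > 0:
        largest = np.argsort(np.abs(out))[-k:]
        out[largest] = 0.0
    return out

rng = np.random.default_rng(6)
n, delta_n, stable_index = 15_600, 1.0 / 7800, 0.5
# Self-similar scaling of increments over a step of length delta_n; the unit scale is hypothetical.
dJ_raw = delta_n ** (1.0 / stable_index) * symmetric_stable(stable_index, n, rng)
dJ = trim_largest(dJ_raw, frac=0.02)
```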
For the adjusted posterior, the data is divided into two parts. The first half is used to evaluate the estimatorΣ n in [30], which is used in the prior, and the remainder is used to make inference. The prior is chosen to be a truncated normal distribution with lower boundary 0, centered atΣ n obtained using the first half of the data, and standard deviation 0.06. We generate 25,000 MCMC samples and use the last 20,000 samples to construct the adjusted posterior (the first 5000 being discarded as burn-in samples). We repeat the experiment 1000 times, and hence we construct 1000 posteriors, each based on 20,000 MCMC samples.
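The burn-in scheme described above can be illustrated with a generic sampler. The sampler actually used is not specified in this section, so the random-walk Metropolis algorithm below is only a stand-in, and the target is a toy density (a normal law truncated to the positive half-line, mimicking the shape of the prior) rather than the adjusted posterior itself. The step size and all names are hypothetical.

```python
import numpy as np

def random_walk_metropolis(log_post, init, n_iter, step, rng):
    """Random-walk Metropolis chain for a scalar parameter; returns all n_iter draws."""
    draws = np.empty(n_iter)
    current, current_lp = init, log_post(init)
    for i in range(n_iter):
        proposal = current + step * rng.standard_normal()
        proposal_lp = log_post(proposal)
        # Accept with probability min(1, exp(proposal_lp - current_lp)).
        if np.log(rng.uniform()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        draws[i] = current
    return draws

def toy_log_post(theta, center=0.3, sd=0.06):
    """Stand-in target: normal density truncated to theta > 0 (not the paper's posterior)."""
    if theta <= 0.0:
        return -np.inf
    return -0.5 * ((theta - center) / sd) ** 2

rng = np.random.default_rng(5)
chain = random_walk_metropolis(toy_log_post, init=0.3, n_iter=25_000, step=0.1, rng=rng)
kept = chain[5_000:]               # discard the first 5,000 draws as burn-in
posterior_mean = kept.mean()
```

To target the adjusted posterior instead, `toy_log_post` would be replaced by the log prior plus the tempered pseudo-log-likelihood, evaluated at the shifted argument.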
To compare the point estimators, we compute the average bias and standard error of, respectively, the frequentist estimator $\hat\Sigma_n$ and the maximum a posteriori (MAP) point estimator defined as $\hat\theta_{\mathrm{MAP}} = \arg\max_\theta \tilde\Pi_n(\theta)$. The average biases and the standard errors are similar and small for both, providing some evidence of the accuracy of both the frequentist and Bayesian point estimators. We also consider the frequentist accuracy of interval estimates. The empirical coverage probability of the 95% credible interval of the adjusted posterior is slightly better than that of the confidence interval derived from the CLT of the estimator $\hat\Sigma_n$.

Conclusion
In this paper, we consider an infinite activity model with microstructure noise over a fixed time horizon. A "purposely misspecified" posterior is proposed for the volatility, that is, the variation of the diffusion component. We prove that the posterior can be approximated by a normal distribution centered at a suitable estimator with the optimal variance. Simulation experiments illustrate the accuracy and frequentist validity of our proposed approximate Bayesian inference. Compared to [38], we generalize the feature of finitely many jumps to infinite jump activity, propose an extension to handle stochastic volatility and general Itô jump processes, and allow for microstructure noise. Misspecification on purpose is an unusual idea, but we have shown that it can be an effective strategy to directly obtain a posterior on a parameter of interest in this complex model setting, for which fully Bayesian inference is currently unavailable or intractable. Our proposal is the first procedure that can provide Bayesian inference on volatility with high-frequency data while allowing infinite jump activity and microstructure noise. Moreover, our proposed direct posterior completely avoids the need to specify a prior on the complex nuisance component of the model, and does not require computationally demanding construction of the full joint posterior to obtain the marginal posterior for volatility.
The recentering and rescaling procedure is also highly flexible. Any consistent and efficient estimator can be used as a correction. Furthermore, the variance can be adjusted in response to new information. For example, when the variance of the volatility θ * is known, the temperature parameter can be set to achieve the optimal, efficient variance.

Appendix A: Proof of Theorem 5.1
Before we prove Theorem 5.1, we give some preliminary lemmas regarding the rate of convergence of the realized quadratic variation. Without loss of generality, we assume $T = 1$ and let $\Delta_n = 1/n$. The first result shows that, for bounded variation processes $J$, $[J]_n - [J] = O_P(n^{-1/2})$. The proof is classical and can be found in [44].
The following lemma gives the rate of convergence of the realized quadratic variation of a general Lévy process with nonzero Brownian component ($\theta > 0$). The result is due to [23] (see Theorem 2.6 and Remark 5 therein). Related results for general semimartingales can be found in [24].

Lemma A.2. Let
be the realized quadratic variation of the process X defined in (1) with θ > 0 (i.e., X is a Lévy process). Then, where IV t = θt + s≤t (ΔJ s ) 2 and We shall prove next that the two condition (6) and (7) are satisfied. We start with the condition (7), which requires that, for every sequence of constants M n → ∞,

Q. Wang et al.
Using Markov's inequality, Sinceθ n is the unique MLE, we could approximate the right hand side expectation by the Laplace approximation ( [12]): Since M n → ∞, for condition (7), it suffices to show that, for n → ∞, Sinceθ n is the realized quadratic variation of the Lévy process X, this directly follows from Lemma A.2. Second, the models should satisfy the stochastic local asymptotic normality (LAN) condition (6). That means that, for every > 0, (1).
Using a Taylor expansion to approximate the log likelihood and plugging in the first and second derivatives, we obtain an explicit expression for the left-hand side of the LAN condition. Note that $P^*(|\kappa_n - \kappa^\dagger| > \delta) \to 0$ as $n \to \infty$ for arbitrary $\delta > 0$. Combined with the result obtained in Lemma A.2, this shows that the LAN condition holds.

Appendix B: Proof of Theorem 5.2
As in the proof of Theorem 5.1, we apply Theorem 4.1. As before, there are two conditions that need to be satisfied. The first is the LAN property (6), which will be proved in § B.4. The second condition is (7). By applying the same Markov inequality and Laplace approximation as in the proof of Theorem 5.1, we obtain a sufficient condition for (7), which will be proved in § B.3. Before that, we give some preliminary lemmas in § B.1.
The following notation is used throughout the proof:
1. $a_n \lesssim b_n$ indicates that there exists a constant $C$ such that $|a_n| \le C|b_n|$ for every $n$ large enough. If $a_n \lesssim b_n$ and $b_n \lesssim a_n$, then we write $a_n \asymp b_n$.
2. To simplify the notation, in what follows, we use $p_{ij}$ to represent $p^n_{ij}$, and $\lambda_j$ to represent $\lambda^n_j$.

B.1. Preliminary lemmas
Without loss of generality, we assume $E^*[\Delta_j J] = 0$ (otherwise, the drift $\mu$ can be redefined as $\mu + E^*[\Delta_j J]/\Delta_n$). We start by collecting some useful properties of the orthogonal matrix $P_n$ defined in § 2.1 (the proof can be found in [44]).
Lemma B.1. We have the following relationships:
The limiting behavior of the moments of the jump increments will be frequently used later in the proof. We summarize it in the following lemma.
The first statement directly follows from Theorem 4.3 of [14]. The proofs of the second and the third statements can be found in [44].
Recall that σ 2 ε is the variance of the noise, which can be estimated usinĝ The following result states some needed properties ofσ ε . Lemma B.3. Under assumption (JF), we have When H has finite 8th moment, by Markov's inequality, we have It is not hard to see that its kth-derivative M (k) (t) := d k dt k M H (t) takes the form: As it turns out c n, 8 can be expressed as where 8 k=1 km k = 8, m k ∈ N. By (35)- (36), and following a similar procedure as that for proving (35), for any 1 ≤ k ≤ 8, Then, Thus, all the terms inc n, 8  The following Lemma will be used later to prove the asymptotic properties of the log likelihood function (12), and its derivative. Proof. For the lower bound, note that since sin x ≤ x, For the upper bound, we divide the summation into two parts. For j ≤ √ n, For j > √ n, since sin x > 2 π x for 0 < x < π 2 , n 1 − cos jπ n + 1 = 2n sin 2 jπ 2(n + 1) ≥ 2 j 2 n (n + 1) 2 > j 2 n + 1 . Then, Recall that λ j (θ) = θ n + 2σ 2 ε 1 − cos jπ n+1 . Applying Lemma B.4 with a = θ and b = σ 2 ε , and since 0 / ∈Θ , we get sup θ∈Θ n j=1 1 n p λ p j (θ)

B.2. Likelihood functions
In this section, we introduce several properties of the misspecified likelihood function $\tilde l_n$ defined in (12). In the misspecified model (8), when the variance of the noise is assumed to be known, the likelihood function $l_n$ of $\theta$ is given by (38), where we recall that $\lambda_j(\theta) := \lambda_j(\theta, \sigma_\varepsilon^2) := \frac{\theta}{n} + 2\sigma_\varepsilon^2 \left(1 - \cos\frac{j\pi}{n+1}\right)$. In what follows, we denote the first and second derivatives of $l_n$ and $\tilde l_n$ with respect to $\theta$ by $\dot l_n$, $\ddot l_n$, $\dot{\tilde l}_n$, and $\ddot{\tilde l}_n$, respectively.
The moments of the variable $R_j$ are frequently used below, so we summarize them here. From (11) and (35)-(36), the moments of $R_j$ are such that

The last inequality holds because $n^{-1} \le \lambda_j(\theta^*)/\theta^*$.

The following result controls the difference between $l$ and $\hat{l}$, and between their corresponding derivatives. Recall that $l$ uses the true variance $\sigma_\varepsilon^2$, while $\hat{l}$ replaces $\sigma_\varepsilon^2$ with $\hat{\sigma}_\varepsilon^2$.

Lemma B.5. Let $l$ and $\hat{l}$ be given as in (12) and (38), respectively. If assumptions (N) and (JF) hold true, then for any integer $k \ge 1$,

Proof. The expressions inside the absolute values in (41) can be expressed as

We first derive upper bounds for the first and second derivatives of $g$ and $h$. Fix $\tau > 0$ and consider all $x \ge \tau$. Note that $2n\left(1 - \cos\frac{j\pi}{n+1}\right)$

$$|h_j'(\tau)| \lesssim \frac{1}{n^2 \lambda_j^2(\theta, \tau)}, \qquad \sup_{x > \tau} |h_j''(x)| \lesssim \frac{1}{n^2 \lambda_j^2(\theta, \tau)}.$$
The expectation of the first term of (45) is bounded.
The following lemma states a property of the jump component that will be used in Lemma B.7 below. The proof is classical and hence is omitted.

Lemma B.6. Let $g$ and $h$ be known deterministic functions such that the expectation below is finite. Then, for any $a, b, c, d \in \{1, 2, \ldots, n\}$ with $a < b < c < d$,

The following result will be needed in Lemmas B.8 and B.9.

Proof. Denote $E_J(\cdot) = E^*(\cdot \mid J)$ and recall (11). Our first step is to take the conditional expectation of the expanded square given $J$:

We compare (46) with (47). By the mutual independence of the $R_j$'s, the second term on the right-hand side of (46) is equal to the second term of (47). Then, the absolute value of the difference between the left-hand sides of (46) and (47) is controlled, by the moment bounds of $R_j$ through (40), in terms of

$$\sum_{j=1}^{n} \frac{n^2\left(\lambda_j^2(\theta) + n^{-2}\right) + n^2\left(\lambda_j(\theta) + n^{-1}\right)^2}{n^{2p} \lambda_j^{2p}(\theta^\dagger)},$$

whose square root is $O(n^{1/4})$ because of (37). Then, to prove the result, it suffices to show

By expanding $E_J R_j^2$, note that

By Lemma A.1, the first term is such that:

Similarly, the third term of (48) can be shown to be $O_{P^*}(n^{-1/2})$. For the fifth term of (48), by (35) and $|p_{ij}| \le 2(n+1)^{-1}$,

which is bounded by (37). For the second term of (48), note that

(49) is

By (35), to achieve the convergence rate $n^{1/2}$, we need to prove ess sup