Cointegration in high frequency data

In this paper, we consider a framework adapting the notion of cointegration to the case where two asset prices are generated by a driftless Itô-semimartingale featuring jumps with infinite activity, observed synchronously and regularly at high frequency. We develop a regression based method for estimating the cointegrating relations and establish the related consistency and central limit theory when there is cointegration within that framework. We also provide a Dickey-Fuller type residual based test for the null of no cointegration against the alternative of cointegration, along with its limit theory. Under no cointegration, the asymptotic limit is the same as that of the original Dickey-Fuller residual based test, so that critical values can be easily tabulated in the same way. Finite sample analysis indicates adequate size and good power in a variety of realistic configurations, outperforming the original Dickey-Fuller and Phillips-Perron type residual based tests, whose size is distorted by non ergodic time-varying variance and whose power is altered by price jumps. Two empirical examples consolidate the Monte Carlo evidence that the adapted tests can reject while the original tests do not, and vice versa.


Introduction
It is often the case that the components of a multivariate time series drift individually, while some linear combination of the components does not drift apart. Since the seminal papers of [Granger, 1981] and [Engle and Granger, 1987], cointegration has spread across and way beyond the field of econometrics. The authors put forward a residual based two-step strategy to test for the presence of cointegration. The first step estimates the cointegrating relations via regression. The second step is closely related to testing for unit roots. More specifically, residual based tests are designed to test the null of no cointegration by running a unit root test on the residuals. If the null of the unit root test is not rejected, then the null of no cointegration is also not rejected. Among the class of unit root tests, the standard Dickey-Fuller (DF) tests originally from [Dickey and Fuller, 1981], the augmented procedure from [Said and Dickey, 1984], and the Phillips-Perron tests from [Phillips and Perron, 1988] are among the most popular. [Phillips, 1987] shows that this type of unit root test is typically robust to many weakly dependent and heterogeneously distributed time series. As for cointegration tests, a large sample limit theory for the residual based tests is investigated in [Phillips and Ouliaris, 1990]. In particular, the critical values tabulated for the pure unit root tests are altered due to the error made in the first step. In the cited paper, and to the best of our knowledge in the rest of the literature on cointegration, the asymptotics is low frequency in the sense that the time gap ∆ between two observations is fixed while the horizon time T → ∞.
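To fix ideas, the two-step strategy can be sketched in a few lines. The following is an illustrative simulation in the classical discrete-time setting; the sample size, data-generating processes and all constants are our own choices, not taken from the paper:

```python
import numpy as np

def engle_granger_df(y, x):
    """Two-step residual based procedure sketch: (1) OLS of y on x with an
    intercept to estimate the cointegrating relation, (2) Dickey-Fuller
    regression on the residuals.  Returns the DF t-statistic, to be compared
    with residual based (Phillips-Ouliaris) critical values, not the pure
    unit root ones."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta                       # estimated residuals
    de, lag = np.diff(e), e[:-1]
    # DF regression: de_t = (rho - 1) * e_{t-1} + error
    slope = (lag @ de) / (lag @ lag)
    u = de - slope * lag
    se = np.sqrt((u @ u) / (len(u) - 1) / (lag @ lag))
    return slope / se

rng = np.random.default_rng(0)
n = 5000
x = np.cumsum(rng.standard_normal(n))             # unit root regressor
y_coint = 0.5 + 2.0 * x + rng.standard_normal(n)  # cointegrated with x
y_indep = np.cumsum(rng.standard_normal(n))       # independent random walk
t_coint, t_indep = engle_granger_df(y_coint, x), engle_granger_df(y_indep, x)
print(t_coint, t_indep)  # strongly negative vs. moderate
```

Under cointegration the residuals are stationary and the statistic diverges to −∞, while under no cointegration it follows the Phillips-Ouliaris limit distribution.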
There is a solid body of empirical work employing cointegration in financial economics. In that field, the most common specification of cointegration is that all the components of the observed vector x t are unit root processes (e.g. random walks) and that there exists a vector α so that α x t is stationary. Examples of application include, but are not limited to, nominal dollar spot exchange rates, e.g. [Baillie and Bollerslev, 1989] and [Diebold et al., 1994], price discovery, e.g. [Hasbrouck, 1995], and pairs trading, e.g. [Caldeira and Moura, 2013] and the review of [Krauss, 2017]. Although most earlier empirical studies were confined to data observed on a daily basis, nowadays they frequently incorporate data observed on an intraday basis, see e.g. [Elyasiani and Kocagil, 2001], [Hasbrouck, 2003], [Pati and Rajib, 2011] or [Yang et al., 2012] among others. This increasing use of high frequency data is perfectly natural, and should even be encouraged given the basic statistical principle that one should not throw away data. As a matter of fact, it echoes similar changes in other areas of the literature, the best example being the diversification and sophistication of efficient variance estimators over the past two decades.
Our concern in this paper is to back up the empirical use of high frequency data by theoretically validated, high frequency robust estimation of cointegrating relations and tests. More specifically, the difference between the classical time series framework and the high frequency framework we consider is that in the latter ∆ → 0 while keeping T → ∞, and the observed time series x t is generated by a driftless Itô-semimartingale. This environment, quite standard in high frequency financial econometrics, typically accommodates a variety of stylized facts, such as time-varying variance featuring jumps, leverage effect and price jumps, all of which are salient in high frequency data. Price jumps can happen at deterministic times (e.g. macroeconomic news announcements) or not, are usually of random size, and are quite frequent, at least ten a year on average according to Table 8 (p. 484) in [Huang and Tauchen, 2005]. For simplicity, we restrict ourselves to the two dimensional case.
The question of testing for no cointegration with high frequency data is of practical relevance since some of its stylized facts can lead to significant distortions of the power of the usual tests. In an extensive Monte Carlo experiment, [Krauss and Herrmann, 2017] implement ten leading cointegration tests in a variety of set-ups tailored with high-frequency features. They use an AR(1) with normal innovations as benchmark, but also look at non-normality effects employing t-distributed innovations, GARCH effects, nonlinearities and price jumps. They typically find that, among a cohort of high frequency stylized facts, price jumps deteriorate power the most.
In fact, the current theoretical framework to test for no cointegration is quite restricted and only accommodates ergodic time-varying variance. Spurious cointegration can occur in the presence of a mere jump in long term equilibrium variance, as documented by [Noh and Kim, 2003]. In other words, non ergodic time-varying variance can distort the size of the tests. Obviously, time-varying variance is not solely a high frequency stylized fact, as [Sensier and Dijk, 2004] report that most of the real and price variables in the dataset from [Stock and Watson, 1999] reject the null hypothesis of constant variance against a one regime shift alternative. As far as we know, the only attempt to accommodate time-varying variance in a more flexible cointegration model is [Kim and Park, 2010], but unfortunately in that paper the authors test the reverse null of cointegration.
Our aim is to adapt the residual based DF procedure developed in [Phillips and Ouliaris, 1990] to test for the null hypothesis of no cointegration in high frequency data. More specifically, the residual based tests of the cited paper are not theoretically robust to the two aforementioned high frequency stylized facts identified by the literature as distorting size and power, namely time-varying variance and price jumps. Accordingly, we develop two separate adaptations, one for each feature, that we eventually combine, as the two problems are not of the same nature and cannot be tackled with a single straightforward adaptation. As far as we know, no cointegration test has been tuned to those two high frequency stylized facts, yet there are two closely related strands of the literature.
The first active field is about testing for the presence of a unit root incorporating time-varying variance. [Cavaliere and Taylor, 2008a] apply a time transformation to the data prior to applying the standard unit root tests and retrieve the same asymptotics. [Cavaliere and Taylor, 2007] employ the time deformation to simulate valid critical values for standard unit root tests. [Cavaliere and Taylor, 2008b] and [Cavaliere and Taylor, 2009] consider a related method involving the wild bootstrap. Wild bootstrap tests formed from a feasible generalized least squares are developed in [Boswijk and Zu, 2018]. Finally, [Beare, 2018] pre-estimates volatility and exploits it to deflate the returns prior to applying the standard tests. Unfortunately, we could not find any use of those methods in the context of residual based cointegration tests. In addition, the theoretical framework of all those papers is restricted to deterministic volatility and no price jumps, except for [Cavaliere and Taylor, 2009] who allow for random variance.
Another very dynamic field is about residual based cointegration tests with regime changes, which is typically related to price jumps in the Itô-semimartingale setup. In a univariate context, unit root tests were designed in [Perron, 1989], [Perron, 1990], [Banerjee et al., 1992], [Perron and Vogelsang, 1992] and [Zivot and Andrews, 1992]. [Gregory and Hansen, 1996] extend the tests to a cointegration framework. In all those papers, the number of shifts is at most one. [Hatemi-j, 2008] allows for two possible breaks. Finally, [Maki, 2012] proposes tests which permit an arbitrary number of breaks in principle, although the method is seldom used with more than five breaks in practice. The strong limitation of the regime change approach is that jumps are required to happen at deterministic times, with deterministic sizes, a known number of jumps, and not very often, all of which is inconsistent with the aforementioned high-frequency stylized facts.
To make the residual based test robust to time-varying variance, the most natural candidate is a residual based test built on the aforementioned time-varying variance robust unit root tests. Unfortunately, transforming the data solely at the unit root test step entails hybrid-behaved tests to the extent that the limit distribution is unidentifiable, at least to us. To circumvent this difficulty, we deflate the returns prior to both steps. As for robustness to price jumps, we simply consider a truncation method as in [Mancini, 2009], which is commonly used in the high frequency econometrics literature when dealing with jumps of random size occurring at random times.
Our theoretical contribution is divided into two parts. First, we provide a regression procedure based on deflated and truncated asset price returns in order to estimate the cointegrated system, and we establish the related consistency and central limit theory when there is cointegration (see Theorem 3.5 in Section 3.3). Second, we develop a DF type residual based cointegration test, also based on deflated and truncated observations, along with its central limit theory in the absence of cointegration and under local alternatives. We also provide an equivalent of the test statistic in the presence of cointegration (see Theorem 4.2 and Proposition 4.3). In particular, we show that under the null hypothesis of no cointegration the limit distribution of the statistic is that of a classical DF test, and that no local power is lost due to the truncation and the deflation. Finally, we briefly examine the case of drifting Itô-semimartingales and show that, even though theoretical derivations are possible, the limit distribution of the statistic is already complex in the case of a linear drift, indicating that combining drift and deflation is a difficult task.
The finite sample analysis in this paper corroborates both the cited papers and our theoretical findings. On the one hand, a typical environment of cointegrated assets observed at high frequency badly distorts the size and power of the DF or Phillips-Perron type residual based unit root tests, and as expected we can isolate the effect of time-varying variance as impacting the former, and that of price jumps as altering the latter. On the other hand, the proposed test has appropriate size and good power in the simultaneous presence of both stylized facts. Two empirical examples illustrate that the proposed test can be in disagreement with the original tests, indicating the practical relevance of truncating and deflating asset prices. In addition, the finite sample analysis sheds light on the fact that drift does not seem to affect the size or the power of any of the residual based tests, even in extreme cases where its magnitude is ten times the standard one.
An obvious limitation of our approach is that the framework considered rules out market microstructure noise. This is mitigated by the fact that the vast majority of empirical cointegration studies operating with financial time series do not sample at a frequency higher than five minutes, so that if the asset is liquid enough the data are reasonably free of market microstructure noise (see, e.g., [Aït-Sahalia and Xiu, 2016]). To temper as much as possible the effect of microstructure noise, we do not sample faster than every ten minutes in both our numerical and empirical studies.
The remainder of this paper is structured as follows. The framework, which is a natural adaptation to high frequency data, is given in Section 2. Estimation of cointegrated relations, based on regression on the truncated and deflated prices, and its related limit theory under cointegration, is provided in Section 3. A DF-type test of the null hypothesis that there is no cointegration against the alternative that there is cointegration, together with its limit theory, is developed in Section 4. In particular, this test's robustness to time-varying variance and price jumps, based respectively on deflation and truncation, is established. Section 5 is devoted to a Monte Carlo experiment shedding light on the size and power distortions of the original DF and Phillips-Perron based cointegration tests and on the good behavior of the adapted DF test in a variety of realistic configurations. In Section 6, a brief empirical study, which corroborates the fact that the proposed test is not always in accordance with the original tests, is conducted. We conclude in Section 7. Proofs can be found in Section 8.

The framework: a natural adaptation to high frequency data
In this section, we introduce our general framework, which in particular accommodates the definition of no cointegration ([Engle and Granger, 1987]) when two driftless Itô-processes including jumps with infinite activity are observed synchronously and regularly. The framework also specifies the notions of cointegration and weak cointegration.
We introduce a few key concepts from the aforementioned paper and the existing literature on cointegration for time series that will help motivate the framework of our own study. For the sake of clarity, we restrict ourselves to the case of a pair of processes, but all the definitions can be extended to the multivariate case with no major difficulty. Let us consider two unit root time series x t and y t (i.e. with explosive variance and with stationary increment processes ∆x t and ∆y t ). As pointed out in [Engle and Granger, 1987], in general, any linear combination y t − αx t will again be a unit root process drifting away from zero. In that case, x t and y t are said not to be cointegrated. However, it may also happen that for some α, the time series y t − αx t does not wander far from zero, and not only its increments but also the series itself is stationary. The couple (x t , y t ) is then said to be cointegrated with cointegration vector (1, −α) T . When conducting tests, and for the sake of tractability, both notions are often naturally embedded (see e.g. model (4.7) of [Engle and Granger, 1987]) in an AR(1) model specification as follows. We assume that x t is a unit root process, and that

y t = c + α x t + ε t , with ε t = ρ ε t−1 + u t , (2.1)

where u t is a non-trivial stationary process possibly correlated with ∆x t . Then, 0 < ρ < 1 yields a cointegrated system, whereas ρ = 1 implies that ε t is a unit root process, hence no cointegration. Moreover, the regime 0 < ρ < 1 with ρ → 1 yields a process ε t which is nearly integrated (as first introduced in [Nabeya and Perron, 1994]), and accordingly the system (2.1) can be said to be weakly cointegrated. Residual based tests for the null of no cointegration usually pre-estimate α by, for instance, an ordinary least squares (OLS) regression, and then run a unit root test (i.e. ρ = 1 versus ρ < 1) on the estimated residuals ε t = y t − α x t (where α denotes the OLS estimate).
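The AR(1) specification above is easy to simulate. The following sketch, with illustrative values of c, α and the innovation correlation (all our own choices), shows how the residual stays bounded in variance when ρ < 1 and explodes when ρ = 1:

```python
import numpy as np

def simulate_system(n, rho, c=0.2, alpha=1.5, seed=1):
    """Simulate a (2.1)-type system: x_t unit root and
    y_t = c + alpha * x_t + e_t with e_t = rho * e_{t-1} + u_t,
    where u_t is correlated with the increments of x_t."""
    rng = np.random.default_rng(seed)
    dx = rng.standard_normal(n)
    u = 0.3 * dx + rng.standard_normal(n)   # u_t correlated with dx_t
    x = np.cumsum(dx)
    e = np.empty(n)
    e[0] = u[0]
    for t in range(1, n):
        e[t] = rho * e[t - 1] + u[t]
    return x, c + alpha * x + e

n = 20000
x1, y1 = simulate_system(n, rho=0.9)   # cointegrated: e_t stationary AR(1)
x2, y2 = simulate_system(n, rho=1.0)   # no cointegration: e_t is a unit root
# The cointegrated residual has bounded variance; the unit root one grows with t.
print(np.var(y1 - 1.5 * x1), np.var(y2 - 1.5 * x2))
```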
As detailed in [Phillips and Ouliaris, 1990], it is important to mention that such a two-step procedure affects the limit distribution of the test statistic, so that testing for cointegration does not amount to directly testing for a unit root in ε t . In particular, and of practical relevance, the critical values are altered. This deviation from the unit root case stems from the inconsistency of the estimate of α (hence of the residuals ε t ) under the null of no cointegration.
In view of the above discussion, our first goal consists in introducing a model similar to (2.1) where now ∆x t and u t are replaced by increments of driftless Itô-semimartingales. We return to the case of drifting processes in a detailed discussion at the end of Section 4.2. We assume that we observe regularly (i.e. at times t 0 := 0, t 1 := ∆, · · · , t n := n∆ with ∆ := T /n) between 0 and the horizon time T (depending on n) two càdlàg (right continuous with left limits) processes X and Y . For any process A, we use the following conventions: A i := A t i for i ∈ {0, · · · , n} and ∆A i := A t i − A t i−1 for i ∈ {1, · · · , n}. In what follows, we assume that X and Y may be further decomposed as

X = X c + J X and Y = Y c + J Y ,

where X c and Y c are the continuous parts of X and Y , and J X and J Y are pure jump processes such that, for U ∈ {X, Y }, t ∈ [0, T ],

J U t = ∫ 0 t ∫ E δ U (s, z) 1 {|δ U (s,z)|≤1} (µ U − ν U )(ds, dz) + ∫ 0 t ∫ E δ U (s, z) 1 {|δ U (s,z)|>1} µ U (ds, dz),

where µ U is a Poisson random measure on R + × E for E some auxiliary Polish space, ν U is the compensator of µ U of the form ν U (ds, dz) = ds ⊗ λ U (dz) where λ U is a σ-finite measure, and where δ U is a predictable function on Ω × R + × E. Moreover, we assume that there exists r ∈ [0, 1) such that ∫ E (|δ U (s, z)| r ∧ 1) λ U (dz) is uniformly bounded. (2.6) In particular, although the jump processes may feature an infinite number of jumps on bounded time intervals, (2.6) ensures that the jumps are summable on [0, T ] and of order at most T , since it implies that

E ( Σ s≤T |∆J U s | ) ≤ K T

for some constant K ≥ 0.
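As an illustration of such an observation scheme, the following sketch simulates a driftless Itô-semimartingale on the regular grid t i = i∆, using a compound-Poisson jump component as a simple finite-activity stand-in for the more general jump processes allowed here (the grid size, jump rate and jump sizes are all hypothetical choices; the rate of about ten jumps per year echoes the figure cited above):

```python
import numpy as np

def sample_ito_with_jumps(n, T, sigma=1.0, jump_rate=10.0 / 252.0, seed=2):
    """Simulate X = X^c + J^X on the grid t_i = i * Delta with Delta = T/n:
    a Brownian part with spot volatility `sigma` plus a compound-Poisson
    jump part (illustrative stand-in for infinite-activity jumps)."""
    rng = np.random.default_rng(seed)
    delta = T / n
    dxc = sigma * np.sqrt(delta) * rng.standard_normal(n)   # Brownian increments
    njumps = rng.poisson(jump_rate * delta, size=n)         # jumps per interval
    dj = np.array([rng.normal(0.0, 0.5, k).sum() for k in njumps])
    return np.concatenate([[0.0], np.cumsum(dxc + dj)])

# One year of "daily-time" observations at a ten-minute-like grid density:
X = sample_ito_with_jumps(n=23400, T=252.0)
print(len(X), X[-1])
```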
We now assume that X and Y satisfy a relation of the same nature as (2.1). Assuming first that J X = J Y = 0, we naturally adapt (2.1) as follows. We assume that there exist c 0 , α 0 ∈ R such that for any i ∈ {1, · · · , n}, we have

Y c i = c 0 + α 0 X c i + ε i , with ε i = ρ ε i−1 + ∆Z i and ε 0 = 0, (2.8)

where ρ ∈ [0, 1] and may depend on n, and where X c and Z are two continuous Itô-martingales of the form

X c t = ∫ 0 t σ M s/T σ X s dW X s and Z t = ∫ 0 t σ M s/T σ Z s dW Z s ,

where σ X and σ Z are càdlàg adapted processes, and W X and W Z are Brownian motions featuring possibly non-trivial high frequency correlation d⟨W X , W Z ⟩ t = r t dt. Therefore, at time t ∈ [0, T ], and up to the multiplicative term (σ M t/T ) 2 , the two dimensional process (X, Z) features a squared volatility equal to

Σ 2 t = [ (σ X t ) 2 , r t σ X t σ Z t ; r t σ X t σ Z t , (σ Z t ) 2 ].

Finally, σ M , which is the common deterministic volatility component, is a càdlàg function from [0, 1] to R + − {0}. The component σ M can be interpreted as the market volatility (i.e. common to all the stocks), whereas σ X and σ Z correspond to the idiosyncratic components of volatility. For example, σ M can be a linear trend, while σ X and σ Z can each be the product of a daily U-shape and a random stochastic component such as a Heston model. Further examples are considered in our finite sample analysis.
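A toy version of this volatility decomposition, with an illustrative linear trend for σ M and an illustrative intraday U-shape for the idiosyncratic part (neither being the paper's specification), can be written as:

```python
import numpy as np

def spot_vol_squared(t_over_T, t_of_day, base=0.2):
    """Illustrative squared spot volatility: a deterministic market
    component sigma_M (linear trend in scaled time t/T, common to all
    assets) times an idiosyncratic intraday U-shape component."""
    sigma_m = 1.0 + 0.5 * t_over_T                     # common, possibly non ergodic
    u_shape = 1.0 + 0.8 * (2.0 * t_of_day - 1.0) ** 2  # periodic, ergodic
    return (sigma_m ** 2) * (base * u_shape) ** 2

# Mid-sample volatility at mid-day vs. at the open (the open is higher
# because of the U-shape):
print(spot_vol_squared(0.5, 0.5), spot_vol_squared(0.5, 0.0))
```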
Remark 2.1. The components σ X and σ Z may differ, will be assumed ergodic, and typically account for the long time regularities (e.g. seasonality) of X and Z. Having ergodic returns is in line with most of the literature on cointegration, see for instance [Phillips and Ouliaris, 1990] or the more recent work of [Perron and Rodríguez, 2016]. On the other hand, σ M encompasses possibly non ergodic trends in volatility and is assumed to be a common factor in X and Y . As far as we know, and as detailed in Section 4, if the non ergodic components differ between X and Y , then constructing a test statistic which is numerically reliable and whose distribution is identifiable under the null of no cointegration remains an open and difficult question that we set aside in this paper. Note that adding such a non ergodic component scaled in time from 0 to T is common practice in the literature on tests for unit root processes, as in [Cavaliere, 2005], [Cavaliere and Taylor, 2007] and, more recently, [Beare, 2018] among others. In our case, however, σ M is taken càdlàg, whereas earlier works on unit root processes assumed the function to satisfy a Lipschitz condition except for, at most, a finite number of points of discontinuity. Note also that, as mentioned in the aforementioned papers, the assumption that σ M is deterministic can easily be relaxed to σ M random and independent of the main filtration F. Then all the convergences can be taken conditionally on σ M . Finally, to our knowledge, there is no existing literature on a test of no cointegration which is robust to the presence of a common non ergodic volatility component.
Just as ∆X c i is the continuous-time counterpart of ∆x t , ∆Z i now plays the role of u t in (2.1). Note also that the presence of an intercept c 0 in the regression is just a convenient way to center the residual process without loss of generality. Moreover, just as ρ controls how close the residual is to a unit root process in (2.1), ρ now controls how close ε is to an Itô-martingale in (2.8), with the two extreme cases being ρ = 1, where ε i = Z i , and ρ = 0, where ε i = ∆Z i for i ∈ {1, . . . , n}. When 0 < ρ < 1, ε i lies somewhere between an Itô-martingale and an increment of an Itô-martingale. It is important to note that, by (2.8), Y necessarily depends on n, because ε does (unless ρ = 1).
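The interpolation role of ρ can be checked numerically; the sketch below implements the residual recursion ε i = ρ ε i−1 + ∆Z i (with ε 0 = 0) and verifies the two extreme cases:

```python
import numpy as np

def residual_path(dz, rho):
    """Residual recursion e_i = rho * e_{i-1} + dZ_i with e_0 = 0,
    interpolating between the martingale (rho = 1) and its
    increments (rho = 0)."""
    e = np.zeros(len(dz) + 1)
    for i in range(1, len(e)):
        e[i] = rho * e[i - 1] + dz[i - 1]
    return e[1:]

rng = np.random.default_rng(3)
dz = np.sqrt(1.0 / 1000) * rng.standard_normal(1000)  # increments of Z
z = np.cumsum(dz)
# rho = 1 recovers Z itself; rho = 0 recovers its increments:
print(np.allclose(residual_path(dz, 1.0), z), np.allclose(residual_path(dz, 0.0), dz))
```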
In general, when J X ≠ 0 or J Y ≠ 0, it would be natural to simply replace X c and Y c by X and Y in (2.8), but it turns out that imposing such a constraint on the jump processes would imply that ∆J Y = α 0 ∆J X . If the jumps are seen as structural breaks in the processes X and Y , this is a very strong assumption which would require substantial support from empirical data. More importantly, leaving J X and J Y free of constraints does not affect our strategy (and the related limit theory) for analyzing X and Y , which consists in first getting rid of the jump components using the truncation approach of [Mancini, 2009] and then working directly with the estimated continuous components. Accordingly, for the sake of generality, we keep (2.8) even in the presence of jumps. The cointegration relation between X and Y then yields, for i ∈ {1, . . . , n},

Y i = c 0 + α 0 X i + ε i + J Y i − α 0 J X i , (2.10)

which can be seen as cointegration (if ρ < 1) with multiple level shifts. Cointegration with breaks has been studied in [Gregory and Hansen, 1996] (see Model 2) in the case of a single shift, and extended to the case of an arbitrarily large (but known) number of deterministic breaks in [Maki, 2012]. In Equation (2.10), the shifts are ∆J Y s − α 0 ∆J X s for s ∈ [0, T ], which may be infinite in number, are of random sizes, and can feature endogeneity.
We now adapt the notion of no cointegration introduced in [Engle and Granger, 1987] and discussed above to the case of driftless Itô-semimartingales.
Definition 2.2. (no cointegration) Two càdlàg processes X and Y are said not to be cointegrated if any linear combination Y − αX, α ∈ R, is a driftless Itô-semimartingale whose volatility component σ is such that P−lim inf T →+∞ T −1 ∫ 0 T σ 2 s ds > 0.

Definition 2.2 is thus a straightforward adaptation where time series have been replaced by càdlàg processes and unit root processes have been replaced by driftless Itô-semimartingales with non-trivial volatility components, so that they are indeed explosive as T → +∞. Let us now get back to the model (2.8) and turn our attention to the description of the different settings of ρ and their impact on the relationship between X and Y . Following the time series framework, Y and X are not cointegrated if ρ = 1 and, of course, if for any x ∈ R 2 − {0}, P−lim inf T →+∞ T −1 x T ∫ 0 T Σ 2 s ds x > 0, since in that case (2.8) reads, for any i ∈ {1, . . . , n},

Y c i = c 0 + α 0 X c i + Z i .

We now turn our attention to the notion of cointegration. We follow again the time series case and say that X and Y satisfying (2.8) are cointegrated if ρ ∈ [0, 1) and ρ does not depend on n. In that case, note that this implies the existence of c 0 , α 0 ∈ R such that

Y c i = c 0 + α 0 X c i + ε i ,

where ε i is not the value of an Itô-martingale (and is of one order of magnitude smaller). Finally, we will also consider the intermediary situation where X and Y are weakly cointegrated, which corresponds to the case where ρ = 1 − β/n for some β > 0. Henceforth we will accordingly always assume that X and Y are generated according to one of the following settings: (i) cointegration, 0 ≤ ρ < 1 (independent of n); (ii) weak cointegration, ρ = 1 − β/n for some β > 0; (iii) no cointegration, ρ = 1.

Construction of the estimator
We now focus on estimating the couple (α 0 , c 0 ) based on the discrete observations of X and Y . Naturally, one can expect (α 0 , c 0 ) to be identifiable only when X and Y are cointegrated. Indeed, we will see that when ρ = 1 (i.e. in the no cointegration case), our proposed estimator is inconsistent. Accordingly, any estimation of the cointegration parameters is untrustworthy without performing a test of no cointegration, a question that we set aside in this section and treat in Section 4. In any case, constructing the estimator does not require any knowledge whatsoever about the cointegration level ρ ∈ [0, 1].
We first consider the case where J Y = J X = 0 and σ M = 1. We adapt the classical OLS estimator proposed in [Engle and Granger, 1987], resulting from (2.8) seen as a linear regression of Y c i on X c i with an intercept, where ε is the noise process. Recall that X and Z may be correlated, so that the regression model induced by (2.8) features endogeneity. In particular, this rules out the alternative regression based on the high-frequency returns of X and Y : since ∆Y c i , ∆X c i and ∆ε i are of the same order √∆, the OLS estimator based on (3.1) would be inconsistent due to the non-zero correlation between ∆ε i and ∆X c i . Conversely, the regression based on (2.8) is robust to endogeneity because X i and Y i are of order √T whereas, when ρ < 1, ε i remains of order √∆.
In general, X and Y contain jumps and a non-constant common volatility component. Accordingly, we now estimate (α 0 , c 0 ) by a three-step procedure consisting in first getting rid of those two features and then applying the aforementioned OLS estimation. At this point, in view of the representation (2.10), it seems natural to adapt the methodology of [Gregory and Hansen, 1996] and derive a modified OLS estimator which estimates and cancels the effect of the jumps seen as level shifts in (2.10). However, the time required to run the break-robust OLS estimation greatly increases with the number of breaks. [Maki, 2012] proposed an alternative and less time-consuming method, but it also presents several drawbacks. First, the number of breaks (or at least an upper bound k) must be known. Second, the limit distribution of the test statistic depends on k, meaning that it may be necessary to calculate critical values if k is larger than five, which is the highest value for which they have been reported. Finally, the method is supported by a numerical study only (again, for models with five breaks at most). Since we allow for a potentially high number of jumps in (2.10), both approaches are ill-adapted, which is why we henceforth adopt the truncation method of [Mancini, 2009]. As explained in the asymptotic theory section, it is independent of the number of shifts and robust to all the aforementioned features of the jumps, both for estimation and testing. Accordingly, we remove the increments of X and Y such that at least one of them is greater than a given threshold in absolute value. More precisely, for U ∈ {X, Y }, we compute the truncated process T (U ) = (T (U ) 0 , . . . , T (U ) n ) such that T (U ) 0 = U 0 and, for i ∈ {1, . . . , n},

T (U ) i = T (U ) i−1 + ∆U i 1 {|∆X i | ≤ a∆ ω , |∆Y i | ≤ a∆ ω } ,

for some constant a > 0 and some exponent ω ∈ (0, 1/2) satisfying additional constraints stated below.
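A minimal sketch of this joint truncation step, with illustrative values for the constant a and the exponent ω:

```python
import numpy as np

def truncate_pair(x, y, delta, a=3.0, omega=0.49):
    """Joint truncation sketch: drop every increment pair for which at
    least one of |dX_i|, |dY_i| exceeds a * delta**omega (a and omega
    are illustrative choices, with omega < 1/2 as required)."""
    dx, dy = np.diff(x), np.diff(y)
    thr = a * delta ** omega
    keep = (np.abs(dx) <= thr) & (np.abs(dy) <= thr)
    tx = x[0] + np.concatenate([[0.0], np.cumsum(np.where(keep, dx, 0.0))])
    ty = y[0] + np.concatenate([[0.0], np.cumsum(np.where(keep, dy, 0.0))])
    return tx, ty

rng = np.random.default_rng(5)
n = 1000
delta = 1.0 / n
dx = np.sqrt(delta) * rng.standard_normal(n)
dx[500] += 1.0                                 # inject a large jump in X
x = np.concatenate([[0.0], np.cumsum(dx)])
y = np.concatenate([[0.0], np.cumsum(np.sqrt(delta) * rng.standard_normal(n))])
tx, ty = truncate_pair(x, y, delta)
# No increment of the truncated process exceeds the threshold:
print(np.abs(np.diff(tx)).max() <= 3.0 * delta ** 0.49)
```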
Second, we deflate the returns of both truncated processes T (X) and T (Y ) by a consistent estimator √C i of σ M (t i /T )− , up to some multiplicative constant. The procedure is similar to what was proposed in [Beare, 2018] for the case of a unit root process. Hereafter, we take C i as the standard local realized volatility computed on the truncated returns of X (using the returns of Y would yield an estimator of σ M (t i /T )− up to a coefficient which depends on ρ). First, for two indices 0 ≤ l < k < i, i ∈ {1, . . . , n}, we define

C i,k,l = (1/((k − l)∆)) Σ j=i−k+1 i−l (∆T (X) j ) 2 , (3.2)

where a and ω were defined before. Next, we take k = ⌊T γ /∆⌋ and l = ⌊T γ l /∆⌋, where ⌊x⌋ is the floor of x, and 0 < γ l < γ < 1. The local window considered for (3.2) is thus such that the number of observations k − l → +∞ while, at the same time, the length of the window remains negligible with respect to T . The gap l between the window and the current index is introduced in order to preserve the martingale structure of some transformations of ε when ρ < 1 and thus circumvent some technical difficulties that arise in the proofs. We then define, for i ∈ {1, . . . , n},

C i = C max(i,k+1),k,l .

Given that l is negligible with respect to k, it does not affect the limit theory of C i . We then compute the deflated versions of T (X) and T (Y ), defined for U ∈ {X, Y } by def(U ) 0 = U 0 and, for i ∈ {1, . . . , n},

def(U ) i = def(U ) i−1 + ∆T (U ) i / √C i .

Key to our analysis is that both operations T and 'def' naturally preserve the cointegration relationship, in the sense that, for any i ∈ {1, . . . , n}, def(Y ) i − α 0 def(X) i depends only on the initial values and on the truncated, deflated residual increments, so that the linear relation with slope α 0 is maintained. Finally, for two processes A, B, their associated OLS estimator ( α(A, B), c(A, B)) is defined as

α(A, B) = Σ i (A i − Ā)(B i − B̄) / Σ i (A i − Ā) 2 , c(A, B) = B̄ − α(A, B) Ā, (3.4)

where Ā and B̄ denote the sample means of A i and B i , and we set ( α, c) := ( α(def(X), def(Y )), c(def(X), def(Y ))).

Remark 3.1. When J X = J Y = 0 and in the absence of truncation, ( α, c) is the OLS estimator of the linear transformation of X and Y in which their respective returns have been multiplied by the weights w i = C −1/2 i . This is similar (yet not equal) to the GLS of [Kim and Park, 2010], where the authors pre-estimate the time-varying variance of the noise by a standard OLS, and then construct the associated GLS cointegration estimator by putting similar weights directly in front of X i and Y i .
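Putting the steps together, here is a simplified end-to-end sketch (local realized variance with l = 0, illustrative window size and simulation design; it is meant to convey the mechanics, not the paper's exact tuning, and the simulated path is jump-free so the truncation step is omitted):

```python
import numpy as np

def local_rv(u, k):
    """C_i sketch: average of the k squared increments preceding t_i
    (a simplified local realized variance, with the smaller index l = 0)."""
    du2 = np.diff(u) ** 2
    c = np.full(len(u), np.nan)
    for i in range(k, len(u)):
        c[i] = du2[i - k:i].mean()
    return c

def deflate(u, c, k):
    """Rebuild a process from its returns scaled by 1/sqrt(C_i)."""
    ddef = np.diff(u)[k:] / np.sqrt(c[k:-1])
    return u[0] + np.concatenate([[0.0], np.cumsum(ddef)])

def ols(a, b):
    """OLS of b on (1, a): returns (intercept, slope)."""
    A = np.column_stack([np.ones_like(a), a])
    coef, *_ = np.linalg.lstsq(A, b, rcond=None)
    return coef

# Cointegrated pair with a common deterministic volatility trend sigma_M
# (all constants below are illustrative).
rng = np.random.default_rng(4)
n, T = 5000, 50.0
delta = T / n
t = np.linspace(0.0, T, n + 1)
sigma_m = 1.0 + t[:-1] / T                    # common volatility component
x = np.concatenate([[0.0],
                    np.cumsum(sigma_m * np.sqrt(delta) * rng.standard_normal(n))])
eps = np.sqrt(delta) * rng.standard_normal(n + 1)  # small stationary residual
y = 0.1 + 2.0 * x + eps                       # c_0 = 0.1, alpha_0 = 2
k = 70                                        # local window (hypothetical tuning)
c = local_rv(x, k)                            # estimates sigma_M^2 up to a constant
c_hat, alpha_hat = ols(deflate(x, c, k), deflate(y, c, k))
print(alpha_hat)  # close to 2
```

Because the same weights are applied to the returns of both processes, the slope α 0 is preserved by the deflation, which is what the assertion on alpha_hat checks.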

Assumptions and high-frequency framework
We now proceed to give an asymptotic framework along with reasonable conditions under which the OLS estimator introduced in (3.4) is consistent (assuming, of course, ρ < 1, i.e. cointegration). We also give a stronger setting on the jump processes which ensures a central limit theory for ( α, c). We will use the same framework when testing for no cointegration in the next section. Our first assumption specifies the high frequency asymptotics that is considered in this paper.
Assumption [A]: as n → +∞, ∆ → 0 and T → +∞.

Such a double asymptotic is consistent with the high-frequency context (∆ → 0) and the fact that cointegration is a long-run phenomenon (T → +∞). Next, we assume that the volatility matrix Σ has bounded moments up to some order p 0 and is ergodic.
Moreover, σ X is asymptotically bounded from below with probability 1, i.e. there exists a constant σ X > 0 such that P(inf t≤T σ X t ≥ σ X ) → 1.

Remark 3.2. The definition of ergodicity stated in [B] is quite flexible. For instance, it encompasses most combinations of stationary ergodic processes and periodic processes. The asymptotic boundedness of σ X away from 0 is assumed to avoid degenerate behaviors of the statistics due to the deflation operation. Similar long-run high-frequency asymptotics and ergodic settings can be found in the recent literature, see e.g. [Christensen et al., 2018] and [Andersen et al., 2019], where the volatility process is the product of a stationary mixing component and a periodic component. Finally, note that Σ t may be correlated with (W X , W Z ), so that the Brownian integrals feature leverage effect.
Now we turn to our third assumption, which states conditions on the truncation parameters, and an additional condition on the relationship between T and n.
Remark 3.3. In particular, the second condition in [C] implies that T must tend to infinity slowly enough compared to n. In the case where Σ t admits moments of any order (p 0 = +∞), we note that Condition [C] can be simplified as 1/(4 − r) < ω < 1/2 and, taking ω arbitrarily close to 1/2, the condition on T becomes T 1+η+1/(1−r) n −1 → 0 for η > 0 arbitrarily small. Finally, note that for jumps of finite activity (r = 0), this can be further simplified as T 2+η n −1 → 0, which is stronger than the condition ∆ → 0 stated in Assumption [A]. Of course, if J X = J Y = 0 and the truncation step is ignored in the estimators, all the stated results hold with e 1 = 0 in [C]. If moreover p 0 = +∞ then e 2 = 0 too and [C] reduces to ∆ → 0.
Finally, we state an additional, more restrictive assumption on the jumps, which we will use only to derive a central limit theorem under cointegration for (α̂, ĉ) (Theorem 3.5) and an equivalent of the Dickey-Fuller statistic under the alternative of cointegration (Proposition 4.3). It plays no role in the derivation of the consistency of the OLS and of the Dickey-Fuller test under either hypothesis.

Assumption [D]: J^X and J^Y are two sequences of jump processes such that, for

The above assumption states that the jump processes are asymptotically negligible in the regression and satisfy an additional cointegration condition. Hereafter, we will always implicitly assume that [A]-[C] hold. As for [D], we will explicitly state whether it is assumed or not.

Asymptotic theory of the OLS estimator under cointegration
We now give the asymptotic properties of C_i, and of the OLS estimator, under the cointegration regime ρ < 1. We start with C_i. Since 0 < γ < 1, the local time window T^γ → +∞, so that the ergodic theory for Σ kicks in, while at the same time the scaled time window T^γ/T → 0.

Proposition 3.4. For any i ∈ {1, . . . , n}, when n → +∞,

As proved in the Appendix (see Lemma 8.3), it turns out that the above L² convergence is even uniform outside a set of indices whose cardinality is negligible with respect to n. Full uniformity as in Lemma 4.1 of [Beare, 2018] is impossible here because σ^M may have jumps (whereas in the aforementioned paper σ^M is assumed differentiable).

We now focus on the asymptotic properties of (α̂, ĉ) when X and Y are cointegrated. In what follows, we let W = (W_1, W_2) be a standard Brownian motion on [0, 1]. Moreover, defining the matrix L of (3.6) with r_∞ = ω_12/√(ω_11 ω_22), we also let B = (B_1, B_2) = LW be a two-dimensional Brownian motion on [0, 1].

Theorem 3.5. Assume that X and Y are cointegrated, that is, ρ < 1 is fixed. Assume further that 1/2 ≤ γ < 1 and 0 < γ̲ < γ. Then we have

Moreover, under the additional assumption [D], we have the convergence in distribution

The fast rate n for the estimation of α_0 is in line with the literature on cointegration estimation; see for instance Proposition 1 of [Engle and Granger, 1987] and Theorem 7 in [Kim and Park, 2010]. Similarly, the fact that X^c_T and Y^c_T are of order T^{1/2} implies that one can consistently estimate only T^{−1/2}c_0, with the same rate n as for α_0. Note also that in the exogenous residual case ω_12 = 0, the limit distribution for α̂ corresponds to that of a classical OLS in a homoskedastic cointegration regression, as in Lemma 2.1 of [Phillips and Park, 1988], up to the mean terms W̄_1 due to the presence of the constant c_0 in our regression.
In other words, the jumps and the heteroskedasticity coming from σ^M do not impact the limit theory of α̂ and ĉ, and no efficiency is lost due to the truncation and the deflation. Finally, note that, as explained in the following remark, the above limit distribution is actually mixed normal.
Remark 3.6. Rewriting B as LW and conditioning on W_1, we can specify the above limit as the following mixed normal distribution. Defining

Moreover, when B_1 and B_2 are independent, r_∞ = 0, and the above limit becomes

Remark 3.6 suggests that it is possible to construct a studentized version of Theorem 3.5, which is essential for the computation of confidence intervals and significance tests. To do so, we need to estimate the different quantities appearing in the bias and the variance of the mixed normal distribution. We construct the estimated residuals and then estimate v, ρ and r_∞ as in (3.9)-(3.10). Note that, for the estimation of ρ, we have preferred the formula of (3.9) over the more classical alternative, as it is not theoretically clear whether the latter is consistent without Assumption [D]. We also substitute T(X)^def for all the quantities involving W̄_1 appearing in the central limit theorem. That is, introducing estimators for the asymptotic biases and variances of α̂ and ĉ, we obtain the following studentized version of the central limit theory for α̂ and ĉ.
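To fix ideas, the first-step machinery (truncation of the returns, deflation by a local realized volatility, OLS with intercept, and estimation of the residual autocorrelation) can be sketched in Python as follows. This is a simplified illustration, not the paper's exact procedure: the trailing-window form of the local variance, the floor value (echoing the lower bound C_i ∨ c used in the Appendix), and the lag-1 estimator of ρ are stand-ins for the precise quantities C_i, (3.4) and (3.9).

```python
import numpy as np

def truncate(dx, u):
    """Zero out returns larger than the threshold u (jump truncation)."""
    return np.where(np.abs(dx) <= u, dx, 0.0)

def local_variance(dx, k, floor):
    """Trailing average of squared (truncated) returns over a window of at
    most k returns; a stand-in for the local realized volatility C_i. The
    floor mirrors the lower bound C_i v c used in the Appendix."""
    csum = np.concatenate(([0.0], np.cumsum(dx ** 2)))
    C = np.empty(len(dx))
    for i in range(len(dx)):
        lo = max(0, i - k)
        C[i] = (csum[i] - csum[lo]) / max(i - lo, 1)
    return np.maximum(C, floor)

def deflated_ols(x, y, u, k, floor):
    """Truncate and deflate the returns of the observed levels x and y,
    cumulate them, and regress the deflated Y on the deflated X with an
    intercept. Also returns a lag-1 least-squares estimate of the residual
    autocorrelation (a generic stand-in for the paper's estimator (3.9))."""
    dx, dy = truncate(np.diff(x), u), truncate(np.diff(y), u)
    C = local_variance(dx, k, floor)
    Tx, Ty = np.cumsum(dx / np.sqrt(C)), np.cumsum(dy / np.sqrt(C))
    A = np.column_stack([np.ones_like(Tx), Tx])
    (c_hat, a_hat), *_ = np.linalg.lstsq(A, Ty, rcond=None)
    resid = Ty - c_hat - a_hat * Tx
    rho_hat = (resid[:-1] @ resid[1:]) / (resid[:-1] @ resid[:-1])
    return a_hat, c_hat, resid, rho_hat
```

Because the deflation rescales both series by the same √C_i, the cointegration slope is preserved, which is why the OLS on the deflated cumulated returns still targets α_0.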
Recall that both hypotheses induce respectively the following models on the continuous parts of X and Y:

and that the parameter ρ controls how far H_1 is from H_0. As is standard in the literature on tests for unit roots and cointegration (see e.g. [Pesavento, 2004], [Beare, 2018]), and useful to derive the local power of our test, we embed H_0 in the family of local alternatives H^{n,β}_1 defined as

which implies the following model on the continuous parts of X and Y, corresponding to the notion of weak cointegration introduced at the end of Section 2 when β > 0, and simply to H_0 when β = 0.

The canonical test in the unit root literature is the Dickey-Fuller test on residuals of [Dickey and Fuller, 1981]. It has been extended in many directions, including, among others, the augmented Dickey-Fuller (ADF) test, robust to residuals following an AR(p) specification, and the Z_t and Z_α tests of [Phillips, 1987], robust to autocorrelated returns under the null hypothesis of a unit root process. These tests were later adapted to cointegration, and their asymptotic properties derived in [Phillips and Ouliaris, 1990]. In the present work, we focus on the DF approach, performed on the estimated residuals resulting from the OLS estimation of (3.4). Before we state the main result of this section, we briefly recall the construction of the test statistic.
Recall that the estimated residuals are defined for i ∈ {1, . . . , n} as

The associated DF statistic Ψ is then the t-statistic of the coefficient φ in the linear regression

where the standard deviation of φ is estimated with

We now proceed to derive the asymptotic distribution of Ψ under H^{n,β}_1, for any β ≥ 0. In particular, recall that H_0 is covered by Theorem 4.2 below, since H_0 = H^{n,0}_1. We need to define a few quantities before stating the main result. As in the previous section, we consider W = (W_1, W_2), a standard Brownian motion on [0, 1], and we define the two-dimensional process on [0, 1], for β ≥ 0,

and finally

where we recall that, for a process (V_u)_{u∈[0,1]}, V̄ = ∫_0^1 V_u du. The next proposition shows that the OLS estimator (and therefore the associated estimated residual process) is inconsistent under H^{n,β}_1.
Then, under H^{n,β}_1, we have the joint convergences

We are now ready to state the limit distribution of Ψ under any local alternative H^{n,β}_1. Define

In particular, under H_0, we have

The next proposition gives the behavior of Ψ under H_1 (and proves the consistency of the test), and provides an equivalent of the statistic under the stronger assumption [D].
When β = 0, the limit distribution of Ψ in Theorem 4.2 is the same as that of the ADF statistic in Theorem 4.2 of [Phillips and Ouliaris, 1990], up to the mean component W̄ coming from the fact that an intercept is present in the regression. Therefore, under H_0, as in the previous section, the truncation and the deflation completely cancel the impact of jumps and of the non ergodic volatility σ^M that may otherwise affect Ψ. Moreover, in Proposition 4.3, the divergence rate of Ψ also corresponds to the standard one (see Theorem 5.1 in [Phillips and Ouliaris, 1990]). However, under the local alternative H^{n,β}_1 with β > 0, the limit of Ψ depends on the shape of σ^M, so that the local power of the test may be affected by a non ergodic volatility component. This feature was already present in the time-varying variance robust unit root tests of [Beare, 2018]. More importantly, without jumps and if σ^M = 1, a careful examination of Theorem 1 in [Pesavento, 2004] shows that the limit distributions of the standard DF test and of our modified test coincide for any β ≥ 0: no local power is lost by applying the truncation and the deflation even in the absence of those features. Finally, as a direct corollary of Theorem 4.2 and Proposition 4.3, we conclude this section with the consistency of the modified DF test.
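Up to the deflation step, the statistic Ψ recalled above is the textbook t-statistic of the no-intercept regression of the differenced residuals on their lagged level. A minimal sketch (the textbook construction, not the paper's exact display):

```python
import numpy as np

def dickey_fuller_stat(e):
    """t-statistic of phi in the no-intercept DF regression
    (e_i - e_{i-1}) = phi * e_{i-1} + u_i, applied to residuals e."""
    lag, diff = e[:-1], np.diff(e)
    phi = (lag @ diff) / (lag @ lag)
    u = diff - phi * lag
    s2 = (u @ u) / (len(u) - 1)       # variance of the regression errors
    se = np.sqrt(s2 / (lag @ lag))    # standard error of phi
    return phi / se
```

Under the null, the residuals behave like a random walk and the statistic stays moderate; under cointegration it diverges to −∞, which is exactly what the test exploits.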

Testing for cointegration with drifting Itô-semimartingales
We now examine how the testing procedure can be adapted when the processes X^c and Z feature drift terms. We only partially address the problem, and restrict ourselves to the simple case of linear trends. Dealing simultaneously with general drifts, even ergodic ones, and a non ergodic volatility component is a difficult matter (at least to us) that we set aside in this work. As a matter of fact, we show in this section that even with linear drifts, a natural adaptation of our DF statistic already yields a complex limit distribution that depends on the curve u → σ^M_u even under the null hypothesis (so that critical values must be estimated every time the test is run). The new model for X and Y is

The testing procedure can be modified as follows to be drift robust. We first truncate the returns, then detrend the processes by subtracting from each return the quantity D(U) = n^{−1}(T(U)_n − T(U)_0), and finally deflate each truncated and detrended return by √C_i. This yields a new deflated process, to which we apply the testing procedure of the previous section. We denote by Ψ̃ the associated statistic. In the following theorem, for a process (V_u)_{u∈[0,1]}, we define (Ṽ_u)_{u∈[0,1]} such that, for u ∈ [0, 1],

whenever the integrals make sense.
In particular, note that even in the case of a constant drift, the limit distribution of Ψ̃ now depends on σ^M even under H_0. Therefore, one needs to estimate the curve of σ^M and then compute the related critical values, for instance by Monte Carlo simulation. This limits the practical applicability of the above procedure, and also indicates that dealing with a drift and time-varying volatility at the same time is a delicate matter.
Since the drift has a negligible impact in our numerical studies when it is calibrated to values usually encountered in empirical data, it seems more reasonable to use the simpler statistic Ψ, whose critical values are known and independent of the model at hand.

Finite sample
In this section, we conduct a Monte Carlo experiment in two steps. First, we verify that the deflated and truncated OLS method for estimating the cointegrating relations performs reasonably well, and that it outperforms the classical OLS procedure in a general model incorporating all the features of high frequency data when there is cointegration. Closely related to the estimation of the cointegrating relations is the estimation of the autocorrelation of the residuals' level, which we also examine. Second, we study the size and power properties of the modified Dickey-Fuller residual based test for the null of no cointegration against the alternative of cointegration. In addition, we explore how the modified test performs relative to four standard residual based tests from the cointegration literature in a variety of models, and identify in the presence of which features the new test outperforms the standard procedures.

Setup
Overall, eight different models are generated. An overview is reported in Table 1. One model (Model 8) is general and includes all the aforementioned features of high frequency data, whereas each remaining model (Models 1-7) includes one specific feature. We simulate M = 1,000 Monte Carlo paths of high-frequency returns, where each path consists of T = 2 years of generated returns. A year is divided into 252 working days, each of them set to 6.5 hours of trading activity, i.e. 23,400 seconds. Each path is simulated via an Euler scheme with step set to 10 seconds.

Sampling gap
In accordance with our empirical examples, we consider gaps between two observations ∆ ranging from 10 minutes, i.e. 600 seconds, which sets the number of observations to n = 19,656 across the two simulated years, to 2 days, yielding n = 252 observations. With one observation every 10 minutes, we are far enough into the high frequency regime for the limit theory related to the truncation method to kick in, but not so far that market microstructure effects become a serious concern. When the gap is two days, the setting is purely low frequency.
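The observation counts quoted above follow directly from the calendar conventions (252 working days of 23,400 trading seconds each, over T = 2 years); a quick check:

```python
SECONDS_PER_DAY = 6.5 * 3600       # 23,400 trading seconds per day
TOTAL_DAYS = 2 * 252               # two years of 252 working days each

def n_observations(gap_seconds):
    """Number of regularly spaced observations over the two-year sample."""
    return int(TOTAL_DAYS * SECONDS_PER_DAY // gap_seconds)

print(n_observations(600))                  # 10-minute gap -> 19656
print(n_observations(2 * SECONDS_PER_DAY))  # two-day gap   -> 252
```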

Simulation mechanism
We simulate X^c_t and Z_t as

where the correlation between W^X_t and W^Z_t is set to ρ = 0.2, i.e. d⟨W^X, W^Z⟩_t = ρ dt. Here, contrary to the theoretical setting in (2.9), the two processes can incorporate non-zero drifts, which are set to

are two independent Brownian motions. Depending on the model at hand, the market volatility can be constant, linear, or include one jump, and may respectively take the following forms:

where we fix σ̄ = √0.1. For V ∈ {X, Z}, the idiosyncratic component of the volatility is split into a U-shaped intraday seasonality component and a Heston model with jumps, specified as

with C = 0.75, A = 0.25, D = 0.89, a = 10, c = 10. The volatility jump process is defined as

where the volatility jump magnitude M^{σ,V}_t is distributed as N(0.5, 0.1), the signs of the jumps S^{σ,V}_t = ±1 are i.i.d. symmetric, N^{σ,V}_t is a homogeneous Poisson process with parameter λ̄ = 10T/252 (with that setting, volatility jumps occur randomly on average ten times a year), α = 5, σ̄² = 1, δ = 0.4, the W̃^V_t are independent standard Brownian motions such that d⟨W^V, W̃^V⟩_t = φ dt with φ = −0.75, and (σ^V_{0,SV})² is sampled from a Gamma distribution with parameters (2ασ̄²/δ², δ²/(2α)), which corresponds to the stationary distribution of the CIR process. For more details about the model, one can consult [Clinet and Potiron, 2018]. The model is directly inspired from [Andersen et al., 2012] and [Aït-Sahalia and Xiu, 2016].
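A rough sketch of the idiosyncratic volatility generator (a CIR variance with symmetric volatility jumps, multiplied by a squared U-shaped intraday factor) is given below, using the stated parameters where available. The exact functional form of the U-shape, the per-year jump intensity, and the clamping of the variance at zero are assumptions of this sketch, and the scheme is a simplified Euler discretization rather than the paper's exact simulation code.

```python
import numpy as np

def u_shape(tbar, C=0.75, A=0.25, D=0.89, a=10.0, c=10.0):
    """U-shaped intraday seasonality factor, tbar in [0, 1) within the day
    (assumed functional form, common in the literature)."""
    return C + A * np.exp(-a * tbar) + D * np.exp(-c * (1.0 - tbar))

def simulate_idio_vol(n_days=2, steps_per_day=2340, seed=0,
                      alpha=5.0, theta=1.0, delta=0.4,
                      jumps_per_year=10.0, jump_mean=0.5, jump_sd=0.1):
    """Euler scheme for a CIR variance with symmetric volatility jumps,
    multiplied by the squared U-shaped factor; 10-second steps by default."""
    rng = np.random.default_rng(seed)
    dt = 1.0 / (252 * steps_per_day)            # time unit: one year
    n = n_days * steps_per_day
    # stationary start: Gamma(2*alpha*theta/delta^2, delta^2/(2*alpha))
    v = rng.gamma(2 * alpha * theta / delta ** 2, delta ** 2 / (2 * alpha))
    sigma2 = np.empty(n)
    for i in range(n):
        dw = rng.normal(0.0, np.sqrt(dt))
        v = abs(v + alpha * (theta - v) * dt + delta * np.sqrt(v) * dw)
        if rng.random() < jumps_per_year * dt:  # Poisson volatility jump
            jump = rng.choice([-1.0, 1.0]) * rng.normal(jump_mean, jump_sd)
            v = max(v + jump, 1e-8)
        sigma2[i] = v
    tbar = (np.arange(n) % steps_per_day) / steps_per_day
    return u_shape(tbar) ** 2 * sigma2
```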
In addition, for V ∈ {X, Y}, the price jumps are generated via

where the price jump magnitude M^V_t is distributed as N(σ̄/√10, σ̄/10^{3/2}), the signs of the jumps S^V_t = ±1 are i.i.d. symmetric, and N^V_t is a homogeneous Poisson process with parameter λ̄ = 10T/252 (with that setting, jumps occur on average 10 times a year and the contribution of jumps to the total quadratic variation of the price process is around 50%, both of which are roughly in line with the empirical findings of [Huang and Tauchen, 2005]).
Finally, the parameter governing the autocorrelation of the residuals' level introduced in (2.8) is naturally set to ρ = 1 in the case of no cointegration (i.e. under the null hypothesis), and chosen equal to ρ = 0.8, 0.9 when there is cointegration (i.e. under the alternative).

Concurrent methods
We implement four concurrent leading methods, all of which have already been mentioned: the DF test and the ADF test, and the Phillips-Perron tests Z α and Z τ , which are tuned to cointegration in [Phillips and Ouliaris, 1990].

Remaining tuning parameters
Tuning parameters used for the truncation are set equal to those of the numerical study (Section 5, p. 301) in [Clinet and Potiron, 2019]. Also, the parameters related to the deflation are set to γ = √n and γ̲ = 0.01.

Table 2 reports the bias and standard deviation of the two parameters of the cointegrating relation for the modified OLS and the standard OLS, in a general model with cointegration. It is clear that the standard OLS is severely biased at any level of subsampling. On the contrary, the modified OLS works well when the subsampling frequency is high enough, but is equally biased when the frequency decreases. This is due to the truncation method, which performs more poorly as the frequency decreases. Thus, a limitation of our method in the presence of a price jump component is that it requires sampling at reasonably high frequencies, i.e. up to one hour.

Table 3 reports the estimated autocorrelation of the residuals' level ρ. The corresponding signature plots can be found in Figure 1. The standard estimator is off, notably in the presence of cointegration (i.e. ρ < 1). The modified estimator is quite reliable when subsampling up to one hour, but unreliable at lower frequencies. The standard and adapted DF tests can both be seen as testing ρ = 1, based respectively on the standard and on the modified residuals.

Validity of modified DF tests
We now turn to the size and power of the tests. Tables 4-11 report the size and power of the tests for a variety of models. Table 4 reports the size and power of the modified DF test and of the concurrent methods in a pure time series environment, i.e. with no high frequency data feature and constant volatility. Evidently, the DF test, which was designed for such an environment, performs best, but there is no substantial difference between the size and power of the modified DF test and those of the concurrent methods. This seems to indicate that the deflation and truncation do not degrade much the behavior of the statistic in this basic framework. Although we would not recommend using our more sophisticated test in this setup, it is reassuring to see that the modified test remains broadly in line with the alternative methods.
Tables 5-6 report the size and power of the modified DF test and of the alternative methods when there is, respectively, a linear trend and one break in the market volatility. It is clear that the sizes of the concurrent methods are distorted when the market volatility is non constant. Conversely, the sizes of the modified DF test are satisfactory at any level of sampling and in both configurations. The powers of all the methods are not affected. This indicates that the deflation provides a real advantage in practice when the market volatility is non constant.

Tables 7-8 report the size and power properties in the presence of, respectively, a U-shaped seasonality and jumps in the idiosyncratic component of the volatility. Neither configuration seems to affect the size and power properties. This is not surprising, as our ergodicity assumption on the idiosyncratic part of the volatility falls within the setting of [Phillips, 1987].

Table 9 reports the size and power properties of the modified DF test and of the concurrent methods in the presence of a drift. This is particularly important since our theoretical setup does not accommodate a non-zero drift. Compared to Table 4, there is no visible difference between the results obtained with or without drift. In fact, we have generated models including drifts with parameter values ten times as big as the standard values found in the financial literature, and still there was no perceptible effect on the statistics. Our conclusion is that the drift does not seem to affect the size and power properties, at least in the range of parameter values encountered on real stock data. Accordingly, we believe that it is reasonably safe to use our tests on stock data featuring a drift.

Table 10 reports the statistical properties in the case of jumps in the price process. We can see that the power of the concurrent methods is distorted when the price features jumps.
In the case of the modified DF test, the power is adequate when the sampling frequency is high enough, but not when the frequency decreases. This is to be expected with the truncation method, and it is definitely a limitation of our approach. Nonetheless, we can see that the truncation is beneficial to whoever implements standard residual based tests for no cointegration with high frequency data.
Finally, Table 11 is concerned with a general model featuring all the aforementioned high frequency features. For the most part, the idiosyncratic effects add up, although the sizes of the concurrent methods are somewhat less badly impacted than in the pure non constant market volatility case.

Empirical examples
We illustrate our methodology by studying two empirical examples in which, in particular, the modified test results deviate from those of the standard tests. The two pairs of stocks considered are Action Construction Equipment Limited (ACE) - Alexion Pharmaceuticals (ALXN) and CMS Energy Corporation (CMS) - Eversource Energy (ES), all of which traded on the S&P500. In line with our numerical study, we consider a two-year-long period, i.e. 2012-2013, and subsample at frequencies ranging from ten minutes to two days to conduct the tests.

Table 12 reports the test results for the first pair. The corresponding signature plot of the estimated autocorrelation of the residuals' level can be found in Figure 2. The modified DF test rejects the null of no cointegration at the highest frequencies, with an estimated autocorrelation level around 0.90. On the contrary, the concurrent tests do not reject the null hypothesis. This echoes the results of Table 11 in the case ρ = 0.9. It seems that there is cointegration, and that, due to price jumps, the alternative tests do not reject the null hypothesis. As in the numerical study, the test results related to the modified DF test are unstable when the subsampling gap is two hours or larger. The signature plot in Figure 2 is also a replica of that in Figure 4 related to the case ρ = 0.9, and corroborates the aforementioned analysis.

Table 13 reports the test results for the second pair. The related signature plot of the estimated autocorrelation of the residuals' level is available in Figure 3. This case is the reverse of the previous one. The modified DF test does not reject the null of no cointegration at the highest frequencies, while the concurrent tests do reject the null hypothesis. For this particular pair of stocks, the results are to be compared with the size results in Table 6. It seems that we should trust the modified DF test, which indicates no cointegration, whereas the concurrent tests are distorted by the time-varying market volatility.
Here again, the test results related to the modified DF test are unstable at lower subsampling frequencies.

Final remarks
We have explored the challenges posed by the use of cointegration methods with high frequency data. In terms of theoretical contribution, we have adapted the problem to the in-fill asymptotics case. We have provided a modified OLS estimator of the cointegrating relations when there is cointegration, together with its central limit theory. We have also developed a DF test robust to (non ergodic) time-varying volatility and price jumps, along with its limit theory.
In terms of applied contribution, we have seen in finite sample that some of the concurrent residual based methods for testing no cointegration are not reliable when the model accommodates high frequency features, whereas our modified DF test showed adequate size and reasonable power. Two empirical examples corroborated the fact that the modified DF test and the standard tests can disagree in practice.

Notation
For the sake of clarity, most quantities (T, ∆, C_i, . . . ) introduced in the main body of the paper which depend on n are explicitly indexed by n (T_n, ∆_n, C_{i,n}, . . . ) to avoid confusion. In addition to (2.2)-(2.4), for a process A and i ∈ {1, . . . , n}, t ∈ [0, T_n], we also often introduce the notation

When ρ < 1, (2.8) defines the residual process only at the discrete times t_0, . . . , t_n. For the proofs, it will be more convenient to embed those discrete observations in a process on [0, T_n] as follows. For t ∈ [0, T_n], we let

One immediately checks that, for i ∈ {1, . . . , n}, the value at t_i coincides with (2.8). Moreover, since all the estimated quantities are based on the discrete observations only, there is no loss of generality in assuming that the residual process is defined as above.
For a càdlàg function f on [0, 1], we write w_f (or simply w when there is no ambiguity) for its associated modulus of continuity, as defined in (12.6), p. 122, of [Billingsley, 1999]. We will also often deal with convergence of sequences of càdlàg processes X^n from [0, 1] to R^k, k ∈ N − {0}. Accordingly, X^n →^{u.c.p.} X means sup_{u∈[0,1]} |X^n_u − X_u| →^P 0, and X^n →^d X is the weak convergence with respect to the associated Skorohod topology of the Skorohod space D_{R^k}[0, 1] (we also use →^d for the convergence in distribution of simple random variables). By Proposition VI.1.17 of [Jacod and Shiryaev, 2003] and Theorem 2.7 of [Billingsley, 1999], note that if X^n →^d X and the limit X is continuous, then for any mapping f on D_{R^k}[0, 1] which is continuous with respect to the uniform topology, f(X^n) →^d f(X). For instance, if X^n →^d X and X is continuous, then for any s ∈ [0, 1], X^n_s →^d X_s, and also ∫_0^1 X^n_s ds →^d ∫_0^1 X_s ds. In the proofs, we will often apply this to several mappings which are clearly continuous for the uniform norm. When we do so, we will simply say "by the continuous mapping theorem. . . " with no further reference to this discussion.
Hereafter, K stands for a positive constant which does not depend on n or any other index but may vary from one line to the next. Finally, for an event E, E c stands for the complementary event.

Estimates and preliminary lemmas
We recall that, under [B], for any q ≤ 2p_0 and U ∈ {X^c, Z}, we have

as a consequence of the Burkholder-Davis-Gundy inequality. We now proceed to derive a few useful estimates for the jump increments.
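The omitted display, presumably (8.1), should be a moment bound of the following standard form; this is a reconstruction consistent with the Burkholder-Davis-Gundy inequality under the moment condition in [B], not a verbatim quote of the original:

```latex
\mathbb{E}\Big[\sup_{s \le t \le s+h} |U_t - U_s|^q\Big] \le K\, h^{q/2},
\qquad U \in \{X^c, Z\}, \quad q \le 2p_0 .
```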
Moreover, assume r > 0. Then, for q ≥ r, there exists K > 0, which does not depend on i ∈ {1, . . . , n}, such that

When r = 0, (8.3) and (8.4) remain true upon replacing r in the right-hand sides by any positive number arbitrarily close to 0.
Proof. Assume r > 0. The estimate (8.2) is a direct consequence of (2.

so that taking the supremum over [0, 1] in u, applying expectation on both sides and applying (8.2) with p = r, s = a∆^ω_n yields the claimed result. Now we show (8.4) for U = X. First, note that (8.8)

Now we deal with II_u. Since we derive separate estimates for

and

where the last estimate is a consequence of ω < 1/2 − 3/(2p_0) ≤ (p_0 − 1)/(2p_0 − r) by Assumption [C], and where we have applied Hölder's inequality at the second step. In view of (8.7), (8.8), (8.11) and (8.12), and using the fact that pq ≥ 1, (8.4) for U = X readily follows. The case U = Y is similar, using that under all the alternatives E|∆_{i,uT_n}|^{2p_0} < K∆^{p_0}_n for any i ∈ {1, . . . , n}. Finally, if r = 0, then note that Condition (2.6) is satisfied for any r > 0 arbitrarily close to 0, hence the claimed result.

We now devote the next three lemmas to proving the convergence and the uniform boundedness away from 0 of the local realized volatility used for the deflation. First, we need a technical lemma for the càdlàg function σ^M.
Lemma 8.2. Let u_n ≥ 0 with u_n → 0. For i ∈ {1, . . . , n}, define

Then there exists A_n ⊂ {1, . . . , n} such that #A_n = o(n) and

where w is the modulus of continuity of (σ^M)² introduced in the notation section, and moreover #B_η ≤ N_η u_n ∆^{−1}_n, where N_η is the number of jumps of (σ^M)² of size at least η. Let a_n > 0 be such that a_n u_n → 0 and a_n → +∞. Since N_η is left continuous in η, if we set η_n = sup{η ≥ 0 | N_η ≥ a_n} + 1/n, then N_{η_n} ≤ a_n, and it is easy to see that η_n must be finite, since N_η = 0 as soon as η is larger than the greatest jump of (σ^M)²; moreover η_n ↓ 0, because otherwise (σ^M)² would have an infinite number of jumps of size larger than some η_∞ > 0. Therefore, setting A_n = B_{η_n}, we get #A_n ≤ a_n u_n ∆^{−1}_n = o(n/T_n) = o(n) and

Now we prove the uniform consistency of C_{i,n} outside of the set A_n, whose cardinality is negligible with respect to n. The following lemma is a stronger version of Proposition 3.4.

Lemma 8.3. Let γ̲ < γ ∈ (0, 1). Let k_n = [T^γ_n ∆^{−1}_n] and l_n = [T^{γ̲}_n ∆^{−1}_n], and finally Ā_n = A_n ∪ {1, . . . , 2k_n}, where A_n is as in Lemma 8.2. Then, uniformly in i ∈ {1, . . . , n} − Ā_n,

Proof. The proof is conducted in two steps.
where we have applied Jensen's inequality at the second step and Lemma 8.2 with u_n = T^{γ−1}_n at the last step. Finally, by [B], we immediately deduce that E|Ā_{i,k_n} − (σ^M_{t_i−/T})² T^γ_n ω_{11}|² ≤ KT^{2γ}_n o(1) = o(T^{2γ}_n). Combining all those inequalities, we get that for any i ∈ {1, . . . , n} − Ā_n,

for some sequence a_n → 0, and we are done.
Next, we prove that C i,n are uniformly bounded from below in probability, which will allow us to greatly simplify the subsequent proofs.
Proof. The proof is conducted in two steps.
Step 1. Let d n = [∆ −2ω n ]. We prove that min j∈{1,1+dn,...,1+[(n−1)/dn]dn} C j,n − min i∈{1,...,n} C i,n → P 0. (8.14) Indeed, note that for 2k n + 1 ≤ j ≤ k ≤ n, and since l n = o(k n ), we immediately deduce that then so that, since d n ≤ k n , we have almost surely for k and j larger than 2k n + 1 max |k−j|≤dn |C k,n − C j,n | ≤ KT −γ n . (8.16) Now, let i n be the random index such that C i,n is minimal. There exists j n of the form 1 + kd n such that |j n − i n | ≤ d n and clearly i n and j n are larger than 2k n + 1. Then (8.14) can be rewritten as min j∈{1,1+dn,...,1+[(n−1)/dn]dn} C j,n − C in,n ≤ C jn,n − C in,n ≤ KT −γ n → P 0, by (8.16) and we are done.
Step 2. In view of Step 1 and since C i,n = +∞ if i ≤ 2k n , we only need to prove that the claimed result holds when the minimum is taken over the subset B n = {1, 1+d n , . . . , 1+[(n−1)/d n ]d n }−{1, . . . , 2k n }. Moreover, letting E n = {inf t≥t kn (σ X t ) 2 > 2c/ min u∈[0,1] (σ M u ) 2 } with c as in the lemma, by [B] we have P(E n ) → 1, so that it is sufficient to prove P min i∈Bn C i,n < c ∩ E n → 0.
We have , and b = RV j,kn,ln − RV j,kn,ln where RV j,kn,ln was defined in (8.13), we get Application of Burkholder-Davis-Gundy inequality and (8.1) yields by Assumption [C]. Moreover, application of (8.4) for p = q = 1 yields for r > 0 . Similar reasoning yields II → 0 for r = 0.
In view of the previous lemma and a standard localization argument, from now on we will always prove the convergences in probability and in distribution assuming that we are on the event {min i∈{1,...,n} C i,n ≥ c}, which is asymptotically of probability 1. This amounts to assuming without loss of generality that C i,n is bounded away from 0 uniformly in i: [H] There exists c > 0 such that for any i ∈ {2k n + 1, . . . , n}, C i,n = (T −γ n RV i,kn,ln ) ∨ c, where x ∨ y is the maximum of x and y.
We now proceed to show that X, Z and , when properly scaled (both in space and time) and seen as càdlàg processes on [0, 1], converge in distribution. Let X c,def,n , Z def,n , and be the processes such that for u ∈ [0, 1], we have Moreover, let def,n be the process such that where for a process V on [0, 1], we write ∆V t i Tn ,u := V t i Tn ∧u − V t i−1 Tn ∧u . We also naturally define the scaled process Y c,def,n as T (X) def,n n,u (8.21) and We first prove in the following Lemma that T (X) def,n (resp. T (Y ) def,n ) is well approximated by its continuous counterpart X c,def,n (resp. Y c,def,n ).
Proof. We prove the convergence for X and r > 0. First, recall that C^{−1/2}_{i,n} < c^{−1/2} < +∞, so that

and applying (8.4) with p = q = 1 yields

by Assumption [C]. The convergence for Y and for the case r = 0 can be proven in the same way.
We now show that under the local alternative H n,β 1 , the process n u u∈[0,1] converges in distribution toward a limit which depends on the matrix and on B = LW , where we recall that W is a two dimensional standard Brownian motion on [0, 1], and L was defined in (3.6).
Lemma 8.6. Under H^{n,β}_1, we have the convergence in distribution with respect to the Skorokhod topology.
Proof. We first prove the functional convergence in distribution for (U n 1 , U n 2 , G n ), where G n is the process such that for u ∈ [0, 1], G n u = ρ −un n n u = T −1/2 n Z 0 + n j=1 ρ n t j Tn n ∆Z j,uTn , which is a martingale.
We apply Corollary VIII.3.24 p. 476 from [Jacod and Shiryaev, 2003] to the continuous martingale U n . We need to prove for any u ∈ [0, 1] We first prove (8.23). Note that since σ M t−/Tn = σ M t/Tn Lebesgue almost everywhere, and where C .,n is the càdlàg piecewise constant process on [0, 1] such that C t i /Tn,n = C i,n . Now, we have for u ∈ [0, 1]: because on the one hand, as #A n = o(n), we have by application of (8.1), and on the other hand, using Cauchy-Schwarz inequality along with Lemma 8.3, we have Finally, by [B], ω −1 11 T −1 n uTn 0 (σ X s ) 2 ds → P u, and this proves (8.23) for [U n 1 , U n 1 ]. The other components are proved similarly. Now we prove (8.24). Introducingρ n = e −β/n , we have for some constant L > 0 and for n large enough, and for any u ∈ [0, 1], |ρ 2un n −ρ 2un n | ≤ L/n. Moreover, recall that G n u = ρ −un This yields where we have used that the above integrals are all o(T −1/2 n ) except for a number of indices which is negligible with respect to [ √ uT n ], by the same argument as for Lemma 8.2 along with the fact that e 2β. (σ M ) 2 is càdlàg . Moreover, by a Riemann sum argument we also have by the triangle inequality and the ergodicity assumption [B], since T n → +∞, which concludes the proof of (8.24). (8.25) and (8.26) are proved similarly. This gives the functional convergence in distribution for (U n 1 , U n 2 , G n ). Finally, since for any u ∈ [0, 1], U n u,3 = ρ un n G n u and ρ un n → u.c.p e −βu , we get by the continuous mapping theorem the convergence in distribution of U n toward the desired limit. Finally, under H 1 , U n 1 and U n 2 have the same distribution as under H n,β 1 so that the convergence of the subcomponent (U n 1 , U n 2 ) remains true.
Finally, we prove that several Riemann sums are uniformly convergent.
Lemma 8.7. We have (8.27)–(8.30).

Proof. The first convergence follows by Lemma 8.3 (separating the cases $i \in A_n$ and $i \notin A_n$), since outside $A_n$ we have $C_{i,n} \ge c > 0$. Moreover, we also control the corresponding $\mathbb{E}\sup$ term, separating again the cases $i \in A_n$ and $i \notin A_n$ and applying Lemma 8.2 with $u_n = \Delta_n$, along with the fact that, under $H_1^{n,\beta}$, the remaining term tends to $0$, and we are done. Finally, (8.30) is proved similarly.
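The uniform Riemann-sum convergences of Lemma 8.7 have a simple deterministic counterpart: for a smooth integrand, the supremum over $u$ of the gap between the Riemann sum up to $\lfloor un \rfloor$ and the integral $\int_0^u f(s)\,ds$ is $O(1/n)$. A toy check with an illustrative integrand (not one of the paper's processes):

```python
import numpy as np

# Uniform convergence of Riemann sums: for a smooth integrand f,
# sup_u | (1/n) * sum_{i <= un} f(i/n) - int_0^u f(s) ds | = O(1/n).
f = lambda s: s ** 2          # illustrative integrand
F = lambda u: u ** 3 / 3.0    # its antiderivative

n = 10_000
grid = np.arange(1, n + 1) / n
partial_sums = np.cumsum(f(grid)) / n          # Riemann sums at u = i/n
sup_err = np.max(np.abs(partial_sums - F(grid)))
```

Here `sup_err` is bounded by roughly `f(1) / (2 * n)`, i.e. about `5e-5` for this choice of `n`.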
8.3 Proof of Theorem 4.2 and Theorem 4.5

We first derive the limit distribution of the test statistic under any local alternative $H_1^{n,\beta}$.
Lemma 8.8. Under $H_1^{n,\beta}$, and jointly with $U^n$, we have the convergence in distribution.

Proof. Simple algebraic manipulations, combined with Lemma 8.7 and another application of the continuous mapping theorem, yield the stated convergences, where, for a process $(V_u)_{u \in [0,1]}$, we write $\bar{V} = \int_0^1 V_u \, du$, all the above convergences being implicitly obtained jointly with $U^n$. Let us now define the scaled estimated residual process $\hat{r}^n$ as the càdlàg process interpolating the estimated residuals for $u \in [0,1]$, where $\lambda = (r_\infty/\sqrt{1 - r_\infty^2},\, 1)^T$ and $r_\infty = \omega_{12}/\sqrt{\omega_{11}\omega_{22}}$. An easy calculation gives the relation $F(\beta) = LH(\beta)$, so that the limit of $\hat{r}^n$ is actually equal to the corresponding functional. Therefore, letting $Q(\beta) = \kappa(\beta)^T H(\beta)$, we deduce from the above results and the continuous mapping theorem the announced convergence.
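The local alternative $H_1^{n,\beta}$ involves a near-unit-root autoregression with $\rho_n = e^{-\beta/n}$, whose scaled trajectory behaves like an Ornstein–Uhlenbeck process; in particular, the variance of the scaled endpoint approaches $(1 - e^{-2\beta})/(2\beta)$. A Monte Carlo sketch under illustrative Gaussian innovations (the parameter values and variable names are assumptions of the example, not of the paper):

```python
import numpy as np

# Local-to-unity AR(1): x_j = rho_n * x_{j-1} + eps_j with rho_n = exp(-beta/n).
# After scaling by n^{-1/2}, the endpoint behaves like an Ornstein-Uhlenbeck
# process at time 1, whose variance is (1 - exp(-2*beta)) / (2*beta).
rng = np.random.default_rng(1)
beta, n, paths = 2.0, 400, 5000
rho_n = np.exp(-beta / n)

eps = rng.standard_normal((paths, n))
x = np.zeros(paths)
for j in range(n):
    x = rho_n * x + eps[:, j]           # one AR(1) step across all paths
endpoint = x / np.sqrt(n)               # scaled AR(1) endpoint
target = (1.0 - np.exp(-2.0 * beta)) / (2.0 * beta)  # OU variance at time 1
```

The empirical variance of `endpoint` matches `target` up to a discretization error of order `1/n` plus Monte Carlo noise.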
Finally we prove Theorem 4.5.
Proof of Theorem 4.5. Since, for $\Delta_n \to 0$, the increments of the drift term are negligible with respect to the increments of the Brownian integral, we immediately deduce that Lemmas 8.1, 8.3, and 8.4 remain true. Next, we replace $C_{i,n}^{-1/2} \Delta X_{i,uT_n} \mathbf{1}_{A_{i,n,u}}$ and $C_{i,n}^{-1/2} \Delta Y_{i,uT_n} \mathbf{1}_{A_{i,n,u}}$ in definitions (8.21)–(8.22) by $C_{i,n}^{-1/2} \big(\Delta X_{i,uT_n} \mathbf{1}_{A_{i,n,u}} - \Delta_n^{-1} \Delta_{i,n,u} D(X)\big)$ and $C_{i,n}^{-1/2} \big(\Delta Y_{i,uT_n} \mathbf{1}_{A_{i,n,u}} - \Delta_n^{-1} \Delta_{i,n,u} D(Y)\big)$ respectively, where $\Delta_{i,n,u} = t_i \wedge uT_n - t_{i-1} \wedge uT_n$. Similarly, we adapt definitions (8.17) and (8.18) for discrete observations $V_i$, $i \in \{1, \ldots, n\}$. A straightforward application of Lemma 8.1, as in the previous proofs, shows that with these new definitions Lemma 8.5 and Lemma 8.7 also remain true. Moreover, by the continuous mapping theorem and Lemma 8.6 in the case without drift, we have, jointly with the previous limits, the corresponding convergence.

Proofs of Theorem 3.5 and Proposition 4.3
We begin this section with a technical lemma. We decompose the quantity of interest as $I + II$.
The limits of $I$ and $II$ are derived in two steps.
Step 1. Let us define, for all $i \in \{1, \ldots, n\}$, the quantities $\delta^{i,(1)}_{j,n}$ and $\delta^{i,(2)}_{j,n}$, with the convention $+\infty \times 0 = 0$ in the last expression, so that for $i \le 2k_n$ we obtain the decomposition involving $\rho^{k-1-j} (\widehat{C}_{k,n} + \widehat{C}_{j,n})(C_{k,n} - C_{j,n})$, and $\mathbb{E} |\delta^{i,(2)}_{j,n}|^2 \le K \rho^{l_n}$. Now, note that since $\widehat{C}_{k,n}$ is $\mathcal{F}_{j-1}$-measurable for any $k \in \{j, \ldots, j + l_n\}$, the quantity $\sum_{j=2k_n+1}^{i} \delta^{i,(1)}_{j,n} \Delta \widehat{Z}^{def,n}_{t_j/T_n}$ is a sum of martingale increments.
Moreover, since $|x \vee c - y \vee c| \le |x - y|$ for any $x, y \in \mathbb{R}$, we have, for $k \ge j \ge 2k_n + 1$, the required bound on the term $\rho^{k-1-j} (\widehat{C}_{k,n} + \widehat{C}_{j,n})(C_{k,n} - C_{j,n})$, where in the above calculation we have used Jensen's inequality at the second step, along with (8.45), and, at the third step, the Cauchy–Schwarz inequality along with (8.1) and the fact that $p_0 \ge 8$.
The first term can be treated following exactly the same path as for the difference involving $II$, multiplying $\delta^{i,(1)}_{j,n}$ and $\delta^{i,(2)}_{j,n}$ by $\widehat{X}^{c,def,n}_{t_{j-1}/T_n}$, which is $L^{2p_0}$-bounded and does not affect the estimates. As for the second term, using (8.46), we have the decomposition
$$\sum_{j=1}^{n} (1 - \rho^{n-j})\, \Delta \widehat{X}^{c,def,n}_{t_j/T_n} \Delta \widehat{Z}^{def,n}_{t_j/T_n} + \sum_{j=1}^{n} \sum_{k > j} \rho^{k-j} (1 - \rho^{n-k})\, \Delta \widehat{X}^{c,def,n}_{t_k/T_n} \Delta \widehat{Z}^{def,n}_{t_j/T_n} = A_1 + A_2.$$
Note that, by Jensen's inequality,
$$\mathbb{E}\Big(\sum_{j=1}^{n} \rho^{n-j}\, \Delta \widehat{X}^{c,def,n}_{t_j/T_n} \Delta \widehat{Z}^{def,n}_{t_j/T_n}\Big)^2 \le \frac{1 - \rho^n}{1 - \rho} \sum_{j=1}^{n} \rho^{n-j}\, \mathbb{E}\big(\Delta \widehat{X}^{c,def,n}_{t_j/T_n} \Delta \widehat{Z}^{def,n}_{t_j/T_n}\big)^2,$$
and we also have $\sum_{j=1}^{n} \Delta \widehat{X}^{c,def,n}_{t_j/T_n} \Delta \widehat{Z}^{def,n}_{t_j/T_n} \to^{\mathbb{P}} \omega_{11}^{-1} \omega_{12}$, so that $A_1 \to^{\mathbb{P}} \omega_{11}^{-1} \omega_{12}$. Now, remark that $A_2$ can be represented as a sum of martingale increments, $A_2 = \sum_{k=1}^{n} m_{k,n}\, \Delta \widehat{X}^{c,def,n}_{t_k/T_n}$. All we need is a joint central limit theorem for the extended process $(\widehat{X}^{c,def,n}, \widehat{Z}^{def,n}, V^n)$, where $V^n$ is defined accordingly; this is a consequence of Lemma 8.6 along with Theorem 2.2 in [Kurtz and Protter, 1991], with $\delta = \infty$ and Condition C2.2(i) being satisfied for any localizing sequence. We thus have, with respect to the Skorokhod topology of $D_{\mathbb{R}^3}[0,1]$, the convergence
$$(\widehat{X}^{c,def,n}, \widehat{Z}^{def,n}, V^n) \to^{d} \Big(\omega_{11}^{-1/2} B^1,\; \omega_{11}^{-1/2} B^2,\; \omega_{11}^{-1} \int_0^{\cdot} B^1_s \, dB^2_s\Big),$$
which implies, by the continuous mapping theorem along with Slutsky's lemma, the convergence (8.47). Finally, combined with the fact that $n^{-1} \sum_{j=1}^{n} \widehat{X}^{c,def,n}_{t_j/T_n} = \int_0^1 \widehat{X}^{c,def,n}_u \, du + o_{\mathbb{P}}(1)$, the continuous mapping theorem, and Step 1 of this proof, the convergence of $g_n$ readily follows.
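The stochastic-integral limit above can be illustrated numerically: the non-anticipative discrete sum $\sum_j B^1_{t_{j-1}} \Delta B^2_j$ converges to the Itô integral $\int_0^1 B^1_s \, dB^2_s$, a mean-zero variable with variance $\int_0^1 \mathbb{E}[(B^1_s)^2]\, ds = 1/2$. A Monte Carlo sketch with two independent standard Brownian motions (an illustrative toy setting, not the paper's deflated processes):

```python
import numpy as np

# Discrete approximation of the Ito integral int_0^1 B1_s dB2_s by the
# non-anticipative sum  sum_j B1_{t_{j-1}} * (B2_{t_j} - B2_{t_{j-1}}).
# Its limit has mean zero and variance int_0^1 E[(B1_s)^2] ds = 1/2.
rng = np.random.default_rng(2)
n, paths = 500, 2000

dB1 = rng.standard_normal((paths, n)) / np.sqrt(n)
dB2 = rng.standard_normal((paths, n)) / np.sqrt(n)
B1_left = np.cumsum(dB1, axis=1) - dB1        # B1 at the left grid endpoints
ito = np.sum(B1_left * dB2, axis=1)           # one integral sample per path
```

Using the left endpoint of each interval is essential: it makes the sum a martingale transform, matching the Itô (rather than Stratonovich) convention used in the limit.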
We are now ready to prove Theorem 3.5.
Proof of Theorem 3.5 and Remark 3.6. We first prove the consistency of the OLS estimator, which, combined with the consistency of $\hat{v}$, $\hat{\rho}$, and $\hat{r}_\infty$, along with Slutsky's lemma and the continuous mapping theorem, yields the claimed result.
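The two-step logic underlying the residual-based test can be sketched numerically: regress $Y$ on $X$ to estimate the cointegrating relation, then run a Dickey–Fuller regression of $\Delta \hat{u}$ on $\hat{u}_{-1}$; under cointegration the slope estimate is markedly negative. The following is a minimal sketch under illustrative assumptions (Gaussian innovations, no jumps, constant volatility, a known no-intercept design), not the paper's semimartingale setting:

```python
import numpy as np

# Two-step residual-based check: (1) OLS of Y on X estimates the cointegrating
# relation; (2) a Dickey-Fuller regression of diff(u_hat) on lagged u_hat.
# Under cointegration the DF slope estimate is close to rho - 1 < 0.
rng = np.random.default_rng(3)
n = 1000

x = np.cumsum(rng.standard_normal(n))         # random-walk regressor
u = np.zeros(n)                               # stationary AR(1) error, rho = 0.5
for j in range(1, n):
    u[j] = 0.5 * u[j - 1] + rng.standard_normal()
y = 2.0 * x + u                               # cointegrated pair (Y, X)

beta_hat = np.sum(x * y) / np.sum(x * x)      # step 1: OLS without intercept
resid = y - beta_hat * x
# step 2: DF slope  sum(u_{j-1} * du_j) / sum(u_{j-1}^2)
df_slope = np.sum(resid[:-1] * np.diff(resid)) / np.sum(resid[:-1] ** 2)
```

Under no cointegration the residuals inherit a unit root and `df_slope` concentrates near zero, which is exactly the distinction the residual-based test exploits.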