Truncated sum-of-squares estimation of fractional time series models with generalized power law trend

Abstract: We consider truncated (or conditional) sum-of-squares estimation of a parametric fractional time series model with an additive deterministic structure. The latter consists of both a drift term and a generalized power law trend. The memory parameter of the stochastic component and the power parameter of the deterministic trend component are both considered unknown real numbers, belonging to arbitrarily large compact sets, that are to be estimated. Thus, our model captures different forms of nonstationarity and noninvertibility as well as a very flexible deterministic specification. As in related settings, the proof of consistency (which is a prerequisite for proving asymptotic normality) is challenging due to non-uniform convergence of the objective function over a large admissible parameter space and due to the competition between stochastic and deterministic components. As expected, parameter estimates related to the deterministic component are shown to be consistent and asymptotically normal only for parts of the parameter space, depending on the relative strength of the stochastic and deterministic components. In contrast, we establish consistency and asymptotic normality of parameter estimates related to the stochastic component for the entire parameter space. Furthermore, the asymptotic distribution of the latter estimates is unaffected by the presence of the deterministic component, even when the latter is not consistently estimable. We also include Monte Carlo simulations to illustrate our results.


Introduction
A common approach to time series modeling is to assume an additive structure, where the observed process is a sum of latent stochastic and deterministic components. Regarding the former, the autoregressive moving average (ARMA) class, possibly including unit root nonstationary and noninvertible processes, is dominant. A general model that includes these processes as special cases, and also bridges the gap between stationary and invertible ARMA models and the unit root nonstationary or noninvertible models, is the fractionally integrated time series model. Specifically, a zero-mean fractional model for z_t is given by

z_t = Δ_+^{−δ} u_t,  (1)
u_t = ω(L; ϕ) ε_t,  (2)

where ε_t is a zero-mean and serially uncorrelated sequence, L is the lag operator, and, for any process ξ_t, Δ_+^ζ ξ_t = Δ^ζ {ξ_t I(t ≥ 1)} = Σ_{i=0}^{t−1} π_i(−ζ) ξ_{t−i} I(t ≥ 1), with I(·) denoting the indicator function, π_i(v) = 0 for i < 0, π_0(v) = 1, and the π_i(v) denoting the coefficients in the binomial expansion of (1 − z)^{−v}. The parameter δ is the "memory" of the process, which satisfies δ = 0 for stationary and invertible ARMA models, δ = 1 for unit root nonstationary models, and δ = −1 for unit root noninvertible models. The process z_t generated by (1) has been termed in the literature a Type II fractionally integrated process of order δ. The Type II specification (1) assumes that the process is initialized at t = 1, but, at the cost of more complicated proofs, we conjecture that this could be generalized to any initialization under suitable conditions on the initial values. Johansen and Nielsen (2010, 2012a, 2016) developed maximum likelihood-based inference theory for fractional processes under more general assumptions on the initialization, where the inference is conditional on a finite number of initial values. Of course, finite sample behavior may depend on the initialization, as investigated by Johansen and Nielsen (2016) using higher-order asymptotic expansions and numerical methods. To avoid further complications of the theory, we maintain the simpler Type II initialization in (1).
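For concreteness, the coefficients π_i(v) and the truncated operator Δ_+^ζ are straightforward to compute. The following minimal Python sketch (the function names are ours, purely for illustration) generates π_i(v) through the standard recursion π_i(v) = π_{i−1}(v)(i − 1 + v)/i and applies Δ_+^ζ to a finite sample; later sketches in this paper reuse these helpers.

```python
import numpy as np

def pi_coefficients(v, n):
    """First n coefficients pi_i(v) in the binomial expansion of (1 - z)^{-v},
    computed via the recursion pi_i(v) = pi_{i-1}(v) * (i - 1 + v) / i."""
    pi = np.empty(n)
    pi[0] = 1.0
    for i in range(1, n):
        pi[i] = pi[i - 1] * (i - 1 + v) / i
    return pi

def frac_diff(x, zeta):
    """Truncated (Type II) fractional difference Delta_+^zeta x_t: the
    convolution of x_1, ..., x_T with the coefficients pi_i(-zeta)."""
    n = len(x)
    pi = pi_coefficients(-zeta, n)
    return np.array([pi[: t + 1] @ x[t::-1] for t in range(n)])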
The function ω in (2) characterizes parametrically the short memory dependence present in u_t, and hence in z_t. It is given as

ω(s; ϕ) = Σ_{j=0}^{∞} ω_j(ϕ) s^j,  (4)

where ϕ is an unknown p × 1 vector collecting the short-memory parameters. For example, we could have

ω(s; ϕ) = α_MA(s; ϕ)/α_AR(s; ϕ),  (5)

where α_AR and α_MA are polynomials of orders p_1 and p_2, respectively, with no common zeros and all roots outside the unit circle. Then (1), (2), (5) constitutes the fractionally integrated ARMA, or FARIMA(p_1, δ, p_2), model, where ϕ collects the AR and MA parameters. However, we will maintain the more general short-memory specification (4) instead of (5), so that, under our conditions, z_t will not be restricted to belong to the FARIMA class. More precise conditions will be imposed on ω below. An important implication of model (1), (2) is that E(z_t) = 0, and therefore, even if this setting has been employed in theoretical work (e.g., Hualde and Robinson, 2011; Nielsen, 2015), it has limited empirical relevance. In practice, it would be natural to extend (1), (2) to allow for some deterministic structure such as, for example, a non-zero mean or drift. A very simple possibility would be to consider the observable process x_t generated as

x_t = μ + z_t,  (6)

which implies that

Δ_+^δ x_t = μ π_{t−1}(1 − δ) + u_t,  (7)

so x_t is a fractionally integrated process with drift if μ ≠ 0. This is a particular case of previous proposals in the literature, like those of Robinson (1994, 2005) or Robinson and Iacone (2005). An alternative specification for the deterministic component could be

x_t = β π_{t−1}(δ) + z_t,  (8)

see also Robinson (1994). By Stirling's approximation, π_{t−1}(1 + c) behaves like t^c, apart from a constant factor, and π_{t−1}(1 + c) is therefore denoted a generalized power law trend of order c or a generalized polynomial trend of order c. Thus, the presence of a non-zero β in (8) generates a deterministic trend component for x_t. Several ideas arise from the previous discussion. First, the specification in (8) shows that, when dealing with fractional time series, the fractional coefficients π_{t−1} appear to be a more natural and elegant representation of deterministic terms compared with the usual powers of t. Of course, for the commonly applied cases c = 0, 1, π_{t−1}(1 + c) equals t^c for t ≥ 1. More generally, considering π_{t−1} appears to be a natural approach to introducing a deterministic term that complements the Type II process z_t, due to the properties

Δ_+^d π_{t−1}(c) = π_{t−1}(c − d)  and  Σ_{j=0}^{t−1} π_j(c) = π_{t−1}(1 + c).  (9)
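A quick numerical check of the Stirling approximation and of the exact properties (9), using the helpers pi_coefficients and frac_diff from the sketch above (our illustrative names):

```python
from scipy.special import gammaln

T, c, d = 200, 0.7, 0.3
t = np.arange(1, T + 1)

# Stirling: pi_{t-1}(1+c) = Gamma(t+c) / (Gamma(1+c) Gamma(t)) behaves like
# t^c apart from the constant factor 1/Gamma(1+c).
log_pi = gammaln(t + c) - gammaln(1 + c) - gammaln(t)
print(np.exp(log_pi[-1]) / t[-1] ** c)   # close to 1/Gamma(1+c), about 1.10

# Exact property (9): Delta_+^d pi_{t-1}(c) = pi_{t-1}(c - d), up to rounding.
lhs = frac_diff(pi_coefficients(c, T), d)
print(np.max(np.abs(lhs - pi_coefficients(c - d, T))))   # ~ 1e-15
```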
Second, in view of (1), restricting the fractional order of the stochastic and deterministic components to be identical in (8) seems arbitrary. This motivates the more general model

x_t = β π_{t−1}(γ) + z_t,  (10)

where β and γ are both unknown real-valued parameters and z_t is generated by (1), (2), so, in particular, it can be either short or long memory. Under the model (1), (2), (10), the fractional order of the stochastic term is δ and that of the deterministic term is γ, thus allowing these orders to be modeled by two different parameters. In related work, Hualde and Nielsen (2020) analyze the model

x_t = β t_+^{γ−1} + z_t,  (11)

where t_+^{γ−1} = t^{γ−1} I(t ≥ 1) (strictly speaking, they considered the power t_+^γ, but this makes no difference with the obvious re-labeling). Of course, since π_{t−1}(γ) = t^{γ−1} I(t ≥ 1) for γ = 1, 2, (10) and (11) are identical specifications for γ = 1, 2. Model (11) is a particular case of those of Robinson (2005) or Robinson and Iacone (2005), where more deterministic terms are included, but the power law parameters were assumed to be known. Also, (11) is embedded in Robinson's (2012) spatial model, where the power law parameters were unknown, but which required a short memory stochastic component. As will be seen below, apart from being more natural in a fractional setting, the specification (10) offers several crucial advantages over (11). Therefore, (11) is more accurately described as an approximation to (10) in a fractional setting with γ ≠ 1, 2. Third, the model (10) (or (11)) is, undoubtedly, quite restrictive. For example, it cannot accommodate the commonly applied constant plus linear trend specification. To illustrate the restrictive nature of (10) or (11), note that for both models x_1 = β + z_1, which implies that the β parameter is intimately linked to the initial observation. Thus, in practice, a large value of x_1 will lead to a large value of the estimated β, even in cases where the true slope of the deterministic component is small. This discussion just indicates that models (10) or (11) might be misspecified in most cases due to the omission of a drift component that determines the level of the observable time series. Thus, inspired by (7) and (10), we consider instead of (10) the more general structure

x_t = μ + β π_{t−1}(γ) + z_t,  (12)

where for identification γ ≠ 1. The deterministic specification in (12) covers the standard case of constant plus linear trend (when γ = 2). However, since γ is allowed to take any real value, (12) characterizes a wide range of deterministic behaviors. Compared to (10) (or (11)), the generalization in (12) appears to be particularly relevant for γ < 1, because in this case the deterministic structure would approach smoothly the drift μ as time increases. One interpretation of this case, which appears to be both realistic and coherent with, e.g., economic time series, is that of a series moving around a deterministic structure that approaches an equilibrium value (given by μ). Furthermore, the parameter μ is the so-called "level parameter" in the terminology of Johansen and Nielsen (2016), the inclusion of which they argue can alleviate bias issues arising from non-zero initial conditions in the Type II context. Fourth, model (12) is closed under fractional differencing in the sense that, for any d ∈ R,

Δ_+^d x_t = μ π_{t−1}(1 − d) + β π_{t−1}(γ − d) + Δ_+^d z_t  (13)

has the exact same structure as (12).
This property is an important advantage, because, as will be seen below, our proposed estimators depend crucially on fractional differences of the observables, and (13) will simplify the estimation procedure by avoiding the presence of "approximation errors" which routinely appear when taking fractional differences of polynomials of t. It is desirable, both from a mathematical and from a practical point of view, that fractional differences of a process belonging to a class of fractional time series belong to the same class of fractional time series. In other words, it would seem strange if fractional differences of a fractional time series process generated processes that were outside the class.
Unlike (10) and (12), the alternative model (11), specified with t_+^{γ−1} instead of π_{t−1}(γ), is not closed under fractional differencing. Thus, modeling trends by means of π_{t−1} seems both more natural and more elegant than a power of t, and leads to several advantages, which are even more relevant because of the extra flexibility introduced by the drift term μ. In particular, we note that the properties (9) are only shared approximately by t_+^c, in the sense that Δ_+^d t_+^c is only asymptotically equal to a constant times t_+^{c−d}, and only for some values of c and d. From a technical viewpoint, the "closedness" leads to simpler and more elegant proof arguments. From a practical viewpoint, for the model (11) discussed in Hualde and Nielsen (2020), only the case γ > 0 could be considered. Nicely, considering generalized trends (π_{t−1}) instead of powers of t permits consideration of any value of γ in an arbitrarily large compact set.
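The contrast with powers of t is easy to see numerically. Continuing the sketch above, the fractional difference of t^c only approaches a constant multiple of t^{c−d} asymptotically; the constant Γ(c+1)/Γ(c+1−d) used below is the one implied by the Stirling approximation, and the agreement breaks down entirely for sufficiently negative c:

```python
# Delta_+^d t^c versus the asymptotic target (Gamma(c+1)/Gamma(c+1-d)) t^{c-d}:
# the relative discrepancy shrinks with t but, unlike (9), is never exactly zero.
approx = frac_diff(t.astype(float) ** c, d)
target = np.exp(gammaln(c + 1) - gammaln(c + 1 - d)) * t ** (c - d)
print(np.abs(approx / target - 1.0)[[9, 49, 199]])   # decreasing, nonzero
```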
In our model (1), (2), (12), we allow both the stochastic and deterministic components to be of a fractional order, thereby placing them on an equal footing. Specifically, δ (which we permit to lie in an arbitrarily large compact interval) characterizes the behavior of Var(x_t) and Cov(x_t, x_{t−j}), and γ characterizes the behavior of E(x_t). Thus, borrowing White and Granger's (2011) terminology, when δ > 1/2, x_t has an increasing "stochastic trend in variance" because Var(x_t) grows at rate t^{2δ−1}. Similarly, when γ > 1, x_t has an increasing "trend in mean" because E(x_t) grows at rate t^{γ−1}. In this sense, letting γ be real-valued appears as natural as letting δ be real-valued. Moreover, in the context of fractional models, letting γ be real-valued seems to be a natural alternative to the more standard linear trend model, which might suffer from severe specification problems with important implications for inference.
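As a simple check of the variance growth rate (again with the helpers above), the Type II process z_t = Δ_+^{−δ} ε_t has Var(z_t) = σ² Σ_{i=0}^{t−1} π_i(δ)², which for δ > 1/2 grows at rate t^{2δ−1}:

```python
delta0 = 0.8                                           # a nonstationary example
var_t = np.cumsum(pi_coefficients(delta0, 500) ** 2)   # Var(z_t) with sigma^2 = 1
tt = np.arange(1, 501)
print(var_t[[49, 199, 499]] / tt[[49, 199, 499]] ** (2 * delta0 - 1))  # ~ constant
```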
We note that if γ were known in (12), the estimation problem would be greatly simplified. This case has been studied by Robinson (2005), who considered M-estimation of a model like (11) (although involving more deterministic terms) with known γ and allowing for fractional z_t. Other works that have considered a similar problem to ours, but assuming that z_t is at most a weakly dependent process, include Wu (1981), Phillips (2007), and Robinson (2012). The latter analyzes a more general spatial setting with more deterministic terms, but where the weakly dependent stochastic component is dominated by the deterministic structure. Related to the implications of the relative strength of deterministic and stochastic components for estimation, Johansen and Nielsen (2016) proved consistency and asymptotic normality of a truncated/conditional sum-of-squares estimator that ignores the deterministic component, whenever this component is dominated by the stochastic one. Other contributions that include power law trends, although in slightly different contexts, are Robinson and Marinucci (2000) and Robinson and Iacone (2005). Finally, as mentioned above, Hualde and Nielsen (2020) analyze the model given by (1), (2), (11), using the approximation given by the simple power t_+^{γ−1} instead of the fractional coefficients π_{t−1}(γ). Their work has two very important limitations relative to our contribution in the present paper. First, they have no drift term, which, as justified above, limits substantially the empirical relevance of their model. Second, as a consequence of applying the approximation t_+^{γ−1}, they require a strong condition on the power law parameter, namely γ > 0, so cases where the deterministic structure tends very quickly to an equilibrium value (known to be 0 in their model) are not covered.
In the present paper we derive the limiting properties of a truncated (or conditional) sum-of-squares estimator of the parameters in model (1), (2), (12). As in Hualde and Nielsen (2020), our setting is substantially more involved than in the related analyses of Hualde and Robinson (2011) and Nielsen (2015), which discussed parametric estimation in model (12) taking μ = β = 0 as known. This avoids the huge complication of dealing with the competition between stochastic and deterministic components, especially in the proof of consistency (a prerequisite for proving limiting normality), which is very delicate because the loss function does not converge uniformly over a large admissible parameter space, even when knowledge of μ = β = 0 is imposed. Additionally, the presence of a second deterministic term (the drift), compared with the simpler setting of Hualde and Nielsen (2020), introduces very substantial technical challenges.
As in Hualde and Nielsen (2020), we establish the limiting properties of our estimators, noting that results depend on the relative strength of the deterministic and stochastic components. We distinguish different cases depending on the true values of the parameters, which are denoted by subscript zero. When γ_0 − 1/2 > δ_0 and δ_0 < 1/2, we find that the estimators of all parameters are consistent and asymptotically normal. Next, when γ_0 − 1/2 > δ_0 but δ_0 > 1/2, consistency and asymptotic normality hold for all estimators except that of μ_0. Alternatively, when γ_0 − 1/2 < δ_0, only the estimators of the parameters related to the stochastic component (δ_0, ϕ_0) are consistent and asymptotically normal. In this case, the joint limiting distribution of the estimators of δ_0 and ϕ_0 is unaffected by the presence of deterministic components that cannot be consistently estimated; a phenomenon that has been noted previously in, e.g., Heyde and Dai (1996), Abadir, Distaso and Giraitis (2007), Iacone (2010), and Hualde and Nielsen (2020). If, in this case, δ_0 < 1/2, we also provide a convergence rate for the estimator of μ_0.
The rest of the paper is organized as follows. First, in Section 2 we discuss the estimation problem in model (12) and compare with several alternatives. In Section 3 we present the main theoretical results of the paper. Next, a Monte Carlo experiment of finite sample performance is presented in Section 4, and we give some concluding remarks in Section 5. Finally, Section 6 collects the proofs of our main results, while all lemmas are stated in Sections 7 and 8. Proofs of the lemmas can be found in the working paper version (Hualde and Nielsen, 2022).
To illustrate the estimation problem, suppose for now that ω(L; ϕ) = 1, and hence u_t = ε_t, and ignore ϕ, so that the parameters to be estimated are δ, γ, μ, β. Hualde and Nielsen's (2020) proof methods and results make evident the significant technical difficulties of enlarging their relatively restrictive model (11) with additional deterministic terms to capture a richer structure (e.g., including a drift). Thus, an interesting and tempting proposal is the "differencing and adding back" procedure, where the idea is to eliminate the drift by differencing to simplify the estimation problem. A similar idea has been considered for estimation of nonstationary processes by Velasco (1999a,b) and Chen and Hurvich (2003) in combination with tapering of the periodogram in frequency domain methods. Specifically, consider y_t = Δ_+ x_t in (12); that is,

y_t = Δ_+ x_t = μ_0 I(t = 1) + β_0 π_{t−1}(γ_0 − 1) + Δ_+^{1−δ_0} ε_t,  (14)

so that the observable process is y_t = β_0 π_{t−1}(γ_0 − 1) + Δ_+^{1−δ_0} ε_t for t ≥ 2 and y_1 = μ_0 + β_0 + ε_1. We notice that μ_0 only affects one observation, y_1, so an apparently sensible approach could be to "forget" about this effect and act as if the influence of μ_0 had been completely removed by differencing. In this setting, we will discuss several variations of differencing and adding back, and argue why these all fail for this model.
Suppose we observed the series purged of the drift effect, ỹ_t = y_t − μ_0 I(t = 1) = β_0 π_{t−1}(γ_0 − 1) + Δ_+^{1−δ_0} ε_t for t = 1, . . . , T. Then we could consider the loss function

Q_T(δ, γ, β) = (1/T) Σ_{t=1}^T (Δ_+^{δ−1} ỹ_t − β π_{t−1}(γ − δ))²  (15)

and derive the corresponding estimators. This approach would eliminate the drift parameter μ, and thus simplify the estimation problem. In particular, it would be relatively similar to that in Hualde and Nielsen (2020), with the only relevant difference of dealing with π_{t−1}(γ − δ) instead of Δ_+^δ t^{γ−1} in the loss function. However, in practice we observe x_t, or equivalently y_t, for t = 1, . . . , T, so that Q_T is infeasible because ỹ_1 is unobserved. Specifically, ỹ_t = y_t for t ≥ 2 but ỹ_1 = β_0 + ε_1 ≠ μ_0 + β_0 + ε_1 = y_1 if μ_0 ≠ 0. We now discuss three feasible alternatives. First, inspired by Q_T in (15), we could ignore the presence of μ_0 in the single observation y_1 and set the loss function as

Q_{1T}(δ, γ, β) = (1/T) Σ_{t=1}^T (Δ_+^{δ−1} y_t − β π_{t−1}(γ − δ))².

We note from (14) that Δ_+^{δ−1} y_t = μ_0 π_{t−1}(1 − δ) + β_0 π_{t−1}(γ_0 − δ) + Δ_+^{δ−δ_0} ε_t, and the presence of the additional term μ_0 π_{t−1}(1 − δ) in Q_{1T} (compared with Q_T) is undesirable. For example, evaluated at the true values we find

Q_{1T}(δ_0, γ_0, β_0) = (1/T) Σ_{t=1}^T (μ_0 π_{t−1}(1 − δ_0) + ε_t)²,

where the contribution of the term μ_0 π_{t−1}(1 − δ_0) is non-negligible, and in fact is dominant for δ_0 < 0 if μ_0 ≠ 0. Second, to eliminate the influence of the first observation, suppose we force a zero initial condition and consider y*_1 = 0 and y*_t = Δx_t = β_0 π_{t−1}(γ_0 − 1) + Δ_+^{1−δ_0} ε_t for t ≥ 2.
We could then use the observed values y*_t, t = 1, . . . , T, in the estimation; that is, work with the loss function

Q_{2T}(δ, γ, β) = (1/T) Σ_{t=1}^T (Δ_+^{δ−1} y*_t − β π_{t−1}(γ − δ))².

Comparing again with the infeasible Q_T, we find that

Δ_+^{δ−1} y*_t = β_0 π_{t−1}(γ_0 − δ) + Δ_+^{δ−δ_0} ε_t − (β_0 + ε_1) π_{t−1}(1 − δ),

where the additional term (β_0 + ε_1)π_{t−1}(1 − δ) in Q_{2T} causes difficulties similar to the additional term in Q_{1T}. Furthermore, simply omitting the first observation and defining Q_{2T} by a summation over t = 2, . . . , T causes identical problems. Third, instead of y_t in (14), suppose we consider the forward difference; that is, consider the observable ẏ_t = x_{t+1} − x_t and the corresponding loss function Q_{3T}. This would appear to eliminate the influence of the first observation and μ_0. However, because the truncated filter Δ_+^{δ−1} interacts with ε_1, which is not removed by forward differencing, the additional term π_t(1 − δ)ε_1 appears and again causes difficulties similar to the additional terms in Q_{1T} and Q_{2T}. Unreported Monte Carlo simulations confirm that estimators based on Q_{iT}, i = 1, 2, 3, are inconsistent for δ_0 < 0.
The above discussion makes it clear that the "differencing and adding back" procedure cannot be used to simplify the estimation problem in our context. We therefore focus on the observed x_t for t = 1, . . . , T. Motivated by the Gaussian log-likelihood, and dealing with the general specification for ω given in (4), we consider the sum-of-squares loss function

L_T(ϑ, φ) = (1/T) Σ_{t=1}^T (ρ(L; ϕ) x_t(δ) − μ χ_{t−1}(1 − δ; ϕ) − β χ_{t−1}(γ − δ; ϕ))².  (16)

Here we have defined ρ(s; ϕ) = ω^{−1}(s; ϕ), see the discussion following Assumption A1, ξ_t(d) = Δ_+^d ξ_t for an arbitrary process ξ_t, and the convolution coefficient χ_{t−1}(c; ϕ) = Σ_{i=0}^{t−1} ρ_i(ϕ) π_{t−1−i}(c), obtained by applying the filter ρ(L; ϕ) to π_{t−1}(c). Clearly, for a given ϑ, we can concentrate the loss function (16) with respect to φ = (μ, β)′, which is estimated by least squares for fixed ϑ, to obtain the concentrated loss function L_T(ϑ) = L_T(ϑ, φ̂(ϑ)). Thus, letting the parameter space for ϑ be denoted Ξ, which will be fully specified in Assumption A3 below, we propose the estimator

ϑ̂ = argmin_{ϑ∈Ξ} L_T(ϑ),  (19)

along with φ̂ = φ̂(ϑ̂) and σ̂² = L_T(ϑ̂, φ̂).
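To fix ideas, the following sketch implements the estimator in the simplest case ω = 1 (so u_t = ε_t and ρ = 1): for each (δ, γ) the coefficients φ = (μ, β)′ are concentrated out by least squares, and the concentrated loss is minimized numerically. This is only a minimal illustration under our stated simplification, not the full procedure with short-memory dynamics, and in practice the search should respect the compact parameter space of Assumption A3 (including γ bounded away from 1):

```python
from scipy.optimize import minimize

def css_loss(theta, x):
    """Concentrated truncated sum-of-squares loss, assuming omega = 1 so that
    theta = (delta, gamma) only."""
    delta, gamma = theta
    T = len(x)
    dx = frac_diff(x, delta)                              # x_t(delta) = Delta_+^delta x_t
    Z = np.column_stack([pi_coefficients(1 - delta, T),   # regressor for mu
                         pi_coefficients(gamma - delta, T)])  # regressor for beta
    phi = np.linalg.lstsq(Z, dx, rcond=None)[0]           # concentrate out (mu, beta)
    resid = dx - Z @ phi
    return resid @ resid / T

def estimate(x, theta0=(0.3, 1.5)):
    """Minimize the concentrated loss over (delta, gamma). A grid search over
    the admissible compact set is safer given the non-uniform convergence of
    the objective, but a simplex search illustrates the idea."""
    return minimize(css_loss, theta0, args=(x,), method="Nelder-Mead").x
```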
As proposed by Hualde and Robinson (2011), we call (19) the truncated sum-of-squares estimator, though it is also often called the nonlinear least squares or conditional sum-of-squares estimator. Because it is based on the Gaussian likelihood, it is expected to be asymptotically efficient under Gaussianity. For nonfractional ARMA models with a known integer-valued memory parameter, (19) was advocated by, e.g., Box and Jenkins (1971). For fractional time series, the estimator was suggested by Li and McLeod (1986) in stationary FARIMA models. The first rigorous asymptotic analyses of (19) with the memory parameter lying in an arbitrarily large compact interval were given by Hualde and Robinson (2011) and Nielsen (2015).

Main results
We first provide the assumptions needed for consistency of the estimator. Our conditions for the asymptotic analysis are nearly identical to those in Hualde and Robinson (2011) and Hualde and Nielsen (2020), with the only difference being in Assumption A3 below.
For a detailed discussion of A1 and A2 we refer to Hualde and Robinson (2011) and Hualde and Nielsen (2020). Note that, writing ω^{−1}(s; ϕ) = ρ(s; ϕ) = Σ_{j=0}^{∞} ρ_j(ϕ)s^j, Assumption A1 implies that ρ_0(ϕ) = 1 for all ϕ and that the coefficients ρ_j(ϕ) satisfy a summability condition uniformly in ϕ, where ς > 1/2 is given in A1(ii). A1 is easily satisfied for stationary and invertible ARMA models and for the exponential spectrum model of Bloomfield (1973). It is very similar to other conditions employed in asymptotic theory for the estimate τ̂, see Hualde and Robinson (2011) and Nielsen (2015), as well as for Whittle estimators that restrict attention to stationarity, e.g., Fox and Taqqu (1986), Dahlhaus (1989), and Giraitis and Surgailis (1990). Assumption A1 can be readily verified because ω is a known parametric function. In fact, specifications of ω satisfying A1 are invariably employed by practitioners. Assumption A2 does not impose independence or identical distribution of ε_t, but requires conditional homoskedasticity. It is standard in the time series literature since Hannan (1973), although it may be quite strong for some empirical applications. We conjecture that this assumption could be relaxed to allow for both conditional and unconditional heteroskedasticity following recent work by Cavaliere, Nielsen and Taylor (2015, 2017). This would require replacing A2 by more complicated summability conditions on the cumulants of ε_t and would substantially complicate our proofs. Consequently, this seems beyond the scope of this paper. Assumption A3 is very similar to A3 in Hualde and Nielsen (2020), but with two important differences. First, due to the inclusion of the extra drift term, we need γ_0 ≠ 1 to guarantee identification of μ_0 and β_0 (see also Section 4.2). This condition, along with the need to deal with compact parameter spaces in the consistency proof, leads to setting the parameter space for γ as [γ_1, 1 − κ] ∪ [1 + κ, γ_2], where γ_1 < 1 − κ < 1 + κ < γ_2 and κ > 0 is arbitrarily small. A similar requirement is imposed in the related setting of Robinson (2012). Second, translated to our notation, Hualde and Nielsen (2020) impose the condition that γ_1 > 0, which implies that just cases where γ_0 > 0 can be considered. This is due to the approximate nature of their model. Specifically, this additional condition helps to guarantee that their model is approximately closed under fractional differencing, i.e., to obtain that Δ_+^d t_+^c is approximately equal to a constant times t_+^{c−d}. In contrast, apart from the exclusion of the arbitrarily small open interval (1 − κ, 1 + κ), we permit γ_0 to lie in an arbitrarily large set. Also, similarly to Hualde and Nielsen (2020), the parameter space for φ is basically unrestricted, although we need the condition β_0 ≠ 0 to guarantee the identification of γ_0 whenever this parameter can be consistently estimated.
As will be seen, when δ_0 is large, the stochastic signal dominates the deterministic trend. In particular, whenever δ_0 > γ_0 − 1/2, γ_0 and β_0 cannot be consistently estimated, and if δ_0 > 1/2 this problem also affects the estimation of the drift μ_0. On the other hand, when δ_0 < γ_0 − 1/2 and/or δ_0 < 1/2, at least part of the deterministic structure can be consistently estimated. Interestingly, for small values of δ_0 even very small and vanishing generalized trends (with small or negative γ_0) can be consistently estimated. In this sense, the value of δ_0 helps the identification of the deterministic trend.

Because φ̂ = (μ̂, β̂)′ has an explicit form, a prior consistency proof is not required, and therefore φ̂ is not included in Theorem 1. When applicable, we present directly the asymptotic distribution of φ̂ in Theorem 2(i) below. Specifically, it follows straightforwardly from Theorem 2 that μ̂ is consistent if δ_0 < 1/2 and β̂ is consistent when γ_0 − 1/2 > δ_0.

Theorem 1(i) shows consistency of ϑ̂ when γ_0 − 1/2 > δ_0, which we refer to as the strong deterministic trend case. On the other hand, Theorem 1(ii) shows consistency only of τ̂ when γ_0 − 1/2 < δ_0, which is the weak deterministic trend case. In the latter case, γ_0 (and β_0) cannot possibly be consistently estimated. This is easily seen by considering, for example, δ_0 = 1 (a random walk), in which case the parameters γ_0 and β_0 cannot be consistently estimated when γ_0 < 3/2 because the deterministic signal is drowned by the stochastic noise. This generalizes the well-known result that a level (γ_0 = 1) cannot be estimated consistently for a random walk, whereas a linear trend (γ_0 = 2) can be consistently estimated. Similarly, we note that, as will be discussed in the context of Theorem 2(i) below, the drift parameter μ_0 can be consistently estimated whenever δ_0 < 1/2, i.e., when z_t is (asymptotically) stationary.
More generally, Theorem 1(ii) shows that, even in cases where the deterministic signal is not strong enough for consistent estimation of the deterministic structure, the parameter characterizing the stochastic component, τ 0 , can nonetheless still be consistently estimated.
The proof of Theorem 1 is very challenging due to the non-uniform behaviour of the loss function over a large admissible parameter set. In a much simpler setting with absence of any deterministic component, this problem was acknowledged and solved by Hualde and Robinson (2011) and Nielsen (2015), where the difficulty arose due to the nonstationary/stationary behaviour of fractional differences of the observed process. Our proof strategy is similar to that in Hualde and Nielsen (2020) in that it takes advantage of the competition between the deterministic and stochastic components, although this is now more challenging because of the arbitrarily large parameter space for γ, and in particular because of the presence of a second deterministic term. Specifically, the latter complicates matters substantially due to the difficulties outlined in Section 2 and because dealing with the relative strengths of stochastic and deterministic terms is now more involved.
For the asymptotic distribution theory we define the convolution coefficients entering the matrix A, which is the Fisher information for the parameter τ under Gaussianity; see Dahlhaus (1989) and Hualde and Robinson (2011). Also, we require an additional regularity condition.
Assumption A4 is identical to A4 in Hualde and Nielsen (2020). It is slightly stronger than A3 in Hualde and Robinson (2011). This strengthening seems necessary to obtain certain bounds on the coefficients ρ_j(ϕ) and their derivatives, where ς > 1/2 is given in A4(ii) and ϕ_i denotes the i-th element of ϕ; see Hualde and Nielsen (2020) for details. Again, A4 is easily satisfied for ARMA models or the Bloomfield (1973) spectral model. For the latter model, the analytical formulas for A, and hence for the asymptotic variance matrix, simplify neatly; see Robinson (1994). In practical implementations, though, numerical derivatives of the objective function will typically be used.
To describe the asymptotic distribution, we introduce additional notation. Let I_q and 0_q denote the q-dimensional identity matrix and a q-vector of zeros, respectively, and define the scaling matrices and limiting quantities entering the results below.

Theorem 2. Let x_t be generated by (1), (2), and (12) with true values ϑ_0 and φ_0, and let Assumptions A1-A4 hold. Then, as T → ∞, the asymptotic normality results (26), (27), and (28) hold, where N_β is a normally distributed random variable. If, in addition, δ_0 < 1/2, then, for any ϵ > 0, μ̂ − μ_0 = o_p(T^{δ_0 − 1/2 + ϵ}).

The asymptotic distribution results in Theorem 2 are divided into two main cases. In Theorem 2(i) we first present the result for the strong deterministic trend case, γ_0 − 1/2 > δ_0. Here, the deterministic signal is sufficiently strong, relative to that of the stochastic component, that we can prove joint asymptotic normality for the estimators ϑ̂ and β̂. Within this case, we distinguish between δ_0 < 1/2, where we can also include μ̂, and δ_0 > 1/2, where we cannot include μ̂. Whether μ̂ can be included or not, ϑ̂ and β̂ retain identical convergence rates.
In Theorem 2(ii) we present the result for the weak deterministic trend case, γ_0 − 1/2 < δ_0. In this case, we can obtain the asymptotic distribution for the estimator of the stochastic component τ_0 only, but when δ_0 < 1/2 we also prove consistency of μ̂ at a rate arbitrarily close to T^{1/2−δ_0} (up to a factor T^ϵ for arbitrarily small ϵ > 0). However, the estimator μ̂ is a complicated function of γ̂, whose behavior is unknown in this case and which indeed is not even consistent; see the discussion after Theorem 1. Thus, deriving an asymptotic distribution result for μ̂ in case (ii) with δ_0 < 1/2 does not seem to be possible.

Theorem 2(ii) shows that, even in cases where the deterministic signal is not strong enough for consistent estimation of γ_0 and β_0 (and possibly also μ_0), the estimator of the parameter characterizing the stochastic component, τ̂, has exactly the same limiting properties as in the strong deterministic trend case in Theorem 2(i.a). That is, the asymptotic distribution result for τ̂ in Theorem 2 is unaffected by the relative strengths of the stochastic and deterministic components. In particular, even when γ_0, β_0, and μ_0 cannot be consistently estimated, the asymptotic distribution of τ̂ is unaffected by their presence.

It is noteworthy that the asymptotic distribution of τ̂ is unaffected by the presence of the deterministic component in (12), and τ̂ has the same asymptotic distribution as in, e.g., Theorem 2.2 of Hualde and Robinson (2011). Specifically, the variance A^{−1} in the asymptotic distribution of τ̂ in (26), (27), and (28) is equal to the inverse Fisher information under Gaussianity; see Dahlhaus (1989). Because the estimate τ̂ is also asymptotically independent of the remaining parameter estimates, it therefore follows that τ̂ is asymptotically efficient under the additional assumption of Gaussianity, and this occurs regardless of the relative strength of the deterministic and stochastic components.

More generally, Theorem 2 makes it possible to conduct inference on the model parameters, with the caveat that the joint asymptotic distribution of ϑ̂, μ̂, and β̂ given in (26), as well as that of ϑ̂ and β̂ given in (27), are both singular, which makes testing of joint hypotheses on ϑ_0 and β_0 impossible. However, separate inference can straightforwardly be conducted on ϑ_0 and β_0. For example, it is straightforward given (26) or (27) to construct confidence intervals and/or to test hypotheses such as γ_0 = 2 (the deterministic trend in x_t is linear) or δ_0 = 1 (the stochastic component is of the random walk type).

To conduct inference on the model parameters using Theorem 2, a consistent estimate of σ_0² is needed. To this end, consistency of the estimator σ̂² = L_T(ϑ̂, φ̂), see (16), is straightforwardly obtained using the methods in the proofs of Theorems 1 and 2.
Specifically, in view of the above comments, inference on the parameter τ_0 characterizing the stochastic component can be conducted by means of likelihood ratio tests or Wald/t tests. The former do not require estimation of the variance, whereas for the latter a consistent estimate of the variance of τ̂ can be obtained by numerical evaluation of the Hessian matrix. Because the (marginal) asymptotic distribution of τ̂ is the same across the different cases in Theorem 2, and because the Hessian matrix is asymptotically block-diagonal in each case, it is a straightforward consequence of Theorem 2 that such likelihood ratio or Wald tests are asymptotically χ²-distributed and t-tests are asymptotically standard normally distributed, regardless of the case. Nonetheless, it may sometimes be of interest to determine which of the cases in Theorem 2 is relevant in a given situation. Of course, there are many ways of doing so, and in Section 4.2 we consider a stepwise testing procedure.
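As an illustration of the Wald/t approach, the following heuristic sketch continues the ω = 1 example above, approximating the variance of the estimator from a central-difference Hessian of the concentrated loss and relying on the asymptotic block-diagonality just mentioned:

```python
def t_stat_delta(x, delta_null=1.0, h=1e-3):
    """t-statistic for H0: delta_0 = delta_null (e.g. a random walk-type
    stochastic component), with Var(theta_hat) approximated heuristically by
    (2 sigma^2 / T) H^{-1}, H the numerical Hessian of the concentrated loss."""
    theta = estimate(x)
    sigma2 = css_loss(theta, x)
    k = len(theta)
    H = np.zeros((k, k))
    for i in range(k):
        for j in range(k):
            ei, ej = np.eye(k)[i] * h, np.eye(k)[j] * h
            H[i, j] = (css_loss(theta + ei + ej, x) - css_loss(theta + ei - ej, x)
                       - css_loss(theta - ei + ej, x) + css_loss(theta - ei - ej, x)) / (4 * h * h)
    cov = 2 * sigma2 * np.linalg.inv(H) / len(x)
    return (theta[0] - delta_null) / np.sqrt(cov[0, 0])
```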
We notice from (26) and (27) in Theorem 2(i) that γ̂ is T^{γ_0−δ_0−1/2}-consistent whereas β̂ is only T^{γ_0−δ_0−1/2}/log T-consistent. In fact, it can be shown that if γ_0 were known, then the least squares regression estimator of β_0 would also be T^{γ_0−δ_0−1/2}-consistent. Thus, there is a (small) rate-of-convergence loss from having to estimate the power law parameter γ_0.

Finally, we also notice from (26) that, in the case where they are both consistently estimable, β̂ and μ̂ are asymptotically independent. This seemingly contradicts Robinson (2012), who considers the weakly dependent case (δ_0 = 0 known) with unknown power law parameters. In contrast, we consider the power law parameter corresponding to β (namely γ_0) to be unknown, while the power law parameter corresponding to μ is known (and equal to one). This fundamental difference ensures that, compared with a situation in which the power law parameter corresponding to μ were unknown, μ̂ converges at a rate log T faster, and this in turn guarantees asymptotic independence from the coefficient estimate β̂. The technical justification for this asymptotic independence result is given in (138).
Results for Monte Carlo bias of δ̂ are presented in Table 1. Here, the performance of δ̂ reflects the asymptotic theory in Theorems 1 and 2. First, for all (δ_0, γ_0, β_0, T) combinations the bias is negative, and it clearly decreases in absolute value as T increases, even for the boundary case γ_0 − δ_0 = 1/2, which is not covered by our theory. Second, when the deterministic signal gets stronger (so γ_0 − δ_0 is higher), results are slightly worse. Third, for fixed γ_0 − δ_0, results are better for larger δ_0 because then γ_0 is further away from 1, so that the asymptotic multicollinearity problem is less noticeable. Finally, the results in this table are basically unaffected by the value of β_0.

Monte Carlo SD results for δ̂ are given in Table 2. These again reflect the asymptotic theory and complement nicely the results in Table 1. As expected, the SD results improve as T increases and are largely unaffected by the values of γ_0, δ_0, and β_0.

Next, results for Monte Carlo bias of γ̂ are presented in Table 3. As expected from the asymptotic theory in Theorems 1 and 2, the behaviour of γ̂ is qualitatively different from that of δ̂. When γ_0 − δ_0 ≤ 1/2, the bias is generally large (in absolute value) and does not decrease as T increases. This is, to some extent, mitigated for β_0 = −5 and β_0 = 10, where the coefficient on the deterministic trend is so large that the theoretically dominant stochastic component appears to be hidden in finite samples. On the other hand, when γ_0 − δ_0 > 1/2 the bias is generally very small and decreases as γ_0 − δ_0 increases, reflecting the fast convergence rates in those cases implied by Theorem 2. This effect is weaker when δ_0 = 0, where γ_0 − δ_0 > 1/2 implies that γ_0 is relatively close to 1, and the asymptotic multicollinearity clearly worsens the bias of the estimates of γ_0.

The Monte Carlo SD results for γ̂ are presented in Table 4. Again, these results are clearly in line with the predictions from asymptotic theory, and are qualitatively different from the SD results for δ̂ in Table 2. For γ_0 − δ_0 ≤ 1/2, the SD is large and does not seem to decrease for larger values of T. The latter is seen regardless of the value of β_0. On the other hand, for γ_0 − δ_0 > 1/2, the SD clearly decreases as either T or γ_0 − δ_0 increases, although the results are relatively poor for β_0 = 1, δ_0 = 0, due to the asymptotic multicollinearity problem. As was the case with the bias in Table 3, the SD is clearly smaller for β_0 = −5 or β_0 = 10 compared with β_0 = 1, reflecting the fact that the deterministic trend is easier to detect when its coefficient is larger in magnitude.

Testing procedure
Parameter estimation and inference are straightforward using the results in Theorem 2, and for the parameters of the stochastic component it is not necessary to know which of the cases covered by Theorem 2 applies in any given situation. Nonetheless, it may sometimes be of interest to discern which case applies and, additionally, given that γ_0 = 1 leads to an identification problem, to check whether the data support this possibility. Clearly, there are many possible ways of doing so, and we propose here a stepwise testing procedure. We do not pursue a formal analysis, which would involve very lengthy repetitions of techniques already developed in the proofs of Theorems 1 and 2, but instead illustrate the finite sample behavior by means of a small Monte Carlo simulation experiment. Our proposed testing procedure is as follows.
Step 1. Test H_0^1: δ_0 = 1/2 against H_1^1: δ_0 < 1/2 in model (12). Rejection of H_0^1 favors the possibility that μ_0 can be consistently estimated. This test is simple to implement because we conjecture that our proposed estimator δ̂ derived from (19) has property (28) even in the case where γ_0 = 1. Assuming this conjecture is true, testing H_0^1 by means of a t-test is immediate. If H_0^1 is rejected, we proceed to Step 2, and if it is not then we proceed to Step 3.
Step 2. Test H_0^2: β_0 = 0 against H_1^2: β_0 ≠ 0 in model (12). The result from Step 1 suggests that μ_0 can be consistently estimated, so, noting (12), it is crucial to determine whether γ_0 = 1 or γ_0 ≠ 1. Nicely, the null γ_0 = 1 is equivalent to β_0 = 0, because both conditions lead to models that are observationally equivalent. Testing H_0^2: β_0 = 0 against H_1^2: β_0 ≠ 0 is possible although γ_0 is not identified under the null. This is a classical problem in the hypothesis testing literature, and several solutions have been provided (e.g., Hansen, 1996). We follow an LM approach. Let (τ̃, μ̃) denote the restricted estimator that imposes H_0^2, and let γ_F ∈ [γ_1, 1 − κ] ∪ [1 + κ, γ_2] be a fixed number. Define the LM statistic LM(γ_F) from the restricted residuals, with σ̃² = L_T(τ̃, γ_F, μ̃, 0). We conjecture that, as T → ∞,

LM(γ_F) →_d χ²_1 under H_0^2.  (30)

Another possibility is to consider the supremum over γ instead of a fixed γ_F, which would imply a nonstandard null limit distribution (e.g., Hansen, 1996). Our small Monte Carlo experiment will rely on (30) for a fixed γ_F. Additional, unreported Monte Carlo simulations suggest that results are relatively invariant to the choice of γ_F and also that LM(γ_F) only has power if γ_0 − 1/2 > δ_0. This makes sense because if γ_0 − 1/2 < δ_0 then the second term on the right-hand side of (12) is drowned by the stochastic component, so it would be irrelevant whether β_0 = 0 or β_0 ≠ 0. This is useful because rejection using LM(γ_F) then favors the possibility that the generalized trend can be estimated consistently, whereas non-rejection suggests that the generalized trend is irrelevant (either because β_0 = 0 or because the trend is weak).
Step 3. Test H_0^3: γ_0 − 1/2 = δ_0 against H_1^3: γ_0 − 1/2 > δ_0 in model (12) under the restriction μ = 0. Because Step 1 suggests that μ_0 cannot be consistently estimated, we omit the drift from the analysis, and in this step we check whether the generalized trend is strong. The test can be implemented as described in Hualde and Nielsen (2020).

Step 4. The outcomes of the tests in Steps 2 and 3 have the following implications:
M1: If H_0^2 is rejected, then all model parameters can be estimated by (19) and φ̂.
M2: If H_0^2 is not rejected, then we either employ (19) and φ̂ (ignoring the estimates of β_0 and γ_0), or we may use a simplified model that imposes β = 0.
M3: If H_0^3 is rejected, then β_0, γ_0, and τ_0 can be consistently estimated. We either employ (19) and φ̂ (ignoring the estimate of μ_0), or we may use a simplified model that imposes μ = 0.
M4: If H_0^3 is not rejected, then only τ_0 can be consistently estimated. We either employ (19) and φ̂ (ignoring the estimates of μ_0, β_0, and γ_0), or we may use simplified versions of it in which we remove one or both deterministic components (and possibly ignore estimates of the deterministic structure).
Summarizing, the outcome of the test procedure is one of M1, . . . , M4. As usual in model specification, we note that wrongly removing a component from the model renders all remaining estimates inconsistent. In the present context, Theorem 2 shows that retaining all deterministic components has no adverse effect asymptotically, in terms of efficiency or otherwise, on the estimation of the parameters associated with the stochastic component. This suggests that a conservative approach of always estimating the full model (12), including all deterministic components, is appropriate in most circumstances, while being careful in the interpretation of the parameters associated with the deterministic components. Nonetheless, we consider next the finite-sample properties of the above testing procedure.

Table 5 presents the proportion of cases (out of 10,000 replications) in which the stepwise testing procedure selects the different situations characterized by M1, . . . , M4. As before, the observable series x_t, t = 1, . . . , T, was generated from (12) with u_t = ε_t being an independent N(0, 1) sequence, and we present results for T = 64, 256, μ_0 = β_0 = 1, δ_0 = 0, 0.4, 0.6, 1, and 9 different values for γ_0 given by γ_0 − δ_0 = 1.6, 1.4, 1.2, . . . , 0.4, 0.0, −0.4. Numbers reported in bold correspond to proportions of correct choices. All tests were implemented with nominal size 0.05, and we fix γ_F = 2 in Step 2.

Overall, the behaviour of the testing procedure seems satisfactory. The results generally improve as T increases and correspond to what the theory predicts. As expected, the correct identification of a strong trend (when γ_0 − δ_0 > 1/2) is easier for larger values of γ_0 − δ_0 and worsens substantially when γ_0 − δ_0 ≤ 1. This has to do with the relatively low power of Hualde and Nielsen's (2020) one-sided LM testing procedure for H_0^3 against H_1^3 when γ_0 − δ_0 is close to 1/2, and our results here are in line with theirs. Additionally, the test of H_0^1 against H_1^1 shows low power when δ_0 = 0.4, which is certainly an adverse situation, though results improve as T increases.
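For reference, the data generating process used in the simulations is straightforward to reproduce; below is a minimal sketch under the same ω = 1 specification, reusing the helpers defined earlier (all function names are ours):

```python
def simulate(T, delta0, gamma0, mu0=1.0, beta0=1.0, rng=np.random.default_rng(0)):
    """Draw x_t from model (12) with u_t = eps_t ~ N(0,1): drift mu0, generalized
    power law trend beta0 * pi_{t-1}(gamma0), plus Type II fractional noise."""
    z = frac_diff(rng.standard_normal(T), -delta0)        # Delta_+^{-delta0} eps
    return mu0 + beta0 * pi_coefficients(gamma0, T) + z

x = simulate(256, delta0=0.4, gamma0=2.0)
print(estimate(x))   # near (0.4, 2.0): a strong-trend case, gamma0 - delta0 > 1/2
```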

Concluding remarks
We have proposed a parametric time series model that includes both a fractional stochastic component and a fractional deterministic component. The stochastic component is a fractionally integrated process driven by a memory parameter, δ, combined with a linear short-memory process. The deterministic component consists of the sum of a constant term and a flexible deterministic trend. The latter is fractional in the sense that it is defined using the same fractional coefficients as the fractional integration operator in the stochastic component, and, similarly to the memory parameter δ, the deterministic trend is characterized by a power law parameter γ. Both the memory and power law parameters are assumed to lie in sets which can be arbitrarily large. Thus, our model may display many different behaviours, including various types of dependence (antipersistence, weak dependence, long memory) and very flexible deterministic trend functions. Compared with our earlier work in Hualde and Nielsen (2020), there are three main differences in this paper. First, in this paper we apply the fractional coefficients directly to model the deterministic component instead of the approximation given by powers of t. As argued in the Introduction, the former are much more natural in a fractional context. Second, as a consequence of using the fractional coefficients to model the deterministic trend, we are able to relax the assumption γ > 0 on the power law parameter required by Hualde and Nielsen (2020). Instead, we allow the power law parameter to lie in any (arbitrarily large) compact interval. Third, we include an additional deterministic term such that we have both a constant/level and a trend. This complicates our analysis substantially, but clearly makes the model much more applicable in practice and also has the potential to alleviate bias arising from non-zero initial conditions in the Type II context (Johansen and Nielsen, 2016).

Our asymptotic results depend crucially on the relative strengths of the stochastic and deterministic components, as measured by the memory and power law parameters. Specifically, when the deterministic signal is sufficiently strong, that is, if γ_0 − δ_0 > 1/2 (and δ_0 < 1/2), the trend parameters γ_0 and β_0 (and the level parameter μ_0) can be consistently estimated, and their estimators are asymptotically normal. When the deterministic signal is weak, the parameters corresponding to the deterministic components (except μ_0 if δ_0 < 1/2) cannot be consistently estimated. Remarkably, the asymptotic results for the estimator corresponding to the stochastic part of the model (i.e., τ̂) are identical to those obtained in the simpler, purely stochastic, setting of Hualde and Robinson (2011), and are unaffected by the presence of the deterministic component, whether the parameters of the latter can be consistently estimated or not.

There are several interesting issues which have not been addressed in the present paper, but which will be the object of future research. First, a semiparametric approach which focuses on estimating γ_0 and δ_0 (and possibly μ_0) without making parametric assumptions about the short-memory structure of z_t seems possible. Second, the fractional process which characterizes our model is of Type II, and it would be of interest to determine whether our theory could also be developed for a Type I fractional process.

Proofs of theorems
Throughout, ϵ will denote a generic arbitrarily small positive constant, and K a generic arbitrarily large positive constant. We note from the outset that many steps of the proofs are affected by the asymptotic behaviour of the deterministic term

μ_0 π_{t−1}(1 − δ) + β_0 π_{t−1}(γ_0 − δ),  (31)

which, depending on the values of γ_0 and μ_0, is dominated by either the first or the second term on the right-hand side of (31). Recalling that γ_0 ∈ [γ_1, 1 − κ] ∪ [1 + κ, γ_2], and also that if γ_0 − 1/2 > δ_0 then β_0 ≠ 0 (see Assumption A3), there are two cases. When μ_0 = 0, or μ_0 ≠ 0 and γ_0 ≥ 1 + κ, the second term dominates, whereas if μ_0 ≠ 0 and γ_0 ≤ 1 − κ, the first term dominates. We will give the proof for the former case. The proof for the latter case (μ_0 ≠ 0, γ_0 ≤ 1 − κ) is very similar, just adapting many of the steps in the proof below and also some of the lemmas to the case where the first term in (31) is the dominant one, which mainly implies that "1" takes the role of "γ_0" in many parts of the proof.

Proof of Theorem 1(i): the γ_0 − 1/2 > δ_0 case
Strictly, ε should be ε/√2 in (32) and (33), but since ε is arbitrary this is irrelevant and we continue without the √2 factor. In (32) we prove consistency of τ̂, uniformly in γ. In (33) we prove consistency of γ̂, given that τ̂ is consistent and hence that τ̂ lies in a small neighborhood of τ_0.
Noting (13), we decompose the objective function into a stochastic term s_t(ϑ) and a deterministic term d_t(ϑ), defining also the corresponding convolution coefficient. As in Hualde and Nielsen (2020), the strategy of proof exploits the different orders of magnitude of the stochastic term s_t(ϑ) and the deterministic term d_t(ϑ) in R_T(ϑ) in different parts of the parameter space. Dealing with (32) and (33) separately allows us to consider the case where τ is "far" from τ_0 (that is, M_ε) and the case where γ is "far" from γ_0 (that is, N_ε). Two important technical problems are that (a) when γ = γ_0 then d_t(ϑ) = 0, and (b) when δ is "far" from δ_0 then u_t(δ − δ_0) may change from stationary to nonstationary. Both these features complicate greatly the treatment of (32), where γ = γ_0 is admissible. In (33) neither of these problems appears, and in that sense the proof of (33) is simpler than that of (32). Hence, we first give the proof of (33).

It remains to establish (46), (47), and (48), which control the probability that the infimum of the (suitably normalized) objective function over H_i′, H_i″, and H_i‴, respectively, falls below the relevant threshold. The partitioning of the parameter space into H_i′, H_i″, and H_i‴ for i = 1, . . . , 5 is useful because of the different behavior (orders of magnitude) of the stochastic term s_t(ϑ) and the deterministic term d_t(ϑ) on these sets. This motivates a separate analysis of (46), (47), and (48), at least for i = 1, . . . , 4. We first give the main ideas and motivation for the separate treatment of each of these sets, and then give the details of the proofs for each set in separate subsections below.
For i = 5 the stochastic term s_t(ϑ) dominates the deterministic term d_t(ϑ) in S_T(ϑ), and the contribution of d_t(ϑ) to S_T(ϑ) is negligible. Furthermore, because u_t(δ − δ_0) is asymptotically stationary for i = 5, the proof for this case can be based on arguments from Hualde and Robinson (2011) for the purely stochastic term, ρ(L; ϕ)u_t(δ − δ_0).

For i ≤ 4 the deterministic term d_t(ϑ) dominates the stochastic term s_t(ϑ) in S_T(ϑ), but only if γ ≠ γ_0. Recall that d_t(ϑ) = 0 when γ = γ_0, which necessitates separate consideration of H_i′, H_i″, and H_i‴, at least for some i.
Specifically, on H_i′ we have |γ − γ_0| ≥ ϵ, so that we can take advantage of d_t(ϑ) dominating s_t(ϑ) by an order of magnitude, implying that the contribution of s_t(ϑ) to S_T(ϑ) is negligible. Thus, H_i′ can be dealt with for i = 1, . . . , 4 in one proof that applies a uniform lower bound on the suitably normalized Σ_{t=1}^T d_t²(ϑ). On H_i″ we have |γ − γ_0| < ξT^{−κ_i}, so that γ = γ_0 is admissible. Because d_t(ϑ) = 0 when γ = γ_0, we cannot exploit the lower bound on Σ_{t=1}^T d_t²(ϑ). Thus, on H_i″ we need to deal carefully with the stochastic term s_t(ϑ), and we divide the parameter space and separately consider i = 1, . . . , 4 as in Hualde and Robinson (2011), Johansen and Nielsen (2012a), and subsequent works. For i = 4, u_t(δ − δ_0) is asymptotically stationary and we can apply the same proof as for i = 5, using the mean value theorem to show that the contribution of d_t(ϑ) is negligible. The cases i = 2, 3 deal with the discontinuity of the objective function, because u_t(δ − δ_0) is near the border between stationarity and nonstationarity. The proof here uses the result of Hualde and Robinson (2011) that the contribution from the purely stochastic term, ρ(L; ϕ)u_t(δ − δ_0), can be made arbitrarily large, and we then show that the contribution from the deterministic term is bounded. Finally, for i = 1, u_t(δ − δ_0) is nonstationary and we can use the method of Johansen and Nielsen (2019) and Hualde and Nielsen (2020), which avoids the strong moment condition (see Johansen and Nielsen, 2012b). The reason is that u_t(δ − δ_0) has memory δ_0 − δ ≥ 1/2 + η for an arbitrarily small η > 0, and the most direct proof for i = 1 would involve justifying the weak convergence of the appropriately normalized sum of squares of u_t(δ − δ_0). The difficulty here is that, for fixed δ, convergence can be established under the condition that u_t has q finite moments, where q ≥ 2 and q > (δ_0 − δ − 1/2)^{−1}. Thus, if δ_0 − δ is close to 1/2, the condition q > (δ_0 − δ − 1/2)^{−1} is very strong, and, in fact, this is a serious technical problem because Johansen and Nielsen (2012b) showed that this moment condition is necessary and, moreover, in earlier parts of the proof η is required to be arbitrarily small. However, by using the lower bound in Lemma 22, we instead need to show the weak convergence of the appropriately normalized sum of squares of u_t(δ − δ_0 − 1). This process has memory δ_0 − δ + 1, so this convergence does not require the strong moment condition because, for i = 1, δ_0 − δ + 1 ≥ 3/2 + η. In addition, note that Johansen and Nielsen (2019) require 8 moments, but this is not used to establish the bounds that we require. The cases H_i‴ for i = 1, . . . , 4 also require joint treatment of the deterministic and stochastic terms in S_T(ϑ). In these cases we show that, after suitable normalization, the contribution from the deterministic term d_t(ϑ) is "large" compared to that of the stochastic term s_t(ϑ). In that sense, the proofs here are opposite to those for H_i″ with i = 2, 3. Because of the different behavior and normalization of u_t(δ − δ_0), and hence of s_t(ϑ), for each of i = 1, . . . , 4, slightly different proofs are needed for H_i‴ in each of these cases.

6.1.3. Proof of (46), (47), and (48) for i = 5

In this case, we give just one proof that covers the whole set H_5′ ∪ H_5″ ∪ H_5‴. It follows that (46), (47), and (48) for i = 5 hold if we show (50), (51), (52), (53), and (54). First, (50), (51), and (52) follow by identical arguments to those in the proofs of (2.8) and (2.9) in Hualde and Robinson (2011).
Next, for any θ > 0, by (148) of Lemma 4 with γ_0 − δ ≤ 1 + η and δ_0 − δ ≤ δ_0 − γ_0 + 1 + η, the left-hand side of (53) is O_p(T^{max{θ, 1+δ_0−γ_0+η} + 4θ − 1/2 + η}), which is o_p(1) for θ and η sufficiently small. In the same way, the left-hand side of (54) is O_p(T^{2 max{θ, 1+δ_0−γ_0+η} − 1}) by (185) of Lemma 18, which is again o_p(1) for θ and η small enough. This concludes the proof of (46), (47), and (48) for i = 5.

For δ ∈ ∪_{i=1}^4 I_i it holds that γ_0 − δ ≥ 1 + η, so the probability of the infimum over H_i′ considered above is bounded accordingly.

Fix ζ such that 0 < ζ < η and consider the renormalized probability (55) as T → ∞, noting the change in the normalization from (47) to (55), which is justified because the right-hand side of the inequality inside the probability in (47) is 0, so multiplying the left- and right-hand sides of the inequality by the same positive number does not alter the probability. By the Cauchy-Schwarz inequality and (39), the probability in (55) is bounded by the sum of the two probabilities (57) and (58). First, because T^{2κ_4 − 2(γ_0−δ) + 1} = T^{−1−2ζ}, (57) follows immediately by using Lemma 1. Next, fixing c such that 0 < c < 1/2, the probability in (58) can be rewritten as a probability over the infimum, so that (58) holds on showing two further results, of which we first justify (60). Choose an arbitrarily small α > 0. Then we initially establish a preliminary bound by the Cauchy-Schwarz inequality. By very similar arguments to those given in the proof of Lemma 2, using also Lemma 11, as well as very similar steps to those in the proof of Lemma 13, approximating sums by integrals (Lemma 10) and noting that the second term in (31) dominates because γ_0 ≥ 1 + κ, it can be shown that (64) holds, where sup_{H_4, 1−δ≥1/2+α} |r_{4T}(ϑ)| = o(1). It can be shown that the first term on the right-hand side of (64) equals the quantity in (65), so by (22), (64), and (65) the required lower bound on the infimum follows, concluding the proof of (60).

First, changing the normalization (T^{q_i} instead of T) and applying the Cauchy-Schwarz inequality, the probability in (69) can be bounded in the same manner.

Proof of Theorem 1(ii): the γ_0 − 1/2 < δ_0 case
As in the proof of part (i), the result follows by showing that the right-hand side of (45) is o(1), which, in view of Lemma 1, holds if (100) is satisfied. The proof of (100) for each of i = 1, . . . , 4 follows in the next subsections. Each proof proceeds by showing that, because γ_0 − 1/2 < δ_0, the deterministic term d_t(ϑ) in R_T(ϑ) is negligible under the appropriate normalization. We can therefore take advantage of lower bounds on the stochastic terms in R_T(ϑ) established in Hualde and Robinson (2011), Johansen and Nielsen (2019), and Hualde and Nielsen (2020). Note again that Johansen and Nielsen (2019) require 8 moments, but that is not used to establish the bounds that we require.

Thus, in view of (86) and (101), the required bound over W_i follows. For i = 3, both (107) and (108) follow straightforwardly by steps identical to those given in the proofs of (102) and (103), just replacing η by 0. For i = 2, we need to take into account the different normalization, which implies using (192) instead of (191) in Lemma 19, (194) instead of (193) in Lemma 20, and (187) instead of (185) in Lemma 18.