Optimal estimation of the supremum and occupation times of a self-similar Lévy process

In this paper we present new theoretical results on optimal estimation of certain random quantities based on high frequency observations of a Lévy process. More specifically, we investigate the asymptotic theory for the conditional mean and conditional median estimators of the supremum/infimum of a linear Brownian motion and a strictly stable Lévy process. Another contribution of our article is the conditional mean estimation of the local time and the occupation time of a linear Brownian motion. We demonstrate that the new estimators are considerably more efficient than the classical estimators studied in e.g. [6, 14, 29, 30, 38]. Furthermore, we discuss pre-estimation of the parameters of the underlying models, which is required for practical implementation of the proposed statistics.

MSC2020 subject classifications: Primary 62M05, 62G20, 60F05; secondary 62G15, 60G18, 60G51.


Introduction
During the past decades the increasing availability of high frequency data in economics and finance has led to immense progress in high frequency statistics. In particular, high frequency functionals of Itô semimartingales have received a great deal of attention in the statistical and probabilistic literature, where the focus has been on estimation of quadratic variation, realised jumps and related (random) quantities. A detailed discussion of numerous high frequency methods and their applications to finance can be found in the monographs [1, 31].
Despite the large amount of literature on high frequency statistics, the question of optimality has rarely been addressed. To fix ideas, consider a stochastic process (X_t)_{t∈[0,1]} with a known law and an associated random quantity such as ∫_0^1 f(X_s) ds, for various Markovian and non-Markovian models. The main focus in the existing literature is on deriving error bounds and weak limit theorems for Riemann sum type estimators, which heavily depend on the smoothness of f. In several settings rate optimality has also been proven in the case of Brownian motion.
The aim of our paper is to study optimal estimation of extrema, local time and occupation time of certain Lévy processes. Accurate estimation of these random functionals is important for numerous applications. For instance, the supremum is a key quantity in insurance, queueing, financial mathematics, optimal stopping and various applied domains such as environmental science, where the maximal level of pollution is often of interest. It is noted that our theory can also be used in Monte Carlo simulation of extrema via discretization, but this is not our main focus since much better algorithms exist [17]; see also [27] for exact simulation of the supremum of a stable process. These algorithms, however, cannot handle, e.g., the diameter of the range of X, whereas our estimators still apply. Accurate estimation of local times is required in a number of statistical methods including estimation of the volatility coefficient in a diffusion model [24], estimation of the skewed Brownian motion [34] and estimation of the reflected fractional Brownian motion [28], just to name a few.
The estimation of the aforementioned random quantities has been studied in several papers. The standard estimator of the supremum of a stochastic process is given by the maximum of its high frequency observations. In the setting of a linear Brownian motion the corresponding non-central limit theorem has been proven in [6]; this result has later been extended in [29] to the class of Lévy processes satisfying a certain regularity assumption. Statistical inference for local times has been investigated in [14, 30], who showed asymptotic mixed normality for kernel type estimators in the framework of continuous SDEs. Finally, [5, 38] discussed the estimation of the occupation time measure via Riemann sums.
In this paper we show that the standard estimators proposed in the literature are indeed rate optimal, but they are not asymptotically efficient. Instead, we consider the conditional mean and conditional median estimators, which turn out to be manageable in some important cases. It is well known that the conditional mean E[Q | (X_{i/n})_{i∈[0:n]}] is the optimal L²-predictor when E[Q²] < ∞. In many cases considered below, however, the random variable Q will not have a finite second moment. Then we use the conditional median estimator med[Q | (X_{i/n})_{i∈[0:n]}], which is optimal in the L¹ sense given that E[|Q|] < ∞. Additionally, we still consider the conditional mean, which is a very natural estimator even when the second moment is infinite. Importantly, it is optimal with respect to the Bregman distance D(x, y) = φ(x) − φ(y) − φ′(y)(x − y) with φ being a strictly convex differentiable function [8]. It is only required here that E[|Q|] and E[|φ(Q)|] are finite. We often have Q ≥ 0 and E[Q^p] < ∞ for some p > 1, and hence we may take φ(x) = x^p to produce an optimality statement for the conditional mean estimator. Finally, the conditional median is optimal with respect to D(x, y) = (1{x ≥ y} − 1/2)(g(x) − g(y)) for an increasing function g, which in our case can be taken as g(x) = x^p for p > 0; see [26] and references therein.
In the case of supremum, the conditional mean and median estimators have a rather explicit and simple form, but their performance assessment is not a trivial task. Importantly, self-similarity of X (up to a measure change) is the key property when evaluating such estimators and establishing the corresponding weak limit theory. Thus we consider the following two classes of processes: (i) linear Brownian motions and (ii) non-monotone self-similar Lévy processes. In the case of local/occupation time we only work with the class (i) of linear Brownian motions and focus exclusively on the conditional mean estimators, which is dictated by the structure of the problem and the tools currently available. Importantly, our conditional mean estimator of the local time fits the framework of [30] and yields an asymptotically optimal statistic in some large class in the case of continuous SDEs, see Remark 2. We find that our new optimal estimators are considerably more efficient than the standard ones and that they have narrower confidence intervals. In the case of supremum, this is illustrated by a numerical study. Furthermore, we discuss several modifications of our statistics including pre-estimation of unknown parameters of the underlying model.

This paper is structured as follows. §2 is devoted to the supremum and its conditional mean and median estimators with the corresponding weak limit theory in the case of a self-similar Lévy process with a known law. Here we also treat the case of a linear Brownian motion, and comment on the conditional mean estimator of the range diameter. In §3 we present the conditional mean estimators of the local time and occupation time together with the asymptotic theory in the case of a linear Brownian motion. Then in §4 we study modified statistics based on pre-estimation of the unknown parameters of the model. In particular, we show that reasonable pre-estimation of the model parameters does not affect the asymptotic theory.
Furthermore, the effect of truncation of the potentially infinite product involved in the construction of the supremum estimators is discussed, and some comments concerning a general Lévy process are given. Numerical illustrations for the case of supremum are presented in §5, where both a linear Brownian motion and a one-sided stable process are considered. The proofs are collected in Appendix A and Appendix B for the supremum and local/occupation time, respectively. The former also requires some additional theory for Lévy processes conditioned to stay positive, which is given in Appendix C.

Optimal estimation of supremum for a self-similar Lévy process
In this section we assume that (X_t)_{t≥0} is a non-monotone 1/α-self-similar Lévy process, i.e. (X_{ut})_{t≥0} has the same law as (u^{1/α} X_t)_{t≥0} for every u > 0, where necessarily α ∈ (0, 2]. Assuming that the law of X (or its parameters) is known, we focus on optimal estimation of the supremum and infimum of X on the interval [0, 1] from high-frequency observations. The case α ∈ (0, 2) corresponds to a strictly α-stable process, whereas for α = 2 we have a scaled Brownian motion, and the respective simplified expressions for the statistics and their limits can be found in §2.4. In fact, §2.4 considers a more general setting of a linear Brownian motion, which is not self-similar but becomes such under the Girsanov change of measure. Some further results concerning estimation of the infimum and the range diameter are given in §2.5. We write X̄_t := sup_{s∈[0,t]} X_s and X̲_t := inf_{s∈[0,t]} X_s for the running supremum and infimum process, respectively. Furthermore, the time of supremum will often be needed, and thus we define τ_t := inf{s ∈ [0, t] : X_s ∨ X_{s−} = X̄_t}. The standard distribution free estimator of X̄_1 is given by the empirical maximum of the observed data: M_n := max_{i∈[0:n]} X_{i/n}. We remark, however, that M_n is always downward biased. Finally, estimation of the infimum amounts to estimation of the supremum of −X, and thus no additional theory is needed. The joint estimation of supremum and infimum is discussed in §2.5.
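As a small illustration of M_n and its downward bias, here is a sketch in Python; the grid sizes and the use of a fine-grid maximum as a proxy for the true supremum are illustrative choices of ours, not prescriptions from the text:

```python
import numpy as np

def empirical_maximum(obs):
    """Standard estimator M_n = max_{i in [0:n]} X_{i/n} from the grid values."""
    return float(np.max(obs))

# Illustration on a simulated standard Brownian motion: the coarse-grid maximum
# never exceeds the fine-grid proxy of the true supremum (downward bias).
rng = np.random.default_rng(0)
m = 100_000  # fine grid used as a ground-truth proxy
path = np.concatenate(([0.0], np.cumsum(rng.normal(scale=(1 / m) ** 0.5, size=m))))
M_fine = empirical_maximum(path)
M_coarse = empirical_maximum(path[::1000])  # n = 100 equidistant observations
```

Since the coarse observations are a subset of the fine ones, M_coarse ≤ M_fine always holds, mirroring the downward bias of M_n.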
In the following we will often use the notion of stable convergence. We recall that a sequence of random variables (Y_n)_{n∈ℕ} defined on (Ω, F, P) is said to converge stably with limit Y (denoted Y_n →^{d-st} Y), where Y is defined on an extension (Ω′, F′, P′) of the original probability space (Ω, F, P), iff for any bounded continuous function g and any bounded F-measurable random variable Z it holds that E[g(Y_n)Z] → E′[g(Y)Z] as n → ∞. The notion of stable convergence is due to Rényi [39]. We also refer to [2] for properties of this mode of convergence.

Preliminaries
We will now review the asymptotic theory for the estimator M_n, which will be useful for studying conditional mean and median estimators. In order to state the limit theorem for M_n, we need to introduce an auxiliary process (ξ_t)_{t∈ℝ}. It is defined as the functional weak limit (X̄_T − X_{τ_T+t})_{t∈ℝ} →^d (ξ_t)_{t∈ℝ} as T → ∞, where T > 0 is a deterministic time horizon, see [9]. Here and in the following it is tacitly assumed that the left hand side is ∞ when τ_T + t ∉ [0, T]. The functional convergence is always with respect to the Skorokhod J₁ topology, unless specified otherwise. It may be useful to think of ξ as the process X seen from its supremum as the time horizon tends to infinity.
It is well known that (ξ_t)_{t≥0} and (ξ_{(−t)−})_{t≥0} are independent finite Feller processes starting at 0. Various representations of these processes exist and a number of important properties have been established, see e.g. [19] and references therein. The latter process, when started at a positive level, is often referred to as X conditioned to stay positive (the negative of the former is X conditioned to stay negative); here conditioning is understood in a certain limiting sense. The law of the limiting process ξ is not explicit except when X is a Brownian motion, in which case both parts of ξ are Bessel processes of order 3 scaled by σ, the standard deviation of X_1. In all cases ξ inherits self-similarity from X, and hence both parts (when started from positive values) are positive self-similar Markov processes admitting the Lamperti representation studied in detail in [16].
Due to self-similarity of the process X it holds that (n^{1/α}(X̄_1 − X_{τ_1+t/n}))_{t∈ℝ} →^d (ξ_t)_{t∈ℝ} as n → ∞, (2) where again the left hand side is understood to be ∞ when τ_1 + t/n ∉ [0, 1]. In other words, the process ξ arises from zooming-in on X at its supremum point. We refer the reader to [6, 29] for the case of a linear Brownian motion and a general Lévy process, respectively.
The following result is an instructive application of the convergence in (2). It is a particular case of [29,Thm. 5] extending the result of [6] for Brownian motion.
Theorem 1. For a non-monotone 1/α-self-similar Lévy process X we obtain the stable convergence as n → ∞: n^{1/α}(X̄_1 − M_n) →^{d-st} V := min_{j∈ℤ} ξ_{j+U}, (3) where ξ and the standard uniform random variable U are mutually independent, and independent of F.
Let us mention the underlying intuition, which will be important to understand our main result in Theorem 2 given below. Note that the observation times can be written as i/n = τ_1 + (i − nτ_1)/n, so that the rescaled distances of the observation times from τ_1 are determined by {nτ_1}, where {x} stands for the fractional part of x. The random time τ_1 has a density [18] and thus, according to [31, 33], {nτ_1} converges in law to a standard uniform random variable, which together with (2) hints at (3). It is noted that the convergence in (2) is, in fact, stable with ξ being independent of F. Intuitively, zooming-in at the supremum makes the values of X at some fixed times irrelevant. We stress that this only provides intuition and the proof is far from being complete, see [29] and also [13] providing the necessary corrections.

Optimal estimators
Let us proceed to construct our optimal estimators given by the conditional mean and median. For this purpose we introduce the conditional distribution of X̄_1 given the terminal value X_1 via F(x, y) := P(X̄_1 ≤ x | X_1 = y).
We choose a version continuous in y which is, in fact, jointly continuous in (x, y) as will be shown in Lemma 3 below. By self-similarity we also have F_{1/n}(x, y) := P(X̄_{1/n} ≤ x | X_{1/n} = y) = F(n^{1/α}x, n^{1/α}y).
Next, consider the conditional distribution of X̄_1 − M_n given the observations: H_n(x) := P(X̄_1 − M_n ≤ x | (X_{i/n})_{i∈[0:n]}) = ∏_{j=0}^{n−1} F_{1/n}(Δ_j^n + x, Δ_j^n − Δ_{j+1}^n), x ≥ 0, where Δ_j^n := M_n − X_{j/n} and the second line follows from the stationarity and independence of increments. We note that H_n(x) is continuous and strictly increasing in x ≥ 0. Finally, we introduce the conditional mean and conditional median estimators of X̄_1: T_n^mean := M_n + ∫_0^∞ (1 − H_n(x)) dx (4) and T_n^med := M_n + H_n^{−1}(1/2), (5) where in the first line we use the integrated tail formula. Interestingly, T_n^mean < ∞ even when E X̄_1 = ∞, see Remark 4. When evaluating our statistics defined in (4) and (5) we need access to the function F(x, y). This function, however, is explicit only in the Brownian case analyzed in §2.4 and is semi-explicit in the case of one-sided jumps, see Proposition 4. Thus, in the case of a general strictly stable process one needs to assess F numerically, which may necessitate truncation of the product in the definition of H_n. Such modifications are discussed in §4.2.
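Schematically, the construction of H_n and the conditional median estimator (5) can be coded as follows, with the conditional distribution function F supplied by the user. The bisection routine and the example F for a standard Brownian motion (anticipating §2.4) are illustrative choices of ours:

```python
import numpy as np

def H_n(x, obs, F, alpha):
    """Conditional distribution H_n(x) = P(sup X - M_n <= x | data), x >= 0,
    as a product over the subintervals [j/n, (j+1)/n]; F(x, y) is the unit-time
    conditional law and F_{1/n}(x, y) = F(n^{1/alpha} x, n^{1/alpha} y)."""
    obs = np.asarray(obs, dtype=float)
    n = len(obs) - 1
    d = obs.max() - obs                  # Delta_j = M_n - X_{j/n}
    s = n ** (1.0 / alpha)
    factors = [F(s * (d[j] + x), s * (d[j] - d[j + 1])) for j in range(n)]
    return float(np.prod(factors))

def T_med(obs, F, alpha, hi=10.0, tol=1e-8):
    """Conditional median estimator (5): M_n + H_n^{-1}(1/2) via bisection,
    using that H_n is continuous and strictly increasing in x >= 0."""
    lo = 0.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if H_n(mid, obs, F, alpha) < 0.5:
            lo = mid
        else:
            hi = mid
    return float(np.max(obs) + 0.5 * (lo + hi))

def F_bm(x, y):
    """Example F for a standard Brownian motion: conditional law of the
    supremum of a Brownian bridge (cf. the Brownian case of Section 2.4)."""
    return 1.0 - np.exp(-2.0 * x * (x - y)) if x >= max(0.0, y) else 0.0
```

Note that H_n(0) = 0, since the factor corresponding to the maximal observation vanishes at x = 0, so the bisection always returns a value strictly above M_n.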

Limit theory
We start by noting that H_n is a random probability distribution function and that H_n converges weakly to δ_{X̄_1}, P-almost surely, whereas H_n(xn^{−1/α}) has a non-trivial limit. Observe that ξ_j^{(n)} := n^{1/α}(X̄_1 − X_{(⌊nτ_1⌋+j)/n}) is the rescaled distance of the jth observation following τ_1 from the supremum. Thus H_n(xn^{−1/α}) can be written as a product of factors F evaluated at these quantities, where we tacitly assume that the factors with ξ_·^{(n)} = ∞ evaluate to 1. In view of Theorem 1 it is intuitive that the limit is the random distribution function H(x) defined in (6), where the random quantities U, ξ and V are defined in Theorem 1. By substitution we obtain the identities (7) and (8), which suggest the asymptotic behaviour of our estimators defined in (4) and (5). We formalise this in one of our main results.

Theorem 2. Assume that X is a non-monotone 1/α-self-similar Lévy process. Then the random function H is continuous and strictly increasing with H(0) = 0 and H(∞) = 1 P-a.s., and (n^{1/α}(X̄_1 − M_n), H_n(· n^{−1/α})) →^{d-st} (V, H) (9) with respect to the uniform topology, where V and H(x) are defined in (3) and (6), respectively. Furthermore, our estimators satisfy n^{1/α}(X̄_1 − T_n^mean) →^{d-st} V^mean := V − ∫_0^∞ (1 − H(x)) dx (10) and n^{1/α}(X̄_1 − T_n^med) →^{d-st} V^med := V − H^{−1}(1/2), (11) where the limit random variables are finite.
It is noted that the proof of this result is far from trivial, since it requires precise understanding of the tail function 1 − F(x, y) for large x and the rate of growth of ξ_t^{(n)} as t → ∞ (uniformly in n), among other things. The identities (7) and (8) show that the statistics T_n^mean and T_n^med are first order equivalent to the standard estimator M_n, and the knowledge of the distribution of X only enters through the n^{−1/α}-order term. This fact will prove to be important in §4, where the parameters of the law of X will need to be estimated.
Recall that E[X̄_1^p] < ∞ for p ∈ (0, α). Moreover, all moments of X̄_1 are finite when X is a Brownian motion or a strictly α-stable process with no positive jumps. In the latter cases the conditional mean estimator is optimal in the L² sense. In the case α ∈ (1, 2] the conditional median is optimal in the L¹ sense and the conditional mean is optimal with respect to the above mentioned Bregman distance D(x, y) = x^p − y^p − py^{p−1}(x − y), where p ∈ (1, α). Finally, the conditional median is optimal with respect to the loss function D(x, y) = (1{x ≥ y} − 1/2)(x^p − y^p) for p ∈ (0, α) and any α.
Interestingly, all the expressions in Theorem 2 stay the same if the process X is replaced by its negative −X, see Proposition 3. In particular, in the spectrally positive case the difference X̄_1 − T_n^mean has moments of all orders even though each term has infinite second moment, see also Remark 4 below.

Linear Brownian motion
Consider a linear Brownian motion X with drift parameter μ ∈ ℝ and scale parameter σ > 0, which is self-similar (and hence Theorem 2 applies) only when μ = 0. Nevertheless, X can be obtained from a scaled Brownian motion by the Girsanov change of measure and, in particular, the conditional distribution P(X̄_{1/n} ≤ x | X_{1/n} = y) does not depend on μ, see §A.4.1. Hence our estimators have exactly the same form as in the case of μ = 0, see §2.2. Furthermore, the conditional distribution function F is explicit in this case and is given by F(x, y) = 1 − exp(−2x(x − y)/σ²) for x ≥ max(0, y), and F(x, y) = 0 otherwise, which follows from [42] or earlier sources, see also [15, 1.1.8]. Thus H_n(x) = ∏_{j=0}^{n−1} (1 − exp(−2n(Δ_j^n + x)(Δ_{j+1}^n + x)/σ²)), x ≥ 0, and the estimators are then defined by (4) and (5). Interestingly, also the limit theorem has exactly the same form. The main reason for this is that the limit in (2) does not depend on μ either, see [6]. In the following result we prefer to choose the scaling √n/σ rather than √n so that the respective quantities correspond to the standard Brownian motion.
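A minimal numerical sketch of the resulting conditional mean estimator (4) for a linear Brownian motion follows; the truncation level x_max and the Riemann sum are our own illustrative numerical choices (cf. the truncation discussion in §4.2):

```python
import numpy as np

def H_n_bm(x, obs, sigma):
    """H_n(x) for a linear Brownian motion with scale sigma:
    prod_{j=0}^{n-1} (1 - exp(-2n (Delta_j + x)(Delta_{j+1} + x) / sigma^2)),
    where Delta_j = M_n - X_{j/n}; the drift mu drops out."""
    obs = np.asarray(obs, dtype=float)
    n = len(obs) - 1
    d = obs.max() - obs
    return float(np.prod(1.0 - np.exp(-2.0 * n * (d[:-1] + x) * (d[1:] + x) / sigma**2)))

def T_mean_bm(obs, sigma, x_max=5.0, m=2000):
    """Conditional mean estimator (4): M_n + int_0^infinity (1 - H_n(x)) dx,
    with the tail integral truncated at x_max and computed by a Riemann sum."""
    xs = np.linspace(0.0, x_max, m)
    tail = np.array([1.0 - H_n_bm(x, obs, sigma) for x in xs])
    return float(np.max(obs) + np.sum(tail) * (xs[1] - xs[0]))
```

The output always exceeds M_n, correcting its downward bias; H_n(0) = 0 because the factor at the maximal observation vanishes.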

Corollary 1. For a linear Brownian motion X with drift parameter μ and scale parameter σ > 0 the stable convergence of Theorem 2 holds with the scaling n^{1/α} replaced by √n/σ, where V = min_{j∈ℤ} ξ_{j+U} with ξ being the two-sided Bessel process of order 3 and U a standard uniform random variable, which are mutually independent and independent of F.
Additionally, we now show that (12) extends to convergence of moments.

Lemma 1. For a linear Brownian motion X and any p > 0 we have
As a consequence of Lemma 1 we obtain an asymptotic expansion of the mean squared error:

Joint estimation of supremum and infimum
Consider the process −X and the associated conditional mean estimator T̂_n^mean of its supremum sup_{t∈[0,1]}(−X_t) = −X̲_1, which is the negative of the infimum of X. According to Proposition 3 the corresponding estimation errors have the same distribution for all n, and so also the asymptotic theory is the same. Furthermore, we have the following joint convergence (linear Brownian motion included with α = 2, in which case the limit corresponds to μ = 0): n^{1/α}(X̄_1 − T_n^mean, −X̲_1 − T̂_n^mean) →^{d-st} (L, L̂), where L and L̂ are identically distributed, mutually independent, and independent of F. Their common distribution is the limiting law in (10).
This, for example, readily yields the limit result for the conditional mean estimator of the range diameter X̄_1 − X̲_1.

Optimal estimation of local time and occupation time for a linear Brownian motion
In this section X denotes a linear Brownian motion with drift parameter μ ∈ ℝ and scale σ > 0, and (L_t(x))_{t≥0} denotes the corresponding local time process at the level x ∈ ℝ, which is a continuous increasing process given as the almost sure limit L_t(x) = lim_{ε↓0} (2ε)^{−1} ∫_0^t 1{|X_s − x| ≤ ε} ds. Furthermore, O_t(x) := ∫_0^t 1{X_s > x} ds denotes the occupation time of (x, ∞) up to time t. Our aim here is to establish limit theorems for the conditional mean estimators of L_t(x) and O_t(x).
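The defining limit and the occupation time can be mimicked on a simulated path; a rough Monte Carlo sketch, in which the band width ε and the grid size are arbitrary choices of ours:

```python
import numpy as np

def local_time_band(path, t_grid, x=0.0, eps=0.05):
    """Approximate L_t(x) by (1/(2 eps)) * time spent in [x - eps, x + eps],
    using a Riemann sum over the equidistant time grid."""
    dt = t_grid[1] - t_grid[0]
    return float(np.sum(np.abs(np.asarray(path) - x) <= eps) * dt / (2.0 * eps))

# A simulated standard Brownian motion on [0, 1].
rng = np.random.default_rng(1)
n = 200_000
t = np.linspace(0.0, 1.0, n + 1)
bm = np.concatenate(([0.0], np.cumsum(rng.normal(scale=np.sqrt(1.0 / n), size=n))))
L_approx = local_time_band(bm, t)     # approximates L_1(0)
O_approx = float(np.mean(bm >= 0.0))  # approximates the occupation time of [0, inf)
```

Both approximations converge slowly, which is precisely why the optimal estimators developed below are of interest.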

Basic formulae
An important role will be played by the functions g(x, y) := E_0[L_1(x) | X_1 = y] and G(x, y) := E_0[O_1(x) | X_1 = y], where E_0 corresponds to the law of the standard Brownian motion. Both functions g and G have explicit formulae in terms of the density ϕ and survival function Φ of the standard normal distribution. Some basic observations and these formulae are collected in the following result.

Lemma 2.
There are the identities

J. Ivanovs and M. Podolskij
Moreover, the functions g and G are bounded on ℝ².

Estimators and the limit theory
The conditional mean estimators L̂_t(x) and Ô_t(x) of L_t(x) and O_t(x) are easily derived using stationarity and independence of increments of X together with Lemma 2; up to lower order terms they are given by sums of g and G, respectively, evaluated at the rescaled observations, see (15). It is noted that the lower order terms can be written down explicitly (they are 0 when tn is an integer), but we keep them implicit, because they do not have an influence on the limit theorem presented below.
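To make the construction concrete in the standard driftless case, the following sketch plugs in the closed form g(x, y) = Φ̄(|x| + |x − y|)/ϕ(y) for E_0[L_1(x) | X_1 = y]. This expression is derived from the classical joint law of (L_1(x), X_1) and is our own assumption here, since the display of Lemma 2 is not reproduced above; lower order boundary terms are ignored:

```python
import math
import numpy as np

def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi_bar(z):
    """Standard normal survival function."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def g(x, y):
    """Assumed closed form for E_0[L_1(x) | X_1 = y] of a standard BM."""
    return Phi_bar(abs(x) + abs(x - y)) / phi(y)

def local_time_cmean(obs, x=0.0):
    """Conditional mean estimator of L_1(x) for a standard BM: by self-similarity
    the contribution of [i/n, (i+1)/n] is n^{-1/2} g(sqrt(n)(x - X_{i/n}),
    sqrt(n) * increment)."""
    obs = np.asarray(obs, dtype=float)
    n = len(obs) - 1
    s = math.sqrt(n)
    inc = np.diff(obs)
    return float(sum(g(s * (x - obs[i]), s * inc[i]) for i in range(n)) / s)
```

A quick sanity check: g(0, 0) should equal the mean local time at 0 of a standard Brownian bridge, namely sqrt(pi/2).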

Theorem 3.
Assume that X is a linear Brownian motion with drift parameter μ ∈ ℝ and scale σ > 0. Then for any x ∈ ℝ we have the functional stable convergence stated in (16) and (17), where W is a Brownian motion independent of F and the asymptotic variances are explicit.

Importantly, our conditional mean estimator (15) is a particular example of a more general class of statistics investigated in [30] in the context of continuous diffusion processes. The expression for v_l in [30] is rather lengthy and hard to evaluate, because of the generality assumed therein. In our case, g(x, X_1) = E[L_1(x) | X_1] is the conditional expectation and, in fact, a rather short direct proof can be given yielding the constant v_l² at the same time, see Appendix B.
Remark 1. A larger asymptotic variance is obtained when instead of the optimal g(x, z) one uses the kernel ĝ(x) = ∫_ℝ (|x + u| − |x|)ϕ(u) du depending on x only, see [30, (1.27)]. The corresponding estimator does not take the increment following X_{(i−1)/n} into account.

Remark 2. Consider the class of continuous SDEs defined via the equation
where B is a standard Brownian motion and σ ∈ C¹(ℝ), μ ∈ C(ℝ) are such that the above SDE has a unique strong solution. In [30] the author considers statistics of the form L(h; x)_t^n, built by summing a kernel h evaluated at the rescaled observations. When σ > 0 and |h(y, z)| ≤ h̄(y) exp(a|z|) with h̄ bounded and satisfying ∫_ℝ |y|^r h̄(y) dy < ∞ for some r > 3, the stable convergence holds, see [30, Theorem 1.2]. Furthermore, the positive constant v_h(x) (and the proof of stable convergence) stems from the simpler Brownian model. Hence, we can conclude that our estimator L̂_t(x) is asymptotically optimal within the class of statistics L(h; x)_t^n in the general setting of continuous SDEs. We believe that the restriction to the class L(h; x)_t^n is not required and L̂_t(x) is asymptotically efficient for continuous SDEs. Furthermore, when the function σ is unknown the coefficient σ(x) can be estimated with n^{1/3}-accuracy [24] and we can build a feasible statistic without affecting the asymptotic theory (cf. Proposition 2 below).

Some modifications of the proposed statistics
The main goal of this section is to show that the above developed theory also applies in the setting when the law of X is not known, but a consistent estimator of the parameters is available. Furthermore, we construct certain simplified estimators of the supremum in order to cope with potential numerical issues.

Unknown parameters
The main results of Theorem 2 and Theorem 3 above assume that the law of the process X is known, which is hard to accept in practice. At most, we are willing to assume that the process X belongs to some parametric class, and we distinguish between the following two: (i) Linear Brownian motion with drift parameter μ ∈ ℝ and scale σ > 0, where for convenience we set α = 2. As we remarked earlier, neither the statistics nor the limits in Corollary 1 and Theorem 3 depend on μ, which, in fact, cannot be estimated consistently on a finite interval of time. Hence, the only parameter of interest is θ = σ. (ii) Non-monotone self-similar Lévy process, which is naturally parameterized by θ = (α, ρ, σ) with the positivity parameter ρ = P(X_1 > 0) satisfying ρ ∈ [1 − 1/α, 1/α] for α ∈ (1, 2], and ρ ∈ (0, 1) for α ∈ (0, 1], which excludes monotone processes. This parametrization, unlike the one with a skewness parameter, is continuous in the sense that convergence of parameters holds iff the processes converge.
Suppose now that we have a consistent estimator θ_n of the true parameter θ. Feasible estimators for supremum, local time and occupation time are then obtained via the plug-in approach. The construction of estimators θ_n of the unknown parameter θ for models (i) and (ii) is a well understood problem in the statistical literature. In particular, in class (i) the maximum likelihood estimator of σ is given by σ_n = (Σ_{i=1}^n (X_{i/n} − X_{(i−1)/n})²)^{1/2}. Numerous theoretical results on parametric estimation of model (ii) can be found in e.g. [35]. Since the maximum likelihood estimator of θ is not explicit, we rather propose to use explicit moment-type statistics based on the power transforms |X_{i/n} − X_{(i−1)/n}|^q with q ∈ (−1/2, 0). Additionally, we need to ensure that our parameter estimates are legal, and in particular α_n, when larger than 1, is truncated at (ρ_n ∨ (1 − ρ_n))^{−1}. Due to self-similarity of X and the law of large numbers these statistics converge to explicit functions of θ, which gives the idea behind the construction of α_n. Indeed, all estimators are weakly consistent, and since E[|X_1|^{2q}] < ∞ for q ∈ (−1/2, 0) we easily conclude the corresponding rates of convergence. The proposed estimators are not efficient, but they suffice for our purposes, see Proposition 1 below. It turns out that the limit theory presented in Theorem 2 and Theorem 3 continues to hold under a rather weak assumption on a consistent estimator θ_n of θ; in particular, this assumption is satisfied by the estimators proposed above. In other words, the difference between the modified and original estimators is negligible in the right sense.
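For class (i) the plug-in step is explicit; a minimal sketch of the realized-variance scale estimator (the drift-free form below is an assumption of ours, the drift contribution being asymptotically negligible on a fixed interval):

```python
import numpy as np

def sigma_hat(obs):
    """sigma_n = sqrt(sum of squared increments): scale estimator of a linear
    Brownian motion observed at times i/n, i = 0, ..., n (drift ignored)."""
    inc = np.diff(np.asarray(obs, dtype=float))
    return float(np.sqrt(np.sum(inc ** 2)))
```

Since σ_n converges at rate √n, it comfortably satisfies the n^{1/4}-type condition required for the plug-in local/occupation time estimators below.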
This shows that the estimators T_n^mean and T_n^med are asymptotically efficient in the sense that they are asymptotically equivalent to the respective optimal estimators relying on the knowledge of the true parameters. In class (ii) the true α is not known, but in view of Proposition 1 the assumption (α_n − α) log n →^P 0 guarantees the analogous equivalence. Furthermore, the limit distributions are well approximated by their analogues corresponding to the parameter θ_n, and so we may construct asymptotic confidence intervals for the estimators T_n^mean and T_n^med. With respect to local/occupation time we have the following result.

Proposition 2. Consider class (i) and assume that n^{1/4}(σ_n − σ) →^P 0. (18)
Then for any x ∈ ℝ, T > 0 the plug-in statistics are asymptotically equivalent to L̂_t(x) and Ô_t(x), uniformly in t ∈ [0, T]. This again shows that the estimators L̂_t(x) and Ô_t(x) are asymptotically efficient, and provides the respective asymptotic confidence bounds. Condition (18) is quite expected in the case of local times, since n^{1/4} is the corresponding rate of convergence in (16), but it is surprising that this condition is also sufficient to conclude the asymptotic efficiency of Ô_t(x). Roughly speaking, the reason for condition (18) to be sufficient in the latter case is that the partial derivatives of G correspond to the local time asymptotics, thus changing the convergence rate from n^{3/4} to n^{1/4}. We refer to §B.3 for more details.

Truncation of products in supremum estimators
Here we return to the assumption that the law of X is known. Consider the supremum estimators defined in §2.2 in terms of the conditional distribution function H_n(x). When the number n of observations is large, it may be desirable to reduce the number of terms in the product defining H_n(x), in order to avoid numerical issues and to speed up the calculations. This is especially true when X is not a linear Brownian motion and so the function F is not explicit.
Intuitively, we may want to keep the terms which are formed from the observations closest to the maximum. Thus, we let H_n(x; k), for a positive integer k, be the analogue of H_n(x), but such that the product has at most 2k terms and, in particular, the indices j are chosen such that 0 ∨ (I_n − k) ≤ j ≤ (I_n + k − 1) ∧ (n − 1), with I_n being the index of the maximal observation. Define T_{n,k}^mean and T_{n,k}^med as before but using H_n(x; k) instead of H_n.
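The index selection behind H_n(x; k) is easy to code; the sketch below specializes to the explicit Brownian factor 1 − exp(−2n(Δ_j + x)(Δ_{j+1} + x)/σ²) for concreteness (helper names are ours):

```python
import numpy as np

def H_n_truncated(x, obs, sigma, k):
    """H_n(x; k): keep at most 2k factors, namely those with
    max(0, I_n - k) <= j <= min(I_n + k - 1, n - 1), where I_n is the index
    of the maximal observation; Brownian factors are used for concreteness."""
    obs = np.asarray(obs, dtype=float)
    n = len(obs) - 1
    d = obs.max() - obs                    # Delta_j = M_n - X_{j/n}
    i_n = int(np.argmax(obs))
    lo, hi = max(0, i_n - k), min(i_n + k - 1, n - 1)
    j = np.arange(lo, hi + 1)
    return float(np.prod(1.0 - np.exp(-2.0 * n * (d[j] + x) * (d[j + 1] + x) / sigma**2)))
```

Dropping factors (each of which lies in [0, 1]) can only increase the product, so H_n(x; k) ≥ H_n(x; k′) for k ≤ k′.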
Letting I ∈ ℤ be the unique number satisfying ξ_{I+U} = V (it achieves the minimum V in (3)), we define H(x; k) as the analogue of H(x) with the product restricted to the 2k indices around I. We now have the limit result analogous to Theorem 2.

Corollary 3. For any α ∈ (0, 2] it holds that n^{1/α}(X̄_1 − T_{n,k}^mean) →^{d-st} V − ∫_0^∞ (1 − H(x; k)) dx, together with the analogous statement for T_{n,k}^med.

It turns out that we do not need to exclude α ∈ (0, 1] in the case of the modified conditional mean estimator, because the number of terms is kept finite. In the Brownian case with k = 1 we arrive at simple explicit formulas.

Remark 3. It is likely that Theorem 2 can be generalized to an arbitrary Lévy process satisfying the following weak regularity condition: X_{ut}/a_u converges in law, as u ↓ 0, to X̂_t for some positive function a_u and a necessarily self-similar Lévy process X̂. Importantly, the general versions of (2) and (3) are proven in [29]; here the limiting objects correspond to X̂. There are, however, two very serious difficulties. Firstly, joint convergence does not necessarily imply convergence of the conditional distributions. Thus, one needs to use the underlying structure to show that F_{1/n}(x/a_n, y/a_n) = P(X̄_{1/n} ≤ x/a_n | X_{1/n} = y/a_n) converges to the function F̂(x, y) corresponding to X̂.
Secondly, the proof of uniform negligibility of truncation in §A.3.2 crucially depends on X being self-similar. This part may be notoriously hard for a general Lévy process X.

Numerical illustration of the limit laws
In this section we perform some numerical experiments in order to illustrate the limit laws in Theorem 2 and Theorem 3. For simplicity we take X to be a standard Brownian motion and, additionally, a one-sided stable process in supremum estimation, which is motivated by the semi-explicit formula for the function F in Proposition 4. All the densities are obtained from 10,000 independent samples using standard kernel estimates. The number of samples is reduced to 1,000 in the case of the one-sided stable process.

Supremum estimation for Brownian motion
Consider a standard Brownian motion X and the limiting random variables V, V^mean and V^med in (3), (10) and (11), respectively. Recall that all of these quantities are explicit, see also Corollary 1, but they all depend on infinitely many observations ξ_{j+U}, j ∈ ℤ, of the two-sided Bessel process ξ of order 3. We approximate these quantities by setting ξ_{j+U} = ∞ for j < −50 or j ≥ 50, which effectively amounts to considering 100 epochs centered around 0; choosing twice as many epochs had a negligible effect on the results below. The resulting densities are depicted in Figure 1. In Table 1 we report the L²-norm, the L¹-norm, and the narrowest 95%-confidence interval length for each of the limiting distributions. It is noted that, indeed, V^mean has the smallest L²-norm and V^med has the smallest L¹-norm, and the respective distributions are very similar.
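The truncation described above can be reproduced in a few lines: each side of ξ is a Bessel process of order 3, i.e. the norm of a 3-dimensional Brownian motion, evaluated on the shifted grid j + U. The helper names and discretization below are our own choices:

```python
import numpy as np

def bessel3_at(times, rng):
    """Values |W_t| of a Bessel(3) process at increasing times t,
    with W a 3-dimensional standard Brownian motion started at 0."""
    w, prev, out = np.zeros(3), 0.0, []
    for t in times:
        w = w + rng.normal(scale=np.sqrt(t - prev), size=3)
        out.append(float(np.linalg.norm(w)))
        prev = t
    return np.array(out)

def sample_V(rng, k=50):
    """One draw of V = min_{j in Z} xi_{j+U}, truncated to j in [-k, k)."""
    u = rng.uniform()
    right = bessel3_at(u + np.arange(k), rng)         # xi_{j+U}, j = 0, ..., k-1
    left = bessel3_at((1.0 - u) + np.arange(k), rng)  # xi_{j+U}, j = -1, ..., -k
    return float(min(right.min(), left.min()))
```

Averaging many such draws should reproduce the mean EV discussed below.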
Observe that the main problem of the standard estimator M_n is that it is downward biased and so V is not centered. This, however, can be easily remedied, since according to [6] we have EV = −ζ(1/2)/√(2π) ≈ 0.5826, where ζ is the Riemann zeta function. In other words, we may consider an asymptotically centered estimator M_n + EV/√n, which leads to V^shift := V − EV. Finally, we also consider the truncated conditional mean estimator T_{n,1}^mean based on H(x; 1), which is a product of two terms and thus only moderately more complicated to evaluate as compared to M_n, see §4.2. The respective limit is denoted by V_1^mean. Relative comparison of the latter two together with V^mean is provided in Figure 2, see also Table 1.
In conclusion, the conditional mean and conditional median estimators are very similar to each other and considerably better than the standard estimator M n in terms of L 2 -norm and L 1 -norm. Nevertheless, the other simple estimators discussed above are only slightly worse than the optimal ones.

Supremum for one-sided stable process
Here we consider a strictly stable Lévy process with α = 1.8, standard scale and only negative jumps present, i.e., the skewness parameter is β = −1. Note that the results in the opposite case β = 1 must be similar according to Proposition 3. The conditional distribution function F is numerically evaluated using the expressions in Proposition 4, see Figure 3a.
In this case we perform a number of approximations. Firstly, simulation of ξ is not obvious (unlike the Brownian case) and so we approximate the limiting object (V, H(x)) by (n^{1/α}(X̄_1 − M_n), H_n(xn^{−1/α})) with n = 300, see (9). Instead of scaling X̄_1, M_n, Δ_i with n^{1/α}, we perform the simulation of the process X on the interval [0, n], which is allowed by self-similarity of X. Furthermore, X is simulated on a grid with step-size 1/m for m = 300, which yields an approximation of X̄_n further corrected by the easily computable asymptotic mean error m^{−1/α}EV, see [7]. Next, we take (at most) 30 terms in the product defining H_n based on the observations closest to the maximum, that is, we replace it by H_n(·; 15) defined in §4.2. Finally, ∫_0^∞ (1 − H(x)) dx is approximated using the trapezoidal rule with step size 0.1 and truncation at x = 3, see Figure 3b; the same approximation is used in the calculation of the inverse.
The results are presented in Figure 4 and Table 2. They are quite similar to the results in the Brownian case.

Local time and occupation time for Brownian motion
Let again $X$ be a standard Brownian motion and choose $x = 0$, $t = 1$. We use $\widehat L_1(0)$ with $n = 10{,}000$ as a substitute for the true $L_1(0)$, which then allows us to sample (approximately) from the limit distribution in (16). Next, we use the same sample path to construct $\widehat L_1(0)$ with $n = 100$, which allows us to sample from the pre-limit expression in (16). Finally, we also include the standard estimator for comparison; the respective densities are displayed in Figure 5. The ratio of variances for $n = 100$ is $1 : 1.64$, which can be compared to $1 : 1.35$ for the more advanced estimator mentioned in Remark 1 (here we use the exact expressions of the limits).
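The idea of estimating $L_1(0)$ from discrete observations can be illustrated by a simplified occupation-density sketch: count observations in a window of width $2\varepsilon$ around zero and divide by $2\varepsilon$. This is our own stand-in to illustrate the principle, not necessarily the exact standard estimator of the text.

```python
import numpy as np

# Simplified occupation-density sketch for the local time at 0.  For a
# standard Brownian motion E[L_1(0)] = E|B_1| = sqrt(2/pi) ~ 0.798, which
# the Monte Carlo average should roughly match (our own stand-in estimator,
# not necessarily the standard estimator referred to in the text).
def local_time_estimate(path, eps):
    n = len(path) - 1
    return np.count_nonzero(np.abs(path[1:]) < eps) / (n * 2 * eps)

rng = np.random.default_rng(3)
n, reps = 500, 3000
eps = 1.0 / np.sqrt(n)          # window shrinking at the natural rate
incs = rng.standard_normal((reps, n)) / np.sqrt(n)
paths = np.concatenate([np.zeros((reps, 1)), np.cumsum(incs, axis=1)], axis=1)
est = np.array([local_time_estimate(p, eps) for p in paths])
```

With the window width of order $n^{-1/2}$ the estimator is consistent, and its Monte Carlo mean is close to $\sqrt{2/\pi}$ up to discretization bias.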
We perform a similar procedure for the occupation time of $(0, \infty)$. Here the standard estimator is $\frac{1}{n}\#\{i \in [0 : n-1] : X_{i/n} \ge 0\}$. The respective densities are given in Figure 6, and we see a very substantial improvement: the ratio of variances for $n = 100$ is $1 : 2.64$.
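The standard estimator above is straightforward to implement; as a sanity check (an assumption-free consequence of the arcsine law), its mean over many Brownian paths should be close to $1/2$.

```python
import numpy as np

# The standard occupation-time estimator from the text:
#   O_n = (1/n) * #{ i in [0 : n-1] : X_{i/n} >= 0 }.
def occupation_estimator(path):
    # path[i] = X_{i/n}, i = 0, ..., n (with path[0] = 0)
    n = len(path) - 1
    return np.count_nonzero(path[:-1] >= 0) / n

rng = np.random.default_rng(7)
n, reps = 200, 5000
incs = rng.standard_normal((reps, n)) / np.sqrt(n)
paths = np.concatenate([np.zeros((reps, 1)), np.cumsum(incs, axis=1)], axis=1)
est = np.array([occupation_estimator(p) for p in paths])
```

The sample of `est` values approximates the (pre-limit) arcsine distribution, with mean $1/2$ up to a small bias from the deterministic starting point $X_0 = 0$.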

Appendix A: Proofs for supremum estimation
In the following, all positive constants are denoted by $c$, although they may change from line to line.

A.1. Duality
In this section we establish a duality result for a general Lévy process $X$. Even though it is not needed for the proofs, we present this duality because it explains certain structure in the main results. To this end, consider the dual process $\widehat X_t = -X_t$ and the associated quantities $\widehat{\overline X}_1$, $\widehat M_n$, $\widehat H_n(x)$, $\widehat F_t(x, y)$; see §2.2.

Fig 5. Local time: the densities of the limit (solid black) and pre-limit (dashed red) in (16) for $n = 100$, as well as the pre-limit for the standard estimator (dotted blue).

Fig 6. Occupation time: the densities of the limit (solid black) and pre-limit (dashed red) in (17) for $n = 100$, as well as the pre-limit for the standard estimator (dotted blue).

Proposition 3. Let $X$ be an arbitrary Lévy process. Then the stated distributional identities hold, because $X$ almost surely does not jump at any $j/n$. Letting $x_j$ be the observation of $X_{j/n}$, we find the corresponding identity for the discretized quantities, and the same reasoning works for $F_t(x, y)$ upon time reversal at $t$.
In view of Proposition 3, the errors $\overline X_1 - T^{\mathrm{mean}}_n$ and $\overline X_1 - T^{\mathrm{med}}_n$ have the same distribution as the respective errors for the process $-X$. Thus, the corresponding limit results remain the same when the sign of the skewness parameter $\beta$ is flipped. In the proofs we may therefore safely assume that $\beta \ge 0$, say.
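The path-wise mechanism behind this duality can be checked numerically: for the time-reversed path $Y_s = X_1 - X_{(1-s)-}$ (which has the same law as $X$ for a Lévy process) the supremum of $Y$ equals $X_1$ minus the infimum of $X$, and the terminal values agree.

```python
import numpy as np

# Path-wise illustration of the duality behind Proposition 3 on a discrete
# Brownian path: reversing time and recentring at the endpoint swaps the
# roles of supremum and infimum while preserving the law of the path.
rng = np.random.default_rng(0)
n = 1000
incs = rng.standard_normal(n) / np.sqrt(n)      # Brownian increments
x = np.concatenate([[0.0], np.cumsum(incs)])    # discrete path, x[0] = 0
rev = x[-1] - x[::-1]                           # time-reversed path, same law
assert np.isclose(rev.max(), x[-1] - x.min())   # sup(Y) = X_1 - inf(X)
assert np.isclose(rev[-1], x[-1])               # same terminal value
```

Since the reversed path has the same law, any functional of (supremum, endpoint) for $X$ matches the corresponding functional of (endpoint minus infimum, endpoint) in distribution, which is exactly the identity used above.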

A.2. On the function F in the stable case
Before starting the proof of the main result, we establish some basic properties of the conditional probability $F(x, y)$ in the case of a strictly $\alpha$-stable process, where it is not explicit. Throughout this subsection we assume that $X$ is a strictly $\alpha$-stable process with skewness parameter $\beta \in [-1, 1]$. Note that the boundary values $\beta = -1$ and $\beta = 1$ correspond to spectrally negative and spectrally positive processes, respectively; in both cases we must have $\alpha \in (1, 2)$, because monotone processes have been excluded.
It is well known [41, p. 88] that $X_t$ has a continuous, strictly positive, bounded density, call it $f_t(x)$. Moreover, by self-similarity the densities satisfy a simple scaling relation, with $f := f_1$. Furthermore, $f(x) \sim c x^{-\alpha-1}$ as $x \to \infty$ when $\beta \ne -1$, and otherwise it decays faster than an exponential function [41, Eq. (14.37)]. Let us define the first passage times $\tau^\pm_x = \inf\{t \ge 0 : \pm X_t > \pm x\}$ above and below a given level $x$.

Lemma 3. The function $F(x, y)$ is jointly continuous. Moreover, $F(x, y) = 0$ for $x \le y^+$, and otherwise it is given by the expressions (20) and (21).

Proof. Assume for the moment that $x > y^+$. By time reversal (or from Proposition 3) we obtain the identity (22). Using the strong Markov property we find a version of the density of the measure on the left of (22); this expression coincides with (20) according to (19). Similarly, (21) is a version of the density of the measure on the right of (22), and hence both expressions coincide for almost all $y$. Next, we show that the expressions in (20) and (21) are jointly continuous on $x > y^+$, and thus must coincide on this domain. We do this for the first expression only, since the other can be treated in the same way. By the basic properties of Lévy processes [10] we see that $\tau^+_x \ne 1$ and that $(\tau^+_{x'}, X_{\tau^+_{x'}}) \to (\tau^+_x, X_{\tau^+_x})$ on an event of probability 1. Hence we only need to show that the dominated convergence theorem applies. Choose an arbitrary sequence $(x', y')$ converging to $(x, y)$ with $x > y^+$. Then $X_{\tau^+_{x'}} - y' > x - y - \epsilon > 0$ for some $\epsilon > 0$ (further down in the sequence). Note that $f(-x) \le c x^{-\alpha-1}$ for some $c > 0$ and all $x > 0$; in the spectrally positive case the decay is even faster. Hence the term under the expectation is bounded by $c(1 - \tau^+_{x'})\epsilon^{-\alpha-1}$, and we are done. It is left to show that either one of (20) and (21) converges to $f(y)$ as $x' \to x$, $y' \to y$ with $x' > y'^+$ and $x = y^+$ (the boundary of the domain); this would imply $F(x', y') \to 0$. In the case $y < 0$ use (20) and the above reasoning, while for $x > 0$ use (21). It remains to analyze the case $x = y = 0$. Note that (20) is lower bounded by the same expression with the indicator replaced by the indicator of $\tau^+_{x'} < 1/2$. But now the dominated convergence theorem applies and yields the limit $f(0)$. The upper bound is $f(y')$ by construction, and the limit is again $f(0)$. The proof is thus complete.
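For reference, the scaling relation for the densities invoked at the beginning of this subsection can be written out explicitly; it is a standard consequence of the self-similarity $X_t \overset{d}{=} t^{1/\alpha} X_1$:

```latex
% Scaling of the marginal densities of a strictly alpha-stable process,
% with f := f_1:
f_t(x) \;=\; t^{-1/\alpha}\, f\bigl(t^{-1/\alpha} x\bigr),
\qquad t > 0,\ x \in \mathbb{R}.
```

This relation is what turns tail bounds on $f$ into the bounds on $f_{1-\tau}$ used in the dominated-convergence arguments above.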
We are now ready to provide some bounds on $F(x, y)$. In the one-sided cases the bounds can be improved considerably, but this is not needed in this work, and so we prefer a simpler statement.
For $y > 0$ this leads to the claimed bound $c\exp(-(x - y))$. For $y \le 0$ we find from (20) a bound which readily implies $c\exp(-x)$. A similar analysis yields the bound in the case $\beta = 1$.
Note that we may also derive a bound for $\beta \in (-1, 1)$ by using (21) instead of (20). This bound is better when $y > 0$ and worse when $y < 0$; for our purposes either bound is sufficient. Finally, we derive a semi-explicit expression for $F(x, y)$ in the one-sided cases, in terms of the density $f$.

Proposition 4.
In the spectrally one-sided cases we have, for all $x > y^+$, the semi-explicit expressions (20) and (21). Proof. It is known that $P(\overline X_1 \in dx) = \alpha f(x)\,dx$ for $x > 0$ when $X$ is spectrally negative; see [36]. Plugging this and the corresponding companion identity into Lemma 3 yields the result. Finally, (21) follows from $F(x, y) = \widehat F(x - y, -y)$; see Proposition 3.

A.3. Proof of Theorem 2
In the following we frequently use the inequality (23). Let $I^{(n)} = \lceil n\tau \rceil$ be the index of the first observation to the right of the supremum time, and define the rescaled distances $u^{(n)}_i$ from the supremum to the observations, indexed relative to the time of the supremum. Now we can represent the quantities appearing in (9) accordingly; here the relevant term vanishes when $u^{(n)}_{i+1}$ is infinite. According to [29] (or [6] in the case of Brownian motion) we have the weak convergence (24) for every $k > 0$. Intuitively, this limit can be understood as arising from (2) together with the fact that $\{n\tau\}$ converges to an independent uniform on $(0, 1)$. This explains (of course, only intuitively) the form of the result in Theorem 2.

A.3.1. Convergence of the truncated versions
Let $H^{(n)}_k$ be the same as $H^{(n)}$, but with the product running over $|i| \le k$ only, where again $F = 1$ when the index is out of range. We also define the analogous object $H_k$ formed from the limiting quantities.

Lemma 5. $H^{(n)}_k \Rightarrow H_k$ with respect to the uniform topology. Moreover, the associated integrated quantities converge as well, and the limit variables are finite almost surely.
Proof. In view of (24) we only need to establish the continuity of the respective maps. Consider $(2k+1)$-dimensional vectors $a^{(n)}$ and $b^{(n)}$ converging to vectors $a$ and $b$, respectively, where the entries of $a^{(n)}$ and $a$ are non-negative and the entries of $a, b$ are finite. Using (23) we obtain the required convergence, where the convergence of $F$ is uniform in $x \ge 0$, since the limit function is continuous and non-decreasing in $x \ge 0$ and upper bounded (Pólya's theorem). Thus the first statement is proven.
Concerning the second statement, we obtain a bound, and it is left to show that each summand converges to 0, i.e., that the dominated convergence theorem applies. According to Lemma 4, both terms of the form $F(x + a^{(n)}_i, b^{(n)}_i)$ decay sufficiently fast, because of the monotonicity of $F$ in the first argument and the fact that $b^{(n)}_i \to b_i < \infty$; the decay is even faster in the case $\beta = \pm 1$ or when $X$ is a Brownian motion. The proof of the second statement is now complete, since finiteness of the limit is shown in the same way.

A.3.2. Uniform negligibility of truncation
Showing that truncation at a finite $k$ is uniformly negligible (in the sense of [11, Thm. 3.2]) is the crux of the proof. Firstly, we need the following representation in law of the sequences $u^{(n)}_i$, which builds on [9] and self-similarity of $X$.

Lemma 6. There exists a process $\widetilde\xi$ having the law of $\xi$ and a sequence of random variables $\tau_n$ such that $(\tau_n)_{n>0}$ and $(n - \tau_n)_{n>0}$ are non-negative, non-decreasing sequences and the stated representation holds.

Proof. By self-similarity, $(n^{1/\alpha} X_{t/n})_{t \in [0,n]}$ has the same law as $(X_t)_{t \in [0,n]}$. According to [9], the law of the latter process seen from the supremum yields the claimed representation of the $u^{(n)}_i$; see (1).

We will also need asymptotic bounds on the process $\xi$, which can be read off [25, Cor. 3.3] or [20]; see also [37] for the Brownian case.

Lemma 7. For any $p_- < 1/\alpha < p_+$ the stated asymptotic bounds on $\xi$ hold. In particular, the probability of the event $E_{T, p_\pm} = \{t^{p_-} \le \widetilde\xi_t \le t^{p_+} \text{ for all } t > T\}$ tends to 1 as $T \to \infty$.

The following result establishes convergence of certain series, which is only needed in the case of a stable process with two-sided jumps.
We are now ready to establish that truncation is indeed uniformly negligible: for any $\epsilon > 0$ the convergence (27) holds, and moreover (28) and (29) hold almost surely as $k \to \infty$.
Proof. We start by showing (27). Using (23) we obtain a bound in which the summand is 0 when one of the $u^{(n)}_{i+1}$ is infinite. By monotonicity of $F$ in the first argument, and since $P(V^{(n)} > v)$ can be made arbitrarily small, we may replace the $u^{(n)}_i$ by the quantities having the same law defined in Lemma 6. Note also that the sum here runs over $i > k$, since the other part ($i < -k$) can be handled in the same way.
Choose $p_\pm$ with $p_- < 1/\alpha < p_+$ such that the conclusion of Lemma 8 is satisfied when $\alpha \in (0, 2)$, $\beta \in (-1, 1)$. Note that we may restrict to the event $E_{T, p_\pm}$ for a large enough $T > 0$ (see Lemma 7); that is, we have $t^{p_-} \le \widetilde\xi_t \le t^{p_+}$ for all $t > T$.
Next we show (28). For the second statement we only need to establish the corresponding bound. In the case $\alpha \in (1, 2)$, $\beta \in (-1, 1)$ the upper bound of Lemma 3 applies for $i > T$. Integrating over $x \ge 0$ we get the bound $c\, i^{-2\alpha p_-}(1 + D_i^{\alpha+1})$, and the proof is again completed by Markov's inequality and Lemma 8. In the case $\beta = \pm 1$ the bound is of the same type, and a similar bound holds for $\alpha = 2$.
Finally, similar (but simpler) arguments show that the convergence in (29) holds in probability. But the product is monotone for each $x \ge 0$, and thus the convergence is uniform and almost sure. For $\alpha \in (1, 2]$ we find, using the above arguments, that the integral converges as required.

A.4. Related results
Here we provide the proofs (or just the main ingredients) of the results related to Theorem 2.

A.4.1. Linear Brownian motion
Proof of Corollary 1. Firstly, note that the scale $\sigma$ can indeed be taken out as in (12) and (13). This holds in general, because we may always rescale the process and the corresponding observations before the analysis; thus we may assume $\sigma = 1$ in the following. Now suppose that $\mu \ne 0$, so that $X$ is not self-similar. Recall that the estimators are the same as in the case $\mu = 0$. Furthermore, according to [6] the convergence in (24) is still true, where the limit variables are defined in terms of the Bessel process of order 3. The main difficulty is that Lemma 6 is no longer true, and the proof of uniform negligibility of truncation fails.
By Girsanov's theorem, we may introduce an arbitrary drift using the exponential change of measure $d\widetilde P/dP = \exp(a X_1 + b)$ with appropriately chosen constants $a, b \in \mathbb R$. But then the relevant probability is bounded accordingly, where $c > 0$ is arbitrary. As $k \to \infty$, the $\limsup_n$ of this expression converges to $\widetilde P(|X_1| > c)$, which can be made arbitrarily small. Thus (27) holds for an arbitrary linear Brownian motion, and the same argument works for (28).
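For a unit-variance Brownian motion the change of measure can be written out explicitly; the following is a minimal sketch for a target drift $\mu$ (our notation):

```latex
% Exponential change of measure introducing drift mu for standard BM:
\frac{d\widetilde P}{dP}\bigg|_{\mathcal F_1}
  = \exp\Bigl(\mu X_1 - \tfrac{\mu^2}{2}\Bigr),
\qquad\text{i.e. } a = \mu,\quad b = -\tfrac{\mu^2}{2}.
```

Under $\widetilde P$ the process $X$ is a Brownian motion with drift $\mu$, while events of small probability under $P$ remain of small probability under $\widetilde P$ by Cauchy–Schwarz.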

Proof of Lemma 1. It is only required to show that the $p$-th moment of the relevant quantity is bounded for an arbitrarily large $p$ and all $n$. Furthermore, we may again restrict our attention to a driftless Brownian motion by a change of measure and the Cauchy–Schwarz inequality. The fact that $E[\exp(\theta V^{(n)})]$ is bounded for any $\theta$ was established in [6], and so it is sufficient to bound the remaining term. The right-hand side increases upon pulling the sum out. Using the explicit expression for $F$, we see that it is left to consider the resulting sum, where we used that $\overline\Phi(4x) < c\exp(-x)$. Moreover, $V^{(n)}$ can be dropped, because of the Cauchy–Schwarz inequality and the boundedness of $E[\exp(p V^{(n)})]$. Finally, use Lemma 6 to get the required bound:

J. Ivanovs and M. Podolskij
The above is bounded by a sum of two terms. The first sum is finite, because the inequality between arithmetic and quadratic means, $\sqrt{a^2 + b^2 + c^2} \ge (|a| + |b| + |c|)/\sqrt 3$, and the definition of the Bessel-3 process imply that the respective terms are bounded by an expression involving a standard normal $Z$. By a Tauberian theorem this quantity behaves as $c\, i^{-3/2}$ for large $i$, with $c$ a positive constant, and so the first sum is indeed finite. The second sum can be treated using the arguments from Appendix C. In particular, we can show that $P^\uparrow_x(X_2 < x/2) < c\exp(-x)$, and hence we are left to consider $\sum_i \exp(-\xi_i/2)$ again. The proof is now complete.

A.4.2. Joint estimation: proof of Corollary 2
The only new ingredient needed is the joint convergence of the sequences in (24) corresponding to the processes $X$ and $-X$ to their respective limits, which are independent. A similar result appears in [7, Lem. 1], and only a minor adaptation is needed.

A.4.3. On simplified estimators: proof of Corollary 3
We only need to show that the analogue of (24) is true when we take the respective $2k+1$ elements in the vectors on the left. One cannot apply the continuous mapping theorem to the infinite sequences, though. We therefore consider truncated sequences, apply the continuous mapping theorem, and then show that the truncation is uniformly negligible. The latter follows from the corresponding tail bound for any $a > 0$, which readily follows from the representation of $\xi^{(n)}$ as in Lemma 6 in the self-similar case.

A.4.4. Unknown parameters: proof of Proposition 1
We will show that $n^{1/\alpha}(\widehat T^{\mathrm{mean}}_n - T^{\mathrm{mean}}_n) \stackrel{P}{\to} 0$ when $\alpha \in (1, 2]$, and that the same is true for the conditional median estimator for all $\alpha \in (0, 2]$. The proof of continuity of the limit distributions follows similar steps; see also [29] for the convergence of the respective processes $\xi$. The above readily translates into the stated claims, respectively. We focus on the class of strictly stable Lévy processes (the proof for the class (i) is similar but easier) and let $X^n$ be the process with parameters $\theta_n$. Furthermore, we write $F^n$ and $f^n$ for the analogues of the conditional distribution $F$ and the density $f$. We claim that it is sufficient to establish that $F^n$ converges to $F$ continuously, i.e., (31).
For this, note that $\alpha_n$ is arbitrarily close to $\alpha$ with high probability, and thus the arguments from the proof of Theorem 2 apply essentially without change.
Thus we are left to prove (31) by re-examining the proof of Lemma 3. Firstly, we observe that $(X_{\tau^+_{x_n}}, \tau^+_{x_n})\,1_{\{\tau^+_{x_n} < \infty\}}$ under $P_{\theta_n}$ converges weakly to the respective quantity under $P$, which follows from the (generalized) continuous mapping theorem and weak convergence of the Lévy processes. Secondly, the relevant function converges to the obviously defined $g(t, x, y)$ continuously on the domain $t \in (0, 1)$, $x \ge 0$, $y \in \mathbb R$, which follows from continuous convergence of the density $f^n$ of $X^n_1$; see Lemma 10 below. Hence we have weak convergence of the quantity under the expectation in (20), and so it is left to show that the respective quantities are bounded. Lemma 10 completes the proof.
Proof. The characteristic function of $X^n_t$ is given by $\exp(-c^\pm_n |z|^{\alpha_n} t)$ according to whether $\pm z > 0$, with $c^\pm_n$ a complex constant with positive real part (converging to $c^\pm$); see [43, Thm. C.4]. Thus, by the inversion formula, the difference of the densities is bounded by an integral that converges to 0 by the dominated convergence theorem, since the real parts of $c^\pm_n$ are positive and bounded away from 0. With respect to the second statement, we need to show that the relevant integral is bounded for all $t \in (0, 1)$, all relevant $x$ and all $n$, where $c_n = c^+_n$; the integral over $(-\infty, -1]$ is handled in the same way, whereas the rest is clearly bounded by 2. Using integration by parts, we find that it suffices to bound the resulting boundary and integral terms.

Let us show that $h_i$ for $i = 2, 4$ are bounded and belong to $L^1(\mathbb R)$. By Minkowski's and Jensen's inequalities we have the bound $h_i(y) \le 2^i E[L_1(y)^i]$. Using the additivity of $L$ we deduce a bound in terms of the moment $E[L_1(0)^i]$, which is finite, and the first passage time $\tau_y$ of $X$ into the level $y$. Finally, note that $\int_0^\infty P(\tau_y < 1)\,dy$ is finite, and the relevant convergence is uniform on compact intervals of time; this immediately yields the claimed convergence, and the result now follows from [32, Thm. IX.7.28]. Moreover, we have a simple expression for $v^2_l = \int h_2(y)\,dy$, the integrated reduction in variance when $L_1(y)$ is replaced by its conditional mean $E[L_1(y) \mid X_1]$; it is evaluated in Lemma 11 below.

Lemma 11. For a standard Brownian motion we have $\int_{\mathbb R} E[(g(y, X_1) - L_1(y))^2]\,dy = \tfrac{2}{3}\log(1 + \cdots)$.

The above bound holds locally uniformly in $\sigma > 0$. This is so because $g(x/\sigma, z/\sigma)/\sigma^2$ satisfies the analogous bound; see (33). Writing $x', z'$ for $x/\sigma, z/\sigma$, respectively, we find from Lemma 2, for $z \ge x$, that $\partial g(x/\sigma, z/\sigma)/\partial\sigma =: h(z')$ is an explicit expression built from $z'\varphi(z')$, $z'\overline\Phi(z')$ and $2\sigma\varphi(z')$.
By L'Hôpital's rule and Mills' ratio this quantity tends to 0 as $z' \to \infty$, and thus it is bounded for all $z \ge x \ge 0$, locally uniformly in $\sigma > 0$. Next, we consider $z < x$, where a similar expression holds. Note that $2x' - z' > (x' - z') \vee |z'|$, and so the above terms stay bounded when $2x' - z' \to 0$, which implies $x', z' \to 0$. Moreover, $z'^2/(2x' - z')^2$ is bounded, and so it is left to consider $(1 + 2x'(x' - z'))\exp(-2x'(x' - z'))$ as $x' \to \infty$. For $x' > z' + 1$ this is bounded by $c\exp(-x')$, and otherwise by $c$, which is sufficient.
Proof of Proposition 2. Observe the decomposition of the error; according to (18) we may assume that $n^{1/4}|\sigma_n - \sigma| < h$ for an arbitrary $h > 0$ and all large $n$. By the mean value theorem and Lemma 13 we have an upper bound involving $\tilde g(x, z) = c\exp(-a|x| + a|z|)$. But $\tilde g$ verifies condition (B-0) in [30], and thus our upper bound converges to $hL$ in probability, where $L$ is a certain finite random variable; see [30, Thm. 1.1]. The proof is complete, since $h > 0$ can be arbitrarily small. The corresponding proof for the occupation time follows exactly the same arguments.

Appendix C: On X conditioned to stay positive
Throughout this section we assume that $\alpha \in (0, 2)$ and $\beta \ne \pm 1$. Let us recall that $(\xi_{(-t)-})_{t \ge 0}$ is a Feller process and, as usual, we denote its law when started from $x > 0$ by $P^\uparrow_x((X_t)_{t \ge 0} \in \cdot)$. Such a process can be seen as $X$ conditioned to stay positive in a certain limiting sense; see [19, 16] for the basic properties of this process. The law of $(\xi_t)_{t \ge 0}$ is then that of $-X$ conditioned to stay positive, and the following bounds hold for it without change.

Proposition 5.
There exists a constant $c > 0$ such that for all $x, v > 0$ with $x > v$ the stated bound holds. The proof is deferred to the end of this section. Let us note that the restriction $x > v$ cannot be removed in the above bound. We start with a simpler result, corresponding to $h = 0$:
Recall that $f(y) \le c|y|^{-\alpha-1}$ as $y \to \pm\infty$, and hence the first integral is upper bounded accordingly. The following is an immediate consequence of Doob's $h$-transform representation of the kernel; here $h(x) = x^{\alpha\rho}$.

Lemma 16.
There exists $c > 0$ such that for all $x > v > 0$ the stated bounds hold. Proof. We only show the first statement, since the second follows by the same arguments. According to Lemma 15 we obtain a representation of the quantity of interest, and we may restrict the integration to the interval $[-v/2, v/2]$ in view of Lemma 14.
Thus it is sufficient to establish the required bound, where $F(v, y) \le c v^{-\alpha}$ according to Lemma 4; for bounded $v$ the result is obvious. Finally, observe that $P(\underline X_1 > -x \mid X_1 = y)$ is bounded away from 0; here we may use Lemma 4 applied to the process $-X$. The proof is complete.
Proof of Proposition 5. Observe that the quantity of interest is upper bounded by the corresponding expression at time 2. Hence the bound follows from Lemma 16, which also holds for time 2 instead of 1; use, e.g., self-similarity here.