Regenerative block-bootstrap confidence intervals for tail and extremal indexes

A theoretically sound bootstrap procedure is proposed for building accurate confidence intervals of parameters describing the extremal behavior of instantaneous functionals {f(Xn)}n∈N of a Harris Markov chain X, namely the extremal and tail indexes. Regenerative properties of the chain X (or of a Nummelin extension of the latter) are here exploited in order to construct consistent estimators of these parameters, following the approach developed in [10]. Their asymptotic normality is first established and the standardization problem is also tackled. It is then proved that, based on these estimators, the regenerative block-bootstrap and its approximate version, both introduced in [7], yield asymptotically valid confidence intervals. In order to illustrate the performance of the methodology studied in this paper, simulation results are additionally displayed. AMS 2000 subject classifications: Primary 60G70, 60J10, 60K20.


Introduction
As originally pointed out in [32], the extremal behavior of instantaneous functionals $f(X) = \{f(X_n)\}_{n \in \mathbb{N}}$ of a Harris recurrent Markov chain $X$ may, just like the asymptotic mean behavior, be described through the regenerative properties of the underlying chain. Following in the footsteps of this seminal contribution (see also [2]), the authors have recently investigated the performance of regeneration-based statistical procedures for estimating key parameters related to the extremal behavior analysis in the Markovian setup; see [10].
In particular, special attention has been paid to the problem of estimating the extremal index of the weakly dependent sequence $f(X)$, which measures to what extent extreme values tend to come in "small clusters"; refer to [15,11,18] for an account of this notion. Various extremal index estimators have been recently proposed in the statistical literature; see [1,21,23,29,30] for instance. These estimators generally rely on blocking techniques, where data segments of fixed (deterministic) length are considered in order to account for the dependence structure within the observations. Alternatively, an asymptotically valid methodology specifically tailored for regenerative sequences or pseudo-regenerative sequences has been proposed, based on data blocks of random length, corresponding to cycles in between successive regeneration times or approximate regeneration times.
Proceeding in the same vein, it has been established in [10] that a regenerative version of the Hill estimator, computed from the set of cycle submaxima, namely the maximum values observed in between consecutive renewal times, yields consistent estimation of the tail index of the one-dimensional marginal distribution of $f(X)$ in the (supposedly existing) stationary regime, in the case where the latter belongs to the Fréchet maximum domain of attraction.
It is the purpose of this paper to continue this approach by investigating the problem of constructing confidence intervals for the extremal and tail indexes. We first prove the asymptotic normality of the regeneration-based estimators considered and then show how to studentize the latter in order to build asymptotic Gaussian confidence intervals. Next, we propose to extend the range of application of the regenerative block-bootstrap (RBB in abbreviated form), respectively the approximate regenerative block-bootstrap (ARBB in abbreviated form), originally introduced in [8] for bootstrapping Markovian sample means, to the present setting. The asymptotic validity of the RBB and ARBB procedures, when applied to the regeneration-based index estimates, is established, and simulations have been carried out in order to evaluate their performance empirically against that of Gaussian asymptotic intervals.
The article is structured as follows. Notations are first set out in Section 2 and crucial notions related to the renewal properties of Harris Markov chains, which will be needed throughout the paper, are also briefly recalled. In Section 3, central limit theorems are stated for the regenerative versions of the "runs" and "blocks" estimators of the extremal index. Asymptotic normality of the regenerative Hill estimator is established and the studentization of these estimators is also investigated. Section 4 is devoted to the study of the RBB and ARBB methodology, when applied to the construction of confidence intervals based on the specific regeneration-based estimators considered. Finally, Section 5 displays preliminary simulation results, comparing the performance of bootstrap and Gaussian intervals. Technicalities are treated in the Appendix.

Preliminaries
Throughout the article, we will denote by $X = \{X_n\}_{n \in \mathbb{N}}$ a time-homogeneous Harris recurrent Markov chain, valued in a measurable space $(E, \mathcal{E})$ with transition probability $\Pi(x, dy)$ and initial distribution $\nu$; see [28] for an account of the Markov chain theory. We also denote by $P_\nu$ (respectively, by $P_x$ with $x \in E$) the probability measure on the underlying space such that $X_0 \sim \nu$ (resp., $X_0 = x$) and by $E_\nu[\cdot]$ (resp., $E_x[\cdot]$) the corresponding expectation. We start by recalling basic renewal properties of Harris Markov chains, while highlighting their connection with extremal behavior analysis.

Regenerative chains
Recall first that the chain $X$ is said to be regenerative when it possesses a Harris recurrent atom, i.e., a Harris set $A$ such that $\Pi(x, \cdot) = \Pi(y, \cdot)$ for all $(x, y) \in A^2$. In the atomic case, by virtue of the strong Markov property, the sequence $\{\tau_A(k)\}_{k \geq 1}$ of successive return times to the atom forms a (possibly delayed) renewal process and, more generally, the data segments determined by the times at which $X$ forgets its past, called regeneration cycles, are i.i.d. random variables valued in the torus $\mathbb{T} = \cup_{n=1}^{\infty} E^n$: $B_1 = (X_{\tau_A(1)+1}, \ldots, X_{\tau_A(2)}), \ldots, B_j = (X_{\tau_A(j)+1}, \ldots, X_{\tau_A(j+1)}), \ldots$ We denote by $P_A$ the conditional probability measure given $X_0 \in A$ and by $E_A[\cdot]$ the $P_A$-expectation.
In the regenerative setup, stochastic stability properties classically boil down to checking conditions related to the speed of return to the regenerative set. It is well known for instance that $X$ is positive recurrent if and only if $\alpha = E_A[\tau_A] < \infty$ (see Theorem 10.2.2 in [25]), and its (unique) invariant probability distribution $\mu$ is then Pitman's occupation measure, given by $\mu(B) = \alpha^{-1} E_A[\sum_{i=1}^{\tau_A} I\{X_i \in B\}]$ for all $B \in \mathcal{E}$. The following assumptions are involved in the subsequent analysis. Let $\kappa \geq 1$ and let $\nu$ be any probability distribution on $(E, \mathcal{E})$.

$H(\kappa)$: $E_A[\tau_A^{\kappa}] < \infty$ and $H(\nu, \kappa)$: $E_\nu[\tau_A^{\kappa}] < \infty$.
Cycle submaxima. Let $f : (E, \mathcal{E}) \to \mathbb{R}$ be a measurable function. Consider the submaximum of the instantaneous functional $f(X) = \{f(X_n)\}_{n \in \mathbb{N}}$ over the $j$-th cycle, $j \geq 1$: $\zeta_j(f) = \max_{1+\tau_A(j) \leq i \leq \tau_A(j+1)} f(X_i)$. It has been established in [32], see Theorem 3.1 therein, that, in the positive recurrent case, the distribution of the sampling maximum $M_n(f) = \max_{1 \leq i \leq n} f(X_i)$ can be successfully approximated by the distribution of the maximum of $\lfloor n/\alpha \rfloor$ (roughly the mean number of cycles within a trajectory of length $n$) independent realizations of the cycle submaximum as $n \to \infty$, provided that the first (non-regenerative) data segment plays no role in the extremal behavior, i.e. $P_\nu(\max_{1 \leq i \leq \tau_A} f(X_i) > \max_{1 \leq k \leq l} \zeta_k(f)) \to 0$ as $l \to \infty$.
More precisely, under these assumptions we have $\sup_{x \in \mathbb{R}} |P_\nu(M_n(f) \leq x) - G_f(x)^{\lfloor n/\alpha \rfloor}| \to 0$ as $n \to \infty$ (1), where $G_f(x) = P_A(\max_{1 \leq i \leq \tau_A} f(X_i) \leq x)$ for all $x \in \mathbb{R}$. This shows that the tail behavior of the cycle submaximum's distribution $G_f(dx)$ governs the extremal behavior of the sequence $f(X)$.
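The construction of the cycle submaxima from an observed path can be sketched as follows. This is a minimal illustration only: the function name `cycle_submaxima` and the boolean input `in_atom` (flagging the visits of the chain to the atom $A$) are hypothetical conventions introduced here, and the first and last incomplete data segments are simply discarded.

```python
import numpy as np

def cycle_submaxima(fx, in_atom):
    """Submaxima of f(X) over the complete regeneration cycles.

    fx      : sequence of values f(X_1), ..., f(X_n)
    in_atom : boolean sequence, True at times i such that X_i lies in the atom A
    """
    fx = np.asarray(fx, dtype=float)
    # Successive return times to the atom (0-based positions in the path).
    hits = np.flatnonzero(np.asarray(in_atom, dtype=bool))
    # Cycle j runs from the time right after one visit up to the next visit,
    # so the segments before the first visit and after the last are dropped.
    return np.array([fx[a + 1 : b + 1].max() for a, b in zip(hits[:-1], hits[1:])])
```

With fewer than two visits to the atom no complete cycle is available and the function returns an empty array.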

Regenerative extensions of general Harris chains
Although the class of regenerative Markov chains includes all chains with countable state space as well as many Markov models used in Operations Research for modeling queuing/storage systems, the existence of a Harris regenerative set is a very restrictive assumption in practice, one that is not fulfilled by most Harris chains. Here we briefly recall a theoretical construction, termed the splitting technique and originally introduced in [26], extending in some sense the probabilistic structure of a general Harris chain, so as to artificially build a regeneration set, together with a practical method for approximating the regenerative extension.
It is based on the notion of Harris small set. Recall that a Harris set $S \in \mathcal{E}$ is small for the chain $X$ if there exist $m \in \mathbb{N}^*$, a probability measure $\Phi$ supported by $S$, and $\delta > 0$ such that $\Pi^m(x, \cdot) \geq \delta \Phi(\cdot)$ for all $x \in S$ (2), where $\Pi^m$ denotes the $m$-th iterate of $\Pi$. Roughly speaking, the small sets are the ones on which an iterate of the transition probability is uniformly bounded below. When (2) holds, one says that $X$ fulfills the minorization condition $M(m, S, \delta, \Phi)$. We point out that small sets do exist for Harris chains, see [22]. Suppose now that condition (2) is satisfied. Rather than replacing the original chain by the chain $\{(X_{nm}, \ldots, X_{n(m+1)-1})\}_{n \in \mathbb{N}}$, we take $m = 1$. The regenerative Markov chain into which $X$ is embedded is constructed by expanding the sample space in order to define a specific sequence $(Y_n)_{n \in \mathbb{N}}$ of independent Bernoulli r.v.'s with parameter $\delta$. The joint distribution is obtained by randomizing the transition $\Pi$ each time the chain $X$ hits $S$, which occurs with probability one (recall that the chain $X$ is Harris). In order to obtain an insight into this construction, observe first that, when $X_n \in S$, the conditional distribution of $X_{n+1}$ given $X_n$ may be viewed as the following mixture, $\Pi(X_n, \cdot) = (1 - \delta) \cdot (1 - \delta)^{-1} (\Pi(X_n, \cdot) - \delta \Phi(\cdot)) + \delta \Phi(\cdot)$, of which the second component is independent of $X_n$. More precisely, the so-termed split chain $\{(X_n, Y_n)\}_{n \in \mathbb{N}}$ is built the following way: suppose that $X_n \in S$; if $Y_n = 1$ (which occurs with probability $\delta \in\, ]0, 1[$), $X_{n+1}$ is drawn from $\Phi$, otherwise (i.e. if $Y_n = 0$, which happens with probability $1 - \delta$), $X_{n+1}$ is drawn from $(1 - \delta)^{-1} (\Pi(X_n, \cdot) - \delta \Phi(\cdot))$. Clearly, $S \times \{1\}$ is an atom for the split chain, the latter inheriting all the communication and stochastic stability properties from $X$. In particular, the data segments in between consecutive visits to $S \times \{1\}$ are independent.
On approximating the regenerative extension.Unfortunately, the split chain is a theoretical construction and the Y n 's cannot be observed in practice.
A "plug-in" approach has been nevertheless proposed in [8], in order to generate, conditionally to $X^{(n+1)} = (X_1, \ldots, X_{n+1})$, a random vector $(\hat{Y}_1, \ldots, \hat{Y}_n)$ from (supposedly known) parameters $(S, \delta, \Phi)$ in a way that its conditional distribution approximates the distribution of $(Y_1, \ldots, Y_n)$ conditioned upon $X^{(n+1)}$ in a certain sense that will be specified below. Here we assume that the conditional distributions $\Pi(x, dy)$ with $x \in E$ are dominated by a σ-finite measure $\lambda(dy)$ of reference, in a way that $\Pi(x, dy) = \pi(x, y) \cdot \lambda(dy)$ for all $x \in E$. This clearly implies that $\Phi(dy)$ is also absolutely continuous with respect to $\lambda(dy)$, and that $\forall x \in S$, $\pi(x, y) \geq \delta \phi(y)$, $\lambda(dy)$-almost surely (3), where $\Phi(dy) = \phi(y) \cdot \lambda(dy)$. Given the sample path $X^{(n+1)}$, the $Y_i$'s are independent random variables. To be more precise, the conditional distribution of $Y_i$ is the Bernoulli distribution with parameter $\frac{\delta \phi(X_{i+1})}{\pi(X_i, X_{i+1})} \cdot I\{X_i \in S\} + \delta \cdot I\{X_i \notin S\}$ (4). A natural way of mimicking the Nummelin splitting construction consists of computing first an estimate $\hat{\pi}_n(x, y)$ of the transition density over $S^2$ based on the available sample path and such that $\hat{\pi}_n(x, y) \geq \delta \phi(y)$ a.s. for all $(x, y) \in S^2$, and then generating independent Bernoulli random variables $\hat{Y}_1, \ldots, \hat{Y}_n$ given $X^{(n+1)}$, the parameter of $\hat{Y}_i$ being obtained by plugging $\hat{\pi}_n(X_i, X_{i+1})$ into (4) in place of $\pi(X_i, X_{i+1})$. We point out that, from a practical viewpoint, it actually suffices to draw the $\hat{Y}_i$'s only at times $i$ when the chain hits the small set $S$, $\hat{Y}_i$ indicating whether the trajectory should be cut at time point $i$ or not. Let $\hat{l}_n = \sum_{1 \leq k \leq n} I\{X_k \in S, \hat{Y}_k = 1\}$. Proceeding this way, one gets the sequence of approximate regeneration times, namely the successive time points $\hat{\tau}_S(1), \ldots, \hat{\tau}_S(\hat{l}_n)$ at which $(X, \hat{Y})$ visits the set $S \times \{1\}$. One may then form the approximate regeneration blocks $\hat{B}_1, \ldots, \hat{B}_{\hat{l}_n - 1}$, as well as the approximate cycle submaxima: $\hat{\zeta}_j(f) = \max_{1 + \hat{\tau}_S(j) \leq i \leq \hat{\tau}_S(j+1)} f(X_i)$.
Knowledge of the parameters $(S, \delta, \phi)$ of condition (3) is required for implementing this approximation method. A practical method for selecting those parameters in a fully data-driven manner is described at length in [9]. The question of accuracy of this approximation has been addressed in [8]. Under the following assumptions, a sharp bound for the deviation between the distribution of $((X_i, Y_i))_{1 \leq i \leq n}$ and that of $((X_i, \hat{Y}_i))_{1 \leq i \leq n}$ in the sense of the Mallows or Wasserstein distance has been established, which essentially depends on the rate $\rho_n$ of the uniform convergence of $\hat{\pi}_n(x, y)$ to $\pi(x, y)$ over $S \times S$.
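A1. The MSE of $\hat{\pi}_n$ is of order $\rho_n$ when error is measured by the sup norm over $S^2$: $E_\nu[\sup_{(x,y) \in S^2} |\hat{\pi}_n(x, y) - \pi(x, y)|^2] = O(\rho_n)$ as $n \to \infty$, where $(\rho_n)$ denotes a sequence of nonnegative numbers decaying to zero at infinity.
A2. The parameters $S$ and $\phi$ are chosen so that $\inf_{x \in S} \phi(x) > 0$.
A3. We have $\sup_{(x,y) \in S^2} \pi(x, y) < \infty$ and $\sup_{n \in \mathbb{N}} \sup_{(x,y) \in S^2} \hat{\pi}_n(x, y) < \infty$ $P_\nu$-a.s.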
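The plug-in drawing of the approximate regeneration indicators may be sketched as below. Several choices here are assumptions made for illustration and are not taken from the text: the chain is real-valued, the small set is an interval $S = [s_{low}, s_{high}]$, $\phi$ is the uniform density on $S$, and `trans_density` is a hypothetical vectorized callable returning the transition density estimate at the pairs $(X_i, X_{i+1})$. Since $\phi$ vanishes outside $S$, a cut can only occur when both $X_i$ and $X_{i+1}$ lie in $S$.

```python
import numpy as np

def approximate_regeneration_flags(x, trans_density, s_low, s_high, delta, rng):
    """Draw indicators Y_1, ..., Y_n given a path x of length n + 1.

    Assumes (illustration only) that S = [s_low, s_high] and that phi is the
    uniform density on S, so that the Bernoulli parameter at time i is
    delta * phi(X_{i+1}) / pi_n(X_i, X_{i+1}) when X_i is in S.
    """
    x = np.asarray(x, dtype=float)
    cur, nxt = x[:-1], x[1:]
    phi = 1.0 / (s_high - s_low)                  # uniform density on S
    in_s = lambda v: (v >= s_low) & (v <= s_high)
    flags = np.zeros(cur.size, dtype=bool)
    # phi(X_{i+1}) = 0 when X_{i+1} is outside S, hence no cut there.
    idx = np.flatnonzero(in_s(cur) & in_s(nxt))
    p = delta * phi / trans_density(cur[idx], nxt[idx])
    p = np.clip(p, 0.0, 1.0)                      # guard if the minorization fails numerically
    flags[idx] = rng.random(idx.size) < p
    return flags
```

The times at which a flag is raised play the role of the approximate regeneration times; the path can then be cut into approximate blocks exactly as in the atomic case.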

Regeneration-based extreme value statistics
In this section, we recall how to construct estimators of the extremal and tail indexes based on the (approximate) cycle submaxima following in the footsteps of [10].For each estimator considered, asymptotic normality is established and the standardization problem is tackled.

Asymptotically normal estimators of the extremal index
A key parameter in the extremal behavior analysis of an instantaneous function {f (X n )} n∈N of the chain X is the extremal index θ ∈ (0, 1), measuring to what extent extreme values tend to come in "small clusters"; refer to [15,11] and [18] for an account of this notion.For a positive recurrent Markov chain X with limiting probability distribution µ and any measurable function for any sequence of real numbers ) in steady-state, i.e. under P µ .As already observed in [10], a positive recurrent chain is a fortiori strong mixing (cf Theorem A in [4]) and consequently satisfies Leadbetter's mixing condition D(u n ); see [24].
In the remainder of this subsection, the function f (x) is fixed and the index θ is assumed to be strictly positive.We point out that [31] have proved, under an extra technical assumption, that the extremal index of any geometrically ergodic Markov chain is strictly positive; refer to Theorem 4.1 therein.

The regenerative "blocks" estimator
As originally shown in [32], it follows from ( 1) and ( 6) that, for any sequence the survivor function of any cdf G(x), and the convention that 0/0 = 0.
In the regenerative case, from expression (7), which may be viewed as a regenerative version of the popular "blocks" estimator (see §8.1.2 in [15]), it has been proposed in [10] that: where, for all u ∈ R, i=1 I{X i ∈ A}, and the usual convention regarding empty summation and 0 0 = 0. Expectedly, a counterpart of this quantity in the general Harris case is obtained by replacing the regeneration cycle submaxima by their approximate versions in (8): where, for all These estimators have been proved consistent in [10] under mild moment assumptions, see Proposition 4 therein.For clarity's sake, we recall the related result.

Proposition 1 ([10]). Suppose that $\theta > 0$. Let $(r_n)_{n \in \mathbb{N}}$ increase to infinity in a way that
(i) In the regenerative case, suppose that $H(\nu, 1)$ and $H(2)$ are fulfilled. Then,
(ii) In the general case, assume that moment assumptions $H(\nu, 1)$ and $H(4)$ are fulfilled by the split chain and in addition that conditions A1-A3 are satisfied. Then,
Remark 1 (On moment assumptions for the split chain). We point out that, in the pseudo-regenerative setup described in §2.2, a sufficient condition for condition $H(\kappa)$ (respectively, for condition Practically, drift conditions of the Foster-Lyapounov type are used for checking such moment conditions; refer to Chapter 11 in [25] for further details.
Remark 2 (On the empirical choice of the threshold sequence). In practice, the threshold sequence $\{v_n\}$ must be picked by the statistician. A natural choice, based on the available sample, consists of taking in the pseudo-regenerative case) and one may easily show that assertion (i) (resp., assertion (ii)) of Proposition 1 remains valid.
The next result reveals that, for a fixed threshold u ∈ R, the asymptotic distribution of the quantity (8), respectively (9), is Gaussian.The technical proof is given in the Appendix section.
(i) In the regenerative case, under assumptions $H(2)$ and $H(\nu, 1)$, there exists where $\Rightarrow$ denotes the convergence in distribution.
(ii) In the pseudo-regenerative case, if the moment assumptions $H(\nu, 1)$ and $H(4)$ are fulfilled by the split chain and if conditions A1-A3 are in addition satisfied, then
As shown in Theorem 2's proof, the asymptotic variance is given by where These quantities may be straightforwardly estimated by computing their empirical counterparts based on the (approximate) regeneration cycles. However, the following result shows that, for a properly chosen threshold sequence $\{v_n\}$, increasing to infinity at a suitable rate, the second and third terms on the right hand side of (14) vanish, while the first one converges to (αη
(i) In the regenerative case, provided that assumptions $H(2)$ and $H(\nu, 1)$ are fulfilled, the following convergence in distribution holds:
(ii) In the pseudo-regenerative case, if the split chain satisfies H(ν, 1) and H(4) and conditions A 1 − A 3 hold, we have the following convergence: We point out that, under the maximum domain of attraction (MDA) assumption combined with additional technical conditions, the asymptotic bias may be proved to vanish.Indeed, recall that, under the assumption that θ > 0, the probability distributions G f (dx) and F (dx) necessarily belong to the same MDA.Suppose for instance that they belong to the Fréchet MDA.There exists then a > 0 such that one may write Ḡf ( where L 1 (x) and L 2 (x) are slowly varying functions.In this setup, the extremal index is thus proportional to the limiting ratio of these two functions: Assume in addition that some second-order Hall-type conditions are fulfilled, in the regenerative case, and a similar result holds true in the pseudoregenerative case.
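As a numerical illustration of the regenerative "blocks" estimator, the sketch below assumes that it takes the ratio form suggested by the limiting relation between $\bar{G}_f$ and $\alpha \bar{F}$, namely the number of (approximate) cycles whose submaximum exceeds $u$ divided by the number of observations exceeding $u$, with the convention 0/0 = 0. This form and the function name are assumptions made here; the exact definition is the one given in [10].

```python
import numpy as np

def blocks_extremal_index(fx, submaxima, u):
    """Ratio-type 'blocks' estimate of the extremal index at threshold u.

    fx        : values f(X_1), ..., f(X_n)
    submaxima : (approximate) cycle submaxima computed from the same path
    u         : threshold
    """
    n_exceed = np.count_nonzero(np.asarray(fx, dtype=float) > u)
    if n_exceed == 0:
        return 0.0  # convention 0/0 = 0
    n_cycles_exceed = np.count_nonzero(np.asarray(submaxima, dtype=float) > u)
    return n_cycles_exceed / n_exceed
```

Each exceeding cycle contributes one cluster, so the ratio can be read as the reciprocal of an average cluster size; in the pseudo-regenerative case the same function would be fed with the approximate submaxima.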

The regenerative "runs estimator"
Using the regenerative method, it has been proved in [32] that θ may be expressed as a limiting conditional probability: Based on a path X 1 , . . ., X n , the natural empirical counterpart of (17) in the regenerative setting is Insofar as (17) measures the clustering tendency of high threshold exceedances within regeneration cycles only, it should be seen as a "regenerative version" of the runs estimator obtained by averaging over overlapping data segments of fixed length r.
In the pseudo-regenerative case, a practical estimate is built by means of the approximate regeneration times: Beyond its practical advantage (blocks are here entirely determined by the data), the estimator (18) may be proved strongly consistent as stated in the first part of the next theorem, while only weak consistency has been established for (19) but for a wider class of weakly dependent sequences; see [21].
Theorem 4. Let r n increase to infinity in a way that r n = o( n/ log log n) as n → ∞.
(i′) Similarly, if the split chain fulfills moment conditions $H(\nu, 1)$ and $H(4)$ and conditions A1-A3 hold, then weak consistency holds in the pseudo-regenerative case:
(ii) In the regenerative case, provided that assumption $H(\nu, 1)$ is fulfilled, the following convergence in distribution also holds:
(ii′) In the pseudo-regenerative case, if the split chain satisfies $H(4)$ and $H(\nu, 1)$ and conditions A1-A3 hold, we have the following convergence:
The last statement of the preceding theorem and Proposition 3 (i) constitute the regenerative versions of Theorems 3 and 4 in [33], who first proved the CLT for the classical runs estimator (based on blocks of fixed length, cf. (19)). The proof of the preceding theorem follows the lines of those of Proposition 1, Theorem 2 and Proposition 3, as sketched in the Appendix.

Asymptotic normality of the regeneration-based Hill estimator
In this section, we assume that $\theta > 0$ and hence, as recalled in the previous section, the distributions $G_f(dx)$ and $F(dx)$ belong to the same MDA. We assume here that they belong to the Fréchet MDA. In the regenerative setting, a natural way of estimating the tail index of $F$, proposed in [10], thus consists in computing a Hill estimate of the tail index of $G_f$ from the observed cycle submaxima: with $1 \leq k \leq l_n - 1$ when $l_n > 1$, denoting by $\zeta_{(j)}(f)$ the $j$-th largest submaximum. As $l_n \to \infty$ $P_\nu$-almost surely as $n \to \infty$, asymptotic results established in the case of i.i.d. observations extend straightforwardly to our setting, see part (i) of Theorem 5 below. We point out that in the i.i.d. setup one may take the whole state space as an atom, i.e. $A = E$; each cycle then comprises a single observation and (23) reduces to the standard Hill estimator.
In the general Harris case, one may naturally build an estimate by replacing the cycle submaxima by their approximate versions: with 1 ≤ k ≤ ln −1 when ln > 1 and denoting by ζ (j) (f ) the j-th largest approximate submaximum.It is shown in Proposition 5 of [10] that the approximation step does not compromise the consistency of the estimator, provided that the estimator of π(x, y) over S 2 is accurate enough.In order to establish a rate of convergence, we will also consider the case where the transition estimate used in the approximation stage is computed from a trajectory of length N >> n and will denote by k,n, the corresponding estimator.The consistency and the asymptotic normality of these estimators have been shown in [10] under the Von Mises condition recalled below; see Proposition 5 therein.
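A Hill-type computation on the (approximate) cycle submaxima can be sketched as follows. The code uses the textbook Hill statistic, i.e. the mean of the log-spacings of the $k$ largest submaxima above the $(k+1)$-th largest one, which targets the reciprocal of the Fréchet exponent; whether the estimator of [10] is stated for the exponent or for its reciprocal is not settled here, and the function name is hypothetical.

```python
import numpy as np

def hill_from_submaxima(submaxima, k):
    """Textbook Hill statistic computed from the k largest cycle submaxima."""
    z = np.sort(np.asarray(submaxima, dtype=float))[::-1]  # decreasing order
    if not 1 <= k < z.size:
        raise ValueError("k must satisfy 1 <= k <= (number of submaxima) - 1")
    if z[k] <= 0:
        raise ValueError("the (k+1)-th largest submaximum must be positive")
    # Mean log-excess of the k largest values over the (k+1)-th largest one.
    return float(np.mean(np.log(z[:k] / z[k])))
```

When every cycle contains a single observation, the submaxima are the observations themselves and the function reduces to the usual Hill statistic of an i.i.d. sample.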
VM assumption (Von Mises condition, [19]).Let ρ ≤ 0. Suppose Ḡf where b(x) is a measurable function of constant sign, and with, by convention, (t Here, we formulate a central limit theorem in a more general fashion, revealing a bias-variance trade-off similarly to [13] in the i.i.d.setup.The proof is omitted as it follows by a straightforward modification of the proof of proposition 5 in [10] and the references therein. Theorem 5. Assume that F belongs to the Fréchet MDA and the VM assumption holds and consider an increasing sequence of integers {k(n)} such that: (i) then, in the regenerative case, the following convergence in distribution holds (ii) in the pseudo-regenerative case, if conditions A 1 − A 3 are in addition fulfilled, let (m n ) n∈N be a sequence of integers increasing to infinity such that

Regenerative block-bootstrap confidence intervals
In this section, we recall the principle underlying the (approximate) regenerative block-bootstrap, originally introduced in [8] for bootstrapping Markovian sample means, and establish its asymptotic validity when applied to the estimators described in the preceding section.

The RBB and ARBB principle
Practically, the RBB algorithm, respectively the ARBB algorithm, applies to any statistic T n = T (B 1 , . . ., B ln−1 ), based on the cycles, respectively approximate cycles, with standardization σ n = σ [T (B 1 , . . ., B ln−1 )].For notational simplicity, regeneration cycles and their approximate versions are here denoted in the same manner.The resampling scheme consists of mimicking the underlying renewal structure by drawing data blocks with replacement until a trajectory of roughly length n is built.In this way, the randomness in the number of renewals is reproduced during the procedure and, conditionally to the original data, the bootstrap series thus generated is regenerative.
If one is just interested in asymptotic results, one may simply draw $l_n - 1$ i.i.d. blocks (conditionally to the trajectory, so that $l_n$ is fixed in the bootstrap procedure). Confidence intervals at level $1 - \alpha \in (1/2, 1)$ for the parameter of interest are obtained by computing the bootstrap root's quantiles $q^*_{\alpha/2}$ and $q^*_{1-\alpha/2}$, of orders $\alpha/2$ and $1 - \alpha/2$ respectively (in practice, the latter are approximated in a Monte-Carlo fashion by iterating steps 2-3): the basic Percentile bootstrap CI is simply the Percentile bootstrap CI is defined as and the t-Percentile bootstrap CI is given by where $t^*_p$ is the $p$-th quantile of the studentized bootstrap root
Remark 3 (Gaussian confidence intervals). These bootstrap CI's can be compared to asymptotic CI's classically built from the statistic and its standardization where $\Phi^{-1}_p$ is the $p$-th quantile of the standard normal distribution, or replacing $\sigma_n / \sqrt{n}$ with a new standardization estimator defined as the empirical standard deviation of $T^*_n$ given by σ
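The simplified resampling scheme, in which as many blocks as observed are drawn i.i.d. with replacement, can be sketched as below. The interval formulas are the usual textbook percentile and basic (reflected) bootstrap intervals, assumed here to match the ones intended above; the function name, the `statistic` callable and the default settings are hypothetical.

```python
import numpy as np

def rbb_percentile_cis(blocks, statistic, n_boot=199, level=0.95, seed=0):
    """Block-resampling confidence intervals for a statistic of the cycles.

    blocks    : list of arrays, the (approximate) regeneration cycles
    statistic : callable mapping a list of blocks to a real number
    """
    rng = np.random.default_rng(seed)
    m = len(blocks)
    t_hat = float(statistic(blocks))
    t_star = np.empty(n_boot)
    for b in range(n_boot):
        draw = rng.integers(0, m, size=m)          # m blocks, with replacement
        t_star[b] = statistic([blocks[i] for i in draw])
    a = 1.0 - level
    lo, hi = (float(q) for q in np.quantile(t_star, [a / 2, 1 - a / 2]))
    return {
        "estimate": t_hat,
        "percentile": (lo, hi),                    # quantiles of T*_n
        "basic": (2 * t_hat - hi, 2 * t_hat - lo), # reflection around T_n
    }
```

Drawing blocks sequentially until the resampled path reaches length $n$, as in the full algorithm, would only change the line that selects `draw`.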

Asymptotic validity of RBB and ARBB distribution estimates
The results stated below show that the bootstrap procedure described in the previous subsection is asymptotically valid. Let $P^*(\cdot)$ be the conditional probability given the observed trajectory. The following assertions hold true.
Theorem 6. 1. ("Blocks" estimator) Suppose that the assumptions of Theorem 2 are fulfilled.Let θ n (u) denote the estimator θ n (u) in the regenerative case, θ n (u) in the pseudo-regenerative case, and let θ * n (u) be its bootstrap counterpart.Then, we have, as n → ∞: 2. ("Runs" estimator) Suppose that the hypotheses of Theorem 4 are satisfied.Denote by θ ′ n (u) the estimator θ ′ n (u) in the regenerative case, θ ′ n (u) in the pseudo-regenerative case, and let θ ′ * n (u) be its bootstrap counterpart.Then, we have, as n → ∞: Such results may also be used to estimate the mean-square error of √ n(θ n (u)− θ(u)) and to calibrate the level u by minimizing the MSE, in the same spirit as [20] or [12], and as illustrated in the simulation section.

Markov subsampling and the Hill estimator
As claimed by the following proposition, the RBB and ARBB algorithms can also be successfully applied to tail index estimation provided that the sequential drawing (step 2 in the previous algorithm) is replaced with a subsampling drawing without replacement; see [14]. Proving that the procedure is still valid in the absence of subsampling deserves a much more thorough analysis, far beyond the scope of this paper. We thus introduce the following subsampling variant of Algorithm 1.
Let $m_n > 1$ be such that $m_n \to +\infty$ and $m_n/n \to 0$ as $n \to +\infty$. If we assume in addition that $k(l_{m_n})/k(l_n) \to 0$, we then have, as $n \to +\infty$, where In the subsampling context, higher order accuracy cannot be established. It is thus sufficient to consider a simple form of the standardization in order to prove the asymptotic validity. The issue of choosing the subsampling size $m_n$ and the tuning parameter $k$ is discussed in the next section.
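The subsampling drawing may be sketched as follows: blocks are taken without replacement, in random order, until the subsampled path reaches the target length $m_n$. The stopping rule and the function name are assumptions made for illustration.

```python
import numpy as np

def subsample_blocks(blocks, m, rng):
    """Draw blocks without replacement until their total length reaches m."""
    chosen, length = [], 0
    for i in rng.permutation(len(blocks)):   # random order, no block repeated
        if length >= m:
            break
        chosen.append(blocks[i])
        length += len(blocks[i])
    return chosen
```

The Hill statistic would then be recomputed on the submaxima of the selected blocks, and the operation repeated to obtain the subsampling distribution.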

Simulation results
In this section, we present illustrative simulation results to provide empirical evidence of the good behavior of the estimators and confidence intervals proposed in this paper. Whenever possible, a comparison with other estimators and confidence intervals is conducted.

Regenerative examples
Considering waiting times of certain queuing processes, we compute and discuss the regeneration-based "blocks" and "runs" estimators of the extremal index and the regeneration-based Hill estimator of the tail parameter.
Regeneration-based extremal index estimators. We first consider the waiting times of an M/M/1 process (cf. [3]) with parameters λ = 0.2, µ = 0.8 and sample path length $n$. As underlined in [10], there exists a closed analytical form for the extremal index in this case; it is equal to $\theta = (1 - \lambda/\mu)^2 = 0.5625$ and all the required assumptions are satisfied. The estimators $\theta_n(u)$ and $\theta'_n(u)$ of the extremal index proposed in this paper are both defined based on a threshold $u$, supposed to be large.
RBB confidence intervals. Figures 1(a) and 1(b) show the asymptotic and bootstrap confidence intervals of the regenerative "blocks" estimator and the regenerative "runs" estimator, respectively. These CI's are quite similar except for the largest values of $u$. In the sequel, when a bootstrap CI is computed, it will be the basic percentile bootstrap confidence interval. The coverage probabilities of the basic bootstrap percentile CI for the M/M/1 waiting process are estimated over M = 300 trajectories, as shown in Figure 2.
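The M/M/1 waiting times of this example can be simulated with the Lindley recursion, the state 0 (empty queue on arrival) serving as the atom. The sketch below is only meant to make the setting concrete; function names and the choice of starting from an empty queue are assumptions.

```python
import numpy as np

def simulate_mm1_waiting_times(n, lam, mu, rng):
    """Waiting times W_1, ..., W_n of an M/M/1 queue via the Lindley recursion."""
    inter = rng.exponential(1.0 / lam, size=n)   # inter-arrival times, rate lam
    serv = rng.exponential(1.0 / mu, size=n)     # service times, rate mu
    w = np.zeros(n)                              # the first customer finds an empty queue
    for i in range(1, n):
        # W_{i+1} = max(W_i + S_i - T_{i+1}, 0)
        w[i] = max(w[i - 1] + serv[i - 1] - inter[i], 0.0)
    return w

def mm1_extremal_index(lam, mu):
    """Closed-form extremal index quoted in the text: (1 - lam/mu)^2."""
    return (1.0 - lam / mu) ** 2
```

With λ = 0.2 and µ = 0.8, `mm1_extremal_index` gives 0.5625, and the regeneration times are simply the indices at which the simulated waiting time equals zero.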
Choosing the threshold. As mentioned after Theorem 6, the threshold $u$ can be chosen by minimizing an estimate of the mean-square error of $\sqrt{n}(\theta_n(u) - \theta(u))$. The optimal threshold value $u^*$ can therefore be determined as the minimizer of the criterion (28), where $\bar{\theta}^*_n(u)$ is the mean of the bootstrap statistics. The same process can be applied to the regenerative "runs" estimator. Applying this to the M/M/1 queue yields $\theta^* = \theta_n(u^*) = 0.5263$ with CI (0.4431, 0.6610) (which includes the targeted extremal index 0.5625).
Fig 1. Extremal index estimation for waiting times of the M/M/1 queue with λ = 0.2, µ = 0.8, θ = 0.5625: (a) regenerative "blocks" estimator; (b) regenerative "runs" estimator. The x-axis gives the percentiles of the simulated $(W_n)$; n = 1000, B = 199 bootstrap samples; solid red for the regenerative estimator, solid black for the mean bootstrap estimator, dashed red for the basic percentile bootstrap CI, dotted red for the percentile bootstrap CI, dashed green for the t-percentile bootstrap CI, dashed blue for the asymptotic CI based on the regenerative standardization, dashed light blue for the asymptotic CI based on the bootstrap standardization; the horizontal black line is θ, the vertical dashed red line is the optimal $u$ value as determined by minimizing (28).
Fig 2. Coverage probabilities of the basic percentile bootstrap CI for the regenerative "blocks" estimator and the regenerative "runs" estimator. M/M/1 queue with λ = 0.2, µ = 0.8, θ = 0.5625 (the x-axis gives the percentiles of the simulated $(X_n)$, n = 1000, 1 − α = 95%-CI, B = 199 bootstrap samples, M = 300; the solid blue curve is that of the "blocks" estimator, the dashed red curve is that of the "runs" estimator).
Fig 3. Comparison of our bootstrap basic Percentile CI to that proposed in [16] (B = 199, dashed blue for ours and dashed green for theirs, solid black is the true θ).
Another possibility (which does not require any bootstrap) arises from the fact that the ratio of the asymptotic variances of our two regenerative estimators is asymptotically constant for a properly chosen sequence of thresholds $u_n$; see Theorem 3, assertion (i), and Theorem 4, assertion (ii). Hence, one may define an optimal threshold value $u^*$, and hence a unique estimator of the extremal index, by minimizing in $u$ the function and defining $\theta^* = \theta_n(u^*)$. Applying this process to the M/M/1 queue yields $\theta^* = 0.5179$ with CI (0.4947, 0.6370) (which covers the targeted extremal index 0.5625).
Alternative estimators. In [10], the regenerative blocks estimator was compared to the intervals estimator proposed by [16] and to various fixed-length block estimators and runs estimators (see Fig. 2 therein). Its mean squared error was generally lower than those of the alternative estimators. As far as CI's are concerned, the authors of [16] also proposed a bootstrap procedure based on an automatic declustering of the process relying on the estimation of the extremal index (see Section 4 therein). Figure 3 illustrates that our bootstrap CI is much sharper than theirs on this example. This remains true for slightly higher values of θ (e.g. 0.75, 0.81).
Regeneration-based Hill estimator. We now consider the waiting times of an M/G/1 process with Pareto service times, with parameters λ = 0.2 and a = 3. The subsampling size was fixed at $m_n = \lfloor n/\log(n) \rfloor$. For each of the M trajectories, for each of the B bootstrap samples, the regenerative Hill estimator is first computed for various values of $k$, from $k = 10$ to the number of blocks $k = l_{m_n}$. The optimal $k$ is then determined by computing a bias-corrected Hill estimator (as in [6,17]) and choosing the value $k^*$ that minimizes the estimated MSE where $H_{k,n}$ is a bias-corrected version of the Hill estimator. The regenerative standardization is then computed as $H_{k^*,n}/\sqrt{k^*}$. Results of this simulation are presented in Table 1. Note that the basic percentile CI and the asymptotic CI with bootstrap variance have the best coverage probabilities and are also very easy to compute (they do better than the asymptotic CI, which however has the advantage of not requiring bootstrap resampling). Regarding the choice of the subsampling size $m_n$, various values were tested and larger values keep a good coverage probability with reduced mean length. The application of Algorithm 1 yields particularly good results, which raises the question of the validity of this procedure for the regenerative Hill estimator, and hence of the validity of the bootstrap of the Hill estimator in the i.i.d. case as well (theoretical work in progress).
Alternative estimator. In [10], the regenerative Hill estimator is compared to the standard Hill estimator computed directly from the longest waiting times, as proposed by [27]. The same bias correction method was applied to the standard Hill estimator in order to determine the optimal k value. In their paper [27], the authors do not propose any confidence interval for their estimator, but one could compute a bootstrap CI as proposed in [5] in the i.i.d. case (the principle is to resample directly the log differences, which are i.i.d. exponential, rather than the upper order statistics). This approach results in very small CI's that fail to compensate for the poor performance of the standard Hill estimator on this example, and hence have a zero coverage probability; see the last line of Table 1.

Pseudo-regenerative examples
We now turn to examples for which a regenerative extension must be approximated and show that this additional step does not damage the accuracy of the method.
Approximate regeneration-based extremal index estimator. For the pseudo-regenerative case, we consider a first order autoregressive model with Cauchy noise, with parameters ρ = 0.8 and σ = 1, yielding an extremal index θ equal to 1 − ρ; see [10] for details, namely Section 5.2 therein for a precise description of the construction of the pseudo-blocks. The bootstrap CI's and their coverage probabilities are shown in Figure 4. Note that the percentiles of X used for the "runs" estimator are much lower than those used for the "blocks" estimator. The CI's for the "blocks" estimator are better than those of the "runs" estimator in terms of coverage probability.
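For readers who wish to reproduce the setting, the AR(1) model with Cauchy noise can be simulated in a few lines. The sketch below is illustrative only: the function names are hypothetical, and the estimator shown is the classical fixed-run-length "runs" estimator of the extremal index, not the regenerative or pseudo-regenerative estimators studied in this paper. The threshold (95% empirical quantile) and the run length r = 10 are arbitrary assumptions.

```python
import numpy as np

def simulate_ar1_cauchy(n, rho=0.8, sigma=1.0, seed=0):
    """Simulate X_t = rho * X_{t-1} + sigma * eps_t with standard Cauchy eps_t."""
    rng = np.random.default_rng(seed)
    noise = sigma * rng.standard_cauchy(n)
    x = np.empty(n)
    x[0] = noise[0]
    for t in range(1, n):
        x[t] = rho * x[t - 1] + noise[t]
    return x

def runs_estimator(x, u, r):
    """Classical runs estimator of the extremal index.

    Proportion of exceedances of u that are followed by r consecutive
    non-exceedances, among exceedances occurring at least r steps before the end.
    """
    exceed = np.asarray(x) > u
    starts = np.flatnonzero(exceed[: exceed.size - r])
    if starts.size == 0:
        return float("nan")
    cluster_ends = sum(1 for i in starts if not exceed[i + 1 : i + r + 1].any())
    return cluster_ends / starts.size

x = simulate_ar1_cauchy(10_000, rho=0.8, sigma=1.0, seed=0)
u = np.quantile(x, 0.95)          # illustrative threshold choice
theta_hat = runs_estimator(x, u, r=10)
```

With ρ = 0.8 the target value is θ = 1 − ρ = 0.2; how close `theta_hat` comes to it depends strongly on the threshold and run length.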
Approximate regeneration-based Hill estimator. With the AR(1)-Cauchy example again, we investigated the estimation of the tail index, equal to 1 here; see [10] for details. The regenerative Hill estimator was computed for M = 100 trajectories of length n = 10,000, using a subsampling size m_n = ⌊n/log(n)⌋ = 1,085 and B = 199 bootstrap replications in each case. We obtained H_{n,k*} = 1.14 for k* = 104 (sd = 0.111), with a basic percentile bootstrap CI of (0.592, 1.945), a coverage probability of 94% and a mean length of 1.457. Again, when the subsampling size is increased, the coverage probability remains around the desired 95% while the mean length of the CI is drastically reduced. This raises questions about the validity of the full regenerative bootstrap for the regenerative Hill estimator, as proposed in Algorithm 1 and used for the regenerative extremal index estimators.
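The subsampling size quoted above and the construction of a bootstrap CI from B replicates can be sketched as follows. The function names are hypothetical. Since the text does not spell out the exact form of the "basic percentile" interval, two common variants are shown under that stated uncertainty: the plain percentile interval and the reflected ("basic") interval.

```python
import math
import numpy as np

def subsample_size(n):
    """Subsampling size m_n = floor(n / log(n))."""
    return math.floor(n / math.log(n))

def percentile_ci(boot_estimates, level=0.95):
    """Plain percentile interval: empirical quantiles of the bootstrap estimates."""
    alpha = (1.0 - level) / 2.0
    lo, hi = np.quantile(boot_estimates, [alpha, 1.0 - alpha])
    return float(lo), float(hi)

def basic_ci(estimate, boot_estimates, level=0.95):
    """Reflected ('basic') interval: quantiles mirrored around the point estimate."""
    lo, hi = percentile_ci(boot_estimates, level)
    return 2.0 * estimate - hi, 2.0 * estimate - lo

m_n = subsample_size(10_000)  # 1085, the value used in the simulations
```

In practice `boot_estimates` would hold the B = 199 bootstrap values of the estimator and `estimate` the value computed on the original trajectory.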

Appendix A: Technical Proofs
A.1. Proof of Theorem 2
For assertion (i), observe that θ_n(u) is simply the ratio of the components of the bivariate vector (Ḡ_{f,n}(u), Σ_{f,n}(u))′, which is asymptotically normal under the specified moment conditions (see the proof of the CLT stated in Theorem 17.2.2 of [25]) for the atomic case: as n → ∞, and with Application of the Delta method finally yields (12), with 4 .
The demonstration of assertion (ii) relies on similar arguments regarding the asymptotic normal behavior of the bivariate vector (1 − G f,n (u), Σ f,n (u)) ′ obtained from the CLT stated in Theorem 17.3.6 of [25].
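For convenience, the generic Delta-method step for a ratio, on which both assertions rely, is recalled below. The symbols (A_n, B_n), (a, b) and σ_AA, σ_AB, σ_BB are generic placeholders introduced here for illustration and are not the notation of Theorem 2; the formula assumes b ≠ 0 and a bivariate CLT at rate √n.

```latex
% Generic Delta method for a ratio (placeholder notation, assuming b \neq 0).
\sqrt{n}\left( \begin{pmatrix} A_n \\ B_n \end{pmatrix}
             - \begin{pmatrix} a \\ b \end{pmatrix} \right)
\Rightarrow \mathcal{N}\!\left( 0,\;
  \begin{pmatrix} \sigma_{AA} & \sigma_{AB} \\ \sigma_{AB} & \sigma_{BB} \end{pmatrix} \right),
\qquad g(x,y) = \frac{x}{y},
\qquad \nabla g(a,b) = \left( \frac{1}{b},\; -\frac{a}{b^{2}} \right)',
\]
\[
\sqrt{n}\left( \frac{A_n}{B_n} - \frac{a}{b} \right)
\Rightarrow \mathcal{N}\!\left( 0,\;
  \frac{\sigma_{AA}}{b^{2}} - \frac{2a\,\sigma_{AB}}{b^{3}} + \frac{a^{2}\,\sigma_{BB}}{b^{4}} \right).
```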

A.2. Proof of Proposition 3
A preliminary step consists of studying the behavior of the various components of σ²_f(v_n) as n → ∞, as stated in the next lemma. Lemma 8. We have H3 Finally, the condition on the variance is also satisfied since: denoting by V_A(.) the variance under P_A(.). Note that the assumptions H(2) and H(ν, 1) are only needed to ensure that σ²(u) is defined for all u and that the length of the first non-regenerative block has a finite first order moment. Consequently, our regeneration-based extremal index estimator does not differ much from that with denominator (l n − 1)/n n i=1 I{f (X i ) > u}.

A.3. Proof of Theorem 4
Writing (17) as the ratio of G¹_f(u) = P_A({max_{2≤i≤τ_A} f(X_i) ≤ u} ∩ {f(X_1) > u}) and F¹(u) = P_A(f(X_1) > u), the regenerative "runs" estimator given in (18) is simply the ratio of the empirical counterparts of these probabilities, which are denoted G¹_{f,n}(u) and F¹_n(u) in the sequel. Hence, the proof of (i) and (ii) exactly follows that of the strong consistency of the regenerative blocks estimator (8), provided in [10], and that of its asymptotic normality given above, provided that the next lemma holds true under the stated assumptions. Lemma 9. Since Ḡ_f(u) ∼ G¹_f(u) and F(u) ∼ F¹(u), we can state a LIL for G¹_f(u), a LIL for F¹(u) and get asymptotic equivalences similar to those stated in the previous lemma. Let r_n ↑ ∞ in a way that r_n = o( n/ log log n) as n → ∞. Considering (v_n)_{n∈N} such that r_n(1 − F(v_n)) → η < ∞ as n → ∞, we have r_n F¹(v_n) → η/θ.
More precisely, we can state the asymptotic normality of the bivariate vector (G¹_{f,n}(u), F¹_n(u))′, for all fixed u. Note that because of the specific "probability" form of the covariance terms here (they are all less than or equal to one), we do not need extra moment assumptions as we did in the case of the regenerative blocks estimator.
An application of the Delta method finally yields the following asymptotic variance for all fixed u. Since r_n F¹(v_n) → η/θ, we can identify s² in (21) and (22) as αθ²(1 − θ). A formal application of the Lindeberg-Feller theorem, similar to that in the proof for the regenerative blocks estimator (the X_n's are bivariate), leads to the same result, and it is easily seen that, because only indicator functions are involved, no moment assumption on τ_A is needed.
For the pseudo-regenerative versions (i') and (ii'), it is sufficient to observe that, under the stated assumptions, we can prove, similarly to Theorem 2 in [10], that sup_{x∈R} |Ĝ¹_{f,n}(x) − G¹_{f,n}(x)| = O_{P_ν}(R_n(π_n, π)^{1/2}) as n → ∞, and, similarly to Lemma 6.2 in [8], that we have

A.4. Proof of Theorem 6
The bootstrap version of the "blocks" estimator of the extremal index is given by the ratio of the components of the bootstrap version of the bivariate vector (Ḡ_{f,n}(u), Σ_{f,n}(u))′. Since, by [8], the RBB and ARBB are asymptotically valid, it follows immediately that, for fixed u, √n(θ*_n(u) − θ_n(u)) has the same limiting distribution as √n(θ_n(u) − θ(u)). The same result remains valid even if l_n is fixed in the bootstrap procedure. The same arguments may be used for the "runs" estimator.

A.5. Proof of Theorem 7
When using l_{m_n}, the proposed procedure boils down to a subsampling procedure in an i.i.d. framework. Using continuity and standard U-statistics arguments (see [14]), by mimicking the proof of [14]  The first equality is a straightforward consequence of the continuity of the limiting distribution of the Hill estimator and of the assumption stating that k(l_n m_n/n)/k(l_n) → 0. Now using the fact that m_n → ∞ and the fact that the Hill estimator so normalized has a nondegenerate distribution, we get the result of Theorem 7.

Algorithm 2. RBB subsampling
1. (Blocks.) As described in step 1 of Algorithm 1.
2. (Subsampling drawing.) Choose a subsampling size m_n large enough but small compared to n and compute l_{m_n} as the observed number of blocks in a stretch of length m_n: typically, l_{m_n} is of order [m_n/E_A τ_A] and is thus asymptotically equivalent to l_{m_n} = l_n m_n/n, where [x] is the integer part of x. Draw l_{m_n} bootstrap data blocks B*_1, ..., B*_k by sampling without replacement in the blocks B_1, ..., B_{l_n−1}.
3. (Subsampling statistics.) Apply steps 3 and 4 of Algorithm 1 to the reconstructed RBB sample path X*(n) = (B*_1, ..., B*_{l_{m_n}−1}).
Theorem 7. Suppose that the assumptions of Theorem 5 are fulfilled. Denote by ξ n,k the estimator ξ n,k in the regenerative case, ξ n,k in the pseudo-regenerative case, and let ξ* n,k be its subsampling counterpart.
1238 P. Bertail et al.
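The block-drawing part of Algorithm 2 can be sketched as follows. This is a hedged illustration rather than the implementation used in the simulations: the function names are hypothetical, the regeneration times are assumed to be already available as an increasing array of indices (for instance the times at which a queue empties), and the number of drawn blocks is taken to be the integer part of l_n m_n / n, in line with the approximation stated in step 2.

```python
import numpy as np

def split_into_blocks(x, regen_times):
    """Cut the path into complete cycles (tau_j, tau_{j+1}] between regeneration times.

    The stretch before the first regeneration time and the one after the last
    are discarded, which leaves l_n - 1 blocks for l_n regeneration times.
    """
    x = np.asarray(x)
    taus = np.asarray(regen_times)
    return [x[s + 1 : e + 1] for s, e in zip(taus[:-1], taus[1:])]

def rbb_subsample(blocks, n, m_n, rng):
    """Draw about l_n * m_n / n blocks without replacement among the observed ones."""
    l_n = len(blocks) + 1                 # number of regeneration times
    l_mn = int(l_n * m_n / n)             # integer part of l_n * m_n / n
    if not 1 <= l_mn <= len(blocks):
        raise ValueError("m_n gives an invalid number of blocks to draw")
    chosen = rng.choice(len(blocks), size=l_mn, replace=False)
    return [blocks[i] for i in chosen]

# Toy usage with hypothetical regeneration times.
path = np.arange(20)
blocks = split_into_blocks(path, regen_times=[2, 5, 9, 12, 17])
sub_blocks = rbb_subsample(blocks, n=len(path), m_n=8, rng=np.random.default_rng(0))
sub_path = np.concatenate(sub_blocks)     # reconstructed subsampled trajectory
```

The estimator of interest would then be recomputed on `sub_path` (or directly on `sub_blocks`), and the drawing repeated B times.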
Fig 3. Comparison of our bootstrap basic percentile CI to that proposed in [16] (B = 199; dashed blue for ours, dashed green for theirs; solid black is the true θ).

Table 1
Confidence intervals around tail index estimators: M/G/1 queue with Pareto service times, λ = 0.5, 1/a = 1/3, sample path of length n = 10,000, m_n = [n/log(n)] = 1,085, B = 199 bootstrap samples, M = 300 Monte-Carlo replications to compute the coverage probabilities, mean lengths of the CI's and mean squared error of the estimator (MSE). a: based on the bootstrap variance. b: the last line refers to the standard Hill estimator while the rest of the table refers to the regenerative Hill estimator.