Electronic Journal of Statistics

In extreme value theory, the so-called extreme-value index is a parameter that controls the behavior of a distribution function in its right tail. Knowing this parameter is thus essential to solve many problems related to extreme events. In this paper, the estimation of the extreme-value index is considered in the presence of a random covariate, whether the conditional distribution of the variable of interest belongs to the Fréchet, Weibull or Gumbel max-domain of attraction. The pointwise weak consistency and asymptotic normality of the proposed estimator are established. We examine the finite sample performance of our estimator in a simulation study and we illustrate its behavior on a real set of fire insurance data.


Introduction
The problem of studying extreme events arises in many fields of statistical applications. In hydrology, one could for instance be interested in forecasting the maximum level reached by the seawater along a coast over a given period, or studying extreme rainfall at a given location; in actuarial science, it is of primary interest for a company to estimate the probability that a claim which represents a threat to its solvency is filed. The pioneering result in extreme value theory, known as the Fisher-Tippett-Gnedenko theorem (see Fisher and Tippett [13] and Gnedenko [19]), states that if (Y_n) is an independent sequence of random copies of a random variable Y such that there exist normalizing nonrandom sequences of real numbers (a_n) and (b_n), with a_n > 0, such that the sequence a_n^{−1}(max_{1≤i≤n} Y_i − b_n) converges in distribution to some nondegenerate limit, then the cumulative distribution function (cdf) of this limit can necessarily be written y ↦ G_γ(ay + b), with a > 0 and b, γ ∈ R, where

G_γ(y) = exp(−(1 + γy)^{−1/γ}) if γ ≠ 0 and 1 + γy > 0,  and  G_0(y) = exp(−exp(−y)).
If the aforementioned convergence holds, we shall say that Y (or equivalently, its cdf F) belongs to the max-domain of attraction (MDA) of G_γ, with γ being the extreme-value index of Y, and we write F ∈ D(G_γ). Clearly, γ drives the behavior of F in its right tail:
• if γ > 0, namely Y belongs to the Fréchet MDA, then 1 − G_γ is heavy-tailed, i.e. it has a polynomial decay;
• if γ < 0, namely Y belongs to the Weibull MDA, then 1 − G_γ is short-tailed, i.e. it has a support bounded to the right;
• if γ = 0, namely Y belongs to the Gumbel MDA, then 1 − G_γ has an exponential decay.
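These three regimes can be made concrete numerically. The helper below is not part of the paper, just a small illustration: it evaluates G_γ and shows that the Gumbel cdf is the γ → 0 limit of the γ ≠ 0 formula.

```python
import math

def gev_cdf(y, gamma):
    """Evaluate the extreme-value cdf G_gamma at y."""
    if gamma == 0.0:
        return math.exp(-math.exp(-y))  # Gumbel case
    t = 1.0 + gamma * y
    if t <= 0.0:
        # Outside the support: below it when gamma > 0, above it when gamma < 0
        return 0.0 if gamma > 0 else 1.0
    return math.exp(-t ** (-1.0 / gamma))
```

For γ < 0 the support is bounded to the right by −1/γ (the cdf equals 1 beyond that point), while for γ > 0 the tail 1 − G_γ(y) decays like y^{−1/γ}.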
The knowledge of γ is therefore necessary to tackle a number of problems in extreme value analysis, such as the estimation of extreme quantiles of Y , which made its estimation a central topic in the literature. Recent monographs on extreme value theory and especially univariate extreme-value index estimation include Beirlant et al. [3] and de Haan and Ferreira [21].
In practical applications, it is often the case that the variable of interest Y can be linked to a covariate X. In this situation, the extreme-value index of the conditional distribution of Y given X = x may depend on x; the problem is then to estimate the conditional extreme-value index x → γ(x). Motivating examples in the literature include the description of the right tail of the distribution of claim sizes in insurance or reinsurance (see [3]), the estimation of the maximal production level as a function of the quantity of labor (see Daouia et al. [6]), studying extreme temperatures as a function of various topological parameters (see Ferrez et al. [12]), the estimation of some quantitative physical characteristics of Martian soil (see Gardes et al. [17]), or analyzing extreme earthquakes as a function of the location (see Pisarenko and Sornette [26]).
In most recent works, this problem has been addressed in the "fixed design" case, namely when the covariates are nonrandom. For instance, Smith [27] and Davison and Smith [10] considered a regression model while Hall and Tajvidi [22] used a semiparametric approach in this context; a nonexhaustive list of fully nonparametric methods include Davison and Ramesh [9] for a local polynomial estimator, Chavez-Demoulin and Davison [5] for a method using splines, Gardes and Girard [14] for a moving window approach and Gardes and Girard [15] who used a nearest neighbor approach.
By contrast, the case when the covariate is random, which is very interesting as far as practical applications are concerned, has only been tackled in more recent works. In the actuarial science setting, one could for instance think of a situation in which an insurance firm covers damage done to policyholders by natural disasters: a typical covariate in this case is the location where a natural disaster happens. Another situation, which will be examined in this paper, is the case of an insurance firm covering damage done by fire accidents: a possible covariate of the claim size is the total sum insured by the firm. We refer to Wang and Tsai [28] for a maximum likelihood approach, Daouia et al. [7] who used a fixed number of nonparametric conditional quantile estimators to estimate the conditional extreme-value index, Gardes and Girard [16] who generalized the method of [7] to the case when the covariate space is infinite-dimensional, Goegebeur et al. [20] who studied a nonparametric regression estimator and Gardes and Stupfler [18] who introduced a smoothed local Hill estimator. Besides, the method of [7] was recently generalized in Daouia et al. [8] to a regression context with a response distribution belonging to the general max-domain of attraction: the latter study is the only one in this list which is not restricted to the case of the Fréchet MDA.
The aim of this paper is to introduce a moment estimator of the conditional extreme-value index, working in the three domains of attraction. In Section 2, we define our estimator of the conditional extreme-value index. The pointwise weak consistency and asymptotic normality of the estimator are stated in Section 3. The finite sample performance of the estimator is studied in Section 4. In Section 5, we illustrate the behavior of the proposed estimator on a real set of fire insurance data. Proofs of the main results are given in Section 6 and those of the auxiliary results are postponed to Section 7.

Estimation of the conditional extreme-value index
Let (X_1, Y_1), ..., (X_n, Y_n) be n independent copies of a random pair (X, Y) taking its values in E × (0, ∞), where E is a metric space endowed with a metric d. For all x ∈ E, we assume that the conditional survival function (csf) F̄(·|x) = 1 − F(·|x) of Y given X = x belongs to D(G_γ(x)). Specifically, we shall work in the following setting:

(M_1) Y is a positive random variable and for every x ∈ E, there exist a real number γ(x) and a positive function a(·|x) such that the left-continuous inverse U(·|x) of 1/F̄(·|x), defined by U(z|x) = inf{y ∈ R | 1/F̄(y|x) ≥ z} for every z ≥ 1, satisfies

for all z > 0,  lim_{t→∞} (U(tz|x) − U(t|x))/a(t|x) = (z^{γ(x)} − 1)/γ(x),

where the right-hand side is to be read as log z when γ(x) = 0. Model (M_1) is the conditional analogue of the classical extreme-value framework, see for instance [21], p.19. In this model, for every x ∈ E, the function U(·|x) has a positive limit U(∞|x) at infinity; the function U(∞|·) is called the conditional right endpoint of Y.
We now introduce our estimator, which is an adaptation of the moment estimator of Dekkers et al. [11]. To this end, we let, for an arbitrary x ∈ E and h = h(n) → 0 as n → ∞, N_n(x, h) be the total number of observations in the closed ball B(x, h) having center x and radius h:

N_n(x, h) = Σ_{i=1}^n 1l{X_i ∈ B(x, h)},

where 1l{·} is the indicator function. The purpose of the bandwidth sequence h(n) is to select those covariates which are close enough to x. Given N_n(x, h) = p ≥ 1, we let, for i = 1, ..., p, Z_i = Z_i(x, h) be the response variables whose associated covariates W_i = W_i(x, h) belong to the ball B(x, h). Let further Z_{1,p} ≤ · · · ≤ Z_{p,p} be the related order statistics (this way of denoting order statistics shall be used throughout the paper) and set, for j = 1, 2,

M_n^{(j)}(x, k_x, h) = (1/k_x) Σ_{i=1}^{k_x} (log Z_{p−i+1,p} − log Z_{p−k_x,p})^j  if k_x ∈ {1, ..., p − 1}, and 0 otherwise.

Given N_n(x, h) = p, the random variable M_n^{(j)}(x, k_x, h) is then computed by using only the response variables whose values are greater than the random threshold Z_{p−k_x,p} and whose associated covariates belong to a (small) neighborhood of x. For j = 1, this statistic is an analogue of Hill's estimator (see Hill [24]) in the presence of a random covariate; see also [15] for a nearest neighbor analogue of this quantity in the fixed design case. Our estimator, in the spirit of [11], is then

γ_n(x, k_x, h) = M_n^{(1)}(x, k_x, h) + 1 − (1/2) [1 − (M_n^{(1)}(x, k_x, h))² / M_n^{(2)}(x, k_x, h)]^{−1}.

The assumption that Y is a positive random variable makes the quantities M_n^{(j)}(x, k_x, h) well-defined for every k_x. This simplifies somewhat a couple of technical results (see for instance Lemma 3). We point out that since we shall only compute our estimator using upper order statistics of the Z_i, this hypothesis may be replaced by the assumption U(∞|x) > 0 for every x ∈ E, at the price of extra regularity conditions on the joint cumulative distribution function F of the pair (X, Y).
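For a scalar covariate, the construction above can be sketched in a few lines of Python. This is a hypothetical illustration (function name and data layout are ours, not the authors' code): select the responses whose covariates fall in B(x, h), form the top-k_x log-excesses, and combine their first two empirical moments as in the moment estimator of Dekkers et al.

```python
import math

def moment_estimator(pairs, x, h, k):
    """Moment-type estimate of the conditional extreme-value index at x,
    from (covariate, response) pairs with scalar covariates."""
    # Responses whose covariates fall in the closed ball B(x, h), in increasing order
    z = sorted(y for (xi, y) in pairs if abs(xi - x) <= h)
    p = len(z)
    if not 1 <= k < p:
        raise ValueError("need 1 <= k < number of observations in the ball")
    log_thresh = math.log(z[p - k - 1])          # log of the random threshold Z_{p-k,p}
    excesses = [math.log(zi) - log_thresh for zi in z[p - k:]]
    m1 = sum(excesses) / k                       # Hill-type statistic M^(1)
    m2 = sum(e * e for e in excesses) / k        # second moment M^(2)
    return m1 + 1.0 - 0.5 / (1.0 - m1 * m1 / m2)
```

On simulated data whose conditional distribution is exactly Pareto with index γ, the estimate recovers γ up to sampling noise of order (1 + γ²)^{1/2}/√k.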

Weak consistency
We first wish to state the pointwise weak consistency of our estimator. To this end we let, for x ∈ E, n_x = n_x(n, h) = nP(X ∈ B(x, h)) be the average total number of points in the ball B(x, h) and we assume that n_x(n, h) > 0 for every n. Let k_x = k_x(n) be a sequence of positive integers; furthermore, let F_h(·|x) be the conditional cdf of Y given X ∈ B(x, h):

F_h(y|x) = P(Y ≤ y | X ∈ B(x, h)),

and let U_h(·|x) be the left-continuous inverse of 1/F̄_h(·|x), where F̄_h(·|x) = 1 − F_h(·|x). For u, v ∈ (1, ∞) such that u < v, we introduce the quantity

sup_{z ∈ [u,v]} |log U_h(z|x) − log U(z|x)|,

which measures the discrepancy between U_h(·|x) and U(·|x) on [u, v]. Recall the notation a ∧ b = min(a, b) and a ∨ b = max(a, b) for a, b ∈ R. Our consistency result is then:

Theorem 1. Assume that (M_1) holds, that n_x → ∞, k_x → ∞, k_x/n_x → 0 and that, for some δ > 0, the discrepancy condition (1) holds. Then γ_n(x, k_x, h) → γ(x) in probability as n → ∞.

Theorem 1 is the conditional analogue of the consistency result proven in [11]; see also [21], Theorem 3.5.2. As far as the hypotheses of Theorem 1 are concerned, note that conditions n_x → ∞, k_x → ∞ and k_x/n_x → 0 are standard hypotheses for the estimation of the conditional extreme-value index: they are the exact analogues of the conditions n → ∞, k = k(n) → ∞ and k/n → 0 needed to ensure the convergence of Hill's estimator. Moreover, condition n_x → ∞ is necessary to make sure that there are sufficiently many observations close to x, which is a standard assumption in the random covariate case.
Condition (1) is somewhat harder to grasp. To analyze this hypothesis further, we introduce the following conditions:

(E) for every x ∈ E, U(·|x) is a continuous increasing function on (1, ∞);

(A_1) for every y ∈ R, the function F(y|·) is continuous on E.
We may now state the following result, which relates the behavior of the function log U_h(z|·) around x to that of log U(z|·):

Proposition 1. Assume that conditions (E) and (A_1) hold and that ∀x′ ∈ B(x, h), ∀r > 0, P(X ∈ B(x′, r)) > 0. Then for every x ∈ E and for every z > 1, it holds that

inf_{x′ ∈ B(x,h)} log U(z|x′) ≤ log U_h(z|x) ≤ sup_{x′ ∈ B(x,h)} log U(z|x′).

Note that if E = R^d with X having a probability density function f on this space and with x being such that f(x) > 0 and f is continuous at x, the condition ∀x′ ∈ B(x, h), ∀r > 0, P(X ∈ B(x′, r)) > 0 appearing in Proposition 1 is satisfied when n is large enough.
With this result at hand, we define for u, v ∈ (1, ∞) such that u < v the uniform oscillation of log U in its second variable over the ball B(x, h):

sup_{z ∈ [u,v]} sup_{x′ ∈ B(x,h)} |log U(z|x′) − log U(z|x)|.

Proposition 1 entails that

sup_{z ∈ [u,v]} |log U_h(z|x) − log U(z|x)| ≤ sup_{z ∈ [u,v]} sup_{x′ ∈ B(x,h)} |log U(z|x′) − log U(z|x)|.
Consequently, if conditions (E) and (A_1) are satisfied, a sufficient condition for (1) to hold is the analogous condition (2), obtained by replacing the discrepancy between log U_h(·|x) and log U(·|x) with the quantity sup_{x′ ∈ B(x,h)} |log U(z|x′) − log U(z|x)|; this is a hypothesis on the uniform oscillation of log U in its second variable.
To understand more about condition (2), we introduce an additional regularity assumption:

(A_2) The function γ is a continuous function on E.
If we omit the case γ(x) = 0 of the Gumbel MDA, then under (A_2), condition (2) can be made more explicit.

• If γ(x) > 0, namely F̄(·|x) belongs to the Fréchet MDA, then Lemma 1.2.9 in [21] entails that a(·|x)/U(·|x) converges to γ(x) at infinity, and condition (2) reduces to a condition (3) on the behavior of U(·|x′) for x′ ∈ B(x, h), which we now make explicit. Since γ is continuous, one has γ(x′) > 0 for x′ close enough to x. Corollary 1.2.10 in [21] then yields, for n large enough and every x′ ∈ B(x, h), that

U(z|x′) = z^{γ(x′)} L(z|x′),

where for every x′ ∈ B(x, h), L(·|x′) is a slowly varying function at infinity. Letting

L(z|x′) = c(z|x′) exp( ∫_1^z ∆(u|x′)/u du )    (4)

be Karamata's representation of L(·|x′) (see Theorem 1.3.1 in Bingham et al. [4]), where c(·|x′) is a positive Borel measurable function converging to a positive constant at infinity and ∆(·|x′) is a Borel measurable function converging to 0 at infinity, condition (3) is thus a consequence of the convergences

log n_x · sup_{x′ ∈ B(x,h)} |γ(x′) − γ(x)| → 0,  sup_{x′ ∈ B(x,h)} sup_{z≥1} |log c(z|x′) − log c(z|x)| → 0  and  log n_x · sup_{x′ ∈ B(x,h)} sup_{z≥1} |∆(z|x′) − ∆(z|x)| → 0.

Besides, if γ, log c and ∆ satisfy some sort of Hölder condition, for instance

sup_{x′ ∈ B(x,h)} |γ(x′) − γ(x)| = O(h^α),    (5)

sup_{x′ ∈ B(x,h)} sup_{z≥1} |log c(z|x′) − log c(z|x)| = O(h^α)    (6)

and sup_{x′ ∈ B(x,h)} sup_{z≥1} |∆(z|x′) − ∆(z|x)| = O(h^α)    (7)

for some α ∈ (0, 1] as n → ∞, then condition (3) becomes h^α log n_x → 0 as n → ∞. The regularity conditions above are fairly standard when estimating the conditional extreme-value index in the Fréchet MDA, see for instance [7].
• If γ(x) < 0, namely F̄(·|x) belongs to the Weibull MDA, then the conditional right endpoint U(∞|x) is finite. Furthermore, since one has γ(x′) < 0 for x′ close enough to x, Corollary 1.2.10 in [21] yields for n large enough and every x′ ∈ B(x, h) that

U(∞|x′) − U(z|x′) = z^{γ(x′)} L(z|x′),

where for every x′ ∈ B(x, h), L(·|x′) is a slowly varying function at infinity. Especially, U(z|x′) → U(∞|x′) as z → ∞. Consequently, in this framework, condition (2) becomes a condition (8) on the oscillation of log U(∞|·) and of the remainder term above. Write then, for an arbitrary z > 1 and for x′ ∈ B(x, h),

log U(z|x′) − log U(z|x) = [log U(∞|x′) − log U(∞|x)] + [log(U(z|x′)/U(∞|x′)) − log(U(z|x)/U(∞|x))].    (9)

The first term on the right-hand side in (9) is readily controlled if the conditional right endpoint x ↦ U(∞|x) is a positive Hölder continuous function on E:

sup_{x′ ∈ B(x,h)} |log U(∞|x′) − log U(∞|x)| = O(h^β),    (10)

say, with β ∈ (0, 1]. The second one can be bounded from above as follows: since n_x/k_x → ∞ and z^{γ(x)} L(z|x) → 0 as z → ∞ (see Proposition 1.5.1 in [4]), we can write for n large enough

U(z|x)/U(∞|x) ≥ 1/2    (11)

for every z ≥ n_x/[(1 + δ)k_x]. Note now that for every z ∈ K_{x,δ}, the range of values of z involved in condition (2), we have z ≥ n_x/[(1 + δ)k_x]. Using Karamata's representation of L(·|x′) (see (4)) and assuming that for some α ∈ (0, 1]

sup_{x′ ∈ B(x,h)} |γ(x′) − γ(x)| = O(h^α),    (12)

sup_{x′ ∈ B(x,h)} sup_{z≥1} |log c(z|x′) − log c(z|x)| = O(h^α)    (13)

and sup_{x′ ∈ B(x,h)} sup_{z≥1} |∆(z|x′) − ∆(z|x)| = O(h^α)    (14)

as n → ∞, then using the inequality

|log(1 + u)| ≤ 2|u| for every |u| ≤ 1/2,    (15)

it is readily seen that if h^α log n_x → 0 as n → ∞, we have

sup_{z ∈ K_{x,δ}} sup_{x′ ∈ B(x,h)} |z^{γ(x′)} L(z|x′) − z^{γ(x)} L(z|x)| / (z^{γ(x)} L(z|x)) → 0.    (16)

Note that conditions (12), (13) and (14) are exactly (5), (6) and (7). Equations (10), (11), (16) and inequality (15), together with Potter bounds for the regularly varying function z ↦ z^{γ(x)} L(z|x) (see Theorem 1.5.6 in [4]), now entail a uniform bound (17) on the second term in (9). Finally, using together (9), (10) and (17), we get a uniform bound (18) on the left-hand side of (9) over z ∈ K_{x,δ} and x′ ∈ B(x, h). Equation (18) makes it clear that in this case, condition (8) shall be satisfied provided it holds that h^α log n_x → 0 (which was already required in the Fréchet MDA) and

h^β (n_x/k_x)^{−γ(x)} → 0.

We can conclude that, compared to the case of the Fréchet MDA, there is an additional condition for the pointwise consistency of our estimator to hold in the Weibull MDA. This condition compares the oscillation of the conditional right endpoint to the proportion of order statistics used in the expression of the estimator.
We end this paragraph by noting that Theorem 1 is only a pointwise result. In the case when E = R^d and X has a probability density function f whose support S has a nonempty interior, it may be possible to obtain a uniform consistency result on every compact subset Ω of the interior of S, using for instance a method introduced by Härdle and Marron [23]: since Ω is a compact subset of R^d we may, for all n ∈ N \ {0}, find a finite subset Ω_n of Ω such that

∀x ∈ Ω, ∃χ(x) ∈ Ω_n, ‖x − χ(x)‖ ≤ n^{−η}  and  ∃c > 0, |Ω_n| = O(n^c) as n → ∞,

where |Ω_n| stands for the cardinality of Ω_n and η > 0 is suitably chosen (i.e. large enough). In other words, we may cover for every n the set Ω by a finite number of balls having a common radius which converges to 0 at a polynomial rate; we may also require that the set Ω_n of the centers of these balls is such that the cardinality of Ω_n grows at a polynomial rate. If γ is continuous on S, it is then enough to prove that, for every δ > 0, the maximal estimation error over the grid Ω_n converges to 0 in probability (19) and that the uniform oscillation of the estimator between x and χ(x), over x ∈ Ω, converges to 0 in probability as well (20), as n → ∞. Showing (19) involves finding a uniform bound for the probabilities P(|γ_n(x, k_x, h) − γ(x)| > δ), x ∈ Ω_n, while the proof of (20) relies on a careful study of the oscillation of the random function x ↦ γ_n(x, k_x, h). This is of course a challenging task, which shall be part of future research on this estimator.

Asymptotic normality
To prove a pointwise asymptotic normality result for our estimator, we need to introduce a second-order condition on the function U(·|x):

(M_2) Condition (M_1) holds and for every x ∈ E, there exist a real number ρ(x) ≤ 0 and a function A(·|x) of constant sign converging to 0 at infinity such that the function U(·|x) satisfies, for all z > 0,

lim_{t→∞} [ (U(tz|x) − U(t|x))/a(t|x) − (z^{γ(x)} − 1)/γ(x) ] / A(t|x) = (1/ρ(x)) [ (z^{γ(x)+ρ(x)} − 1)/(γ(x) + ρ(x)) − (z^{γ(x)} − 1)/γ(x) ].

Hypothesis (M_2) is the conditional analogue of the classical second-order condition on U, see for instance Definition 2.3.1 and Corollary 2.3.4 in [21]: the parameter ρ(x) is the so-called second-order parameter of Y given X = x. Note that Theorem 2.3.3 in [21] shows that the function |A(·|x)| is regularly varying at infinity with index ρ(x). Moreover, a second-order condition also holds for the function log U(·|x), with a second-order auxiliary function Q(·|x): this function has ultimately constant sign, converges to 0 at infinity and is such that |Q(·|x)| is regularly varying at infinity with index ρ′(x), for a suitable ρ′(x) ≤ 0; note that Lemma B.3.16 in [21] entails that one can choose Q(·|x) explicitly in terms of A(·|x), γ(x) and ρ(x). Besides, if γ(x) > 0 and ρ(x) = 0, then according to Lemma B.3.16 in [21], the second-order relation for log U(·|x) holds for every Q(·|x) such that A(t|x) = O(Q(t|x)) as t → ∞; especially, we can and will take Q(·|x) = A(·|x) in this case.
We can now state the asymptotic normality of our estimator.
Theorem 2 is the conditional analogue of the asymptotic normality result stated in [11]; see also Theorem 3.5.4 in [21]. In particular, the asymptotic bias and variance of our estimator are similar to those obtained in the univariate setting. Note that in this result, contrary to the asymptotic normality result of [18], we do not condition on the value of N_n(x, h). Besides, condition √k_x Q(n_x/k_x|x) → λ(x) ∈ R as n → ∞ in Theorem 2 is a standard condition needed to control the bias of the estimator. Finally, hypothesis (23) can be replaced by a hypothesis on the uniform relative oscillation of the function log U in its second argument, see Proposition 1, which in turn can be made explicit if suitable regularity conditions are satisfied, see Section 3.1.
To illustrate this last remark, we use Theorem 2 to obtain optimal rates of convergence for our estimator. For the sake of simplicity, we assume that E = R^d, d ≥ 1, is equipped with the standard Euclidean distance and that X has a probability density function f on R^d which is continuous on its support S, assumed to have nonempty interior. If x is a point lying in the interior of S which is such that f(x) > 0, it is straightforward to show that n_x = nh^d V f(x)(1 + o(1)) as n → ∞, where V is the volume of the unit ball of R^d. Choosing k_x of order kh^d V f(x) for a sequence of integers k = k(n), it becomes clear that the hypotheses n_x → ∞, k_x → ∞ and k_x/n_x → 0 as n → ∞ are equivalent to kh^d → ∞ and k/n → 0 as n → ∞. If k and h have respective order n^a and n^{−b}, with a, b > 0, the rate of convergence of the estimator γ_n(x, k_x, h) to γ(x) is then n^{(a−bd)/2}. Under the hypotheses of Theorem 2, provided that (A_1) and (A_2) hold, one can find the optimal values for a and b in the case γ(x) ≠ 0.

• If γ(x) > 0, then under the Hölder conditions (5), (6) and (7), and recalling the bias condition √k_x Q(n_x/k_x|x) → λ(x) ∈ R as n → ∞, the problem is thus to maximize the quantity a − bd under the constraints a ∈ (0, 1) and a − bd ≥ 0, together with the constraints induced by the bias condition and by h^α log n_x → 0. The solution of this problem yields the optimal convergence rate of our estimator in this case.

• If γ(x) < 0, then under the Hölder conditions (10), (12), (13) and (14), we again recall the bias condition √k_x Q(n_x/k_x|x) → λ(x) ∈ R as n → ∞. To make things easier, we shall assume that the conditional right endpoint U(∞|·) is not more regular than γ, or in other words, that β ≤ α. In this case, since γ(x) < 0, the constraints reduce to a ∈ (0, 1) and a − bd ≥ 0, together with the constraints induced by the bias condition and by the additional endpoint condition discussed in Section 3.1. The solution of this problem yields the optimal convergence rate of our estimator in this case.

Simulation study
To have an idea of how our estimator behaves in a finite sample situation, we carried out a simulation study in the case E = [0, 1] ⊂ R, equipped with the standard Euclidean distance, with a covariate X which is uniformly distributed on E. Furthermore, we let γ : E → R be a positive function. We consider three different models for the csf of Y given X = x. In the first one, Y given X = x is Burr type XII distributed, where the parameter τ is chosen to be independent of x and its value is picked in the set {−1.2, −1, −0.8}; note that in this case the csf F̄_1(·|x) belongs to the Fréchet MDA for every x ∈ E, the conditional extreme-value index is γ(x) and the conditional second-order parameter is ρ(x) = τγ(x) (see [3], p.93). In the second model, given X = x, Y/g(x) is a Beta(1/γ(x), 1/γ(x)) random variable, where the frontier function g depends on a constant c > 0 picked in the set {0.1, 0.2, 0.3}: this conditional model is contained in the Weibull MDA with the conditional extreme-value index being −γ(x). In the final model, Y given X = x has a log-normal distribution with parameters µ(x) and σ²(x), which is an example of a conditional distribution belonging to the Gumbel MDA.
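Data from the three conditional models can be simulated as follows. The paper's exact functions γ(·), g(·), µ(·) and σ(·) are not reproduced above, so the choices below (and the Burr parametrization) are illustrative stand-ins, not the authors' settings.

```python
import math
import random

def gamma_fn(x):
    # Hypothetical positive conditional extreme-value index function on [0, 1]
    return 0.3 + 0.2 * math.sin(2.0 * math.pi * x)

def sample_burr(x, tau, u=None):
    """Burr type XII draw (Frechet MDA): EVI gamma_fn(x), second order rho = tau*gamma_fn(x)."""
    u = random.random() if u is None else u
    # Inverse of the csf (1 + y^(-tau/gamma(x)))^(1/tau) at level u, with tau < 0
    return (u ** tau - 1.0) ** (-gamma_fn(x) / tau)

def sample_beta(x, c):
    """g(x) times a Beta(1/gamma(x), 1/gamma(x)) draw (Weibull MDA, EVI -gamma_fn(x))."""
    g = c * (1.0 + x)  # hypothetical frontier function g
    a = 1.0 / gamma_fn(x)
    return g * random.betavariate(a, a)

def sample_lognormal(x):
    """Log-normal draw (Gumbel MDA, EVI 0), with hypothetical mu(x) and sigma(x)."""
    return math.exp(random.gauss(x, 0.5 + 0.25 * x))
```

Each sampler inverts or reuses a standard-library distribution, so the three max-domains of attraction can be explored with the same simulation harness.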
The aim of this simulation study is to estimate the conditional extreme-value index on a grid of points {x 1 , . . . , x M } of [0, 1]. We need to choose two parameters: the bandwidth h and the number of upper order statistics k x . We use a selection procedure that was introduced in [18], which we recall for the sake of completeness.
1) For every bandwidth h_j in a grid {h_1, ..., h_P} of possible values of h, we first make a preliminary choice of k_x at each grid point x_i. Let γ_{i,j}(k) = γ_n(x_i, k, h_j) and let ⌊·⌋ denote the floor function. For every possible value of k, we consider a set E_{i,j,k} of estimates computed on a block of successive values of the number of upper order statistics starting at k. We compute the variance of the set E_{i,j,k} for every possible value of k and we record the number K_{i,j} for which this variance is minimal. We then record the value k_{i,j} such that γ_{i,j}(k_{i,j}) is the median of the set E_{i,j,K_{i,j}}. For the sake of simplicity, the estimate γ_{i,j}(k_{i,j}) will be denoted by γ_{i,j}.

2) We now select the bandwidth h: let q′ be a positive integer such that 2q′ + 1 < P. For each i ∈ {1, ..., M} and j ∈ {q′ + 1, ..., P − q′}, let F_{i,j} = {γ_{i,j′}, j′ ∈ {j − q′, ..., j + q′}} and compute the standard deviation σ_i(j) of F_{i,j}. Our stability criterion is then the average σ(j) of these quantities over the grid {x_1, ..., x_M}. We next record the integer j* such that σ(j*) is the first local minimum of the application j ↦ σ(j) which is less than the average value of σ; if no such local minimum exists, j* is taken as the global minimizer of σ (24), where we extend σ by setting σ(q′) = σ(q′ + 1) and σ(P − q′ + 1) = σ(P − q′).
The selected bandwidth is then independent of x and is given by h* = h_{j*}, where j* is defined in (24). The selected number of upper order statistics is given, for each i ∈ {1, ..., M}, by k*_{x_i} = k_{i,j*}. The main idea of this procedure is that the bandwidth and the number of upper order statistics are selected in order to satisfy a stability criterion. This estimation procedure is carried out on N = 100 independent samples of size n = 500. The conditional extreme-value index is estimated on a grid of M = 50 evenly spaced points in [0, 1]. Regarding the selection procedure, P = 25 values of h ranging from 0.05 to 0.3 are tested; the parameter q′ is set to 1.
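Step 1 of the procedure can be sketched as follows. Since the exact definition of the blocks E_{i,j,k} of [18] is not reproduced above, this is a simplified, hypothetical version of the idea: choose the k at which a block of successive estimates is most stable, and return the median estimate over that block.

```python
import statistics

def select_k(estimates_by_k, block=10):
    """Pick the k whose block of successive estimates {k, ..., k+block-1} has
    minimal variance; return that k and the median estimate over the block."""
    ks = sorted(estimates_by_k)
    best_k, best_var = None, float("inf")
    for i in range(len(ks) - block + 1):
        window = [estimates_by_k[kk] for kk in ks[i:i + block]]
        v = statistics.pvariance(window)
        if v < best_var:
            best_k, best_var = ks[i], v
    i = ks.index(best_k)
    window = [estimates_by_k[kk] for kk in ks[i:i + block]]
    return best_k, statistics.median(window)
```

When the estimates exhibit a plateau in k (the typical bias-variance picture for tail index estimators), the rule locates the start of the plateau and reads the estimate off there.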
To have an idea of how our estimator behaves compared to another estimator in the conditional extreme-value index estimation literature, we introduce the estimator γ_D = γ_n^{RP,1} of [8]. Let K be the triweight kernel:

K(u) = (35/32)(1 − u²)³ 1l{|u| ≤ 1}.

Let F̄_n(·, h|x) be the empirical kernel estimator of the csf:

F̄_n(y, h|x) = [ Σ_{i=1}^n 1l{Y_i > y} K((x − X_i)/h) ] / [ Σ_{i=1}^n K((x − X_i)/h) ],

and let q_n(·, h|x) be the generalized inverse of F̄_n(·, h|x): for α ∈ (0, 1),

q_n(α, h|x) = inf{y > 0 | F̄_n(y, h|x) ≤ α}.

The quantity q_n(·, h|x) is the empirical estimator of the conditional quantile function. The estimator γ_D is then a fixed combination of such conditional quantile estimators evaluated at levels proportional to α_{n,x}, where α_{n,x} → 0 as n → ∞ is a nonrandom sequence. This estimator is exactly the estimator γ_n^{RP,1} of [8] with J = 3 and r = 1/J; it is a kernel version of the Pickands estimator, see Pickands [25]. To choose the parameters α_{n,x} and h for γ_D, we restrict our search to a parameter α_{n,x} having the form k_x/N_n(x_i, h_j), so that we are led to a choice of k_x and h just as for our estimator, and we use the procedure detailed above.
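The ingredients of γ_D can be sketched as follows. The combination below is the classical Pickands one and may differ from the exact γ_n^{RP,1} combination of [8]; treat it as an assumption-laden illustration of a kernel conditional quantile estimator plugged into a Pickands-type ratio.

```python
import math

def triweight(u):
    """Triweight kernel K(u) = (35/32)(1 - u^2)^3 on [-1, 1]."""
    return (35.0 / 32.0) * (1.0 - u * u) ** 3 if abs(u) <= 1.0 else 0.0

def kernel_csf(y, x, h, pairs):
    """Kernel estimate of the conditional survival function P(Y > y | X = x)."""
    num = sum(triweight((x - xi) / h) for (xi, yi) in pairs if yi > y)
    den = sum(triweight((x - xi) / h) for (xi, _) in pairs)
    return num / den

def cond_quantile(alpha, x, h, pairs):
    """Generalized inverse of the kernel csf at level alpha."""
    for y in sorted({yi for (_, yi) in pairs}):
        if kernel_csf(y, x, h, pairs) <= alpha:
            return y
    return max(yi for (_, yi) in pairs)

def pickands_type(x, h, alpha, pairs):
    """Pickands-type combination of three conditional quantile estimates (a sketch)."""
    q1 = cond_quantile(alpha, x, h, pairs)
    q2 = cond_quantile(2.0 * alpha, x, h, pairs)
    q4 = cond_quantile(4.0 * alpha, x, h, pairs)
    return math.log((q1 - q2) / (q2 - q4)) / math.log(2.0)
```

The Pickands ratio only uses three quantiles, which explains the larger oscillations of such estimators compared to moment-type ones.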
We give in Table 1 the empirical mean squared errors (MSEs) of each estimator, averaged over the M points of the grid. Table 1 shows that our estimator outperforms the estimator γ D in terms of MSEs in every case except the Gumbel one. Besides, one can see that in the Fréchet MDA, the MSEs of both estimators increase as |ρ(x)| gets closer to 0, which was expected since ρ(x) controls the rate of convergence in (M 2 ): the closer |ρ(x)| is to 0, the slower is this convergence and the harder is the estimation. Some illustrations are given in Figures 1-3, where the estimations corresponding to the median of the MSE are represented in each case for both estimators. One can see on these pictures that our estimator generally oscillates less than γ D ; in the case when the conditional survival function belongs to the Fréchet or Weibull MDA, it also does a better job of mimicking the shape of the conditional extreme-value index.

Real data example
In this section, we introduce a real fire insurance data set, provided by the reinsurance broker Aon Re Belgium. The data set consists of n = 1823 observations (S_i, C_i), where C_i is the claim size related to the i-th fire accident and S_i is the associated total sum insured. It was, among others, considered by Beirlant and Goegebeur [2] and Beirlant et al. [1]; see also [3]. Our variable of interest is the ratio claim size/sum insured: in other words, we focus on the random variables Y_i = C_i/S_i. The covariate we consider is the total sum insured, which we can also consider as random; specifically, we let X_i = log S_i. A scatterplot of the data is given in Figure 4.
In Section 7.6 of [3], the authors show that the distribution of the Y_i given log S_i can be approximated rather well by a generalized Pareto (GP) distribution. Our goal is then to provide an estimate of the conditional extreme-value index of the Y_i using our estimator. To this end, we use the selection procedure detailed in Section 4: the bandwidth h is selected among h_1 ≤ · · · ≤ h_25, where the h_i are evenly spaced with h_1 = 0.05(X_{n,n} − X_{1,n}) and h_25 = 0.3(X_{n,n} − X_{1,n}), X_{1,n} ≤ · · · ≤ X_{n,n} being the order statistics deduced from the X_i. This leads us to choose h* ≈ 1.35; a boxplot of the proportions k*_{x_i}/N_n(x_i, h*) of order statistics used to compute the estimator is given in Figure 5. This allows us to give an estimate of the conditional extreme-value index, see Figure 6.
The first conclusion we draw from this study is that γ(x) > 0 for all x. This is somewhat surprising since one could expect the random variables Y i to be bounded from above by 1. However, the GP fits discussed in [3] and the estimations carried out in [2] and [1] also lead to the same conclusion. One can think that in this case, modelling the distribution of the Y i given log S i by a distribution belonging to the Fréchet domain of attraction is accurate in the "intermediate-upper" tail, namely, not too far into the upper tail of the distribution; an element backing this intuition is the exponential quantile plot given in Figure 7.17 of [3].
A second piece of information is given by the shape of the estimated conditional extreme-value index. One can see that the estimator returns values that are greater than 1.1 for every considered value of the covariate. The study in [2], which considered the random variables C_i as variables of interest and split the random sample into three subgroups according to the type of building insured (which is an additional covariate information that we do not consider in this paper), provides estimations ranging from 1.027 to 1.413, while [1], which did not consider any covariate information at all, gives the estimate γ = 1. Our estimate can therefore be considered as a somewhat conservative one, especially when log S_i ≥ 19.5. Note that in this particular range, there are only very few (if any) high values of Y_i in the sample, which may be the cause for this phenomenon.
All in all, we can conclude that this study confirms previous findings about this data set, although the proposed estimator may at times give fairly conservative results. A possible direction for future research on this estimator is therefore to correct this behavior. One should keep in mind though that the essential advantage of the estimator studied in this paper is the fact that it works in every domain of attraction, making it superior to most others in this respect.

Weak consistency
We start by proving the pointwise weak consistency of our estimator at a point x lying in E. To this end, since the M_n^{(j)}(x, k_x, h) are defined conditionally on the value of the total number N_n(x, h) of covariates belonging to B(x, h), which is random, a natural idea is to condition on this value. A preliminary classical lemma is then required to control this random variable.

Lemma 1.
If n_x → ∞ as n → ∞, then for every δ > 0,

P(|N_n(x, h)/n_x − 1| > δ) → 0 as n → ∞.

From Lemma 1, we deduce that if I_x = I_x(δ) denotes the set of integers contained in [(1 − δ)n_x, (1 + δ)n_x], it holds that N_n(x, h) lies in I_x with arbitrarily large probability as n → ∞; in other words, P(N_n(x, h) ∈ I_x) → 1 for every δ ∈ (0, 1). Furthermore, since k_x/n_x → 0 as n → ∞, we may and will, in the sequel, take n so large that k_x < inf I_x.
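Since N_n(x, h) is a binomial random variable with n trials and success probability P(X ∈ B(x, h)), the concentration stated in Lemma 1 can be checked by simulation; the helper below is a hypothetical illustration, not part of the proofs.

```python
import random

def deviation_probability(n, p_ball, delta, reps=200, seed=42):
    """Monte Carlo estimate of P(|N_n(x,h)/n_x - 1| > delta), where N_n(x,h) is
    Binomial(n, P(X in B(x,h))) and n_x = n * P(X in B(x,h)), as in Lemma 1."""
    rng = random.Random(seed)
    n_x = n * p_ball
    outside = 0
    for _ in range(reps):
        count = sum(1 for _ in range(n) if rng.random() < p_ball)
        if abs(count / n_x - 1.0) > delta:
            outside += 1
    return outside / reps
```

With n = 5000 and p_ball = 0.1 (so n_x = 500), a relative deviation of δ = 0.5 corresponds to roughly twelve binomial standard deviations, and the estimated probability is essentially zero.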
The next step is to show that when n is large, studying the convergence in probability of the quantities M_n^{(j)}(x, k_x, h) essentially amounts to studying analogous quantities built on independent and identically distributed random variables having cdf F_h(·|x). To achieve that we begin by stating a lemma which gives the conditional distribution of the random variables Z_i.

Lemma 2. Given N_n(x, h) = p ≥ 1, the random variables Z_i, 1 ≤ i ≤ p, are independent and identically distributed random variables having cdf F_h(·|x).
Letting T_i, i ≥ 1, be independent standard Pareto random variables, i.e. having cdf t ↦ 1 − 1/t on (1, ∞), we deduce from this result that the distribution of the random vector (Z_1, ..., Z_p) given N_n(x, h) = p ≥ 1 is the distribution of the random vector (U_h(T_1|x), ..., U_h(T_p|x)). In other words, since U_h(·|x) is nondecreasing, we may focus on the behavior in probability of the quantities

M_{np}^{(j)}(x, k_x, h) = (1/k_x) Σ_{i=1}^{k_x} (log U_h(T_{p−i+1,p}|x) − log U_h(T_{p−k_x,p}|x))^j

for p > k_x and j = 1, 2. Lemma 3 below is the desired approximation of these statistics.

The next lemmas are technical results. The first one is a simple result we shall repeatedly make use of: if a pair of random sequences converges in probability to a deterministic limit (ℓ, ℓ′), then for every Borel measurable function h : R² → R which is continuous at (ℓ, ℓ′), the image of the pair by h converges in probability to h(ℓ, ℓ′). The following result is the main technical tool we shall use to prove our asymptotic results. It is basically a conditional analogue of the additive version of Slutsky's lemma: it involves a sequence of events (A_{np}) such that Σ_{p∈I_n} P(A_{np}) → 1 as n → ∞, together with two triangular arrays of random vectors (D_{np}) and (R_{np}) such that

• for 1 ≤ p ≤ n, the distribution of S_n given A_{np} is the distribution of D_{np} + R_{np};
• it holds that for every t = (t_1, ..., t_r) ∈ R^r, the characteristic functions E exp(i t′ D_{np}) converge uniformly in p ∈ I_n, where t′ is the transpose vector of t;
• it holds that for every t > 0 and every j ∈ {1, ..., r}, sup_{p∈I_n} P(|R_{np}^{(j)}| > t) → 0 as n → ∞.
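The reduction above — conditionally on the number of points in the ball, the responses behave like U_h(T_i|x) for independent standard Pareto variables T_i — can be illustrated on a toy tail function; the power function U(z) = z^γ below is our assumption, chosen only because it yields an exactly Pareto response.

```python
import random

# Standard Pareto variables T = 1/(1 - V), V uniform on [0, 1): cdf 1 - 1/t on (1, oo)
rng = random.Random(1)
gamma = 0.5
# Toy tail function U(z) = z^gamma, so Y = U(T) satisfies P(Y > y) = y^(-1/gamma)
ys = [(1.0 / (1.0 - rng.random())) ** gamma for _ in range(20000)]
# Empirical P(Y > 2) should be close to 2^(-1/gamma) = 1/4
tail = sum(1 for y in ys if y > 2.0) / len(ys)
```

This is exactly the quantile-transform device used in the proofs: any nondecreasing tail function applied to standard Pareto variables reproduces the target conditional distribution.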
The next lemma, which specifies the asymptotic behavior in probability of the order statistic T_{p−k_x,p} uniformly in p ∈ I_x, shall be used several times.

Lemma 6. Assume that k_x → ∞ and k_x/n_x → 0 as n → ∞. Then for every t > 0,

sup_{p∈I_x} P( |(k_x/p) T_{p−k_x,p} − 1| > t ) → 0 as n → ∞.

Lemma 7.
For all Borel measurable functions f and g, every p ≥ 2 and every k ∈ {1, ..., p − 1}, the random vectors

( f(T_{p−k,p}), g(T_{p−i+1,p}/T_{p−k,p}), 1 ≤ i ≤ k )  and  ( f(T_{p−k,p}), g(T̃_{k−i+1,k}), 1 ≤ i ≤ k ),

where T̃_1, ..., T̃_k are independent standard Pareto random variables independent of T_{p−k,p}, have the same distribution.
The final lemma shows that the asymptotic behavior of the random variables M_{np}^{(j)}(x, k_x, h) is in some way uniform in p ∈ I_x. Before stating this result, we note that, applying Theorem B.2.18 in [21], condition (M_1) entails that there exists a positive function q_0(·|x), equivalent to a(·|x)/U(·|x) at infinity, such that the following property holds: for each ε > 0, there exists t_0 ≥ 1 such that for every t ≥ t_0 and z > 0 with tz ≥ t_0,

| (log U(tz|x) − log U(t|x))/q_0(t|x) − (z^{γ_−(x)} − 1)/γ_−(x) | ≤ ε max(z^{γ_−(x)+ε}, z^{γ_−(x)−ε}),

where γ_−(x) = γ(x) ∧ 0.

Lemma 8. Assume that (M_1) holds, and n_x → ∞, k_x → ∞ and k_x/n_x → 0 as n → ∞. Then for every t > 0, the convergences

sup_{p∈I_x} P( |M_{np}^{(1)}(x, k_x, h)/q_0(p/k_x|x) − 1/(1 − γ_−(x))| > t ) → 0

and

sup_{p∈I_x} P( |M_{np}^{(2)}(x, k_x, h)/q_0²(p/k_x|x) − 2/((1 − γ_−(x))(1 − 2γ_−(x)))| > t ) → 0

hold as n → ∞.
We are now in a position to examine the convergence in probability of the statistics M_n^{(j)}(x, k_x, h), of which the consistency of our estimator is a simple corollary.
Proposition 2. Assume that (M_1) holds, that n_x → ∞, k_x → ∞, k_x/n_x → 0 and that condition (1) holds for some δ > 0. Then it holds that

M_n^{(1)}(x, k_x, h) / (a(n_x/k_x|x)/U(n_x/k_x|x)) → 1/(1 − γ_−(x))  and  M_n^{(2)}(x, k_x, h) / (a(n_x/k_x|x)/U(n_x/k_x|x))² → 2/((1 − γ_−(x))(1 − 2γ_−(x)))

in probability as n → ∞. This result is the analogue of Lemma 3.5.1 in [21] when there is a covariate: of course, a major difference here is that the total number of observations N_n(x, h) is random.
Proof of Proposition 2. We start the proof by remarking that, with the notation of (25), applying Lemma 1 yields

Pick then an arbitrary t > 0 and introduce the two events

From (26), it is enough to prove that P(A^{(1)}_n) → 0 and P(A^{(2)}_n) → 0 as n → ∞. We start by controlling P(A^{(1)}_n). Note that according to Lemma 2, one has

Moreover, Lemma 3 entails
Introducing, for an arbitrary t′ > 0,

Lemmas 4 and 5 with A_np = {N_n(x, h) = p} make it enough to prove that u^{(1,j)}_{np} → 0 as n → ∞, uniformly in the integers p ∈ I_x, for every j ∈ {1, 2}.
To control u^{(1,2)}_{np}, we recall that the function q_0(·|x) is regularly varying at infinity with index γ_−(x), so that we can apply a uniform convergence result (see e.g. Theorem 1.5.2 in [4]). In particular, for n large enough, recalling that q_0(·|x) and a(·|x)/U(·|x) are equivalent at infinity, we have

This inequality gives, for n sufficiently large,

Using condition (1), we get for n large enough

Because the random variables T_i are independent standard Pareto random variables, one has for n sufficiently large

and the right-hand side of the last inequality converges to 0 as n → ∞, by Lemma 6. Collecting (27) and (29) shows that P(A^{(1)}_n) → 0 as n → ∞. Let us now consider P(A^{(2)}_n). Applying Lemma 2, one has

Lemma 3 yields
Letting, for an arbitrary t′ > 0,

Lemmas 4 and 5 with A_np = {N_n(x, h) = p} make it enough to prove that u^{(2,j)}_{np} → 0 as n → ∞, uniformly in the integers p ∈ I_x, for every j ∈ {1, 2, 3}. We start by noting that Lemma 8 leads to sup_{p∈I_x} u^{(2,1)}_{np} → 0; since the term u^{(2,2)}_{np} is similar to u^{(1,2)}_{np}, we obtain from (29) that

Finally, the obvious inequality

settles the case j = 3.

Proof of Theorem 1. Using Lemma 1.2.9 in [21] yields a(t|x)/U(t|x) → γ_+(x) as t → ∞. Applying Proposition 2, we get

The result then follows from summing these two convergences.
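The two convergences combined in Theorem 1 correspond to the two components γ̂_{n,+} and γ̂_{n,−} of the estimator. In the unconditional case this construction is the classical moment estimator of Dekkers, Einmahl and de Haan, γ̂ = M^{(1)} + 1 − (1/2)(1 − (M^{(1)})²/M^{(2)})^{−1}; the sketch below (our code and names, under the assumption of exact Pareto data) illustrates its consistency in the Fréchet domain:

```python
import numpy as np

def moment_estimator(z, k):
    """Classical (unconditional) moment estimator of the extreme-value
    index gamma, written as gamma_plus + gamma_minus, where
    gamma_plus = M^(1) and gamma_minus = 1 - (1/2) / (1 - M1**2 / M2)."""
    z = np.sort(z)
    log_excess = np.log(z[-k:]) - np.log(z[-k - 1])
    m1 = np.mean(log_excess)
    m2 = np.mean(log_excess ** 2)
    gamma_plus = m1
    gamma_minus = 1.0 - 0.5 / (1.0 - m1 ** 2 / m2)
    return gamma_plus + gamma_minus

# Frechet MDA sample with extreme-value index gamma = 0.5.
rng = np.random.default_rng(2)
z = (1.0 / (1.0 - rng.uniform(size=50_000))) ** 0.5
gamma_hat = moment_estimator(z, k=1_000)
```

For Fréchet-domain data γ̂_{n,−} → 0 and γ̂_{n,+} → γ; the point of the moment construction is that it remains consistent in the Weibull and Gumbel max-domains of attraction as well.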
We conclude this section by proving Proposition 1. To this end, we state a couple of preliminary results. The first of them links the behavior of a function of the form 1/F, where F is a csf on R, to that of its left-continuous inverse.
Lemma 9. Let F be a csf on R and U be the left-continuous inverse of 1/F .

If U is an increasing function on (1, ∞) then the function F is continuous on R and
The second lemma examines the properties of the csf F_h(·|x).

As a consequence
We may now show Proposition 1.
Proof of Proposition 1. We introduce the functions F_min(·|x) and F_max(·|x) defined by, for every y ∈ R, F_min(y|x) = inf

imsart-ejs ver. 2013/03/06 file: Hillmom_EJS_revised.tex date: September 5, 2013

With this definition, we get

Applying Lemma 9, we obtain

which, recalling (33), clearly entails

Likewise, inequality (33) and Lemma 10 therefore entail

Applying Lemma 10 once again leads to the inequality

From (34) and (35) we deduce that

because the logarithm function is increasing. This yields

The obvious inequality, together with the corresponding bound for −inf, gives (38). Collecting (36), (37) and (38) concludes the proof.

Asymptotic normality
We proceed by proving the pointwise asymptotic normality of the estimator at a point x ∈ E when condition (M_2) holds. We shall use the same ideas as in the proof of Proposition 2 to examine the asymptotic behavior of the statistics M^{(j)}_{np}(x, k_x, h). If ρ(x) < 0 and γ(x) > 0, then from (21) and Theorem 2.3.6 in [21] there exist functions q_0(·|x) and Q_0(·|x), equivalent to q(·|x) and

respectively at infinity, such that for every ε > 0 there exists t_0 ≥ 1 such that for every t ≥ t_0 and z > 0 with tz ≥ t_0,

If now γ(x) > 0 and ρ(x) = 0, recalling the equality q(·|x) = a(·|x)/U(·|x), we get from Lemma B.3.16 in [21] that

Equation (22) thus yields

We may now apply Theorem B.2.18 in [21] to obtain that for every ε > 0 there exists t_0 ≥ 1 such that for every t ≥ t_0 and z > 0 with tz ≥ t_0,

Using (41) and (42) together with the fact that the function z ↦ (z^ε ∨ z^{−ε})^{−1} log z is bounded on (0, ∞), we get that for every ε > 0 there exists t_0 ≥ 1 (possibly different) such that for every t ≥ t_0 and z > 0 with tz ≥ t_0,

The following result, Lemma 11, is the analogue of Lemma 3.5.5 in [21] when there is a random covariate; let V(γ(x)) be the matrix

• If γ(x) > 0 and ρ(x) < 0, it holds that the distribution of the random vector is the distribution of a random vector (D^{(1)}_{np} + R^{(1)}_{np}, D^{(2)}_{np} + R^{(2)}_{np}), where (P_1, P_2) is a Gaussian random vector having mean (m^{(1)}(x), m^{(2)}(x)), with

and covariance matrix V(γ(x)); the triangular arrays of random variables (R^{(1)}_{ij})_{1≤j≤i} and (R^{(2)}_{ij})_{1≤j≤i} are such that for every t > 0 and j ∈ {1, 2}, sup_{p∈I_x} P(|R^{(j)}_{np}| > t) → 0 as n → ∞.
• If γ(x) > 0 and ρ(x) = 0, it holds that the distribution of the random vector is the distribution of a random vector (D^{(1)}_{np} + R^{(1)}_{np}, D^{(2)}_{np} + R^{(2)}_{np}), where (P_1, P_2) is a centered Gaussian random vector having covariance matrix V(γ(x)); the triangular arrays of random variables (R^{(1)}_{ij})_{1≤j≤i} and (R^{(2)}_{ij})_{1≤j≤i} are such that for every t > 0 and j ∈ {1, 2}, sup_{p∈I_x} P(|R^{(j)}_{np}| > t) → 0 as n → ∞.

This result paves the way for the proof of Theorem 2.
Proof of Theorem 2. According to Lemma 2, the distribution of the random pair (γ̂_{n,+}(x, k_x, h), γ̂_{n,−}(x, k_x, h)) given N_n(x, h) = p is that of

Arguing along the first lines of the proof of Theorem 3.5.4 in [21] and applying Lemmas 4 and 5 with A_np = {N_n(x, h) = p}, together with Lemma 11 and the continuous mapping theorem, we then get that, as n → ∞,

where (P_1, P_2) is the limit vector in Lemma 11. The result thus follows from Lemma 11 and some straightforward but lengthy computations.
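The Gaussian limit in Theorem 2 is of the same nature as the classical central limit theorem for the moment statistics. As a hedged illustration in the unconditional case (our code): for exact Pareto data with index γ, √k(M^{(1)} − γ) is asymptotically N(0, γ²), which the following simulation checks empirically:

```python
import numpy as np

rng = np.random.default_rng(3)
gamma, n, k, n_rep = 0.5, 20_000, 500, 200

stats = []
for _ in range(n_rep):
    z = np.sort((1.0 / (1.0 - rng.uniform(size=n))) ** gamma)
    # Hill estimator M^(1) over the k largest observations.
    m1 = np.mean(np.log(z[-k:]) - np.log(z[-k - 1]))
    stats.append(np.sqrt(k) * (m1 - gamma))
stats = np.array(stats)

# Empirical mean near 0 and standard deviation near gamma = 0.5,
# in line with the N(0, gamma^2) limit for exact Pareto data.
emp_mean, emp_sd = stats.mean(), stats.std()
```

In the paper, the effective sample size N_n(x, h) is random, which is precisely what the conditioning arguments via Lemmas 4, 5 and 11 are designed to handle.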

Proofs of the auxiliary results
Proof of Lemma 1. The result is a straightforward consequence of the fact that N_n(x, h) is a binomial random variable with parameters n and P(X ∈ B(x, h)), and of Chebyshev's inequality.
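In other words, N_n(x, h) counts the covariates falling in the ball B(x, h), and Chebyshev's inequality shows that N_n(x, h)/(n P(X ∈ B(x, h))) → 1 in probability as soon as n P(X ∈ B(x, h)) → ∞. A small numerical sketch of this concentration (illustrative values, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(4)
n, prob = 100_000, 0.01  # prob plays the role of P(X in B(x, h))

# N_n(x, h) is Binomial(n, prob): each of the n covariates independently
# falls in the ball B(x, h) with probability prob.
N = rng.binomial(n, prob)

# Chebyshev: P(|N/(n*prob) - 1| > t) <= (1 - prob)/(n * prob * t**2),
# which vanishes whenever n * prob -> infinity.
ratio = N / (n * prob)
```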
Proof of Lemma 2. If (z_1, . . . , z_p) ∈ (0, 1)^p then, since the random pairs (X_i, Y_i) have the same distribution, it holds that

Using the definition of F_h(·|x) and the independence of the random pairs (X_i, Y_i), i = 1, . . . , p, leads to

The identity makes it clear that given N_n(x, h) = p, the Z_i, i = 1, . . . , p, are independent and identically distributed random variables having cdf F_h(·|x), from which the result follows.
Proof of Lemma 3. We start by writing the obvious inequality

valid for every i ∈ {1, . . . , k_x + 1}. The first part of the result is thus a straightforward consequence of (43) and the triangle inequality. To prove the second part, note that according to (43), for every i = 1, . . . , k_x,

The result on M^{(2)}_{np}(x, k_x, h) then follows from the triangle inequality and from summing the above inequalities for i = 1, . . . , k_x.
Proof of Lemma 4. Since h is continuous at (ℓ, ℓ ′ ), we can write In other words, one has and the right-hand side converges to 0 as n → ∞, which completes the proof.
Proof of Lemma 5. Start by writing, for every t = (t_1, . . . , t_r) ≠ (0, . . . , 0),

Pick an arbitrary δ > 0: for n large enough, the triangle inequality yields

We now bound the term on the right-hand side of this inequality as

The second term of the above inequality is controlled using the hypothesis on the array (D_ij): we have for n sufficiently large

Besides, using once again the triangle inequality entails, if ‖t‖_∞ = max_{1≤j≤r} |t_j|,

Applying (48) and (49) then entails for n large enough

Furthermore, Chebyshev's inequality and a comparison with an integral give

as n → ∞. Collecting (50) and (51) yields the first result. The second result is then a simple consequence of the first result and of the inequality

valid for n large enough. The third result is obtained by noting that, since ϕ is regularly varying at infinity, Theorem 1.5.2 in [4] shows that there exists t′ > 0 such that for n large enough

for every p ∈ I_x; the first result then applies to yield

Finally, since sup_{p∈I_x} |p/n_x − 1| → 0 as n → ∞, applying once again Theorem 1.5.2 in [4] gives

Using Lemma 4 completes the proof.
Proof of Lemma 7. If T is a standard Pareto random variable, then log T is a standard exponential random variable. One can thus use the Cramér-Wold device and argue along the lines of the proof of Lemma 3.2.3 in [21].
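The fact used here — if T is standard Pareto, then log T is standard exponential, since P(log T > t) = P(T > e^t) = e^{−t} for t > 0 — is easily checked numerically (a sketch, not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(5)

# Standard Pareto sample: P(T > t) = 1/t for t > 1.
T = 1.0 / (1.0 - rng.uniform(size=100_000))

# log T should then be standard exponential: mean 1 and P(log T > 1) = e^{-1}.
logT = np.log(T)
emp_mean = logT.mean()
emp_tail = (logT > 1.0).mean()
```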
Proof of Lemma 8. We start by proving the first statement. Pick δ, t > 0 and ε ∈ (0, 1) such that 2ε

With the notation of (25), letting B_np = {T_{p−k_x,p} ≤ t_0}, Lemma 6 shows that P(B_np) → 0 uniformly in p ∈ I_x as n → ∞. For every p ∈ I_x, on the complement B^c_np of B_np, one can apply (25) to write (53) and (54), since T_{p−i+1,p}/T_{p−k_x,p} ≥ 1 for every p ∈ I_x and i = 1, . . . , k_x. Using (53) and (54), the probability of the event

is then bounded from above by P(B_np) + P(C^{(1)}_{np}) + P(C^{(2)}_{np}) uniformly in p ∈ I_x for n large enough, where

Apply Lemma 7 to get for every p ∈ I_x

Chebyshev's inequality leads to the inequality P(C^{(1)}_{np}) ≤ δ/4 for n large enough, uniformly in p ∈ I_x. Furthermore, since ε ∈ (0, 1), using (52) together with Markov's inequality yields P(C^{(2)}_{np}) ≤ δ/4 for every p ∈ I_x. Hence, for n large enough, sup_{p∈I_x} P(C_{np}) ≤ δ.
In other words, it holds that for every t > 0

Recall that q_0(·|x) is regularly varying at infinity with index γ_−(x), and apply Lemma 6 to get for every t > 0

Finally, writing

and applying Lemma 4 together with (55) and (56) gives the first part of the result. To obtain the second part, square the inequalities (53) and (54) with ε < 1/2 small enough, use the equality

and use the ideas developed for the proof of the first statement.
To show the second statement, assume that F is not continuous at y_0 ∈ R: in other words, since F is right-continuous and nonincreasing,

It follows that for every z ∈ (1/β_−, 1/β_+), one has U(z) = y_0, which is a contradiction. Finally, note that by the right-continuity of F,

If one had η > 0, then it would hold that

which is a contradiction.
Proof of Lemma 10. Write, for every y ∈ R,
The continuity assertion on F_h(·|x) then follows from Lemma 9 and the dominated convergence theorem. Now pick y ∈ R such that F_h(y|x) ∈ (0, 1), and δ > 0. One has

Assume that F_h(y|x) = F_h(y + δ|x). In this case, since for every x′ ∈ E the function F(·|x′) is nonincreasing, we get

Besides, since F_h(y|x) ∈ (0, 1), there exist measurable sets A_0 and A_1 such that

Since the ball B(x, h) is a connected subset of E, because it is arc-connected, one may therefore apply the intermediate value theorem to the continuous map x′ ↦ F(y|x′) to obtain that there exists x′ ∈ B(x, h) such that F(y|x′) ∈ (0, 1). Using once again the continuity of this map, we deduce that ∃r > 0, ∀x″ ∈ B(x′, r), F(y|x″) ∈ (0, 1).