Bias correction in conditional multivariate extremes

: We consider bias-corrected estimation of the stable tail depen- dence function in the regression context. To this aim, we ﬁrst estimate the bias of a smoothed estimator of the stable tail dependence function, and then we subtract it from the estimator. The weak convergence, as a stochastic process, of the resulting asymptotically unbiased estimator of the conditional stable tail dependence function, correctly normalized, is established under mild assumptions, the covariate argument being ﬁxed. The ﬁnite sample behaviour of our asymptotically unbiased estimator is then illustrated on a simulation study and compared to two alternatives, which are not bias corrected. Finally, our methodology is applied to a dataset of air pollution measurements.


Introduction
Most of the practical problems involving extreme events are inherently multivariate. Consequently, being able to estimate the extremal dependence between random variables is useful. To this aim, we can either use some extremal coefficients, that give a representative picture of the full dependency structure (see, e.g., Ledford and Tawn, 1997), or functions, such as the spectral distribution function or the stable tail dependence function, that provide a full characterization of the extreme dependence between variables. We refer to Beirlant et al. (2004) and de Haan and Ferreira (2006), and the references therein, for more details. In this paper, we will focus on the stable tail dependence function, which can be defined as follows.
For any arbitrary dimension d, let pY p1q , . . . , Y pdq q be a random vector with continuous marginal distribution functions F 1 , . . . , F d . The stable tail dependence function is defined for each y i P R`, i " 1, . . . , d, as lim tÑ8 tP´1´F1pY p1q q ď t´1y1 or . . . or 1´F d pY pdq q ď t´1y d¯" Lpy1, . . . , y d q, (1.1) provided that this limit exists. We refer to Huang (1992), and de Haan and Ferreira (2006) for more details.
Several estimators for L have been proposed in the literature, see, e.g., Huang (1992), Drees and Huang (1998), Fils-Villetard et al. (2008), Bücher et al. (2014), but as usual in the extreme value framework, the classical estimators are affected by bias, which often complicates their practical application. To solve this issue, Fougères et al. (2015) and Beirlant et al. (2016) have introduced bias-corrected estimators and they have established the main properties of their estimators as stochastic processes.
Taking care of the bias is important, but in practical applications, we are also often faced with the presence of covariates in addition to the random vector pY p1q , . . . , Y pdq q. It is thus important to be able to estimate the stable tail dependence function when random covariates X are present, i.e., to consider the regression situation with a multivariate response. In that case, we want to describe the extremal dependence between the variables pY p1q , . . . , Y pdq q given some observed value x for the covariate X P R p . Thus, the notion of conditional stable tail dependence function Lp¨|xq can be introduced and the classical framework (1.1) can be extended into lim tÑ8 tP´1´F1pY p1q |Xq ď t´1y1 or . . . or 1´F d pY pdq |Xq ď t´1y dˇX " x" Lpy1,. . . ,y d |xq,(1.2) where F j p¨|xq, j " 1, . . . , d, denote the continuous conditional distribution function of Y pjq given X " x. To the best of our knowledge, the estimation of the conditional stable tail dependence function has only been studied very recently by Escobar-Bach et al. (2018b), where a local estimator was proposed and its weak convergence as a stochastic process was established. In related work, Gardes and Girard (2015) introduced an estimator for the conditional tail copula and studied its finite dimensional convergence. However, being in the regression context, of course does not solve the bias problem of the estimator of Lp¨|xq. Thus, combining bias-correction and regression will be the subject of this paper. As far as we know, this topic is completely new in the literature. The remainder of the paper is organized as follows. In Section 2, we introduce our bias-corrected estimator of the conditional stable tail dependence function and we establish its weak convergence as a stochastic process, the covariate being fixed. Then in Section 3, we illustrate the performance of this estimator on a small simulation study where we compare it with two alternatives, that are not asymptotically unbiased. Section 4 is devoted to a data analysis of air pollution measurements. All the proofs are postponed to Section 5.

Estimators and convergence results
Denote pY, Xq :" pY p1q , . . . , Y pdq , Xq, a random vector satisfying (1.2), and let pY 1 , X 1 q, . . . , pY n , X n q, be independent copies of pY, Xq, where X has density function f . We introduce a local estimator for L, based on an empirical version of the left-hand side of (1.2), for large values of t. As is usual in the extreme value context, we consider an intermediate sequence k " k n , i.e., k Ñ 8 as n Ñ 8 with k{n Ñ 0. Since the margins F j p¨|xq appearing in (1.2) are unknown in practice, we have to replace them by estimators such as the empirical kernel estimator p F n,j py|xq :" where K c p¨q :" Kp¨{cq{c p with K a density function on R p , and c :" c n is a positive non-random sequence satisfying c n Ñ 0 as n Ñ 8. Denote with y :" py 1 , . . . , y d q a vector of the positive quadrant R d . According to Escobar- with h :" h n a positive non-random sequence satisfying h n Ñ 0 as n Ñ 8, is an estimator of the conditional stable tail dependence function Lpy|xq. Note that in p F n,j py|xq and p L k py|xq the same kernel function K has been used, but they can of course be taken different. As in Escobar-Bach et al. (2018b), the bandwidths for p F n,j and p L k are though different. This non-parametric estimator of Lp¨|xq is in construction similar to the estimator for the conditional copula function introduced in Veraverbeke et al. (2011). The purpose of the two papers is different though. In our paper we study the asymptotic properties of an estimator of extreme tail dependence, while in Veraverbeke et al. (2011), focus was on the asymptotic behavior of an estimator for the conditional copula function.
The aim of the paper is to propose an asymptotically unbiased estimator for Lp¨|xq. To the best of our knowledge this topic has not been considered previously in the literature in the regression context, in contrast to the classical framework without covariates where we can mention the contributions of Fougères et al. (2015) and Beirlant et al. (2016).
The main results of the paper will be derived as stochastic weak convergence results for processes in y P rε, T s d , for any ε ą 0 and T ą ε, but with the covariate argument fixed, meaning that we will focus our study only around one reference position x 0 P IntpS X q, the interior of the support S X of f , assumed to be non-empty. To this aim, we need to introduce some conditions mentioned below and well-known in the extreme value framework. Let }.} be some norm on R p , and denote by B x pτ q the closed ball with respect to }.}, centered at x and with radius τ ą 0. The event A t,y is defined for any t ą 0 and y P R d as A t,y :" ! 1´F 1 pY p1q |X q ď t´1y 1 or . . . or 1´F d pY pdq |X q ď t´1y d ) .
First order condition: The limit in (1.2) exists for all x P S X and y P R d , and the convergence is uniform on r0, T s dˆB x0 pτ q for any T ą 0 and a τ ą 0.

Second order condition:
For any x P S X there exist a positive function αp¨|xq such that αpt|xq Ñ 0 as t Ñ 8 and a non null function M p¨|xq such that for all uniformly on r0, T s dˆB x0 pτ q for any T ą 0 and a τ ą 0.

Third order condition:
For any x P S X there exist a positive function βp¨|xq such that βpt|xq Ñ 0 as t Ñ 8 and a non null function N p¨|xq such that for all uniformly on r0, T s dˆB x0 pτ q for any T ą 0 and a τ ą 0, and where N is not a multiple of M .
Note that these assumptions imply that the functions αp¨|xq and βp¨|xq are both regularly varying with indices ρpxq and ρ 1 pxq respectively which are non positive. In the sequel we assume that both indices are negative. Remark also that the functions Lp¨|xq, M p¨|xq and N p¨|xq have a homogeneity property, that is Lpay|xq " aLpy|xq, M pay|xq " a 1´ρpxq M py|xq, N pay|xq " a 1´ρpxq´ρ 1 pxq N py|xq for all a ą 0 and all y P R d .
Due to the regression context, we need some Hölder-type conditions.
Assumption pF m q. There exist M Fj ą 0 and η Fj ą 0 such that |F j py|xqF j py|zq| ď M Fj }x´z} η F j , for all y P R, all px, zq P S XˆSX and j " 1, . . . , d.
Also a condition is required on the kernel function K.
Assumption pKq. K is a bounded density function on R p with support S K included in the unit ball of R p with respect to the norm }.}. Moreover, we assume that there exists δ, m ą 0 such that B 0 pδq Ă S K and Kpuq ě m for all u P B 0 pδq, and K belongs to the linear span (the set of finite linear combinations) of functions k ě 0 satisfying the following property: the subgraph of k, tps, uq : kpsq ě uu, can be represented as a finite number of Boolean operations among sets of the form tps, uq : qps, uq ě ϕpuqu, where q is a polynomial on R pˆR and ϕ is an arbitrary real function.
The latter assumption is common, and used already in, e.g., Giné and Guillou (2002) and Escobar-Bach et al. (2018a,b), and allows to measure the discrepancy between the conditional distribution function F j p¨|xq and its empirical kernel version p F n,j p¨|xq.

Asymptotic result for p L k p¨|x 0 q under a third order condition
Escobar-Bach et al. (2018b) have established the weak convergence of p L k py|x 0 q as a stochastic process in y P r0, T s d and for a fixed covariate position x 0 P R p , under the second order condition. In order to construct an asymptotically unbiased estimator for Lp¨|x 0 q, the third order condition is required and thus we need to know if under this new condition, a similar convergence result can be stated.
For any T ą 0, let Dpr0, T s d q be the space of functions on r0, T s d that are right-continuous and have left-hand limits. All our weak convergence results will be derived in the Skorohod space Dpr0, T s d q equipped with the sup norm.
Under pF m q, pDq, pLq, pAq, pBq, pMq, pKq, and assuming that there exists an ε ą 0 such that for n sufficiently large kh p αpn{k|x 0 q βpn{k|x 0 q ÝÑ μ 1 px 0 q P R`, and for some q ą 1 and 0 ă η ă minpη F1 , ..., Then the process weakly converges in Dpr0, T s d q towards a tight centered Gaussian process tBpyq, y P r0, T s d u, for any T ą 0, with covariance structure given by where y, y 1 P r0, T s d and }K} 2 :" b ş S K K 2 puqdu.

Smoothed estimator for Lp¨|x 0 q
Inspired by the homogeneity of Lp¨|x 0 q, consider now the rescaled statistic for a positive scale parameter a. Our uncorrected (in terms of bias) estimator for Lp¨|x 0 q will be the following weighted version of the rescaled statistic, defined for any r ą 1 and ξ ą 0 as The weak convergence of this new estimator as a stochastic process is established in the following theorem.
Based on this result, in order to construct an asymptotically unbiased estimator for Lp¨|x 0 q, we need now to estimate ρpx 0 q and α k py|x 0 q :" αpn{k|x 0 qM py|x 0 q. This is the aim of the next section. Let pξ 1 , ξ 2 , ξ 3 , ξ 4 q P R 4 , r 1 " r 2 ą 1 and s ą 0. Now, define Ipy; r, t; sq :" 1 r´1

Estimation of ρpx
We propose to estimate ρpx 0 q by Theorem 2.3. Under the assumptions of Theorem 2.1, and additionally assuming that M never vanishes except on the axes and that ? kh p α 2 pn{k|x 0 q Ñ μ 2 px 0 q P R`, for any pξ 1 , ξ 2 , ξ 3 , ξ 4 q P R 4 , r 1 " r 2 ą 1 and s ą 0, we have Let pξ 5 , ξ 6 q P R 2 and r 3 " r 4 ą 1. To estimate α k py|x 0 q, we propose In the sequel, we denote by c 1 pr; ρpx 0 qq the derivative of cpr; ρpx 0 qq with respect to ρpx 0 q, and we use the following notations: in Dprε, T s d q, for every ε ą 0 and T ą ε, where B is defined in Theorem 2.1.

Bias correction of q L k py|x 0 q
Now we have all the ingredients to construct an asymptotically unbiased estimator for Lp¨|x 0 q by removing from q L k py|x 0 q the bias term where αpn{k|x 0 qM py|x 0 q together with the second order rate parameter ρpx 0 q have been estimated externally, using the same intermediate sequence k " k n , which is such that k " opkq. This idea has been originally proposed by Gomes and co-authors (see, e.g., Gomes et al., 2008;Caeiro et al., 2009) in the univariate framework and has the advantage that the variance of the bias-corrected estimator and the uncorrected one is the same. Thus, we propose the following bias-corrected estimator for Lp¨|x 0 q L k,k py|x 0 q :" q L k py|x 0 q´q α k py|x 0 qcpr; q ρ k px 0 qqˆk k˙q (2.4) Theorem 2.5. Assume the third order condition, M never vanishes except on the axes, y Ñ M py|xq continuous on r0, T s d and px, yq Ñ N py|xq continuous on B x0 pτ qˆr0, T s d , with B x0 pτ q Ă S X . Also suppose that there exists b ą 0 with f pxq ě b, @x P S X Ă R p and f bounded. Under pF m q, pDq, pLq, pAq, pBq, pMq, pKq, and assuming that there exists an ε ą 0 such that for n sufficiently large inf xPS X λptu P B 0 p1q : x´cu P S X uq ą ε, consider sequences k Ñ 8, h Ñ 0, c Ñ 0 as n Ñ 8 and k such that k " opkq, k{n Ñ 0, and with and for some q ą 1 and 0 ă η ă minpη F1 , ..., in Dprε, T s d q, for every ε ą 0 and T ą ε, where B is defined in Theorem 2.1.
Note that this bias-corrected estimator L k,k p¨|x 0 q has the same asymptotic variance as the uncorrected estimator q L k p¨|x 0 q (see Theorem 2.2). Also, the conditions on k,k, h and c appearing in Theorem 2.5 can be satisfied with μ 1 px 0 q " μ 2 px 0 q " 0 if one chooses, up to multiplicative constants, k " n α1 , k " n α2 , h " n´Δ 1 and c " n´Δ 2 , where the parameters satisfy the following bounds (2.8) According to the above, one can start with choosing α 1 within its bounds. Then, given α 1 , one can choose α 2 , and so on. It is thus possible to find sequences k, k, h and c satisfying the conditions of the theorem.

Simulation study
Our aim in this section is to illustrate the bias-correcting effect in the estimation of Lp¨|x 0 q. We focus on dimensions d " 2 and p " 1. We consider the two models studied in Escobar-Bach et al. (2018b), which both satisfy our third order condition, together with Assumptions pDq, pLq, pAq, pBq, pMq and pF m q.
In particular, these models are the following: • Model 1: The bivariate Student distribution with density function where θ is the Pearson correlation coefficient. The stable tail dependence function can be described as where F ν`1 is the distribution function of the univariate Student distribution with pν`1q degrees of freedom. Also We set θ " X, where X is uniformly distributed on r0, 1s. In the simulations, we use ν " 4, which corresponds to ρpx 0 q " ρ 1 px 0 q "´1{2. For this model we have η F1 " η F2 " η f " η L " η α " η β " η M " 1 and we set η " 0.99. Also, the conditions of Theorem 2.5 on k,k, h and c can be satisfied by using the bounds (2.5)-(2.8). For instance, if one takes α 1 " 0.80, α 2 " 0.81, Δ 1 " 0.54 and Δ 2 " 0.335, then the theoretical convergence rate is of order n´0 .13 .
For each model, we simulate 500 samples of size 1000, and we compare three estimators of Lp¨|x 0 q: the two uncorrected estimators, p L k p¨|x 0 q and its smoothed version q L k p¨|x 0 q, and our bias-corrected estimator L k,k p¨|x 0 q, at position x 0 " 0.3. Concerning the kernel, we always use the bi-quadratic function Kpuq :" 15 16 p1´u 2 q 2 1l tuPr´1,1su .
Each estimator requires the selection of some tuning parameters. This will be done as follows.
In the latter, q ρ k px 0 q is the value obtained in Step 2. First, in order to assess the theoretical result provided by Theorem 2.5, we show in Figure 1 the associated normal QQ-plots for the two models, three different sample sizes n " 1000, 5000 and 10000, and the theoretical choices of the parameters k, k, h, c induced by the conditions of Theorem 2.5. Here y " p0.4, 0.6q and x 0 " 0.3, but similar plots can be obtained for other y´positions and values of the covariate x 0 . From these normal QQ-plots, we can see that the linearity in the plots improves with increasing the sample size n and the points are getting closer to the diagonal, though there remains some curvature in the QQ-plot for Model 2, even for n " 10000.
Next, in order to assess the effect of replacing the unknown margins by estimators, we plot in Figure 2 the empirical quantiles of the normalized estimator p L k p0.4, 0.6|0.3q (see Theorem 2.1) with estimated margins versus those obtained with the true margins. Again the QQ-plots have been constructed with the theoretical choices of k, h and c and two different sample sizes: n " 1000 (left column) and n " 5000 (right column). As is clear from Figure 2, the similarity between the two quantiles in Model 1 is better than in Model 2, and as expected, increasing the sample size improves the results, with points getting closer to the diagonal. Again, similar results can be obtained for other y´positions and values of the covariate x 0 .
Finally, in Figure 3, we show the sample mean (left) and the empirical mean squared error (MSE, right) of p L k py|x 0 q (dotted line), q L k py|x 0 q (dashed line) and L k,k py|x 0 q (full line) as a function of k in case of Model 1 with x 0 " 0.3 and four possible values of y, corresponding to the different rows: from the top to the bottom, y " p0.2, 0.8q, p0.4, 0.6q, p0.6, 0.4q and p0.8, 0.2q, respectively. The horizontal line on the left panel represents the true value of Lpy|x 0 q. Figure 4 concerns Model 2 and the same values of x 0 and y. Based on these simulations, we can draw the following conclusions: • Our estimator L k,k py|x 0 q clearly outperforms the two alternatives. In terms of bias, the sample means show very stable paths as a function of k, close to the true value. In terms of MSE, it is still competitive, almost always better than p L k py|x 0 q and q L k py|x 0 q, or otherwise at least similar, and again very stable as a function of k. Those are very nice features since in our case, the selection of k is not very crucial, while it is for p L k and q L k . • For Model 2, the estimation is more difficult for y far away from the diagonal, whereas for Model 1, it does not depend on y. • Different values of the covariate x 0 have been tried, but since the performance of our bias-corrected estimator L k,k py|x 0 q does not seem to depend on the position in the covariate space, the figures are not included in the paper.

Application to air pollution data
In this section, we illustrate the practical applicability of our bias-corrected estimator on a dataset of air pollution measurements. We consider the data collected by the United States Environmental Protection Agency (EPA), publicly available at https://aqsdr1.epa.gov/aqsweb/aqstmp/airdata/download files.html. The dataset contains daily measurements of, among others, maximum temperature, ground-level ozone, carbon monoxide and particulate matter concentrations, for the period 1999 to 2013, and this for stations spread over the U.S. Monitoring levels of these pollutants is of crucial importance, as extreme temperature and high levels of pollutants like ground-level ozone and particulate matter pose a major threat to human health. We estimate the stable tail dependence function for the variables temperature and ozone concentration, conditional on time and location, where the latter is expressed by latitude and longitude. In the estimation, the covariates are standardised to the interval r0, 1s, and the tuning parameters are selected with the algorithm described in Section 3. In order to keep the computational time requirements under control, the tuning parameters selected at steps 1. to 3. of the algorithm are computed with a random sampling of size r0.1ns where n " 127328 refers to the initial sample size. As kernel function K˚, we use the following generalisation of the bi-quadratic kernel K: where x 1 , x 2 , x 3 , refer to the covariates time, latitude and longitude, respectively, in standardised form. Note that K˚has as support the unit ball with respect to the max-norm on R 3 .
We report here only the results at two different time points, January 15, 2007 and June 15, 2007, and for two locations, Fresno and Los Angeles (both in California). In Figure 5, we show the estimates mediant r L k pt, 1´t|xq, k " n{4,¨¨¨, n{2u, with a range of k´values based on 25 equally spaced integers, where r L k is either L k,k (full line), p L k (dotted line) or q L k (dashed line), for the cities Fresno (top row) and Los Angeles (bottom row) on January 15, 2007 (first column) and June 15, 2007 (second column). For both stations, the biascorrected estimate for the stable tail dependence function indicates a stronger extreme dependence between temperature and ozone concentration in winter than in summer. In winter the extreme dependence in Fresno is stronger than in Los Angeles. The results obtained with the uncorrected estimators p L k and q L k are typically similar to each other, and correspond more or less with the analysis reported in Escobar-Bach et al. (2018b). The estimate L k,k differs considerably from p L k and q L k for Fresno, winter and Los Angeles, summer. Note that in these cases, the bias-corrected estimate tends to be higher than the uncorrected estimates, indicating a weaker extremal dependence. This was also observed in the simulation experiment, where the bias-corrected estimator tends to be larger (and closer to the true value) than the uncorrected estimators. The observed discrepancy indicates that estimation of tail dependence between temperature and ozone concentration can suffer from bias, and therefore it is recommended to use the bias-corrected estimator in order to get a better estimate of the stable tail dependence function.

Proof of Theorem 2.1
We follow the lines of proof of Theorems 2.1 and 2.3 in Escobar-Bach et al. where the error terms are all independent from y.

Proof of Theorem 2.2
Using Theorem 2.1, the homogeneity properties of the functions Lp¨|x 0 q, M p¨|x 0 q, N p¨|x 0 q, and the Skorohod representation (but keeping the same notation), we have where the o´term is almost surely and uniform in a and y. This implies that for any r ą 1 and ξ ą 0, under the assumptions of Theorem 2.2, we have by the Skorohod representation and a straightforward application of Taylor's theorem that Theorem 2.2 follows then from another application of Taylor's theorem.

Proof of Theorem 2.3
According to Theorem 2.2, using the homogeneity properties of the functions Lp¨|x 0 q, Mp¨|x 0 q, N p¨|x 0 q, and the Skorohod representation, we havẽ Several Taylor series expansions allow us to achieve the proof of Theorem 2.3.