Estimation of tail parameters with missing largest observations

: The setting where an unknown number m of the largest data is missing from an underlying Pareto-type distribution is considered. Solutions are provided for estimating the extreme value index, the number of missing data and extreme quantiles. Asymptotic results of the parameter estimators and an adaptive selection method for the number of top data used in the estimation are proposed for the case where all missing data are beyond the observed data. An estimator of the number of missing extremes spread over the largest observed data is also proposed. To this purpose, a key component is a likelihood solution based on exponential representations of spacings between the largest observations. An eﬀective and fast optimization procedure is established using regularization, and simulation experiments are provided. The methodology is illustrated with a dataset from the diamond mining industry, where large-carat diamonds are expected to be missing.


Introduction
Extreme value methodology receives growing attention in order to model the occurrence of rare events with high impact in various fields of application.Estimation of the extreme value index (EVI) is then a crucial topic in extreme value methodology, assuming that the underlying distribution satisfies the maxdomain of attraction condition, i.e. assuming that the maximum of independent and identically distributed observations X 1 , X 2 , . . ., X n can be approximated by the generalized extreme value distribution: as n → ∞ for 1 + γ y > 0, where b n ∈ R, a n > 0 and γ ∈ R are the location, scale and shape parameters, respectively.The EVI γ is a measure of the tail-heaviness of the distribution of X with a larger value of γ implying a heavier tail of F .We consider the specific case of Pareto-type distributions with a positive EVI, which induces a restriction to right tail functions (RTF) given by with γ > 0 and a slowly varying function at infinity, that is lim t→∞ (ty) (t) = 1 for every y > 1. (1.3) We then assume that from the original ordered data set X 1,N ≤ X 2,N ≤ . . .≤ X N,N an unknown number of top data X N −m+1,N ≤ X N −m+2,N ≤ . . .≤ X N,N are missing.So we observe n = N − m observations.The estimation of m, γ > 0 and extreme quantiles Q(1 − p) with small p based on X 1,N ≤ X 2,N ≤ . . .≤ X n,N under (1.2) is then the main problem tackled in this paper.Recent discussions of this problem are given in [11] and [10] based on a Hill Estimator without Extremes (HEWE) process.Similar to their technique, we adopt a likelihood-based approach.However, the method is conceptually different since our starting point is, visually, the Pareto QQ-plot, and then mathematically the log-spacings and their Rényi representation, which allows for transparent, automatic selection of the sample fraction used in the estimation.
Specifically, it is well known that when γ > 0 the EVI can be estimated from the slope at an ultimate linear part of the Pareto QQ-plot: log n + 1 j , log X n−j+1,N , j = 1, . . ., n, (see for instance Chapter 4 in Beirlant et al. 2004).The celebrated [8] estimator of γ can then be considered as an estimator of the slope of this QQ-plot using the top k available data: for some appropriate k ∈ {1, . . ., n}.In this paper we propose an adaptation of the Hill estimator for the case that missing data are or could be present at the top data.
A motivating practical example can be found in [9] from the diamond mining industry.The nature of metallurgical recovery processes in diamond mining causes the under-recovery of large diamonds.Because of the potentially large monetary value of even a small number of large diamonds, the number of diamonds that are not recovered is an important problem in the diamond mining industry.We illustrate the proposed methods with the same sample of carat sizes as used in [9].Note that here the missing carat data are, though likely, not necessarily all larger than the largest observed carat size.
The remainder of the paper is organized as follows.In Section 2 we propose and analyze the estimation of (γ, m) and extreme quantiles.We propose asymptotic results for the estimators and an adaptive method for selecting k in case the missing observations are situated above the largest observation.A graphical method is given to detect the number of missing data in case some observations are also missing below the largest non-missing observation.Finally, we provide an efficient way of numerically optimizing a regularized version of the likelihood through a contraction operator.In Section 3 we present and discuss simulation results and revisit the diamond data set.Section 4 concludes.

Estimation of γ and the number of missing extremes under the
Pareto-type model

Missing data situated above the largest observation
To describe the influence of deleted data from a graphical point of view, in Figures 1 we illustrate the influence of missing top data on Pareto QQ-plots of strict Pareto (with (x) = 1, x > 1) and Fréchet samples (with F (x) = exp(−x −1/γ ), x > 1) both with γ = 0.5.We consider N = 200 and m = 20 in both cases.From the Pareto QQ-plots (see label 'm = 0' in Figure 1) we observe that the introduction of 20 missing data leads to concavity.This leads to underestimating extreme quantiles when using the classical linear extrapolation methods from extreme value analysis.An additional complication arises in the case of a non-constant slowly varying function , since then the missingness implies non-linearity in the top portion of the data, and hence the linear part in the QQ-plot is shortened.In case of slow convergence in (1.3) and/or significant missingness, linearity could be completely lost, and tail estimation then becomes problematic.Thus, we propose to correct the QQ-plot by adjusting the inverse ranks j to j + m in the construction of the QQ-plot with some appropriate value m, leading to a Pareto QQ-plot adapted for missingness: In Figure 1 such corrected QQ-plots are given using different values of m.When m < 20 the concavity in the top of the QQ-plot remains to some extent, while linearity is obtained at m = 20 (at least for upper quantiles), and convex corrected QQ-plots appear when m > 20.The possibility of estimating m and γ when adjusting the rank numbers is the key idea behind the graphical motivation.
More precisely, when the observed (non-missing) maximum X n,N is situated below the smallest missing data point X N −m+1,N we can exploit the exponential representations of scaled log-spacings as discussed in [2] and [6] which generalize the Rényi representation to Pareto-type distributions: under (1.2) with V j,n := log for small enough k can be well approximated by γE j with {E j , j = 1, . . ., k} independent standard exponential random variables.Hence the pseudo-likelihood based on a set {V j,n , j = 1, . . ., k} equals leading to the log-likelihood function where we use the fact that In the derivative of l with respect to m we next approximate k j=1 (j + m) −1 = k+m i=m+1 i −1 by log m+k m , which is motivated by Euler's formula when m, k → ∞.The maximum likelihood estimators (γ k , mk ) are then given by γk Note that γk is a simple adaptation of the Hill (1975) estimator H k,n which is induced by the missing observations.In case of a strict Pareto distribution, X 1/γ is standard Pareto distributed, so that ( , the mth smallest order statistics from a uniform (0,1) random sample of size m + k.The estimator of m given an estimate of γ hence tries to match the expected value m m+k of U m,m+k .For any given choice of k, an estimator for extreme quantiles Q(1−p) can now be presented based on estimators of γ and m using the classical Weissman-type extrapolation on the corrected QQ-plot: (2.3) One of the main practical drawbacks of the methodology in [10] is the extremely sensitive log-likelihood function in terms of (γ, m/k).Although the asymptotic theory is sound, the estimator has to be calibrated sequentially and with a good initial guess to produce accurate results.Moreover, the complicated definition of the estimators therein makes it difficult to mathematically analyse these step-wise procedures, and was not pursued in their paper.
In our specification, the likelihood function still empirically suffers the sensitivity issue, especially for small values of k, though to a much lesser degree.To further robustify our estimator we impose a regularization term on the number of missing observations (we refrain from also penalizing the tail index since the two quantities are heavily correlated anyway): The maximum penalized likelihood estimators are denoted by (γ (2.5)

Asymptotic results
For a mathematical analysis of the estimators ( m(λ) k , γ(λ) k ) we consider two cases.First we consider m/k → 0 as m, k, N → ∞, which refers to cases with small m values.Next, we assume that m/k → δ ∈ (0, 1) as k, N → ∞ and k/N → 0. Proofs are deferred to Appendix A.
To this end we assume the classical second-order assumption (refer to [7]) on When the number of missing observations m is limited, expressed by the assumption m/k → 0 as k, N → ∞ and N/k → ∞, any conventional estimator γk of γ could well be adequate as a substitute for γ(λ) k in (2.5).This is confirmed in a first asymptotic result.In this ζ k,n denotes the bias and Z the zero centered asymptotic normal distribution of γk .

Theorem 1. Assume the second-order condition (2.6). Then as
where the rv Γ m has density Assuming further that m 2 /k → 0 we have that From the above result it follows that the penalization parameter λ can be used to reduce the bias in estimating m induced by the bias of γk,n especially for small values of k.A positive penalization parameter then works for estimators γk,n with a positive bias ζ k,n .Estimation of b N,k , β and ζ k,n in the present setting is far from straightforward.In the practical realizations below we choose λ = 0.01, as a first attempt to reduce the bias.In the simulations we will consider the finite sample behaviour of m(λ) k when using different well established estimators of γ next to the estimator γ(λ) k following from optimizing l λ (γ, m).From Theorem 1 it appears appropriate to use the Γ(m, 1) distribution as the sampling distribution for mk at fixed k.The goodness-of-fit of this model will be discussed in the simulation study.
Next we consider the case m/k → δ ∈ (0, 1) as k, N → ∞ and k/N → 0. We propose an asymptotic result for the estimators (γ k , δk ) solving (2.2) in case the missing data were all deleted at the top of the original data set.We here take λ = 0. Below we use the notation ζ(δ, a) = ((1 + δ) a − δ a )/a, a > 0.

Theorem 2. Assume the second-order condition (2.6). Then as
Note that when δ → 0 the result for γk naturally corresponds with the classical limit result for the Hill estimator when no data are missing, i.e. a normal limit distribution with asymptotic variance γ 2 .

Detecting missing extremes below the largest observation
So far we have made the assumption that all the missing data is above the largest observation.However, if the data has missing values in large quantiles but not necessarily above the largest observation, we may adapt our methodology to this scenario.In essence, the idea is to sequentially keep removing the largest datapoint from the sample until we reduce to the canonical scenario, that is where all missing observations are above the largest datapoint in the sample.We provide details below.
Given an appropriate value of k, the search for missing extreme values that are situated within the top extreme data (and not necessarily above the largest observation) can be performed by trimming the likelihood from the preceding section until we obtain stabilization of the estimators.Trimming has shown to provide stability in other settings (cf.[4] for an application in outlier detection and [5] for general threshold selection).
Thus, we consider the deletion of further extreme points, with the rationale of eventually excluding all regions of the datasets where missing observations were present.If the m missing top data are all located above X n−k0,N with k 0 < k, then we still have that the spacings V j,n with j = k 0 + 1, . . ., k are approximately exponentially distributed with mean γ if scaled with the inverse rank numbers j + m, j = k 0 + 1, . . ., k.Then, the amount m can be estimated through maximization of leading to the trimmed log-likelihood function As before, approximating k j=k0+1 (j+m) −1 by log( m+k m+k0 ), we obtain the following likelihood equations for the maximum likelihood estimators (γ k0,k , mk0,k ): (2.7) Direct optimization of the above likelihood is possible and effective through standard nonlinear numerical procedures.However, we refer the reader to the next subsection for a fast fixed-point iterative solver.Note that (γ 0,k , m0,k ) = (γ k , mk ).In practice, one first selects k large enough so that one believes that all missing datapoints are within the top k observations.Subsequently, one plots (γ k0,k , mk0,k ) as a function of k 0 and with k fixed.One then expects to find a stable region in the plot for a certain value of k 0 and above.The point from which stability happens then provides an estimate k 0 of additional top order statistics required to be deleted from the sample in order to reduce the dataset to the canonical case, i.e.where data is missing above the largest sample observation.
As an example, in Figure 2, such a plot for one sample is given for a sample of size N = 500 from a Pareto distribution with γ = 0.5 where 25 missing observations are all randomly spread in the top 50 of the original data set.We also present the MSE and bias of (γ k0,k , mk0,k ) based on 1000 repetitions.Since all missing observations are situated above X n−25,N , we expect the plots (γ k0,k , mk0,k ) to be stable in the region k 0 ≥ 25 with mk0,k indicating the total number of missing observations, while for k 0 < 25 these plots will be decreasing with decreasing k 0 as the number of missing observations above such X n−k0,N is decreasing with smaller k 0 .The estimates γk0,k appear to be reliable for k 0 ≥ 25.Compare this with a case where the 25 missing data are all situated above X n,N (bottom line of Figure 2).Here the plot is stable over the whole plotted area.
The present approach using likelihood trimming only provides a graphical method assisting in detecting missing observations below the largest observation X n,N , and asymptotics are still absent.It appears that methods for detection of change points in the γk0,n−1 and mk0,n−1 plots could be used to provide an adaptive method to detect a minimal k 0 value above which the missing observations are situated.This will be pursued in subsequent work.Note that formula (2.3) can still be used in this case in order to estimate extreme quantiles.

Regularized fixed-point optimization
This subsection is devoted to further robustifying our estimator in two ways.First, we impose a regularization term on the number of missing observations (we refrain from also penalizing the tail index since the two quantities are heavily correlated anyway).Subsequently, we prove that we may compute the (possibly) penalized MLE estimator recursively by defining a suitable contraction operator and then invoking Banach's fixed-point theorem.The latter is key to iteratively computing our estimators without having to have good initial guesses.
To this end, consider the penalized log-likelihood given by (2.8) It is not hard to see that the approximate solution to the score equations satisfies the following equations mk0,k = kX This gives rise to a recursive perturbed estimator for the tail index, given by . (2.9) The corresponding estimator m(λ) k0,k (r) for the number of missing observations is defined analogously.
It is clear that whenever (2.8) has a unique maximum then proving that (2.9) is a contraction operator will suffice, by Banach's fixed-point theorem, to obtain convergence of γ(λ) k0,k (r) as r → ∞ to the maximizer of (2.8).
Proof.Any c-Lipschitz continuous function with c ∈ (0, 1) is a contraction.Then apply Banach's fixed point theorem.

Sample fraction selection
In order to select an appropriate value k of the number k of top data several methods can be used based on goodness-of-fit techniques.For instance: i) For every k the correlation coefficient r k can be computed based on the top k points of the adjusted QQ-plot (2.1) with mk substituting m, and kr then corresponds to the largest correlation r k ; ii) For every k the Anderson-Darling (A-D) W -statistic, given by (cf.[1]) is computed on and kW then corresponds to the smallest W -statistic W k .
From the simulations and practical experiments the use of kW appears to yield much better results in cases different from the strict Pareto distribution.While such adaptive method does not guarantee a consistent estimator for the asymptotic MSE optimal k value, it does address the finite-sample case appropriately, as it appears from simulation studies.However, it is advised to further validate the choice of k using graphical support from the Pareto QQ-plot adjusted with mk W , for instance, by validating linearity in this plot above X n− kW ,N .

Simulation results
We conduct a systematic simulation study to investigate the finite-sample behaviour of our estimators for varying k, as well as the A-D approach for the automatic k selection when all missing observations are situated beyond the largest observation.For varying sample fraction we consider several estimators, as follows: 1.The estimator (γ ).

The Hill estimator H k,n and an implied number of missing observations
given by plugging it into the second equation of (2.2), yielding a 'naive' estimator for the number of missing observations: k{( The moment estimator for the EVI, and its implied estimator for the number of missing observations through the same construct as the previous case.5. The Generalized Pareto Distribution (GPD) maximum-likelihood estimator for the EVI, and its implied estimator for the number of missing observations through the same construct as the previous case.
We have also compared with the estimator from [10] as written in their paper, but their likelihood is very sensitive, for instance with respect to the starting value.A stepwise procedure for their estimators could improve the performance, though we refrain from implementing this.Note that the last two estimators often provide negative tail indices, so that automatic selection formulae derived through the Anderson-Darling approach from Subsection 2.5 becomes unstable.Thus, for automatic selection of k we only consider the following three cases: In order to assess how deviations from pure the Pareto distribution affect the estimation procedure, we consider the following distributions with regularlyvarying tails: i) The Pareto distribution with RTF given by F ii) The Burr(2, −2, 2) distribution with RTF given by F (x) = ((2+x 4 )/2) −1/2 , x > 0. iii) The Fréchet(2) distribution with RTF given by F (x) = 1 − exp(−(x −2 )), x > 0. iv) The GPD(1/2, 1) distribution with RTF given by F x > 0.
Notice that we have specified γ = 1/2 in all cases, which allows for deviations in behaviour to be solely attributed to the slowly varying component of the distributions.For all distributions we consider a sample size of 500, m = 5, 25, 50, and compare mean (median when applicable) squared error and bias terms for the estimation of the tail index, for the number of missing observations, and for a high quantile (p = 1/500).All results are provided in Appendix B.
Remark 5. We use the iterative procedure from Subsection 2.4, which by Proposition 4 converges for any positive starting value.For convenience, we choose 1/2 as starting value, agreeing with the tail index of the simulations, though any other starting value provides indistinguishable results.Convergence usually happens within a few iterations, but all results were obtained using 100 iterations.
The main conclusions from the figures are the following: • Our estimators 1, 1a, 2 and 2a behave better at most k than the benchmarks when the true distribution is close to being strictly Pareto, and the automatic selection procedure is effective regardless of the distribution type.The regularization terms arising from using λ = 0.01 is useful for fixed k but seems to play a less important role when automatically selecting k. • When the distribution has a significant deviation from strict Pareto tails, the automatic estimators 1a and 2a also outperform the 'naive' estimator 3a, except in the m = 5 case when considering the tail index, where there is a slight underestimation by the estimator 3a, and a slight overestimation by 1a and 2a.For the number of missing observations and high quantiles, this effect seems to be less pronounced.Note in case m = 5 for all distributions that estimator 3 has a smaller MSE than estimators 1 and 2. • Estimators allowing negative tail indices, 4 and 5, can only be considered adequate when the number of missing observations is small (m = 5 in our study, or 1%), and then in that case, using a very large k.This is particularly the case for the GPD distribution, iv).Notice, however, that these estimators do not provide a natural estimate of the number of missing observations, so that their effectiveness relies solely on m n. 1 This behaviour is extended to the Hill estimator, 3, which in its automatic form 3a can be competitive for tail index estimation.
• Not accounting for missing values decreases the value of estimator 3, but not removing bias from the regularly varying component (by choosing k large enough) increases estimator 3.These two very different sources of bias often cancel each other out, providing "by chance" a good estimate.This is observed as a sharp decrease in MSE for medium or large k values.
However, estimating such a high k where the two biases cancel out is neither straightforward nor in the scope of this article.As expected, the effect vanishes when considering estimators with automatically selected k.
We end this section checking the validity of the Gamma(m, 1) distribution which follows from Theorem 1 as a sampling distribution for mk in case of a small and a moderate m.The goodness-of-fit is illustrated in Figure 17 in case of the Pareto distribution, based on 10, 000 repetitions.When using γ(0) k a positive bias with respect to the Gamma model is present with small m.When inserting the Hill estimator H k,n in place of γk in the second equation of (2.2), the distribution of the estimates of m for very small values of m follows the Γ(m, 1) distribution quite close and shows a negative bias in case of moderate m.

Diamond case study
The problem of estimating the amount of missing diamonds in ore mining was considered before in [9].They used a Bayesian approach to fit a truncated generalized Pareto distribution to part of the data.Based on the estimated tail probability the expected number of diamonds larger than a specific weight was estimated.Ore recovered from alluvial deposits are less subjected to the possibility of breakage.If the stones are not recovered during the metallurgical recovery process, they are discarded onto the tailing dumps from where they can be recovered during a re-mining program.Because of the potentially large monetary value of even a small amount of missing large diamonds, it is of interest to analyze if re-mining of a diamond dump is profitable thanks to the presence of large diamonds.The Pareto-type model is generally accepted to describe and analyze carat data.
Here we use the same data set as in [9].The Pareto QQ-plot of these observations is given in the bottom left plot of Figure 3.A concave deflection at the upper part of the QQ-plot appears comparable with the graphs in Figure 1 obtained from simulations.This leads to a systematic decline of the Hill estimates with decreasing k below 100 (see Figure 3 bottom right, before correction).We provide plots of mk and γk as a function of k in the top panels of Figure 3 and of the W k statistic in Figure 4. We obtain kW = 273, γk W = 0.4798, mk W = 6.4739 The corresponding parameter estimates for λ = 0.01 are kλ W = 273, γλ kW = 0.4792, mλ kW = 6.3740.
The plots are virtually indistinguishable between λ = 0 and λ = 0.01, and hence we present only the former.The plug-in missing observation estimator derived from inserting the Hill estimator into the second equation of (2.2) yields a very stable plot as a function of k leading to an estimate of 5 missing observations.The bottom panels of Figure 3 shows the difference of Pareto QQ plots before and after correcting with 6 missing extremes.In Figure 4 the solutions ( mk0,273 , γk0,273 ) of the trimmed likelihood with 0 ≤ k 0 < 273 are plotted as a function of k 0 .From this it appears that no extra missing observations can be reported apart from 6 missing observations at the top.This is to be compared with the 8 missing observations reported by the method from [9].Based on Theorem 1, a 95% confidence interval for m is obtained using the 0.025 and 0.975 quantiles of the Γ(6.45, 1) distribution: (2.5, 12.3).Based on the Hill imputed missings estimator we obtain the interval (1.6, 10.2).

Conclusion
In this paper, we addressed the problem of missing observations in the highest quantiles of a dataset, assuming that the data followed a Pareto-type distribu-tion.We presented solutions for estimating the extreme value index, the number of missing data and extreme quantiles, assuming that all missing data were beyond the observed data.We also proposed an adaptive method for selecting the number of top data used in the estimation.Additionally, we introduced a graphical method in order to infer on the number of missing extremes spread over the largest observed data.We derived asymptotic results and considered robustifying our estimator through regularization.We demonstrated the effectiveness of our approach through simulation experiments and an application in the diamond mining industry.

Appendix A: Proof of Theorems
Proof of Theorem 1.With U j,v (j = 1, . . ., v) denoting the order statistics of an i.i.d.sample of size v from the uniform (0,1) distribution, we have using (2.6) that .
Using that k,m,N +O(k −1 ) as k → ∞, the likelihood equations are given by γk Using the consistency of δk and γk we then obtain .
Using the asymptotic result in (A.1) now leads to the asserted result after some algebra.
Proof of Proposition 3. We ease the notation by defining the constants (with respect to x) a = X n−k,N X n−k 0 ,N ∈ (0, 1) and A = 1 (k−k0) k j=k0+1 jV j,n .Consequently, we wish to show that the following function is c-Lipschitz continuous: It will be enough to establish that its derivative is, uniformly and in absolute value, less than unity.For this, note first that Taking square roots and replacing the right-hand side with two Taylor expansions, we obtain, after cancelling terms, the equivalent inequality But the above assertion is satisfied for any λ ≥ 0, since in that case, using the bounded-away property of the compact K, we get and we may simply take c = (1/I) 2 ∈ (0, 1).

Fig 2 .
Fig 2. Simulations results for Pareto data (γ = 1/2, N = 500, m = 25).Top: plots of γk 0 ,n−1 and mk 0 ,n−1 using k = n − 1 as a function of log k 0 for one sample when 25 data are randomly deleted from the top 50 original observations.Middle two lines: plot of MSE and bias of γk 0 ,n−1 and mk 0 ,n−1 as a function of log k 0 with randomly spread missing observations based on 1000 repetitions.Bottom: plot of γk 0 ,n−1 and mk 0 ,n−1 as a function of log k 0 when the original top 25 data are deleted.

)
with k selected through the goodness-of-fit criterion with Anderson-Darling statistic outlined in Subsection 2.5.2a.The estimator (γ (0) k , m(0) k ) with k selected through the goodness-of-fit criterion with Anderson-Darling statistic outlined in Subsection 2.5.3a.The Hill estimator H k,n with k chosen from the selection of the case 1a.

Fig 3 .
Fig 3. Diamonds data.Top left: adapted Hill and Hill estimates, γk and H k,n respectively, as a function of k.Top right: mk and the plug-in missing observation estimator derived from inserting the Hill estimator into the second equation of (2.2), as a function of k.Bottom: original Pareto QQ-plot and adapted QQ-plot using m273 .

Fig 4 .
Fig 4. Diamonds data.Top: W k statistics as a function of k.Bottom left: estimates γk 0 ,273 as a function of k 0 .Bottom right: estimates mk 0 ,273 as a function of k 0 .

Fig 6 .
Fig 6.Simulations results for Pareto data (γ = 1/2, m = 25).Top: mean square error (MSE) for the tail index, number of missing observations, and 99.8% quantile, as a function of top k order statistics used.Center: Corresponding bias plots as a function of top k order statistics.Bottom: density of the estimators with automatically-selected k.

Fig 7 .
Fig 7. Simulations results for Pareto data (γ = 1/2, m = 5).Top: mean square error (MSE) for the tail index, number of missing observations, and 99.8% quantile, as a function of top k order statistics used.Center: Corresponding bias plots as a function of top k order statistics.Bottom: density of the estimators with automatically-selected k.

Fig 8 .
Fig 8. Simulations results for Burr(2, −2, 2) data (γ = 1/2, m = 50).Top: mean square error (MSE) for the tail index, number of missing observations, and 99.8% quantile, as a function of top k order statistics used.Center: Corresponding bias plots as a function of top k order statistics.Bottom: density of the estimators with automatically-selected k.

Fig 9 .
Fig 9. Simulations results for Burr(2, −2, 2) data (γ = 1/2, m = 25).Top: mean square error (MSE) for the tail index, number of missing observations, and 99.8% quantile, as a function of top k order statistics used.Center: Corresponding bias plots as a function of top k order statistics.Bottom: density of the estimators with automatically-selected k.

Fig 10 .
Fig 10.Simulations results for Burr(2, −2, 2) data (γ = 1/2, m = 5).Top: mean square error (MSE) for the tail index, number of missing observations, and 99.8% quantile, as a function of top k order statistics used.Center: Corresponding bias plots as a function of top k order statistics.Bottom: density of the estimators with automatically-selected k.

Fig 11 .
Fig 11.Simulations results for Fréchet(2) data (γ = 1/2, m = 50).Top: mean square error (MSE) for the tail index, number of missing observations, and 99.8% quantile, as a function of top k order statistics used.Center: Corresponding bias plots as a function of top k order statistics.Bottom: density of the estimators with automatically-selected k.

Fig 17 .
Fig 17.Histograms and fitted Γ(m, 1) densities for simulated Pareto data.Top left: N = 500, m = 5, k = 494; top right: N = 5000, m = 50, k = 4940.In the bottom panels are the corresponding histograms using the plug-in missings estimator derived from inserting the Hill estimator into the second equation of (2.2).