Electronic Journal of Statistics

Data-adaptive trimming of the Hill estimator and detection of outliers in the extremes of heavy-tailed data

Shrijita Bhattacharya, Michael Kallitsis, and Stilian Stoev

Full-text: Open access

Abstract

We introduce a trimmed version of the Hill estimator for the index of a heavy-tailed distribution, which is robust to perturbations in the extreme order statistics. In the ideal Pareto setting, the estimator is essentially finite-sample efficient among all unbiased estimators with a given strict upper breakdown point. For general heavy-tailed models, we establish the asymptotic normality of the estimator under second-order regular variation conditions and show that it is minimax rate-optimal in the Hall class of distributions. We also develop an automatic, data-driven method for choosing the trimming parameter, which yields a new type of robust estimator that can adapt to the unknown level of contamination in the extremes. This adaptive robustness property makes our estimator particularly appealing and superior to other robust estimators in the setting where the extremes of the data are contaminated. As an important application of the data-driven selection of the trimming parameter, we obtain a methodology for the principled identification of extreme outliers in heavy-tailed data. Indeed, the method has been shown to correctly identify the number of outliers in the previously explored Condroz data set.
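
The abstract does not state the estimator's formula, so the following is only a minimal sketch of one trimmed Hill-type statistic, written under assumptions. It discards the k0 largest observations, gives weight k0 + 1 to the largest retained order statistic and weight 1 to the remaining ones down to the threshold, and reduces to the classical Hill estimator when k0 = 0. The function name `trimmed_hill`, the parameter names `k0` and `k`, and the exact weighting are illustrative choices, not taken from this page.

```python
import math


def trimmed_hill(sample, k0, k):
    """Trimmed Hill-type estimate of the tail index xi = 1/alpha.

    Assumed form (not quoted from the abstract): ignore the k0 largest
    observations and use the next k - k0 upper order statistics, with
    the (k+1)-th largest observation as the threshold.
    """
    n = len(sample)
    if not 0 <= k0 < k < n:
        raise ValueError("need 0 <= k0 < k < n")
    x = sorted(sample, reverse=True)  # x[0] is the largest observation
    threshold = x[k]                  # (k+1)-th largest observation
    if threshold <= 0:
        raise ValueError("threshold order statistic must be positive")
    # The largest retained value, x[k0], carries weight k0 + 1.
    lead = (k0 + 1) * math.log(x[k0] / threshold)
    # The (k0+2)-th through k-th largest values carry weight 1.
    rest = sum(math.log(x[i] / threshold) for i in range(k0 + 1, k))
    return (lead + rest) / (k - k0)


if __name__ == "__main__":
    clean = [1, 2, 4, 8, 16, 32]
    dirty = [1, 2, 4, 8, 16, 1e9]  # largest value replaced by an outlier
    print(trimmed_hill(clean, 0, 3), trimmed_hill(dirty, 0, 3))
    print(trimmed_hill(clean, 1, 3), trimmed_hill(dirty, 1, 3))
```

With `k0 = 0` the outlier in `dirty` changes the estimate, whereas with `k0 = 1` the two samples give the same value, because the single largest observation is never used. The data-driven choice of `k0` described in the abstract is not sketched here.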

Article information

Source
Electron. J. Statist., Volume 13, Number 1 (2019), 1872-1925.

Dates
Received: August 2018
First available in Project Euclid: 19 June 2019

Permanent link to this document
https://projecteuclid.org/euclid.ejs/1560909645

Digital Object Identifier
doi:10.1214/19-EJS1561

Mathematical Reviews number (MathSciNet)
MR3964266

Zentralblatt MATH identifier
07080064

Subjects
Primary: 62G32 (Statistics of extreme values; tail inference), 62G35 (Robustness)
Secondary: 62G30 (Order statistics; empirical distribution functions)

Keywords
Trimmed Hill; adaptive robustness; weighted sequential testing; minimax rate optimality

Rights
Creative Commons Attribution 4.0 International License.

Citation

Bhattacharya, Shrijita; Kallitsis, Michael; Stoev, Stilian. Data-adaptive trimming of the Hill estimator and detection of outliers in the extremes of heavy-tailed data. Electron. J. Statist. 13 (2019), no. 1, 1872-1925. doi:10.1214/19-EJS1561. https://projecteuclid.org/euclid.ejs/1560909645


