Annals of Statistics

Empirical risk minimization for heavy-tailed losses

Christian Brownlees, Emilien Joly, and Gábor Lugosi



The purpose of this paper is to discuss empirical risk minimization when the losses are not necessarily bounded and may have a distribution with heavy tails. In such situations, the usual empirical averages may fail to provide reliable estimates, and empirical risk minimization may incur a large excess risk. However, some robust mean estimators proposed in the literature may be used to replace empirical means. In this paper, we investigate empirical risk minimization based on a robust estimator proposed by Catoni. We develop performance bounds based on chaining arguments tailored to Catoni's mean estimator.
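The robust mean estimator underlying the paper is Catoni's M-estimator: instead of averaging, one solves $\sum_i \psi(\alpha(X_i - \hat\mu)) = 0$ for $\hat\mu$, where $\psi$ is a nondecreasing influence function of logarithmic growth, so a single heavy-tailed observation has only bounded influence. The following is a minimal illustrative sketch (the bisection solver and the choice of $\alpha$ are for demonstration only, not the authors' implementation; in the paper $\alpha$ is tuned to the variance and the confidence level):

```python
import math

def catoni_psi(x):
    # Catoni's "narrowest" influence function: behaves like x near 0
    # but grows only logarithmically, bounding the effect of outliers.
    if x >= 0:
        return math.log(1.0 + x + x * x / 2.0)
    return -math.log(1.0 - x + x * x / 2.0)

def catoni_mean(xs, alpha=0.1, tol=1e-9):
    """Catoni's M-estimator of the mean: the root mu of
    sum_i psi(alpha * (x_i - mu)) = 0, located by bisection."""
    lo, hi = min(xs), max(xs)
    f = lambda mu: sum(catoni_psi(alpha * (x - mu)) for x in xs)
    # f is decreasing in mu, with f(lo) >= 0 >= f(hi),
    # so a root lies in [lo, hi].
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0
```

On symmetric data the estimator agrees with the sample mean, while a single large outlier moves it far less than it moves the empirical average, which is the property the paper's excess-risk bounds exploit.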

Article information

Ann. Statist., Volume 43, Number 6 (2015), 2507–2536.

Received: June 2014
Revised: May 2015
First available in Project Euclid: 7 October 2015


Primary: 62F35: Robustness and adaptive procedures
Secondary: 62F12: Asymptotic properties of estimators

Keywords: empirical risk minimization; heavy-tailed data; robust regression; robust $k$-means clustering; Catoni's estimator


Brownlees, Christian; Joly, Emilien; Lugosi, Gábor. Empirical risk minimization for heavy-tailed losses. Ann. Statist. 43 (2015), no. 6, 2507--2536. doi:10.1214/15-AOS1350.



  • [1] Abaya, E. F. and Wise, G. L. (1984). Convergence of vector quantizers with applications to optimal quantization. SIAM J. Appl. Math. 44 183–189.
  • [2] Alon, N., Matias, Y. and Szegedy, M. (1999). The space complexity of approximating the frequency moments. J. Comput. System Sci. 58 137–147.
  • [3] Antos, A. (2005). Improved minimax bounds on the test and training distortion of empirically designed vector quantizers. IEEE Trans. Inform. Theory 51 4022–4032.
  • [4] Antos, A., Györfi, L. and György, A. (2005). Individual convergence rates in empirical vector quantizer design. IEEE Trans. Inform. Theory 51 4013–4022.
  • [5] Audibert, J.-Y. and Catoni, O. (2011). Robust linear least squares regression. Ann. Statist. 39 2766–2794.
  • [6] Bartlett, P. L., Linder, T. and Lugosi, G. (1998). The minimax distortion redundancy in empirical quantizer design. IEEE Trans. Inform. Theory 44 1802–1813.
  • [7] Bartlett, P. L. and Mendelson, S. (2006). Empirical minimization. Probab. Theory Related Fields 135 311–334.
  • [8] Biau, G., Devroye, L. and Lugosi, G. (2008). On the performance of clustering in Hilbert spaces. IEEE Trans. Inform. Theory 54 781–790.
  • [9] Boucheron, S., Bousquet, O. and Lugosi, G. (2005). Theory of classification: A survey of some recent advances. ESAIM Probab. Stat. 9 323–375.
  • [10] Boucheron, S., Lugosi, G. and Massart, P. (2013). Concentration Inequalities: A Nonasymptotic Theory of Independence, with a Foreword by Michel Ledoux. Oxford Univ. Press, Oxford.
  • [11] Bubeck, S., Cesa-Bianchi, N. and Lugosi, G. (2013). Bandits with heavy tail. IEEE Trans. Inform. Theory 59 7711–7717.
  • [12] Catoni, O. (2012). Challenging the empirical mean and empirical variance: A deviation study. Ann. Inst. Henri Poincaré Probab. Stat. 48 1148–1185.
  • [13] Dudley, R. M. (1978). Central limit theorems for empirical measures. Ann. Probab. 6 899–929.
  • [14] Embrechts, P., Klüppelberg, C. and Mikosch, T. (1997). Modelling Extremal Events for Insurance and Finance. Springer, Berlin.
  • [15] Fama, E. F. (1963). Mandelbrot and the stable Paretian hypothesis. The Journal of Business 36 420–429.
  • [16] Finkenstädt, B. and Rootzén, H. (2003). Extreme Values in Finance, Telecommunications and the Environment. Chapman & Hall, New York.
  • [17] Hsu, D. and Sabato, S. (2013). Approximate loss minimization with heavy tails. Preprint. Available at arXiv:1307.1827.
  • [18] Koltchinskii, V. (2006). Local Rademacher complexities and oracle inequalities in risk minimization. Ann. Statist. 34 2593–2656.
  • [19] Lerasle, M. and Oliveira, R. I. (2012). Robust empirical mean estimators. Manuscript.
  • [20] Levrard, C. (2013). Fast rates for empirical vector quantization. Electron. J. Stat. 7 1716–1746.
  • [21] Linder, T. (2002). Learning-theoretic methods in vector quantization. In Principles of Nonparametric Learning (Udine, 2001) (L. Györfi, ed.). CISM Courses and Lectures 434 163–210. Springer, Vienna.
  • [22] Mandelbrot, B. (1963). The variation of certain speculative prices. The Journal of Business 36 394–419.
  • [23] Massart, P. (2007). Concentration Inequalities and Model Selection. Lecture Notes in Math. 1896. Springer, Berlin.
  • [24] Matoušek, J. (2002). Lectures on Discrete Geometry. Graduate Texts in Mathematics 212. Springer, New York.
  • [25] Maurer, A. and Pontil, M. (2010). $K$-dimensional coding schemes in Hilbert spaces. IEEE Trans. Inform. Theory 56 5839–5846.
  • [26] Mendelson, S. (2014). Learning without concentration. Preprint. Available at arXiv:1401.0304.
  • [27] Minsker, S. (2015). Geometric median and robust estimation in Banach spaces. Bernoulli 21 2308–2335.
  • [28] Nemirovsky, A. S. and Yudin, D. B. (1983). Problem Complexity and Method Efficiency in Optimization. Wiley, New York.
  • [29] Pollard, D. (1981). Strong consistency of $k$-means clustering. Ann. Statist. 9 135–140.
  • [30] Pollard, D. (1982). A central limit theorem for $k$-means clustering. Ann. Probab. 10 919–926.
  • [31] Pollard, D. (1982). Quantization and the method of $k$-means. IEEE Trans. Inform. Theory 28 199–205.
  • [32] Talagrand, M. (2005). The Generic Chaining. Springer, Berlin.
  • [33] Telgarsky, M. and Dasgupta, S. (2013). Moment-based uniform deviation bounds for $k$-means and friends. Preprint. Available at arXiv:1311.1903.
  • [34] van de Geer, S. A. (2000). Applications of Empirical Process Theory. Cambridge Univ. Press, Cambridge.
  • [35] van der Vaart, A. W. and Wellner, J. A. (1996). Weak Convergence and Empirical Processes. Springer, New York.