Open Access
Challenging the empirical mean and empirical variance: A deviation study
Olivier Catoni
Ann. Inst. H. Poincaré Probab. Statist. 48(4): 1148-1185 (November 2012). DOI: 10.1214/11-AIHP454
Abstract

We present new M-estimators of the mean and variance of real-valued random variables, based on PAC-Bayes bounds. We analyze the non-asymptotic minimax properties of the deviations of these estimators for sample distributions having either a bounded variance, or a bounded variance and a bounded kurtosis. Under these weak hypotheses, which allow for heavy-tailed distributions, we show that the worst-case deviations of the empirical mean are suboptimal. Indeed, we prove that for any confidence level there is an M-estimator whose deviations are of the same order as the deviations of the empirical mean of a Gaussian statistical sample, even when the statistical sample is instead heavy-tailed. Experiments reveal that these new estimators perform even better than our bounds predict: for non-Gaussian sample distributions as simple as a mixture of two Gaussian measures, their deviation quantile functions lie uniformly below that of the empirical mean at all probability levels.
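The mean estimators described above replace the empirical average with the root of a soft-truncated score equation. The sketch below is a simplified illustration, not the paper's exact prescription: the influence function `psi` is the widest choice compatible with Catoni's bounds −log(1 − x + x²/2) ≤ ψ(x) ≤ log(1 + x + x²/2), and the scale parameter `alpha` is left to the caller (a rough tuning of the form α ≈ √(2 log(1/δ)/(n v)), for a variance bound v and confidence level δ, is an assumption here).

```python
import math
import random

def psi(x):
    # Widest influence function satisfying Catoni's bounds: it behaves
    # like x near zero but grows only logarithmically, taming heavy tails.
    if x >= 0.0:
        return math.log1p(x + 0.5 * x * x)
    return -math.log1p(-x + 0.5 * x * x)

def catoni_mean(xs, alpha):
    """Solve sum_i psi(alpha * (x_i - theta)) = 0 for theta by bisection.

    The score is decreasing in theta, nonnegative at min(xs) and
    nonpositive at max(xs), so the root is bracketed by the sample range.
    """
    def score(theta):
        return sum(psi(alpha * (x - theta)) for x in xs)

    lo, hi = min(xs), max(xs)
    for _ in range(100):  # bisect to (numerically) full precision
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            lo = mid  # root lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)

random.seed(0)
sample = [random.gauss(0.0, 1.0) for _ in range(2000)]
print(catoni_mean(sample, alpha=0.05))
```

For small α the estimator stays close to the empirical mean; larger α trades a small bias for deviation tails of Gaussian order, which is the regime the bounds of the paper address.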


Copyright © 2012 Institut Henri Poincaré