Bernoulli

Minimax fast rates for discriminant analysis with errors in variables

Sébastien Loustau and Clément Marteau


Abstract

The effect of measurement errors in discriminant analysis is investigated. Given observations $Z=X+\varepsilon$, where $\varepsilon$ denotes a random noise, the goal is to decide which of two possible candidates, $f$ or $g$, is the density of $X$. We suppose that we have at our disposal two learning samples, one drawn from each candidate density. The aim is to approach the best possible decision rule $G^{\star}$, defined as a minimizer of the Bayes risk.
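
For orientation, a decision rule here is a set $G\subset\mathbb{R}^{d}$ (classify an observation as drawn from $f$ when it falls in $G$). In the smooth discrimination framework of Mammen and Tsybakov [26], and assuming balanced classes (a convention not stated explicitly in this abstract), the risk and the Bayes rule take the standard form

    $$R(G)=\frac{1}{2}\left(\int_{G^{c}}f(x)\,dx+\int_{G}g(x)\,dx\right),\qquad G^{\star}=\{x: f(x)\ge g(x)\},$$

so that the excess risk of any candidate $G$ is carried entirely by the symmetric difference with $G^{\star}$:

    $$R(G)-R(G^{\star})=\frac{1}{2}\int_{G\,\Delta\,G^{\star}}\lvert f(x)-g(x)\rvert\,dx.$$

The margin assumption of [26] controls how much mass $\lvert f-g\rvert$ places near the decision boundary, and this is what drives the fast rates discussed below.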

In the noise-free case ($\varepsilon=0$), minimax fast rates of convergence are well known under the margin assumption, both in discriminant analysis (see (Ann. Statist. 27 (1999) 1808–1829)) and in the more general classification framework (see (Ann. Statist. 35 (2007) 608–633, Ann. Statist. 32 (2004) 135–166)). In this paper, we establish similar results in the noisy case, that is, when dealing with errors in variables. We prove minimax lower bounds for this problem and explain how these rates can be attained, using in particular an Empirical Risk Minimizer (ERM) method based on deconvolution kernel estimators.
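
The key ingredient in the noisy case is a deconvolution kernel density estimator: the empirical characteristic function of the noisy sample is divided by the (known) characteristic function of the noise before inverting the Fourier transform. The sketch below illustrates only this ingredient with a simple plug-in rule $\hat{f}\ge\hat{g}$; it is not the paper's ERM procedure (which minimizes a deconvoluted empirical risk over classes of sets), and the densities, noise level and bandwidth are arbitrary choices for illustration.

    import numpy as np

    def deconv_kde(z, grid, h, noise_cf):
        """Deconvolution kernel density estimate (sinc kernel).

        z        : 1-D array of noisy observations Z_i = X_i + eps_i
        grid     : points at which to evaluate the estimate
        h        : bandwidth; frequencies |t| <= 1/h are kept
        noise_cf : characteristic function of the noise, t -> E[exp(i t eps)]
        """
        t = np.linspace(-1.0 / h, 1.0 / h, 512)          # frequency grid
        ecf = np.exp(1j * np.outer(t, z)).mean(axis=1)   # empirical c.f. of Z
        ft = ecf / noise_cf(t)                           # deconvolution step
        dt = t[1] - t[0]
        # inverse Fourier transform, approximated by a Riemann sum
        est = (np.exp(-1j * np.outer(grid, t)) @ ft).real * dt / (2.0 * np.pi)
        return np.maximum(est, 0.0)                      # a density is nonnegative

    rng = np.random.default_rng(0)
    n, sigma, h = 2000, 0.3, 0.25
    # Laplace noise has a polynomially decaying c.f. (mildly ill-posed case)
    laplace_cf = lambda t: 1.0 / (1.0 + (sigma * t) ** 2)

    # two noisy learning samples, one per candidate density
    z_f = rng.normal(-1.0, 1.0, n) + rng.laplace(0.0, sigma, n)
    z_g = rng.normal(+1.0, 1.0, n) + rng.laplace(0.0, sigma, n)

    grid = np.linspace(-5.0, 5.0, 201)
    f_hat = deconv_kde(z_f, grid, h, laplace_cf)
    g_hat = deconv_kde(z_g, grid, h, laplace_cf)
    g_star_hat = f_hat >= g_hat   # plug-in estimate of the Bayes set G*

With supersmooth noise (e.g., Gaussian), the division by noise_cf amplifies high frequencies much faster, which is why attainable rates degrade with the smoothness of the noise distribution.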

Article information

Source
Bernoulli, Volume 21, Number 1 (2015), 176–208.

Dates
First available in Project Euclid: 17 March 2015

Permanent link to this document
https://projecteuclid.org/euclid.bj/1426597067

Digital Object Identifier
doi:10.3150/13-BEJ564

Mathematical Reviews number (MathSciNet)
MR3322316

Zentralblatt MATH identifier
06436791

Keywords
classification; deconvolution; minimax theory; fast rates

Citation

Loustau, Sébastien; Marteau, Clément. Minimax fast rates for discriminant analysis with errors in variables. Bernoulli 21 (2015), no. 1, 176–208. doi:10.3150/13-BEJ564. https://projecteuclid.org/euclid.bj/1426597067



References

  • [1] Audibert, J.-Y. (2004). Classification under polynomial entropy and margin assumptions and randomized estimators. Preprint, Laboratoire de Probabilités et Modèles Aléatoires, Univ. Paris VI and VII.
  • [2] Audibert, J.-Y. and Tsybakov, A.B. (2007). Fast learning rates for plug-in classifiers. Ann. Statist. 35 608–633.
  • [3] Bartlett, P.L., Boucheron, S. and Lugosi, G. (2002). Model selection and error estimation. Machine Learning 48 85–113.
  • [4] Bartlett, P.L., Bousquet, O. and Mendelson, S. (2005). Local Rademacher complexities. Ann. Statist. 33 1497–1537.
  • [5] Bartlett, P.L. and Mendelson, S. (2006). Empirical minimization. Probab. Theory Related Fields 135 311–334.
  • [6] Bickel, P.J. and Ritov, Y. (2003). Nonparametric estimators which can be “plugged-in.” Ann. Statist. 31 1033–1053.
  • [7] Boucheron, S., Bousquet, O. and Lugosi, G. (2005). Theory of classification: A survey of some recent advances. ESAIM Probab. Stat. 9 323–375.
  • [8] Butucea, C. (2007). Goodness-of-fit testing and quadratic functional estimation from indirect observations. Ann. Statist. 35 1907–1930.
  • [9] Carroll, R.J., Delaigle, A. and Hall, P. (2009). Nonparametric prediction in measurement error models. J. Amer. Statist. Assoc. 104 993–1003.
  • [10] Chapelle, O., Weston, J., Bottou, L. and Vapnik, V. (2001). Vicinal risk minimization. In Advances in Neural Information Processing Systems 416–422. Cambridge, MA: MIT Press.
  • [11] Delaigle, A. and Gijbels, I. (2006). Estimation of boundary and discontinuity points in deconvolution problems. Statist. Sinica 16 773–788.
  • [12] Delaigle, A., Hall, P. and Meister, A. (2008). On deconvolution with repeated measurements. Ann. Statist. 36 665–685.
  • [13] Devroye, L., Györfi, L. and Lugosi, G. (1996). A Probabilistic Theory of Pattern Recognition. Applications of Mathematics (New York) 31. New York: Springer.
  • [14] Engl, H.W., Hanke, M. and Neubauer, A. (2000). Regularization of Inverse Problems. Dordrecht: Kluwer Academic Publishers Group.
  • [15] Fan, J. (1991). On the optimal rates of convergence for nonparametric deconvolution problems. Ann. Statist. 19 1257–1272.
  • [16] Fan, J. and Truong, Y.K. (1993). Nonparametric regression with errors in variables. Ann. Statist. 21 1900–1925.
  • [17] Genovese, C.R., Perone-Pacifico, M., Verdinelli, I. and Wasserman, L. (2012). Minimax manifold estimation. J. Mach. Learn. Res. 13 1263–1291.
  • [18] Goldstein, L. and Messer, K. (1992). Optimal plug-in estimators for nonparametric functional estimation. Ann. Statist. 20 1306–1328.
  • [19] Klemelä, J. and Mammen, E. (2010). Empirical risk minimization in inverse problems. Ann. Statist. 38 482–511.
  • [20] Koltchinskii, V. (2006). Local Rademacher complexities and oracle inequalities in risk minimization. Ann. Statist. 34 2593–2656.
  • [21] Korostelëv, A.P. and Tsybakov, A.B. (1993). Minimax Theory of Image Reconstruction. Lecture Notes in Statistics 82. New York: Springer.
  • [22] Laurent, B., Loubes, J.-M. and Marteau, C. (2011). Testing inverse problems: A direct or an indirect problem? J. Statist. Plann. Inference 141 1849–1861.
  • [23] Loubes, J.-M. and Marteau, C. (2014). Goodness-of-fit strategies from indirect observations. J. Nonparametr. Stat. To appear.
  • [24] Loustau, S. (2009). Penalized empirical risk minimization over Besov spaces. Electron. J. Stat. 3 824–850.
  • [25] Mallat, S. (2000). Une Exploration des Signaux en Ondelettes. Paris: Éditions de l’École Polytechnique, Ellipses diffusion.
  • [26] Mammen, E. and Tsybakov, A.B. (1999). Smooth discrimination analysis. Ann. Statist. 27 1808–1829.
  • [27] Massart, P. and Nédélec, É. (2006). Risk bounds for statistical learning. Ann. Statist. 34 2326–2366.
  • [28] Meister, A. (2009). Deconvolution Problems in Nonparametric Statistics. Lecture Notes in Statistics 193. Berlin: Springer.
  • [29] Mendelson, S. (2004). On the performance of kernel classes. J. Mach. Learn. Res. 4 759–771.
  • [30] Tsybakov, A.B. (2004). Optimal aggregation of classifiers in statistical learning. Ann. Statist. 32 135–166.
  • [31] Tsybakov, A.B. and van de Geer, S.A. (2005). Square root penalty: Adaptation to the margin in classification and in edge estimation. Ann. Statist. 33 1203–1224.
  • [32] van de Geer, S.A. (2000). Empirical Processes in M-estimation. Cambridge: Cambridge Univ. Press.
  • [33] van der Vaart, A.W. and Wellner, J.A. (1996). Weak Convergence and Empirical Processes: With Applications to Statistics. Springer Series in Statistics. New York: Springer.
  • [34] Vapnik, V.N. (2000). The Nature of Statistical Learning Theory, 2nd ed. Statistics for Engineering and Information Science. New York: Springer.
  • [35] Yang, Y. (1999). Minimax nonparametric classification. I. Rates of convergence. IEEE Trans. Inform. Theory 45 2271–2284.