Electronic Journal of Statistics

Classification via local multi-resolution projections

Jean-Baptiste Monnier

Full-text: Open access

Abstract

We focus on the supervised binary classification problem, which consists in guessing the label $Y$ associated with a co-variate $X\in\mathbb{R}^{d}$, given a set of $n$ independent and identically distributed co-variates and associated labels $(X_{i},Y_{i})$. We assume that the law of the random vector $(X,Y)$ is unknown and that the marginal law of $X$ admits a density supported on a set ${\mathcal{A}}$. In the particular case of plug-in classifiers, solving the classification problem boils down to estimating the regression function $\eta(X)=\mathbb {E}[Y|X]$. Assuming first that ${\mathcal{A}}$ is known, we show how to construct an estimator of $\eta$ by localized projections onto a multi-resolution analysis (MRA). In a second step, we show how this estimation procedure generalizes to the case where ${\mathcal{A}}$ is unknown. Interestingly, this novel estimation procedure achieves theoretical performance similar to that of the celebrated local-polynomial estimator (LPE). In addition, it benefits from the lattice structure of the underlying MRA and thus outperforms the LPE from a computational standpoint, which turns out to be a crucial feature in many practical applications. Finally, we prove that the associated plug-in classifier can reach super-fast rates under a margin assumption.
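
The plug-in principle mentioned in the abstract can be illustrated with a minimal sketch: estimate the regression function, then predict the label 1 wherever the estimate is at least 1/2. The code below is not the localized MRA projection estimator of the paper. It assumes, purely for illustration, a one-dimensional co-variate supported on [0, 1) and replaces the paper's estimator with plain cell averages over dyadic intervals at a fixed resolution. The function names `fit_dyadic_regressogram`, `plug_in_classifier` and `simulate`, as well as the toy data model, are hypothetical.

```python
import random


def fit_dyadic_regressogram(xs, ys, level):
    """Estimate eta(x) = E[Y | X = x] by averaging labels over dyadic cells.

    Assumption (illustrative only): X is one-dimensional with support [0, 1),
    and the resolution `level` is fixed in advance rather than chosen adaptively.
    """
    n_cells = 2 ** level
    sums = [0.0] * n_cells
    counts = [0] * n_cells
    for x, y in zip(xs, ys):
        k = min(max(int(x * n_cells), 0), n_cells - 1)  # dyadic cell containing x
        sums[k] += y
        counts[k] += 1
    # Empty cells fall back to 0.5, i.e. no evidence for either label.
    cell_means = [s / c if c > 0 else 0.5 for s, c in zip(sums, counts)]

    def eta_hat(x):
        k = min(max(int(x * n_cells), 0), n_cells - 1)
        return cell_means[k]

    return eta_hat


def plug_in_classifier(eta_hat):
    """Plug-in rule: predict 1 where the estimated regression function is >= 1/2."""
    return lambda x: 1 if eta_hat(x) >= 0.5 else 0


def simulate(n, rng):
    """Hypothetical toy data: eta(x) = 0.9 for x >= 0.5 and 0.1 otherwise."""
    xs = [rng.random() for _ in range(n)]
    ys = [1 if rng.random() < (0.9 if x >= 0.5 else 0.1) else 0 for x in xs]
    return xs, ys


rng = random.Random(0)
xs, ys = simulate(2000, rng)
classify = plug_in_classifier(fit_dyadic_regressogram(xs, ys, level=3))
```

Here `classify(x)` returns the predicted label for a new point `x`. The dyadic cells mimic the lattice structure that makes grid-based estimators cheap to evaluate, since locating the cell of a point is a single arithmetic operation. The smoothness adaptation, the treatment of an unknown support and the rates discussed in the abstract are outside the scope of this sketch.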

Article information

Source
Electron. J. Statist., Volume 6 (2012), 382-420.

Dates
First available in Project Euclid: 19 March 2012

Permanent link to this document
https://projecteuclid.org/euclid.ejs/1332162334

Digital Object Identifier
doi:10.1214/12-EJS677

Mathematical Reviews number (MathSciNet)
MR2988413

Zentralblatt MATH identifier
1274.62251

Subjects
Primary:
62G05: Estimation
62G08: Nonparametric regression
Secondary:
62H30: Classification and discrimination; cluster analysis [See also 68T10, 91C20]
62H12: Estimation

Keywords
Nonparametric regression; random design; multi-resolution analysis; supervised binary classification; margin assumption

Citation

Monnier, Jean-Baptiste. Classification via local multi-resolution projections. Electron. J. Statist. 6 (2012), 382-420. doi:10.1214/12-EJS677. https://projecteuclid.org/euclid.ejs/1332162334



References

  • [1] Antoniadis, A., Grégoire, G. and Vial, P. (1997). Random design wavelet curve smoothing. Statistics & Probability Letters 35 225-232.
  • [2] Antoniadis, A. and Pham, D. T. (1998). Wavelet regression for random or irregular design. Comput. Stat. Data An. 28 353-369.
  • [3] Audibert, J.-Y. (2004). Classification under polynomial entropy and margin assumptions and randomized estimators. Preprint, Laboratoire de Probabilités et Modèles Aléatoires, Université Paris 6 and 7.
  • [4] Audibert, J.-Y. and Tsybakov, A. B. (2007). Fast learning rates for plug-in classifiers. Ann. Stat. 35 608-633.
  • [5] Baraud, Y. (2002). Model selection for regression on a random design. ESAIM Probab. Statist. 6 127-146.
  • [6] Cai, T. T. and Brown, L. D. (1998). Wavelet shrinkage for nonequispaced samples. Ann. Stat. 26 1783-1799.
  • [7] Cohen, A. (2003). Numerical analysis of wavelet methods. Studies in mathematics and its applications 32. North-Holland.
  • [8] Daubechies, I. (1992). Ten lectures on wavelets. CBMS-NSF regional conference series in applied mathematics. Society for Industrial and Applied Mathematics.
  • [9] Delouille, V., Franke, J. and von Sachs, R. (2001). Nonparametric stochastic regression with design-adapted wavelets. The Indian Journal of Statistics 63 328-366.
  • [10] Delyon, B. and Juditsky, A. (1996). On minimax wavelet estimators. Appl. Comput. Harmon. Anal. 3 215-228.
  • [11] DeVore, R. A. and Lorentz, G. G. (1993). Constructive approximation. Grundlehren Der Mathematischen Wissenschaften. Springer-Verlag.
  • [12] Devroye, L., Györfi, L. and Lugosi, G. (1996). A probabilistic theory of pattern recognition. Springer-Verlag.
  • [13] Donoho, D. L. (1995). De-Noising by Soft-Thresholding. IEEE Trans. Inf. Theory 41(3) 613-627.
  • [14] Donoho, D. L. and Johnstone, I. M. (1994). Ideal spatial adaptation by wavelet shrinkage. Biometrika 81 425-455.
  • [15] Donoho, D. L., Johnstone, I. M., Kerkyacharian, G. and Picard, D. (1995). Wavelet Shrinkage: Asymptopia? J. Roy. Stat. Soc. B 57(2) 301-369.
  • [16] Donoho, D. L., Johnstone, I. M., Kerkyacharian, G. and Picard, D. (1996). Density estimation by wavelet thresholding. Ann. Stat. 24(2) 508-539.
  • [17] Gaïffas, S. (2005). Convergence rates for pointwise curve estimation with a degenerate design. Math. Methods Statist. 1 1-27.
  • [18] Gaïffas, S. (2007). On pointwise adaptive curve estimation based on inhomogeneous data. ESAIM Probab. Statist. 11 344-364.
  • [19] Gaïffas, S. (2007). Sharp estimation in sup norm with random design. Statistics & Probability Letters 77 782-794.
  • [20] Greblicki, W. and Pawlak, M. (1982). A classification procedure using the multiple Fourier series. Inf. Sci. 26 115-126.
  • [21] Greblicki, W. and Pawlak, M. (1985). Fourier and Hermite series estimates of regression functions. Ann. Inst. Statist. Math. 37 443-454.
  • [22] Györfi, L., Kohler, M., Krzyzak, A. and Walk, H. (2001). A distribution-free theory of nonparametric regression. Springer Series in Statistics. Springer.
  • [23] Hall, P. and Turlach, B. A. (1997). Interpolation methods for nonlinear wavelet regression with irregularly spaced design. Ann. Stat. 25 1912-1925.
  • [24] Härdle, W., Kerkyacharian, G., Picard, D. and Tsybakov, A. B. (1997). Wavelets, approximation and statistical applications. Springer Verlag, Berlin.
  • [25] Hastie, T., Tibshirani, R. and Friedman, J. (2001). The Elements of Statistical Learning: Data Mining, Inference and Prediction. Springer-Verlag, Berlin.
  • [26] Horn, R. A. and Johnson, C. R. (1990). Matrix analysis. Cambridge University Press.
  • [27] Kerkyacharian, G. and Picard, D. (1992). Density estimation in Besov spaces. Statistics & Probability Letters 13 15-24.
  • [28] Kerkyacharian, G. and Picard, D. (2004). Regression in random design and warped wavelets. Bernoulli 10 1053-1105.
  • [29] Kohler, M. (2003). Nonlinear orthogonal series estimates for random design regression. J. Stat. Plan. Infer. 115 491-520.
  • [30] Kohler, M. (2008). Multivariate orthogonal series estimates for random design regression. J. Stat. Plan. Infer. 138 3217-3237.
  • [31] Kovac, A. and Silverman, B. W. (2000). Extending the scope of wavelet regression methods by coefficient-dependent thresholding. J. Amer. Statistical Assoc. 95 172-183.
  • [32] Lepski, O. V., Mammen, E. and Spokoiny, V. G. (1997). Optimal spatial adaptation to inhomogeneous smoothness: an approach based on kernel estimates with variable bandwidth selectors. Ann. Stat. 25 929-947.
  • [33] Malgouyres, G. and Lemarié-Rieusset, P.-G. (1991). On the support of the scaling function in a multi-resolution analysis. Comptes rendus de l’Académie des sciences 313 377-380.
  • [34] Mallat, S. (1989). Multiresolution approximations and wavelet orthonormal bases of $L^2(\mathbb{R})$. Trans. Amer. Math. Soc. 315 69-87.
  • [35] Mallat, S. (2008). A wavelet tour of signal processing: the sparse way. Academic Press.
  • [36] Mammen, E. and Tsybakov, A. B. (1999). Smooth discrimination analysis. Ann. Stat. 27 1808-1829.
  • [37] Marron, J. S. (1983). Optimal rates of convergence to the Bayes risk in nonparametric discrimination. Ann. Stat. 11 1142-1155.
  • [38] Meyer, Y. (1992). Wavelets and operators. Cambridge Studies in Advanced Mathematics 37. Cambridge University Press.
  • [39] Monnier, J.-B. (2011). Classification via local multi-resolution projections. Technical Report, LPMA, Université Paris Diderot - Paris 7.
  • [40] Nadaraya, E. A. (1964). On estimating regression. Theory Probab. Appl. 9 141-142.
  • [41] Neumann, M. H. and Spokoiny, V. G. (1995). On the efficiency of wavelet estimators under arbitrary error distributions. Math. Methods Statist. 4 137-166.
  • [42] Pensky, M. and Vidakovic, B. (2001). On non-equally spaced wavelet regression. Ann. Inst. Statist. Math. 53 681-690.
  • [43] Picard, D. and Tribouley, K. (2000). Adaptive confidence interval for pointwise curve estimation. Ann. Stat. 28 298-335.
  • [44] Sardy, S., Percival, D. B., Bruce, A. G., Gao, H.-Y. and Stuetzle, W. (1999). Wavelet shrinkage for unequally spaced data. Statistics and Computing 9 65-75.
  • [45] Stone, C. J. (1980). Optimal rates of convergence for nonparametric estimators. Ann. Stat. 8 1348-1360.
  • [46] Stone, C. J. (1982). Optimal global rates of convergence for nonparametric regression. Ann. Stat. 10 1040-1053.
  • [47] Sweldens, W. (1996). The lifting scheme: a custom-design construction of biorthogonal wavelets. Appl. Comput. Harmon. Anal. 3 186-200.
  • [48] Vapnik, V. N. (1998). Statistical learning theory. Adaptive and Learning Systems for Signal Processing, Communications, and Control. John Wiley & Sons.
  • [49] Watson, G. S. (1964). Smooth regression analysis. The Indian Journal of Statistics 26 359-372.
  • [50] Yang, Y. (1999). Minimax nonparametric classification. I. Rates of convergence. IEEE Trans. Inf. Theory 45 2271-2284.
  • [51] Zhang, S., Wong, M.-Y. and Zheng, Z. (2002). Wavelet threshold estimation of a regression function with random design. J. Multivariate Anal. 80 256-284.