Bernoulli

On optimality of empirical risk minimization in linear aggregation

Adrien Saumard

Abstract

In the first part of this paper, we show that the small-ball condition, recently introduced by Mendelson (J. ACM 62 (2015) Art. 21, 25), may behave poorly for important classes of localized functions such as wavelets, piecewise polynomials or trigonometric polynomials. In particular, it leads to suboptimal estimates of the rate of convergence of empirical risk minimization (ERM) for the linear aggregation problem. In the second part, we recover optimal rates of convergence for the excess risk of ERM when the dictionary is made of trigonometric functions. In the bounded case, we derive the concentration of the excess risk around a single point, which is far more precise information than the rate of convergence. Finally, in the general setting of $L_{2}$ noise, we refine the small-ball argument by carefully selecting the directions we look at, so that we obtain optimal rates of aggregation for the Fourier dictionary.
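The first claim can be illustrated numerically. Roughly, the small-ball condition asks for constants $\kappa, \delta > 0$ such that $P(|f(X)| \geq \kappa \|f\|_{L_{2}}) \geq \delta$ for every $f$ in the linear span of the dictionary.

The sketch below is not the paper's argument, and it rests on several assumptions:

- $X$ is uniform on $[0, 1)$, and its law is replaced by a fine midpoint grid.
- The two test functions are hypothetical choices made for illustration:
  - the indicator of the first of $D$ equal bins, a piecewise-constant and hence localized function;
  - the Dirichlet kernel of degree $n$, a trigonometric polynomial concentrated near $0$.

For both functions, the fraction of points where $|f|$ exceeds $\kappa \|f\|_{L_{2}}$ shrinks as the dimension grows. This suggests that no single $\delta$ works uniformly in the dimension.

```python
import numpy as np

def small_ball_fraction(f_values, kappa):
    """Fraction of points where |f(x)| >= kappa * ||f||_{L2}, with the L2 norm
    computed from the same points (an empirical small-ball probability)."""
    l2_norm = np.sqrt(np.mean(f_values ** 2))
    return float(np.mean(np.abs(f_values) >= kappa * l2_norm))

def histogram_atom(x, num_bins):
    """Hypothetical test function: indicator of [0, 1/num_bins), the first of
    num_bins equal bins of [0, 1) (localized, piecewise constant)."""
    return (x < 1.0 / num_bins).astype(float)

def dirichlet_kernel(x, degree):
    """Hypothetical test function: 1 + 2 * sum_{k=1}^{degree} cos(2*pi*k*x),
    a real trigonometric polynomial concentrated near x = 0 (mod 1)."""
    k = np.arange(1, degree + 1)
    return 1.0 + 2.0 * np.cos(2.0 * np.pi * np.outer(x, k)).sum(axis=1)

# Assumption: X ~ Uniform[0, 1), approximated by N equispaced midpoints.
# With N > 2 * degree, grid averages of squared Dirichlet kernels are exact.
N = 20_000
x = (np.arange(N) + 0.5) / N
kappa = 0.5

for num_bins in (10, 100, 1000):
    frac = small_ball_fraction(histogram_atom(x, num_bins), kappa)
    print("histogram atom, D =", num_bins, "->", frac)

for degree in (5, 50, 200):
    frac = small_ball_fraction(dirichlet_kernel(x, degree), kappa)
    print("Dirichlet kernel, n =", degree, "->", frac)
```

On this grid, the histogram fractions equal $1/D$ exactly. The bin indicator has $L_{2}$ norm $D^{-1/2}$, so with $\kappa = 0.5$ the only points above the threshold are the $N/D$ points inside the bin. The Dirichlet fractions also decrease as $n$ grows.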

Article information

Source
Bernoulli, Volume 24, Number 3 (2018), 2176-2203.

Dates
Received: October 2016
First available in Project Euclid: 2 February 2018

Permanent link to this document
https://projecteuclid.org/euclid.bj/1517540472

Digital Object Identifier
doi:10.3150/17-BEJ925

Mathematical Reviews number (MathSciNet)
MR3757527

Zentralblatt MATH identifier
06839264

Keywords
empirical risk minimization; excess risk’s concentration; linear aggregation; optimal rates; small-ball property

Citation

Saumard, Adrien. On optimality of empirical risk minimization in linear aggregation. Bernoulli 24 (2018), no. 3, 2176–2203. doi:10.3150/17-BEJ925. https://projecteuclid.org/euclid.bj/1517540472


References

  • [1] Arlot, S. and Massart, P. (2009). Data-driven calibration of penalties for least-squares regression. J. Mach. Learn. Res. 10 245–279 (electronic).
  • [2] Audibert, J.-Y. and Catoni, O. (2011). Robust linear least squares regression. Ann. Statist. 39 2766–2794.
  • [3] Bachman, G., Narici, L. and Beckenstein, E. (2000). Fourier and Wavelet Analysis. Universitext. New York: Springer.
  • [4] Barron, A., Birgé, L. and Massart, P. (1999). Risk bounds for model selection via penalization. Probab. Theory Related Fields 113 301–413.
  • [5] Boucheron, S., Lugosi, G. and Massart, P. (2013). Concentration Inequalities. A Nonasymptotic Theory of Independence. Oxford: Oxford Univ. Press.
  • [6] Bousquet, O. (2002). A Bennett concentration inequality and its application to suprema of empirical processes. C. R. Math. Acad. Sci. Paris 334 495–500.
  • [7] Bunea, F., Tsybakov, A.B. and Wegkamp, M.H. (2007). Aggregation for Gaussian regression. Ann. Statist. 35 1674–1697.
  • [8] Chatterjee, S. (2014). A new perspective on least squares under convex constraint. Ann. Statist. 42 2340–2381.
  • [9] Cohen, A., Daubechies, I. and Vial, P. (1993). Wavelets on the interval and fast wavelet transforms. Appl. Comput. Harmon. Anal. 1 54–81.
  • [10] de la Peña, V.H. and Giné, E. (1999). Decoupling. From Dependence to Independence, Randomly Stopped Processes. $U$-Statistics and Processes. Probability and Its Applications (New York). New York: Springer.
  • [11] Härdle, W., Kerkyacharian, G., Picard, D. and Tsybakov, A. (1998). Wavelets, Approximation, and Statistical Applications. Lecture Notes in Statistics 129. New York: Springer.
  • [12] Katznelson, Y. (2004). An Introduction to Harmonic Analysis, 3rd ed. Cambridge Mathematical Library. Cambridge: Cambridge Univ. Press.
  • [13] Klein, T. (2002). Une inégalité de concentration à gauche pour les processus empiriques. C. R. Math. Acad. Sci. Paris 334 501–504.
  • [14] Klein, T. and Rio, E. (2005). Concentration around the mean for maxima of empirical processes. Ann. Probab. 33 1060–1077.
  • [15] Koltchinskii, V. and Mendelson, S. (2015). Bounding the smallest singular value of a random matrix without concentration. Int. Math. Res. Not. IMRN 23 12991–13008.
  • [16] Lecué, G. and Mendelson, S. (2013). Learning subgaussian classes: Upper and minimax bounds. In Topics in Learning Theory – Société Mathématique de France (S. Boucheron and N. Vayatis, eds.). To appear.
  • [17] Lecué, G. and Mendelson, S. (2014). Sparse recovery under weak moment assumptions. J. Eur. Math. Soc. (JEMS). Technical report. To appear.
  • [18] Lecué, G. and Mendelson, S. (2016). Regularization and the small-ball method I: Sparse recovery. arXiv:1601.05584.
  • [19] Lecué, G. and Mendelson, S. (2016). Performance of empirical risk minimization in linear aggregation. Bernoulli 22 1520–1534.
  • [20] Massart, P. (2007). Concentration Inequalities and Model Selection. Lecture Notes in Math. 1896. Berlin: Springer.
  • [21] Mendelson, S. (2014). Learning without concentration for general loss functions. Technical report, Technion, Israel and ANU, Australia.
  • [22] Mendelson, S. (2014). A remark on the diameter of random sections of convex bodies. In Geometric Aspects of Functional Analysis. Lecture Notes in Math. 2116 395–404. Springer, Cham.
  • [23] Mendelson, S. (2015). Learning without concentration. J. ACM 62 Art. 21, 25.
  • [24] Muro, A. and van de Geer, S. (2015). Concentration behavior of the penalized least squares estimator. Preprint. Available at arXiv:1511.08698.
  • [25] Navarro, F. and Saumard, A. (2017). Slope heuristics and V-Fold model selection in heteroscedastic regression using strongly localized bases. ESAIM: Probability and Statistics, to appear.
  • [26] Nemirovski, A. (2000). Topics in non-parametric statistics. In Lectures on Probability Theory and Statistics. Lecture Notes in Math. 1738 85–277. Berlin: Springer.
  • [27] Rigollet, Ph. and Tsybakov, A.B. (2007). Linear and convex aggregation of density estimators. Math. Methods Statist. 16 260–280.
  • [28] Rio, E. (2001). Inégalités de concentration pour les processus empiriques de classes de parties. Probab. Theory Related Fields 119 163–175.
  • [29] Saumard, A. (2012). Optimal upper and lower bounds for the true and empirical excess risks in heteroscedastic least-squares regression. Electron. J. Stat. 6 579–655.
  • [30] Tsybakov, A.B. (1996). Introduction à l’estimation non-paramétrique. Berlin: Springer.
  • [31] Tsybakov, A.B. (2003). Optimal rates of aggregation. In Learning Theory and Kernel Machines 303–313. Springer.
  • [32] van de Geer, S. and Wainwright, M. (2016). On concentration for (regularized) empirical risk minimization. Preprint. Available at arXiv:1512.00677.