The Annals of Statistics

Local Rademacher complexities

Peter L. Bartlett, Olivier Bousquet, and Shahar Mendelson


We propose new bounds on the error of learning algorithms in terms of a data-dependent notion of complexity. The estimates we establish give optimal rates and are based on a local and empirical version of Rademacher averages, in the sense that the Rademacher averages are computed from the data, on a subset of functions with small empirical error. We present some applications to classification and prediction with convex function classes, and with kernel classes in particular.
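As a toy illustration (not from the paper), the localized quantity can be sketched for a finite function class represented by its values on the sample: draw Rademacher signs, take the supremum of the signed empirical average over only those functions whose empirical error is below a threshold `r`. The function names and the Monte Carlo estimation are assumptions of this sketch, not the paper's construction.

```python
import numpy as np

def empirical_rademacher(F_vals, n_draws=200, rng=None):
    """Monte Carlo estimate of the empirical Rademacher average of a
    finite class. F_vals has shape (num_functions, n): each row holds
    one function's values on the n sample points."""
    rng = rng or np.random.default_rng()
    _, n = F_vals.shape
    sups = np.empty(n_draws)
    for i in range(n_draws):
        sigma = rng.choice([-1.0, 1.0], size=n)   # Rademacher signs
        sups[i] = np.max(F_vals @ sigma) / n      # sup over the class
    return sups.mean()

def local_rademacher(F_vals, emp_errors, r, **kw):
    """Localized version: restrict to functions whose empirical error
    is at most r, then compute the Rademacher average of that subset."""
    mask = emp_errors <= r
    if not mask.any():
        return 0.0
    return empirical_rademacher(F_vals[mask], **kw)
```

Because the subset's supremum is dominated pointwise (for each sign vector) by the full class's supremum, the local average never exceeds the global one, which is what makes the localized bounds sharper.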

Article information

Ann. Statist., Volume 33, Number 4 (2005), 1497–1537.

First available in Project Euclid: 5 August 2005


Primary: 62G08 (nonparametric regression); 68Q32 (computational learning theory) [see also 68T05]

Keywords: error bounds; Rademacher averages; data-dependent complexity; concentration inequalities


Bartlett, Peter L.; Bousquet, Olivier; Mendelson, Shahar. Local Rademacher complexities. Ann. Statist. 33 (2005), no. 4, 1497--1537. doi:10.1214/009053605000000282.



  • Bartlett, P. L., Boucheron, S. and Lugosi, G. (2002). Model selection and error estimation. Machine Learning 48 85–113.
  • Bartlett, P. L., Jordan, M. I. and McAuliffe, J. D. (2005). Convexity, classification, and risk bounds. J. Amer. Statist. Assoc. To appear.
  • Bartlett, P. L. and Mendelson, S. (2002). Rademacher and Gaussian complexities: Risk bounds and structural results. J. Mach. Learn. Res. 3 463–482.
  • Bartlett, P. L. and Mendelson, S. (2003). Empirical minimization. Probab. Theory Related Fields. To appear. Available at bm-em-03.pdf.
  • Boucheron, S., Lugosi, G. and Massart, P. (2000). A sharp concentration inequality with applications. Random Structures Algorithms 16 277–292.
  • Boucheron, S., Lugosi, G. and Massart, P. (2003). Concentration inequalities using the entropy method. Ann. Probab. 31 1583–1614.
  • Bousquet, O. (2002). A Bennett concentration inequality and its application to suprema of empirical processes. C. R. Math. Acad. Sci. Paris 334 495–500.
  • Bousquet, O. (2003). Concentration inequalities for sub-additive functions using the entropy method. In Stochastic Inequalities and Applications (E. Giné, C. Houdré and D. Nualart, eds.) 213–247. Birkhäuser, Boston.
  • Bousquet, O., Koltchinskii, V. and Panchenko, D. (2002). Some local measures of complexity of convex hulls and generalization bounds. Computational Learning Theory. Lecture Notes in Artificial Intelligence 2375 59–73. Springer, Berlin.
  • Devroye, L., Györfi, L. and Lugosi, G. (1996). A Probabilistic Theory of Pattern Recognition. Springer, New York.
  • Dudley, R. M. (1999). Uniform Central Limit Theorems. Cambridge Univ. Press.
  • Györfi, L., Kohler, M., Krzyżak, A. and Walk, H. (2002). A Distribution-Free Theory of Nonparametric Regression. Springer, New York.
  • Haussler, D. (1992). Decision theoretic generalizations of the PAC model for neural net and other learning applications. Inform. and Comput. 100 78–150.
  • Haussler, D. (1995). Sphere packing numbers for subsets of the Boolean $n$-cube with bounded Vapnik–Chervonenkis dimension. J. Combin. Theory Ser. A 69 217–232.
  • Koltchinskii, V. (2001). Rademacher penalties and structural risk minimization. IEEE Trans. Inform. Theory 47 1902–1914.
  • Koltchinskii, V. and Panchenko, D. (2000). Rademacher processes and bounding the risk of function learning. In High Dimensional Probability II (E. Giné, D. M. Mason and J. A. Wellner, eds.) 443–459. Birkhäuser, Boston.
  • Ledoux, M. and Talagrand, M. (1991). Probability in Banach Spaces: Isoperimetry and Processes. Springer, New York.
  • Lee, W. S., Bartlett, P. L. and Williamson, R. C. (1998). The importance of convexity in learning with squared loss. IEEE Trans. Inform. Theory 44 1974–1980.
  • Lugosi, G. and Wegkamp, M. (2004). Complexity regularization via localized random penalties. Ann. Statist. 32 1679–1697.
  • Mammen, E. and Tsybakov, A. B. (1999). Smooth discrimination analysis. Ann. Statist. 27 1808–1829.
  • Massart, P. (2000). About the constants in Talagrand's concentration inequalities for empirical processes. Ann. Probab. 28 863–884.
  • Massart, P. (2000). Some applications of concentration inequalities to statistics. Probability theory. Ann. Fac. Sci. Toulouse Math. (6) 9 245–303.
  • McDiarmid, C. (1998). Concentration. In Probabilistic Methods for Algorithmic Discrete Mathematics (M. Habib, C. McDiarmid, J. Ramirez-Alfonsin and B. Reed, eds.) 195–248. Springer, New York.
  • Mendelson, S. (2002). Geometric parameters of kernel machines. Computational Learning Theory. Lecture Notes in Artificial Intelligence 2375 29–43. Springer, Berlin.
  • Mendelson, S. (2002). Rademacher averages and phase transitions in Glivenko–Cantelli classes. IEEE Trans. Inform. Theory 48 251–263.
  • Mendelson, S. (2002). Improving the sample complexity using global data. IEEE Trans. Inform. Theory 48 1977–1991.
  • Mendelson, S. (2003). A few notes on statistical learning theory. Advanced Lectures on Machine Learning. Lecture Notes in Comput. Sci. 2600 1–40. Springer, New York.
  • Pollard, D. (1984). Convergence of Stochastic Processes. Springer, Berlin.
  • Rio, E. (2001). Une inégalité de Bennett pour les maxima de processus empiriques. Ann. Inst. H. Poincaré Probab. Statist. 38 1053–1057.
  • Talagrand, M. (1994). Sharper bounds for Gaussian and empirical processes. Ann. Probab. 22 28–76.
  • van de Geer, S. (1987). A new approach to least-squares estimation, with applications. Ann. Statist. 15 587–602.
  • van de Geer, S. (2000). Empirical Processes in M-Estimation. Cambridge Univ. Press.
  • van der Vaart, A. (1998). Asymptotic Statistics. Cambridge Univ. Press.
  • van der Vaart, A. and Wellner, J. A. (1996). Weak Convergence and Empirical Processes: With Applications to Statistics. Springer, New York.
  • Vapnik, V. N. and Chervonenkis, A. Y. (1971). On the uniform convergence of relative frequencies of events to their probabilities. Theory Probab. Appl. 16 264–280.