The Annals of Statistics

Fast rates for support vector machines using Gaussian kernels

Ingo Steinwart and Clint Scovel

Abstract

For binary classification we establish learning rates up to the order of $n^{-1}$ for support vector machines (SVMs) with hinge loss and Gaussian RBF kernels. These rates are in terms of two assumptions on the considered distributions: Tsybakov’s noise assumption to establish a small estimation error, and a new geometric noise condition which is used to bound the approximation error. Unlike previously proposed concepts for bounding the approximation error, the geometric noise assumption does not employ any smoothness assumption.
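
In standard notation (a sketch consistent with the abstract; the exact definitions, constants and parameter schedules are developed in the paper itself), the classifier under study is the regularized empirical risk minimizer over the reproducing kernel Hilbert space $H_\sigma$ of the Gaussian RBF kernel, here in one common parametrization $k_\sigma(x, x') = \exp(-\sigma^2 \|x - x'\|^2)$ with inverse-width parameter $\sigma$, using the hinge loss $L(y, t) = \max\{0, 1 - yt\}$:

$$f_{D,\lambda} = \operatorname*{arg\,min}_{f \in H_\sigma} \; \lambda \|f\|_{H_\sigma}^2 + \frac{1}{n} \sum_{i=1}^n L\bigl(y_i, f(x_i)\bigr).$$

Tsybakov’s noise assumption with exponent $q \in [0, \infty]$ asks that the conditional probability $\eta(x) = P(y = 1 \mid x)$ rarely be close to the critical level $1/2$:

$$P_X\bigl(\{x : |2\eta(x) - 1| \le t\}\bigr) \le C t^q \qquad \text{for all } t > 0.$$

The geometric noise condition plays the complementary role: rather than assuming smoothness of $\eta$, it controls how much low-margin mass (small $|2\eta - 1|$) concentrates near the decision boundary. The stated rates up to $n^{-1}$ are then obtained by balancing estimation and approximation error through suitable choices of the regularization parameter $\lambda = \lambda_n$ and the kernel width $\sigma = \sigma_n$.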

Article information

Source
Ann. Statist., Volume 35, Number 2 (2007), 575--607.

Dates
First available in Project Euclid: 5 July 2007

Permanent link to this document
https://projecteuclid.org/euclid.aos/1183667285

Digital Object Identifier
doi:10.1214/009053606000001226

Mathematical Reviews number (MathSciNet)
MR2336860

Zentralblatt MATH identifier
1127.68091

Subjects
Primary: 68Q32: Computational learning theory [See also 68T05]
Secondary:
62G20: Asymptotic properties
62G99: None of the above, but in this section
68T05: Learning and adaptive systems [See also 68Q32, 91E40]
68T10: Pattern recognition, speech recognition {For cluster analysis, see 62H30}
41A46: Approximation by arbitrary nonlinear expressions; widths and entropy
41A99: None of the above, but in this section

Keywords
Support vector machines; classification; nonlinear discrimination; learning rates; noise assumption; Gaussian RBF kernels

Citation

Steinwart, Ingo; Scovel, Clint. Fast rates for support vector machines using Gaussian kernels. Ann. Statist. 35 (2007), no. 2, 575--607. doi:10.1214/009053606000001226. https://projecteuclid.org/euclid.aos/1183667285


References

  • Aronszajn, N. (1950). Theory of reproducing kernels. Trans. Amer. Math. Soc. 68 337--404.
  • Bartlett, P. L., Bousquet, O. and Mendelson, S. (2005). Local Rademacher complexities. Ann. Statist. 33 1497--1537.
  • Bartlett, P. L., Jordan, M. I. and McAuliffe, J. D. (2006). Convexity, classification and risk bounds. J. Amer. Statist. Assoc. 101 138--156.
  • Bartlett, P. L. and Mendelson, S. (2002). Rademacher and Gaussian complexities: Risk bounds and structural results. J. Mach. Learn. Res. 3 463--482.
  • Bennett, C. and Sharpley, R. (1988). Interpolation of Operators. Academic Press, Boston.
  • Berg, C., Christensen, J. P. R. and Ressel, P. (1984). Harmonic Analysis on Semigroups. Springer, New York.
  • Bergh, J. and Löfström, J. (1976). Interpolation Spaces. An Introduction. Springer, Berlin.
  • Birman, M. Š. and Solomjak, M. Z. (1967). Piecewise polynomial approximations of functions of the classes $W^\alpha_p$. Mat. Sb. (N.S.) 73 331--355.
  • Bousquet, O. (2002). A Bennett concentration inequality and its application to suprema of empirical processes. C. R. Math. Acad. Sci. Paris 334 495--500.
  • Carl, B. and Stephani, I. (1990). Entropy, Compactness and the Approximation of Operators. Cambridge Univ. Press.
  • Courant, R. and Hilbert, D. (1953). Methods of Mathematical Physics 1. Interscience, New York.
  • Cristianini, N. and Shawe-Taylor, J. (2000). An Introduction to Support Vector Machines: And Other Kernel-Based Learning Methods. Cambridge Univ. Press.
  • Cucker, F. and Smale, S. (2002). On the mathematical foundations of learning. Bull. Amer. Math. Soc. (N.S.) 39 1--49 (electronic).
  • Devroye, L., Györfi, L. and Lugosi, G. (1996). A Probabilistic Theory of Pattern Recognition. Springer, New York.
  • Howse, J., Hush, D. and Scovel, C. (2002). Linking learning strategies and performance for support vector machines. Available at www.c3.lanl.gov/ml/pubs_ml.shtml.
  • Klein, T. (2002). Une inégalité de concentration à gauche pour les processus empiriques. C. R. Math. Acad. Sci. Paris 334 501--504.
  • Ledoux, M. and Talagrand, M. (1991). Probability in Banach Spaces. Isoperimetry and Processes. Springer, Berlin.
  • Mammen, E. and Tsybakov, A. B. (1999). Smooth discrimination analysis. Ann. Statist. 27 1808--1829.
  • Marron, J. S. (1983). Optimal rates of convergence to Bayes risk in nonparametric discrimination. Ann. Statist. 11 1142--1155.
  • Massart, P. (2000). About the constants in Talagrand's concentration inequalities for empirical processes. Ann. Probab. 28 863--884.
  • Mendelson, S. (2002). Improving the sample complexity using global data. IEEE Trans. Inform. Theory 48 1977--1991.
  • O'Neil, R. (1963). Convolution operators and $L(p,q)$ spaces. Duke Math. J. 30 129--142.
  • Pietsch, A. (1980). Operator Ideals. North-Holland, Amsterdam.
  • Pietsch, A. (1987). Eigenvalues and $s$-Numbers. Geest und Portig, Leipzig.
  • Rio, E. (2001). Inégalités de concentration pour les processus empiriques de classes de parties. Probab. Theory Related Fields 119 163--175.
  • Schölkopf, B. and Smola, A. (2002). Learning with Kernels: Support Vector Machines, Regularization, Optimization and Beyond. MIT Press, Cambridge, MA.
  • Smale, S. and Zhou, D.-X. (2003). Estimating the approximation error in learning theory. Anal. Appl. (Singap.) 1 17--41.
  • Steinwart, I. (2002). On the influence of the kernel on the consistency of support vector machines. J. Mach. Learn. Res. 2 67--93.
  • Steinwart, I. (2002). Support vector machines are universally consistent. J. Complexity 18 768--791.
  • Steinwart, I. (2004). Sparseness of support vector machines. J. Mach. Learn. Res. 4 1071--1105.
  • Steinwart, I. (2005). Consistency of support vector machines and other regularized kernel classifiers. IEEE Trans. Inform. Theory 51 128--142.
  • Steinwart, I., Hush, D. and Scovel, C. (2006). An explicit description of the reproducing kernel Hilbert spaces of Gaussian RBF kernels. IEEE Trans. Inform. Theory 52 4635--4643.
  • Talagrand, M. (1994). Sharper bounds for Gaussian and empirical processes. Ann. Probab. 22 28--76.
  • Triebel, H. (1978). Interpolation Theory, Function Spaces, Differential Operators. North-Holland, Amsterdam.
  • Tsybakov, A. B. (2004). Optimal aggregation of classifiers in statistical learning. Ann. Statist. 32 135--166.
  • van der Vaart, A. W. and Wellner, J. A. (1996). Weak Convergence and Empirical Processes. With Applications to Statistics. Springer, New York.
  • Wu, Q. and Zhou, D.-X. (2003). Analysis of support vector machine classification. Technical report, City Univ. Hong Kong.
  • Yang, Y. (1999). Minimax nonparametric classification. I. Rates of convergence. IEEE Trans. Inform. Theory 45 2271--2284.
  • Zhang, T. (2004). Statistical behavior and consistency of classification methods based on convex risk minimization. Ann. Statist. 32 56--85.