Electronic Journal of Statistics

Optimal regression rates for SVMs using Gaussian kernels

Mona Eberts and Ingo Steinwart


Abstract

Support vector machines (SVMs) using Gaussian kernels are one of the standard and state-of-the-art learning algorithms. In this work, we establish new oracle inequalities for such SVMs when applied to either least squares or conditional quantile regression. With the help of these oracle inequalities we then derive learning rates that are (essentially) minimax optimal under standard smoothness assumptions on the target function. We further utilize the oracle inequalities to show that these learning rates can be adaptively achieved by a simple data-dependent parameter selection method that splits the data set into a training and a validation set.

Article information

Source
Electron. J. Statist., Volume 7 (2013), 1–42.

Dates
First available in Project Euclid: 11 January 2013

Permanent link to this document
https://projecteuclid.org/euclid.ejs/1357913280

Digital Object Identifier
doi:10.1214/12-EJS760

Mathematical Reviews number (MathSciNet)
MR3020412

Zentralblatt MATH identifier
1337.62073

Subjects
Primary: 62G08: Nonparametric regression
Secondary: 62G05: Estimation; 68Q32: Computational learning theory [See also 68T05]; 68T05: Learning and adaptive systems [See also 68Q32, 91E40]

Keywords
Least squares regression; quantile estimation; support vector machines

Citation

Eberts, Mona; Steinwart, Ingo. Optimal regression rates for SVMs using Gaussian kernels. Electron. J. Statist. 7 (2013), 1–42. doi:10.1214/12-EJS760. https://projecteuclid.org/euclid.ejs/1357913280


