Electronic Journal of Statistics
- Electron. J. Statist.
- Volume 7 (2013), 1-42.
Optimal regression rates for SVMs using Gaussian kernels
Mona Eberts and Ingo Steinwart
Full-text: Open access
Abstract
Support vector machines (SVMs) using Gaussian kernels are among the standard, state-of-the-art learning algorithms. In this work, we establish new oracle inequalities for such SVMs when applied to either least squares or conditional quantile regression. With the help of these oracle inequalities we then derive learning rates that are (essentially) minimax optimal under standard smoothness assumptions on the target function. We further utilize the oracle inequalities to show that these learning rates can be adaptively achieved by a simple data-dependent parameter selection method that splits the data set into a training and a validation set.
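The parameter selection scheme described above can be sketched in code. The following is a minimal illustration only, not the paper's implementation: it uses kernel ridge regression (the least squares SVM without offset) with a Gaussian kernel, and the data, candidate kernel widths, and regularization grid are all made up for the sketch. The hyperparameter pair is chosen by empirical risk on a held-out validation half.

```python
import numpy as np

def gaussian_kernel(X1, X2, gamma):
    # K[i, j] = exp(-||x_i - x_j||^2 / gamma^2)
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / gamma ** 2)

def fit(X, y, gamma, lam):
    # Least squares SVM / kernel ridge: solve (K + lam * n * I) alpha = y
    n = len(X)
    K = gaussian_kernel(X, X, gamma)
    return np.linalg.solve(K + lam * n * np.eye(n), y)

def predict(X_train, alpha, gamma, X_new):
    return gaussian_kernel(X_new, X_train, gamma) @ alpha

# Synthetic regression data (illustrative only)
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 1))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.standard_normal(200)

# Split the data set into a training and a validation set
X_tr, y_tr = X[:100], y[:100]
X_val, y_val = X[100:], y[100:]

# Grid over kernel width gamma and regularization lam (hypothetical grids);
# pick the pair minimizing the validation mean squared error.
best = None
for gamma in [0.1, 0.5, 1.0, 2.0]:
    for lam in [1e-4, 1e-2, 1.0]:
        alpha = fit(X_tr, y_tr, gamma, lam)
        err = np.mean((predict(X_tr, alpha, gamma, X_val) - y_val) ** 2)
        if best is None or err < best[0]:
            best = (err, gamma, lam)

print("validation MSE, gamma, lambda:", best)
```

The theory in the paper shows that this simple split-and-validate rule adapts to the unknown smoothness of the target function: no a priori knowledge of the right kernel width or regularization strength is needed to attain the (essentially) minimax optimal rate.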
Article information
Source
Electron. J. Statist. Volume 7 (2013), 1-42.
Dates
First available in Project Euclid: 11 January 2013
Permanent link to this document
http://projecteuclid.org/euclid.ejs/1357913280
Digital Object Identifier
doi:10.1214/12-EJS760
Mathematical Reviews number (MathSciNet)
MR3020412
Zentralblatt MATH identifier
1337.62073
Subjects
Primary: 62G08: Nonparametric regression
Secondary: 62G05: Estimation; 68Q32: Computational learning theory [See also 68T05]; 68T05: Learning and adaptive systems [See also 68Q32, 91E40]
Keywords
Least squares regression; quantile estimation; support vector machines
Citation
Eberts, Mona; Steinwart, Ingo. Optimal regression rates for SVMs using Gaussian kernels. Electron. J. Statist. 7 (2013), 1--42. doi:10.1214/12-EJS760. http://projecteuclid.org/euclid.ejs/1357913280.
The Institute of Mathematical Statistics and the Bernoulli Society

