The Annals of Statistics
Ann. Statist., Volume 36, Number 2 (2008), 489–531.
Statistical performance of support vector machines
Gilles Blanchard, Olivier Bousquet, and Pascal Massart
Full-text: Open access
Abstract
The support vector machine (SVM) algorithm is well known to the machine learning community for its very good practical results. The goal of the present paper is to study this algorithm from a statistical perspective, using tools from concentration theory and empirical processes.
Our main result builds on the observation, made by other authors, that the SVM can be viewed as a statistical regularization procedure. From this point of view, it can also be interpreted as a model selection principle based on a penalized criterion. It is then possible to adapt general model selection methods to this framework in order to study two important points: (1) what is the minimal penalty, and how does it compare to the penalty actually used in the SVM algorithm? (2) is it possible to obtain “oracle inequalities” in this setting, for the specific loss function used in the SVM algorithm? We show that the answer to the latter question is positive, and that it provides relevant insight into the former. Our results show that fast rates of convergence can be obtained for SVMs.
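To make the penalized-criterion viewpoint concrete: the (linear, soft-margin) SVM minimizes an empirical hinge-loss risk plus a squared-norm penalty. The sketch below is a minimal illustration of that criterion, not the paper's own procedure; the function name `svm_subgradient` and all hyperparameter values are assumptions chosen for the example, and plain full-batch subgradient descent stands in for a proper solver.

```python
import numpy as np

def svm_subgradient(X, y, lam=0.1, lr=0.1, epochs=200):
    """Minimize the penalized criterion
        (1/n) * sum_i max(0, 1 - y_i * <w, x_i>) + lam * ||w||^2
    (hinge loss plus ridge penalty) by full-batch subgradient descent.
    Illustrative only: lam, lr, epochs are arbitrary example values."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        margins = y * (X @ w)
        active = margins < 1  # points violating the margin contribute a subgradient
        grad = -(y[active, None] * X[active]).sum(axis=0) / n + 2 * lam * w
        w -= lr * grad
    return w

# Toy linearly separable data in the plane.
X = np.array([[2.0, 1.0], [1.5, 2.0], [-2.0, -1.0], [-1.0, -2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w = svm_subgradient(X, y)
print(np.sign(X @ w))  # predicted labels
```

The penalty weight `lam` plays the role of the regularization parameter whose minimal admissible size the paper's analysis addresses: too small a penalty loses the oracle-inequality guarantee, while the SVM's actual penalty is shown to be of the right order.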
Article information
Source
Ann. Statist. Volume 36, Number 2 (2008), 489-531.
Dates
First available in Project Euclid: 13 March 2008
Permanent link to this document
http://projecteuclid.org/euclid.aos/1205420509
Digital Object Identifier
doi:10.1214/009053607000000839
Mathematical Reviews number (MathSciNet)
MR2396805
Zentralblatt MATH identifier
1133.62044
Subjects
Primary: 62G05 (Estimation); 62G20 (Asymptotic properties)
Keywords
Classification; support vector machine; model selection; oracle inequality
Citation
Blanchard, Gilles; Bousquet, Olivier; Massart, Pascal. Statistical performance of support vector machines. Ann. Statist. 36 (2008), no. 2, 489--531. doi:10.1214/009053607000000839. http://projecteuclid.org/euclid.aos/1205420509.

