Statistical Science

Support Vector Machines with Applications

Javier M. Moguerza and Alberto Muñoz


Abstract

Support vector machines (SVMs) appeared in the early nineties as optimal margin classifiers in the context of Vapnik's statistical learning theory. Since then, SVMs have been successfully applied to real-world data analysis problems, often providing improved results compared with other techniques. SVMs operate within the framework of regularization theory, minimizing an empirical risk in a well-posed and consistent way. A clear advantage of the support vector approach is that it usually yields sparse solutions to classification and regression problems: only a few samples are involved in determining the classification or regression function. This sparsity makes it easier to apply SVMs to problems that involve a large amount of data, such as text processing and bioinformatics tasks. This paper is intended as an introduction to SVMs and their applications, emphasizing their key features. In addition, some algorithmic extensions and illustrative real-world applications of SVMs are presented.
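To make the regularized-risk and sparsity claims above concrete, the standard formulation (our notation, not quoted from the paper) minimizes, over a reproducing kernel Hilbert space $\mathcal{H}_K$,

$$\min_{f \in \mathcal{H}_K} \; \frac{1}{n} \sum_{i=1}^{n} L\bigl(y_i, f(x_i)\bigr) + \lambda \|f\|_{\mathcal{H}_K}^{2},$$

where $L$ is the hinge loss $L(y, f(x)) = \max(0, 1 - y\,f(x))$ for classification and $\lambda > 0$ is the regularization parameter. The following minimal sketch illustrates the resulting sparsity; the choice of scikit-learn and the toy data are our assumptions, not anything prescribed by the paper:

    # Minimal sketch of SVM sparsity (assumes scikit-learn; the paper
    # itself does not prescribe any particular software).
    from sklearn.datasets import make_blobs
    from sklearn.svm import SVC

    # Toy binary classification task: two Gaussian clouds.
    X, y = make_blobs(n_samples=200, centers=2, cluster_std=1.5,
                      random_state=0)

    # A linear-kernel SVM minimizes the regularized hinge loss above;
    # C plays the role of 1/lambda, trading margin width against errors.
    clf = SVC(kernel="linear", C=1.0).fit(X, y)

    # Sparsity: only the support vectors enter the decision function.
    print("training samples:", len(X))
    print("support vectors: ", clf.support_vectors_.shape[0])

On well-separated data, only a small fraction of the 200 samples typically end up as support vectors, which is what makes the approach practical for large text-processing and bioinformatics problems.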

Article information

Source
Statist. Sci. Volume 21, Number 3 (2006), 322--336.

Dates
First available in Project Euclid: 20 December 2006

Permanent link to this document
https://projecteuclid.org/euclid.ss/1166642435

Digital Object Identifier
doi:10.1214/088342306000000493

Mathematical Reviews number (MathSciNet)
MR2339130

Zentralblatt MATH identifier
1246.68185

Keywords
Support vector machines; kernel methods; regularization theory; classification; inverse problems

Citation

Moguerza, Javier M.; Muñoz, Alberto. Support Vector Machines with Applications. Statist. Sci. 21 (2006), no. 3, 322--336. doi:10.1214/088342306000000493. https://projecteuclid.org/euclid.ss/1166642435



References

  • Aizerman, M. A., Braverman, E. M. and Rozonoer, L. I. (1964). Theoretical foundations of the potential function method in pattern recognition learning. Automat. Remote Control 25 821--837.
  • Amari, S.-I. (1985). Differential-Geometrical Methods in Statistics. Lecture Notes in Statist. 28. Springer, New York.
  • Aronszajn, N. (1950). Theory of reproducing kernels. Trans. Amer. Math. Soc. 68 337--404.
  • Aronszajn, N. (1951). Green's functions and reproducing kernels. In Proc. Symposium on Spectral Theory and Differential Problems 355--411.
  • Baeza-Yates, R. and Ribeiro-Neto, B. (1999). Modern Information Retrieval. Addison-Wesley, Harlow.
  • Baudat, G. and Anouar, F. (2000). Generalized discriminant analysis using a kernel approach. Neural Computation 12 2385--2404.
  • Bazaraa, M. S., Sherali, H. D. and Shetty, C. M. (1993). Nonlinear Programming: Theory and Algorithms, 2nd ed. Wiley, New York.
  • Ben-Hur, A., Horn, D., Siegelmann, H. and Vapnik, V. (2001). Support vector clustering. J. Mach. Learn. Res. 2 125--137.
  • Bennett, K. P. and Campbell, C. (2000). Support vector machines: Hype or hallelujah? SIGKDD Explorations 2 (2) 1--13.
  • Boser, B. E., Guyon, I. and Vapnik, V. (1992). A training algorithm for optimal margin classifiers. In Proc. Fifth ACM Workshop on Computational Learning Theory (COLT) 144--152. ACM Press, New York.
  • Bousquet, O. and Elisseeff, A. (2002). Stability and generalization. J. Mach. Learn. Res. 2 499--526.
  • Breiman, L. (2001). Statistical modeling: The two cultures (with discussion). Statist. Sci. 16 199--231.
  • Breiman, L., Friedman, J., Olshen, R. and Stone, C. (1984). Classification and Regression Trees. Wadsworth, Belmont, CA.
  • Chen, B.-J., Chang, M.-W. and Lin, C.-J. (2004). Load forecasting using support vector machines: A study on EUNITE competition 2001. IEEE Transactions on Power Systems 19 1821--1830.
  • Cortes, C. and Vapnik, V. (1995). Support-vector networks. Machine Learning 20 273--297.
  • Cover, T. M. (1965). Geometrical and statistical properties of systems of linear inequalities with applications in pattern recognition. IEEE Transactions on Electronic Computers 14 326--334.
  • Cover, T. M. and Hart, P. E. (1967). Nearest neighbor pattern classification. IEEE Trans. Inform. Theory 13 21--27.
  • Cox, D. and O'Sullivan, F. (1990). Asymptotic analysis of penalized likelihood and related estimators. Ann. Statist. 18 1676--1695.
  • Cressie, N. (1993). Statistics for Spatial Data. Wiley, New York.
  • Cristianini, N. and Shawe-Taylor, J. (2000). An Introduction to Support Vector Machines. Cambridge Univ. Press.
  • Cucker, F. and Smale, S. (2002). On the mathematical foundations of learning. Bull. Amer. Math. Soc. (N.S.) 39 1--49.
  • DeCoste, D. and Schölkopf, B. (2002). Training invariant support vector machines. Machine Learning 46 161--190.
  • Ding, C. and Dubchak, I. (2001). Multi-class protein fold recognition using support vector machines and neural networks. Bioinformatics 17 349--358.
  • Domingos, P. and Pazzani, M. (1997). On the optimality of the simple Bayesian classifier under zero-one loss. Machine Learning 29 103--130.
  • Dumais, S., Platt, J., Heckerman, D. and Sahami, M. (1998). Inductive learning algorithms and representations for text categorization. In Proc. 7th International Conference on Information and Knowledge Management 148--155. ACM Press, New York.
  • Furey, T. S., Cristianini, N., Duffy, N., Bednarski, D., Schummer, M. and Haussler, D. (2000). Support vector machine classification and validation of cancer tissue samples using microarray expression data. Bioinformatics 16 906--914.
  • Green, P. J. (1999). Penalized likelihood. Encyclopedia of Statistical Sciences Update 3 578--586. Wiley, New York.
  • Hastie, T., Tibshirani, R. and Friedman, J. (2001). The Elements of Statistical Learning. Springer, New York.
  • Heckerman, D., Geiger, D. and Chickering, D. (1995). Learning Bayesian networks: The combination of knowledge and statistical data. Machine Learning 20 197--243.
  • Herbrich, R. (2002). Learning Kernel Classifiers: Theory and Algorithms. MIT Press, Cambridge, MA.
  • Hua, S. and Sun, Z. (2001). Support vector machine approach for protein subcellular localization prediction. Bioinformatics 17 721--728.
  • Hua, S. and Sun, Z. (2001). A novel method of protein secondary structure prediction with high segment overlap measure: Support vector machine approach. J. Molecular Biology 308 397--407.
  • Ivanov, V. V. (1976). The Theory of Approximate Methods and their Application to the Numerical Solution of Singular Integral Equations. Noordhoff International, Leyden.
  • Joachims, T. (2002). Learning to Classify Text Using Support Vector Machines. Kluwer, Boston.
  • Kanwal, R. P. (1983). Generalized Functions. Academic Press, Orlando, FL.
  • Kimeldorf, G. S. and Wahba, G. (1970). A correspondence between Bayesian estimation on stochastic processes and smoothing by splines. Ann. Math. Statist. 41 495--502.
  • Kressel, U. (1999). Pairwise classification and support vector machines. In Advances in Kernel Methods---Support Vector Learning (B. Schölkopf, C. J. C. Burges and A. J. Smola, eds.) 255--268. MIT Press, Cambridge, MA.
  • Lanckriet, G. R. G., Cristianini, N., Bartlett, P., El Ghaoui, L. and Jordan, M. I. (2002). Learning the kernel matrix with semi-definite programming. In Proc. 19th International Conference on Machine Learning 323--330. Morgan Kaufmann, San Francisco.
  • LeCun, Y., Boser, B., Denker, J. S., Henderson, D., Howard, R. E., Hubbard, W. and Jackel, L. D. (1989). Backpropagation applied to handwritten zip code recognition. Neural Computation 1 541--551.
  • Lin, Y. (2002). Support vector machines and the Bayes rule in classification. Data Min. Knowl. Discov. 6 259--275.
  • Lin, Y., Wahba, G., Zhang, H. and Lee, Y. (2002). Statistical properties and adaptive tuning of support vector machines. Machine Learning 48 115--136.
  • Martin, I., Moguerza, J. M. and Muñoz, A. (2004). Combining kernel information for support vector classification. Multiple Classifier Systems. Lecture Notes in Comput. Sci. 3077 102--111. Springer, Berlin.
  • Mercer, J. (1909). Functions of positive and negative type and their connection with the theory of integral equations. Philos. Trans. Roy. Soc. London A 209 415--446.
  • Mika, S., Rätsch, G., Weston, J., Schölkopf, B. and Müller, K.-R. (1999). Fisher discriminant analysis with kernels. In Neural Networks for Signal Processing (Y.-H. Hu, J. Larsen, E. Wilson and S. Douglas, eds.) 41--48. IEEE Press, Piscataway, NJ.
  • Moghaddam, B. and Yang, M.-H. (2002). Learning gender with support faces. IEEE Trans. Pattern Analysis and Machine Intelligence 24 707--711.
  • Moguerza, J. M., Muñoz, A. and Martin-Merino, M. (2002). Detecting the number of clusters using a support vector machine approach. Proc. International Conference on Artificial Neural Networks. Lecture Notes in Comput. Sci. 2415 763--768. Springer, Berlin.
  • Moguerza, J. M. and Prieto, F. J. (2003). An augmented Lagrangian interior-point method using directions of negative curvature. Math. Program. Ser. A 95 573--616.
  • Mukherjee, S., Rifkin, R. and Poggio, T. (2003). Regression and classification with regularization. Nonlinear Estimation and Classification. Lecture Notes in Statist. 171 111--128. Springer, New York.
  • Mukherjee, S. and Vapnik, V. (1999). Multivariate density estimation: A support vector machine approach. Technical Report, AI Memo 1653, MIT AI Lab.
  • Müller, K.-R., Smola, A. J., Rätsch, G., Schölkopf, B., Kohlmorgen, J. and Vapnik, V. (1999). Using support vector machines for time series prediction. In Advances in Kernel Methods---Support Vector Learning (B. Schölkopf, C. J. C. Burges and A. J. Smola, eds.) 243--253. MIT Press, Cambridge, MA.
  • Müller, P. and Rios Insua, D. (1998). Issues in Bayesian analysis of neural network models. Neural Computation 10 749--770.
  • Muñoz, A., Martin, I. and Moguerza, J. M. (2003). Support vector machine classifiers for asymmetric proximities. Artificial Neural Networks and Neural Information. Lecture Notes in Comput. Sci. 2714 217--224. Springer, Berlin.
  • Muñoz, A. and Moguerza, J. M. (2003). Combining support vector machines and ARTMAP architectures for natural classification. Knowledge-Based Intelligent Information and Engineering Systems. Lecture Notes in Artificial Intelligence 2774 16--21. Springer, Berlin.
  • Muñoz, A. and Moguerza, J. M. (2006). Estimation of high-density regions using one-class neighbor machines. IEEE Trans. Pattern Analysis and Machine Intelligence 28 476--480.
  • O'Sullivan, F. (1986). A statistical perspective on ill-posed inverse problems (with discussion). Statist. Sci. 1 502--527.
  • Osuna, E., Freund, R. and Girosi, F. (1997). Training support vector machines: An application to face detection. In Proc. IEEE Conference on Computer Vision and Pattern Recognition 130--136. IEEE Press, New York.
  • Osuna, E., Freund, R. and Girosi, F. (1997). Support vector machines: Training and applications. CBCL Paper 144/AI Memo 1602, MIT AI Lab.
  • Osuna, E., Freund, R. and Girosi, F. (1997). An improved training algorithm for support vector machines. In Proc. IEEE Workshop on Neural Networks for Signal Processing 276--285. IEEE Press, New York.
  • Pavlidis, P., Weston, J., Cai, J. and Grundy, W. N. (2001). Gene functional classification from heterogeneous data. In Proc. Fifth Annual International Conference on Computational Biology 249--255. ACM Press, New York.
  • Phillips, D. L. (1962). A technique for the numerical solution of certain integral equations of the first kind. J. Assoc. Comput. Mach. 9 84--97.
  • Platt, J. C. (1999). Fast training of support vector machines using sequential minimal optimization. In Advances in Kernel Methods---Support Vector Learning (B. Schölkopf, C. J. C. Burges and A. J. Smola, eds.) 185--208. MIT Press, Cambridge, MA.
  • Platt, J. C. (2000). Probabilities for SV machines. In Advances in Large-Margin Classifiers (P. L. Bartlett, B. Schölkopf, D. Schuurmans and A. J. Smola, eds.) 61--74. MIT Press, Cambridge, MA.
  • Poggio, T. and Girosi, F. (1990). Networks for approximation and learning. Proc. IEEE 78 1481--1497.
  • Poggio, T., Mukherjee, S., Rifkin, R., Rakhlin, A. and Verri, A. (2001). b. CBCL Paper 198/AI Memo 2001-011, MIT AI Lab.
  • Ramsay, J. O. and Silverman, B. W. (1997). Functional Data Analysis. Springer, New York.
  • Rosipal, R. and Trejo, L. J. (2001). Kernel partial least squares regression in reproducing kernel Hilbert space. J. Mach. Learn. Res. 2 97--123.
  • Schölkopf, B., Herbrich, R., Smola, A. J. and Williamson, R. C. (2001). A generalized representer theorem. Lecture Notes in Artificial Intelligence 2111 416--426. Springer, Berlin.
  • Schölkopf, B., Platt, J. C., Shawe-Taylor, J., Smola, A. J. and Williamson, R. C. (2001). Estimating the support of a high-dimensional distribution. Neural Computation 13 1443--1471.
  • Schölkopf, B. and Smola, A. J. (2002). Learning with Kernels. MIT Press, Cambridge, MA.
  • Schölkopf, B., Smola, A. J. and Müller, K.-R. (1999). Kernel principal component analysis. In Advances in Kernel Methods---Support Vector Learning (B. Schölkopf, C. J. C. Burges and A. J. Smola, eds.) 327--352. MIT Press, Cambridge, MA.
  • Smola, A. J. and Schölkopf, B. (1998). A tutorial on support vector regression. NeuroCOLT2 Technical Report Series, NC2-TR-1998-030.
  • Sollich, P. (2002). Bayesian methods for support vector machines: Evidence and predictive class probabilities. Machine Learning 46 21--52.
  • Tikhonov, A. N. and Arsenin, V. Y. (1977). Solutions of Ill-Posed Problems. Wiley, New York.
  • Tipping, M. (2001). Sparse Bayesian learning and the relevance vector machine. J. Mach. Learn. Res. 1 211--244.
  • van Kampen, N. G. (1981). Stochastic Processes in Physics and Chemistry. North-Holland, Amsterdam.
  • Vapnik, V. (1995). The Nature of Statistical Learning Theory. Springer, New York.
  • Vapnik, V. (1998). Statistical Learning Theory. Wiley, New York.
  • Vapnik, V. and Chervonenkis, A. (1964). A note on a class of perceptrons. Automat. Remote Control 25 103--109.
  • Wahba, G. (1980). Spline bases, regularization, and generalized cross validation for solving approximation problems with large quantities of noisy data. In Approximation Theory III (W. Cheney, ed.) 905--912. Academic Press, New York.
  • Wahba, G. (1985). A comparison of GCV and GML for choosing the smoothing parameter in the generalized spline smoothing problem. Ann. Statist. 13 1378--1402.
  • Wahba, G. (1990). Spline Models for Observational Data. SIAM, Philadelphia.
  • Wahba, G. (1999). Support vector machines, reproducing kernel Hilbert spaces and the randomized GACV. In Advances in Kernel Methods---Support Vector Learning (B. Schölkopf, C. J. C. Burges and A. J. Smola, eds.) 69--88. MIT Press, Cambridge, MA.
  • Wu, S. and Amari, S.-I. (2002). Conformal transformation of kernel functions: A data-dependent way to improve support vector machine classifiers. Neural Processing Letters 15 59--67.