
Classifiers of support vector machine type with ℓ1 complexity regularization

Bernadetta Tarigan and Sara A. van de Geer

Full-text: Open access

Abstract

We study the binary classification problem with hinge loss. We consider classifiers that are linear combinations of base functions. Instead of an ℓ2 penalty, which is used by the support vector machine, we put an ℓ1 penalty on the coefficients. Under certain conditions on the base functions, hinge loss with this complexity penalty is shown to lead to an oracle inequality involving both model complexity and margin.
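For concreteness, here is a minimal NumPy sketch of the estimator the abstract describes: minimize the empirical hinge loss of a linear combination of base functions plus an ℓ1 penalty on the coefficients. All names (l1_hinge_fit, Psi, lam) are illustrative, and plain subgradient descent merely stands in for the abstract minimization; the paper analyses the risk of the minimizer, not any particular algorithm.

```python
import numpy as np

def l1_hinge_fit(Psi, y, lam=0.05, lr=0.01, n_iter=5000):
    """Subgradient descent on the l1-penalized hinge-loss objective
        (1/n) * sum_i max(0, 1 - y_i * (Psi @ alpha)_i) + lam * sum_j |alpha_j|,
    where Psi[i, j] = psi_j(X_i) evaluates the j-th base function at the
    i-th observation and y holds labels in {-1, +1}."""
    n, m = Psi.shape
    alpha = np.zeros(m)
    for _ in range(n_iter):
        margins = y * (Psi @ alpha)
        active = margins < 1.0                    # observations where the hinge is active
        grad = -(Psi[active].T @ y[active]) / n   # subgradient of the empirical hinge risk
        grad += lam * np.sign(alpha)              # subgradient of the l1 penalty (0 at alpha_j = 0)
        alpha -= lr * grad
    return alpha

# Toy check: two Gaussian classes; the base functions are the two input
# coordinates plus a constant intercept.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 1.0, (50, 2)), rng.normal(1.0, 1.0, (50, 2))])
y = np.concatenate([-np.ones(50), np.ones(50)])
Psi = np.hstack([X, np.ones((100, 1))])
alpha_hat = l1_hinge_fit(Psi, y)
print("coefficients:", alpha_hat)
print("training accuracy:", (np.sign(Psi @ alpha_hat) == y).mean())
```

Swapping the penalty subgradient lam * np.sign(alpha) for the gradient 2 * lam * alpha of an ℓ2 penalty would recover a standard support-vector-machine-type objective; the ℓ1 version instead favours coefficient vectors with many entries equal to zero, the sparsity the paper exploits.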

Article information

Source
Bernoulli, Volume 12, Number 6 (2006), 1045-1076.

Dates
First available in Project Euclid: 4 December 2006

Permanent link to this document
https://projecteuclid.org/euclid.bj/1165269150

Digital Object Identifier
doi:10.3150/bj/1165269150

Mathematical Reviews number (MathSciNet)
MR2274857

Zentralblatt MATH identifier
1118.62067

Keywords
binary classification; hinge loss; margin; oracle inequality; penalized classification rule; sparsity

Citation

Tarigan, Bernadetta; van de Geer, Sara A. Classifiers of support vector machine type with ℓ1 complexity regularization. Bernoulli 12 (2006), no. 6, 1045--1076. doi:10.3150/bj/1165269150. https://projecteuclid.org/euclid.bj/1165269150



References

  • [1] Audibert, J.-Y. (2004) Classification under polynomial entropy and margin assumptions and randomized estimators. Preprint PMA-908, Laboratoire de Probabilités et Modèles Aléatoires. http://www.proba.jussieu.fr/mathdoc/textes/PMA-908.pdf
  • [2] Bartlett, P.L., Jordan, M.I. and McAuliffe, J.D. (2006) Convexity, classification and risk bounds. J. Amer. Statist. Assoc., 101, 138-156.
  • [3] Blanchard, G., Lugosi, G. and Vayatis, N. (2003) On the rate of convergence of regularized boosting classifiers. J. Mach. Learn. Res., 4, 861-894.
  • [4] Blanchard, G., Bousquet, O. and Massart, P. (2004) Statistical performance of support vector machines. Manuscript. http://ida.first.fraunhofer.de/~blanchard/publi/index.html
  • [5] Boser, B., Guyon, I. and Vapnik, V.N. (1992) A training algorithm for optimal margin classifiers. In Proceedings of the Fifth Annual ACM Workshop on Computational Learning Theory, pp. 142-152. New York: Association for Computing Machinery.
  • [6] Candès, E.J. and Donoho, D.L. (2004) New tight frames of curvelets and optimal representations of objects with piecewise C2 singularities. Comm. Pure Appl. Math., 57, 219-266.
  • [7] Donoho, D.L. (1995) Denoising via soft-thresholding. IEEE Trans. Inform. Theory, 41, 613-627.
  • [8] Donoho, D.L. (1999) Wedgelets: nearly minimax estimation of edges. Ann. Statist., 27, 859-897.
  • [9] Donoho, D.L. (2004a) For most large underdetermined systems of equations, the minimal ℓ1-norm near-solution approximates the sparsest near-solution. Technical report, Stanford University. http://www-stat.stanford.edu/~donoho/Reports/2004/l1l0approx.pdf
  • [10] Donoho, D.L. (2004b) For most large underdetermined systems of linear equations, the minimal ℓ1-norm solution is also the sparsest solution. Technical report, Stanford University. http://www-stat.stanford.edu/~donoho/Reports/2004/l1l0EquivCorrected.pdf
  • [11] Hardy, G.H., Littlewood, J.E. and Pólya, G. (1988) Inequalities, 2nd edn. Cambridge: Cambridge University Press.
  • [12] Hastie, T., Tibshirani, R. and Friedman, J. (2001) The Elements of Statistical Learning. Data Mining, Inference and Prediction. New York: Springer-Verlag.
  • [13] Koltchinskii, V. (2001) Rademacher penalties and structural risk minimization. IEEE Trans. Inform. Theory, 47, 1902-1914.
  • [14] Koltchinskii, V. (2006) Local Rademacher complexities and oracle inequalities in risk minimization. To appear in Ann. Statist., 34(6).
  • [15] Koltchinskii, V. and Panchenko, D. (2002) Empirical margin distributions and bounding the generalization error of combined classifiers. Ann. Statist., 30, 1-50.
  • [16] Ledoux, M. (1997) On Talagrand's deviation inequalities for product measures. ESAIM Probab. Statist., 1, 63-87.
  • [17] Ledoux, M. and Talagrand, M. (1991) Probability in Banach Spaces: Isoperimetry and Processes. New York: Springer-Verlag.
  • [18] Lin, Y. (2002) Support vector machines and the Bayes rule in classification. Data Min. Knowledge Discovery, 6, 259-275.
  • [19] Loubes, J.-M. and van de Geer, S. (2002) Adaptive estimation in regression, using soft thresholding type penalties. Statist. Neerlandica, 56, 453-478.
  • [20] Lugosi, G. and Wegkamp, M. (2004) Complexity regularization via localized random penalties. Ann. Statist., 32, 1679-1697.
  • [21] Mammen, E. and Tsybakov, A.B. (1999) Smooth discrimination analysis. Ann. Statist., 27, 1808-1829.
  • [22] Massart, P. (2000) About the constants in Talagrand's concentration inequalities for empirical processes. Ann. Probab., 28, 863-884.
  • [23] Schölkopf, B. and Smola, A. (2002) Learning with Kernels. Cambridge, MA: MIT Press.
  • [24] Scott, C. and Nowak, R. (2006) Minimax-optimal classification with dyadic decision trees. IEEE Trans. Inform. Theory, 52, 1335-1353.
  • [25] Shorack, G.R. and Wellner, J.A. (1986) Empirical Processes with Applications to Statistics. New York: Wiley.
  • [26] Steinwart, I. and Scovel, C. (2005) Fast rates for support vector machines using Gaussian kernels. Technical report LA-UR 04-8796, Los Alamos National Laboratory. http://www.c3.lanl.gov/ml/pubs/2004_fastratesa/paper.pdf
  • [27] Tarigan, B. and van de Geer, S.A. (2004) Adaptivity of support vector machines with ℓ1 penalty. Technical report MI 2004-14, University of Leiden. http://www.stat.math.ethz.ch/~geer/reports.html
  • [28] Tibshirani, R. (1996) Regression shrinkage and selection via the Lasso. J. Roy. Statist. Soc. Ser. B, 58, 267-288.
  • [29] Tsybakov, A.B. (2004) Optimal aggregation of classifiers in statistical learning. Ann. Statist., 32, 135-166.
  • [30] Tsybakov, A.B. and van de Geer, S.A. (2005) Square root penalty: adaptation to the margin in classification and in edge estimation. Ann. Statist., 33, 1203-1224.
  • [31] van de Geer, S. (2000) Empirical Processes in M-Estimation. Cambridge: Cambridge University Press.
  • [32] van de Geer, S. (2003) Adaptive quantile regression. In M.G. Akritas and D.N. Politis (eds), Recent Advances and Trends in Nonparametric Statistics, pp. 235-250. Amsterdam: Elsevier.
  • [33] Vapnik, V.N. (1995) The Nature of Statistical Learning Theory. New York: Springer-Verlag.
  • [34] Vapnik, V.N. (1998) Statistical Learning Theory. New York: Wiley.
  • [35] Zhang, T. (2004) Statistical behaviour and consistency of classification methods based on convex risk minimization. Ann. Statist., 32, 56-84.
  • [36] Zhu, J., Rosset, S., Hastie, T. and Tibshirani, R. (2003) 1-norm support vector machines. Neural Inform. Process. Syst., 16.