Journal of Applied Mathematics

Learning Rates for ${l}^{1}$-Regularized Kernel Classifiers

Abstract

We consider a family of classification algorithms generated from a regularization kernel scheme associated with an ${l}^{1}$-regularizer and a convex loss function. Our main purpose is to provide an explicit convergence rate for the excess misclassification error of the produced classifiers. The error decomposition comprises the approximation error, the hypothesis error, and the sample error. We apply some novel techniques to estimate the hypothesis error and the sample error. Learning rates are eventually derived under assumptions on the kernel, the input space, the marginal distribution, and the approximation error.
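To make the setting concrete, the following is a minimal sketch of an ${l}^{1}$-regularized kernel classifier of the general type the abstract describes: minimize an empirical convex risk plus an ${l}^{1}$ penalty on the coefficients of a kernel expansion. The hinge loss, the Gaussian kernel, the subgradient-descent solver, and all parameter values below are illustrative assumptions, not choices made by the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_kernel(X1, X2, sigma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of X1 and X2."""
    sq_dists = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

def fit_l1_kernel_classifier(K, y, lam=0.01, steps=2000, lr=0.1):
    """Minimize (1/m) * sum_i hinge(y_i * (K @ alpha)_i) + lam * ||alpha||_1
    by plain subgradient descent (the solver is our assumption; the paper
    analyzes the scheme itself, not a particular optimizer)."""
    m = len(y)
    alpha = np.zeros(m)
    for _ in range(steps):
        margins = y * (K @ alpha)
        active = margins < 1.0                  # samples with positive hinge loss
        grad = -(K[:, active] @ y[active]) / m  # subgradient of the empirical risk
        grad += lam * np.sign(alpha)            # subgradient of the l^1 penalty
        alpha -= lr * grad
    return alpha

# Toy data: two well-separated Gaussian blobs, labeled -1 and +1.
X = np.vstack([rng.normal(-2.0, 0.5, (20, 2)), rng.normal(2.0, 0.5, (20, 2))])
y = np.concatenate([-np.ones(20), np.ones(20)])

K = gaussian_kernel(X, X)
alpha = fit_l1_kernel_classifier(K, y)
accuracy = (np.sign(K @ alpha) == y).mean()
```

The ${l}^{1}$ penalty on the coefficient vector (rather than the RKHS norm of the function) is what makes the hypothesis space sample-dependent, which is why the error decomposition in the paper needs the extra hypothesis-error term alongside the usual approximation and sample errors.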

Article information

Source
J. Appl. Math., Volume 2013 (2013), Article ID 496282, 11 pages.

Dates
First available in Project Euclid: 14 March 2014

https://projecteuclid.org/euclid.jam/1394808214

Digital Object Identifier
doi:10.1155/2013/496282

Mathematical Reviews number (MathSciNet)
MR3130980

Zentralblatt MATH identifier
06950708

Citation

Tong, Hongzhi; Chen, Di-Rong; Yang, Fenghong. Learning Rates for ${l}^{1}$-Regularized Kernel Classifiers. J. Appl. Math. 2013 (2013), Article ID 496282, 11 pages. doi:10.1155/2013/496282. https://projecteuclid.org/euclid.jam/1394808214

References

• P. L. Bartlett, M. I. Jordan, and J. D. McAuliffe, “Convexity, classification, and risk bounds,” Journal of the American Statistical Association, vol. 101, no. 473, pp. 138–156, 2006.
• V. Vapnik, Statistical Learning Theory, John Wiley & Sons, New York, NY, USA, 1998.
• N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines, Cambridge University Press, Cambridge, UK, 2000.
• Q. Wu and D. Zhou, “Analysis of support vector machine classification,” Journal of Computational Analysis and Applications, vol. 8, no. 2, pp. 99–119, 2006.
• I. Steinwart and C. Scovel, “Fast rates for support vector machines using Gaussian kernels,” Annals of Statistics, vol. 35, no. 2, pp. 575–607, 2007.
• J. A. K. Suykens and J. Vandewalle, “Least squares support vector machine classifiers,” Neural Processing Letters, vol. 9, no. 3, pp. 293–300, 1999.
• T. Zhang, “Statistical behavior and consistency of classification methods based on convex risk minimization,” Annals of Statistics, vol. 32, no. 1, pp. 56–134, 2004.
• D. R. Chen, Q. Wu, Y. M. Ying, and D. X. Zhou, “Support vector machine soft margin classifiers: error analysis,” Journal of Machine Learning Research, vol. 5, pp. 1143–1175, 2004.
• Y. Lin, “Support vector machines and the Bayes rule in classification,” Data Mining and Knowledge Discovery, vol. 6, no. 3, pp. 259–275, 2002.
• T. Hofmann, B. Schölkopf, and A. J. Smola, “Kernel methods in machine learning,” Annals of Statistics, vol. 36, no. 3, pp. 1171–1220, 2008.
• N. Aronszajn, “Theory of reproducing kernels,” Transactions of the American Mathematical Society, vol. 68, pp. 337–404, 1950.
• Q. Wu, Y. Ying, and D. Zhou, “Multi-kernel regularized classifiers,” Journal of Complexity, vol. 23, no. 1, pp. 108–134, 2007.
• H. Z. Tong, D. R. Chen, and L. Z. Peng, “Learning rates for regularized classifiers using multivariate polynomial kernels,” Journal of Complexity, vol. 24, no. 5-6, pp. 619–631, 2008.
• D. X. Zhou and K. Jetter, “Approximation with polynomial kernels and SVM classifiers,” Advances in Computational Mathematics, vol. 25, no. 1–3, pp. 323–344, 2006.
• D. H. Xiang and D. X. Zhou, “Classification with Gaussians and convex loss,” Journal of Machine Learning Research, vol. 10, pp. 1447–1468, 2009.
• R. Tibshirani, “Regression shrinkage and selection via the lasso,” Journal of the Royal Statistical Society B, vol. 58, pp. 267–288, 1996.
• T. Zhang, “Some sharp performance bounds for least squares regression with ${L}_{1}$ regularization,” Annals of Statistics, vol. 37, no. 5, pp. 2109–2144, 2009.
• P. Zhao and B. Yu, “On model selection consistency of Lasso,” Journal of Machine Learning Research, vol. 7, pp. 2541–2563, 2006.
• E. J. Candès and T. Tao, “Decoding by linear programming,” IEEE Transactions on Information Theory, vol. 51, no. 12, pp. 4203–4215, 2005.
• E. J. Candès and J. Romberg, “Sparsity and incoherence in compressive sampling,” Inverse Problems, vol. 23, no. 3, pp. 969–985, 2007.
• Q. W. Xiao and D. X. Zhou, “Learning by nonsymmetric kernels with data dependent spaces and ${l}^{1}$-regularizer,” Taiwanese Journal of Mathematics, vol. 14, no. 5, pp. 1821–1836, 2010.
• H. Z. Tong, D. R. Chen, and F. H. Yang, “Least square regression with ${l}^{p}$-coefficient regularization,” Neural Computation, vol. 22, no. 12, pp. 3221–3235, 2010.
• H. W. Sun and Q. Wu, “Least square regression with indefinite kernels and coefficient regularization,” Applied and Computational Harmonic Analysis, vol. 30, no. 1, pp. 96–109, 2011.
• H. Z. Tong, D. R. Chen, and F. H. Yang, “Support vector machines regression with ${l}^{1}$-regularizer,” Journal of Approximation Theory, vol. 164, no. 10, pp. 1331–1344, 2012.
• H. Y. Wang, Q. W. Xiao, and D. X. Zhou, “An approximation theory approach to learning with ${l}^{1}$ regularization,” Journal of Approximation Theory, vol. 167, pp. 240–258, 2013.
• B. Tarigan and S. A. van de Geer, “Classifiers of support vector machine type with ${l}_{1}$ complexity regularization,” Bernoulli, vol. 12, no. 6, pp. 1045–1076, 2006.
• Q. Wu and D. Zhou, “SVM soft margin classifiers: linear programming versus quadratic programming,” Neural Computation, vol. 17, no. 5, pp. 1160–1187, 2005.
• Q. Wu and D. Zhou, “Learning with sample dependent hypothesis spaces,” Computers and Mathematics with Applications, vol. 56, no. 11, pp. 2896–2907, 2008.
• P. L. Bartlett, “The sample complexity of pattern classification with neural networks: the size of the weights is more important than the size of the network,” IEEE Transactions on Information Theory, vol. 44, no. 2, pp. 525–536, 1998.
• Q. Wu, Y. Ying, and D. Zhou, “Learning rates of least-square regularized regression,” Foundations of Computational Mathematics, vol. 6, no. 2, pp. 171–192, 2006.
• H. Wendland, “Local polynomial reproduction and moving least squares approximation,” IMA Journal of Numerical Analysis, vol. 21, no. 1, pp. 285–300, 2001.
• L. Shi, Y. Feng, and D. Zhou, “Concentration estimates for learning with ${l}^{1}$-regularizer and data dependent hypothesis spaces,” Applied and Computational Harmonic Analysis, vol. 31, no. 2, pp. 286–302, 2011.
• F. Cucker and S. Smale, “On the mathematical foundations of learning,” Bulletin of the American Mathematical Society, vol. 39, no. 1, pp. 1–49, 2002.
• O. Bousquet, “A Bennett concentration inequality and its application to suprema of empirical processes,” Comptes Rendus Mathematique, vol. 334, no. 6, pp. 495–500, 2002.
• G. Blanchard, G. Lugosi, and N. Vayatis, “On the rate of convergence of regularized boosting classifiers,” Journal of Machine Learning Research, vol. 4, no. 5, pp. 861–894, 2004.
• A. W. van der Vaart and J. A. Wellner, Weak Convergence and Empirical Processes, Springer, New York, NY, USA, 1996.