## Electronic Journal of Statistics

### A Fisher consistent multiclass loss function with variable margin on positive examples

#### Abstract

Pointwise Fisher consistency (or classification calibration) gives necessary and sufficient conditions for Bayes consistency when a classifier minimizes a surrogate loss function instead of the 0-1 loss. We present a family of multiclass hinge loss functions defined by a continuous control parameter $\lambda$ that represents the margin on the positive points of a given class. The parameter $\lambda$ allows shifting from classification-uncalibrated to classification-calibrated loss functions. Although previous results suggest that enlarging the margin on positive points benefits the classification model, other approaches have failed to give increasing weight to the positive examples without losing classification calibration. Our $\lambda$-based loss function can give unlimited weight to the positive examples without breaking the classification calibration property. Moreover, when these loss functions are embedded into the Support Vector Machine framework ($\lambda$-SVM), the parameter $\lambda$ defines different regions for the Karush-Kuhn-Tucker conditions. A large margin on positive points also facilitates faster convergence of the Sequential Minimal Optimization algorithm, leading to lower training times than other classification-calibrated methods. $\lambda$-SVM is easy to implement, and its practical use on different datasets not only supports our theoretical analysis but also provides good classification performance and fast training times.
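
The article itself defines the exact form of the $\lambda$-parameterized loss; purely as an illustration of the idea described above (a hypothetical variant, not the paper's definition), a multiclass hinge loss whose positive-class margin is controlled by $\lambda$ might be sketched as:

```python
def lambda_hinge_loss(scores, y, lam):
    """Hypothetical illustration: a multiclass hinge loss with margin `lam`
    on the true (positive) class and a unit margin on the remaining classes.

    scores: per-class decision values f_k(x); y: index of the true class.
    """
    # The positive example must clear a margin of lam instead of 1.
    positive_term = max(0.0, lam - scores[y])
    # Negative classes are penalized whenever their score exceeds -1.
    negative_terms = sum(max(0.0, 1.0 + s) for k, s in enumerate(scores) if k != y)
    return positive_term + negative_terms

# Increasing lam demands a wider margin from the true class (index 0 here):
loss_small = lambda_hinge_loss([2.0, -1.5, -0.5], 0, 1.0)  # only one negative class violates its margin
loss_large = lambda_hinge_loss([2.0, -1.5, -0.5], 0, 3.0)  # now the true class also falls short of lam
```

In this sketch, sending `lam` above 1 increases the penalty on positive examples that do not clear the enlarged margin, which loosely mirrors the abstract's claim that the positive class can be weighted more heavily via $\lambda$.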

#### Article information

Source
Electron. J. Statist. Volume 9, Number 2 (2015), 2255-2292.

Dates
First available in Project Euclid: 8 October 2015

https://projecteuclid.org/euclid.ejs/1444316740

Digital Object Identifier
doi:10.1214/15-EJS1073

Mathematical Reviews number (MathSciNet)
MR3406278

Zentralblatt MATH identifier
1336.68220

#### Citation

Rodriguez-Lujan, Irene; Huerta, Ramon. A Fisher consistent multiclass loss function with variable margin on positive examples. Electron. J. Statist. 9 (2015), no. 2, 2255--2292. doi:10.1214/15-EJS1073. https://projecteuclid.org/euclid.ejs/1444316740.

#### References

• [1] Allwein, E. L., Schapire, R. E. and Singer, Y. (2001). Reducing multiclass to binary: A unifying approach for margin classifiers. The Journal of Machine Learning Research 1 113–141.
• [2] Bartlett, P. L., Jordan, M. I. and McAuliffe, J. D. (2004). Large margin classifiers: Convex loss, low noise, and convergence rates. In Advances in Neural Information Processing Systems 16 (S. Thrun, L. K. Saul and B. Schölkopf, eds.) 1173–1180. MIT Press.
• [3] Bartlett, P. L., Jordan, M. I. and McAuliffe, J. D. (2006). Convexity, classification, and risk bounds. Journal of the American Statistical Association 101 138–156.
• [4] Ben-David, S., Eiron, N. and Long, P. M. (2003). On the difficulty of approximately maximizing agreements. Journal of Computer and System Sciences 66 496–514.
• [5] Boyd, S. and Vandenberghe, L. (2004). Convex Optimization. Cambridge University Press.
• [6] Chang, C.-C. and Lin, C.-J. (2011). LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology (TIST) 2 27.
• [7] Collins, M., Schapire, R. E. and Singer, Y. (2002). Logistic regression, AdaBoost and Bregman distances. Machine Learning 48 253–285.
• [8] Cortes, C. and Vapnik, V. (1995). Support-vector networks. Machine Learning 20 273–297.
• [9] Crammer, K. and Singer, Y. (2002). On the algorithmic implementation of multiclass kernel-based vector machines. The Journal of Machine Learning Research 2 265–292.
• [10] Cristianini, N. and Shawe-Taylor, J. (2000). An Introduction to Support Vector Machines and Other Kernel-based Learning Methods. Cambridge University Press.
• [11] Feldman, V., Guruswami, V., Raghavendra, P. and Wu, Y. (2012). Agnostic learning of monomials by halfspaces is hard. SIAM Journal on Computing 41 1558–1590.
• [12] Frank, M. and Wolfe, P. (1956). An algorithm for quadratic programming. Naval Research Logistics Quarterly 3 95–110.
• [13] Freund, Y. and Schapire, R. E. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences 55 119–139.
• [14] Friedman, J. H., Hastie, T. J. and Tibshirani, R. J. (2000). Additive logistic regression: A statistical view of boosting. Annals of Statistics 337–374.
• [15] Genton, M. G. (2002). Classes of kernels for machine learning: A statistics perspective. The Journal of Machine Learning Research 2 299–312.
• [16] Gneiting, T. (2002). Compactly supported correlation functions. Journal of Multivariate Analysis 83 493–508.
• [17] Guermeur, Y. (2012). A generic model of multi-class support vector machine. International Journal of Intelligent Information and Database Systems 6 555–577.
• [18] Guermeur, Y. and Monfrini, E. (2011). A quadratic loss multi-class SVM for which a radius-margin bound applies. Informatica (ISSN 0868-4952) 22 73–96.
• [19] Huerta, R., Vembu, S., Amigó, J. M., Nowotny, T. and Elkan, C. (2012). Inhibition in multiclass classification. Neural Computation 24 2473–2507.
• [20] Keerthi, S. S., Shevade, S. K., Bhattacharyya, C. and Murthy, K. R. K. (2001). Improvements to Platt's SMO algorithm for SVM classifier design. Neural Computation 13 637–649.
• [21] Lauer, F. and Guermeur, Y. (2011). MSVMpack: A multi-class support vector machine package. The Journal of Machine Learning Research 12 2293–2296.
• [22] Lebanon, G. and Lafferty, J. (2001). Boosting and maximum likelihood for exponential models. Advances in Neural Information Processing Systems 14 447.
• [23] Lee, Y., Lin, Y. and Wahba, G. (2004). Multicategory support vector machines: Theory and application to the classification of microarray data and satellite radiance data. Journal of the American Statistical Association 99 67–81.
• [24] Lichman, M. (2013). UCI Machine Learning Repository.
• [25] Lin, Y. (2002). Support vector machines and the Bayes rule in classification. Data Mining and Knowledge Discovery 6 259–275.
• [26] Liu, Y. (2007). Fisher consistency of multicategory support vector machines. The Journal of Machine Learning Research, Proceedings Track 2 291–298.
• [27] Liu, Y. and Shen, X. (2006). Multicategory $\psi$-learning. Journal of the American Statistical Association 101 500–509.
• [28] Liu, Y. and Yuan, M. (2011). Reinforced multicategory support vector machines. Journal of Computational and Graphical Statistics 20 901–919.
• [29] Berkelaar, M., Eikland, K. and Notebaert, P. (2009). An Open Source (Mixed-Integer) Linear Programming System. Software available at http://lpsolve.sourceforge.net/.
• [30] Platt, J. C. (1999). Fast training of support vector machines using sequential minimal optimization. In Advances in Kernel Methods 185–208. MIT Press.
• [31] Reid, M. D. and Williamson, R. C. (2010). Composite binary losses. The Journal of Machine Learning Research 11 2387–2422.
• [32] Rodriguez-Lujan, I., Fonollosa, J., Vergara, A., Homer, M. and Huerta, R. (2014). On the calibration of sensor arrays for pattern recognition using the minimal number of experiments. Chemometrics and Intelligent Laboratory Systems 130 123–134.
• [33] Rodriguez-Lujan, I. and Huerta, R. (2015). Supplement to "A Fisher consistent multiclass loss function with variable margin on positive examples". DOI: 10.1214/15-EJS1073SUPP.
• [34] Schölkopf, B. and Smola, A. J. (2001). Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press.
• [35] Shen, X., Tseng, G. C., Zhang, X. and Wong, W. H. (2003). On $\psi$-learning. Journal of the American Statistical Association 98 724–734.
• [36] Steinwart, I. (2005). Consistency of support vector machines and other regularized kernel classifiers. IEEE Transactions on Information Theory 51 128–142.
• [37] Tewari, A. and Bartlett, P. L. (2007). On the consistency of multiclass classification methods. The Journal of Machine Learning Research 8 1007–1025.
• [38] Wang, L. and Shen, X. (2007). On L1-norm multiclass support vector machines. Journal of the American Statistical Association 102.
• [39] Weston, J. and Watkins, C. (1998). Multi-class support vector machines. Technical Report, Department of Computer Science, Royal Holloway, University of London.
• [40] Zhang, C. and Liu, Y. (2013). Multicategory large-margin unified machines. The Journal of Machine Learning Research 14 1349–1386.
• [41] Zhang, T. (2004). Statistical analysis of some multi-category large margin classification methods. The Journal of Machine Learning Research 5 1225–1251.
• [42] Zhang, Z., Jordan, M. I., Li, W.-J. and Yeung, D.-Y. (2009). Coherence functions for multicategory margin-based classification methods. In Twelfth International Conference on Artificial Intelligence and Statistics (AISTATS) 647–654.
• [43] Zhang, Z., Liu, D., Dai, G. and Jordan, M. I. (2012). Coherence functions with applications in large-margin classification methods. The Journal of Machine Learning Research 13 2705–2734.
• [44] Zou, H., Zhu, J. and Hastie, T. (2008). New multicategory boosting algorithms based on multicategory Fisher-consistent losses. The Annals of Applied Statistics 1290–1306.