Electronic Journal of Statistics

Calibrated asymmetric surrogate losses

Clayton Scott

Abstract

Surrogate losses underlie numerous state-of-the-art binary classification algorithms, such as support vector machines and boosting. The impact of a surrogate loss on the statistical performance of an algorithm is well-understood in symmetric classification settings, where the misclassification costs are equal and the loss is a margin loss. In particular, classification-calibrated losses are known to imply desirable properties such as consistency. While numerous efforts have been made to extend surrogate loss-based algorithms to asymmetric settings, to deal with unequal misclassification costs or training data imbalance, considerably less attention has been paid to whether the modified loss is still calibrated in some sense. This article extends the theory of classification-calibrated losses to asymmetric problems. As in the symmetric case, it is shown that calibrated asymmetric surrogate losses give rise to excess risk bounds, which control the expected misclassification cost in terms of the excess surrogate risk. This theory is illustrated on the class of uneven margin losses, and the uneven hinge, squared error, exponential, and sigmoid losses are treated in detail.
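
To make the quantities named above concrete, the display below sketches, in assumed notation, the cost-weighted classification risk that the excess risk bounds control, together with one common uneven-margin hinge loss of the kind the paper treats; the symbols α and γ and this particular parametrization are illustrative assumptions, not necessarily the paper's exact definitions.

  % Illustrative sketch (assumed notation): the alpha-weighted classification
  % risk and an uneven hinge loss with class-dependent margins.
  \[
    R_\alpha(f) \;=\; \mathbb{E}\bigl[\,(1-\alpha)\,\mathbf{1}\{Y=+1\}\,\mathbf{1}\{f(X)\le 0\}
      \;+\; \alpha\,\mathbf{1}\{Y=-1\}\,\mathbf{1}\{f(X)>0\}\,\bigr],
  \]
  \[
    L_\gamma(y,t) \;=\;
    \begin{cases}
      \max(0,\,1-t), & y=+1,\\
      \max(0,\,\gamma+t), & y=-1.
    \end{cases}
  \]
  % If the surrogate L_\gamma is calibrated for the cost parameter \alpha, an
  % excess risk bound of the form
  %   R_\alpha(f) - R_\alpha^* \le \psi^{-1}\bigl(R_{L_\gamma}(f) - R_{L_\gamma}^*\bigr)
  % holds for some invertible transform \psi, so driving the surrogate risk to
  % its minimum also drives the expected misclassification cost to its minimum.

Setting γ = 1 recovers the ordinary symmetric hinge loss, which is the sense in which the asymmetric theory extends the familiar symmetric one.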

Article information

Source
Electron. J. Statist. Volume 6 (2012), 958-992.

Dates
First available in Project Euclid: 25 May 2012

Permanent link to this document
http://projecteuclid.org/euclid.ejs/1337951630

Digital Object Identifier
doi:10.1214/12-EJS699

Mathematical Reviews number (MathSciNet)
MR2988435

Zentralblatt MATH identifier
1335.62108

Keywords
Surrogate loss, classification calibrated, cost-sensitive classification, imbalanced data, uneven margin, excess risk bound

Citation

Scott, Clayton. Calibrated asymmetric surrogate losses. Electron. J. Statist. 6 (2012), 958-992. doi:10.1214/12-EJS699. http://projecteuclid.org/euclid.ejs/1337951630.

