The Annals of Applied Statistics

New multicategory boosting algorithms based on multicategory Fisher-consistent losses

Hui Zou, Ji Zhu, and Trevor Hastie


Fisher-consistent loss functions play a fundamental role in the construction of successful binary margin-based classifiers. In this paper we establish the Fisher-consistency condition for multicategory classification problems. Our approach uses the margin-vector concept, which can be regarded as a multicategory generalization of the binary margin. We characterize a wide class of smooth convex loss functions that are Fisher-consistent for multicategory classification. We then use these margin-vector-based loss functions to derive multicategory boosting algorithms. In particular, we derive two new multicategory boosting algorithms from the exponential and logistic regression losses.
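To illustrate the Fisher-consistency property described in the abstract, consider a sketch under the standard margin-vector formulation (notation here is illustrative, not taken verbatim from the paper): with K classes, conditional probabilities p_j(x) = P(Y = j | X = x), and a margin vector f = (f_1, ..., f_K) obeying a sum-to-zero constraint, the population minimizer of the exponential loss can be computed in closed form:

```latex
\[
\min_{f}\; \mathbb{E}\!\left[e^{-f_Y(X)} \,\middle|\, X = x\right]
  \;=\; \min_{f}\; \sum_{j=1}^{K} p_j(x)\, e^{-f_j(x)}
  \quad\text{subject to}\quad \sum_{j=1}^{K} f_j(x) = 0.
\]
% Setting the gradient of the Lagrangian to zero gives
% -p_j(x) e^{-f_j(x)} + \lambda = 0, i.e. f_j(x) = \log p_j(x) - \log\lambda.
% Enforcing the sum-to-zero constraint fixes \log\lambda, yielding
\[
\hat f_j(x) \;=\; \log p_j(x) \;-\; \frac{1}{K}\sum_{l=1}^{K}\log p_l(x).
\]
```

Since log is increasing, argmax_j f̂_j(x) = argmax_j p_j(x): the minimizer recovers the Bayes rule, which is exactly the Fisher-consistency property the paper requires of a multicategory loss.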

Article information

Ann. Appl. Stat. Volume 2, Number 4 (2008), 1290-1306.

First available in Project Euclid: 8 January 2009


Keywords: boosting; Fisher-consistent losses; multicategory classification


Zou, Hui; Zhu, Ji; Hastie, Trevor. New multicategory boosting algorithms based on multicategory Fisher-consistent losses. Ann. Appl. Stat. 2 (2008), no. 4, 1290--1306. doi:10.1214/08-AOAS198.

