The Annals of Statistics

Classification by pairwise coupling

Trevor Hastie and Robert Tibshirani

Full-text: Open access

Abstract

We discuss a strategy for polychotomous classification that involves estimating class probabilities for each pair of classes, and then coupling the estimates together. The coupling model is similar to the Bradley-Terry method for paired comparisons. We study the nature of the class probability estimates that arise, and examine the performance of the procedure in real and simulated data sets. Classifiers used include linear discriminants, nearest neighbors, adaptive nonlinear methods and the support vector machine.
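The coupling step described in the abstract can be illustrated with a short sketch. It assumes that pairwise estimates r[i, j] of P(class i | class i or j) are already available from some binary classifier, and it fits class probabilities p under a Bradley-Terry-style model mu[i, j] = p[i] / (p[i] + p[j]). The function name `couple_pairwise`, the weights `n`, the multiplicative update and the stopping rule are illustrative assumptions; the abstract itself does not specify them.

```python
import numpy as np

def couple_pairwise(r, n=None, tol=1e-10, max_iter=1000):
    """Couple pairwise estimates r[i, j] ~ P(i | i or j) into one vector p.

    Hypothetical helper: fits p so that mu[i, j] = p[i] / (p[i] + p[j])
    matches r in a weighted sense. n[i, j] are optional pair weights
    (for example, the number of training points in classes i and j).
    """
    r = np.asarray(r, dtype=float)
    k = r.shape[0]
    n = np.ones((k, k)) if n is None else np.asarray(n, dtype=float)
    off = ~np.eye(k, dtype=bool)          # mask selecting j != i
    p = np.full(k, 1.0 / k)               # start from uniform probabilities
    for _ in range(max_iter):
        p_old = p.copy()
        for i in range(k):
            mu_i = p[i] / (p[i] + p)      # model values mu[i, j] for all j
            num = np.sum((n[i] * r[i])[off[i]])
            den = np.sum((n[i] * mu_i)[off[i]])
            p[i] *= num / den             # multiplicative update for class i
            p /= p.sum()                  # keep p on the probability simplex
        if np.max(np.abs(p - p_old)) < tol:
            break
    return p

# Illustrative check: pairwise values built from known probabilities.
p_true = np.array([0.5, 0.3, 0.2])
r = p_true[:, None] / (p_true[:, None] + p_true[None, :])
p_hat = couple_pairwise(r)
predicted_class = int(np.argmax(p_hat))
```

When the pairwise values are exactly consistent with a single probability vector, as in this toy input, the sketch should recover that vector up to numerical tolerance. With estimates from real classifiers the pairwise values are generally not mutually consistent, and the fitted p is a compromise among them.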

Article information

Source
Ann. Statist. Volume 26, Number 2 (1998), 451-471.

Dates
First available: 31 July 2002

Permanent link to this document
http://projecteuclid.org/euclid.aos/1028144844

Mathematical Reviews number (MathSciNet)
MR1626055

Digital Object Identifier
doi:10.1214/aos/1028144844

Zentralblatt MATH identifier
0932.62071

Subjects
Primary: 62H30: Classification and discrimination; cluster analysis [See also 68T10, 91C20]
Primary: 68T10: Pattern recognition, speech recognition {For cluster analysis, see 62H30}
Secondary: 62J15: Paired and multiple comparisons

Keywords
Pairwise, Bradley-Terry model

Citation

Hastie, Trevor; Tibshirani, Robert. Classification by pairwise coupling. The Annals of Statistics 26 (1998), no. 2, 451--471. doi:10.1214/aos/1028144844. http://projecteuclid.org/euclid.aos/1028144844.

References

  • Bishop, Y., Fienberg, S. and Holland, P. (1975). Discrete Multivariate Analysis. MIT Press.
  • Boser, B., Guyon, I. and Vapnik, V. (1992). A training algorithm for optimal margin classifiers. In Proceedings of COLT II, Philadelphia, PA.
  • Bradley, R. and Terry, M. (1952). The rank analysis of incomplete block designs. I. The method of paired comparisons. Biometrika 39 324-345.
  • Deming, W. and Stephan, F. (1940). On a least squares adjustment of a sampled frequency table when the expected marginal totals are known. Ann. Math. Statist. 11 427-444.
  • Friedman, J. (1991). Multivariate adaptive regression splines (with discussion). Ann. Statist. 19 1-141.
  • Friedman, J. (1996a). Another approach to polychotomous classification. Technical report, Stanford Univ.
  • Friedman, J. (1996b). Bias, variance, 0-1 loss and the curse of dimensionality. Technical report, Stanford Univ.
  • Hastie, T. (1989). Discussion of "Flexible parsimonious smoothing and additive modelling" by Friedman and Silverman. Technometrics 31 3-39.
  • Hastie, T. and Tibshirani, R. (1990). Generalized Additive Models. Chapman and Hall, London.
  • Hastie, T., Tibshirani, R. and Buja, A. (1994). Flexible discriminant analysis by optimal scoring. J. Amer. Statist. Assoc. 89 1255-1270.
  • Robinson, A. J. (1989). Dynamic error propagation networks. Ph.D. dissertation, Dept. Electrical Engineering, Cambridge Univ.
  • Rosen, D., Burke, H. and Goodman, O. (1995). Local learning methods in high dimensions: beating the bias-variance dilemma via recalibration. In NIPS Workshop: Machines that Learn-Neural Networks for Computing.
  • Vapnik, V. (1996). The Nature of Statistical Learning Theory. Springer, New York.