The Annals of Statistics

Classification by pairwise coupling

Trevor Hastie and Robert Tibshirani

Full-text: Open access


We discuss a strategy for polychotomous classification that involves estimating class probabilities for each pair of classes, and then coupling the estimates together. The coupling model is similar to the Bradley-Terry method for paired comparisons. We study the nature of the class probability estimates that arise, and examine the performance of the procedure in real and simulated data sets. Classifiers used include linear discriminants, nearest neighbors, adaptive nonlinear methods and the support vector machine.
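As an illustration of the coupling step only, the following is a minimal Python sketch, not the authors' code. It assumes that pairwise estimates r[i, j] of P(class i | class i or j) are already available from some binary classifier, and it fits class probabilities p so that p_i / (p_i + p_j) approximates r[i, j], using a Bradley-Terry style multiplicative update. The function name `couple_pairwise`, the optional weight matrix `n` (pairwise training counts, defaulting to equal weights), and the stopping rule are hypothetical choices made for this example.

```python
import numpy as np

def couple_pairwise(r, n=None, tol=1e-10, max_iter=1000):
    """Couple pairwise probabilities r[i, j] ~ P(i | i or j) into one vector p.

    Illustrative sketch: a Bradley-Terry style fixed-point iteration.
    r is a (k, k) array with r[i, j] + r[j, i] = 1 off the diagonal;
    the diagonal is ignored. n holds optional pairwise weights.
    """
    r = np.asarray(r, dtype=float)
    k = r.shape[0]
    n = np.ones((k, k)) if n is None else np.asarray(n, dtype=float)
    off_diag = ~np.eye(k, dtype=bool)   # mask that excludes i == j terms
    p = np.full(k, 1.0 / k)             # start from uniform probabilities

    # The observed side of the update does not change between iterations.
    observed = np.sum(np.where(off_diag, n * r, 0.0), axis=1)

    for _ in range(max_iter):
        # Model-implied pairwise probabilities mu[i, j] = p_i / (p_i + p_j).
        mu = p[:, None] / (p[:, None] + p[None, :])
        fitted = np.sum(np.where(off_diag, n * mu, 0.0), axis=1)

        # Scale each p_i by the ratio of observed to fitted pairwise mass,
        # then renormalize so the probabilities sum to one.
        p_new = p * observed / fitted
        p_new /= p_new.sum()

        converged = np.max(np.abs(p_new - p)) < tol
        p = p_new
        if converged:
            break
    return p

# Example: three classes, with pairwise estimates from hypothetical classifiers.
r_example = np.array([
    [0.5, 0.7, 0.8],
    [0.3, 0.5, 0.6],
    [0.2, 0.4, 0.5],
])
p_example = couple_pairwise(r_example)
predicted_class = int(np.argmax(p_example))
```

A predicted class can then be taken as the index with the largest coupled probability. When the pairwise estimates are mutually inconsistent, no single p reproduces them all exactly, and the iteration returns a compromise fit.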

Article information

Ann. Statist. Volume 26, Number 2 (1998), 451-471.

First available in Project Euclid: 31 July 2002


Primary: 62H30: Classification and discrimination; cluster analysis [See also 68T10, 91C20] 68T10: Pattern recognition, speech recognition {For cluster analysis, see 62H30}
Secondary: 62J15: Paired and multiple comparisons

Pairwise Bradley-Terry model


Hastie, Trevor; Tibshirani, Robert. Classification by pairwise coupling. Ann. Statist. 26 (1998), no. 2, 451-471. doi:10.1214/aos/1028144844.


