Statistical Science

Models for Paired Comparison Data: A Review with Emphasis on Dependent Data

Manuela Cattelan


Thurstonian and Bradley–Terry models are the most commonly applied models in the analysis of paired comparison data. Since their introduction, numerous developments have been proposed in different areas. This paper provides an updated overview of these extensions, including how to account for object- and subject-specific covariates and how to deal with ordinal paired comparison data. Special emphasis is given to models for dependent comparisons. Although these models are more realistic, their use is complicated by numerical difficulties. We therefore concentrate on implementation issues. In particular, a pairwise likelihood approach is explored for models for dependent paired comparison data, and a simulation study is carried out to compare the performance of maximum pairwise likelihood with other limited information estimation methods. The methodology is illustrated throughout using a real data set of paired comparisons of universities made by students.
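As a point of reference for the simplest model named in the abstract, the following is a minimal sketch of fitting a basic Bradley–Terry model, in which the probability that object i is preferred to object j is π_i / (π_i + π_j). It assumes that all comparisons are independent, which is exactly the assumption that the dependent-data models reviewed in the paper relax, so it does not illustrate the pairwise likelihood approach itself. The function names and the win counts are hypothetical, and the estimation uses the classical fixed-point (minorization-maximization) iteration rather than any routine from the paper.

```python
def fit_bradley_terry(wins, n_iter=500, tol=1e-10):
    """Estimate Bradley-Terry worth parameters by fixed-point iteration.

    wins[i][j] is the number of times object i was preferred to object j.
    Assumes independent comparisons, that every object wins at least once,
    and that the comparison graph is connected (otherwise the maximum
    likelihood estimate may not exist). Returns worths that sum to 1.
    """
    n = len(wins)
    pi = [1.0 / n] * n  # start from equal worths
    for _ in range(n_iter):
        new = []
        for i in range(n):
            # total number of wins of object i
            w_i = sum(wins[i][j] for j in range(n) if j != i)
            # comparisons involving i, weighted by the current worths
            denom = sum(
                (wins[i][j] + wins[j][i]) / (pi[i] + pi[j])
                for j in range(n)
                if j != i
            )
            new.append(w_i / denom)
        total = sum(new)
        new = [p / total for p in new]  # fix the scale: worths sum to 1
        delta = max(abs(a - b) for a, b in zip(new, pi))
        pi = new
        if delta < tol:
            break
    return pi


def win_probability(pi, i, j):
    """Bradley-Terry probability that object i is preferred to object j."""
    return pi[i] / (pi[i] + pi[j])


if __name__ == "__main__":
    # Hypothetical counts for three objects (not data from the paper).
    counts = [
        [0, 7, 9],
        [3, 0, 6],
        [1, 4, 0],
    ]
    worths = fit_bradley_terry(counts)
    print("estimated worths:", worths)
    print("P(object 0 preferred to object 2):", win_probability(worths, 0, 2))
```

Normalizing the worths to sum to one is only one of several equivalent identification constraints; fixing one object's worth, or working on the log scale with a reference object set to zero, would serve equally well.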

Article information

Statist. Sci. Volume 27, Number 3 (2012), 412-433.

First available in Project Euclid: 5 September 2012

Keywords

Bradley–Terry model; limited information estimation; paired comparisons; pairwise likelihood; Thurstonian models


Cattelan, Manuela. Models for Paired Comparison Data: A Review with Emphasis on Dependent Data. Statist. Sci. 27 (2012), no. 3, 412–433. doi:10.1214/12-STS396.

