Annals of Applied Statistics

A Bayesian Mallows approach to nontransitive pair comparison data: How human are sounds?

Marta Crispino, Elja Arjas, Valeria Vitelli, Natasha Barrett, and Arnoldo Frigessi

Full-text: Open access


We are interested in learning how listeners perceive sounds as having human origins. An experiment was performed with a series of electronically synthesized sounds, and listeners were asked to compare them in pairs. We propose a Bayesian probabilistic method to learn individual preferences from nontransitive pairwise comparison data, which arise when one or more of an individual's stated preferences contradict what is implied by the others. We build a Bayesian Mallows model to handle nontransitive data, with a latent layer of uncertainty that captures how preference misreporting is generated. We then develop a mixture extension of the Mallows model that can learn individual preferences in a heterogeneous population. The results of our analysis of the musicology experiment are of interest to electroacoustic composers and sound designers, and to the audio industry in general, whose aim is to understand how computer-generated sounds can be produced so that they sound more human.
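To make the notion of nontransitivity concrete, the following is a minimal sketch (not the authors' method, and not part of the Bayesian Mallows model itself) of how one might check whether a single listener's pairwise preferences are mutually consistent. It assumes each comparison is recorded as a hypothetical `(preferred, other)` pair and treats the data as nontransitive when the resulting directed graph contains a cycle, such as a over b, b over c, and c over a. The function name `is_transitive` is illustrative only.

```python
from collections import defaultdict


def is_transitive(preferences):
    """Return True if the pairwise preferences admit a consistent ranking.

    `preferences` is an iterable of (preferred, other) pairs. The set is
    nontransitive when the directed graph preferred -> other has a cycle.
    """
    graph = defaultdict(list)
    for preferred, other in preferences:
        graph[preferred].append(other)

    # Depth-first search with three states to detect a directed cycle.
    WHITE, GREY, BLACK = 0, 1, 2
    color = defaultdict(int)  # every item starts as WHITE (unvisited)

    def has_cycle(node):
        color[node] = GREY  # node is on the current search path
        for nxt in graph[node]:
            if color[nxt] == GREY:
                return True  # back edge: preferences loop onto themselves
            if color[nxt] == WHITE and has_cycle(nxt):
                return True
        color[node] = BLACK  # fully explored, no cycle through this node
        return False

    # Snapshot the keys, since lookups may add items without outgoing edges.
    return not any(color[n] == WHITE and has_cycle(n) for n in list(graph))


# Hypothetical sound labels, used purely for illustration.
consistent = [("a", "b"), ("b", "c"), ("a", "c")]
contradictory = [("a", "b"), ("b", "c"), ("c", "a")]
print(is_transitive(consistent), is_transitive(contradictory))
```

A check of this kind only flags that a contradiction exists; it does not say which comparison was misreported, which is the question a probabilistic model with a latent error layer is designed to address.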

Article information

Ann. Appl. Stat., Volume 13, Number 1 (2019), 492-519.

Received: December 2017
Revised: June 2018
First available in Project Euclid: 10 April 2019

Keywords: Nontransitive pairwise comparisons; ranking; Mallows model; Bayesian preference learning; recommender systems; musicology; acousmatic experiment


Crispino, Marta; Arjas, Elja; Vitelli, Valeria; Barrett, Natasha; Frigessi, Arnoldo. A Bayesian Mallows approach to nontransitive pair comparison data: How human are sounds? Ann. Appl. Stat. 13 (2019), no. 1, 492-519. doi:10.1214/18-AOAS1203.




Supplemental materials

  • Supplement to “A Bayesian Mallows approach to nontransitive pair comparison data: How human are sounds?”. Supplement A explains the adaptations of the MCMC algorithm to the logistic and finite mixture model extensions (of Sections 3.2 and 3.3). Supplement B describes the procedure to randomly sample from the proposed model; this procedure was used to generate simulated and nested datasets for Section 6. Supplement C presents results obtained from experiments on simulated data generated from the logistic model for mistakes. Finally, Supplement D reports diagnostic plots to study convergence and mixing of the MCMC procedure proposed in the paper.