The Annals of Applied Statistics

How often does the best team win? A unified approach to understanding randomness in North American sport

Michael J. Lopez, Gregory J. Matthews, and Benjamin S. Baumer

Abstract

Statistical applications in sports have long centered on how to best separate signal (e.g., team talent) from random noise. However, most of this work has concentrated on a single sport, and the development of meaningful cross-sport comparisons has been impeded by the difficulty of translating luck from one sport to another. In this manuscript we develop Bayesian state-space models using betting market data that can be uniformly applied across sporting organizations to better understand the role of randomness in game outcomes. These models can be used to extract estimates of team strength, the between-season, within-season and game-to-game variability of team strengths, as well as each team’s home advantage. We implement our approach across a decade of play in each of the National Football League (NFL), National Hockey League (NHL), National Basketball Association (NBA) and Major League Baseball (MLB), finding that the NBA demonstrates both the largest dispersion in talent and the largest home advantage, while the NHL and MLB stand out for their relative randomness in game outcomes. We conclude by proposing new metrics for judging competitiveness across sports leagues, both within the regular season and using traditional postseason tournament formats. Although we focus on sports, we discuss a number of other situations in which our generalizable models might be usefully applied.

Article information

Source
Ann. Appl. Stat., Volume 12, Number 4 (2018), 2483-2516.

Dates
Received: June 2017
Revised: February 2018
First available in Project Euclid: 13 November 2018

Permanent link to this document
https://projecteuclid.org/euclid.aoas/1542078053

Digital Object Identifier
doi:10.1214/18-AOAS1165

Mathematical Reviews number (MathSciNet)
MR3875709

Keywords
Sports analytics; Bayesian modeling; competitive balance; MCMC

Citation

Lopez, Michael J.; Matthews, Gregory J.; Baumer, Benjamin S. How often does the best team win? A unified approach to understanding randomness in North American sport. Ann. Appl. Stat. 12 (2018), no. 4, 2483–2516. doi:10.1214/18-AOAS1165. https://projecteuclid.org/euclid.aoas/1542078053


Supplemental materials

  • Supplement to “How often does the best team win? A unified approach to understanding randomness in North American sport”. We provide several plots corresponding to different portions of our paper. In addition, we describe a simulation analysis.