Bayesian Analysis

Semi-parametric Bayesian inference for multi-season baseball data

Mark Munsell, Peter Müller, Fernando A. Quintana, and Gary L. Rosner

Full-text: Open access

Abstract

We analyze complete sequences of successes (hits, walks, and sacrifices) for a group of players from the American and National Leagues, collected over four seasons. The goal is to describe how players' performances vary from season to season. In particular, we wish to assess and compare the effect of available occasion-specific covariates over seasons. The data are binary sequences for each player and each season. We model dependence in the binary sequence by an autoregressive logistic model. The model includes lagged terms up to a fixed order. For each player and season we introduce a different set of autologistic regression coefficients, i.e., the regression coefficients are random effects that are specific to each season and player. We use a nonparametric approach to define a random effects distribution. The nonparametric model is defined as a mixture with a Dirichlet process prior for the mixing measure. The described model is justified by a representation theorem for order-$k$ exchangeable sequences. Besides the repeated measurements for each season and player, multiple seasons for a given player define an additional level of repeated measurements. We introduce dependence at this level of repeated measurements by relating the season-specific random effects vectors in an autoregressive fashion. We ultimately conclude that while some covariates, such as the ERA of the opposing pitcher, are always relevant, others, such as an indicator for the game having reached the seventh inning, may be significant only for certain seasons, and still others, such as the score of the game, can safely be ignored.
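As a rough illustration of the observation-level model only, the following minimal sketch computes the success probability for one plate appearance under an order-k autoregressive logistic model and simulates a binary sequence from it. It is not the authors' implementation: the function names, the coefficient values, the single ERA-like covariate, and the all-failures starting history are assumptions made for this example, and the Dirichlet process mixture over the random effects and the season-to-season autoregression are deliberately left out.

```python
import math
import random


def success_prob(beta0, lag_coefs, cov_coefs, history, covariates):
    """P(y_t = 1 | last k outcomes, covariates) for an order-k autologistic model.

    lag_coefs[0] multiplies y_{t-1}, lag_coefs[1] multiplies y_{t-2}, and so on.
    All names here are hypothetical, chosen for illustration.
    """
    k = len(lag_coefs)
    # Most recent outcome first: history[-1] is y_{t-1}.
    lags = history[-k:][::-1] if k > 0 else []
    eta = beta0
    eta += sum(g * y for g, y in zip(lag_coefs, lags))
    eta += sum(b * x for b, x in zip(cov_coefs, covariates))
    return 1.0 / (1.0 + math.exp(-eta))


def simulate_sequence(beta0, lag_coefs, cov_coefs, covariate_rows, seed=0):
    """Simulate one player-season binary sequence, one outcome per covariate row."""
    rng = random.Random(seed)
    k = len(lag_coefs)
    ys = [0] * k  # assumption: the sequence starts from k recorded failures
    for x in covariate_rows:
        p = success_prob(beta0, lag_coefs, cov_coefs, ys, x)
        ys.append(1 if rng.random() < p else 0)
    return ys[k:]  # drop the artificial starting history


if __name__ == "__main__":
    # Hypothetical coefficients: order-2 dependence plus one ERA-like covariate.
    rows = [[4.0]] * 20
    print(simulate_sequence(-0.5, [0.8, 0.3], [-0.2], rows, seed=1))
```

In the model described in the abstract, the coefficient vector (here `beta0`, `lag_coefs`, `cov_coefs`) would be a player- and season-specific random effect rather than a fixed input.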

Article information

Source
Bayesian Anal. Volume 3, Number 2 (2008), 317-338.

Dates
First available: 22 June 2012

Permanent link to this document
http://projecteuclid.org/euclid.ba/1340370550

Digital Object Identifier
doi:10.1214/08-BA312

Mathematical Reviews number (MathSciNet)
MR2407429

Citation

Quintana, Fernando A.; Müller, Peter; Rosner, Gary L.; Munsell, Mark. Semi-parametric Bayesian inference for multi-season baseball data. Bayesian Analysis 3 (2008), no. 2, 317–338. doi:10.1214/08-BA312. http://projecteuclid.org/euclid.ba/1340370550.

