Brazilian Journal of Probability and Statistics

Bayesian analysis and diagnostic of overdispersion models for binomial data

Carolina C. M. Paraíba, Carlos A. R. Diniz, and Rubiane M. Pires

Full-text: Open access

Abstract

In this paper, we study the multiplicative binomial, double binomial and beta-binomial models from a Bayesian perspective, modeling both the probability of success and the dispersion parameters. A Bayesian methodology is considered for estimation and diagnostics under these three overdispersed binomial regression models. A teratology data set is analyzed using this methodology. We present a simulation study, based on data sets generated to mimic the characteristics of the teratology data, to assess the quality of the Bayesian estimates and the performance of the Bayesian diagnostic tools under each regression model. An extended study based on simulated data is also performed to compare the logit and probit link functions in a setting of overdispersed binomial data. We also use simulated data sets to illustrate how to detect overdispersion using posterior predictive checks.
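As an illustration of the last point, the following is a minimal sketch of a posterior predictive check for overdispersion. It is not the authors' implementation: the abstract does not state which discrepancy measure or priors the paper uses, so the sketch assumes a plain binomial model with a common success probability, a Beta(1, 1) prior, and the sample variance of the observed proportions as the discrepancy. The function name `ppc_overdispersion` and the simulated litter sizes are hypothetical.

```python
import numpy as np


def ppc_overdispersion(y, m, n_draws=4000, seed=0):
    """Posterior predictive p-value for the variance of y/m under a plain binomial model."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    m = np.asarray(m)
    # Conjugate posterior for a common success probability under a
    # Beta(1, 1) prior (an assumption of this sketch).
    p = rng.beta(1 + y.sum(), 1 + (m - y).sum(), size=n_draws)
    # One replicated data set per posterior draw; shape (n_draws, len(y)).
    y_rep = rng.binomial(m, p[:, None])
    # Discrepancy: sample variance of the proportions (an assumption).
    t_obs = np.var(y / m)
    t_rep = np.var(y_rep / m, axis=1)
    # Share of replicates at least as dispersed as the observed data.
    return float(np.mean(t_rep >= t_obs))


# Simulated data: 50 hypothetical litters of size 10.
rng = np.random.default_rng(1)
m = np.full(50, 10)

# Beta-binomial data: the success probability varies between litters.
p_litter = rng.beta(0.5, 0.5, size=50)
y_over = rng.binomial(m, p_litter)

# Plain binomial data with a common success probability.
y_bin = rng.binomial(m, 0.5)

p_over = ppc_overdispersion(y_over, m)
p_bin = ppc_overdispersion(y_bin, m)
print(f"beta-binomial data: p = {p_over:.3f}")
print(f"binomial data:      p = {p_bin:.3f}")
```

A posterior predictive p-value close to zero indicates that the observed proportions vary more than the binomial model can reproduce, which is the signature of overdispersion. Other discrepancies, such as a Pearson chi-square statistic, could be substituted for the variance.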

Article information

Source
Braz. J. Probab. Stat., Volume 29, Number 3 (2015), 608-639.

Dates
Received: May 2012
Accepted: January 2014
First available in Project Euclid: 11 June 2015

Permanent link to this document
https://projecteuclid.org/euclid.bjps/1433983068

Digital Object Identifier
doi:10.1214/14-BJPS236

Mathematical Reviews number (MathSciNet)
MR3355750

Zentralblatt MATH identifier
1326.62061

Keywords
Bayesian analysis; diagnostic; overdispersion; multiplicative binomial; double binomial; beta-binomial

Citation

Paraíba, Carolina C. M.; Diniz, Carlos A. R.; Pires, Rubiane M. Bayesian analysis and diagnostic of overdispersion models for binomial data. Braz. J. Probab. Stat. 29 (2015), no. 3, 608–639. doi:10.1214/14-BJPS236. https://projecteuclid.org/euclid.bjps/1433983068


