Bayesian Analysis

A New Regression Model for Bounded Responses

Sonia Migliorati, Agnese Maria Di Brisco, and Andrea Ongaro

Advance publication

This article is in its final form and can be cited using the date of online publication and the DOI.

Full-text: Open access

Abstract

The aim of this contribution is to propose a new regression model for continuous variables bounded to the unit interval (e.g. proportions) based on the flexible beta (FB) distribution. The latter is a special mixture of two betas, which greatly extends the shapes of the beta distribution, mainly in terms of asymmetry, bimodality and heavy-tail behaviour. Its special mixture structure ensures good theoretical properties, such as strong identifiability and likelihood boundedness, which are quite uncommon for mixture models. Moreover, it makes the model computationally very tractable, also within the Bayesian framework adopted here.
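
To make the phrase "mixture of two betas" concrete, here is a minimal Python sketch. It is not the authors' FB parameterization, which the abstract does not spell out. It assumes a generic two-component beta mixture in mean-precision form, with both components sharing a common precision; the function names and the parameter values (p, mu1, mu2, phi) are hypothetical and chosen only for illustration.

```python
import math


def beta_pdf(y, mu, phi):
    """Beta density in mean-precision form (shape parameters mu*phi and (1-mu)*phi)."""
    a, b = mu * phi, (1.0 - mu) * phi
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1.0) * math.log(y) + (b - 1.0) * math.log1p(-y))


def two_beta_mixture_pdf(y, p, mu1, mu2, phi):
    """Generic mixture of two betas with weight p and a shared precision phi.

    Assumption: this is an illustrative mixture, not the FB density of the paper.
    """
    return p * beta_pdf(y, mu1, phi) + (1.0 - p) * beta_pdf(y, mu2, phi)


def integrate_unit_interval(f, n=2000):
    """Midpoint rule on (0, 1); avoids evaluating f at the endpoints."""
    h = 1.0 / n
    return h * sum(f((i + 0.5) * h) for i in range(n))


# Hypothetical parameter values that produce an asymmetric, bimodal shape.
p, mu1, mu2, phi = 0.6, 0.2, 0.8, 30.0


def density(y):
    return two_beta_mixture_pdf(y, p, mu1, mu2, phi)


total = integrate_unit_interval(density)                   # should be close to 1
mean = integrate_unit_interval(lambda y: y * density(y))   # numerical mean
expected_mean = p * mu1 + (1.0 - p) * mu2                  # mixture mean by linearity

print(f"integral of density: {total:.4f}")
print(f"numerical mean: {mean:.4f}, weighted component mean: {expected_mean:.4f}")
print(f"density at 0.2, 0.5, 0.8: {density(0.2):.3f}, {density(0.5):.3f}, {density(0.8):.3f}")
```

The density is much lower between the two component means than near either of them, which is the kind of bimodal shape a single beta distribution with the same support cannot reproduce.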

At the same time, the FB regression model is easy to interpret and shows remarkable fitting capacity for a variety of data patterns, including unimodal and bimodal ones, heavy tails and the presence of outliers. Indeed, simulation studies and applications to real datasets show that the FB regression model generally performs better than its competitors, namely the beta (Ferrari and Cribari-Neto, 2004) and the beta rectangular (Bayes et al., 2012) regression models, in terms of precision of estimates, goodness of fit and posterior predictive intervals.

Article information

Source
Bayesian Anal. (2017), 28 pages.

Dates
First available in Project Euclid: 25 October 2017

Permanent link to this document
https://projecteuclid.org/euclid.ba/1508897093

Digital Object Identifier
doi:10.1214/17-BA1079

Keywords
proportions; beta regression; flexible beta; mixture models; MCMC; outliers; heavy tails

Rights
Creative Commons Attribution 4.0 International License.

Citation

Migliorati, Sonia; Di Brisco, Agnese Maria; Ongaro, Andrea. A New Regression Model for Bounded Responses. Bayesian Anal., advance publication, 25 October 2017. doi:10.1214/17-BA1079. https://projecteuclid.org/euclid.ba/1508897093



References

  • Akaike, H. (1998). “Information theory and an extension of the maximum likelihood principle.” In Selected Papers of Hirotugu Akaike, 199–213. Springer.
  • Albert, J. (2009). Bayesian computation with R. Springer Science & Business Media.
  • Bayes, C. L., Bazán, J. L., and García, C. (2012). “A new robust regression model for proportions.” Bayesian Analysis, 7(4): 841–866.
  • Branscum, A. J., Johnson, W. O., and Thurmond, M. C. (2007). “Bayesian Beta Regression: Applications to Household Expenditure Data and Genetic Distance between Foot-and-Mouth Disease Viruses.” Australian & New Zealand Journal of Statistics, 49(3): 287–301.
  • Brooks, S. (2002). “Discussion on the paper by Spiegelhalter, Best, Carlin and Van Der Linde.”
  • Celeux, G., Forbes, F., Robert, C. P., Titterington, D. M., et al. (2006). “Deviance information criteria for missing data models.” Bayesian Analysis, 1(4): 651–673.
  • Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977). “Maximum likelihood from incomplete data via the EM algorithm.” Journal of the Royal Statistical Society. Series B, 39: 1–38.
  • Ferrari, S. and Cribari-Neto, F. (2004). “Beta regression for modelling rates and proportions.” Journal of Applied Statistics, 31(7): 799–815.
  • Ferrari, S. L., Espinheira, P. L., and Cribari-Neto, F. (2011). “Diagnostic tools in beta regression with varying dispersion.” Statistica Neerlandica, 65(3): 337–351.
  • Frühwirth-Schnatter, S. (2006). Finite mixture and Markov switching models. Springer Science & Business Media.
  • García, C., Pérez, J. G., and van Dorp, J. R. (2011). “Modeling heavy-tailed, skewed and peaked uncertainty phenomena with bounded support.” Statistical Methods & Applications, 20(4): 463–486.
  • Gelfand, A. E. and Smith, A. F. (1990). “Sampling-based approaches to calculating marginal densities.” Journal of the American Statistical Association, 85(410): 398–409.
  • Gelman, A., Carlin, J. B., Stern, H. S., and Rubin, D. B. (2014). Bayesian data analysis, volume 2. Taylor & Francis.
  • Hahn, E. D. (2008). “Mixture densities for project management activity times: A robust approach to PERT.” European Journal of Operational Research, 188(2): 450–459.
  • Kieschnick, R. and McCullough, B. D. (2003). “Regression analysis of variates observed on (0, 1): percentages, proportions and fractions.” Statistical Modelling, 3(3): 193–213.
  • Lunn, D. J., Thomas, A., Best, N., and Spiegelhalter, D. (2000). “WinBUGS - a Bayesian modelling framework: concepts, structure, and extensibility.” Statistics and Computing, 10(4): 325–337.
  • Markatou, M. (2000). “Mixture models, robustness, and the weighted likelihood methodology.” Biometrics, 56: 483–486.
  • McCullagh, P. and Nelder, J. A. (1989). Generalized linear models, volume 37. CRC Press.
  • Mengersen, K. L., Robert, C. P., and Guihenneuc-Jouyaux, C. (1999). “MCMC convergence diagnostics: a review.” Bayesian Statistics, 6: 415–440.
  • Migliorati, S., Di Brisco, A. M., and Ongaro, A. (2017). “Supplementary Material for A New Regression Model for Bounded Responses.” Bayesian Analysis.
  • Migliorati, S., Ongaro, A., and Monti, G. S. (2016). “A structured Dirichlet mixture model for compositional data: inferential and applicative issues.” Statistics and Computing, 1–21.
  • Ntzoufras, I. (2011). Bayesian modeling using WinBUGS, volume 698. John Wiley & Sons.
  • Ongaro, A. and Migliorati, S. (2013). “A generalization of the Dirichlet distribution.” Journal of Multivariate Analysis, 114: 412–426.
  • Pammer, K. and Kevan, A. (2007). “The contribution of visual sensitivity, phonological processing, and nonverbal IQ to children’s reading.” Scientific Studies of Reading, 11(1): 33–53.
  • Paolino, P. (2001). “Maximum likelihood estimation of models with beta-distributed dependent variables.” Political Analysis, 9(4): 325–346.
  • R Core Team (2016). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0. URL http://www.R-project.org/.
  • Schwarz, G. (1978). “Estimating the dimension of a model.” The Annals of Statistics, 6(2): 461–464.
  • Smithson, M. and Verkuilen, J. (2006). “A better lemon squeezer? Maximum-likelihood regression with beta-distributed dependent variables.” Psychological Methods, 11(1): 54.
  • Spiegelhalter, D. J., Best, N. G., Carlin, B. P., and Van Der Linde, A. (2002). “Bayesian measures of model complexity and fit (with discussion).” Journal of the Royal Statistical Society: Series B, 64(4): 583–639.
  • Stan Development Team (2016). Stan Modeling Language Users Guide and Reference Manual. URL http://mc-stan.org/.
  • Tanner, M. A. and Wong, W. H. (1987). “The calculation of posterior distributions by data augmentation.” Journal of the American Statistical Association, 82(398): 528–540.
  • Thomas, A. (1994). “BUGS: A statistical modelling package.” RTA/BCS Modular Languages Newsletter, 2: 36–38.

Supplemental materials

  • Supplementary Material for A New Regression Model for Bounded Responses. The online supplementary material contains proofs of Propositions 1 and 2, sensible recommendations on the choice of priors for ϕ, and a list of the thinning intervals adopted so that the Raftery–Lewis diagnostics are around 1 in most cases. It also includes a detailed description of the simulation scenarios of Section 5.1, further results for the regression model for the mean with two explanatory variables for the reading accuracy dataset (Section 6.1), and a detailed analysis of residuals for the sport data of Section 6.2.