The Annals of Applied Statistics

A flexible regression model for count data

Kimberly F. Sellers and Galit Shmueli

Abstract

Poisson regression is a popular tool for modeling count data and is used widely, from the social to the physical sciences and beyond. Real data, however, are often over- or under-dispersed and are therefore poorly suited to Poisson regression. We propose a regression model based on the Conway–Maxwell-Poisson (COM-Poisson) distribution to address this problem. COM-Poisson regression generalizes the well-known Poisson and logistic regression models and is suitable for fitting count data with a wide range of dispersion levels. Using a GLM approach that takes advantage of exponential family properties, we discuss model estimation, inference, diagnostics, and interpretation, and present a test for determining whether a COM-Poisson regression is needed over a standard Poisson regression. We compare the COM-Poisson to several alternatives and illustrate its advantages and usefulness using three data sets with varying dispersion.
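
To illustrate the claim that the COM-Poisson distribution covers a range of dispersion levels, the following is a minimal sketch rather than the authors' implementation. It assumes the standard COM-Poisson probability mass function, P(Y = y) = λ^y / ((y!)^ν Z(λ, ν)) with Z(λ, ν) = Σ_j λ^j / (j!)^ν, which the abstract itself does not state. The function names and the truncation of the infinite sum at 200 terms are illustrative assumptions.

```python
import math


def com_poisson_pmf(y, lam, nu, terms=200):
    """P(Y = y) under the assumed COM-Poisson pmf, with lam > 0 and nu >= 0.

    The normalizing constant Z(lam, nu) is an infinite sum; here it is
    truncated at `terms` terms (an assumption that is adequate only when
    the tail is negligible, e.g. moderate lam and nu not too close to 0).
    """
    # Log of the unnormalized weights: j*log(lam) - nu*log(j!)
    log_w = [j * math.log(lam) - nu * math.lgamma(j + 1) for j in range(terms)]
    # Log-sum-exp for numerical stability
    m = max(log_w)
    log_z = m + math.log(sum(math.exp(v - m) for v in log_w))
    return math.exp(y * math.log(lam) - nu * math.lgamma(y + 1) - log_z)


def com_poisson_mean_var(lam, nu, terms=200):
    """Mean and variance computed by direct summation over the truncated support."""
    probs = [com_poisson_pmf(j, lam, nu, terms) for j in range(terms)]
    mean = sum(j * p for j, p in enumerate(probs))
    second = sum(j * j * p for j, p in enumerate(probs))
    return mean, second - mean ** 2


def poisson_pmf(y, lam):
    """Ordinary Poisson pmf, used only as a reference for the nu = 1 case."""
    return math.exp(-lam) * lam ** y / math.factorial(y)


if __name__ == "__main__":
    for nu in (0.5, 1.0, 2.0):
        mean, var = com_poisson_mean_var(2.0, nu)
        print(f"nu={nu}: mean={mean:.4f}, variance={var:.4f}")
```

Under this parameterization, ν = 1 recovers the Poisson distribution, ν < 1 gives variance larger than the mean (over-dispersion), and ν > 1 gives variance smaller than the mean (under-dispersion). As ν grows large the mass concentrates on 0 and 1, approaching a Bernoulli distribution with success probability λ / (1 + λ), which is the sense in which the model connects to logistic regression.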

Article information

Source
Ann. Appl. Stat. Volume 4, Number 2 (2010), 943-961.

Dates
First available: 3 August 2010

Permanent link to this document
http://projecteuclid.org/euclid.aoas/1280842147

Digital Object Identifier
doi:10.1214/09-AOAS306

Mathematical Reviews number (MathSciNet)
MR2758428

Citation

Sellers, Kimberly F.; Shmueli, Galit. A flexible regression model for count data. The Annals of Applied Statistics 4 (2010), no. 2, 943–961. doi:10.1214/09-AOAS306. http://projecteuclid.org/euclid.aoas/1280842147.

Supplemental materials

  • Supplementary materials: Materials include details of the iteratively reweighted least squares estimation, the Fisher information matrix components associated with the COM-Poisson coefficients, the full airfreight data set, diagnostics under various regression models for the airfreight and crash data, and additional logistic regression examples.