Annals of Applied Statistics

Mixed model and estimating equation approaches for zero inflation in clustered binary response data with application to a dating violence study

Kara A. Fulton, Danping Liu, Denise L. Haynie, and Paul S. Albert


The NEXT Generation Health study investigates dating violence among adolescents using a survey questionnaire. Each student is asked to affirm or deny multiple instances of violence in his or her dating relationship. There is, however, evidence suggesting that students not in a relationship also responded to the survey, resulting in excess zeros in the responses. This paper proposes likelihood-based and estimating equation approaches for analyzing zero-inflated clustered binary response data. We adopt a mixed model to account for the cluster effect, and the model parameters are estimated by maximum likelihood (ML), which requires a Gauss–Hermite quadrature (GHQ) approximation for implementation. Since an incorrect assumption on the random effects distribution may bias the results, we also construct generalized estimating equations (GEE) that do not require correct specification of the within-cluster correlation. In a series of simulation studies, we examine the performance of the ML and GEE methods in terms of bias, efficiency and robustness. We illustrate the importance of properly accounting for this zero inflation by reanalyzing the NEXT data, where this issue has previously been ignored.
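The ML approach can be made concrete with a short sketch. It assumes a specific structure that the abstract implies but does not spell out: each student is "at risk" (in a relationship) with probability p; an at-risk student's item responses follow a logistic model with a normal random intercept b ~ N(0, σ²); a student who is not at risk answers 0 to every item. Under those assumptions the per-student likelihood is (1 − p)·1{all y = 0} + p·∫ ∏ μ^y (1 − μ)^(1 − y) φ(b; 0, σ²) db, and the integral is approximated by Gauss–Hermite quadrature using NumPy's `hermgauss`. The function `cluster_loglik` and its arguments are hypothetical names for this illustration, not the authors' code.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss


def expit(eta):
    """Inverse logit."""
    return 1.0 / (1.0 + np.exp(-eta))


def cluster_loglik(y, X, beta, sigma, p, n_nodes=20):
    """Log-likelihood of one student's binary responses under an ASSUMED
    zero-inflated random-intercept logistic model (illustrative sketch).

    y     : (m,) array of 0/1 item responses for one student
    X     : (m, k) covariate matrix for that student's items
    beta  : (k,) fixed effects
    sigma : standard deviation of the normal random intercept
    p     : probability the student is at risk (not a structural zero)
    """
    nodes, weights = hermgauss(n_nodes)  # weight function exp(-t^2)
    # Substituting b = sqrt(2) * sigma * t turns the N(0, sigma^2) integral
    # into a Gauss-Hermite sum whose weights are divided by sqrt(pi).
    b = np.sqrt(2.0) * sigma * nodes                  # (Q,)
    mu = expit((X @ beta)[:, None] + b[None, :])      # (m, Q)
    # Conditional probability of the whole response vector at each node.
    cond = np.prod(np.where(y[:, None] == 1, mu, 1.0 - mu), axis=0)
    at_risk = np.sum(weights * cond) / np.sqrt(np.pi)
    # Structural zeros: a student not at risk answers 0 to every item.
    structural = (1.0 - p) * float(np.all(y == 0))
    return np.log(structural + p * at_risk)
```

Summing `cluster_loglik` over students gives a total log-likelihood that could be maximized numerically over (β, σ, p); raising `n_nodes` trades speed for quadrature accuracy. Because the structural-zero component only adds mass to all-zero response vectors, the same students' data carry information about both p and the at-risk model.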

Article information

Ann. Appl. Stat., Volume 9, Number 1 (2015), 275–299.

First available in Project Euclid: 28 April 2015

Digital Object Identifier: doi:10.1214/14-AOAS791

Keywords: zero inflation, clustered binary data, maximum likelihood, generalized estimating equations, adolescent dating violence


Fulton, Kara A.; Liu, Danping; Haynie, Denise L.; Albert, Paul S. Mixed model and estimating equation approaches for zero inflation in clustered binary response data with application to a dating violence study. Ann. Appl. Stat. 9 (2015), no. 1, 275–299. doi:10.1214/14-AOAS791.

Supplemental materials

  • Supplement to "Mixed model and estimating equation approaches for zero inflation in clustered binary response data with application to a dating violence study" (DOI:10.1214/14-AOAS791SUPP):
      ◦ Supplement A: Additional simulation one. Examines the performance of the proposed model with a smaller sample size (N = 500).
      ◦ Supplement B: Additional simulation two. Examines the sensitivity of assuming a constant zero-inflation probability when that probability depends on covariates.
      ◦ Supplement C: Additional simulation three. Examines the performance of the zero-inflated beta-binomial model.
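The zero-inflated beta-binomial model examined in Supplement C can be illustrated on the count S of affirmed items out of m. In a common parameterization (assumed here, and not necessarily the one used in the supplement), an at-risk respondent's item probability varies across respondents as Beta(a, b), so S is beta-binomial with intra-cluster correlation 1/(a + b + 1); a respondent who is not at risk contributes a structural zero with probability 1 − p. The names `zibb_pmf` and `log_beta` are hypothetical.

```python
from math import comb, exp, lgamma


def log_beta(a, b):
    """log B(a, b), computed via log-gamma for numerical stability."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def zibb_pmf(s, m, a, b, p):
    """P(S = s) under an ASSUMED zero-inflated beta-binomial model (sketch).

    s, m : number of affirmed items and total number of items
    a, b : Beta shape parameters for an at-risk respondent's item probability
    p    : probability the respondent is at risk (not a structural zero)
    """
    if not 0 <= s <= m:
        return 0.0
    # Beta-binomial pmf: C(m, s) * B(s + a, m - s + b) / B(a, b)
    bb = comb(m, s) * exp(log_beta(s + a, m - s + b) - log_beta(a, b))
    # Structural zeros add extra mass (1 - p) at s = 0 only.
    return (1.0 - p) * (s == 0) + p * bb
```

With a = b = 1 the beta-binomial part is uniform on {0, …, m}, so for m = 4 and p = 0.7 the probability of no affirmed items is 0.3 + 0.7/5 = 0.44, whereas each nonzero count has probability 0.7/5 = 0.14.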