Statistical Science

Statistical Analysis of Zero-Inflated Nonnegative Continuous Data: A Review

Lei Liu, Ya-Chen Tina Shih, Robert L. Strawderman, Daowen Zhang, Bankole A. Johnson, and Haitao Chai



Zero-inflated nonnegative continuous (or semicontinuous) data arise frequently in biomedical, economic, and ecological studies. Examples include substance abuse, medical costs, medical care utilization, biomarkers (e.g., CD4 cell counts, coronary artery calcium scores), single-cell gene expression rates, and (relative) abundance of the microbiome. Such data are often characterized by a large proportion of zero values together with positive continuous values that are right-skewed and heteroscedastic. Both features suggest that no simple parametric distribution is suitable for modeling this type of outcome. In this paper, we review statistical methods for analyzing zero-inflated nonnegative outcome data. We start with the cross-sectional setting, discussing ways to separate zero and positive values and introducing flexible models to characterize right skewness and heteroscedasticity in the positive values. We then present models for correlated zero-inflated nonnegative continuous data, using random effects to account for the correlation among repeated measures from the same subject and the correlation across different parts of the model. We also discuss extensions to related topics, for example, zero-inflated count and survival data, nonlinear covariate effects, and joint models of longitudinal zero-inflated nonnegative continuous data and survival. Finally, we present applications to three real datasets (microbiome, medical costs, and alcohol drinking) to illustrate these methods. Example code is provided to facilitate applications of these methods.
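To make the cross-sectional idea of separating zero and positive values concrete, the following is a minimal sketch of a two-part model on simulated data. It is not the authors' supplemental code. It assumes a logistic model for the probability of a positive value and a log-normal model with homoscedastic errors for the positive values, combined through Duan's smearing retransformation; these are only one of several possible choices. All function names, variable names, and parameter values are hypothetical and chosen for illustration.

```python
import numpy as np


def fit_logistic(X, z, n_iter=25):
    """Logistic regression by Newton-Raphson; X must contain an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (z - p)                           # score vector
        hess = X.T @ (X * (p * (1.0 - p))[:, None])    # observed information
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def fit_two_part(X, y):
    """Part 1: P(y > 0 | x) by logistic regression.
    Part 2: log(y) given y > 0 by ordinary least squares."""
    pos = y > 0
    alpha = fit_logistic(X, pos.astype(float))
    log_y = np.log(y[pos])
    beta, *_ = np.linalg.lstsq(X[pos], log_y, rcond=None)
    resid = log_y - X[pos] @ beta
    # Smearing factor: mean of exp(residuals); assumes homoscedastic errors.
    smear = np.mean(np.exp(resid))
    return alpha, beta, smear


def predict_mean(X, alpha, beta, smear):
    """E[y | x] = P(y > 0 | x) * E[y | y > 0, x]."""
    p_pos = 1.0 / (1.0 + np.exp(-X @ alpha))
    return p_pos * np.exp(X @ beta) * smear


# Simulated semicontinuous outcome (illustrative parameter values only).
rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
true_alpha = np.array([0.5, 1.0])   # zero-versus-positive part
true_beta = np.array([1.0, 0.5])    # positive part, on the log scale
is_pos = rng.random(n) < 1.0 / (1.0 + np.exp(-X @ true_alpha))
y = np.where(is_pos, np.exp(X @ true_beta + rng.normal(scale=0.5, size=n)), 0.0)

alpha_hat, beta_hat, smear = fit_two_part(X, y)
mean_hat = predict_mean(X, alpha_hat, beta_hat, smear)
print("logistic part:", alpha_hat)
print("log-scale part:", beta_hat)
print("average predicted mean:", mean_hat.mean(), "sample mean:", y.mean())
```

Because the two parts share no parameters in this sketch, they can be fitted separately. When the positive values are heteroscedastic on the log scale, a single smearing factor is no longer appropriate, which is one motivation for the more flexible models the review discusses.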

Article information

Statist. Sci., Volume 34, Number 2 (2019), 253-279.

First available in Project Euclid: 19 July 2019


Keywords: Two-part model; Tobit model; health econometrics; semiparametric regression; joint model; cure rate; frailty model; splines


Liu, Lei; Shih, Ya-Chen Tina; Strawderman, Robert L.; Zhang, Daowen; Johnson, Bankole A.; Chai, Haitao. Statistical Analysis of Zero-Inflated Nonnegative Continuous Data: A Review. Statist. Sci. 34 (2019), no. 2, 253–279. doi:10.1214/18-STS681.




Supplemental materials

  • Supplement to “Statistical Analysis of Zero-Inflated Nonnegative Continuous Data: A Review”. Data and programming code are available at