## Electronic Journal of Statistics

### Maximum likelihood estimation in the logistic regression model with a cure fraction

#### Abstract

Logistic regression is widely used in medical studies to investigate the relationship between a binary response variable Y and a set of potential predictors X. The binary response may represent, for example, the occurrence of some outcome of interest (Y=1 if the outcome occurred and Y=0 otherwise). In this paper, we consider the problem of estimating the logistic regression model with a cure fraction. A sample of observations is said to contain a cure fraction when a proportion of the study subjects (the so-called cured individuals, as opposed to the susceptibles) cannot experience the outcome of interest. One resulting problem is that, unless the outcome of interest has been observed, it is usually unknown which subjects are cured and which are susceptible. In this setting, a logistic regression analysis of the relationship between X and Y among the susceptibles is no longer straightforward. We develop a maximum likelihood estimation procedure for this problem, based on the joint modeling of the binary response of interest and the cure status. We investigate the identifiability of the resulting model. Then, we establish the consistency and asymptotic normality of the proposed estimator, and we conduct a simulation study to investigate its finite-sample behavior.
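
The abstract does not give the model's formulas, so the following is only a minimal sketch of one common way to formalize joint modeling of a binary response and an unobserved cure status. It assumes that the probability of being susceptible is logistic in a covariate vector `z` with coefficients `theta`, and that the probability of the outcome among susceptibles is logistic in `x` with coefficients `beta`, so that P(Y=1 | x, z) = expit(theta'z) * expit(beta'x). The names `beta`, `theta`, `x`, `z`, `expit`, and `log_likelihood` are illustrative assumptions, not notation taken from the paper.

```python
import math


def expit(t):
    """Numerically stable logistic function 1 / (1 + exp(-t))."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))


def log_likelihood(beta, theta, xs, zs, ys):
    """Log-likelihood of the assumed mixture model.

    Assumption (not from the paper): a subject is susceptible with
    probability expit(theta'z); a susceptible subject has Y = 1 with
    probability expit(beta'x); a cured subject always has Y = 0.
    Hence P(Y = 1 | x, z) = expit(theta'z) * expit(beta'x).
    """
    total = 0.0
    for x, z, y in zip(xs, zs, ys):
        p_event = expit(dot(theta, z)) * expit(dot(beta, x))
        # Y = 1 reveals a susceptible subject; Y = 0 mixes cured
        # subjects with susceptibles who did not have the outcome.
        total += math.log(p_event) if y == 1 else math.log1p(-p_event)
    return total


# Toy data: an intercept plus one covariate in each part of the model.
xs = [[1.0, 0.5], [1.0, -1.2]]
zs = [[1.0, 2.0], [1.0, 0.3]]
ys = [1, 0]
print(log_likelihood([0.0, 0.0], [0.0, 0.0], xs, zs, ys))
```

Under this assumed form, an observation with Y=0 contributes log(1 - P(Y=1 | x, z)), which is where the uncertainty about cure status enters the likelihood. A maximum likelihood estimate would be obtained by maximizing this function jointly over `beta` and `theta` with any numerical optimizer.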

#### Article information

Source
Electron. J. Statist., Volume 5 (2011), 460-483.

Dates
First available in Project Euclid: 23 May 2011

https://projecteuclid.org/euclid.ejs/1306175113

Digital Object Identifier
doi:10.1214/11-EJS616

Mathematical Reviews number (MathSciNet)
MR2802052

Zentralblatt MATH identifier
1274.62480

Subjects
Primary: 62J12: Generalized linear models
Secondary: 62F12: Asymptotic properties of estimators

#### Citation

Diop, Aba; Diop, Aliou; Dupuy, Jean-François. Maximum likelihood estimation in the logistic regression model with a cure fraction. Electron. J. Statist. 5 (2011), 460–483. doi:10.1214/11-EJS616. https://projecteuclid.org/euclid.ejs/1306175113
