The Annals of Statistics

Semiparametric efficient estimation for shared-frailty models with doubly-censored clustered data

Yu-Ru Su and Jane-Ling Wang



In this paper, we investigate frailty models for clustered survival data that are subject to both left- and right-censoring, termed “doubly-censored data”. This model extends the current survival literature by broadening the application of frailty models from right-censored data to the more complicated situation with additional left-censoring.
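
The observation scheme can be illustrated with a minimal Python sketch of how doubly-censored clustered data might arise. For illustration only, it assumes a proportional hazards model with a shared gamma frailty (mean 1, variance `theta`), an exponential baseline hazard, one binary covariate, and independently drawn left- and right-censoring times L < R. The function name and all parameter values are hypothetical and are not taken from the paper. Each subject contributes an observed time W and an indicator: 1 for an exactly observed failure time, 2 for right-censoring (T > R), and 3 for left-censoring (T ≤ L).

```python
import numpy as np

def simulate_doubly_censored(n_clusters, cluster_size, beta=0.5, theta=0.5,
                             base_rate=1.0, seed=0):
    """Hypothetical generator: shared gamma frailty + proportional hazards,
    exponential baseline hazard, independent left/right censoring times."""
    rng = np.random.default_rng(seed)
    rows = []
    for i in range(n_clusters):
        # Shared frailty for the whole cluster: gamma with mean 1, variance theta
        b = rng.gamma(shape=1.0 / theta, scale=theta)
        for _ in range(cluster_size):
            x = rng.binomial(1, 0.5)                 # binary covariate
            rate = b * base_rate * np.exp(beta * x)  # conditional hazard
            t = rng.exponential(1.0 / rate)          # true failure time T
            left = rng.uniform(0.0, 0.3)             # left-censoring time L
            right = left + rng.uniform(0.5, 3.0)     # right-censoring time R > L
            if t <= left:
                w, delta = left, 3    # left-censored: only T <= L is known
            elif t > right:
                w, delta = right, 2   # right-censored: only T > R is known
            else:
                w, delta = t, 1       # failure time observed exactly
            rows.append((i, x, w, delta))  # (cluster id, covariate, W, indicator)
    return rows
```

For example, `simulate_doubly_censored(50, 3)` returns 150 records from 50 clusters of size 3. Within a cluster, all members share the same draw of `b`, which is what induces the within-family correlation.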

Our approach is motivated by a recent Hepatitis B study in which the sample consists of families. We adopt a likelihood approach that targets the nonparametric maximum likelihood estimators (NPMLE). We propose a new algorithm that not only works well for clustered data but also improves upon existing algorithms for independent, doubly-censored data, a special case in which the frailty variable is a constant equal to one. This special case is well known to be computationally challenging because of the left-censoring in the data. The new algorithm resolves this challenge and also accommodates the additional frailty variable effectively.
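
For the special case with no frailty, a classical way to compute the NPMLE of the failure-time distribution is the self-consistency (EM) iteration of Turnbull (1974). The sketch below also drops covariates, so it is not the authors' new algorithm. It only illustrates why left-censoring complicates the computation: each left- or right-censored subject spreads its probability mass over many support points, and the masses are re-averaged until they stop changing. Taking the support to be the distinct exact and left-censoring times plus a tail point at infinity is a simplifying assumption made here; Mykland and Ren (1996) discuss more careful algorithms. The name `self_consistency` is hypothetical. The indicator coding follows the simulation sketch above (1 = exact, 2 = right-censored, 3 = left-censored).

```python
import numpy as np

def self_consistency(w, delta, tol=1e-8, max_iter=5000):
    """Turnbull-type self-consistency iteration for doubly-censored data
    without covariates or frailty (illustrative sketch)."""
    w = np.asarray(w, dtype=float)
    delta = np.asarray(delta)
    # Assumed support: distinct exact and left-censoring times, plus +inf
    support = np.append(np.unique(w[delta != 2]), np.inf)
    n, m = len(w), len(support)
    # A[i, j] = 1 if support point j is compatible with observation i
    A = np.zeros((n, m))
    A[delta == 1] = support[None, :] == w[delta == 1][:, None]  # T = W
    A[delta == 2] = support[None, :] > w[delta == 2][:, None]   # T > W
    A[delta == 3] = support[None, :] <= w[delta == 3][:, None]  # T <= W
    p = np.full(m, 1.0 / m)
    for _ in range(max_iter):
        # E-step: conditional probability that T_i sits at each support point
        weights = A * p
        weights /= weights.sum(axis=1, keepdims=True)
        # M-step: new masses are the average conditional probabilities
        p_new = weights.mean(axis=0)
        if np.max(np.abs(p_new - p)) < tol:
            p = p_new
            break
        p = p_new
    return support, p
```

With all observations exact, the iteration reproduces the empirical distribution. For example, `self_consistency([1, 2, 3], [1, 1, 1])` places mass 1/3 on each observed time and 0 on the tail point.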

Asymptotic properties of the NPMLE are established, along with the semiparametric efficiency of the NPMLE for the finite-dimensional parameters. The consistency of bootstrap estimators of the standard errors of the NPMLE is also discussed. We conducted simulations to illustrate the numerical performance and robustness of the proposed algorithm, which is also applied to the Hepatitis B data.
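
Observations within a family are correlated, so one natural way to bootstrap standard errors is to resample whole clusters with replacement. The following is a minimal sketch of such a cluster bootstrap under that assumption. The bootstrap scheme studied in the paper, and the conditions for its consistency, may differ. `cluster_bootstrap_se` and its `estimator` argument are hypothetical placeholders.

```python
import numpy as np

def cluster_bootstrap_se(cluster_ids, values, estimator, n_boot=200, seed=0):
    """Standard error of estimator(values) by resampling whole clusters
    (hypothetical helper; estimator maps resampled rows to a scalar)."""
    rng = np.random.default_rng(seed)
    cluster_ids = np.asarray(cluster_ids)
    values = np.asarray(values)
    clusters = np.unique(cluster_ids)
    # Row indices belonging to each cluster, computed once
    members = {c: np.flatnonzero(cluster_ids == c) for c in clusters}
    estimates = []
    for _ in range(n_boot):
        # Draw as many clusters as in the original sample, with replacement;
        # a cluster drawn twice contributes all of its rows twice
        drawn = rng.choice(clusters, size=len(clusters), replace=True)
        idx = np.concatenate([members[c] for c in drawn])
        estimates.append(estimator(values[idx]))
    # Bootstrap standard error: sample standard deviation of the replicates
    return float(np.std(estimates, ddof=1))
```

In practice, `estimator` would refit the model (for example, recompute the NPMLE) on each resampled data set, and this refitting is the expensive step. If the estimator uses cluster labels, as a frailty model does, clusters that are drawn more than once should be relabelled as distinct clusters before refitting.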

Article information

Ann. Statist., Volume 44, Number 3 (2016), 1298-1331.

Received: September 2013
Revised: October 2015
First available in Project Euclid: 11 April 2016

Primary: 62N02: Estimation
Secondary: 62E20: Asymptotic distribution theory

Keywords: frailty model; semiparametric efficiency; EM algorithm; Monte Carlo integrations


Su, Yu-Ru; Wang, Jane-Ling. Semiparametric efficient estimation for shared-frailty models with doubly-censored clustered data. Ann. Statist. 44 (2016), no. 3, 1298--1331. doi:10.1214/15-AOS1406.



  • Booth, J. G. and Hobert, J. P. (1999). Maximizing generalized linear mixed model likelihoods with an automated Monte Carlo EM algorithm. J. R. Statist. Soc. B 61 265–285.
  • Caffo, B. S., Jank, W. and Jones, G. L. (2005). Ascent-based Monte Carlo expectation-maximization. J. R. Stat. Soc. Ser. B Stat. Methodol. 67 235–251.
  • Cai, T. and Cheng, S. (2004). Semiparametric regression analysis for doubly censored data. Biometrika 91 277–290.
  • Chan, K. S. and Ledolter, J. (1995). Monte Carlo EM estimation for time series models involving counts. J. Amer. Statist. Assoc. 90 242–252.
  • Chang, M. N. (1990). Weak convergence of a self-consistent estimator of the survival function with doubly censored data. Ann. Statist. 18 391–404.
  • Chang, M. N. and Yang, G. L. (1987). Strong consistency of a nonparametric estimator of the survival function with doubly censored data. Ann. Statist. 15 1536–1547.
  • Cheng, G. (2015). Moment consistency of the exchangeably weighted bootstrap for semiparametric M-estimation. Scand. J. Stat. 42 665–684.
  • Cheng, G. and Huang, J. Z. (2010). Bootstrap consistency for general semiparametric M-estimation. Ann. Statist. 38 2884–2915.
  • Cox, D. R. (1972). Regression models and life-tables. J. Roy. Statist. Soc. Ser. B 34 187–220.
  • Cox, D. R. (1975). Partial likelihood. Biometrika 62 269–276.
  • De Gruttola, V. and Lagakos, S. W. (1989). Analysis of doubly-censored survival data, with application to AIDS. Biometrics 45 1–11.
  • Dupuy, J.-F., Grama, I. and Mesbah, M. (2006). Asymptotic theory for the Cox model with missing time-dependent covariate. Ann. Statist. 34 903–924.
  • Fort, G. and Moulines, E. (2003). Convergence of the Monte Carlo expectation maximization for curved exponential families. Ann. Statist. 31 1220–1259.
  • Kim, Y.-J. (2006). Regression analysis of doubly censored failure time data with frailty. Biometrics 62 458–464.
  • Kim, M. Y., De Gruttola, V. and Lagakos, S. W. (1993). Analyzing doubly censored data with covariates, with application to AIDS. Biometrics 49 13–22.
  • Kim, Y., Kim, B. and Jang, W. (2010). Asymptotic properties of the maximum likelihood estimator for the proportional hazards model with doubly censored data. J. Multivariate Anal. 101 1339–1351.
  • Kim, Y., Kim, J. and Jang, W. (2013). An EM algorithm for the proportional hazards model with doubly censored data. Comput. Statist. Data Anal. 57 41–51.
  • Murphy, S. A. (1994). Consistency in a proportional hazards model incorporating a random effect. Ann. Statist. 22 712–731.
  • Murphy, S. A. (1995). Asymptotic theory for the frailty model. Ann. Statist. 23 182–198.
  • Murphy, S. A., Rossini, A. J. and van der Vaart, A. W. (1997). Maximum likelihood estimation in the proportional odds model. J. Amer. Statist. Assoc. 92 968–976.
  • Mykland, P. A. and Ren, J.-J. (1996). Algorithms for computing self-consistent and maximum likelihood estimators with doubly censored data. Ann. Statist. 24 1740–1764.
  • Nielsen, G. G., Gill, R. D., Andersen, P. K. and Sørensen, T. I. A. (1992). A counting process approach to maximum likelihood estimation in frailty models. Scand. J. Stat. 19 25–43.
  • Parner, E. (1998). Asymptotic theory for the correlated gamma-frailty model. Ann. Statist. 26 183–214.
  • Ripatti, S. and Palmgren, J. (2000). Estimation of multivariate frailty models using penalized partial likelihood. Biometrics 56 1016–1022.
  • Su, Y.-R. (2011). Survival analysis for incomplete data. Ph.D. thesis, Univ. California, Davis.
  • Su, Y.-R. and Wang, J.-L. (2015). Supplement to “Semiparametric efficient estimation for shared-frailty models with doubly-censored clustered data.” doi:10.1214/15-AOS1406SUPP.
  • Therneau, T. M., Grambsch, P. M. and Pankratz, V. S. (2003). Penalized survival models and frailty. J. Comput. Graph. Statist. 12 156–175.
  • Tseng, Y.-K., Hsieh, F. and Wang, J.-L. (2005). Joint modelling of accelerated failure time and longitudinal data. Biometrika 92 587–603.
  • Turnbull, B. W. (1974). Nonparametric estimation of a survivorship function with doubly censored data. J. Amer. Statist. Assoc. 69 169–173.
  • van der Vaart, A. W. (1998). Asymptotic Statistics. Cambridge Series in Statistical and Probabilistic Mathematics 3. Cambridge Univ. Press, Cambridge.
  • van der Vaart, A. W. and Wellner, J. A. (1996). Weak Convergence and Empirical Processes: With Applications to Statistics. Springer, New York.
  • Vaupel, J. W., Manton, K. G. and Stallard, E. (1979). The impact of heterogeneity in individual frailty on the dynamics of mortality. Demography 16 439–454.
  • Wu, J. F., Chen, C. C., Hsieh, R. P., Shih, H. H., Chen, Y. H., Li, C. R., Chiang, C. Y., Shau, W. Y., Ni, Y. H., Chen, H. L., Hsu, H. Y. and Chang, M. H. (2006). HLA typing associated with hepatitis B E antigen seroconversion in children with chronic hepatitis B virus infection: A long-term prospective sibling cohort study in Taiwan. J. Pediatr. 148 647–651.
  • Zeng, D. and Cai, J. (2005). Asymptotic results for maximum likelihood estimators in joint analysis of repeated measurements and survival time. Ann. Statist. 33 2132–2163.
  • Zhang, Y. and Jamshidian, M. (2004). On algorithms for the nonparametric maximum likelihood estimator of the failure function with censored data. J. Comput. Graph. Statist. 13 123–140.

Supplemental materials

  • Supplement to “Semiparametric efficient estimation for shared-frailty models with doubly-censored clustered data”. Owing to space constraints, we present the proof of Proposition 3.2 in the supplemental material [Su and Wang (2015)].