The Annals of Statistics

Robust inference for univariate proportional hazards frailty regression models

Michael R. Kosorok, Bee Leng Lee, and Jason P. Fine


We consider a class of semiparametric regression models which are one-parameter extensions of the Cox [J. Roy. Statist. Soc. Ser. B 34 (1972) 187–220] model for right-censored univariate failure times. These models assume that the hazard given the covariates and a random frailty unique to each individual has the proportional hazards form multiplied by the frailty. The frailty is assumed to have mean 1 within a known one-parameter family of distributions. Inference is based on a nonparametric likelihood. The behavior of the likelihood maximizer is studied under general conditions where the fitted model may be misspecified. The joint estimator of the regression and frailty parameters as well as the baseline hazard is shown to be uniformly consistent for the pseudo-value maximizing the asymptotic limit of the likelihood. Appropriately standardized, the estimator converges weakly to a Gaussian process. When the model is correctly specified, the procedure is semiparametric efficient, achieving the semiparametric information bound for all parameter components. It is also proved that the bootstrap gives valid inferences for all parameters, even under misspecification. We demonstrate analytically the importance of the robust inference in several examples. In a randomized clinical trial, a valid test of the treatment effect is possible when other prognostic factors and the frailty distribution are both misspecified. Under certain conditions on the covariates, the ratios of the regression parameters are still identifiable. The practical utility of the procedure is illustrated on a non-Hodgkin’s lymphoma dataset.
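To make the model structure concrete, the following is a minimal sketch under stated assumptions, not the authors' procedure. It assumes a gamma frailty with mean 1 and variance gamma (one possible choice of one-parameter family), a scalar covariate z, and a unit-exponential baseline, so that Lambda0(t) = t. Given the frailty w, the hazard is w * exp(beta * z) * lambda0(t). Integrating the frailty out through its Laplace transform gives the marginal survival S(t | z) = (1 + gamma * exp(beta * z) * Lambda0(t))^(-1/gamma), which reduces to the Cox model as gamma tends to 0. The function names and parameter values are hypothetical, chosen only for illustration.

```python
import math
import random


def marginal_survival(t, z, beta, gamma, base_cum_hazard):
    """Marginal survival S(t | z) after integrating out a mean-1 gamma frailty.

    gamma is the assumed frailty variance; gamma == 0 gives the Cox model.
    """
    # Cumulative hazard for an individual whose frailty equals 1.
    a = math.exp(beta * z) * base_cum_hazard(t)
    if gamma == 0.0:
        return math.exp(-a)
    # Laplace transform of a gamma(shape=1/gamma, scale=gamma) variable at a.
    return (1.0 + gamma * a) ** (-1.0 / gamma)


def simulate_failure_time(z, beta, gamma, rng):
    """Draw one failure time, assuming a unit-exponential baseline hazard."""
    # Frailty with mean 1 and variance gamma (shape * scale = 1).
    w = rng.gammavariate(1.0 / gamma, gamma) if gamma > 0 else 1.0
    # Conditional on w, the hazard is constant: w * exp(beta * z).
    return rng.expovariate(w * math.exp(beta * z))


def empirical_survival(t, z, beta, gamma, n, seed=0):
    """Monte Carlo estimate of P(T > t | z) from n simulated individuals."""
    rng = random.Random(seed)
    hits = sum(simulate_failure_time(z, beta, gamma, rng) > t for _ in range(n))
    return hits / n


if __name__ == "__main__":
    exact = marginal_survival(1.0, 1.0, 0.5, 0.5, lambda t: t)
    approx = empirical_survival(1.0, 1.0, 0.5, 0.5, n=20000)
    print(f"closed form: {exact:.4f}  simulated: {approx:.4f}")
```

The simulated survival probability should agree with the closed form up to Monte Carlo error. The sketch illustrates only the data-generating model. It does not implement the nonparametric likelihood maximization or the bootstrap studied in the paper.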

Article information

Ann. Statist., Volume 32, Number 4 (2004), 1448-1491.

First available in Project Euclid: 4 August 2004

Primary: 62N01: Censored data models; 60F05: Central limit and other weak theorems
Secondary: 62B10: Information-theoretic topics [See also 94A17]; 62F40: Bootstrap, jackknife and other resampling methods

Keywords: Empirical process, implied parameter, Laplace transform, misspecification, nonparametric maximum likelihood, semiparametric information bound, unobservable heterogeneity


Kosorok, Michael R.; Lee, Bee Leng; Fine, Jason P. Robust inference for univariate proportional hazards frailty regression models. Ann. Statist. 32 (2004), no. 4, 1448–1491. doi:10.1214/009053604000000535.


  • Aalen, O. O. (1978). Nonparametric inference for a family of counting processes. Ann. Statist. 6 701–726.
  • Aalen, O. O. (1980). A model for nonparametric regression analysis of counting processes. In Mathematical Statistics and Probability Theory. Lecture Notes in Statist. 2 1–25. Springer, New York.
  • Aalen, O. O. (1988). Heterogeneity in survival analysis. Statistics in Medicine 7 1121–1137.
  • Andersen, P. K. and Gill, R. D. (1982). Cox's regression model for counting processes: A large sample study. Ann. Statist. 10 1100–1120.
  • Bagdonavičius, V. B. and Nikulin, M. S. (1999). Generalized proportional hazards models based on modified partial likelihood. Lifetime Data Anal. 5 329–350.
  • Bennett, S. (1983). Analysis of survival data by the proportional odds model. Statistics in Medicine 2 273–277.
  • Bickel, P. J., Klaassen, C. A. J., Ritov, Y. and Wellner, J. A. (1993). Efficient and Adaptive Estimation for Semiparametric Models. Johns Hopkins Univ. Press, Baltimore, MD.
  • Brillinger, D. R. (1983). A generalized linear model with “Gaussian” regression variables. In A Festschrift for Erich L. Lehmann 97–114. Wadsworth, Belmont, CA.
  • Cheng, S. C., Wei, L. J. and Ying, Z. (1995). Analysis of transformation models with censored data. Biometrika 82 835–845.
  • Cheng, S. C., Wei, L. J. and Ying, Z. (1997). Predicting survival probabilities with semiparametric transformation models. J. Amer. Statist. Assoc. 92 227–235.
  • Cook, R. D. and Nachtsheim, C. J. (1994). Reweighting to achieve elliptically contoured covariates in regression. J. Amer. Statist. Assoc. 89 592–599.
  • Cox, D. R. (1972). Regression models and life-tables (with discussion). J. Roy. Statist. Soc. Ser. B 34 187–220.
  • Dabrowska, D. M. and Doksum, K. A. (1988). Estimation and testing in a two-sample generalized odds-rate model. J. Amer. Statist. Assoc. 83 744–749.
  • Elbers, C. and Ridder, G. (1982). True and spurious duration dependence: The identifiability of the proportional hazard model. Rev. Econom. Stud. 49 403–410.
  • Fine, J. P., Ying, Z. and Wei, L. J. (1998). On the linear transformation model with censored data. Biometrika 85 980–986.
  • Gray, R. J. (2000). Estimation of regression parameters and the hazard function in transformed linear survival models. Biometrics 56 571–576.
  • Heckman, J. J. and Singer, B. (1984). The identifiability of the proportional hazard model. Rev. Econom. Stud. 51 231–243.
  • Heckman, J. J. and Taber, C. R. (1994). Econometric mixture models and more general models for unobservables in duration analysis. Statistical Methods in Medical Research 3 279–299.
  • Hougaard, P. (1984). Life table methods for heterogeneous populations: Distributions describing the heterogeneity. Biometrika 71 75–83.
  • Hougaard, P. (1986). Survival models for heterogeneous populations derived from stable distributions. Biometrika 73 387–396.
  • Hougaard, P. (2000). Analysis of Multivariate Survival Data. Springer, New York.
  • Hsieh, F. (2001). On heteroscedastic hazards regression models: Theory and application. J. R. Stat. Soc. Ser. B Stat. Methodol. 63 63–79.
  • Kong, F. H. and Slud, E. (1997). Robust covariate-adjusted logrank tests. Biometrika 84 847–862.
  • Kortram, R. A., van Rooij, A. C. M., Lenstra, A. J. and Ridder, G. (1995). Constructive identification of the mixed proportional hazards model. Statist. Neerlandica 49 269–281.
  • Lee, S. and Klein, J. P. (1988). Bivariate models with a random environmental factor. Industrial J. Productivity, Reliability, and Quality Control 13 1–18.
  • Li, K.-C. and Duan, N. (1989). Regression analysis under link violation. Ann. Statist. 17 1009–1052.
  • Lin, D. Y. and Wei, L. J. (1989). The robust inference for the Cox proportional hazards model. J. Amer. Statist. Assoc. 84 1074–1078.
  • Lin, D. Y. and Ying, Z. (1994). Semiparametric analysis of the additive risk model. Biometrika 81 61–71.
  • Lindley, D. V. and Singpurwalla, N. A. (1986). Multivariate distributions for the life lengths of components of a system sharing a common environment. J. Appl. Probab. 23 418–431.
  • McGilchrist, C. A. and Aisbett, C. W. (1991). Regression with frailty in survival analysis. Biometrics 47 461–466.
  • Murphy, S. A. (1994). Consistency in a proportional hazards model incorporating a random effect. Ann. Statist. 22 712–731.
  • Murphy, S. A. (1995). Asymptotic theory for the frailty model. Ann. Statist. 23 182–198.
  • Murphy, S. A., Rossini, A. J. and van der Vaart, A. W. (1997). Maximum likelihood estimation in the proportional odds model. J. Amer. Statist. Assoc. 92 968–976.
  • Newton, M. A. and Raftery, A. E. (1994). Approximate Bayesian inference with the weighted likelihood bootstrap (with discussion). J. Roy. Statist. Soc. Ser. B 56 3–48.
  • Nielsen, G. G., Gill, R. D., Andersen, P. K. and Sørensen, T. I. A. (1992). A counting process approach to maximum likelihood estimation in frailty models. Scand. J. Statist. 19 25–44.
  • Non-Hodgkin's Lymphoma Prognostic Factors Project (1993). A prediction model for aggressive non-Hodgkin's lymphoma: The international non-Hodgkin's lymphoma prognostic factors project. New England J. Medicine 329 987–994.
  • Parner, E. (1998). Asymptotic theory for the correlated gamma-frailty model. Ann. Statist. 26 183–214.
  • Pettitt, A. N. (1982). Inference for the linear model using a likelihood based on ranks. J. Roy. Statist. Soc. Ser. B 44 234–243.
  • Pettitt, A. N. (1984). Proportional odds models for survival data and estimates using ranks. Appl. Statist. 33 169–175.
  • Sargent, D. J. (1997). A flexible approach to time-varying coefficients in the Cox regression setting. Lifetime Data Anal. 3 13–25.
  • Sasieni, P. (1992). Information bounds for the conditional hazard ratio in a nested family of regression models. J. Roy. Statist. Soc. Ser. B 54 617–635.
  • Sasieni, P. (1993). Some new estimators for Cox regression. Ann. Statist. 21 1721–1759.
  • Scharfstein, D. O., Tsiatis, A. A. and Gilbert, P. B. (1998). Semiparametric efficient estimation in the generalized odds-rate class of regression models for right-censored time-to-event data. Lifetime Data Anal. 4 355–391.
  • Shen, X. (1998). Proportional odds regression and sieve maximum likelihood estimation. Biometrika 85 165–177.
  • Slud, E. V. and Vonta, F. (2004). Consistency of the NPML estimator in the right-censored transformation model. Scand. J. Statist. 31 21–41.
  • Struthers, C. A. and Kalbfleisch, J. D. (1986). Misspecified proportional hazard models. Biometrika 73 363–369.
  • Tsiatis, A. A. (1990). Estimating regression parameters using linear rank tests for censored data. Ann. Statist. 18 354–372.
  • van der Vaart, A. W. and Wellner, J. A. (1996). Weak Convergence and Empirical Processes: With Applications to Statistics. Springer, New York.
  • Wei, L. J., Ying, Z. and Lin, D. Y. (1990). Linear regression analysis of censored survival data based on rank tests. Biometrika 77 845–851.
  • White, H. (1982). Maximum likelihood estimation of misspecified models. Econometrica 50 1–25.