The Annals of Statistics

Case-control survival analysis with a general semiparametric shared frailty model: A pseudo full likelihood approach

Malka Gorfine, David M. Zucker, and Li Hsu

Full-text: Open access


We consider correlated failure time (age at onset) data arising from population-based case-control studies, in which case and control probands are selected by population-based sampling and an array of risk factor measures is collected for the cases, the controls, and their relatives. The parameters of interest are the effects of risk factors on the failure time hazard function and the within-family dependencies among failure times after adjusting for the risk factors. Because of the retrospective sampling scheme, large sample theory for existing methods has not been established. We develop a novel technique for estimating the parameters of interest under a general semiparametric shared frailty model. We also present a simple, easily computed, noniterative nonparametric estimator for the cumulative baseline hazard function, and we provide rigorous large sample theory for the proposed method. Simulation results and a real data example illustrate the utility of the proposed method.
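
For orientation, a shared frailty model is commonly written in the form sketched below. The notation is assumed here for illustration and is not quoted from the paper: i indexes families, j indexes members within a family, Z_ij is the covariate vector, ω_i is the unobserved family-level frailty, and f(·; θ) is the frailty density.

```latex
% Conditional hazard for member j of family i, given the shared frailty \omega_i
\lambda_{ij}(t \mid Z_{ij}, \omega_i) = \omega_i \, \lambda_0(t) \exp(\beta^{\top} Z_{ij}),
\qquad
% Cumulative baseline hazard
\Lambda_0(t) = \int_0^t \lambda_0(s)\, ds,
\qquad
% Frailties are i.i.d. across families
\omega_i \overset{\text{iid}}{\sim} f(\,\cdot\,; \theta).
```

In this reading, β carries the risk factor effects, θ governs the within-family dependence, and Λ_0 is the cumulative baseline hazard function for which the abstract describes a nonparametric estimator.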

Article information

Ann. Statist. Volume 37, Number 3 (2009), 1489-1517.

First available in Project Euclid: 10 April 2009

Digital Object Identifier: doi:10.1214/08-AOS615

Primary: 62N01: Censored data models; 62N02: Estimation; 62H12: Estimation

Keywords: Case-control study; correlated failure times; family study; frailty model; multivariate survival model


Gorfine, Malka; Zucker, David M.; Hsu, Li. Case-control survival analysis with a general semiparametric shared frailty model: A pseudo full likelihood approach. Ann. Statist. 37 (2009), no. 3, 1489–1517. doi:10.1214/08-AOS615.
