The Annals of Applied Statistics

Modeling county level breast cancer survival data using a covariate-adjusted frailty proportional hazards model

Haiming Zhou, Timothy Hanson, Alejandro Jara, and Jiajia Zhang



Understanding the factors that explain differences in survival times is an important issue for establishing policies to improve national health systems. Motivated by breast cancer data arising from the Surveillance, Epidemiology, and End Results (SEER) program, we propose a covariate-adjusted proportional hazards frailty model for the analysis of clustered right-censored data. Rather than incorporating exchangeable frailties in the linear predictor of commonly used survival models, we allow the frailty distribution to change flexibly with both continuous and categorical cluster-level covariates and model it using a dependent Bayesian nonparametric model. The resulting model is flexible and easy to fit using an existing R package. Applying the model to our motivating example showed that, contrary to intuition, those diagnosed during a period in the 1990s in more rural and less affluent Iowa counties survived breast cancer better. Additional analyses showed the opposite trend for earlier time windows. We conjecture that this anomaly is due to increased hormone replacement therapy prescribed to more urban and affluent subpopulations.
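The central idea of the abstract is that the whole frailty distribution, not only the linear predictor, moves with cluster-level covariates. To make that data-generating structure concrete, the Python sketch below simulates clustered right-censored data from a proportional hazards model h_ij(t) = h_0(t) exp(beta * x_ij + v_i), where v_i given z_i is drawn from a distribution that depends on the cluster covariate z_i. Everything specific here is an assumption for illustration: the function name `simulate_clustered_ph` is hypothetical, the baseline hazard is Weibull, the frailty is a two-component normal mixture whose mixing weight depends on z_i (a simple parametric stand-in, not the dependent nonparametric prior of the paper), censoring is uniform, and all parameter values are invented. It is not the authors' code, which uses an existing R package.

```python
import numpy as np


def simulate_clustered_ph(n_clusters=50, cluster_size=20, beta=0.5,
                          lam=0.1, rho=1.5, max_censor=15.0, seed=0):
    """Simulate right-censored times from a PH frailty model whose frailty
    distribution depends on a cluster-level covariate (hypothetical setup)."""
    rng = np.random.default_rng(seed)

    # Hypothetical cluster-level covariate, e.g. a standardized
    # rurality/poverty score per county (simulated, not real data).
    z = rng.normal(size=n_clusters)

    # Covariate-dependent frailty: the mixing weight of a two-component
    # normal mixture changes with z, so the frailty *distribution* shifts
    # shape with the covariate. This is an assumed parametric stand-in.
    p_high = 1.0 / (1.0 + np.exp(-1.5 * z))
    high = rng.random(n_clusters) < p_high
    v = np.where(high,
                 rng.normal(0.6, 0.3, size=n_clusters),
                 rng.normal(-0.4, 0.3, size=n_clusters))

    # Subject-level covariate and cluster labels.
    cluster = np.repeat(np.arange(n_clusters), cluster_size)
    x = rng.normal(size=cluster.size)

    # Linear predictor: fixed effect plus the subject's cluster frailty.
    eta = beta * x + v[cluster]

    # Invert S(t) = exp(-lam * t**rho * exp(eta)) for a Weibull baseline;
    # 1 - random() lies in (0, 1], which avoids log(0).
    u = 1.0 - rng.random(cluster.size)
    t_event = (-np.log(u) / (lam * np.exp(eta))) ** (1.0 / rho)

    # Independent uniform censoring yields right-censored observations.
    t_censor = rng.uniform(0.0, max_censor, size=cluster.size)
    time = np.minimum(t_event, t_censor)
    status = (t_event <= t_censor).astype(int)  # 1 = event, 0 = censored
    return {"time": time, "status": status, "x": x,
            "cluster": cluster, "z": z, "v": v}


if __name__ == "__main__":
    data = simulate_clustered_ph()
    print("censoring rate:", 1 - data["status"].mean())
```

Holding the mixing weight fixed for every cluster would recover the usual exchangeable frailty model, in which all v_i share one distribution regardless of z_i; letting it vary with z_i is the kind of covariate adjustment the abstract describes.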

Article information

Ann. Appl. Stat., Volume 9, Number 1 (2015), 43–68.

First available in Project Euclid: 28 April 2015

Keywords: Clustered time-to-event data; proportional hazards model; spatial; tailfree process


Zhou, Haiming; Hanson, Timothy; Jara, Alejandro; Zhang, Jiajia. Modeling county level breast cancer survival data using a covariate-adjusted frailty proportional hazards model. Ann. Appl. Stat. 9 (2015), no. 1, 43–68. doi:10.1214/14-AOAS793.

  • Chlebowski, R. T., Anderson, G. L., Gass, M., Lane, D. S., Aragaki, A. K., Kuller, L. H., Manson, J. E., Stefanick, M. L., Ockene, J., Sarto, G. E. et al. (2010). Estrogen plus progestin and breast cancer incidence and mortality in postmenopausal women. JAMA 304 1684–1692.
  • Clayton, D. and Cuzick, J. (1985). Multivariate generalizations of the proportional hazards model. J. Roy. Statist. Soc. Ser. A 148 82–117.
  • Cottone, F. (2008). Covariate dependent random effects in survival analysis, Ph.D. dissertation, Univ. degli Studi “Roma Tre”.
  • Geisser, S. and Eddy, W. F. (1979). A predictive approach to model selection. J. Amer. Statist. Assoc. 74 153–160.
  • Gelfand, A. E. and Dey, D. K. (1994). Bayesian model choice: Asymptotics and exact calculations. J. Roy. Statist. Soc. Ser. B 56 501–514.
  • Goodman, L. A. and Kruskal, W. H. (1954). Measures of association for cross classifications. J. Amer. Statist. Assoc. 49 732–764.
  • Gustafson, P. (1997). Large hierarchical Bayesian analysis of multivariate survival data. Biometrics 53 230–242.
  • Hanson, T. E. (2006a). Inference for mixtures of finite Polya tree models. J. Amer. Statist. Assoc. 101 1548–1565.
  • Hanson, T. E. (2006b). Modeling censored lifetime data using a mixture of gammas baseline. Bayesian Anal. 1 575–593 (electronic).
  • Hanson, T., Johnson, W. and Laud, P. (2009). Semiparametric inference for survival models with step process covariates. Canad. J. Statist. 37 60–79.
  • Hausauer, A. K., Keegan, T. H., Chang, E. T., Glaser, S. L., Howe, H. and Clarke, C. A. (2009). Recent trends in breast cancer incidence in US white women by county-level urban/rural and poverty status. BMC Medicine 7 31.
  • Hennerfeind, A., Brezger, A. and Fahrmeir, L. (2006). Geoadditive survival models. J. Amer. Statist. Assoc. 101 1065–1075.
  • Jara, A. and Hanson, T. E. (2011). A class of mixtures of dependent tail-free processes. Biometrika 98 553–566.
  • Jara, A., Hanson, T. E., Quintana, F. A., Müller, P. and Rosner, G. L. (2011). DPpackage: Bayesian semi- and nonparametric modeling in R. J. Stat. Softw. 40 1.
  • Kalbfleisch, J. D. (1978). Non-parametric Bayesian analysis of survival time data. J. Roy. Statist. Soc. Ser. B 40 214–221.
  • Krieger, N., Chen, J. T. and Waterman, P. D. (2010). Decline in US breast cancer rates after the women’s health initiative: Socioeconomic and racial/ethnic differentials. American Journal of Public Health 100 132–139.
  • Laird, N. and Olivier, D. (1981). Covariance analysis of censored survival data using log-linear analysis techniques. J. Amer. Statist. Assoc. 76 231–240.
  • Liu, D., Kalbfleisch, J. D. and Schaubel, D. E. (2011). A positive stable frailty model for clustered failure time data with covariate-dependent frailty. Biometrics 67 8–17.
  • McCulloch, C. E. and Neuhaus, J. M. (2011). Misspecifying the shape of a random effects distribution: Why getting it wrong may not matter. Statist. Sci. 26 388–402.
  • Noh, M., Ha, I. D. and Lee, Y. (2006). Dispersion frailty models and HGLMs. Stat. Med. 25 1341–1354.
  • Qiou, Z., Ravishanker, N. and Dey, D. K. (1999). Multivariate survival analysis with positive stable frailties. Biometrics 55 637–644.
  • Reich, B. J., Bondell, H. D. and Wang, H. J. (2010). Flexible Bayesian quantile regression for independent and clustered data. Biostatistics 11 337–352.
  • Rossouw, J. E., Anderson, G. L., Prentice, R. L., LaCroix, A. Z., Kooperberg, C., Stefanick, M. L., Jackson, R. D., Beresford, S. A. A., Howard, B. V., Johnson, K. C., Kotchen, J. M. and Ockene, J. (2002). Risks and benefits of estrogen plus progestin in healthy postmenopausal women: Principal results from the women’s health initiative randomized controlled trial. JAMA 288 321–333.
  • Sahu, S. K. and Dey, D. K. (2004). On a Bayesian multivariate survival model with a skewed frailty. In Skew-Elliptical Distributions and Their Applications: A Journey Beyond Normality (M. G. Genton, ed.) 321–338. Chapman & Hall/CRC, Boca Raton, FL.
  • Spiegelhalter, D. J., Best, N. G., Carlin, B. P. and van der Linde, A. (2002). Bayesian measures of model complexity and fit. J. R. Stat. Soc. Ser. B Stat. Methodol. 64 583–639.
  • Sprague, B. L., Trentham-Dietz, A., Gangnon, R. E., Ramchandani, R., Hampton, J. M., Robert, S. A., Remington, P. L. and Newcomb, P. A. (2011). Socioeconomic status and survival after an invasive breast cancer diagnosis. Cancer 117 1542–1551.
  • Therneau, T. M. and Grambsch, P. M. (2000). Modeling Survival Data: Extending the Cox Model. Springer, New York.
  • Therneau, T. M., Grambsch, P. M. and Pankratz, V. S. (2003). Penalized survival models and frailty. J. Comput. Graph. Statist. 12 156–175.
  • Trippa, L., Müller, P. and Johnson, W. (2011). The multivariate beta process and an extension of the Polya tree model. Biometrika 98 17–34.
  • Vaupel, J. W., Manton, K. G. and Stallard, E. (1979). The impact of heterogeneity in individual frailty on the dynamics of mortality. Demography 16 439–454.
  • Walker, S. G. and Mallick, B. K. (1997). Hierarchical generalized linear models and frailty models with Bayesian nonparametric mixing. J. Roy. Statist. Soc. Ser. B 59 845–860.
  • Wang, Z. and Louis, T. A. (2004). Marginalized binary mixed-effects models with covariate-dependent random effects and likelihood inference. Biometrics 60 884–891.
  • Wassell, J. T. and Moeschberger, M. L. (1993). A bivariate survival model with modified gamma frailty for assessing the impact of interventions. Stat. Med. 12 241–248.
  • Wei, L. J., Lin, D. Y. and Weissfeld, L. (1989). Regression analysis of multivariate incomplete failure time data by modeling marginal distributions. J. Amer. Statist. Assoc. 84 1065–1073.
  • Wysowski, D. K. and Governale, L. A. (2005). Use of menopausal hormones in the United States, 1992 through June, 2003. Pharmacoepidemiology and Drug Safety 14 171–176.
  • Yashin, A. I. and Iachine, I. A. (1999). What difference does the dependence between durations make? Insights for population studies of aging. Lifetime Data Anal. 5 5–22.
  • Zhao, L. and Hanson, T. E. (2011). Spatially dependent Polya tree modeling for survival data. Biometrics 67 391–403.
  • Zhao, L., Hanson, T. E. and Carlin, B. P. (2009). Mixtures of Polya trees for flexible spatial frailty survival modelling. Biometrika 96 263–276.
  • Zhou, H., Hanson, T., Jara, A. and Zhang, J. (2015). Supplement to “Modeling county level breast cancer survival data using a covariate-adjusted frailty proportional hazards model.” DOI:10.1214/14-AOAS793SUPP.

Supplemental materials

  • Supplement to “Modeling county level breast cancer survival data using a covariate-adjusted frailty proportional hazards model.” In this online supplemental article, we provide (A) technical details on the mixture of linear dependent tailfree processes, (B) a detailed description of the MCMC algorithm, (C) sample R code to analyze the SEER data, (D) additional simulation studies, and (E) additional analyses of the SEER data.
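The mixture of linear dependent tailfree processes mentioned in the supplement builds on the dependent tail-free construction of Jara and Hanson (2011), listed in the references. As a rough illustration of the core idea only, the Python sketch below evaluates the density of a finite-depth tailfree process on the real line: the sets at each level are equal-probability intervals under a centering normal distribution N(mu, sigma^2), and each branch probability is a logistic function of a covariate vector z, so the density at y becomes phi(y) times the product over levels of 2 * Y_j(y, z). The function names, the tree depth, and the coefficient values are hypothetical, and the sketch omits the mixing over centering parameters, the priors, and the MCMC algorithm described in the supplement; it is not the authors' implementation.

```python
import math

import numpy as np


def normal_pdf(y, mu, sigma):
    """Density of N(mu, sigma**2) at y."""
    return math.exp(-0.5 * ((y - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def normal_cdf(y, mu, sigma):
    """Distribution function of N(mu, sigma**2) at y."""
    return 0.5 * (1.0 + math.erf((y - mu) / (sigma * math.sqrt(2.0))))


def ldtfp_density(y, z, gammas, mu=0.0, sigma=1.0):
    """Density at y of a finite tailfree process whose branch probabilities
    are logistic-linear in the covariates z (illustrative sketch).

    gammas[j][k] is the coefficient vector for node k at level j; level j
    has 2**j nodes. Leaf sets are equal-probability intervals under the
    centering N(mu, sigma**2).
    """
    depth = len(gammas)
    u = normal_cdf(y, mu, sigma)   # position of y on the quantile scale
    dens = normal_pdf(y, mu, sigma)
    node = 0                       # index of the set containing y at level j
    for j in range(depth):
        # Covariate-dependent probability of moving to the left child.
        p_left = 1.0 / (1.0 + math.exp(-float(np.dot(z, gammas[j][node]))))
        # Children at level j + 1 halve the parent's quantile interval;
        # clamp so that u == 1.0 falls in the last set.
        child = min(int(u * 2 ** (j + 1)), 2 ** (j + 1) - 1)
        go_left = child % 2 == 0
        # Each level rescales the centering density by 2 * branch probability.
        dens *= 2.0 * (p_left if go_left else 1.0 - p_left)
        node = child
    return dens


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    depth = 3
    # Hypothetical coefficients: an (intercept, slope) pair for every node.
    gammas = [[rng.normal(size=2) for _ in range(2 ** j)] for j in range(depth)]
    grid = np.linspace(-3.0, 3.0, 7)
    for z1 in (0.0, 1.0):
        z = np.array([1.0, z1])
        print(z1, [round(ldtfp_density(y, z, gammas), 3) for y in grid])
```

With all coefficients set to zero, every branch probability is 1/2 and the density reduces exactly to the centering normal; nonzero slopes let the shape of the distribution, not just its location, change with z, which is the property the covariate-adjusted frailty model relies on.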