Statistical Science

Longitudinal Data with Follow-up Truncated by Death: Match the Analysis Method to Research Aims

Brenda F. Kurland, Laura L. Johnson, Brian L. Egleston, and Paula H. Diehr

Full-text: Open access


Diverse analysis approaches have been proposed to distinguish data missing due to death from nonresponse, and to summarize trajectories of longitudinal data truncated by death. We demonstrate how these analysis approaches arise from factorizations of the distribution of longitudinal data and survival information. Models are illustrated using cognitive functioning data for older adults. For unconditional models, deaths do not occur, deaths are independent of the longitudinal response, or the unconditional longitudinal response is averaged over the survival distribution. Unconditional models, such as random effects models fit to unbalanced data, may implicitly impute data beyond the time of death. Fully conditional models stratify the longitudinal response trajectory by time of death. Fully conditional models are effective for describing individual trajectories, in terms of either aging (age, or years from baseline) or dying (years from death). Causal models (principal stratification) as currently applied are fully conditional models, since group differences at one timepoint are described for a cohort that will survive past a later timepoint. Partly conditional models summarize the longitudinal response in the dynamic cohort of survivors. Partly conditional models are serial cross-sectional snapshots of the response, reflecting the average response in survivors at a given timepoint rather than individual trajectories. Joint models of survival and longitudinal response describe the evolving health status of the entire cohort. Researchers using longitudinal data should consider which method of accommodating deaths is consistent with research aims, and use analysis methods accordingly.
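The distinction among unconditional, partly conditional, and fully conditional summaries can be illustrated with a small simulation. The sketch below is not the authors' analysis and does not use the cognitive functioning data: the cohort is synthetic, and the visit schedule, score distribution, and the assumed link between low scores and earlier death are hypothetical choices made only for illustration. Writing Y(t) for the response at time t and S for the time of death (notation assumed here), it computes E[Y(t)], E[Y(t) | S > t], and E[Y(t) | S in a stratum].

```python
import random
from statistics import mean

VISITS = (0, 1, 2, 3, 4)  # years from baseline (hypothetical schedule)

def simulate_cohort(n=4000, seed=1):
    """Return a list of (death_time, latent_scores) pairs for a synthetic cohort."""
    rng = random.Random(seed)
    cohort = []
    for _ in range(n):
        level = rng.gauss(90.0, 5.0)  # subject-specific baseline score
        # Assumption: subjects with lower scores tend to die sooner.
        mean_survival = max(1.0, 4.0 + 0.5 * (level - 90.0))
        death = rng.expovariate(1.0 / mean_survival)
        # Latent scores exist at every visit here, even after death;
        # real data would be truncated at the death time.
        latent = [level - 1.0 * t + rng.gauss(0.0, 2.0) for t in VISITS]
        cohort.append((death, latent))
    return cohort

def unconditional_mean(cohort, j):
    """E[Y(t_j)]: averages over everyone, including values 'after death'."""
    return mean(y[j] for _, y in cohort)

def partly_conditional_mean(cohort, j):
    """E[Y(t_j) | S > t_j]: average among subjects alive at visit j."""
    return mean(y[j] for s, y in cohort if s > VISITS[j])

def fully_conditional_mean(cohort, j, lo, hi):
    """E[Y(t_j) | lo <= S < hi]: average within a time-of-death stratum."""
    return mean(y[j] for s, y in cohort if lo <= s < hi)

cohort = simulate_cohort()
for j, t in enumerate(VISITS):
    # Columns: visit time, unconditional mean, mean among survivors.
    print(t, round(unconditional_mean(cohort, j), 1),
          round(partly_conditional_mean(cohort, j), 1))
```

Under these assumptions the survivor mean drifts above the unconditional mean at later visits, because the unconditional summary averages in values that would never be observed, while baseline means differ across time-of-death strata.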

Article information

Statist. Sci., Volume 24, Number 2 (2009), 211-222.

First available in Project Euclid: 14 January 2010

Keywords: Censoring; generalized estimating equations; longitudinal data; missing data; quality of life; random effects models; truncation by death


Kurland, Brenda F.; Johnson, Laura L.; Egleston, Brian L.; Diehr, Paula H. Longitudinal Data with Follow-up Truncated by Death: Match the Analysis Method to Research Aims. Statist. Sci. 24 (2009), no. 2, 211–222. doi:10.1214/09-STS293.

