The Annals of Applied Statistics

Random survival forests

Hemant Ishwaran, Udaya B. Kogalur, Eugene H. Blackstone, and Michael S. Lauer

Full-text: Open access

Abstract

We introduce random survival forests, a random forests method for the analysis of right-censored survival data. New survival splitting rules for growing survival trees are introduced, as is a new missing data algorithm for imputing missing data. A conservation-of-events principle for survival forests is introduced and used to define ensemble mortality, a simple interpretable measure of mortality that can be used as a predicted outcome. Several illustrative examples are given, including a case study of the prognostic implications of body mass for individuals with coronary artery disease. Computations for all examples were implemented using the freely available R-software package, randomSurvivalForest.
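As a rough illustration of the conservation-of-events idea, the sketch below assumes the cumulative hazard is estimated with the Nelson-Aalen estimator, which the abstract does not state. Under that assumption, summing the estimated cumulative hazard over every individual's observed time recovers the total number of observed deaths, which is why such a sum can be read as a mortality count. The function names and the toy data are hypothetical and are not taken from the randomSurvivalForest package.

```python
from collections import Counter


def nelson_aalen(times, events):
    """Nelson-Aalen cumulative hazard at each distinct death time.

    times  : observed times (death or censoring)
    events : 1 if the time is a death, 0 if right-censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    chf, running = {}, 0.0
    for t in sorted(deaths):
        at_risk = sum(1 for u in times if u >= t)  # still under observation at t
        running += deaths[t] / at_risk
        chf[t] = running
    return chf


def chf_at(chf, t):
    """Evaluate the step-function cumulative hazard at time t."""
    value = 0.0
    for s in sorted(chf):
        if s > t:
            break
        value = chf[s]
    return value


def total_mortality(times, events):
    """Sum of the estimated cumulative hazard over all observed times."""
    chf = nelson_aalen(times, events)
    return sum(chf_at(chf, t) for t in times)


# Hypothetical toy sample: eight individuals, five deaths, three censored.
times = [2, 3, 3, 5, 7, 8, 8, 10]
events = [1, 1, 0, 1, 0, 1, 1, 0]

# Conservation of events: the summed hazard matches the death count
# (up to floating-point rounding).
print(total_mortality(times, events), sum(events))
```

The identity holds because each hazard increment d/Y at a death time is counted once for each of the Y individuals still at risk, so it contributes exactly d to the sum.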

Article information

Source
Ann. Appl. Stat., Volume 2, Number 3 (2008), 841-860.

Dates
First available in Project Euclid: 13 October 2008

Permanent link to this document
https://projecteuclid.org/euclid.aoas/1223908043

Digital Object Identifier
doi:10.1214/08-AOAS169

Mathematical Reviews number (MathSciNet)
MR2516796

Zentralblatt MATH identifier
1149.62331

Keywords
Conservation of events, cumulative hazard function, ensemble, out-of-bag, prediction error, survival tree

Citation

Ishwaran, Hemant; Kogalur, Udaya B.; Blackstone, Eugene H.; Lauer, Michael S. Random survival forests. Ann. Appl. Stat. 2 (2008), no. 3, 841–860. doi:10.1214/08-AOAS169. https://projecteuclid.org/euclid.aoas/1223908043


