Electronic Journal of Statistics

Tree-based censored regression with applications in insurance

Olivier Lopez, Xavier Milhaud, and Pierre-E. Thérond



We propose a regression tree procedure to estimate the conditional distribution of a variable which is not directly observed due to censoring. The model that we consider is motivated by applications in insurance, including the analysis of guarantees that involve durations, and claim reserving. We derive consistency results for our procedure, and for the selection of an optimal subtree using a pruning strategy. These theoretical results are supported by a simulation study, and two applications involving insurance datasets. The first concerns income protection insurance, while the second deals with reserving in third-party liability insurance.

Article information

Electron. J. Statist. Volume 10, Number 2 (2016), 2685–2716.

Received: October 2015
First available in Project Euclid: 12 September 2016

Permanent link to this document
https://projecteuclid.org/euclid.ejs/1473685451

Digital Object Identifier
doi:10.1214/16-EJS1189

Primary: 62N01: Censored data models; 62N02: Estimation; 62G08: Nonparametric regression
Secondary: 91B30: Risk theory, insurance; 97M30: Financial and insurance mathematics

Keywords: survival analysis; censoring; regression tree; model selection; insurance


Lopez, Olivier; Milhaud, Xavier; Thérond, Pierre-E. Tree-based censored regression with applications in insurance. Electron. J. Statist. 10 (2016), no. 2, 2685–2716. doi:10.1214/16-EJS1189. https://projecteuclid.org/euclid.ejs/1473685451


