Open Access
September 2011
Risk prediction for prostate cancer recurrence through regularized estimation with simultaneous adjustment for nonlinear clinical effects
Qi Long, Matthias Chung, Carlos S. Moreno, Brent A. Johnson
Ann. Appl. Stat. 5(3): 2003-2023 (September 2011). DOI: 10.1214/11-AOAS458


In biomedical studies, it is of substantial interest to develop risk prediction scores using high-dimensional data such as gene expression data for clinical endpoints that are subject to censoring. In the presence of well-established clinical risk factors, investigators often prefer a procedure that also adjusts for these clinical variables. While accelerated failure time (AFT) models are a useful tool for the analysis of censored outcome data, they assume that covariate effects on the logarithm of time-to-event are linear, which is often unrealistic in practice. We propose to build risk prediction scores through regularized rank estimation in partly linear AFT models, where high-dimensional data such as gene expression data are modeled linearly and important clinical variables are modeled nonlinearly using penalized regression splines. We show through simulation studies that our model has better operating characteristics than several existing models. In particular, we show that there is a nonnegligible effect on prediction as well as feature selection when nonlinear clinical effects are misspecified as linear. This work is motivated by a recent prostate cancer study, where investigators collected gene expression data along with established prognostic clinical variables and the primary endpoint is time to prostate cancer recurrence.
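For orientation, a partly linear AFT model of the kind described above is commonly written as in the sketch below. The notation (T, Z, X, phi, B, alpha, lambda) is assumed here for illustration and does not come from the abstract. The Gehan-type loss with a lasso penalty is one standard form of regularized rank estimation and may differ in detail from the paper's estimator; the additional roughness penalty that penalized splines place on the spline coefficients is omitted for brevity.

```latex
% Partly linear AFT model (notation assumed for illustration).
% Z_i: high-dimensional covariates (e.g. gene expression), entering linearly.
% X_i: a clinical variable (e.g. PSA), entering through a smooth function \phi.
\[
  \log T_i = \beta^\top Z_i + \phi(X_i) + \varepsilon_i,
  \qquad i = 1, \dots, n .
\]

% Spline approximation of the nonlinear clinical effect.
\[
  \phi(x) \approx B(x)^\top \alpha .
\]

% Observed data under right censoring, and the residual on the log scale.
\[
  Y_i = \min(T_i, C_i), \qquad \delta_i = 1(T_i \le C_i), \qquad
  e_i(\beta, \alpha) = \log Y_i - \beta^\top Z_i - B(X_i)^\top \alpha .
\]

% Gehan-type rank loss plus a lasso penalty on \beta, with a^- = \max(-a, 0).
\[
  \min_{\beta, \alpha} \;
  \frac{1}{n^2} \sum_{i=1}^{n} \sum_{j=1}^{n}
  \delta_i \, \{ e_i(\beta, \alpha) - e_j(\beta, \alpha) \}^{-}
  + \lambda \sum_{k} |\beta_k| .
\]
```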
We analyzed the prostate cancer data and evaluated prediction performance of several models based on the extended c statistic for censored data, showing that (1) the relationship between the clinical variable, prostate-specific antigen (PSA), and prostate cancer recurrence is likely nonlinear, that is, the time to recurrence decreases as PSA increases and starts to level off when PSA becomes greater than 11; (2) correct specification of this nonlinear effect improves performance in prediction and feature selection; and (3) addition of gene expression data does not seem to further improve the performance of the resultant risk prediction scores.
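To illustrate how a c statistic can be computed when some outcomes are censored, here is a minimal sketch of a Harrell-type concordance index. It is not the paper's implementation: the "extended c statistic" used there may be defined differently (for example with weighting), and the function name, argument names, and toy data are hypothetical.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell-type c statistic for right-censored data (illustrative sketch).

    times       : observed follow-up times (event or censoring time)
    events      : 1 if the event (e.g. recurrence) was observed, 0 if censored
    risk_scores : higher score means higher predicted risk (shorter time)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # tied observed times are skipped in this simple version
        # Let a be the subject with the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored, so the ordering is unknown
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # higher risk was assigned to the earlier event
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (made up for illustration): the third subject is censored.
toy_times = [2, 4, 6, 8]
toy_events = [1, 1, 0, 1]
toy_scores = [0.9, 0.7, 0.4, 0.1]
toy_c = concordance_index(toy_times, toy_events, toy_scores)
```

A value near 0.5 indicates no discrimination and a value near 1 indicates that higher scores consistently go with earlier events. Pairs whose shorter time is censored are dropped because the true ordering of the two event times cannot be determined.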



Qi Long, Matthias Chung, Carlos S. Moreno, Brent A. Johnson. "Risk prediction for prostate cancer recurrence through regularized estimation with simultaneous adjustment for nonlinear clinical effects." Ann. Appl. Stat. 5(3): 2003-2023, September 2011.


Published: September 2011
First available in Project Euclid: 13 October 2011

zbMATH: 1228.62142
Digital Object Identifier: 10.1214/11-AOAS458

Keywords: Accelerated failure time model, Feature selection, Lasso, partly linear model, penalized splines, rank estimation, risk prediction

Rights: Copyright © 2011 Institute of Mathematical Statistics
