The Annals of Statistics
- Ann. Statist.
- Volume 39, Number 6 (2011), 3092-3120.
Regularization for Cox’s proportional hazards model with NP-dimensionality
High-throughput genetic sequencing arrays, with thousands of measurements per sample, together with a large amount of related censored clinical data, have increased the need for better measurement-specific model selection. In this paper we establish strong oracle properties of nonconcave penalized methods for nonpolynomial (NP) dimensional data with censoring in the framework of Cox’s proportional hazards model. A class of folded-concave penalties is employed, and both LASSO and SCAD are discussed specifically. We address the question of under which dimensionality and correlation restrictions an oracle estimator can be constructed and grasped. It is demonstrated that nonconcave penalties lead to a significant reduction of the “irrepresentable condition” needed for LASSO model selection consistency. A large deviation result for martingales, of interest in its own right, is developed for characterizing the strong oracle property. Moreover, the nonconcave regularized estimator is shown to achieve asymptotically the information bound of the oracle estimator. A coordinate-wise algorithm is developed for finding the grid of solution paths for penalized hazard regression problems, and its performance is evaluated on simulated and gene association study examples.
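To make the objects named in the abstract concrete, the following is a minimal sketch, not the authors' implementation, of a penalized Cox objective: the negative log partial likelihood plus a LASSO or SCAD penalty on each coefficient. The function names, the averaging by n, the absence of a tie correction, and the default a = 3.7 (the value commonly used for SCAD) are all assumptions made for illustration.

```python
import math

def lasso_penalty(t, lam):
    """L1 (LASSO) penalty for a single coefficient."""
    return lam * abs(t)

def scad_penalty(t, lam, a=3.7):
    """SCAD penalty for a single coefficient (a > 2 is a shape parameter)."""
    t = abs(t)
    if t <= lam:
        return lam * t                      # linear near zero, like LASSO
    if t <= a * lam:
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    return lam * lam * (a + 1) / 2          # constant for large coefficients

def neg_log_partial_likelihood(beta, times, events, X):
    """Negative Cox log partial likelihood, averaged over n (ties ignored)."""
    n = len(times)
    eta = [sum(b * x for b, x in zip(beta, row)) for row in X]
    total = 0.0
    for i in range(n):
        if events[i]:
            # Risk set: subjects still under observation at time times[i].
            risk = [eta[j] for j in range(n) if times[j] >= times[i]]
            m = max(risk)                   # log-sum-exp for numerical stability
            total += eta[i] - (m + math.log(sum(math.exp(r - m) for r in risk)))
    return -total / n

def penalized_objective(beta, times, events, X, lam, penalty=scad_penalty):
    """Hypothetical objective a coordinate-wise solver would minimize."""
    loss = neg_log_partial_likelihood(beta, times, events, X)
    return loss + sum(penalty(b, lam) for b in beta)
```

Unlike the LASSO penalty, the SCAD penalty is flat beyond a·lam, so large coefficients are not shrunk further; this is the folded-concave behavior the abstract refers to.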
First available in Project Euclid: 27 January 2012
Bradic, Jelena; Fan, Jianqing; Jiang, Jiancheng. Regularization for Cox’s proportional hazards model with NP-dimensionality. Ann. Statist. 39 (2011), no. 6, 3092--3120. doi:10.1214/11-AOS911. https://projecteuclid.org/euclid.aos/1327672847
- Supplementary material: Supplementary material for “Regularization for Cox’s proportional hazards model with NP-dimensionality”. In the Supplementary Material [Bradic, Fan and Jiang (2011)] we give additional results of our simulation study, specify the statements and detailed proofs of technical Lemmas 2.1–2.3, and give complete proofs of Theorems 2.1, 4.1 and 4.4–4.6. We present the details of the ICA algorithm of Section 5 together with new simulation settings where we increased the censoring rate and/or the number of significant variables s, and a discussion of the relative estimation efficiency of the penalized methods. We develop results on the growth of the L_2 norm of the score vector and of the matrix . Moreover, we establish a result on the asymptotic behavior of vector when s = o(n^{1/3}) diverges with n. The main tools used are the theory of martingales [Fleming and Harrington (1991)] and the results on various matrix norms of Lemmas 4.1, 4.2 and 2.1–2.3.