The Annals of Statistics

Regularization for Cox’s proportional hazards model with NP-dimensionality

Jelena Bradic, Jianqing Fan, and Jiancheng Jiang

Full-text: Open access

Abstract

High-throughput genetic sequencing arrays, with thousands of measurements per sample, and a large amount of related censored clinical data have increased the need for better measurement-specific model selection. In this paper we establish strong oracle properties of nonconcave penalized methods for nonpolynomial (NP) dimensional data with censoring in the framework of Cox’s proportional hazards model. A class of folded-concave penalties is employed, and both LASSO and SCAD are discussed specifically. We address the question of under which dimensionality and correlation restrictions an oracle estimator can be constructed and attained. It is demonstrated that nonconcave penalties lead to a significant reduction of the “irrepresentable condition” needed for LASSO model selection consistency. A large deviation result for martingales, of interest in its own right, is developed for characterizing the strong oracle property. Moreover, the nonconcave regularized estimator is shown to achieve asymptotically the information bound of the oracle estimator. A coordinate-wise algorithm is developed for finding the grid of solution paths for penalized hazard regression problems, and its performance is evaluated on simulated and gene association study examples.
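
To make the contrast between the two penalties named in the abstract concrete, the following is a minimal sketch of the LASSO penalty and the SCAD penalty in the standard form given by Fan and Li (2001). It is an illustration only and is not taken from the paper: the function names are hypothetical, and the default a = 3.7 is the value suggested by Fan and Li (2001), not a setting stated in this abstract.

```python
def lasso_penalty(t, lam):
    """L1 (LASSO) penalty: grows linearly in |t| without bound."""
    return lam * abs(t)


def scad_penalty(t, lam, a=3.7):
    """SCAD penalty in the form of Fan and Li (2001); requires a > 2.

    Linear near zero (like LASSO), quadratic in the middle range,
    and constant for |t| > a * lam.
    """
    t = abs(t)
    if t <= lam:
        return lam * t
    if t <= a * lam:
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    return lam * lam * (a + 1) / 2


def scad_derivative(t, lam, a=3.7):
    """Derivative of the SCAD penalty with respect to |t|, for |t| > 0.

    Equals lam for small |t| and decays to zero beyond a * lam,
    so large coefficients are not shrunk.
    """
    t = abs(t)
    if t <= lam:
        return lam
    return max(a * lam - t, 0.0) / (a - 1)
```

In this sketch the SCAD derivative vanishes for large arguments, whereas the LASSO penalty keeps a constant slope; this flattening is the folded-concave behaviour that the abstract contrasts with LASSO.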

Article information

Source
Ann. Statist., Volume 39, Number 6 (2011), 3092-3120.

Dates
First available in Project Euclid: 27 January 2012

Permanent link to this document
https://projecteuclid.org/euclid.aos/1327672847

Digital Object Identifier
doi:10.1214/11-AOS911

Mathematical Reviews number (MathSciNet)
MR3012402

Zentralblatt MATH identifier
1246.62202

Subjects
Primary: 62N02: Estimation; 60G44: Martingales with continuous parameter
Secondary: 62F12: Asymptotic properties of estimators; 60F10: Large deviations

Keywords
Hazard rate; LASSO; SCAD; large deviation; oracle

Citation

Bradic, Jelena; Fan, Jianqing; Jiang, Jiancheng. Regularization for Cox’s proportional hazards model with NP-dimensionality. Ann. Statist. 39 (2011), no. 6, 3092–3120. doi:10.1214/11-AOS911. https://projecteuclid.org/euclid.aos/1327672847



References

  • Andersen, P. K. and Gill, R. D. (1982). Cox’s regression model for counting processes: A large sample study. Ann. Statist. 10 1100–1120.
  • Bertsekas, D. P. (2003). Nonlinear Programming. Athena Scientific, Nashua, NH.
  • Bhatia, R. (1997). Matrix Analysis. Graduate Texts in Mathematics 169. Springer, New York.
  • Bickel, P. J., Ritov, Y. and Tsybakov, A. B. (2009). Simultaneous analysis of lasso and Dantzig selector. Ann. Statist. 37 1705–1732.
  • Bradic, J., Fan, J. and Wang, W. (2011). Penalized composite quasi-likelihood for ultrahigh dimensional variable selection. J. R. Stat. Soc. Ser. B Stat. Methodol. 73 325–349.
  • Bradic, J., Fan, J. and Jiang, J. (2011). Supplement to “Regularization for Cox’s proportional hazards model with NP-dimensionality.” DOI:10.1214/11-AOS911SUPP.
  • Bunea, F., Tsybakov, A. and Wegkamp, M. (2007). Sparsity oracle inequalities for the Lasso. Electron. J. Stat. 1 169–194 (electronic).
  • Cai, J., Fan, J., Li, R. and Zhou, H. (2005). Variable selection for multivariate failure time data. Biometrika 92 303–316.
  • Candes, E. and Tao, T. (2007). The Dantzig selector: Statistical estimation when p is much larger than n. Ann. Statist. 35 2313–2351.
  • Daubechies, I., Defrise, M. and De Mol, C. (2004). An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Comm. Pure Appl. Math. 57 1413–1457.
  • Dave, S. S. et al. (2004). Prediction of survival in follicular lymphoma based on molecular features of tumor-infiltrating immune cells. N. Engl. J. Med. 351 2159–2169.
  • de la Peña, V. H. (1999). A general class of exponential inequalities for martingales and ratios. Ann. Probab. 27 537–564.
  • Du, P., Ma, S. and Liang, H. (2010). Penalized variable selection procedure for Cox models with semiparametric relative risk. Ann. Statist. 38 2092–2117.
  • Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. J. Amer. Statist. Assoc. 96 1348–1360.
  • Fan, J. and Li, R. (2002). Variable selection for Cox’s proportional hazards model and frailty model. Ann. Statist. 30 74–99.
  • Fan, J. and Lv, J. (2011). Non-concave penalized likelihood with NP-dimensionality. IEEE Trans. Inform. Theory 57 5467–5484.
  • Fleming, T. R. and Harrington, D. P. (1991). Counting Processes and Survival Analysis. Wiley, New York.
  • Friedman, J., Hastie, T. and Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 33 1–22.
  • Friedman, J., Hastie, T., Höfling, H. and Tibshirani, R. (2007). Pathwise coordinate optimization. Ann. Appl. Stat. 1 302–332.
  • Johnson, B. A. (2009). On lasso for censored data. Electron. J. Stat. 3 485–506.
  • Juditsky, A. B. and Nemirovski, A. S. (2011). Large deviations of vector-valued martingales in 2-smooth normed spaces. Ann. Appl. Probab. To appear. Available at arXiv:0809.0813.
  • Kim, Y., Choi, H. and Oh, H. (2008). Smoothly clipped absolute deviation on high dimensions. J. Amer. Statist. Assoc. 103 1656–1673.
  • Koltchinskii, V. (2009). The Dantzig selector and sparsity oracle inequalities. Bernoulli 15 799–828.
  • Lv, J. and Fan, Y. (2009). A unified approach to model selection and sparse recovery using regularized least squares. Ann. Statist. 37 3498–3528.
  • Massart, P. and Meynet, C. (2010). An l1 oracle inequality for the LASSO. Available at arXiv:1007.4791.
  • Meinshausen, N. and Bühlmann, P. (2006). High-dimensional graphs and variable selection with the lasso. Ann. Statist. 34 1436–1462.
  • Meinshausen, N. and Yu, B. (2009). Lasso-type recovery of sparse representations for high-dimensional data. Ann. Statist. 37 246–270.
  • Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. Roy. Statist. Soc. Ser. B 58 267–288.
  • Tibshirani, R. (1997). The LASSO method for variable selection in the Cox model. Stat. Med. 16 385–395.
  • van de Geer, S. (1995). Exponential inequalities for martingales, with application to maximum likelihood estimation for counting processes. Ann. Statist. 23 1779–1801.
  • van de Geer, S. and Bühlmann, P. (2009). On conditions used to prove oracle results for the LASSO. Electron. J. Stat. 3 1360–1392.
  • Wang, S., Nan, B., Zhou, N. and Zhu, J. (2009). Hierarchically penalized Cox regression with grouped variables. Biometrika 96 307–322.
  • Wu, T. T. and Lange, K. (2008). Coordinate descent algorithms for lasso penalized regression. Ann. Appl. Stat. 2 224–244.
  • Yuan, M. and Lin, Y. (2007). On the non-negative garrote estimator. J. R. Stat. Soc. Ser. B Stat. Methodol. 69 143–161.
  • Zhang, C.-H. (2010). Nearly unbiased variable selection under minimax concave penalty. Ann. Statist. 38 894–942.
  • Zhang, C.-H. and Huang, J. (2008). The sparsity and bias of the LASSO selection in high-dimensional linear regression. Ann. Statist. 36 1567–1594.
  • Zhao, P. and Yu, B. (2006). On model selection consistency of Lasso. J. Mach. Learn. Res. 7 2541–2563.

Supplemental materials

  • Supplementary material: Supplementary material for “Regularization for Cox’s proportional hazards model with NP-dimensionality”. In the Supplementary Material [Bradic, Fan and Jiang (2011)] we give additional results of our simulation study, specify the statements and detailed proofs of technical Lemmas 2.1–2.3, and give complete proofs of Theorems 2.1, 4.1 and 4.4–4.6. We present the details of the ICA algorithm of Section 5, together with new simulation settings where we increased the censoring rate and/or the number of significant variables s, and a discussion of the relative estimation efficiency of the penalized methods. We develop results on the growth of the L_2 norm of the score vector and of the matrix . Moreover, we establish a result on the asymptotic behavior of vector when s = o(n^{1/3}) is diverging with n. The main tools used are the theory of martingales [Fleming and Harrington (1991)] and the results on various matrix norms in Lemmas 4.1, 4.2 and 2.1–2.3.