The Annals of Statistics

Calibrating nonconvex penalized regression in ultra-high dimension

Lan Wang, Yongdai Kim, and Runze Li

Full-text: Open access


We investigate high-dimensional nonconvex penalized regression, where the number of covariates may grow at an exponential rate with the sample size. Although recent asymptotic theory has established that, under general conditions, a local minimum with the oracle property exists, identifying the oracle estimator among potentially multiple local minima remains largely an open problem. There are two main obstacles: (1) because of the multiple minima, the solution path is nonunique and is not guaranteed to contain the oracle estimator; (2) even if a solution path is known to contain the oracle estimator, the optimal tuning parameter depends on many unknown factors and is hard to estimate. To address these two challenging issues, we first prove that an easy-to-compute calibrated CCCP algorithm produces a consistent solution path that contains the oracle estimator with probability approaching one. Furthermore, we propose a high-dimensional BIC criterion and show that, applied along this solution path, it selects a tuning parameter that asymptotically identifies the oracle estimator. The theory is established for a general class of nonconvex penalties in the ultra-high-dimensional setup when the random errors follow a sub-Gaussian distribution. Monte Carlo studies confirm that the calibrated CCCP algorithm combined with the proposed high-dimensional BIC reliably identifies the underlying sparsity pattern in high-dimensional data analysis.
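To make the two ingredients of the abstract concrete, the following is a minimal pure-Python sketch of (i) a CCCP-style iteration for SCAD-penalized least squares, in which the concave part of the penalty is linearized so that each step reduces to a weighted lasso, and (ii) a high-dimensional BIC of the form log(RSS/n) + |S|·C_n·log(p)/n used to pick the tuning parameter along the resulting path. This is an illustration under our own simplifying choices, not the authors' implementation: the function names, the coordinate-descent solver, the toy data, the tuning grid, and the factor C_n = log(log n) are all illustrative assumptions.

```python
import math
import random

A = 3.7  # SCAD concavity parameter suggested by Fan and Li (2001)

def scad_deriv(t, lam, a=A):
    """Derivative p'_lam(|t|) of the SCAD penalty."""
    t = abs(t)
    if t <= lam:
        return lam
    if t <= a * lam:
        return (a * lam - t) / (a - 1.0)
    return 0.0

def weighted_lasso(X, y, w, sweeps=60):
    """Coordinate descent for (1/2n)||y - Xb||^2 + sum_j w_j |b_j|."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            # partial residuals, leaving out coordinate j
            r = [y[i] - sum(X[i][k] * b[k] for k in range(p) if k != j)
                 for i in range(n)]
            z = sum(X[i][j] * r[i] for i in range(n)) / n
            d = sum(X[i][j] ** 2 for i in range(n)) / n
            # soft-thresholding update
            b[j] = math.copysign(max(abs(z) - w[j], 0.0), z) / d
    return b

def cccp_scad(X, y, lam, steps=3):
    """Each CCCP step linearizes the concave part of SCAD, giving a
    weighted lasso; the first step is a plain lasso."""
    p = len(X[0])
    b = weighted_lasso(X, y, [lam] * p)
    for _ in range(steps - 1):
        b = weighted_lasso(X, y, [scad_deriv(bj, lam) for bj in b])
    return b

def hbic(X, y, b, c_n):
    """High-dimensional BIC: log(RSS/n) + |S| * c_n * log(p) / n."""
    n, p = len(X), len(X[0])
    rss = sum((y[i] - sum(X[i][j] * b[j] for j in range(p))) ** 2
              for i in range(n))
    df = sum(1 for bj in b if abs(bj) > 1e-8)
    return math.log(rss / n + 1e-12) + df * c_n * math.log(p) / n

# Toy data: n = 40 observations, p = 10 covariates, 2 of them active.
random.seed(1)
n, p = 40, 10
beta_true = [2.0, -1.5] + [0.0] * (p - 2)
X = [[random.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
y = [sum(X[i][j] * beta_true[j] for j in range(p))
     + 0.1 * random.gauss(0.0, 1.0) for i in range(n)]

c_n = math.log(math.log(n))  # one illustrative choice of the diverging factor
_, best_lam = min((hbic(X, y, cccp_scad(X, y, lam), c_n), lam)
                  for lam in (0.1, 0.2, 0.4))
b_hat = cccp_scad(X, y, best_lam)
support = [j for j, bj in enumerate(b_hat) if abs(bj) > 1e-8]
```

On this toy example the selected model recovers the true sparsity pattern {0, 1}, and the SCAD weights vanish on the strong signals, so the final fit is essentially unbiased on the active set, which is the behavior the oracle property describes.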

Article information

Ann. Statist., Volume 41, Number 5 (2013), 2505-2536.

First available in Project Euclid: 5 November 2013

Primary: 62J05: Linear regression
Secondary: 62J07: Ridge regression; shrinkage estimators

Keywords: high-dimensional regression; LASSO; MCP; SCAD; variable selection; penalized least squares


Wang, Lan; Kim, Yongdai; Li, Runze. Calibrating nonconvex penalized regression in ultra-high dimension. Ann. Statist. 41 (2013), no. 5, 2505--2536. doi:10.1214/13-AOS1159.



  • Bertsekas, D. P. (1999). Nonlinear Programming, 2nd ed. Athena Scientific, Belmont, MA.
  • Bickel, P. J., Ritov, Y. and Tsybakov, A. B. (2009). Simultaneous analysis of Lasso and Dantzig selector. Ann. Statist. 37 1705–1732.
  • Bühlmann, P. and van de Geer, S. (2011). Statistics for High-Dimensional Data: Methods, Theory and Applications. Springer, Heidelberg.
  • Chen, J. and Chen, Z. (2008). Extended Bayesian information criteria for model selection with large model spaces. Biometrika 95 759–771.
  • Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. J. Amer. Statist. Assoc. 96 1348–1360.
  • Fan, J. and Lv, J. (2011). Nonconcave penalized likelihood with NP-dimensionality. IEEE Trans. Inform. Theory 57 5467–5484.
  • Fan, J. and Peng, H. (2004). Nonconcave penalized likelihood with a diverging number of parameters. Ann. Statist. 32 928–961.
  • Frank, I. E. and Friedman, J. H. (1993). A statistical view of some chemometric regression tools (with discussion). Technometrics 35 109–148.
  • Gao, B.-B., Phipps, J. A., Bursell, D., Clermont, A. C. and Feener, E. P. (2009). Angiotensin AT1 receptor antagonism ameliorates murine retinal proteome changes induced by diabetes. J. Proteome Res. 8 5541–5549.
  • Huang, J., Ma, S. and Zhang, C.-H. (2008). Adaptive Lasso for sparse high-dimensional regression models. Statist. Sinica 18 1603–1618.
  • Hunter, D. R. and Li, R. (2005). Variable selection using MM algorithms. Ann. Statist. 33 1617–1642.
  • Kim, Y., Choi, H. and Oh, H.-S. (2008). Smoothly clipped absolute deviation on high dimensions. J. Amer. Statist. Assoc. 103 1665–1673.
  • Kim, Y. and Kwon, S. (2012). Global optimality of nonconvex penalized estimators. Biometrika 99 315–325.
  • Kim, Y., Kwon, S. and Choi, H. (2012). Consistent model selection criteria on high dimensions. J. Mach. Learn. Res. 13 1037–1057.
  • Kwon, S. and Kim, Y. (2012). Large sample properties of the SCAD-penalized maximum likelihood estimation on high dimensions. Statist. Sinica 22 629–653.
  • Lounici, K. (2008). Sup-norm convergence rate and sign concentration property of Lasso and Dantzig estimators. Electron. J. Stat. 2 90–102.
  • Meinshausen, N. and Yu, B. (2009). Lasso-type recovery of sparse representations for high-dimensional data. Ann. Statist. 37 246–270.
  • Mikosch, T. (1990). Estimates for tail probabilities of quadratic and bilinear forms in sub-Gaussian random variables with applications to the law of the iterated logarithm. Probab. Math. Statist. 11 169–178.
  • Paunel-Görgülü, A. N., Franke, A. G., Paulsen, F. P. and Dünker, N. (2011). Trefoil factor family peptide 2 acts pro-proliferative and pro-apoptotic in the murine retina. Histochem. Cell Biol. 135 461–473.
  • Scheetz, T. E., Kim, K. Y. A., Swiderski, R. E., Philp, A. R., Braun, T. A., Knudtson, K. L., Dorrance, A. M., DiBona, G. F., Huang, J., Casavant, T. L., Sheffield, V. C. and Stone, E. M. (2006). Regulation of gene expression in the mammalian eye and its relevance to eye disease. Proc. Natl. Acad. Sci. USA 103 14429–14434.
  • Tao, P. D. and An, L. T. H. (1997). Convex analysis approach to d.c. programming: Theory, algorithms and applications. Acta Math. Vietnam. 22 289–355.
  • Tibshirani, R. (1996). Regression shrinkage and selection via the Lasso. J. R. Stat. Soc. Ser. B Stat. Methodol. 58 267–288.
  • van de Geer, S. A. (2008). High-dimensional generalized linear models and the Lasso. Ann. Statist. 36 614–645.
  • van de Geer, S., Bühlmann, P. and Zhou, S. (2011). The adaptive and the thresholded Lasso for potentially misspecified models (and a lower bound for the Lasso). Electron. J. Stat. 5 688–749.
  • Wang, L., Kim, Y. and Li, R. (2013). Supplement to “Calibrating nonconvex penalized regression in ultra-high dimension.” DOI:10.1214/13-AOS1159SUPP.
  • Wang, H., Li, R. and Tsai, C.-L. (2007). Tuning parameter selectors for the smoothly clipped absolute deviation method. Biometrika 94 553–568.
  • Wang, H., Li, B. and Leng, C. (2009). Shrinkage tuning parameter selection with a diverging number of parameters. J. R. Stat. Soc. Ser. B Stat. Methodol. 71 671–683.
  • Yuille, A. L. and Rangarajan, A. (2003). The concave–convex procedure. Neural Comput. 15 915–936.
  • Zhang, C.-H. (2010a). Nearly unbiased variable selection under minimax concave penalty. Ann. Statist. 38 894–942.
  • Zhang, T. (2010b). Analysis of multi-stage convex relaxation for sparse regularization. J. Mach. Learn. Res. 11 1081–1107.
  • Zhang, T. (2013). Multi-stage convex relaxation for feature selection. Bernoulli. To appear.
  • Zhang, C.-H. and Huang, J. (2008). The sparsity and bias of the Lasso selection in high-dimensional regression. Ann. Statist. 36 1567–1594.
  • Zhang, Y., Li, R. and Tsai, C.-L. (2010). Regularization parameter selections via generalized information criterion. J. Amer. Statist. Assoc. 105 312–323.
  • Zhang, C.-H. and Zhang, T. (2012). A general theory of concave regularization for high-dimensional sparse estimation problems. Statist. Sci. 27 576–593.
  • Zhao, P. and Yu, B. (2006). On model selection consistency of Lasso. J. Mach. Learn. Res. 7 2541–2563.
  • Zhou, S. H. (2010). Thresholded Lasso for high dimensional variable selection and statistical estimation. Available at arXiv:1002.1583.
  • Zhou, S. H., van de Geer, S. A. and Bühlmann, P. (2009). Adaptive Lasso for high dimensional regression and Gaussian graphical modeling. Available at arXiv:0903.2515.
  • Zou, H. (2006). The adaptive Lasso and its oracle properties. J. Amer. Statist. Assoc. 101 1418–1429.
  • Zou, H. and Li, R. (2008). One-step sparse estimates in nonconcave penalized likelihood models. Ann. Statist. 36 1509–1533.

Supplemental materials

  • Supplementary material: Supplement to “Calibrating nonconvex penalized regression in ultra-high dimension.” The supplement contains the proofs of Lemmas 3.1 and 6.1 and some additional numerical results.