The Annals of Statistics

Strong oracle optimality of folded concave penalized estimation

Jianqing Fan, Lingzhou Xue, and Hui Zou

Full-text: Open access


Folded concave penalization methods have been shown to enjoy the strong oracle property for high-dimensional sparse estimation. However, a folded concave penalization problem usually has multiple local solutions and the oracle property is established only for one of the unknown local solutions. A challenging fundamental issue still remains that it is not clear whether the local optimum computed by a given optimization algorithm possesses those nice theoretical properties. To close this important theoretical gap in over a decade, we provide a unified theory to show explicitly how to obtain the oracle solution via the local linear approximation algorithm. For a folded concave penalized estimation problem, we show that as long as the problem is localizable and the oracle estimator is well behaved, we can obtain the oracle estimator by using the one-step local linear approximation. In addition, once the oracle estimator is obtained, the local linear approximation algorithm converges, namely it produces the same estimator in the next iteration. The general theory is demonstrated by using four classical sparse estimation problems, that is, sparse linear regression, sparse logistic regression, sparse precision matrix estimation and sparse quantile regression.

Article information

Ann. Statist. Volume 42, Number 3 (2014), 819-849.

First available in Project Euclid: 20 May 2014

Permanent link to this document

Digital Object Identifier

Mathematical Reviews number (MathSciNet)

Zentralblatt MATH identifier

Primary: 62J07: Ridge regression; shrinkage estimators

Folded concave penalty local linear approximation nonconvex optimization oracle estimator sparse estimation strong oracle property


Fan, Jianqing; Xue, Lingzhou; Zou, Hui. Strong oracle optimality of folded concave penalized estimation. Ann. Statist. 42 (2014), no. 3, 819--849. doi:10.1214/13-AOS1198.

Export citation


  • Antoniadis, A. and Fan, J. (2001). Regularization of wavelet approximations. J. Amer. Statist. Assoc. 96 939–967.
  • Belloni, A. and Chernozhukov, V. (2011). $\ell_1$-penalized quantile regression in high-dimensional sparse models. Ann. Statist. 39 82–130.
  • Bickel, P. J. (1975). One-step Huber estimates in the linear model. J. Amer. Statist. Assoc. 70 428–434.
  • Bickel, P. J. and Levina, E. (2008). Regularized estimation of large covariance matrices. Ann. Statist. 36 199–227.
  • Bickel, P. J., Ritov, Y. and Tsybakov, A. B. (2009). Simultaneous analysis of Lasso and Dantzig selector. Ann. Statist. 37 1705–1732.
  • Bradic, J., Fan, J. and Jiang, J. (2011). Regularization for Cox’s proportional hazards model with NP-dimensionality. Ann. Statist. 39 3092–3120.
  • Cai, T., Liu, W. and Luo, X. (2011). A constrained $\ell_1$ minimization approach to sparse precision matrix estimation. J. Amer. Statist. Assoc. 106 594–607.
  • Candes, E. and Tao, T. (2007). The Dantzig selector: Statistical estimation when $p$ is much larger than $n$. Ann. Statist. 35 2313–2351.
  • Efron, B., Hastie, T., Johnstone, I. and Tibshirani, R. (2004). Least angle regression. Ann. Statist. 32 407–499.
  • Fan, J., Fan, Y. and Barut, E. (2014). Adaptive robust variable selection. Ann. Statist. To appear.
  • Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. J. Amer. Statist. Assoc. 96 1348–1360.
  • Fan, J. and Lv, J. (2011). Nonconcave penalized likelihood with NP-dimensionality. IEEE Trans. Inform. Theory 57 5467–5484.
  • Fan, J. and Peng, H. (2004). Nonconcave penalized likelihood with a diverging number of parameters. Ann. Statist. 32 928–961.
  • Fan, J., Xue, L. and Zou, H. (2014). Supplement to “Strong oracle optimality of folded concave penalized estimation.” DOI:10.1214/13-AOS1198.
  • Friedman, J., Hastie, T. and Tibshirani, R. (2008). Sparse inverse covariance estimation with the graphical lasso. Biostatistics 9 432–441.
  • Friedman, J., Hastie, T. and Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 33 1–22.
  • Huang, J. and Zhang, C.-H. (2012). Estimation and selection via absolute penalized convex minimization and its multistage adaptive applications. J. Mach. Learn. Res. 13 1839–1864.
  • Hunter, D. R. and Lange, K. (2004). A tutorial on MM algorithms. Amer. Statist. 58 30–37.
  • Hunter, D. R. and Li, R. (2005). Variable selection using MM algorithms. Ann. Statist. 33 1617–1642.
  • Knight, K. (1998). Limiting distributions for $L_1$ regression estimators under general conditions. Ann. Statist. 26 755–770.
  • Koenker, R. (2005). Quantile Regression. Cambridge Univ. Press, Cambridge.
  • Koenker, R. and Bassett, G. Jr. (1978). Regression quantiles. Econometrica 46 33–50.
  • Lam, C. and Fan, J. (2009). Sparsistency and rates of convergence in large covariance matrix estimation. Ann. Statist. 37 4254–4278.
  • Li, Y. and Zhu, J. (2008). $L_1$-norm quantile regression. J. Comput. Graph. Statist. 17 163–185.
  • Mazumder, R., Friedman, J. H. and Hastie, T. (2011). SparseNet: Coordinate descent with nonconvex penalties. J. Amer. Statist. Assoc. 106 1125–1138.
  • Meinshausen, N. and Bühlmann, P. (2006). High-dimensional graphs and variable selection with the lasso. Ann. Statist. 34 1436–1462.
  • Negahban, S. N., Ravikumar, P., Wainwright, M. J. and Yu, B. (2012). A unified framework for high-dimensional analysis of $M$-estimators with decomposable regularizers. Statist. Sci. 27 538–557.
  • Ravikumar, P., Wainwright, M. J., Raskutti, G. and Yu, B. (2011). High-dimensional covariance estimation by minimizing $\ell_1$-penalized log-determinant divergence. Electron. J. Stat. 5 935–980.
  • Saulis, L. and Statulevičius, V. A. (1991). Limit Theorems for Large Deviations. Kluwer Academic, Dordrecht.
  • Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. Roy. Statist. Soc. Ser. B 58 267–288.
  • van de Geer, S. A. and Bühlmann, P. (2009). On the conditions used to prove oracle results for the Lasso. Electron. J. Stat. 3 1360–1392.
  • Wang, L., Wu, Y. and Li, R. (2012). Quantile regression for analyzing heterogeneity in ultra-high dimension. J. Amer. Statist. Assoc. 107 214–222.
  • Wu, Y. and Liu, Y. (2009). Variable selection in quantile regression. Statist. Sinica 19 801–817.
  • Xue, L., Zou, H. and Cai, T. (2012). Nonconcave penalized composite conditional likelihood estimation of sparse Ising models. Ann. Statist. 40 1403–1429.
  • Ye, F. and Zhang, C.-H. (2010). Rate minimaxity of the Lasso and Dantzig selector for the $\ell_q$ loss in $\ell_r$ balls. J. Mach. Learn. Res. 11 3519–3540.
  • Yuan, M. and Lin, Y. (2007). Model selection and estimation in the Gaussian graphical model. Biometrika 94 19–35.
  • Zhang, T. (2009). Some sharp performance bounds for least squares regression with $L_1$ regularization. Ann. Statist. 37 2109–2144.
  • Zhang, C.-H. (2010a). Nearly unbiased variable selection under minimax concave penalty. Ann. Statist. 38 894–942.
  • Zhang, T. (2010b). Analysis of multi-stage convex relaxation for sparse regularization. J. Mach. Learn. Res. 11 1081–1107.
  • Zhang, T. (2013). Multi-stage convex relaxation for feature selection. Bernoulli 19 2277–2293.
  • Zhang, C.-H. and Zhang, T. (2012). A general theory of concave regularization for high-dimensional sparse estimation problems. Statist. Sci. 27 576–593.
  • Zhao, P. and Yu, B. (2006). On model selection consistency of Lasso. J. Mach. Learn. Res. 7 2541–2563.
  • Zou, H. (2006). The adaptive lasso and its oracle properties. J. Amer. Statist. Assoc. 101 1418–1429.
  • Zou, H. and Li, R. (2008). One-step sparse estimates in nonconcave penalized likelihood models. Ann. Statist. 36 1509–1533.
  • Zou, H. and Yuan, M. (2008). Composite quantile regression and the oracle model selection theory. Ann. Statist. 36 1108–1126.

Supplemental materials

  • Supplementary material: Supplement to “Strong oracle optimality of folded concave penalized estimation”. In this supplementary note, we give the complete proof of Theorem 5 and some comments on the simulation studies.