The Annals of Statistics

One-step sparse estimates in nonconcave penalized likelihood models

Hui Zou and Runze Li



Fan and Li propose a family of variable selection methods via penalized likelihood using concave penalty functions. The nonconcave penalized likelihood estimators enjoy the oracle properties, but maximizing the penalized likelihood function is computationally challenging because the objective function is nondifferentiable and nonconcave. In this article we propose a new unified algorithm, based on the local linear approximation (LLA), for maximizing the penalized likelihood for a broad class of concave penalty functions. Convergence and other theoretical properties of the LLA algorithm are established. A distinguishing feature of the LLA algorithm is that at each LLA step the estimator naturally adopts a sparse representation. Thus, we suggest using the one-step LLA estimator from the LLA algorithm as the final estimate. Statistically, we show that if the regularization parameter is appropriately chosen, the one-step LLA estimates enjoy the oracle properties, provided that the initial estimators are good. Computationally, the one-step LLA estimation methods dramatically reduce the cost of maximizing the nonconcave penalized likelihood. We conduct Monte Carlo simulations to assess the finite-sample performance of the one-step sparse estimation methods; the results are very encouraging.
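To make the idea concrete, consider penalized least squares with the SCAD penalty: the LLA replaces the concave penalty by its tangent line at an initial estimator, so the one-step estimate solves a single weighted-LASSO problem with weights p'_lambda(|beta_j^(0)|). The sketch below is an illustration under these assumptions, not the authors' implementation; ordinary least squares as the initial estimator and a plain coordinate-descent solver for the weighted LASSO are illustrative choices.

```python
import numpy as np

def scad_derivative(beta, lam, a=3.7):
    """Derivative p'_lambda(|beta|) of the SCAD penalty (Fan and Li, 2001):
    lam for |b| <= lam, then (a*lam - |b|)_+ / (a - 1) beyond."""
    b = np.abs(beta)
    return np.where(b <= lam, lam, np.maximum(a * lam - b, 0.0) / (a - 1.0))

def weighted_lasso_cd(X, y, w, n_sweeps=200):
    """Coordinate descent for min_b 0.5/n ||y - X b||^2 + sum_j w_j |b_j|."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y - X @ beta
    for _ in range(n_sweeps):
        for j in range(p):
            r += X[:, j] * beta[j]                 # partial residual
            z = X[:, j] @ r / n
            beta[j] = np.sign(z) * max(abs(z) - w[j], 0.0) / col_sq[j]
            r -= X[:, j] * beta[j]
    return beta

def one_step_lla(X, y, lam, a=3.7):
    """One-step LLA: linearize SCAD at an initial estimator (here OLS),
    then solve the resulting weighted-LASSO problem once."""
    beta0, *_ = np.linalg.lstsq(X, y, rcond=None)
    w = scad_derivative(beta0, lam, a)             # per-coordinate weights
    return weighted_lasso_cd(X, y, w)
```

Note how the SCAD derivative vanishes for large initial coefficients, so strong signals are left unpenalized, while coefficients whose initial estimates are small receive the full LASSO weight and can be set exactly to zero — this is the mechanism behind the sparsity and the oracle behavior of the one-step estimate.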

Article information

Ann. Statist., Volume 36, Number 4 (2008), 1509-1533.

First available in Project Euclid: 16 July 2008


Subjects: Primary 62J05 (linear regression); 62J07 (ridge regression; shrinkage estimators)

Keywords: AIC; BIC; LASSO; one-step estimator; oracle properties; SCAD


Zou, Hui; Li, Runze. One-step sparse estimates in nonconcave penalized likelihood models. Ann. Statist. 36 (2008), no. 4, 1509--1533. doi:10.1214/009053607000000802.



  • [1] Antoniadis, A. and Fan, J. (2001). Regularization of wavelet approximations. J. Amer. Statist. Assoc. 96 939–967.
  • [2] Bickel, P. J. (1975). One-step Huber estimates in the linear model. J. Amer. Statist. Assoc. 70 428–434.
  • [3] Blake, A. and Zisserman, A. (1987). Visual Reconstruction. MIT Press, Cambridge, MA.
  • [4] Breiman, L. (1996). Heuristics of instability and stabilization in model selection. Ann. Statist. 24 2350–2383.
  • [5] Cai, J., Fan, J., Li, R. and Zhou, H. (2005). Variable selection for multivariate failure time data. Biometrika 92 303–316.
  • [6] Cai, J., Fan, J., Zhou, H. and Zhou, Y. (2007). Marginal hazard models with varying-coefficients for multivariate failure time data. Ann. Statist. 35 324–354.
  • [7] Cai, Z., Fan, J. and Li, R. (2000). Efficient estimation and inferences for varying-coefficient models. J. Amer. Statist. Assoc. 95 888–902.
  • [8] Efron, B., Hastie, T., Johnstone, I. and Tibshirani, R. (2004). Least angle regression. Ann. Statist. 32 407–499.
  • [9] Fan, J. and Chen, J. (1999). One-step local quasi-likelihood estimation. J. Roy. Statist. Soc. Ser. B 61 927–943.
  • [10] Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. J. Amer. Statist. Assoc. 96 1348–1360.
  • [11] Fan, J. and Li, R. (2002). Variable selection for Cox’s proportional hazards model and frailty model. Ann. Statist. 30 74–99.
  • [12] Fan, J. and Li, R. (2004). New estimation and model selection procedures for semiparametric modeling in longitudinal data analysis. J. Amer. Statist. Assoc. 99 710–723.
  • [13] Fan, J. and Li, R. (2006). Statistical challenges with high dimensionality: Feature selection in knowledge discovery. In Proceedings of the Madrid International Congress of Mathematicians 2006 3 595–622. EMS, Zürich.
  • [14] Fan, J., Lin, H. and Zhou, Y. (2006). Local partial likelihood estimation for lifetime data. Ann. Statist. 34 290–325.
  • [15] Fan, J. and Peng, H. (2004). On non-concave penalized likelihood with diverging number of parameters. Ann. Statist. 32 928–961.
  • [16] Frank, I. and Friedman, J. (1993). A statistical view of some chemometrics regression tools. Technometrics 35 109–148.
  • [17] Fu, W. (1998). Penalized regression: The bridge versus the lasso. J. Comput. Graph. Statist. 7 397–416.
  • [18] Geyer, C. (1994). On the asymptotics of constrained M-estimation. Ann. Statist. 22 1993–2010.
  • [19] Heiser, W. (1995). Convergent Computation by Iterative Majorization: Theory and Applications in Multidimensional Data Analysis. Clarendon Press, Oxford.
  • [20] Hunter, D. and Li, R. (2005). Variable selection using MM algorithms. Ann. Statist. 33 1617–1642.
  • [21] Knight, K. and Fu, W. (2000). Asymptotics for lasso-type estimators. Ann. Statist. 28 1356–1378.
  • [22] Lange, K. (1995). A gradient algorithm locally equivalent to the EM algorithm. J. Roy. Statist. Soc. Ser. B 57 425–437.
  • [23] Lange, K., Hunter, D. and Yang, I. (2000). Optimization transfer using surrogate objective functions (with discussion). J. Comput. Graph. Statist. 9 1–59.
  • [24] Lehmann, E. and Casella, G. (1998). Theory of Point Estimation, 2nd ed. Springer, New York.
  • [25] Leng, C., Lin, Y. and Wahba, G. (2006). A note on the lasso and related procedures in model selection. Statist. Sinica 16 1273–1284.
  • [26] Li, R. and Liang, H. (2008). Variable selection in semiparametric regression modeling. Ann. Statist. 36 261–286.
  • [27] West, M. (1984). Outlier models and prior distributions in Bayesian linear regression. J. Roy. Statist. Soc. Ser. B 46 431–439.
  • [28] Miller, A. (2002). Subset Selection in Regression, 2nd ed. Chapman and Hall, London.
  • [29] Osborne, M., Presnell, B. and Turlach, B. (2000). A new approach to variable selection in least squares problems. IMA J. Numer. Anal. 20 389–403.
  • [30] Robinson, P. (1988). The stochastic difference between econometric statistics. Econometrica 56 531–547.
  • [31] Rosset, S. and Zhu, J. (2007). Piecewise linear regularized solution paths. Ann. Statist. 35 1012–1030.
  • [32] Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. Roy. Statist. Soc. Ser. B 58 267–288.
  • [33] Wu, Y. (2000). Optimization transfer using surrogate objective functions: Discussion. J. Comput. Graph. Statist. 9 32–34.
  • [34] Yuan, M. and Lin, Y. (2005). Efficient empirical Bayes variable selection and estimation in linear models. J. Amer. Statist. Assoc. 100 1215–1225.
  • [35] Yuan, M. and Lin, Y. (2006). Model selection and estimation in regression with grouped variables. J. Roy. Statist. Soc. Ser. B 68 49–67.
  • [36] Zou, H. and Hastie, T. (2005). Regularization and variable selection via the elastic net. J. Roy. Statist. Soc. Ser. B 67 301–320.