The Annals of Applied Statistics

Coordinate descent algorithms for nonconvex penalized regression, with applications to biological feature selection

Patrick Breheny and Jian Huang

Full-text: Open access


A number of variable selection methods have been proposed involving nonconvex penalty functions. These methods, which include the smoothly clipped absolute deviation (SCAD) penalty and the minimax concave penalty (MCP), have been demonstrated to have attractive theoretical properties, but model fitting is not a straightforward task, and the resulting solutions may be unstable. Here, we demonstrate the potential of coordinate descent algorithms for fitting these models, establishing theoretical convergence properties and demonstrating that they are significantly faster than competing approaches. In addition, we demonstrate the utility of convexity diagnostics to determine regions of the parameter space in which the objective function is locally convex, even though the penalty is not. Our simulation study and data examples indicate that nonconvex penalties like MCP and SCAD are worthwhile alternatives to the lasso in many applications. In particular, our numerical results suggest that MCP is the preferred approach among the three methods.
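
To make the idea concrete, the following is a minimal sketch of cyclic coordinate descent for penalized least squares, not the authors' software. It assumes squared-error loss, a centered response, and predictors standardized so that each column has mean zero and mean square one; under those assumptions each coordinate update has a closed form built from soft-thresholding. The function names (soft_threshold, mcp_update, scad_update, coordinate_descent) and the defaults for max_iter and tol are hypothetical choices made for illustration.

```python
def soft_threshold(z, lam):
    """Soft-thresholding operator S(z, lam) = sign(z) * max(|z| - lam, 0)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def mcp_update(z, lam, gamma):
    """Univariate MCP solution for a standardized predictor (assumes gamma > 1)."""
    if abs(z) <= gamma * lam:
        return soft_threshold(z, lam) / (1.0 - 1.0 / gamma)
    return z  # large signals are left unshrunk


def scad_update(z, lam, gamma):
    """Univariate SCAD solution for a standardized predictor (assumes gamma > 2)."""
    if abs(z) <= 2.0 * lam:
        return soft_threshold(z, lam)
    if abs(z) <= gamma * lam:
        scaled = soft_threshold(z, gamma * lam / (gamma - 1.0))
        return scaled / (1.0 - 1.0 / (gamma - 1.0))
    return z  # large signals are left unshrunk


def coordinate_descent(X, y, lam, gamma, update=mcp_update,
                       max_iter=1000, tol=1e-8):
    """Cycle through the coefficients, updating one at a time.

    X is a list of n rows with p entries each. Columns are assumed centered
    and scaled so that sum(x_ij ** 2) / n == 1; y is assumed centered.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residuals y - X beta at the starting value beta = 0
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(p):
            # Unpenalized univariate fit of the partial residual on column j
            z = sum(X[i][j] * resid[i] for i in range(n)) / n + beta[j]
            new = update(z, lam, gamma)
            delta = new - beta[j]
            if delta != 0.0:
                # Keep the residual vector in sync with the new coefficient
                for i in range(n):
                    resid[i] -= delta * X[i][j]
                beta[j] = new
            max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta
```

Passing update=scad_update switches the penalty, and passing a function that returns soft_threshold(z, lam) gives the lasso for comparison. In this sketch the lasso update shrinks every nonzero coefficient by lam, whereas the MCP and SCAD updates return z unchanged once |z| exceeds gamma * lam.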

Article information

Ann. Appl. Stat., Volume 5, Number 1 (2011), 232-253.

First available in Project Euclid: 21 March 2011


Keywords

Coordinate descent, penalized regression, lasso, SCAD, MCP, optimization


Breheny, Patrick; Huang, Jian. Coordinate descent algorithms for nonconvex penalized regression, with applications to biological feature selection. Ann. Appl. Stat. 5 (2011), no. 1, 232–253. doi:10.1214/10-AOAS388.


