The Annals of Applied Statistics

Coordinate descent algorithms for lasso penalized regression

Tong Tong Wu and Kenneth Lange



Imposition of a lasso penalty shrinks parameter estimates toward zero and performs continuous model selection. Lasso penalized regression is capable of handling linear regression problems where the number of predictors far exceeds the number of cases. This paper tests two exceptionally fast algorithms for estimating regression coefficients with a lasso penalty. The previously known ℓ2 algorithm is based on cyclic coordinate descent. Our new ℓ1 algorithm is based on greedy coordinate descent and Edgeworth’s algorithm for ordinary ℓ1 regression. Each algorithm relies on a tuning constant that can be chosen by cross-validation. In some regression problems it is natural to group parameters and penalize parameters group by group rather than separately. If the group penalty is proportional to the Euclidean norm of the parameters of the group, then it is possible to majorize the norm and reduce parameter estimation to ℓ2 regression with a lasso penalty. Thus, the existing algorithm can be extended to novel settings. Each of the algorithms discussed is tested via either simulated or real data or both. The Appendix proves that a greedy form of the ℓ2 algorithm converges to the minimum value of the objective function.
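
The coordinate updates behind the ℓ2 algorithm admit a closed form. The sketch below, in Python, is a minimal illustration of cyclic coordinate descent with soft-thresholding for the objective 0.5·||y − Xβ||² + λ·Σ_j |β_j|; it is a generic implementation for exposition, not the authors' code, and the function names are invented for this example.

import numpy as np

def soft_threshold(z, lam):
    # Exact minimizer over b of 0.5*(b - z)^2 + lam*|b|.
    return np.sign(z) * max(abs(z) - lam, 0.0)

def cyclic_cd_lasso(X, y, lam, n_sweeps=200):
    # Minimize 0.5*||y - X b||^2 + lam*sum_j |b_j| by cycling through
    # coordinates; each one-dimensional subproblem is solved exactly.
    n, p = X.shape
    b = np.zeros(p)
    r = y.copy()                      # residual y - X b, maintained incrementally
    col_ss = (X ** 2).sum(axis=0)     # per-column sums of squares
    for _ in range(n_sweeps):
        for j in range(p):
            if col_ss[j] == 0.0:
                continue
            # Inner product of column j with the partial residual
            # (coordinate j's own contribution added back in).
            z = X[:, j] @ r + col_ss[j] * b[j]
            b_new = soft_threshold(z, lam) / col_ss[j]
            r += X[:, j] * (b[j] - b_new)   # keep the residual current
            b[j] = b_new
    return b

# Usage: recover a sparse signal in the p >> n regime the abstract describes.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 200))
beta_true = np.zeros(200)
beta_true[:5] = 3.0
y = X @ beta_true + 0.1 * rng.standard_normal(50)
beta_hat = cyclic_cd_lasso(X, y, lam=5.0)
print("selected predictors:", np.flatnonzero(np.abs(beta_hat) > 1e-8))

The group-penalty reduction mentioned in the abstract rests on majorizing the Euclidean norm. One standard majorizer, stated here under the assumption that it matches the paper's construction, follows from concavity of the square root: at the current iterate \beta_g^{(m)} \neq 0,

\[ \|\beta_g\|_2 \;\le\; \frac{\|\beta_g\|_2^2 + \|\beta_g^{(m)}\|_2^2}{2\,\|\beta_g^{(m)}\|_2}, \]

with equality when \beta_g = \beta_g^{(m)}. The right-hand side is quadratic in \beta_g, so it folds into the least-squares loss while leaving any remaining lasso terms untouched; the surrogate problem is again lasso-penalized ℓ2 regression, which the cyclic algorithm handles.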

Article information

Ann. Appl. Stat. Volume 2, Number 1 (2008), 224–244.

First available in Project Euclid: 24 March 2008


Keywords: model selection; Edgeworth’s algorithm; cyclic; greedy; consistency; convergence


Wu, Tong Tong; Lange, Kenneth. Coordinate descent algorithms for lasso penalized regression. Ann. Appl. Stat. 2 (2008), no. 1, 224–244. doi:10.1214/07-AOAS147.

References


  • Armstrong, R. D. and Kung, M. T. (1978). Algorithm AS 132: Least absolute value estimates for a simple linear regression problem. Appl. Statist. 27 363–366.
  • Barrodale, I. and Roberts, F. D. (1980). Algorithm 552: Solution of the constrained ℓ1 linear approximation problem. ACM Trans. Math. Software 6 231–235.
  • Candes, E. and Tao, T. (2007). The Dantzig selector: Statistical estimation when p is much larger than n (with discussion). Ann. Statist. 35 2313–2404.
  • Chen, S. S., Donoho, D. L. and Saunders, M. A. (1998). Atomic decomposition by basis pursuit. SIAM J. Sci. Comput. 20 33–61.
  • Claerbout, J. F. and Muir, F. (1973). Robust modeling with erratic data. Geophysics 38 826–844.
  • Daubechies, I., Defrise, M. and De Mol, C. (2004). An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Comm. Pure Appl. Math. 57 1413–1457.
  • Edgeworth, F. Y. (1887). On observations relating to several quantities. Hermathena 6 279–285.
  • Edgeworth, F. Y. (1888). On a new method of reducing observations relating to several quantities. Philosophical Magazine 25 184–191.
  • Friedman, J., Hastie, T., Hofling, H. and Tibshirani, R. (2007). Pathwise coordinate optimization. Ann. Appl. Statist. 1 302–332.
  • Fu, W. J. (1998). Penalized regressions: The bridge versus the lasso. J. Comput. Graph. Statist. 7 397–416.
  • Ghazalpour, A., Doss, S., Sheth, S. S., Ingram-Drake, L. A., Schadt, E. E., Lusis, A. J. and Drake, T. A. (2005). Genomic analysis of metabolic pathway gene expression in mice. Nat. Genet. 37 1224–1233.
  • Ghazalpour, A., Doss, S., Zhang, B., Wang, S., Plaisier, C., Castellanos, R., Brozell, A., Schadt, E. E., Drake, T. A., Lusis, A. J. and Horvath, S. (2006). Integrating genetic and network analysis to characterize genes related to mouse weight. PLoS Genet. 2 e130.
  • Hastie, T. and Efron, B. (2007). The LARS package.
  • Huang, B., Wu, P., Bowker-Kinley, M. M. and Harris, R. A. (2002). Regulation of pyruvate dehydrogenase kinase expression by peroxisome proliferator-activated receptor-alpha ligands. Diabetes 51 276–283.
  • Hunter, D. R. and Lange, K. (2004). A tutorial on MM algorithms. Amer. Statist. 58 30–37.
  • Lange, K. (2004). Optimization. Springer, New York.
  • Li, Y. and Arce, G. R. (2004). A maximum likelihood approach to least absolute deviation regression. EURASIP J. Applied Signal Proc. 2004 1762–1769.
  • Mehrabian, M., Allayee, H., Stockton, J., Lum, P. Y., Drake, T. A., Castellani, L. W., Suh, M., Armour, C., Edwards, S., Lamb, J., Lusis, A. J. and Schadt, E. E. (2005). Integrating genotypic and expression data in a segregating mouse population to identify 5-lipoxygenase as a susceptibility gene for obesity and bone traits. Nat. Genet. 37 1224–1233.
  • Merle, G. and Spath, H. (1974). Computational experiences with discrete Lp approximation. Computing 12 315–321.
  • Oberhofer, W. (1982). The consistency of nonlinear regression minimizing the ℓ1-norm. Ann. Statist. 10 316–319.
  • Park, M. Y. and Hastie, T. (2006a). L1 regularization path algorithm for generalized linear models. Technical Report 2006-14, Dept. Statistics, Stanford Univ.
  • Park, M. Y. and Hastie, T. (2006b). Penalized logistic regression for detecting gene interactions. Technical Report 2006-15, Dept. Statistics, Stanford Univ.
  • Portnoy, S. and Koenker, R. (1997). The Gaussian hare and the Laplacian tortoise: Computability of squared-error versus absolute-error estimators. Statist. Sci. 12 279–300.
  • Rudin, W. (1987). Real and Complex Analysis, 3rd ed. McGraw-Hill, New York.
  • Ruszczyński, A. (2006). Nonlinear Optimization. Princeton Univ. Press.
  • Santosa, F. and Symes, W. W. (1986). Linear inversion of band-limited reflection seismograms. SIAM J. Sci. Statist. Comput. 7 1307–1330.
  • Schlossmacher, E. J. (1973). An iterative technique for absolute deviations curve fitting. J. Amer. Statist. Assoc. 68 857–859.
  • Sugden, M. C. (2003). PDK4: A factor in fatness? Obesity Res. 11 167–169.
  • Taylor, H. L., Banks, S. C. and McCoy, J. F. (1979). Deconvolution with the ℓ1 norm. Geophysics 44 39–52.
  • Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. Roy. Statist. Soc. Ser. B 58 267–288.
  • Tseng, P. (2001). Convergence of a block coordinate descent method for nondifferentiable minimization. J. Optim. Theory Appl. 109 473–492.
  • Wang, L., Gordon, M. D. and Zhu, J. (2006a). Regularized least absolute deviations regression and an efficient algorithm for parameter tuning. In Proceedings of the Sixth International Conference on Data Mining (ICDM’06) 690–700. IEEE Computer Society.
  • Wang, S., Yehya, N., Schadt, E. E., Wang, H., Drake, T. A. and Lusis, A. J. (2006b). Genetic and genomic analysis of a fat mass trait with complex inheritance reveals marked sex specificity. PLoS Genet. 2 148–159.
  • Wu, T. T. and Lange, K. (2008). Supplement to “Coordinate descent algorithms for lasso penalized regression.” DOI: 10.1214/07-AOAS147SUPP.
  • Yuan, M. and Lin, Y. (2006). Model selection and estimation in regression with grouped variables. J. Roy. Statist. Soc. Ser. B 68 49–67.
  • Zhao, P., Rocha, G. and Yu, B. (2006). Grouped and hierarchical model selection through composite absolute penalties. Technical report, Dept. Statistics, Univ. California, Berkeley.
  • Zhao, P. and Yu, B. (2006). On model selection consistency of lasso. J. Machine Learning Research 7 2541–2563.
  • Zou, H. and Hastie, T. (2005). Regularization and variable selection via the elastic net. J. Roy. Statist. Soc. Ser. B 67 301–320.

Supplemental materials: Supplement to “Coordinate descent algorithms for lasso penalized regression” (DOI: 10.1214/07-AOAS147SUPP).