The Annals of Applied Statistics

Pathwise coordinate optimization

Jerome Friedman, Trevor Hastie, Holger Höfling, and Robert Tibshirani

Abstract

We consider “one-at-a-time” coordinate-wise descent algorithms for a class of convex optimization problems. An algorithm of this kind has been proposed for the L1-penalized regression (lasso) in the literature, but it appears to have been largely ignored; indeed, coordinate-wise algorithms seem to be used only rarely in convex optimization. We show that this algorithm is very competitive with the well-known LARS (or homotopy) procedure in large lasso problems, and that it can be applied to related methods such as the garrote and elastic net. It turns out that coordinate-wise descent does not work in the “fused lasso,” however, so we derive a generalized algorithm that yields the solution in much less time than a standard convex optimizer. Finally, we generalize the procedure to the two-dimensional fused lasso, and demonstrate its performance on some image smoothing problems.
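
As a concrete illustration of the “one-at-a-time” updates the abstract describes, here is a minimal sketch of coordinate-wise descent for the lasso objective (1/2)||y − Xβ||² + λ||β||₁, written in Python. It is a sketch under our own conventions (no intercept, no 1/n scaling, a fixed sweep count, and hypothetical function names), not the authors' implementation: each coordinate is updated in turn by soft-thresholding its partial-residual correlation.

    import numpy as np

    def soft_threshold(z, gamma):
        # S(z, gamma) = sign(z) * max(|z| - gamma, 0): the solution of the
        # univariate lasso problem.
        return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)

    def lasso_coordinate_descent(X, y, lam, n_sweeps=100):
        # Minimize (1/2)*||y - X beta||^2 + lam*||beta||_1 by cycling
        # through the coordinates. Assumes no all-zero columns in X.
        n, p = X.shape
        beta = np.zeros(p)
        residual = y.copy()              # full residual for beta = 0
        col_sq = (X ** 2).sum(axis=0)    # per-coordinate curvature
        for _ in range(n_sweeps):
            for j in range(p):
                # Correlation of x_j with the partial residual (the
                # current residual with coordinate j's fit added back).
                rho = X[:, j] @ residual + col_sq[j] * beta[j]
                beta_j_new = soft_threshold(rho, lam) / col_sq[j]
                residual += X[:, j] * (beta[j] - beta_j_new)
                beta[j] = beta_j_new
        return beta

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        X = rng.standard_normal((50, 10))
        beta_true = np.r_[3.0, -2.0, 1.5, np.zeros(7)]
        y = X @ beta_true + 0.1 * rng.standard_normal(50)
        print(np.round(lasso_coordinate_descent(X, y, lam=5.0), 2))

With standardized predictors each update reduces to a single soft-threshold, and warm-starting the coefficients while sweeping over a decreasing grid of λ values gives the cheap “pathwise” strategy of the title.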

Article information

Source
Ann. Appl. Stat. Volume 1, Number 2 (2007), 302–332.

Dates
First available in Project Euclid: 30 November 2007

Permanent link to this document
http://projecteuclid.org/euclid.aoas/1196438020

Digital Object Identifier
doi:10.1214/07-AOAS131

Mathematical Reviews number (MathSciNet)
MR2415737

Zentralblatt MATH identifier
05226935

Keywords
Coordinate descent; lasso; convex optimization

Citation

Friedman, Jerome; Hastie, Trevor; Höfling, Holger; Tibshirani, Robert. Pathwise coordinate optimization. Ann. Appl. Stat. 1 (2007), no. 2, 302–332. doi:10.1214/07-AOAS131. http://projecteuclid.org/euclid.aoas/1196438020.


References

  • Bertsekas, D. (1999). Nonlinear Programming. Athena Scientific.
  • Breiman, L. (1995). Better subset regression using the nonnegative garrote. Technometrics 37 373–384.
  • Chen, S. S., Donoho, D. L. and Saunders, M. A. (1998). Atomic decomposition by basis pursuit. SIAM J. Sci. Comput. 20 33–61.
  • Daubechies, I., Defrise, M. and De Mol, C. (2004). An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Comm. Pure Appl. Math. 57 1413–1457.
  • Donoho, D. and Johnstone, I. (1995). Adapting to unknown smoothness via wavelet shrinkage. J. Amer. Statist. Assoc. 90 1200–1224.
  • Efron, B., Hastie, T., Johnstone, I. and Tibshirani, R. (2004). Least angle regression (with discussion). Ann. Statist. 32 407–499.
  • Friedlander, M. and Saunders, M. (2007). Discussion of “Dantzig selector” by E. Candes and T. Tao. Ann. Statist. 35 2385–2391.
  • Fu, W. J. (1998). Penalized regressions: The bridge versus the lasso. J. Comput. Graph. Statist. 7 397–416.
  • Gill, P., Murray, W. and Saunders, M. (1999). User's guide for SQOPT 5.3: A Fortran package for large-scale linear and quadratic programming. Technical report, Stanford Univ.
  • Li, Y. and Arce, G. (2004). A maximum likelihood approach to least absolute deviation regression. EURASIP J. Appl. Signal Processing 2004 1762–1769.
  • Osborne, M., Presnell, B. and Turlach, B. (2000). A new approach to variable selection in least squares problems. IMA J. Numer. Anal. 20 389–404.
  • Owen, A. (2006). A robust hybrid of lasso and ridge regression. Technical report, Stanford Univ.
  • Rudin, L. I., Osher, S. and Fatemi, E. (1992). Nonlinear total variation based noise removal algorithms. Phys. D 60 259–268.
  • Schlegel, P. (1970). The explicit inverse of a tridiagonal matrix. Math. Comput. 24 665.
  • Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. Roy. Statist. Soc. Ser. B 58 267–288.
  • Tibshirani, R., Saunders, M., Rosset, S., Zhu, J. and Knight, K. (2005). Sparsity and smoothness via the fused lasso. J. Roy. Statist. Soc. Ser. B 67 91–108.
  • Tibshirani, R. and Wang, P. (2007). Spatial smoothing and hot spot detection for CGH data using the fused lasso. Biostatistics. Advance Access published May 18, 2007.
  • Tseng, P. (1988). Coordinate ascent for maximizing nondifferentiable concave functions. Technical Report LIDS-P-1840, Massachusetts Institute of Technology, Laboratory for Information and Decision Systems.
  • Tseng, P. (2001). Convergence of a block coordinate descent method for nondifferentiable minimization. J. Optim. Theory Appl. 109 475–494.
  • Van der Kooij, A. (2007). Prediction accuracy and stability of regression with optimal scaling transformations. Technical report, Dept. Data Theory, Leiden Univ.
  • Wang, H., Li, G. and Jiang, G. (2006). Robust regression shrinkage and consistent variable selection via the LAD-lasso. J. Business Econom. Statist. 11 1–6.
  • Yuan, M. and Lin, Y. (2006). Model selection and estimation in regression with grouped variables. J. Roy. Statist. Soc. Ser. B 68 49–67.
  • Zou, H. and Hastie, T. (2005). Regularization and variable selection via the elastic net. J. Roy. Statist. Soc. Ser. B 67 301–320.