The Annals of Statistics

Piecewise linear regularized solution paths

Saharon Rosset and Ji Zhu

We consider the generic regularized optimization problem β̂(λ) = arg min_β L(y, Xβ) + λJ(β). Efron, Hastie, Johnstone and Tibshirani [Ann. Statist. 32 (2004) 407–499] have shown that for the LASSO (that is, if L is squared error loss and J(β) = ‖β‖_1 is the ℓ_1 norm of β) the optimal coefficient path is piecewise linear, that is, ∂β̂(λ)/∂λ is piecewise constant. We derive a general characterization of the properties of (loss L, penalty J) pairs which give piecewise linear coefficient paths. Such pairs allow for efficient generation of the full regularized coefficient paths. We investigate the nature of efficient path following algorithms which arise. We use our results to suggest robust versions of the LASSO for regression and classification, and to develop new, efficient algorithms for existing problems in the literature, including Mammen and van de Geer’s locally adaptive regression splines.
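As a small illustration of the piecewise linearity described in the abstract, the sketch below uses a special case that is not taken from the paper itself: the LASSO with an orthonormal design (XᵀX = I). Under that assumption, minimizing ‖y − Xβ‖² + λ‖β‖_1 has the closed-form solution β̂_j(λ) = sign(z_j) max(|z_j| − λ/2, 0) with z = Xᵀy, so each coefficient is linear in λ until it reaches zero at the knot λ = 2|z_j|. The function name `lasso_path_orthonormal` and the example values of `z` are hypothetical, chosen only for this demonstration.

```python
import numpy as np


def lasso_path_orthonormal(z, lams):
    """Closed-form LASSO path, assuming an orthonormal design (X'X = I).

    Minimizes ||y - X b||^2 + lam * ||b||_1 with z = X'y.
    Returns an array with one row per value in `lams`.
    """
    z = np.asarray(z, dtype=float)
    lams = np.asarray(lams, dtype=float)
    # Soft-thresholding at lam / 2, broadcast over all lam values.
    shrunk = np.maximum(np.abs(z)[None, :] - lams[:, None] / 2.0, 0.0)
    return np.sign(z)[None, :] * shrunk


# Illustrative values only; the knots are at lam = 2 * |z_j| = 6, 2 and 1.
z = np.array([3.0, -1.0, 0.5])
lams = np.linspace(0.0, 8.0, 17)  # grid with step 0.5, so every knot is a grid point

path = lasso_path_orthonormal(z, lams)

# Finite-difference slopes between consecutive grid points. Because the knots
# lie on the grid, each slope is either -sign(z_j) / 2 or 0: the derivative of
# the path with respect to lam is piecewise constant.
slopes = np.diff(path, axis=0) / np.diff(lams)[:, None]
print(slopes)
```

In this special case the whole path is determined by the knots and the constant slopes between them, which is the property that path-following algorithms exploit in more general settings.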

Article information

Ann. Statist. Volume 35, Number 3 (2007), 1012-1030.

First available in Project Euclid: 24 July 2007

Primary: 62J07: Ridge regression; shrinkage estimators
Secondary: 62F35: Robustness and adaptive procedures. 62G08: Nonparametric regression. 62H30: Classification and discrimination; cluster analysis [See also 68T10, 91C20]

Keywords: ℓ_1-norm penalty; polynomial splines; regularization; solution paths; sparsity; total variation


Rosset, Saharon; Zhu, Ji. Piecewise linear regularized solution paths. Ann. Statist. 35 (2007), no. 3, 1012--1030. doi:10.1214/009053606000001370.


  • Davies, P. L. and Kovac, A. (2001). Local extremes, runs, strings and multiresolution (with discussion). Ann. Statist. 29 1--65.
  • Donoho, D., Johnstone, I., Kerkyacharian, G. and Picard, D. (1995). Wavelet shrinkage: Asymptopia? (with discussion). J. Roy. Statist. Soc. Ser. B 57 301--369.
  • Efron, B., Hastie, T., Johnstone, I. M. and Tibshirani, R. (2004). Least angle regression (with discussion). Ann. Statist. 32 407--499.
  • Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. J. Amer. Statist. Assoc. 96 1348--1360.
  • Fan, J. and Peng, H. (2004). Nonconcave penalized likelihood with a diverging number of parameters. Ann. Statist. 32 928--961.
  • Freund, Y. and Schapire, R. E. (1996). Experiments with a new boosting algorithm. In Proc. 13th International Conference on Machine Learning 148--156. Morgan Kaufmann, San Francisco.
  • Hastie, T., Rosset, S., Tibshirani, R. and Zhu, J. (2004). The entire regularization path for the support vector machine. J. Mach. Learn. Res. 5 1391--1415.
  • Hastie, T., Tibshirani, R. and Friedman, J. (2001). The Elements of Statistical Learning: Data Mining, Inference and Prediction. Springer, New York.
  • Hoerl, A. and Kennard, R. (1970). Ridge regression: Biased estimation for nonorthogonal problems. Technometrics 12 55--67.
  • Huber, P. (1964). Robust estimation of a location parameter. Ann. Math. Statist. 35 73--101.
  • Koenker, R. (2005). Quantile Regression. Cambridge Univ. Press.
  • Koenker, R., Ng, P. and Portnoy, S. (1994). Quantile smoothing splines. Biometrika 81 673--680.
  • Mammen, E. and van de Geer, S. (1997). Locally adaptive regression splines. Ann. Statist. 25 387--413.
  • Osborne, M., Presnell, B. and Turlach, B. (2000). On the LASSO and its dual. J. Comput. Graph. Statist. 9 319--337.
  • Rosset, S., Zhu, J. and Hastie, T. (2004). Boosting as a regularized path to a maximum margin classifier. J. Mach. Learn. Res. 5 941--973.
  • Shen, X., Tseng, G., Zhang, X. and Wong, W. H. (2003). On $\psi$-learning. J. Amer. Statist. Assoc. 98 724--734.
  • Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. Roy. Statist. Soc. Ser. B 58 267--288.
  • Tibshirani, R., Saunders, M., Rosset, S., Zhu, J. and Knight, K. (2005). Sparsity and smoothness via the fused lasso. J. R. Stat. Soc. Ser. B Stat. Methodol. 67 91--108.
  • Tsuda, K. and Rätsch, G. (2005). Image reconstruction by linear programming. IEEE Trans. Image Process. 14 737--744.
  • Vapnik, V. (1995). The Nature of Statistical Learning Theory. Springer, New York.
  • Zhu, J., Rosset, S., Hastie, T. and Tibshirani, R. (2003). 1-norm support vector machines. In Advances in Neural Information Processing Systems 16.