The Annals of Statistics

Piecewise linear regularized solution paths

Saharon Rosset and Ji Zhu

Abstract

We consider the generic regularized optimization problem β̂(λ) = arg min_β L(y, Xβ) + λJ(β). Efron, Hastie, Johnstone and Tibshirani [Ann. Statist. 32 (2004) 407–499] have shown that for the LASSO (that is, if L is squared error loss and J(β) = ‖β‖₁ is the ℓ₁ norm of β) the optimal coefficient path is piecewise linear, that is, ∂β̂(λ)/∂λ is piecewise constant. We derive a general characterization of the properties of (loss L, penalty J) pairs which give piecewise linear coefficient paths. Such pairs allow for efficient generation of the full regularized coefficient paths. We investigate the nature of efficient path following algorithms which arise. We use our results to suggest robust versions of the LASSO for regression and classification, and to develop new, efficient algorithms for existing problems in the literature, including Mammen and van de Geer’s locally adaptive regression splines.
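The piecewise linearity of the LASSO path can be checked numerically in the simplest special case. The sketch below is not the path-following algorithm of the paper. It assumes an orthonormal design (XᵀX = I), for which the minimizer of ‖y − Xβ‖² + λ‖β‖₁ is known in closed form as soft-thresholding of z = Xᵀy at λ/2. The function name `lasso_path_orthonormal`, the simulated data and the grid are illustrative assumptions only.

```python
import numpy as np


def lasso_path_orthonormal(X, y, lambdas):
    """Closed-form LASSO solutions, assuming X.T @ X = I.

    For each lam in `lambdas`, returns the minimizer of
    ||y - X b||^2 + lam * ||b||_1 (one row per lam).
    """
    z = X.T @ y  # least-squares coefficients under orthonormality
    lambdas = np.asarray(lambdas, dtype=float)
    # Soft-threshold each coordinate of z at lam / 2.
    return np.sign(z) * np.maximum(np.abs(z) - lambdas[:, None] / 2.0, 0.0)


# Simulated data (assumption): an orthonormal 20 x 4 design from a QR factorization.
rng = np.random.default_rng(0)
X, _ = np.linalg.qr(rng.normal(size=(20, 4)))
y = rng.normal(size=20)

z = X.T @ y
# Knots: values of lam at which some coefficient reaches zero.
knots = np.sort(2.0 * np.abs(z))

# Evaluate the path strictly between two consecutive knots.
lam_grid = np.linspace(knots[1], knots[2], 7)[1:-1]
path = lasso_path_orthonormal(X, y, lam_grid)

# Finite-difference estimate of d(beta_hat)/d(lam) on that segment.
slopes = np.diff(path, axis=0) / np.diff(lam_grid)[:, None]
print(slopes)
```

Within a segment between consecutive knots every row of `slopes` is the same, with entries equal to 0 for coefficients already shrunk to zero and −sign(z_j)/2 for active ones. This is the piecewise constant derivative described in the abstract, in this special case only.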

Article information

Source
Ann. Statist. Volume 35, Number 3 (2007), 1012-1030.

Dates
First available: 24 July 2007

Permanent link to this document
http://projecteuclid.org/euclid.aos/1185303996

Digital Object Identifier
doi:10.1214/009053606000001370

Mathematical Reviews number (MathSciNet)
MR2341696

Zentralblatt MATH identifier
05186959

Subjects
Primary: 62J07: Ridge regression; shrinkage estimators
Secondary: 62F35: Robustness and adaptive procedures
           62G08: Nonparametric regression
           62H30: Classification and discrimination; cluster analysis [See also 68T10, 91C20]

Keywords
ℓ_1-norm penalty; polynomial splines; regularization; solution paths; sparsity; total variation

Citation

Rosset, Saharon; Zhu, Ji. Piecewise linear regularized solution paths. The Annals of Statistics 35 (2007), no. 3, 1012--1030. doi:10.1214/009053606000001370. http://projecteuclid.org/euclid.aos/1185303996.


References

  • Davies, P. L. and Kovac, A. (2001). Local extremes, runs, strings and multiresolution (with discussion). Ann. Statist. 29 1--65.
  • Donoho, D., Johnstone, I., Kerkyacharian, G. and Picard, D. (1995). Wavelet shrinkage: Asymptopia? (with discussion). J. Roy. Statist. Soc. Ser. B 57 301--369.
  • Efron, B., Hastie, T., Johnstone, I. M. and Tibshirani, R. (2004). Least angle regression (with discussion). Ann. Statist. 32 407--499.
  • Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. J. Amer. Statist. Assoc. 96 1348--1360.
  • Fan, J. and Peng, H. (2004). Nonconcave penalized likelihood with a diverging number of parameters. Ann. Statist. 32 928--961.
  • Freund, Y. and Schapire, R. E. (1996). Experiments with a new boosting algorithm. In Proc. 13th International Conference on Machine Learning 148--156. Morgan Kaufmann, San Francisco.
  • Hastie, T., Rosset, S., Tibshirani, R. and Zhu, J. (2004). The entire regularization path for the support vector machine. J. Mach. Learn. Res. 5 1391--1415.
  • Hastie, T., Tibshirani, R. and Friedman, J. (2001). The Elements of Statistical Learning: Data Mining, Inference and Prediction. Springer, New York.
  • Hoerl, A. and Kennard, R. (1970). Ridge regression: Biased estimation for nonorthogonal problems. Technometrics 12 55--67.
  • Huber, P. (1964). Robust estimation of a location parameter. Ann. Math. Statist. 35 73--101.
  • Koenker, R. (2005). Quantile Regression. Cambridge Univ. Press.
  • Koenker, R., Ng, P. and Portnoy, S. (1994). Quantile smoothing splines. Biometrika 81 673--680.
  • Mammen, E. and van de Geer, S. (1997). Locally adaptive regression splines. Ann. Statist. 25 387--413.
  • Osborne, M., Presnell, B. and Turlach, B. (2000). On the LASSO and its dual. J. Comput. Graph. Statist. 9 319--337.
  • Rosset, S., Zhu, J. and Hastie, T. (2004). Boosting as a regularized path to a maximum margin classifier. J. Mach. Learn. Res. 5 941--973.
  • Shen, X., Tseng, G., Zhang, X. and Wong, W. H. (2003). On ψ-learning. J. Amer. Statist. Assoc. 98 724--734.
  • Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. Roy. Statist. Soc. Ser. B 58 267--288.
  • Tibshirani, R., Saunders, M., Rosset, S., Zhu, J. and Knight, K. (2005). Sparsity and smoothness via the fused lasso. J. R. Stat. Soc. Ser. B Stat. Methodol. 67 91--108.
  • Tsuda, K. and Rätsch, G. (2005). Image reconstruction by linear programming. IEEE Trans. Image Process. 14 737--744.
  • Vapnik, V. (1995). The Nature of Statistical Learning Theory. Springer, New York.
  • Zhu, J., Rosset, S., Hastie, T. and Tibshirani, R. (2003). 1-norm support vector machines. In Advances in Neural Information Processing Systems 16.