Least angle regression

Bradley Efron, Trevor Hastie, Iain Johnstone, and Robert Tibshirani
Source: Ann. Statist. Volume 32, Number 2 (2004), 407--499.

Abstract

The purpose of model selection algorithms such as All Subsets, Forward Selection and Backward Elimination is to choose a linear model on the basis of the same set of data to which the model will be applied. Typically we have available a large collection of possible covariates from which we hope to select a parsimonious set for the efficient prediction of a response variable. Least Angle Regression (LARS), a new model selection algorithm, is a useful and less greedy version of traditional forward selection methods. Three main properties are derived: (1) A simple modification of the LARS algorithm implements the Lasso, an attractive version of ordinary least squares that constrains the sum of the absolute regression coefficients; the LARS modification calculates all possible Lasso estimates for a given problem, using an order of magnitude less computer time than previous methods. (2) A different LARS modification efficiently implements Forward Stagewise linear regression, another promising new model selection method; this connection explains the similar numerical results previously observed for the Lasso and Stagewise, and helps us understand the properties of both methods, which are seen as constrained versions of the simpler LARS algorithm. (3) A simple approximation for the degrees of freedom of a LARS estimate is available, from which we derive a Cp estimate of prediction error; this allows a principled choice among the range of possible LARS estimates. LARS and its variants are computationally efficient: the paper describes a publicly available algorithm that requires only the same order of magnitude of computational effort as ordinary least squares applied to the full set of covariates.

Primary Subjects: 62J07
Full-text: Open access

Permanent link to this document: http://projecteuclid.org/euclid.aos/1083178935
Digital Object Identifier: doi:10.1214/009053604000000067
Mathematical Reviews number (MathSciNet): MR2060166
Zentralblatt MATH identifier: 02100802

References

Breiman, L., Friedman, J., Olshen, R. and Stone, C. (1984). Classification and Regression Trees. Wadsworth, Belmont, CA.
Mathematical Reviews (MathSciNet): MR726392
Zentralblatt MATH: 0541.62042
Efron, B. (1986). How biased is the apparent error rate of a prediction rule? J. Amer. Statist. Assoc. 81 461--470.
Mathematical Reviews (MathSciNet): MR845884
Efron, B. and Tibshirani, R. (1997). Improvements on cross-validation: The $.632+$ bootstrap method. J. Amer. Statist. Assoc. 92 548--560.
Mathematical Reviews (MathSciNet): MR1467848
Freund, Y. and Schapire, R. (1997). A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. System Sci. 55 119--139.
Mathematical Reviews (MathSciNet): MR1473055
Digital Object Identifier: doi:10.1006/jcss.1997.1504
Zentralblatt MATH: 0880.68103
Friedman, J. (2001). Greedy function approximation: A gradient boosting machine. Ann. Statist. 29 1189--1232.
Mathematical Reviews (MathSciNet): MR1873328
Digital Object Identifier: doi:10.1214/aos/1013203451
Project Euclid: euclid.aos/1013203451
Zentralblatt MATH: 1043.62034
Friedman, J., Hastie, T. and Tibshirani, R. (2000). Additive logistic regression: A statistical view of boosting (with discussion). Ann. Statist. 28 337--407.
Mathematical Reviews (MathSciNet): MR1790002
Digital Object Identifier: doi:10.1214/aos/1016218223
Project Euclid: euclid.aos/1016218223
Zentralblatt MATH: 1106.62323
Golub, G. and Van Loan, C. (1983). Matrix Computations. Johns Hopkins Univ. Press, Baltimore, MD.
Mathematical Reviews (MathSciNet): MR733103
Zentralblatt MATH: 0559.65011
Hastie, T., Tibshirani, R. and Friedman, J. (2001). The Elements of Statistical Learning: Data Mining, Inference and Prediction. Springer, New York.
Mathematical Reviews (MathSciNet): MR1851606
Zentralblatt MATH: 0973.62007
Lawson, C. and Hanson, R. (1974). Solving Least Squares Problems. Prentice-Hall, Englewood Cliffs, NJ.
Mathematical Reviews (MathSciNet): MR366019
Zentralblatt MATH: 0860.65028
Mallows, C. (1973). Some comments on $C_p$. Technometrics 15 661--675.
Meyer, M. and Woodroofe, M. (2000). On the degrees of freedom in shape-restricted regression. Ann. Statist. 28 1083--1104.
Mathematical Reviews (MathSciNet): MR1810920
Digital Object Identifier: doi:10.1214/aos/1015956708
Project Euclid: euclid.aos/1015956708
Zentralblatt MATH: 1105.62340
Osborne, M., Presnell, B. and Turlach, B. (2000a). A new approach to variable selection in least squares problems. IMA J. Numer. Anal. 20 389--403.
Mathematical Reviews (MathSciNet): MR1773265
Digital Object Identifier: doi:10.1093/imanum/20.3.389
Zentralblatt MATH: 0962.65036
Osborne, M. R., Presnell, B. and Turlach, B. (2000b). On the LASSO and its dual. J. Comput. Graph. Statist. 9 319--337.
Mathematical Reviews (MathSciNet): MR1822089
Rao, C. R. (1973). Linear Statistical Inference and Its Applications, 2nd ed. Wiley, New York.
Mathematical Reviews (MathSciNet): MR346957
Zentralblatt MATH: 0256.62002
Stein, C. (1981). Estimation of the mean of a multivariate normal distribution. Ann. Statist. 9 1135--1151.
Mathematical Reviews (MathSciNet): MR630098
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. Roy. Statist. Soc. Ser. B 58 267--288.
Mathematical Reviews (MathSciNet): MR1379242
Weisberg, S. (1980). Applied Linear Regression. Wiley, New York.
Mathematical Reviews (MathSciNet): MR591462
Zentralblatt MATH: 0529.62054
Ye, J. (1998). On measuring and correcting the effects of data mining and model selection. J. Amer. Statist. Assoc. 93 120--131.
Mathematical Reviews (MathSciNet): MR1614596
Breiman, L. (1992). The little bootstrap and other methods for dimensionality selection in regression: $X$-fixed prediction error. J. Amer. Statist. Assoc. 87 738--754.
Mathematical Reviews (MathSciNet): MR1185196
George, E. I. and McCulloch, R. E. (1993). Variable selection via Gibbs sampling. J. Amer. Statist. Assoc. 88 881--889.
Ishwaran, H. and Rao, J. S. (2000). Bayesian nonparametric MCMC for large variable selection problems. Unpublished manuscript.
Ishwaran, H. and Rao, J. S. (2003). Detecting differentially expressed genes in microarrays using Bayesian model selection. J. Amer. Statist. Assoc. 98 438--455.
Mathematical Reviews (MathSciNet): MR1995720
Digital Object Identifier: doi:10.1198/016214503000224
Zentralblatt MATH: 1041.62090
Mitchell, T. J. and Beauchamp, J. J. (1988). Bayesian variable selection in linear regression (with discussion). J. Amer. Statist. Assoc. 83 1023--1036.
Mathematical Reviews (MathSciNet): MR997578
Shao, J. (1993). Linear model selection by cross-validation. J. Amer. Statist. Assoc. 88 486--494.
Mathematical Reviews (MathSciNet): MR1224373
Breiman, L. (1996). Bagging predictors. Machine Learning 24 123--140.
Bühlmann, P. and Yu, B. (2002). Analyzing bagging. Ann. Statist. 30 927--961.
Mathematical Reviews (MathSciNet): MR1926165
Digital Object Identifier: doi:10.1214/aos/1031689014
Project Euclid: euclid.aos/1031689014
Zentralblatt MATH: 1029.62037
Abramovich, F., Benjamini, Y., Donoho, D. and Johnstone, I. (2000). Adapting to unknown sparsity by controlling the false discovery rate. Technical Report 2000-19, Dept. Statistics, Stanford Univ.
Akaike, H. (1973). Maximum likelihood identification of Gaussian autoregressive moving average models. Biometrika 60 255--265.
Mathematical Reviews (MathSciNet): MR326953
Zentralblatt MATH: 0318.62075
Birgé, L. and Massart, P. (2001a). Gaussian model selection. J. Eur. Math. Soc. 3 203--268.
Mathematical Reviews (MathSciNet): MR1848946
Digital Object Identifier: doi:10.1007/s100970100031
Zentralblatt MATH: 1037.62001
Birgé, L. and Massart, P. (2001b). A generalized $C_p$ criterion for Gaussian model selection. Technical Report 647, Univ. Paris 6 & 7.
Foster, D. and George, E. (1994). The risk inflation criterion for multiple regression. Ann. Statist. 22 1947--1975.
Mathematical Reviews (MathSciNet): MR1329177
Knight, K. and Fu, W. (2000). Asymptotics for Lasso-type estimators. Ann. Statist. 28 1356--1378.
Mathematical Reviews (MathSciNet): MR1805787
Digital Object Identifier: doi:10.1214/aos/1015957397
Project Euclid: euclid.aos/1015957397
Zentralblatt MATH: 1105.62357
Loubes, J.-M. and van de Geer, S. (2002). Adaptive estimation with soft thresholding penalties. Statist. Neerlandica 56 453--478.
Mathematical Reviews (MathSciNet): MR2027536
Digital Object Identifier: doi:10.1111/1467-9574.00212
Zentralblatt MATH: 1090.62534
van de Geer, S. (2001). Least squares estimation with complexity penalties. Math. Methods Statist. 10 355--374.
Mathematical Reviews (MathSciNet): MR1867165
Breiman, L. (2001). Random forests. Available at ftp://ftp.stat.berkeley.edu/pub/users/breiman/randomforest2001.pdf.
Fu, W. J. (1998). Penalized regressions: The Bridge versus the Lasso. J. Comput. Graph. Statist. 7 397--416.
Mathematical Reviews (MathSciNet): MR1646710
Ridgeway, G. (2003). GBM 0.7-2 package manual. Available at http://cran.r-project.org/doc/packages/gbm.pdf.
Breiman, L. (1999). Prediction games and arcing algorithms. Neural Computation 11 1493--1517.
Mason, L., Baxter, J., Bartlett, P. and Frean, M. (2000). Boosting algorithms as gradient descent. In Advances in Neural Information Processing Systems 12 512--518. MIT Press, Cambridge, MA.
Mathematical Reviews (MathSciNet): MR1820960
Rosset, S. and Zhu, J. (2004). Piecewise linear regularized solution paths. Advances in Neural Information Processing Systems 16. To appear.
Mathematical Reviews (MathSciNet): MR2341696
Zentralblatt MATH: 05186959
Digital Object Identifier: doi:10.1214/009053606000001370
Project Euclid: euclid.aos/1185303996
Rosset, S., Zhu, J. and Hastie, T. (2003). Boosting as a regularized path to a maximum margin classifier. Technical report, Dept. Statistics, Stanford Univ.
Mathematical Reviews (MathSciNet): MR2248005
Zhu, J., Rosset, S., Hastie, T. and Tibshirani, R. (2004). 1-norm support vector machines. In Advances in Neural Information Processing Systems 16. To appear.
Benjamini, Y. and Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. Roy. Statist. Soc. Ser. B 57 289--300.
Mathematical Reviews (MathSciNet): MR1325392
Blake, C. and Merz, C. (1998). UCI repository of machine learning databases. Technical report, School of Information and Computer Science, Univ. California, Irvine. Available at www.ics.uci.edu/~mlearn/MLRepository.html.
Donoho, D. L. and Johnstone, I. M. (1994). Ideal spatial adaptation by wavelet shrinkage. Biometrika 81 425--455.
Mathematical Reviews (MathSciNet): MR1311089
Zentralblatt MATH: 0815.62019
Foster, D. P. and Stine, R. A. (1996). Variable selection via information theory. Discussion Paper 1180, Center for Mathematical Studies in Economics and Management Science, Northwestern Univ.
Breiman, L. (1995). Better subset regression using the nonnegative garrote. Technometrics 37 373--384.
Mathematical Reviews (MathSciNet): MR1365720
McCullagh, P. and Nelder, J. A. (1989). Generalized Linear Models, 2nd ed. Chapman and Hall, London.
Mathematical Reviews (MathSciNet): MR727836
Zentralblatt MATH: 0588.62104
Moore, D. S. and McCabe, G. P. (1999). Introduction to the Practice of Statistics, 3rd ed. Freeman, New York.
Zentralblatt MATH: 0701.62002
Nelder, J. A. (1977). A reformulation of linear models (with discussion). J. Roy. Statist. Soc. Ser. A 140 48--76.
Mathematical Reviews (MathSciNet): MR458743
Nelder, J. A. (1994). The statistics of linear models: Back to basics. Statist. Comput. 4 221--234.
Cook, R. D. (1998). Regression Graphics. Wiley, New York.
Mathematical Reviews (MathSciNet): MR1645673
Zentralblatt MATH: 0903.62001
Cook, R. D. and Weisberg, S. (1999a). Applied Regression Including Computing and Graphics. Wiley, New York.
Cook, R. D. and Weisberg, S. (1999b). Graphs in statistical analysis: Is the medium the message? Amer. Statist. 53 29--37.
Efron, B. (2001). Discussion of "Statistical modeling: The two cultures," by L. Breiman. Statist. Sci. 16 218--219.
Mathematical Reviews (MathSciNet): MR1874152
Digital Object Identifier: doi:10.1214/ss/1009213726
Project Euclid: euclid.ss/1009213726
Zentralblatt MATH: 1059.62505
Li, K. C. (1991). Sliced inverse regression for dimension reduction (with discussion). J. Amer. Statist. Assoc. 86 316--342.
Mathematical Reviews (MathSciNet): MR1137117
Weisberg, S. (1981). A statistic for allocating $C_p$ to individual cases. Technometrics 23 27--31.
Mathematical Reviews (MathSciNet): MR604907
Weisberg, S. (2002). Dimension reduction regression in R. J. Statistical Software 7. (On-line journal available at www.jstatsoft.org. The software is available from cran.r-project.org.)
Efron, B. (2004). The estimation of prediction error: Covariance penalties and cross-validation. J. Amer. Statist. Assoc. To appear.
Mathematical Reviews (MathSciNet): MR2090899
Digital Object Identifier: doi:10.1198/016214504000000692
Zentralblatt MATH: 1117.62324
Foster, D. and Stine, R. (1997). An information theoretic comparison of model selection criteria. Technical report, Dept. Statistics, Univ. Pennsylvania.
George, E. I. and Foster, D. P. (2000). Calibration and empirical Bayes variable selection. Biometrika 87 731--747.
Mathematical Reviews (MathSciNet): MR1813972
Zentralblatt MATH: 1029.62008
Digital Object Identifier: doi:10.1093/biomet/87.4.731
Leblanc, M. and Tibshirani, R. (1998). Monotone shrinkage of trees. J. Comput. Graph. Statist. 7 417--433.