The Annals of Statistics

On the “degrees of freedom” of the lasso

Hui Zou, Trevor Hastie, and Robert Tibshirani

Abstract

We study the effective degrees of freedom of the lasso in the framework of Stein’s unbiased risk estimation (SURE). We show that the number of nonzero coefficients is an unbiased estimate for the degrees of freedom of the lasso, a conclusion that requires no special assumption on the predictors. In addition, the unbiased estimator is shown to be asymptotically consistent. With these results on hand, various model selection criteria ($C_p$, AIC and BIC) are available, which, along with the LARS algorithm, provide a principled and efficient approach to obtaining the optimal lasso fit with the computational effort of a single ordinary least-squares fit.
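To make the main result concrete, the following is a minimal sketch, not the authors' implementation, of how the count of nonzero coefficients can be plugged into a $C_p$-type criterion to choose the lasso penalty. Several things here are assumptions made for illustration only: the fit uses cyclic coordinate descent on a fixed grid of penalties rather than the LARS algorithm named in the abstract, the noise variance is treated as known, the data are simulated, and the criterion is written in one common Mallows-type form, RSS/n + 2·df·σ²/n. The names `soft_threshold`, `lasso_cd` and `cp_statistic` are hypothetical helpers.

```python
import random


def soft_threshold(z, t):
    """Soft-thresholding operator: sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Minimise 0.5 * ||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent.

    Assumption: coordinate descent stands in for LARS to keep the sketch short.
    Returns the coefficient list and the residual vector y - X b.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual for the starting point b = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Inner product of column j with the residual that excludes column j.
            rho = sum(X[i][j] * resid[i] for i in range(n)) + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * X[i][j]
                beta[j] = new
    return beta, resid


def cp_statistic(resid, df_hat, sigma2):
    """Mallows-type criterion RSS/n + 2 * df * sigma^2 / n (assumed form)."""
    n = len(resid)
    rss = sum(r * r for r in resid)
    return rss / n + 2.0 * df_hat * sigma2 / n


# Simulated data (assumption: sparse true coefficients, known noise level).
random.seed(0)
n, p = 50, 8
true_beta = [3.0, 1.5, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0]
sigma = 1.0
X = [[random.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
y = [sum(X[i][j] * true_beta[j] for j in range(p)) + random.gauss(0.0, sigma)
     for i in range(n)]

# Smallest penalty at which every coefficient is zero.
lam_max = max(abs(sum(X[i][j] * y[i] for i in range(n))) for j in range(p))
grid = [lam_max * (0.7 ** k) for k in range(15)]

results = []
for lam in grid:
    beta, resid = lasso_cd(X, y, lam)
    # Degrees-of-freedom estimate: the number of nonzero coefficients.
    df_hat = sum(1 for b in beta if b != 0.0)
    results.append((cp_statistic(resid, df_hat, sigma ** 2), lam, df_hat))

best_cp, best_lam, best_df = min(results)
print("selected lambda:", best_lam, "estimated df:", best_df)
```

Replacing the factor 2 in the penalty term with log(n) would give a BIC-type criterion under the same assumptions. In practice the noise variance usually has to be estimated, for example from a full least-squares fit when n > p.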

Article information

Source
Ann. Statist. Volume 35, Number 5 (2007), 2173-2192.

Dates
First available in Project Euclid: 7 November 2007

Permanent link to this document
http://projecteuclid.org/euclid.aos/1194461726

Digital Object Identifier
doi:10.1214/009053607000000127

Mathematical Reviews number (MathSciNet)
MR2363967

Zentralblatt MATH identifier
1126.62061

Subjects
Primary: 62J05: Linear regression; 62J07: Ridge regression, shrinkage estimators; 90C46: Optimality conditions, duality [See also 49N15]

Keywords
Degrees of freedom; LARS algorithm; lasso; model selection; SURE; unbiased estimate

Citation

Zou, Hui; Hastie, Trevor; Tibshirani, Robert. On the “degrees of freedom” of the lasso. Ann. Statist. 35 (2007), no. 5, 2173-2192. doi:10.1214/009053607000000127. http://projecteuclid.org/euclid.aos/1194461726.


