The Annals of Statistics

The sparsity and bias of the Lasso selection in high-dimensional linear regression

Cun-Hui Zhang and Jian Huang


Meinshausen and Bühlmann [Ann. Statist. 34 (2006) 1436–1462] showed that, for neighborhood selection in Gaussian graphical models, under a neighborhood stability condition, the LASSO is consistent, even when the number of variables is of greater order than the sample size. Zhao and Yu [J. Machine Learning Research 7 (2006) 2541–2567] formalized the neighborhood stability condition in the context of linear regression as a strong irrepresentable condition. That paper showed that under this condition, the LASSO selects exactly the set of nonzero regression coefficients, provided that these coefficients are bounded away from zero at a certain rate. In this paper, the regression coefficients outside an ideal model are assumed to be small, but not necessarily zero. Under a sparse Riesz condition on the correlation of design variables, we prove that the LASSO selects a model of the correct order of dimensionality, controls the bias of the selected model at a level determined by the contributions of small regression coefficients and threshold bias, and selects all coefficients of greater order than the bias of the selected model. Moreover, as a consequence of this rate consistency of the LASSO in model selection, it is proved that the sum of error squares for the mean response and the ℓ_α-loss for the regression coefficients converge at the best possible rates under the given conditions. An interesting aspect of our results is that the logarithm of the number of variables can be of the same order as the sample size for certain random dependent designs.
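
For orientation, the objects named in the abstract can be written out as follows. This is a sketch in commonly used notation, not a quotation from the paper: the symbols y, X, λ, q^*, c_*, and c^*, the index set A, and the 1/2 normalization of the squared-error term are assumptions made here for illustration.

```latex
% Linear model with n observations and p covariates (p may exceed n)
y = X\beta + \varepsilon, \qquad X \in \mathbb{R}^{n \times p}

% Lasso estimator with penalty level \lambda \ge 0 (normalization assumed)
\hat{\beta}(\lambda) = \arg\min_{b \in \mathbb{R}^p}
  \left\{ \tfrac{1}{2}\,\|y - Xb\|_2^2 + \lambda \|b\|_1 \right\}

% Selected model: indices of the nonzero estimated coefficients
\hat{A}(\lambda) = \{\, j : \hat{\beta}_j(\lambda) \neq 0 \,\}

% Sparse Riesz condition with rank q^* and spectrum bounds 0 < c_* \le c^* < \infty:
% every submatrix X_A built from q^* columns of X is well conditioned
c_* \;\le\; \frac{\|X_A v\|_2^2}{n\,\|v\|_2^2} \;\le\; c^*
\quad \text{for all } A \subset \{1,\dots,p\} \text{ with } |A| = q^*
\text{ and all nonzero } v \in \mathbb{R}^{q^*}

% \ell_\alpha-loss for the regression coefficients
\|\hat{\beta} - \beta\|_\alpha
  = \Bigl( \sum_{j=1}^{p} |\hat{\beta}_j - \beta_j|^{\alpha} \Bigr)^{1/\alpha}
```

Read this way, the sparse Riesz condition constrains only submatrices of q^* columns at a time, which is why it can hold even when p is far larger than n and the full Gram matrix is singular.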

Article information

Ann. Statist. Volume 36, Number 4 (2008), 1567-1594.

First available in Project Euclid: 16 July 2008

Primary: 62J05 (Linear regression); 62J07 (Ridge regression; shrinkage estimators)
Secondary: 62H25 (Factor analysis and principal components; correspondence analysis)

Keywords: Penalized regression; high-dimensional data; variable selection; bias; rate consistency; spectral analysis; random matrices


Zhang, Cun-Hui; Huang, Jian. The sparsity and bias of the Lasso selection in high-dimensional linear regression. Ann. Statist. 36 (2008), no. 4, 1567--1594. doi:10.1214/07-AOS520.


  • Bai, Z. D. (1999). Methodologies in spectral analysis of large dimensional random matrices, a review. Statist. Sinica 9 611–677.
  • Bunea, F., Tsybakov, A. and Wegkamp, M. (2006). Sparsity oracle inequalities for the lasso. Technical report M979, Dept. Statistics, Florida State Univ.
  • Candès, E. and Tao, T. (2007). The Dantzig selector: Statistical estimation when p is much larger than n (with discussion). Ann. Statist. 35 2313–2351.
  • Davidson, K. and Szarek, S. (2001). Local operator theory, random matrices and Banach spaces. In Handbook on the Geometry of Banach Spaces I (W. B. Johnson and J. Lindenstrauss, eds.) 317–366. North-Holland, Amsterdam.
  • Donoho, D. L. (2006). For most large underdetermined systems of equations, the minimal ℓ_1-norm near-solution approximates the sparsest near-solution. Comm. Pure Appl. Math. 59 907–934.
  • Donoho, D. L. and Johnstone, I. (1994). Minimax risk over ℓ_p-balls for ℓ_q-error. Probab. Theory Related Fields 99 277–303.
  • Eaton, M. L. (1983). Multivariate Statistics: A Vector Space Approach. Wiley, New York.
  • Efron, B., Hastie, T., Johnstone, I. and Tibshirani, R. (2004). Least angle regression. Ann. Statist. 32 407–499.
  • Efron, B., Hastie, T. and Tibshirani, R. (2007). Discussion of: “The Dantzig selector: Statistical estimation when p is much larger than n”. Ann. Statist. 35 2358–2364.
  • Foster, D. P. and George, E. I. (1994). The risk inflation criterion for multiple regression. Ann. Statist. 22 1947–1975.
  • Geman, S. (1980). A limit theorem for the norm of random matrices. Ann. Probab. 8 252–261.
  • Greenshtein, E. and Ritov, Y. (2004). Persistence in high-dimensional linear predictor selection and the virtue of overparametrization. Bernoulli 10 971–988.
  • Huang, J., Ma, S. and Zhang, C.-H. (2007). Adaptive LASSO for sparse high-dimensional regression models. Statist. Sinica. To appear.
  • Knight, K. and Fu, W. J. (2000). Asymptotics for lasso-type estimators. Ann. Statist. 28 1356–1378.
  • Leng, C., Lin, Y. and Wahba, G. (2006). A note on the LASSO and related procedures in model selection. Statist. Sinica 16 1273–1284.
  • Meinshausen, N. and Bühlmann, P. (2006). High dimensional graphs and variable selection with the Lasso. Ann. Statist. 34 1436–1462.
  • Meinshausen, N. and Yu, B. (2006). Lasso-type recovery of sparse representations for high-dimensional data. Technical report, Dept. Statistics, Univ. California, Berkeley.
  • Osborne, M., Presnell, B. and Turlach, B. (2000a). A new approach to variable selection in least squares problems. IMA J. Numer. Anal. 20 389–404.
  • Osborne, M., Presnell, B. and Turlach, B. (2000b). On the lasso and its dual. J. Comput. Graph. Statist. 9 319–337.
  • Silverstein, J. W. (1985). The smallest eigenvalue of a large dimensional Wishart matrix. Ann. Probab. 13 1364–1368.
  • Tibshirani, R. (1996). Regression shrinkage and selection via the Lasso. J. Roy. Statist. Soc. Ser. B 58 267–288.
  • van de Geer, S. (2008). High-dimensional generalized linear models and the Lasso. Ann. Statist. 36 614–645.
  • Wainwright, M. (2006). Sharp thresholds for high-dimensional and noisy recovery of sparsity. Available at
  • Zhao, P. and Yu, B. (2006). On model selection consistency of LASSO. J. Machine Learning Research 7 2541–2567.
  • Zhang, C.-H. and Huang, J. (2006). Model-selection consistency of the LASSO in high-dimensional linear regression. Technical Report No. 2006-003, Dept. Statistics, Rutgers Univ.
  • Zou, H. (2006). The adaptive lasso and its oracle properties. J. Amer. Statist. Assoc. 101 1418–1429.