The sparsity and bias of the Lasso selection in high-dimensional linear regression

The Annals of Statistics

Cun-Hui Zhang and Jian Huang

Source: Ann. Statist. Volume 36, Number 4 (2008), 1567–1594.

Abstract

Meinshausen and Bühlmann [Ann. Statist. 34 (2006) 1436–1462] showed that, for neighborhood selection in Gaussian graphical models, under a neighborhood stability condition, the LASSO is consistent, even when the number of variables is of greater order than the sample size. Zhao and Yu [(2006) J. Machine Learning Research 7 2541–2567] formalized the neighborhood stability condition in the context of linear regression as a strong irrepresentable condition. That paper showed that under this condition, the LASSO selects exactly the set of nonzero regression coefficients, provided that these coefficients are bounded away from zero at a certain rate. In this paper, the regression coefficients outside an ideal model are assumed to be small, but not necessarily zero. Under a sparse Riesz condition on the correlation of design variables, we prove that the LASSO selects a model of the correct order of dimensionality, controls the bias of the selected model at a level determined by the contributions of small regression coefficients and threshold bias, and selects all coefficients of greater order than the bias of the selected model. Moreover, as a consequence of this rate consistency of the LASSO in model selection, it is proved that the sum of error squares for the mean response and the ℓ_α-loss for the regression coefficients converge at the best possible rates under the given conditions. An interesting aspect of our results is that the logarithm of the number of variables can be of the same order as the sample size for certain random dependent designs.
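The abstract's claims concern which variables the LASSO selects when a few coefficients are large and the rest are small but not necessarily zero, under a sparse Riesz condition on the design. The following is a minimal numerical sketch of both ingredients, not the authors' code. Assumptions, all ours: the LASSO objective is scaled as (1/(2n))||y − Xb||² + λ||b||₁ and solved by plain cyclic coordinate descent (rather than the LARS algorithm of Efron et al. (2004)); the sparse Riesz condition is read as requiring the eigenvalues of X_A'X_A/n to lie in a fixed interval [c_*, c^*] for every column subset A of size q, which may differ in normalization from the paper's exact definition; the simulated coefficients, noise level and penalty λ are hypothetical illustrative choices; and the helper names `lasso_cd` and `sparse_eigen_bounds` are hypothetical.

```python
import itertools

import numpy as np


def soft_threshold(z: float, t: float) -> float:
    """Soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    return float(np.sign(z) * max(abs(z) - t, 0.0))


def lasso_cd(X, y, lam, max_sweeps=10_000, tol=1e-10):
    """Hypothetical helper: minimize (1/(2n))||y - X b||^2 + lam * ||b||_1
    by cyclic coordinate descent (a stand-in solver, not the paper's method)."""
    n, p = X.shape
    b = np.zeros(p)
    r = y.copy()                          # current residual y - X b
    col_sq = (X ** 2).sum(axis=0) / n     # x_j' x_j / n for each column
    for _ in range(max_sweeps):
        max_change = 0.0
        for j in range(p):
            # Partial-residual correlation: x_j' (r + x_j b_j) / n
            rho = X[:, j] @ r / n + col_sq[j] * b[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - b[j]
            if delta != 0.0:
                r -= X[:, j] * delta      # keep the residual in sync with b
                b[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:              # stop once a full sweep barely moves b
            break
    return b


def sparse_eigen_bounds(X, q):
    """Hypothetical helper: smallest and largest eigenvalues of X_A' X_A / n
    over all column subsets A with |A| = q, by brute-force enumeration."""
    n, p = X.shape
    lo, hi = np.inf, -np.inf
    for A in itertools.combinations(range(p), q):
        cols = list(A)
        ev = np.linalg.eigvalsh(X[:, cols].T @ X[:, cols] / n)  # ascending order
        lo, hi = min(lo, ev[0]), max(hi, ev[-1])
    return lo, hi


# Hypothetical simulation with p > n: a few large coefficients plus many small ones.
rng = np.random.default_rng(0)
n, p, sigma = 50, 100, 1.0
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:5] = 3.0        # hypothetical "large" coefficients (an ideal model of size 5)
beta[5:25] = 0.05     # hypothetical small but nonzero coefficients
y = X @ beta + sigma * rng.standard_normal(n)

lam = 2 * sigma * np.sqrt(2 * np.log(p) / n)  # illustrative penalty, not the paper's constant
b = lasso_cd(X, y, lam)
selected = np.flatnonzero(b)
print("selected model size:", selected.size)
print("all large coefficients selected:", set(range(5)) <= set(selected.tolist()))

c_lo, c_hi = sparse_eigen_bounds(X, q=2)
print(f"sparse eigenvalue bounds for q=2: [{c_lo:.3f}, {c_hi:.3f}]")
```

Brute-force enumeration over all subsets of size q is only practical for small p and q; for the random designs mentioned in the abstract, such spectral bounds are the kind of quantity that the cited random-matrix literature (for example Bai, 1999; Geman, 1980; Silverstein, 1985) controls analytically. Varying the hypothetical small-coefficient size or λ in the script gives an informal feel for the trade-off between selected model size and bias, without reproducing the paper's theorems.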

Primary Subjects: 62J05, 62J07
Secondary Subjects: 62H25
Keywords: Penalized regression; high-dimensional data; variable selection; bias; rate consistency; spectral analysis; random matrices

Links and Identifiers

Permanent link to this document: http://projecteuclid.org/euclid.aos/1216237292
Digital Object Identifier: doi:10.1214/07-AOS520

References

Bai, Z. D. (1999). Methodologies in spectral analysis of large dimensional random matrices, a review. Statist. Sinica 9 611–677.
Mathematical Reviews (MathSciNet): MR1711663
Zentralblatt MATH: 0949.60077
Bunea, F., Tsybakov, A. and Wegkamp, M. (2006). Sparsity oracle inequalities for the lasso. Technical report M979, Dept. Statistics, Florida State Univ.
Mathematical Reviews (MathSciNet): MR2312149
Digital Object Identifier: doi:10.1214/07-EJS008
Project Euclid: euclid.ejs/1179759718
Candès, E. and Tao, T. (2007). The Dantzig selector: Statistical estimation when p is much larger than n (with discussion). Ann. Statist. 35 2313–2351.
Mathematical Reviews (MathSciNet): MR2382644
Digital Object Identifier: doi:10.1214/009053606000001523
Project Euclid: euclid.aos/1201012958
Davidson, K. and Szarek, S. (2001). Local operator theory, random matrices and Banach spaces. In Handbook on the Geometry of Banach Spaces I (W. B. Johnson and J. Lindenstrauss, eds.) 317–366. North-Holland, Amsterdam.
Mathematical Reviews (MathSciNet): MR1863696
Zentralblatt MATH: 1067.46008
Digital Object Identifier: doi:10.1016/S1874-5849(01)80010-3
Donoho, D. L. (2006). For most large underdetermined systems of equations, the minimal ℓ_1-norm near-solution approximates the sparsest near-solution. Comm. Pure Appl. Math. 59 907–934.
Donoho, D. L. and Johnstone, I. (1994). Minimax risk over ℓ_p-balls for ℓ_q-error. Probab. Theory Related Fields 99 277–303.
Mathematical Reviews (MathSciNet): MR1278886
Digital Object Identifier: doi:10.1007/BF01199026
Eaton, M. L. (1983). Multivariate Statistics: A Vector Space Approach. Wiley, New York.
Mathematical Reviews (MathSciNet): MR716321
Zentralblatt MATH: 0587.62097
Efron, B., Hastie, T., Johnstone, I. and Tibshirani, R. (2004). Least angle regression. Ann. Statist. 32 407–499.
Mathematical Reviews (MathSciNet): MR2060166
Digital Object Identifier: doi:10.1214/009053604000000067
Project Euclid: euclid.aos/1083178935
Efron, B., Hastie, T. and Tibshirani, R. (2007). Discussion of: “The Dantzig selector: Statistical estimation when p is much larger than n”. Ann. Statist. 35 2358–2364.
Mathematical Reviews (MathSciNet): MR2382646
Digital Object Identifier: doi:10.1214/009053607000000433
Project Euclid: euclid.aos/1201012960
Foster, D. P. and George, E. I. (1994). The risk inflation criterion for multiple regression. Ann. Statist. 22 1947–1975.
Mathematical Reviews (MathSciNet): MR1329177
Digital Object Identifier: doi:10.1214/aos/1176325766
Project Euclid: euclid.aos/1176325766
Geman, S. (1980). A limit theorem for the norm of random matrices. Ann. Probab. 8 252–261.
Mathematical Reviews (MathSciNet): MR566592
Digital Object Identifier: doi:10.1214/aop/1176994775
Project Euclid: euclid.aop/1176994775
Greenshtein, E. and Ritov, Y. (2004). Persistence in high-dimensional linear predictor selection and the virtue of overparametrization. Bernoulli 10 971–988.
Mathematical Reviews (MathSciNet): MR2108039
Digital Object Identifier: doi:10.3150/bj/1106314846
Project Euclid: euclid.bj/1106314846
Huang, J., Ma, S. and Zhang, C.-H. (2007). Adaptive LASSO for sparse high-dimensional regression models. Statist. Sinica. To appear.
Knight, K. and Fu, W. J. (2000). Asymptotics for lasso-type estimators. Ann. Statist. 28 1356–1378.
Mathematical Reviews (MathSciNet): MR1805787
Digital Object Identifier: doi:10.1214/aos/1015957397
Project Euclid: euclid.aos/1015957397
Leng, C., Lin, Y. and Wahba, G. (2006). A note on the LASSO and related procedures in model selection. Statist. Sinica 16 1273–1284.
Mathematical Reviews (MathSciNet): MR2327490
Meinshausen, N. and Bühlmann, P. (2006). High dimensional graphs and variable selection with the Lasso. Ann. Statist. 34 1436–1462.
Mathematical Reviews (MathSciNet): MR2278363
Digital Object Identifier: doi:10.1214/009053606000000281
Project Euclid: euclid.aos/1152540754
Meinshausen, N. and Yu, B. (2006). Lasso-type recovery of sparse representations for high-dimensional data. Technical report, Dept. Statistics, Univ. California, Berkeley.
Osborne, M., Presnell, B. and Turlach, B. (2000a). A new approach to variable selection in least squares problems. IMA J. Numer. Anal. 20 389–404.
Mathematical Reviews (MathSciNet): MR1773265
Digital Object Identifier: doi:10.1093/imanum/20.3.389
Osborne, M., Presnell, B. and Turlach, B. (2000b). On the lasso and its dual. J. Comput. Graph. Statist. 9 319–337.
Mathematical Reviews (MathSciNet): MR1822089
Digital Object Identifier: doi:10.2307/1390657
Silverstein, J. W. (1985). The smallest eigenvalue of a large dimensional Wishart matrix. Ann. Probab. 13 1364–1368.
Mathematical Reviews (MathSciNet): MR806232
Digital Object Identifier: doi:10.1214/aop/1176992819
Project Euclid: euclid.aop/1176992819
Tibshirani, R. (1996). Regression shrinkage and selection via the Lasso. J. Roy. Statist. Soc. Ser. B 58 267–288.
Mathematical Reviews (MathSciNet): MR1379242
van de Geer, S. (2007). High-dimensional generalized linear models and the Lasso. Ann. Statist. 36 614–645.
Mathematical Reviews (MathSciNet): MR2396809
Digital Object Identifier: doi:10.1214/009053607000000929
Project Euclid: euclid.aos/1205420513
Wainwright, M. (2006). Sharp thresholds for high-dimensional and noisy recovery of sparsity. Available at http://www.arxiv.org/PS_cache/math/pdf/0605/0605740v1.pdf.
Zhao, P. and Yu, B. (2006). On model selection consistency of LASSO. J. Machine Learning Research 7 2541–2567.
Mathematical Reviews (MathSciNet): MR2274449
Zhang, C.-H. and Huang, J. (2006). Model-selection consistency of the LASSO in high-dimensional linear regression. Technical Report No. 2006-003, Dept. Statistics, Rutgers Univ.
Zou, H. (2006). The adaptive lasso and its oracle properties. J. Amer. Statist. Assoc. 101 1418–1429.
Mathematical Reviews (MathSciNet): MR2279469
Digital Object Identifier: doi:10.1198/016214506000000735

2008 © Institute of Mathematical Statistics