Meinshausen and Bühlmann [Ann. Statist. 34 (2006) 1436–1462] showed that, for neighborhood selection in Gaussian graphical models, the LASSO is consistent under a neighborhood stability condition, even when the number of variables is of greater order than the sample size. Zhao and Yu [J. Machine Learning Research 7 (2006) 2541–2567] formalized the neighborhood stability condition in the context of linear regression as a strong irrepresentable condition. They showed that under this condition, the LASSO selects exactly the set of nonzero regression coefficients, provided that these coefficients are bounded away from zero at a certain rate. In this paper, the regression coefficients outside an ideal model are assumed to be small, but not necessarily zero. Under a sparse Riesz condition on the correlation of the design variables, we prove that the LASSO selects a model of the correct order of dimensionality, controls the bias of the selected model at a level determined by the contributions of small regression coefficients and threshold bias, and selects all coefficients of greater order than the bias of the selected model. Moreover, as a consequence of this rate consistency of the LASSO in model selection, we prove that the sum of squared errors for the mean response and the ℓα-loss for the regression coefficients converge at the best possible rates under the given conditions. An interesting aspect of our results is that the logarithm of the number of variables can be of the same order as the sample size for certain random dependent designs.
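To make the two objects in the abstract concrete, here is a minimal sketch, not the authors' procedure or code. It solves the LASSO by cyclic coordinate descent with soft-thresholding (a standard solver swapped in for illustration; the paper does not prescribe it), assuming the common scaling min_b (1/(2n))‖y − Xb‖² + λ‖b‖₁. It also computes the constants of a sparse Riesz condition, i.e. the smallest and largest eigenvalues of X_Aᵀ X_A / n over all column subsets A of a fixed size q, but only for q = 2, where a closed form for 2×2 symmetric matrices is available. The function names `soft_threshold`, `lasso_cd` and `sparse_riesz_bounds_q2`, and the toy design and coefficients, are hypothetical.

```python
import itertools
import math


def soft_threshold(z, lam):
    """Soft-thresholding operator S(z, lam) = sign(z) * max(|z| - lam, 0)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X, y, lam, n_sweeps=200, tol=1e-10):
    """Cyclic coordinate descent for min_b (1/(2n))||y - Xb||^2 + lam*||b||_1.

    Assumed scaling (hypothetical choice). X is a list of n rows of p floats.
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    r = list(y)  # full residual y - Xb; b starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_sweeps):
        max_change = 0.0
        for j in range(p):
            # x_j^T (partial residual) / n, with b_j's contribution added back.
            z = sum(X[i][j] * r[i] for i in range(n)) / n + col_sq[j] * b[j]
            new_bj = soft_threshold(z, lam) / col_sq[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    r[i] -= X[i][j] * delta  # keep the residual in sync
                b[j] = new_bj
                max_change = max(max_change, abs(delta))
        if max_change < tol:  # stop once a full sweep barely moves b
            break
    return b


def sparse_riesz_bounds_q2(X):
    """Min and max eigenvalue of X_A^T X_A / n over all subsets |A| = 2.

    Uses the closed form for a symmetric 2x2 matrix [[a, c], [c, d]].
    """
    n, p = len(X), len(X[0])
    lo, hi = math.inf, -math.inf
    for j, k in itertools.combinations(range(p), 2):
        a = sum(X[i][j] ** 2 for i in range(n)) / n
        d = sum(X[i][k] ** 2 for i in range(n)) / n
        c = sum(X[i][j] * X[i][k] for i in range(n)) / n
        mid = (a + d) / 2
        half_gap = math.sqrt(((a - d) / 2) ** 2 + c ** 2)
        lo, hi = min(lo, mid - half_gap), max(hi, mid + half_gap)
    return lo, hi


# Toy orthogonal design: columns 1..5 of the 8x8 Sylvester-Hadamard matrix,
# so X^T X / n is the identity.
n = 8
X = [[(-1) ** bin(i & j).count("1") for j in range(1, 6)] for i in range(n)]
beta = [3.0, -2.0, 0.1, 0.05, 0.0]  # two large, two small, one zero
y = [sum(X[i][j] * beta[j] for j in range(5)) for i in range(n)]
b = lasso_cd(X, y, lam=0.5)
print([round(v, 6) for v in b])   # [2.5, -1.5, 0.0, 0.0, 0.0]
print(sparse_riesz_bounds_q2(X))  # (1.0, 1.0)
```

With an orthogonal design the LASSO reduces to soft-thresholding each coefficient at λ, so the two large coefficients are kept but shrunk by λ = 0.5 (the threshold bias), while the small but nonzero coefficients 0.1 and 0.05 are dropped. For correlated designs the selection is no longer coordinate-wise, which is where eigenvalue conditions such as the sparse Riesz condition matter; a brute-force check for general q grows combinatorially in p, so this sketch stops at q = 2.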
References
Bai, Z. D. (1999). Methodologies in spectral analysis of large dimensional random matrices, a review. Statist. Sinica 9 611–677.
Bunea, F., Tsybakov, A. and Wegkamp, M. (2006). Sparsity oracle inequalities for the lasso. Technical report M979, Dept. Statistics, Florida State Univ.
Candès, E. and Tao, T. (2007). The Dantzig selector: Statistical estimation when p is much larger than n (with discussion). Ann. Statist. 35 2313–2351.
Davidson, K. and Szarek, S. (2001). Local operator theory, random matrices and Banach spaces. In Handbook on the Geometry of Banach Spaces I (W. B. Johnson and J. Lindenstrauss, eds.) 317–366. North-Holland, Amsterdam.
Donoho, D. L. (2006). For most large underdetermined systems of equations, the minimal ℓ1-norm near-solution approximates the sparsest near-solution. Comm. Pure Appl. Math. 59 907–934.
Donoho, D. L. and Johnstone, I. (1994). Minimax risk over ℓp-balls for ℓq-error. Probab. Theory Related Fields 99 277–303.
Eaton, M. L. (1983). Multivariate Statistics: A Vector Space Approach. Wiley, New York.
Efron, B., Hastie, T., Johnstone, I. and Tibshirani, R. (2004). Least angle regression. Ann. Statist. 32 407–499.
Efron, B., Hastie, T. and Tibshirani, R. (2007). Discussion of: “The Dantzig selector: Statistical estimation when p is much larger than n”. Ann. Statist. 35 2358–2364.
Foster, D. P. and George, E. I. (1994). The risk inflation criterion for multiple regression. Ann. Statist. 22 1947–1975.
Geman, S. (1980). A limit theorem for the norm of random matrices. Ann. Probab. 8 252–261.
Greenshtein, E. and Ritov, Y. (2004). Persistence in high-dimensional linear predictor selection and the virtue of overparametrization. Bernoulli 10 971–988.
Huang, J., Ma, S. and Zhang, C.-H. (2007). Adaptive LASSO for sparse high-dimensional regression models. Statist. Sinica. To appear.
Knight, K. and Fu, W. J. (2000). Asymptotics for lasso-type estimators. Ann. Statist. 28 1356–1378.
Leng, C., Lin, Y. and Wahba, G. (2006). A note on the LASSO and related procedures in model selection. Statist. Sinica 16 1273–1284.
Meinshausen, N. and Bühlmann, P. (2006). High dimensional graphs and variable selection with the Lasso. Ann. Statist. 34 1436–1462.
Meinshausen, N. and Yu, B. (2006). Lasso-type recovery of sparse representations for high-dimensional data. Technical report, Dept. Statistics, Univ. California, Berkeley.
Osborne, M., Presnell, B. and Turlach, B. (2000a). A new approach to variable selection in least squares problems. IMA J. Numer. Anal. 20 389–404.
Osborne, M., Presnell, B. and Turlach, B. (2000b). On the lasso and its dual. J. Comput. Graph. Statist. 9 319–337.
Silverstein, J. W. (1985). The smallest eigenvalue of a large dimensional Wishart matrix. Ann. Probab. 13 1364–1368.
Tibshirani, R. (1996). Regression shrinkage and selection via the Lasso. J. Roy. Statist. Soc. Ser. B 58 267–288.
van de Geer, S. (2007). High-dimensional generalized linear models and the Lasso. Ann. Statist. 36 614–645.
Wainwright, M. (2006). Sharp thresholds for high-dimensional and noisy recovery of sparsity. Available at http://www.arxiv.org/PS_cache/math/pdf/0605/0605740v1.pdf.
Zhang, C.-H. and Huang, J. (2006). Model-selection consistency of the LASSO in high-dimensional linear regression. Technical Report No. 2006-003, Dept. Statistics, Rutgers Univ.
Zhao, P. and Yu, B. (2006). On model selection consistency of LASSO. J. Machine Learning Research 7 2541–2567.
Zou, H. (2006). The adaptive lasso and its oracle properties. J. Amer. Statist. Assoc. 101 1418–1429.