The Annals of Statistics

Simultaneous analysis of Lasso and Dantzig selector

Peter J. Bickel, Ya’acov Ritov, and Alexandre B. Tsybakov

Source: Ann. Statist. Volume 37, Number 4 (2009), 1705-1732.

Abstract

We show that, under a sparsity scenario, the Lasso estimator and the Dantzig selector exhibit similar behavior. For both methods, we derive, in parallel, oracle inequalities for the prediction risk in the general nonparametric regression model, as well as bounds on the ℓ_p estimation loss for 1 ≤ p ≤ 2 in the linear model when the number of variables can be much larger than the sample size.

Primary Subjects: 60K35, 62G08
Secondary Subjects: 62C20, 62G05, 62G20
Keywords: Linear models; model selection; nonparametric statistics

Links and Identifiers

Permanent link to this document: http://projecteuclid.org/euclid.aos/1245332830
Digital Object Identifier: doi:10.1214/08-AOS620
Zentralblatt MATH identifier: 05582008
Mathematical Reviews number (MathSciNet): MR2533469

2009 © Institute of Mathematical Statistics