We show that, under a sparsity scenario, the Lasso estimator and the Dantzig selector exhibit similar behavior. For both methods, we derive, in parallel, oracle inequalities for the prediction risk in the general nonparametric regression model, as well as bounds on the ℓ_p estimation loss for 1 ≤ p ≤ 2 in the linear model when the number of variables can be much larger than the sample size.