The Annals of Statistics

Simultaneous analysis of Lasso and Dantzig selector

Peter J. Bickel, Ya’acov Ritov, and Alexandre B. Tsybakov


Abstract

We show that, under a sparsity scenario, the Lasso estimator and the Dantzig selector exhibit similar behavior. For both methods, we derive, in parallel, oracle inequalities for the prediction risk in the general nonparametric regression model, as well as bounds on the ℓp estimation loss for 1 ≤ p ≤ 2 in the linear model when the number of variables can be much larger than the sample size.
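To make the two estimators concrete, the following is a minimal illustrative sketch, not code from the paper: it compares the Lasso and the Dantzig selector on simulated sparse data, fitting the Lasso with scikit-learn and solving the Dantzig selector as a linear program via scipy.optimize.linprog. The tuning choice lam, of order sigma * sqrt(log(p) / n), is a standard theoretical rate used here purely for illustration.

import numpy as np
from scipy.optimize import linprog
from sklearn.linear_model import Lasso

# Simulated sparse design with more variables than observations (p > n).
rng = np.random.default_rng(0)
n, p, s, sigma = 100, 200, 5, 1.0
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:s] = 3.0
y = X @ beta_true + sigma * rng.standard_normal(n)

# Illustrative tuning parameter of order sigma * sqrt(log(p) / n).
lam = 2.0 * sigma * np.sqrt(2.0 * np.log(p) / n)

# Lasso: argmin_beta (1/(2n)) ||y - X beta||_2^2 + lam * ||beta||_1
# (this is exactly scikit-learn's parameterization of the objective).
beta_lasso = Lasso(alpha=lam, fit_intercept=False).fit(X, y).coef_

# Dantzig selector: argmin ||beta||_1 subject to
#   ||X^T (y - X beta)||_inf <= n * lam,
# where the level n * lam matches the scale of the Lasso's stationarity
# condition under the parameterization above. Cast as a linear program
# in (u, v) with beta = u - v and u, v >= 0.
G = X.T @ X
c = np.ones(2 * p)                              # objective: sum(u) + sum(v)
A = np.vstack([np.hstack([G, -G]),              #  G (u - v) <= X^T y + n*lam
               np.hstack([-G, G])])             # -G (u - v) <= -X^T y + n*lam
b = np.concatenate([X.T @ y + n * lam, -(X.T @ y) + n * lam])
res = linprog(c, A_ub=A, b_ub=b, bounds=[(0, None)] * (2 * p), method="highs")
beta_dantzig = res.x[:p] - res.x[p:]

# ell_1 estimation losses: the p = 1 case of the bounds in the abstract.
print("Lasso   ell_1 loss:", np.abs(beta_lasso - beta_true).sum())
print("Dantzig ell_1 loss:", np.abs(beta_dantzig - beta_true).sum())

With these scalings the two solutions are typically close on such sparse designs, which is the phenomenon the paper quantifies.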

Article information

Source
Ann. Statist. Volume 37, Number 4 (2009), 1705–1732.

Dates
First available in Project Euclid: 18 June 2009

Permanent link to this document
http://projecteuclid.org/euclid.aos/1245332830

Digital Object Identifier
doi:10.1214/08-AOS620

Zentralblatt MATH identifier
05582008

Mathematical Reviews number (MathSciNet)
MR2533469

Subjects
Primary:
  60K35: Interacting random processes; statistical mechanics type models; percolation theory [See also 82B43, 82C43]
  62G08: Nonparametric regression
Secondary:
  62C20: Minimax procedures
  62G05: Estimation
  62G20: Asymptotic properties

Keywords
Linear models; model selection; nonparametric statistics

Citation

Bickel, Peter J.; Ritov, Ya’acov; Tsybakov, Alexandre B. Simultaneous analysis of Lasso and Dantzig selector. Ann. Statist. 37 (2009), no. 4, 1705–1732. doi:10.1214/08-AOS620. http://projecteuclid.org/euclid.aos/1245332830.


