Electronic Journal of Statistics

The adaptive and the thresholded Lasso for potentially misspecified models (and a lower bound for the Lasso)

Sara van de Geer, Peter Bühlmann, and Shuheng Zhou


Abstract

We revisit the adaptive Lasso as well as the thresholded Lasso with refitting, in a high-dimensional linear model, and study prediction error, ℓq-error (q ∈ {1, 2}), and the number of false positive selections. Our theoretical results for the two methods are, at a rather fine scale, comparable. The differences show up only in terms of the (minimal) restricted and sparse eigenvalues, favoring thresholding over the adaptive Lasso. As regards prediction and estimation, the difference is virtually negligible, but our bound for the number of false positives is larger for the adaptive Lasso than for thresholding. We also study the adaptive Lasso under beta-min conditions, which are conditions on the size of the coefficients. We show that for exact variable selection, the adaptive Lasso generally needs more severe beta-min conditions than thresholding. Both two-stage methods add value to the one-stage Lasso in the sense that, under appropriate restricted and sparse eigenvalue conditions, they attain prediction and estimation error similar to the one-stage Lasso but with substantially fewer false positives. Regarding the latter, we provide a lower bound for the Lasso with respect to false positive selections.
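For readers who want a concrete picture of the two two-stage procedures compared above, the following minimal sketch implements an adaptive Lasso (an initial Lasso, then a weighted Lasso with weights inversely proportional to the initial coefficients) and a thresholded Lasso with refitting (an initial Lasso, hard thresholding, then ordinary least squares on the retained variables). It assumes scikit-learn and NumPy; the tuning parameters lam_init, lam_adapt and the threshold tau are illustrative placeholders, not the calibrated choices analyzed in the paper.

```python
# Minimal sketch (not the paper's exact procedures) of the two two-stage
# estimators discussed in the abstract. Tuning parameters are placeholders.
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

def adaptive_lasso(X, y, lam_init=0.1, lam_adapt=0.1):
    """Stage 1: plain Lasso. Stage 2: weighted Lasso, weights = 1/|initial coefficient|."""
    init = Lasso(alpha=lam_init).fit(X, y).coef_
    # A huge weight effectively removes variables the initial Lasso set to zero.
    w = 1.0 / np.maximum(np.abs(init), 1e-10)
    # Solve the weighted l1 problem by rescaling columns: Lasso on X/w, then undo the scaling.
    beta_scaled = Lasso(alpha=lam_adapt).fit(X / w, y).coef_
    return beta_scaled / w

def thresholded_lasso_refit(X, y, lam_init=0.1, tau=0.05):
    """Stage 1: plain Lasso. Stage 2: hard-threshold, then refit OLS on the retained set."""
    init = Lasso(alpha=lam_init).fit(X, y).coef_
    keep = np.abs(init) > tau
    beta = np.zeros(X.shape[1])
    if keep.any():
        beta[keep] = LinearRegression().fit(X[:, keep], y).coef_
    return beta

# Example on synthetic sparse data (p > n, 5 active coefficients).
rng = np.random.default_rng(0)
n, p, s = 100, 200, 5
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:s] = 2.0
y = X @ beta_true + rng.standard_normal(n)
print("adaptive Lasso support size:", (adaptive_lasso(X, y) != 0).sum())
print("thresholded + refit support size:", (thresholded_lasso_refit(X, y) != 0).sum())
```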

Article information

Source
Electron. J. Statist., Volume 5 (2011), 688-749.

Dates
First available in Project Euclid: 25 July 2011

Permanent link to this document
https://projecteuclid.org/euclid.ejs/1311600467

Digital Object Identifier
doi:10.1214/11-EJS624

Mathematical Reviews number (MathSciNet)
MR2820636

Zentralblatt MATH identifier
1274.62471

Subjects
Primary: 62J07: Ridge regression; shrinkage estimators
Secondary: 62G08: Nonparametric regression

Keywords
Adaptive Lasso; estimation; prediction; restricted eigenvalue; thresholding; variable selection

Citation

van de Geer, Sara; Bühlmann, Peter; Zhou, Shuheng. The adaptive and the thresholded Lasso for potentially misspecified models (and a lower bound for the Lasso). Electron. J. Statist. 5 (2011), 688–749. doi:10.1214/11-EJS624. https://projecteuclid.org/euclid.ejs/1311600467


