The Annals of Statistics

Slope meets Lasso: Improved oracle bounds and optimality

Pierre C. Bellec, Guillaume Lecué, and Alexandre B. Tsybakov

Abstract

We show that two polynomial time methods, a Lasso estimator with adaptively chosen tuning parameter and a Slope estimator, adaptively achieve the minimax prediction and $\ell_{2}$ estimation rate $(s/n)\log(p/s)$ in high-dimensional linear regression on the class of $s$-sparse vectors in $\mathbb{R}^{p}$. This is done under the Restricted Eigenvalue (RE) condition for the Lasso and under a slightly stronger assumption on the design for the Slope. The main results have the form of sharp oracle inequalities accounting for the model misspecification error. The minimax optimal bounds are also obtained for the $\ell_{q}$ estimation errors with $1\le q\le2$ when the model is well specified. The results are nonasymptotic and hold both in probability and in expectation. The assumptions that we impose on the design are satisfied with high probability for a large class of random matrices with independent and possibly anisotropically distributed rows. We give a comparative analysis of conditions under which oracle bounds for the Lasso and Slope estimators can be obtained. In particular, we show that several known conditions, such as the RE condition and the sparse eigenvalue condition, are equivalent if the $\ell_{2}$-norms of the regressors are uniformly bounded.
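
To make the Slope estimator discussed in the abstract concrete, the sketch below minimizes $(1/(2n))\|y-X\beta\|^2 + \sum_j \lambda_j |\beta|_{(j)}$ by proximal gradient descent, computing the sorted-$\ell_1$ proximal operator via a pool-adjacent-violators isotonic projection. The weight sequence $\lambda_j = A\,\sigma\sqrt{\log(2p/j)/n}$ mirrors the $(s/n)\log(p/s)$ rate in the abstract, but the constant A, the noise level sigma, the step size, and the problem sizes in the demo are assumptions made for illustration; this is a minimal sketch, not the paper's reference implementation.

```python
import numpy as np


def _isotonic_nonincreasing(y):
    """Least-squares projection of y onto {z : z[0] >= z[1] >= ... >= z[-1]},
    via pool-adjacent-violators applied to the negated (nondecreasing) problem."""
    vals, wts, cnts = [], [], []
    for v in -y:
        vals.append(v); wts.append(1.0); cnts.append(1)
        while len(vals) > 1 and vals[-2] > vals[-1]:      # merge violating blocks
            v2, w2, c2 = vals.pop(), wts.pop(), cnts.pop()
            v1, w1, c1 = vals.pop(), wts.pop(), cnts.pop()
            w = w1 + w2
            vals.append((w1 * v1 + w2 * v2) / w); wts.append(w); cnts.append(c1 + c2)
    out = np.concatenate([np.full(c, v) for v, c in zip(vals, cnts)])
    return -out


def prox_sorted_l1(v, lam):
    """Prox of b -> sum_j lam[j] * |b|_(j) for nonincreasing, nonnegative lam:
    sort |v| in decreasing order, isotonically project |v| - lam onto the
    nonincreasing cone, clip at zero, then restore the original order and signs."""
    order = np.argsort(-np.abs(v))
    u = np.abs(v)[order]
    w = np.maximum(_isotonic_nonincreasing(u - lam), 0.0)
    out = np.empty_like(v)
    out[order] = w
    return np.sign(v) * out


def slope(X, y, lam, n_iter=1000):
    """Slope estimate by proximal gradient (ISTA) for
    (1/(2n)) * ||y - X b||^2 + sum_j lam[j] * |b|_(j)."""
    n, p = X.shape
    L = np.linalg.norm(X, 2) ** 2 / n          # Lipschitz constant of the gradient
    beta = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y) / n
        beta = prox_sorted_l1(beta - grad / L, lam / L)
    return beta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p, s, sigma = 100, 500, 5, 1.0           # illustrative sizes, not from the paper
    X = rng.standard_normal((n, p))
    beta_true = np.zeros(p); beta_true[:s] = 3.0
    y = X @ beta_true + sigma * rng.standard_normal(n)
    A = 2.0                                      # tuning constant: an assumption
    lam = A * sigma * np.sqrt(np.log(2 * p / np.arange(1, p + 1)) / n)
    beta_hat = slope(X, y, lam)
    print("prediction error:", np.linalg.norm(X @ (beta_hat - beta_true)) ** 2 / n)
```

The isotonic-projection form of the sorted-$\ell_1$ prox is the standard characterization used for SLOPE solvers; an accelerated (FISTA) loop or a data-driven estimate of sigma would be natural refinements of this sketch.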

Article information

Source
Ann. Statist., Volume 46, Number 6B (2018), 3603-3642.

Dates
Received: May 2016
Revised: May 2017
First available in Project Euclid: 11 September 2018

Permanent link to this document
https://projecteuclid.org/euclid.aos/1536631285

Digital Object Identifier
doi:10.1214/17-AOS1670

Mathematical Reviews number (MathSciNet)
MR3852663

Zentralblatt MATH identifier
06965699

Subjects
Primary: 60K35: Interacting random processes; statistical mechanics type models; percolation theory [See also 82B43, 82C43]; 62G08: Nonparametric regression
Secondary: 62C20: Minimax procedures; 62G05: Estimation; 62G20: Asymptotic properties

Keywords
Sparse linear regression; minimax rates; high-dimensional statistics; Slope; Lasso

Citation

Bellec, Pierre C.; Lecué, Guillaume; Tsybakov, Alexandre B. Slope meets Lasso: Improved oracle bounds and optimality. Ann. Statist. 46 (2018), no. 6B, 3603--3642. doi:10.1214/17-AOS1670. https://projecteuclid.org/euclid.aos/1536631285

