Electronic Journal of Statistics

Optimal two-step prediction in regression

Didier Chételat, Johannes Lederer, and Joseph Salmon

Full-text: Open access


High-dimensional prediction typically comprises two steps: variable selection and subsequent least-squares refitting on the selected variables. However, the standard variable selection procedures, such as the lasso, hinge on tuning parameters that need to be calibrated. Cross-validation, the most popular calibration scheme, is computationally costly and lacks finite sample guarantees. In this paper, we introduce an alternative scheme, easy to implement and both computationally and theoretically efficient.
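The two-step scheme described in the abstract can be illustrated with a minimal sketch. It assumes a plain coordinate-descent lasso run at a single, hand-picked tuning parameter `lam`, followed by least-squares refitting on the selected variables. It is not the calibration scheme introduced in the paper, and the function names `lasso_cd` and `two_step_fit`, as well as the simulated data, are hypothetical.

```python
import numpy as np


def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for (1/(2n)) * ||y - X b||^2 + lam * ||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n      # per-column scale x_j'x_j / n
    resid = y.astype(float)                # residual y - X beta, with beta = 0
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the residual that excludes x_j * beta_j.
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            # Soft-thresholding update for coordinate j.
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            resid += X[:, j] * (beta[j] - new)
            beta[j] = new
    return beta


def two_step_fit(X, y, lam):
    """Step 1: lasso variable selection. Step 2: least-squares refit on the support."""
    support = np.flatnonzero(lasso_cd(X, y, lam))
    beta = np.zeros(X.shape[1])
    if support.size > 0:
        coef, *_ = np.linalg.lstsq(X[:, support], y, rcond=None)
        beta[support] = coef
    return beta, support


# Toy usage on simulated data (all values below are illustrative assumptions).
rng = np.random.default_rng(0)
n, p = 80, 30
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = [3.0, -2.0, 2.0]
y = X @ beta_true + 0.1 * rng.standard_normal(n)

beta_hat, support = two_step_fit(X, y, lam=0.5)
print("selected variables:", support)
print("refitted coefficients:", beta_hat[support])
```

Refitting removes the shrinkage that the lasso applies to the selected coefficients, but the selected set still depends on the choice of `lam`, which is the calibration problem the abstract refers to.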

Article information

Electron. J. Statist., Volume 11, Number 1 (2017), 2519-2546.

Received: May 2016
First available in Project Euclid: 2 June 2017

Permanent link to this document: https://projecteuclid.org/euclid.ejs/1496390437

Digital Object Identifier: doi:10.1214/17-EJS1287

Primary: 62G08: Nonparametric regression
Secondary: 62J07: Ridge regression; shrinkage estimators

Keywords: High-dimensional prediction; tuning parameter selection; lasso

Creative Commons Attribution 4.0 International License.


Chételat, Didier; Lederer, Johannes; Salmon, Joseph. Optimal two-step prediction in regression. Electron. J. Statist. 11 (2017), no. 1, 2519–2546. doi:10.1214/17-EJS1287. https://projecteuclid.org/euclid.ejs/1496390437


