Electronic Journal of Statistics

The Lasso as an ℓ_1-ball model selection procedure

Pascal Massart and Caroline Meynet

Full-text: Open access


While many efforts have been made to prove that the Lasso behaves like a variable selection procedure, at the price of strong (though unavoidable) assumptions on the geometric structure of the variables, much less attention has been paid to oracle inequalities for the Lasso involving the ℓ_1-norm of the target vector. The inequalities of this kind proved in the literature show that, provided the regularization parameter is properly chosen, the Lasso approximately mimics the deterministic Lasso. Some of them require no assumption at all, neither on the structure of the variables nor on the regression function. Our first purpose here is to provide a conceptually very simple result in this direction in the framework of Gaussian models with non-random regressors.
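In the fixed-design Gaussian regression model discussed above, the Lasso is the ℓ_1-penalized least-squares estimator. A minimal sketch, assuming a standard objective scaling; the coordinate-descent solver and all names below are illustrative, not taken from the paper:

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator, the scalar building block of l1-penalized least squares."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Lasso estimate argmin_theta (1/(2n))||y - X theta||^2 + lam*||theta||_1,
    computed by cyclic coordinate descent (a standard algorithm, chosen here
    only for illustration)."""
    n, p = X.shape
    theta = np.zeros(p)
    col_norms = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            # partial residual excluding coordinate j
            r = y - X @ theta + X[:, j] * theta[j]
            rho = X[:, j] @ r / n
            theta[j] = soft_threshold(rho, lam) / col_norms[j]
    return theta
```

A large regularization parameter shrinks every coordinate to zero, which is the mechanism behind the ℓ_1-norm appearing in the oracle inequalities.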

Our second purpose is to propose a new estimator particularly suited to infinite countable dictionaries. This estimator is constructed as an ℓ_0-penalized estimator among a sequence of Lasso estimators associated with a dyadic sequence of growing truncated dictionaries. The selection procedure automatically chooses the best truncation level of the dictionary so as to achieve the best trade-off between approximation, ℓ_1-regularization and sparsity. From a theoretical point of view, we provide an oracle inequality satisfied by this selected Lasso estimator.
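The construction can be sketched as follows: fit the Lasso on each dyadic truncation of the dictionary and keep the level minimizing a penalized criterion. The penalty used below is a placeholder of ℓ_0 type, not the paper's calibrated penalty, and every name in this sketch is an assumption:

```python
import numpy as np

def _lasso_cd(X, y, lam, n_iter=200):
    """Plain cyclic coordinate descent for the Lasso (illustrative helper)."""
    n, p = X.shape
    theta = np.zeros(p)
    col_norms = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            r = y - X @ theta + X[:, j] * theta[j]
            rho = X[:, j] @ r / n
            theta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_norms[j]
    return theta

def select_truncated_lasso(X, y, lam, kappa=1.0):
    """Fit the Lasso on each dyadic truncation (first 2, 4, 8, ... columns)
    of the dictionary, then pick the truncation level by an l0-type
    penalized criterion (placeholder penalty, for illustration only)."""
    n, p = X.shape
    best = None
    d = 2
    while d <= p:
        theta = _lasso_cd(X[:, :d], y, lam)
        rss = np.sum((y - X[:, :d] @ theta) ** 2) / n
        df = np.count_nonzero(theta)             # sparsity of the truncated fit
        crit = rss + kappa * df * np.log(d) / n  # trade-off: fit vs. sparsity
        if best is None or crit < best[0]:
            best = (crit, d, theta)
        d *= 2
    return best[1], best[2]
```

Truncations too short to contain the active dictionary elements pay a large residual term, while unnecessarily long ones pay through the penalty, which is the trade-off the abstract describes.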

The oracle inequalities presented in this paper are obtained via the application of a general theorem of model selection among a collection of nonlinear models which is a direct consequence of the Gaussian concentration inequality. The key idea that enables us to apply this general theorem is to see ℓ_1-regularization as a model selection procedure among ℓ_1-balls.
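The viewpoint named in this paragraph can be written out as follows (a sketch under assumed notation, not the paper's exact statement): for each radius R, penalized ℓ_1-regularization corresponds to least squares constrained to an ℓ_1-ball, so tuning the regularization parameter amounts to selecting among the models given by these balls.

```latex
% Penalized Lasso and its constrained counterpart over an l1-ball of radius R
\hat{\theta}_{\lambda} \in \operatorname*{arg\,min}_{\theta}
  \left\{ \| Y - X\theta \|_2^2 + \lambda \|\theta\|_1 \right\},
\qquad
\hat{\theta}^{(R)} \in \operatorname*{arg\,min}_{\|\theta\|_1 \le R}
  \| Y - X\theta \|_2^2 .
```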

Article information

Electron. J. Statist., Volume 5 (2011), 669-687.

First available in Project Euclid: 25 July 2011

Keywords: Lasso; ℓ_1-oracle inequalities; model selection by penalization; ℓ_1-balls; generalized linear Gaussian model


Massart, Pascal; Meynet, Caroline. The Lasso as an ℓ_1-ball model selection procedure. Electron. J. Statist. 5 (2011), 669--687. doi:10.1214/11-EJS623. https://projecteuclid.org/euclid.ejs/1311600466


