Electronic Journal of Statistics

The Smooth-Lasso and other ℓ1+ℓ2-penalized methods

Mohamed Hebiri and Sara van de Geer


Abstract

We consider a linear regression problem in a high-dimensional setting where the number of covariates p can be much larger than the sample size n. In such a situation, one often assumes sparsity of the regression vector, i.e., the regression vector contains many zero components. We propose a Lasso-type estimator β̂Quad (where ‘Quad’ stands for quadratic) which is based on two penalty terms. The first one is the ℓ1 norm of the regression coefficients, used to exploit the sparsity of the regression as done by the Lasso estimator, whereas the second is a quadratic penalty term introduced to capture additional information on the setting of the problem. We detail two special cases: the Elastic-Net β̂EN introduced in [42], which deals with sparse problems where correlations between variables may exist, and the Smooth-Lasso β̂SL, which responds to sparse problems where successive regression coefficients are known to vary slowly (in some situations, this can also be interpreted in terms of correlations between successive variables). From a theoretical point of view, we establish variable selection consistency results and show that β̂Quad achieves a Sparsity Inequality, i.e., a bound in terms of the number of non-zero components of the ‘true’ regression vector. These results are provided under a weaker assumption on the Gram matrix than the one used by the Lasso; in some situations this guarantees a significant improvement over the Lasso. Furthermore, a simulation study shows that the S-Lasso β̂SL performs better than known methods such as the Lasso, the Elastic-Net β̂EN, and the Fused-Lasso (introduced in [30]) with respect to estimation accuracy. This is especially the case when the regression vector is ‘smooth’, i.e., when the variations between successive coefficients of the unknown regression parameter are small. The study also reveals that the theoretical calibration of the tuning parameters and the calibration based on 10-fold cross-validation yield two S-Lasso solutions with similar performance.
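As an informal illustration of the criterion described above, the following Python/NumPy sketch minimizes one common form of the Smooth-Lasso objective, ||y − Xβ||² + λ1·||β||₁ + λ2·Σⱼ(βⱼ − βⱼ₋₁)², by proximal gradient descent (ISTA). The exact criterion, the scaling of the tuning parameters, and the algorithm used in the paper may differ; the function and variable names below are illustrative assumptions, not the authors' implementation.

    import numpy as np

    def soft_threshold(z, t):
        # Entrywise soft-thresholding: the proximal operator of t * ||.||_1.
        return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

    def smooth_lasso_ista(X, y, lam1, lam2, n_iter=5000):
        # Proximal-gradient (ISTA) sketch for
        #   ||y - X b||^2 + lam1 * ||b||_1 + lam2 * sum_{j>=2} (b_j - b_{j-1})^2.
        # The squared-difference penalty equals lam2 * ||D b||^2 with D the
        # (p-1) x p first-difference matrix, so the smooth part of the
        # objective has gradient 2 X'(X b - y) + 2 lam2 D'D b.
        n, p = X.shape
        D = np.diff(np.eye(p), axis=0)
        DtD = D.T @ D
        # Constant step size from a Lipschitz bound on that gradient.
        L = 2.0 * (np.linalg.norm(X, 2) ** 2 + lam2 * np.linalg.norm(D, 2) ** 2)
        step = 1.0 / L
        beta = np.zeros(p)
        for _ in range(n_iter):
            grad = 2.0 * X.T @ (X @ beta - y) + 2.0 * lam2 * (DtD @ beta)
            beta = soft_threshold(beta - step * grad, step * lam1)
        return beta

    # Toy example: sparse coefficients that vary slowly where they are non-zero.
    rng = np.random.default_rng(0)
    n, p = 50, 100
    X = rng.standard_normal((n, p))
    beta_true = np.concatenate([np.linspace(1.0, 2.0, 10), np.zeros(p - 10)])
    y = X @ beta_true + 0.5 * rng.standard_normal(n)
    beta_hat = smooth_lasso_ista(X, y, lam1=2.0, lam2=5.0)

Replacing the difference matrix D with the identity turns the quadratic term into λ2·||β||², i.e., the Elastic-Net special case of the same generic ℓ1+quadratic penalty.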

Article information

Source
Electron. J. Statist. Volume 5 (2011), 1184-1226.

Dates
First available in Project Euclid: 6 October 2011

Permanent link to this document
https://projecteuclid.org/euclid.ejs/1317906993

Digital Object Identifier
doi:10.1214/11-EJS638

Mathematical Reviews number (MathSciNet)
MR2842904

Subjects
Primary: 62J05: Linear regression 62J07: Ridge regression; shrinkage estimators
Secondary: 62H20: Measures of association (correlation, canonical correlation, etc.) 62F12: Asymptotic properties of estimators

Keywords
Lasso; Elastic-Net; LARS; sparsity; variable selection; restricted eigenvalues; high-dimensional data

Citation

Hebiri, Mohamed; van de Geer, Sara. The Smooth-Lasso and other ℓ1+ℓ2-penalized methods. Electron. J. Statist. 5 (2011), 1184–1226. doi:10.1214/11-EJS638. https://projecteuclid.org/euclid.ejs/1317906993.



References

  • [1] Bach, F. (2008). Consistency of the group Lasso and multiple kernel learning. J. Mach. Learn. Res., 9:1179–1225.
  • [2] Belloni, A. and Chernozhukov, V. (2010). Post-ℓ1-Penalized Estimation in High Dimensional Sparse Linear Regression Models. Submitted.
  • [3] Bickel, P. and Ritov, Y. and Tsybakov, A. (2009). Simultaneous analysis of lasso and Dantzig selector. Ann. Statist., 37(4):1705–1732.
  • [4] Bunea, F. (2008). Consistent selection via the Lasso for high dimensional approximating regression models. IMS Collections, B. Clarke and S. Ghosal Editors, 3:122–138.
  • [5] Bunea, F. (2008). Honest variable selection in linear and logistic regression models via ℓ1 and ℓ1+ℓ2 penalization. Electron. J. Stat., 2:1153–1194.
  • [6] Bunea, F. and Tsybakov, A. and Wegkamp, M. (2007). Aggregation for Gaussian regression. Ann. Statist., 35(4):1674–1697.
  • [7] Bunea, F. and Tsybakov, A. and Wegkamp, M. (2007). Sparsity oracle inequalities for the Lasso. Electron. J. Stat., 1:169–194.
  • [8] Chesneau, C. and Hebiri, M. (2008). Some theoretical results on the grouped variables Lasso. Math. Methods Statist., 17(4):317–326.
  • [9] Dalalyan, A. and Tsybakov, A. (2007). Aggregation by exponential weighting and sharp oracle inequalities. In Learning theory, volume 4539 of Lecture Notes in Comput. Sci., pages 97–111. Springer, Berlin.
  • [10] Daye, Z. John and Jeng, X. Jessie (2009). Shrinkage and model selection with correlated variables via weighted fusion. Computational Statistics & Data Analysis, 53(4):1284–1298.
  • [11] Dümbgen, L. and van de Geer, S. and Veraar, M. and Wellner, J. (2010). Nemirovski's inequalities revisited. Amer. Math. Monthly, 117(2):138–160.
  • [12] Efron, B. and Hastie, T. and Johnstone, I. and Tibshirani, R. (2004). Least angle regression. Ann. Statist., 32(2):407–499. With discussion, and a rejoinder by the authors.
  • [13] Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. J. Amer. Statist. Assoc., 96(456):1348–1360.
  • [14] Hebiri, M. (2008). Regularization with the Smooth-Lasso procedure. Preprint, Laboratoire de Probabilités et Modèles Aléatoires.
  • [15] Jia, J. and Yu, B. (2008). On model selection consistency of the elastic net when p ≫ n. Tech. Report 756, Statistics, UC Berkeley.
  • [16] Kim, S. and Koh, K. and Boyd, S. and Gorinevsky, D. (2009). ℓ1 trend filtering. SIAM Rev., 51(2):339–360.
  • [17] Land, S. and Friedman, J. (1996). Variable fusion: a new method of adaptive signal regression. Manuscript.
  • [18] Lounici, K. (2008). Sup-norm convergence rate and sign concentration property of Lasso and Dantzig estimators. Electron. J. Stat., 2:90–102.
  • [19] Meier, L. and van de Geer, S. and Bühlmann, P. (2008). The group Lasso for logistic regression. J. R. Stat. Soc. Ser. B Stat. Methodol., 70(1):53–71.
  • [20] Meinshausen, N. (2007). Relaxed Lasso. Comput. Statist. Data Anal., 52(1):374–393.
  • [21] Meinshausen, N. and Bühlmann, P. (2006). High-dimensional graphs and variable selection with the lasso. Ann. Statist., 34(3):1436–1462.
  • [22] Meinshausen, N. and Meier, L. and Bühlmann, P. (2009). p-values for high-dimensional regression. J. Amer. Statist. Assoc., 104:1671–1681.
  • [23] Meinshausen, N. and Yu, B. (2009). Lasso-type recovery of sparse representations for high-dimensional data. Ann. Statist., 37(1):246–270.
  • [24] Nesterov, Yu. (2007). Gradient methods for minimizing composite objective function. CORE Discussion Papers 2007076, Université catholique de Louvain, Center for Operations Research and Econometrics (CORE). September 2007.
  • [25] Raskutti, G. and Wainwright, M. and Yu, B. (2009). Minimax rates of estimation for high-dimensional linear regression over ℓq-balls. Submitted.
  • [26] Rigollet, P. and Tsybakov, A. (2010). Exponential Screening and optimal rates of sparse estimation. Submitted.
  • [27] Rinaldo, A. (2009). Properties and refinements of the fused lasso. Ann. Statist., 37(5B):2922–2952.
  • [28] Rosset, S. and Zhu, J. (2007). Piecewise linear regularized solution paths. Ann. Statist., 35(3):1012–1030.
  • [29] Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. Roy. Statist. Soc. Ser. B, 58(1):267–288.
  • [30] Tibshirani, R. and Saunders, M. and Rosset, S. and Zhu, J. and Knight, K. (2005). Sparsity and smoothness via the fused lasso. J. R. Stat. Soc. Ser. B Stat. Methodol., 67(1):91–108.
  • [31] Tibshirani, R.J. and Taylor, J. (2010). Regularization Paths for Least Squares Problems with Generalized ℓ1 Penalties. Submitted.
  • [32] Tsybakov, A. and van de Geer, S. (2005). Square root penalty: adaptation to the margin in classification and in edge estimation. Ann. Statist., 33(3):1203–1224.
  • [33] van de Geer, S. and Bühlmann, P. (2009). On the conditions used to prove oracle results for the Lasso. Electron. J. Stat., 3:1360–1392.
  • [34] Wainwright, M. (2006). Sharp thresholds for noisy and high-dimensional recovery of sparsity using ℓ1-constrained quadratic programming. Manuscript.
  • [35] Ye, F. and Zhang, C. (2010). Rate Minimaxity of the Lasso and Dantzig Selector for the ℓq Loss in ℓr Balls. J. Mach. Learn. Res., 11:3519–3540.
  • [36] Yuan, M. and Lin, Y. (2006). Model selection and estimation in regression with grouped variables. J. R. Stat. Soc. Ser. B Stat. Methodol., 68(1):49–67.
  • [37] Yuan, M. and Lin, Y. (2007). On the non-negative garrote estimator. J. R. Stat. Soc. Ser. B Stat. Methodol., 69(2):143–161.
  • [38] Yueh, W-C. (2005). Eigenvalues of several tridiagonal matrices. Appl. Math. E-Notes, 2:66–74 (electronic).
  • [39] Zhang, C-H. and Huang, J. (2008). The sparsity and bias of the LASSO selection in high-dimensional linear regression. Ann. Statist., 36(4):1567–1594.
  • [40] Zhao, P. and Yu, B. (2006). On model selection consistency of Lasso. J. Mach. Learn. Res., 7:2541–2563.
  • [41] Zou, H. (2006). The adaptive lasso and its oracle properties. J. Amer. Statist. Assoc., 101(476):1418–1429.
  • [42] Zou, H. and Hastie, T. (2005). Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B Stat. Methodol., 67(2):301–320.
  • [43] Zou, H. and Zhang, H. (2009). On the adaptive elastic-net with a diverging number of parameters. Ann. Statist., 37(4):1733–1751.