Open Access
Honest variable selection in linear and logistic regression models via $\ell_1$ and $\ell_1+\ell_2$ penalization
Florentina Bunea
Electron. J. Statist. 2: 1153-1194 (2008). DOI: 10.1214/08-EJS287
Abstract

This paper investigates correct variable selection in finite samples via $\ell_1$ and $\ell_1+\ell_2$ type penalization schemes. The asymptotic consistency of variable selection follows immediately from this analysis. We focus on logistic and linear regression models. The following questions are central to our paper: given a level of confidence $1-\delta$, under which assumptions on the design matrix, for which strength of the signal and for what values of the tuning parameters can we identify the true model at the given level of confidence? Formally, if $\widehat{I}$ is an estimate of the true variable set $I^*$, we study conditions under which $\mathbb{P}(\widehat{I}=I^*) \geq 1-\delta$, for a given sample size $n$, number of parameters $M$ and confidence $1-\delta$. We show that in identifiable models, both methods can recover coefficients of size $\frac{1}{\sqrt{n}}$, up to small multiplicative constants and logarithmic factors in $M$ and $\frac{1}{\delta}$. The advantage of the $\ell_1+\ell_2$ penalization over the $\ell_1$ is minor for the variable selection problem, for the models we consider here. Whereas the former estimates are unique and become more stable for highly correlated data matrices as one increases the tuning parameter of the $\ell_2$ part, too large an increase in this parameter value may preclude variable selection.
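
To make the two schemes concrete, below is a minimal sketch (not from the paper) of support recovery on simulated data, using scikit-learn's Lasso for the $\ell_1$ penalty and ElasticNet for the $\ell_1+\ell_2$ penalty. The sample size, signal strength, and tuning parameters alpha and l1_ratio are illustrative assumptions, not the paper's tuning recommendations. Decreasing l1_ratio puts more weight on the $\ell_2$ part, which stabilizes the fit for correlated designs but, as the abstract notes, can preclude exact selection if pushed too far.

    # Illustrative sketch only: compares l1 (Lasso) and l1+l2 (elastic net)
    # variable selection against a known true support I*.
    import numpy as np
    from sklearn.linear_model import Lasso, ElasticNet

    rng = np.random.default_rng(0)
    n, M = 200, 50                       # sample size n, number of parameters M
    I_star = [0, 1, 2]                   # true variable set I*
    beta = np.zeros(M)
    beta[I_star] = 1.0                   # signal well above the 1/sqrt(n) scale

    X = rng.standard_normal((n, M))
    y = X @ beta + 0.5 * rng.standard_normal(n)

    # l1 penalization (lasso); alpha chosen for illustration
    lasso = Lasso(alpha=0.1).fit(X, y)
    # l1+l2 penalization (elastic net); l1_ratio < 1 adds the l2 (ridge) part
    enet = ElasticNet(alpha=0.1, l1_ratio=0.7).fit(X, y)

    def support(coef, tol=1e-8):
        # indices of coefficients estimated as nonzero
        return set(np.flatnonzero(np.abs(coef) > tol))

    print("true I*:            ", set(I_star))
    print("lasso selects:      ", support(lasso.coef_))
    print("elastic net selects:", support(enet.coef_))

With this well-separated signal both estimators typically recover $I^*$; re-running with correlated columns of X and a smaller l1_ratio illustrates the stability-versus-selection trade-off described above.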

Copyright © 2008 The Institute of Mathematical Statistics and the Bernoulli Society
Florentina Bunea, "Honest variable selection in linear and logistic regression models via $\ell_1$ and $\ell_1+\ell_2$ penalization," Electronic Journal of Statistics 2, 1153-1194 (2008). https://doi.org/10.1214/08-EJS287
Published: 2008