Open Access
Sorted concave penalized regression
Long Feng, Cun-Hui Zhang
Ann. Statist. 47(6): 3069-3098 (December 2019). DOI: 10.1214/18-AOS1759
Abstract

The Lasso is biased. Concave penalized least squares estimation (PLSE) takes advantage of signal strength to reduce this bias, leading to sharper error bounds in prediction, coefficient estimation and variable selection. For prediction and estimation, the bias of the Lasso can also be reduced by taking a smaller penalty level than what selection consistency requires, but such a smaller penalty level depends on the sparsity of the true coefficient vector. The sorted $\ell_{1}$ penalized estimation (Slope) was proposed for adaptation to such smaller penalty levels. However, the advantages of concave PLSE and Slope do not subsume each other. We propose sorted concave penalized estimation to combine the advantages of concave and sorted penalizations. We prove that sorted concave penalties adaptively choose the smaller penalty level and at the same time benefit from signal strength, especially when a significant proportion of signals are stronger than the corresponding adaptively selected penalty levels. A local convex approximation for sorted concave penalties, which extends the local linear and quadratic approximations for separable concave penalties, is developed to facilitate the computation of sorted concave PLSE and proven to possess the desired prediction and estimation error bounds. Our analysis of prediction and estimation errors requires only the restricted eigenvalue condition on the design and, under an additional minimum signal strength requirement, also yields selection consistency. Thus, our results also sharpen existing results on concave PLSE by removing the upper sparse eigenvalue component of the sparse Riesz condition.
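The two ingredients being combined here have simple proximal forms. The following minimal Python sketch (illustrative only, not the authors' implementation; function names and toy inputs are hypothetical) contrasts the Lasso's soft-thresholding update with the univariate MCP threshold of Zhang (2010), which leaves large coefficients unshrunk and thereby reduces the Lasso bias, and computes the proximal mapping of the sorted-$\ell_1$ (Slope) penalty via the pool-adjacent-violators routine of Bogdan et al. (2015).

```python
import numpy as np

def soft_threshold(z, lam):
    """Lasso (soft-thresholding) update: shrinks every coefficient by lam."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def mcp_threshold(z, lam, gamma=3.0):
    """Univariate MCP update (gamma > 1): soft-thresholding near zero,
    but coefficients with |z| > gamma*lam are left unshrunk (bias reduction)."""
    z = np.asarray(z, dtype=float)
    small = np.abs(z) <= gamma * lam
    return np.where(small, soft_threshold(z, lam) / (1.0 - 1.0 / gamma), z)

def prox_sorted_l1(y, lam):
    """Proximal mapping of the sorted-L1 (Slope) penalty sum_i lam_i * |y|_(i),
    with lam nonincreasing and nonnegative (stack-based PAVA, Bogdan et al. 2015)."""
    y = np.asarray(y, dtype=float)
    sign = np.sign(y)
    order = np.argsort(np.abs(y))[::-1]        # positions of |y| in decreasing order
    v = np.abs(y)[order] - lam                 # shift sorted magnitudes by sorted penalties
    blocks = []                                # pool adjacent violators; block = [sum, count]
    for vi in v:
        blocks.append([vi, 1])
        # merge blocks while their means violate the nonincreasing constraint
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] <= blocks[-1][0] / blocks[-1][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    x = np.concatenate([np.full(c, max(s / c, 0.0)) for s, c in blocks])
    out = np.empty_like(y)
    out[order] = x                             # undo the sort, then restore signs
    return sign * out

# Toy example: the penalty levels lam_1 >= ... >= lam_p are applied to the
# coefficients in decreasing order of magnitude.
y = np.array([3.0, -0.4, 1.2, 0.1])
lam = np.array([1.0, 0.8, 0.6, 0.4])
print(soft_threshold(y, 1.0), mcp_threshold(y, 1.0), prox_sorted_l1(y, lam))
```

A sorted concave penalty, roughly speaking, replaces the fixed sequence of levels used by the Slope prox with levels coming from a concave penalty function, so that coefficients above the adaptively selected levels are shrunk less, as with MCP.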

References

[1] Agarwal, A., Negahban, S. and Wainwright, M. J. (2012). Fast global convergence of gradient methods for high-dimensional statistical recovery. Ann. Statist. 40 2452–2482.
[2] Beck, A. and Teboulle, M. (2009). A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2 183–202.
[3] Bellec, P. C., Lecué, G. and Tsybakov, A. B. (2018). Slope meets Lasso: Improved oracle bounds and optimality. Ann. Statist. 46 3603–3642.
[4] Belloni, A. and Chernozhukov, V. (2013). Least squares after model selection in high-dimensional sparse models. Bernoulli 19 521–547.
[5] Bickel, P. J., Ritov, Y. and Tsybakov, A. B. (2009). Simultaneous analysis of lasso and Dantzig selector. Ann. Statist. 37 1705–1732.
[6] Bogdan, M., van den Berg, E., Sabatti, C., Su, W. and Candès, E. J. (2015). SLOPE—Adaptive variable selection via convex optimization. Ann. Appl. Stat. 9 1103–1140.
[7] Candes, E. and Tao, T. (2007). The Dantzig selector: Statistical estimation when $p$ is much larger than $n$. Ann. Statist. 35 2313–2351.
[8] Candes, E. J. and Tao, T. (2005). Decoding by linear programming. IEEE Trans. Inform. Theory 51 4203–4215.
[9] Dalalyan, A. and Tsybakov, A. B. (2008). Aggregation by exponential weighting, sharp PAC-Bayesian bounds and sparsity. Mach. Learn. 72 39–61.
[10] Efron, B., Hastie, T., Johnstone, I. and Tibshirani, R. (2004). Least angle regression. Ann. Statist. 32 407–499. With discussion, and a rejoinder by the authors.
[11] Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. J. Amer. Statist. Assoc. 96 1348–1360.
[12] Fan, J., Liu, H., Sun, Q. and Zhang, T. (2018). I-LAMM for sparse learning: Simultaneous control of algorithmic complexity and statistical error. Ann. Statist. 46 814–841.
[13] Feng, L. and Zhang, C.-H. (2019). Supplement to “Sorted concave penalized regression.” DOI: 10.1214/18-AOS1759SUPP.
[14] Huang, J. and Zhang, C.-H. (2012). Estimation and selection via absolute penalized convex minimization and its multistage adaptive applications. J. Mach. Learn. Res. 13 1839–1864.
[15] Loh, P.-L. and Wainwright, M. J. (2015). Regularized $M$-estimators with nonconvexity: Statistical and algorithmic theory for local optima. J. Mach. Learn. Res. 16 559–616.
[16] Meinshausen, N. and Bühlmann, P. (2006). High-dimensional graphs and variable selection with the lasso. Ann. Statist. 34 1436–1462.
[17] Negahban, S. N., Ravikumar, P., Wainwright, M. J. and Yu, B. (2012). A unified framework for high-dimensional analysis of $M$-estimators with decomposable regularizers. Statist. Sci. 27 538–557.
[18] Nesterov, Y. (2007). Gradient methods for minimizing composite functions. Math. Program. 140 125–161.
[19] Osborne, M. R., Presnell, B. and Turlach, B. A. (2000). A new approach to variable selection in least squares problems. IMA J. Numer. Anal. 20 389–403.
[20] Osborne, M. R., Presnell, B. and Turlach, B. A. (2000). On the LASSO and its dual. J. Comput. Graph. Statist. 9 319–337.
[21] Parikh, N. and Boyd, S. (2013). Proximal algorithms. Foundations and Trends in Optimization.
[22] Ročková, V. and George, E. I. (2018). The spike-and-slab LASSO. J. Amer. Statist. Assoc. 113 431–444.
[23] Rudelson, M. and Zhou, S. (2013). Reconstruction from anisotropic random measurements. IEEE Trans. Inform. Theory 59 3434–3447.
[24] Su, W. and Candès, E. (2016). SLOPE is adaptive to unknown sparsity and asymptotically minimax. Ann. Statist. 44 1038–1068.
[25] Sun, T. and Zhang, C.-H. (2012). Scaled sparse linear regression. Biometrika 99 879–898.
[26] Sun, T. and Zhang, C.-H. (2013). Sparse matrix inversion with scaled lasso. J. Mach. Learn. Res. 14 3385–3418.
[27] Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. Roy. Statist. Soc. Ser. B 58 267–288.
[28] Tropp, J. A. (2006). Just relax: Convex programming methods for identifying sparse signals in noise. IEEE Trans. Inform. Theory 52 1030–1051.
[29] van de Geer, S. A. and Bühlmann, P. (2009). On the conditions used to prove oracle results for the Lasso. Electron. J. Stat. 3 1360–1392.
[30] Wainwright, M. J. (2009). Sharp thresholds for high-dimensional and noisy sparsity recovery using $\ell_{1}$-constrained quadratic programming (Lasso). IEEE Trans. Inform. Theory 55 2183–2202.
[31] Wang, Z., Liu, H. and Zhang, T. (2014). Optimal computational and statistical rates of convergence for sparse nonconvex learning problems. Ann. Statist. 42 2164–2201.
[32] Ye, F. and Zhang, C.-H. (2010). Rate minimaxity of the Lasso and Dantzig selector for the $\ell_{q}$ loss in $\ell_{r}$ balls. J. Mach. Learn. Res. 11 3519–3540.
[33] Zhang, C.-H. (2010). Nearly unbiased variable selection under minimax concave penalty. Ann. Statist. 38 894–942.
[34] Zhang, C.-H. and Huang, J. (2008). The sparsity and bias of the LASSO selection in high-dimensional linear regression. Ann. Statist. 36 1567–1594.
[35] Zhang, C.-H. and Zhang, T. (2012). A general theory of concave regularization for high-dimensional sparse estimation problems. Statist. Sci. 27 576–593.
[36] Zhang, T. (2010). Analysis of multi-stage convex relaxation for sparse regularization. J. Mach. Learn. Res. 11 1081–1107.
[37] Zhao, P. and Yu, B. (2006). On model selection consistency of Lasso. J. Mach. Learn. Res. 7 2541–2563.
[38] Zou, H. and Li, R. (2008). One-step sparse estimates in nonconcave penalized likelihood models. Ann. Statist. 36 1509–1533.
Copyright © 2019 Institute of Mathematical Statistics
Long Feng and Cun-Hui Zhang "Sorted concave penalized regression," The Annals of Statistics 47(6), 3069-3098, (December 2019). https://doi.org/10.1214/18-AOS1759
Received: 1 November 2017; Published: December 2019