## Statistical Science

### A General Theory of Concave Regularization for High-Dimensional Sparse Estimation Problems

#### Abstract

Concave regularization methods provide natural procedures for sparse recovery. However, they are difficult to analyze in the high-dimensional setting. Only recently have a few sparse recovery results been established for certain local solutions obtained via specialized numerical procedures. Still, the fundamental relationships among these solutions, such as whether they coincide and how they relate to the global minimizer of the underlying nonconvex formulation, remain unknown. The current paper fills this conceptual gap by presenting a general theoretical framework showing that, under appropriate conditions, the global solution of nonconvex regularization achieves desirable recovery performance; moreover, under suitable conditions, the global solution corresponds to the unique sparse local solution, which can be obtained via different numerical procedures. Under this unified framework, we present an overview of existing results and discuss their connections. The unified view of this work leads to a more satisfactory treatment of concave high-dimensional sparse estimation procedures, and serves as a guideline for developing further numerical procedures for concave regularization.
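To make the setting concrete, the following is a minimal sketch (not the authors' procedure) of one standard numerical approach to concave regularization: cyclic coordinate descent for least squares under the minimax concave penalty (MCP) of [42], in the spirit of the algorithms studied in [5] and [25]. It assumes columns standardized so that $x_j'x_j/n = 1$ and $\gamma > 1$; the function names and tuning values are illustrative. Like any local method for a nonconvex objective, it returns a local solution, which is exactly the kind of output whose relation to the global minimizer the paper analyzes.

```python
import numpy as np

def soft_threshold(z, lam):
    """Soft-thresholding operator: sign(z) * max(|z| - lam, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def mcp_threshold(z, lam, gamma):
    """Univariate minimizer of 0.5*(b - z)^2 + MCP(b; lam, gamma), gamma > 1.

    The MCP penalty is quadratically tapered on [0, gamma*lam] and flat
    beyond, so large coefficients are left unshrunk (near-unbiasedness).
    """
    if np.abs(z) <= gamma * lam:
        return soft_threshold(z, lam) / (1.0 - 1.0 / gamma)
    return z  # penalty is flat here: no shrinkage

def mcp_coordinate_descent(X, y, lam, gamma=3.0, n_iter=200):
    """Cyclic coordinate descent for MCP-penalized least squares.

    Assumes each column of X is standardized so that x_j' x_j / n = 1.
    Returns a (local) stationary point of the nonconvex objective.
    """
    n, p = X.shape
    beta = np.zeros(p)
    r = y - X @ beta  # running residual
    for _ in range(n_iter):
        for j in range(p):
            zj = X[:, j] @ r / n + beta[j]   # partial residual correlation
            bj = mcp_threshold(zj, lam, gamma)
            r += X[:, j] * (beta[j] - bj)    # update residual incrementally
            beta[j] = bj
    return beta
```

On a well-conditioned sparse problem this sketch typically zeroes out the irrelevant coordinates while leaving the large true coefficients essentially unshrunk, in contrast to the uniform bias of the Lasso's soft thresholding.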

#### Article information

Source
Statist. Sci., Volume 27, Number 4 (2012), 576–593.

Dates
First available in Project Euclid: 21 December 2012

https://projecteuclid.org/euclid.ss/1356098557

Digital Object Identifier
doi:10.1214/12-STS399

Mathematical Reviews number (MathSciNet)
MR3025135

Zentralblatt MATH identifier
1331.62353

#### Citation

Zhang, Cun-Hui; Zhang, Tong. A General Theory of Concave Regularization for High-Dimensional Sparse Estimation Problems. Statist. Sci. 27 (2012), no. 4, 576--593. doi:10.1214/12-STS399. https://projecteuclid.org/euclid.ss/1356098557

#### References

• [1] Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In Second International Symposium on Information Theory (Tsahkadsor, 1971) 267–281. Akadémiai Kiadó, Budapest.
• [2] Antoniadis, A. (2010). Comments on: $\ell_{1}$-penalization for mixture regression models. TEST 19 257–258.
• [3] Belloni, A., Chernozhukov, V. and Wang, L. (2011). Square-root lasso: Pivotal recovery of sparse signals via conic programming. Biometrika 98 791–806.
• [4] Bickel, P. J., Ritov, Y. and Tsybakov, A. B. (2009). Simultaneous analysis of lasso and Dantzig selector. Ann. Statist. 37 1705–1732.
• [5] Breheny, P. and Huang, J. (2011). Coordinate descent algorithms for nonconvex penalized regression, with applications to biological feature selection. Ann. Appl. Stat. 5 232–253.
• [6] Bühlmann, P. and van de Geer, S. (2011). Statistics for High-Dimensional Data: Methods, Theory and Applications. Springer, Heidelberg.
• [7] Bunea, F., Tsybakov, A. and Wegkamp, M. (2007). Sparsity oracle inequalities for the Lasso. Electron. J. Stat. 1 169–194.
• [8] Cai, T. T., Wang, L. and Xu, G. (2010). Shifting inequality and recovery of sparse signals. IEEE Trans. Signal Process. 58 1300–1308.
• [9] Candès, E. and Tao, T. (2007). The Dantzig selector: Statistical estimation when $p$ is much larger than $n$. Ann. Statist. 35 2313–2351.
• [10] Candès, E. J. and Plan, Y. (2009). Near-ideal model selection by $\ell_{1}$ minimization. Ann. Statist. 37 2145–2177.
• [11] Candès, E. J. and Tao, T. (2005). Decoding by linear programming. IEEE Trans. Inform. Theory 51 4203–4215.
• [12] Chen, S. S., Donoho, D. L. and Saunders, M. A. (2001). Atomic decomposition by basis pursuit. SIAM Rev. 43 129–159.
• [13] Efron, B., Hastie, T., Johnstone, I. and Tibshirani, R. (2004). Least angle regression. Ann. Statist. 32 407–499.
• [14] Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. J. Amer. Statist. Assoc. 96 1348–1360.
• [15] Fan, J. and Peng, H. (2004). Nonconcave penalized likelihood with a diverging number of parameters. Ann. Statist. 32 928–961.
• [16] Frank, I. E. and Friedman, J. H. (1993). A statistical view of some chemometrics regression tools (with discussion). Technometrics 35 109–148.
• [17] Greenshtein, E. and Ritov, Y. (2004). Persistence in high-dimensional linear predictor selection and the virtue of overparametrization. Bernoulli 10 971–988.
• [18] Huang, J., Ma, S. and Zhang, C.-H. (2008). Adaptive Lasso for sparse high-dimensional regression models. Statist. Sinica 18 1603–1618.
• [19] Hunter, D. R. and Li, R. (2005). Variable selection using MM algorithms. Ann. Statist. 33 1617–1642.
• [20] Kim, Y., Choi, H. and Oh, H.-S. (2008). Smoothly clipped absolute deviation on high dimensions. J. Amer. Statist. Assoc. 103 1665–1673.
• [21] Knight, K. and Fu, W. (2000). Asymptotics for lasso-type estimators. Ann. Statist. 28 1356–1378.
• [22] Koltchinskii, V. (2009). The Dantzig selector and sparsity oracle inequalities. Bernoulli 15 799–828.
• [23] Liu, J., Wonka, P. and Ye, J. (2010). Multi-stage Dantzig selector. In Advances in Neural Information Processing Systems (J. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R. S. Zemel and A. Culotta, eds.) 23 1450–1458.
• [24] Mallows, C. L. (1973). Some comments on $C_{p}$. Technometrics 15 661–675.
• [25] Mazumder, R., Friedman, J. and Hastie, T. (2011). SparseNet: Coordinate descent with non-convex penalties. J. Amer. Statist. Assoc. 106 1125–1138.
• [26] Meinshausen, N. and Bühlmann, P. (2006). High-dimensional graphs and variable selection with the lasso. Ann. Statist. 34 1436–1462.
• [27] Meinshausen, N. and Yu, B. (2009). Lasso-type recovery of sparse representations for high-dimensional data. Ann. Statist. 37 246–270.
• [28] Osborne, M. R., Presnell, B. and Turlach, B. A. (2000). A new approach to variable selection in least squares problems. IMA J. Numer. Anal. 20 389–403.
• [29] Osborne, M. R., Presnell, B. and Turlach, B. A. (2000). On the LASSO and its dual. J. Comput. Graph. Statist. 9 319–337.
• [30] Raskutti, G., Wainwright, M. J. and Yu, B. (2011). Minimax rates of estimation for high-dimensional linear regression over $\ell_{q}$-balls. IEEE Trans. Inform. Theory 57 6976–6994.
• [31] Schwarz, G. (1978). Estimating the dimension of a model. Ann. Statist. 6 461–464.
• [32] Städler, N., Bühlmann, P. and van de Geer, S. (2010). $\ell_{1}$-penalization for mixture regression models. TEST 19 209–256.
• [33] Sun, T. and Zhang, C.-H. (2010). Comments on: $\ell_{1}$-penalization for mixture regression models. TEST 19 270–275.
• [34] Sun, T. and Zhang, C.-H. (2011). Scaled sparse linear regression. Technical report. Available at arXiv:1104.4595.
• [35] Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. Roy. Statist. Soc. Ser. B 58 267–288.
• [36] Tropp, J. A. (2006). Just relax: Convex programming methods for identifying sparse signals in noise. IEEE Trans. Inform. Theory 52 1030–1051.
• [37] van de Geer, S. (2007). The deterministic lasso. Technical Report 140. ETH Zurich, Switzerland.
• [38] van de Geer, S. A. (2008). High-dimensional generalized linear models and the lasso. Ann. Statist. 36 614–645.
• [39] van de Geer, S. A. and Bühlmann, P. (2009). On the conditions used to prove oracle results for the Lasso. Electron. J. Stat. 3 1360–1392.
• [40] Wainwright, M. J. (2009). Sharp thresholds for high-dimensional and noisy sparsity recovery using $\ell_{1}$-constrained quadratic programming (Lasso). IEEE Trans. Inform. Theory 55 2183–2202.
• [41] Ye, F. and Zhang, C.-H. (2010). Rate minimaxity of the Lasso and Dantzig selector for the $\ell_{q}$ loss in $\ell_{r}$ balls. J. Mach. Learn. Res. 11 3519–3540.
• [42] Zhang, C.-H. (2010). Nearly unbiased variable selection under minimax concave penalty. Ann. Statist. 38 894–942.
• [43] Zhang, C.-H. and Huang, J. (2008). The sparsity and bias of the Lasso selection in high-dimensional linear regression. Ann. Statist. 36 1567–1594.
• [44] Zhang, C.-H. and Zhang, T. (2012). Supplement to “A general theory of concave regularization for high-dimensional sparse estimation problems.” DOI:10.1214/12-STS399SUPP.
• [45] Zhang, T. (2009). Some sharp performance bounds for least squares regression with $L_{1}$ regularization. Ann. Statist. 37 2109–2144.
• [46] Zhang, T. (2010). Analysis of multi-stage convex relaxation for sparse regularization. J. Mach. Learn. Res. 11 1081–1107.
• [47] Zhang, T. (2011). Adaptive forward-backward greedy algorithm for learning sparse representations. IEEE Trans. Inform. Theory 57 4689–4708.
• [48] Zhang, T. (2011). Multi-stage convex relaxation for feature selection. Technical report. Available at arXiv:1106.0565.
• [49] Zhao, P. and Yu, B. (2006). On model selection consistency of Lasso. J. Mach. Learn. Res. 7 2541–2563.
• [50] Zou, H. (2006). The adaptive Lasso and its oracle properties. J. Amer. Statist. Assoc. 101 1418–1429.
• [51] Zou, H. and Li, R. (2008). One-step sparse estimates in nonconcave penalized likelihood models. Ann. Statist. 36 1509–1533.

#### Supplemental materials

• Supplementary material for “A general theory of concave regularization for high-dimensional sparse estimation problems”. Due to space considerations, all proofs in this paper are given in the supplementary document [44].