The Annals of Statistics

The benefit of group sparsity

Junzhou Huang and Tong Zhang


This paper develops a theory for group Lasso using a concept called strong group sparsity. Our result shows that group Lasso is superior to standard Lasso for strongly group-sparse signals. This provides a convincing theoretical justification for using group sparse regularization when the underlying group structure is consistent with the data. Moreover, the theory predicts some limitations of the group Lasso formulation that are confirmed by simulation studies.
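To make the contrast between the two penalties concrete, here is a minimal sketch, not taken from the paper, of the shrinkage step each one induces. It assumes the common unweighted group penalty, lam times the sum of the Euclidean norms of the coefficient groups (some formulations also scale each group by the square root of its size). The function names, the example vector and the grouping are illustrative assumptions.

```python
import math


def soft_threshold(beta, lam):
    """Lasso shrinkage: shrink each coefficient toward zero independently."""
    return [math.copysign(max(abs(b) - lam, 0.0), b) for b in beta]


def group_soft_threshold(beta, groups, lam):
    """Group Lasso shrinkage (block soft-thresholding).

    Assumes the unweighted penalty lam * sum_g ||beta_g||_2; `groups` is a
    list of index lists that partition the coefficients (hypothetical layout).
    """
    out = [0.0] * len(beta)
    for idx in groups:
        norm = math.sqrt(sum(beta[j] ** 2 for j in idx))
        if norm > lam:
            # The whole group survives and is scaled by a common factor.
            scale = 1.0 - lam / norm
            for j in idx:
                out[j] = scale * beta[j]
        # Otherwise the whole group is set to zero together.
    return out


# Illustrative input: one strong group and one weak group.
beta = [3.0, 4.0, 0.1, 0.1]
groups = [[0, 1], [2, 3]]
shrunk_by_group = group_soft_threshold(beta, groups, lam=1.0)
shrunk_by_coordinate = soft_threshold(beta, lam=1.0)
```

In this sketch the group operator keeps or discards the coefficients of a group together, whereas the coordinate-wise operator decides for each coefficient separately. That joint treatment is what a group-sparse regularizer exploits when the assumed groups match the signal.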

Article information

Ann. Statist., Volume 38, Number 4 (2010), 1978-2004.

First available in Project Euclid: 11 July 2010

Primary: 62G05: Estimation
Secondary: 62J05: Linear regression

Keywords: L_1 regularization; Lasso; group Lasso; regression; sparsity; group sparsity; variable selection; parameter estimation


Huang, Junzhou; Zhang, Tong. The benefit of group sparsity. Ann. Statist. 38 (2010), no. 4, 1978-2004. doi:10.1214/09-AOS778.

