Electronic Journal of Statistics

Theoretical properties of the overlapping groups lasso

Daniel Percival

Abstract

We present two sets of theoretical results on the grouped lasso with overlap of Jacob, Obozinski and Vert (2009) in the linear regression setting. This method jointly selects predictors in sparse regression, allowing for complex structured sparsity over the predictors, encoded as a set of groups. This flexible framework suggests that arbitrarily complex structures can be encoded with an intricate set of groups. Our results show that this strategy has unexpected theoretical consequences for the procedure. In particular, we give two sets of results: (1) finite sample bounds on prediction and estimation, and (2) asymptotic distribution and selection. Both sets of results demonstrate negative consequences of choosing an increasingly complex set of groups, as well as of choosing a set of groups that cannot recover the true sparsity pattern. These results also demonstrate the differences and similarities between the grouped lasso procedure with and without overlapping groups. Our analysis shows that while the procedure enjoys advantages over the standard lasso, the set of groups must be chosen with caution: an overly complex set of groups will damage the analysis.
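
For readers unfamiliar with the estimator, the following is a brief sketch of the formulation in the notation of Jacob, Obozinski and Vert (2009); the display is our paraphrase rather than a quotation from the paper. Given a design matrix X in R^(n x p), a response y in R^n, and a collection G of possibly overlapping groups of predictor indices, the overlapping group lasso solves

    \hat{\beta} = \arg\min_{\beta \in \mathbb{R}^p} \frac{1}{2n} \lVert y - X\beta \rVert_2^2 + \lambda \, \Omega_{\mathcal{G}}(\beta),
    \qquad
    \Omega_{\mathcal{G}}(\beta) = \min \Big\{ \sum_{g \in \mathcal{G}} \lVert v^{(g)} \rVert_2 \, : \, \operatorname{supp}(v^{(g)}) \subseteq g, \ \sum_{g \in \mathcal{G}} v^{(g)} = \beta \Big\},

so that the selected support is a union of groups in G, rather than the complement of such a union as in the non-overlapping group lasso of Yuan and Lin (2006).

Jacob, Obozinski and Vert (2009) note that this penalty reduces to an ordinary group lasso after duplicating each predictor once for every group containing it. The Python sketch below illustrates that reduction via proximal gradient descent; it is a minimal illustration, not the computational method of the paper, and the function name and arguments are our own.

    import numpy as np

    def overlap_group_lasso(X, y, groups, lam, n_iter=2000):
        """Latent (overlapping) group lasso via covariate duplication:
        copy each column of X once per group containing it, then solve an
        ordinary group lasso on the expanded design by proximal gradient
        descent. groups is a list of (possibly overlapping) index lists."""
        n, p = X.shape
        Xe = X[:, np.concatenate(groups)]              # one column block per group
        starts = np.cumsum([0] + [len(g) for g in groups])
        v = np.zeros(Xe.shape[1])                      # latent coefficients
        step = n / np.linalg.norm(Xe, 2) ** 2          # 1 / Lipschitz constant of the gradient
        for _ in range(n_iter):
            z = v - step * Xe.T @ (Xe @ v - y) / n     # gradient step on (1/2n)||y - Xe v||^2
            for s, e in zip(starts[:-1], starts[1:]):  # blockwise group soft-thresholding
                nrm = np.linalg.norm(z[s:e])
                z[s:e] *= 0.0 if nrm == 0 else max(0.0, 1.0 - step * lam / nrm)
            v = z
        beta = np.zeros(p)                             # fold duplicates back: beta = sum_g v^(g)
        for (s, e), g in zip(zip(starts[:-1], starts[1:]), groups), :
            beta[g] += v[s:e]
        return beta

As a small usage example on simulated data, two groups that overlap in predictor 2 can yield a support equal to the second group alone:

    rng = np.random.default_rng(0)
    X = rng.standard_normal((100, 4))
    y = X @ np.array([0.0, 0.0, 1.0, -1.0]) + 0.1 * rng.standard_normal(100)
    print(overlap_group_lasso(X, y, groups=[[0, 1, 2], [2, 3]], lam=0.1))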

Article information

Source
Electron. J. Statist., Volume 6 (2012), 269–288.

Dates
First available in Project Euclid: 29 February 2012

Permanent link to this document
https://projecteuclid.org/euclid.ejs/1330524560

Digital Object Identifier
doi:10.1214/12-EJS672

Mathematical Reviews number (MathSciNet)
MR2988408

Zentralblatt MATH identifier
1334.62131

Keywords
Sparsity; variable selection; structured sparsity; regularized methods

Citation

Percival, Daniel. Theoretical properties of the overlapping groups lasso. Electron. J. Statist. 6 (2012), 269–288. doi:10.1214/12-EJS672. https://projecteuclid.org/euclid.ejs/1330524560


References

  • Bach, F. (2008a). Consistency of the Group Lasso and Multiple Kernel Learning. Journal of Machine Learning Research 9 1179–1225.
  • Bach, F. (2008b). Exploring Large Feature Spaces with Hierarchical Multiple Kernel Learning. In Advances in Neural Information Processing Systems. NIPS ’08.
  • Bach, F. (2010a). Shaping Level Sets with Submodular Functions. Technical report, arXiv:1012.1501v1.
  • Bach, F. (2010b). Structured Sparsity-Inducing Norms through Submodular Functions. In Advances in Neural Information Processing Systems. NIPS ’10.
  • Bickel, P. J., Ritov, Y. and Tsybakov, A. B. (2009). Simultaneous analysis of Lasso and Dantzig selector. Annals of Statistics 37 1705–1732.
  • Chesneau, C. and Hebiri, M. (2008). Some theoretical results on the Grouped Variables Lasso. Mathematical Methods of Statistics 17 317–326.
  • Huang, J., Zhang, T. and Metaxas, D. (2009). Learning with structured sparsity. In Proceedings of the 26th Annual International Conference on Machine Learning. ICML ’09 417–424. ACM, New York, NY, USA.
  • Huang, J. and Zhang, T. (2010). The Benefit of Group Sparsity. Annals of Statistics 38 1978–2004.
  • Jacob, L., Obozinski, G. and Vert, J.-P. (2009). Group lasso with overlap and graph lasso. In Proceedings of the 26th Annual International Conference on Machine Learning. ICML ’09 433–440. ACM, New York, NY, USA.
  • Jenatton, R., Audibert, J.-Y. and Bach, F. (2009). Structured Variable Selection with Sparsity-Inducing Norms. Technical report, arXiv:0904.3523v3.
  • Jenatton, R., Obozinski, G. and Bach, F. (2010). Structured Sparse Principal Component Analysis. In Proceedings of the International Conference on Artificial Intelligence and Statistics. AISTATS ’10.
  • Kim, S. and Xing, E. (2010). Tree-Guided Group Lasso for Multi-Task Regression with Structured Sparsity. In Proceedings of the 27th International Conference on Machine Learning. ICML ’10.
  • Knight, K. and Fu, W. (2000). Asymptotics for lasso-type estimators. The Annals of Statistics 28 1356–1378.
  • Lounici, K., Pontil, M., Tsybakov, A. B. and van de Geer, S. A. (2009). Taking Advantage of Sparsity in Multi-Task Learning. In COLT 2009.
  • Nardi, Y. and Rinaldo, A. (2008). On the asymptotic properties of the group lasso estimator for linear models. Electronic Journal of Statistics 2 605–633.
  • Peng, J., Zhu, J., Bergamaschi, A., Han, W., Noh, D.-Y., Pollack, J. R. and Wang, P. (2010). Regularized multivariate regression for identifying master predictors with application to integrative genomics study of breast cancer. Annals of Applied Statistics 4 53–77.
  • Percival, D., Roeder, K., Rosenfeld, R. and Wasserman, L. (2011). Structured, Sparse Regression With Application to HIV Drug Resistance. Annals of Applied Statistics. To appear.
  • Tibshirani, R. (1996). Regression Shrinkage and Selection Via the Lasso. Journal of the Royal Statistical Society, Series B 58 267–288.
  • van der Vaart, A. W. and Wellner, J. A. (1996). Weak Convergence and Empirical Processes: With Applications to Statistics. Springer.
  • Yuan, M. and Lin, Y. (2006). Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society, Series B 68 49–67.
  • Zhao, P., Rocha, G. and Yu, B. (2009). The composite absolute penalties family for grouped and hierarchical variable selection. Annals of Statistics 37 3468–3497.
  • Zou, H. (2006). The Adaptive Lasso and Its Oracle Properties. Journal of the American Statistical Association 101 1418–1429.