The Annals of Statistics

Support union recovery in high-dimensional multivariate regression

Guillaume Obozinski, Martin J. Wainwright, and Michael I. Jordan



In multivariate regression, a K-dimensional response vector is regressed upon a common set of p covariates, with a matrix B ∈ ℝ^(p×K) of regression coefficients. We study the behavior of the multivariate group Lasso, in which block regularization based on the ℓ1∕ℓ2 norm is used for support union recovery, that is, recovery of the set of s rows of B that are nonzero. Under high-dimensional scaling, we show that the multivariate group Lasso exhibits a threshold for recovery of the exact row pattern, with high probability over the random design and noise, that is specified by the sample complexity parameter θ(n, p, s) := n ∕ [2ψ(B) log(p − s)]. Here n is the sample size, and ψ(B) is a sparsity-overlap function measuring a combination of the sparsities and overlaps of the K regression coefficient vectors that constitute the model. We prove that the multivariate group Lasso succeeds for problem sequences (n, p, s) such that θ(n, p, s) exceeds a critical level θ_u, and fails for sequences such that θ(n, p, s) lies below a critical level θ_ℓ. For the special case of the standard Gaussian ensemble, we show that θ_ℓ = θ_u, so that the characterization is sharp. The sparsity-overlap function ψ(B) reveals that, if the design is uncorrelated on the active rows, ℓ1∕ℓ2 regularization for multivariate regression never harms performance relative to an ordinary Lasso approach and can yield substantial improvements in sample complexity (up to a factor of K) when the coefficient vectors are suitably orthogonal. For more general designs, it is possible for the ordinary Lasso to outperform the multivariate group Lasso. We complement our analysis with simulations that demonstrate the sharpness of our theoretical results, even for relatively small problems.
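As a concrete illustration of the estimator being analyzed (this sketch is not the authors' implementation, and the function name and parameter choices are assumptions), the multivariate group Lasso minimizes a least-squares loss plus λ times the ℓ1∕ℓ2 block norm — the sum of the Euclidean norms of the rows of B — which can be solved by proximal gradient descent with row-wise soft-thresholding:

```python
import numpy as np

def group_lasso_multivariate(X, Y, lam, n_iter=500, tol=1e-8):
    """Proximal-gradient sketch of the multivariate group Lasso:
        min_B  (1/(2n)) ||Y - X B||_F^2 + lam * sum_i ||B_i||_2,
    where the l1/l2 penalty sums the Euclidean norms of the rows of B,
    so entire rows (covariates) are selected or zeroed out jointly.
    """
    n, p = X.shape
    K = Y.shape[1]
    B = np.zeros((p, K))
    # Step size 1/L, with L the Lipschitz constant of the smooth part's gradient.
    L = np.linalg.norm(X, 2) ** 2 / n
    step = 1.0 / L
    for _ in range(n_iter):
        grad = X.T @ (X @ B - Y) / n
        Z = B - step * grad
        # Row-wise soft-thresholding: the proximal operator of the l1/l2 norm.
        row_norms = np.linalg.norm(Z, axis=1, keepdims=True)
        shrink = np.maximum(1.0 - step * lam / np.maximum(row_norms, 1e-12), 0.0)
        B_new = shrink * Z
        if np.linalg.norm(B_new - B) < tol:
            return B_new
        B = B_new
    return B
```

Because the penalty couples the K regression problems through the row norms, a row is either retained across all K responses or zeroed out entirely, which is what makes support union recovery the natural success criterion.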

Article information

Ann. Statist., Volume 39, Number 1 (2011), 1-47.

First available in Project Euclid: 3 December 2010


Primary: 62J07: Ridge regression; shrinkage estimators
Secondary: 62F07: Ranking and selection

Keywords: Lasso; block-norm; second-order cone program; sparsity; variable selection; multivariate regression; high-dimensional scaling; simultaneous Lasso; group Lasso


Obozinski, Guillaume; Wainwright, Martin J.; Jordan, Michael I. Support union recovery in high-dimensional multivariate regression. Ann. Statist. 39 (2011), no. 1, 1--47. doi:10.1214/09-AOS776.


