The Annals of Statistics

Oracle inequalities and optimal inference under group sparsity

Karim Lounici, Massimiliano Pontil, Sara van de Geer, and Alexandre B. Tsybakov
Source: Ann. Statist. Volume 39, Number 4 (2011), 2164-2204.

Abstract

We consider the problem of estimating a sparse linear regression vector β* under a Gaussian noise model, for the purpose of both prediction and model selection. We assume that prior knowledge is available on the sparsity pattern, namely that the set of variables is partitioned into prescribed groups, only a few of which are relevant in the estimation process. This group sparsity assumption suggests considering the Group Lasso method as a means to estimate β*. We establish oracle inequalities for the prediction and ℓ2 estimation errors of this estimator. These bounds hold under a restricted eigenvalue condition on the design matrix. Under a stronger condition, we derive bounds for the estimation error for mixed (2, p)-norms with 1 ≤ p ≤ ∞. When p = ∞, this result implies that a thresholded version of the Group Lasso estimator selects the sparsity pattern of β* with high probability. Next, we prove that the rate of convergence of our upper bounds is optimal in a minimax sense, up to a logarithmic factor, for all estimators over a class of group sparse vectors. Furthermore, we establish lower bounds for the prediction and ℓ2 estimation errors of the usual Lasso estimator. Using this result, we demonstrate that the Group Lasso can achieve an improvement in the prediction and estimation errors as compared to the Lasso.

An important application of our results is provided by the problem of estimating multiple regression equations simultaneously, also known as multi-task learning. In this case, we obtain refinements of the results in [In Proc. of the 22nd Annual Conference on Learning Theory (COLT) (2009)], which allow us to establish a quantitative advantage of the Group Lasso over the usual Lasso in the multi-task setting. Finally, within the same setting, we show how our results can be extended to more general noise distributions, of which we only require the fourth moment to be finite. To obtain this extension, we establish a new maximal moment inequality, which may be of independent interest.
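As a rough illustration of the Group Lasso and its thresholded version mentioned in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes a common formulation of the objective, (1/(2n))‖y − Xβ‖² + λ Σ_j ‖β_{G_j}‖₂, solved by proximal gradient descent with block soft-thresholding. The function names, the step size, the penalty level `lam`, the threshold `tau`, and the simulated data are all hypothetical choices made for this example.

```python
import numpy as np


def block_soft_threshold(v, t):
    """Shrink the Euclidean norm of the block v by t (zero if norm <= t)."""
    norm = np.linalg.norm(v)
    if norm <= t:
        return np.zeros_like(v)
    return (1.0 - t / norm) * v


def group_lasso(X, y, groups, lam, n_iter=500):
    """Proximal gradient sketch for the assumed objective
    (1/(2n)) * ||y - X b||^2 + lam * sum_j ||b[G_j]||_2.
    `groups` is a list of index arrays partitioning the columns of X.
    """
    n, p = X.shape
    # Step size 1/L, where L is the largest eigenvalue of X^T X / n.
    step = n / np.linalg.norm(X, 2) ** 2
    beta = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y) / n
        z = beta - step * grad
        for idx in groups:
            beta[idx] = block_soft_threshold(z[idx], step * lam)
    return beta


def select_groups(beta, groups, tau):
    """Thresholded selection: keep groups whose l2 norm exceeds tau."""
    return [j for j, idx in enumerate(groups) if np.linalg.norm(beta[idx]) > tau]


# Simulated example (all values are assumptions): 4 groups of 3 variables,
# of which only groups 0 and 2 carry signal.
rng = np.random.default_rng(0)
groups = [np.arange(3 * j, 3 * j + 3) for j in range(4)]
X = rng.standard_normal((200, 12))
beta_star = np.zeros(12)
beta_star[groups[0]] = 2.0
beta_star[groups[2]] = -2.0
y = X @ beta_star + 0.1 * rng.standard_normal(200)

beta_hat = group_lasso(X, y, groups, lam=0.1)
print(select_groups(beta_hat, groups, tau=0.5))
```

Thresholding the per-group ℓ2 norms corresponds to the p = ∞ case of the mixed (2, p)-norm: a bound on the largest group-wise error is what makes a single cutoff `tau` enough to separate relevant from irrelevant groups.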

Primary Subjects: 62J05
Secondary Subjects: 62C20, 62F07
Links and Identifiers

Permanent link to this document: http://projecteuclid.org/euclid.aos/1319595462
Digital Object Identifier: doi:10.1214/11-AOS896
Zentralblatt MATH identifier: 05987688
Mathematical Reviews number (MathSciNet): MR2893865

References

[1] Aaker, D. A., Day, G. S. and Kumar, V. (1995). Marketing Research. Wiley.
[2] Argyriou, A., Evgeniou, T. and Pontil, M. (2008). Convex multi-task feature learning. Machine Learning 73 243–272.
[3] Bach, F. R. (2008). Consistency of the group lasso and multiple kernel learning. J. Mach. Learn. Res. 9 1179–1225.
Mathematical Reviews (MathSciNet): MR2417268
[4] Bickel, P. J., Ritov, Y. and Tsybakov, A. B. (2009). Simultaneous analysis of lasso and Dantzig selector. Ann. Statist. 37 1705–1732.
Mathematical Reviews (MathSciNet): MR2533469
Zentralblatt MATH: 1173.62022
Digital Object Identifier: doi:10.1214/08-AOS620
Project Euclid: euclid.aos/1245332830
[5] Borwein, J. M. and Lewis, A. S. (2006). Convex Analysis and Nonlinear Optimization: Theory and Examples, 2nd ed. CMS Books in Mathematics/Ouvrages de Mathématiques de la SMC 3. Springer, New York.
Mathematical Reviews (MathSciNet): MR2184742
[6] Bunea, F., Tsybakov, A. and Wegkamp, M. (2007). Sparsity oracle inequalities for the Lasso. Electron. J. Stat. 1 169–194 (electronic).
Mathematical Reviews (MathSciNet): MR2312149
Zentralblatt MATH: 1146.62028
Digital Object Identifier: doi:10.1214/07-EJS008
Project Euclid: euclid.ejs/1179759718
[7] Bunea, F., Tsybakov, A. B. and Wegkamp, M. H. (2007). Aggregation for Gaussian regression. Ann. Statist. 35 1674–1697.
Mathematical Reviews (MathSciNet): MR2351101
Zentralblatt MATH: 1209.62065
Digital Object Identifier: doi:10.1214/009053606000001587
Project Euclid: euclid.aos/1188405626
[8] Candes, E. and Tao, T. (2007). The Dantzig selector: Statistical estimation when p is much larger than n. Ann. Statist. 35 2313–2351.
Mathematical Reviews (MathSciNet): MR2382644
Zentralblatt MATH: 1139.62019
Digital Object Identifier: doi:10.1214/009053606000001523
Project Euclid: euclid.aos/1201012958
[9] Cavalier, L., Golubev, G. K., Picard, D. and Tsybakov, A. B. (2002). Oracle inequalities for inverse problems. Ann. Statist. 30 843–874.
Mathematical Reviews (MathSciNet): MR1922543
Zentralblatt MATH: 1029.62032
Digital Object Identifier: doi:10.1214/aos/1028674843
Project Euclid: euclid.aos/1028674843
[10] Chesneau, C. and Hebiri, M. (2008). Some theoretical results on the grouped variables Lasso. Math. Methods Statist. 17 317–326.
Mathematical Reviews (MathSciNet): MR2483460
Zentralblatt MATH: 05614402
Digital Object Identifier: doi:10.3103/S1066530708040030
[11] Diggle, P. J., Heagerty, P. J., Liang, K.-Y. and Zeger, S. L. (2002). Analysis of Longitudinal Data, 2nd ed. Oxford Statistical Science Series 25. Oxford Univ. Press, Oxford.
Mathematical Reviews (MathSciNet): MR2049007
Zentralblatt MATH: 1031.62002
[12] Donoho, D. L., Elad, M. and Temlyakov, V. N. (2006). Stable recovery of sparse overcomplete representations in the presence of noise. IEEE Trans. Inform. Theory 52 6–18.
Mathematical Reviews (MathSciNet): MR2237332
Digital Object Identifier: doi:10.1109/TIT.2005.860430
[13] Dümbgen, L., van de Geer, S. A., Veraar, M. C. and Wellner, J. A. (2010). Nemirovski’s inequalities revisited. Amer. Math. Monthly 117 138–160.
Mathematical Reviews (MathSciNet): MR2590193
Zentralblatt MATH: 1213.60039
Digital Object Identifier: doi:10.4169/000298910X476059
[14] Evgeniou, T., Pontil, M. and Toubia, O. (2007). A convex optimization approach to modeling consumer heterogeneity in conjoint estimation. Marketing Science 26 805–818.
[15] Hsiao, C. (2003). Analysis of Panel Data, 2nd ed. Econometric Society Monographs 34. Cambridge Univ. Press, Cambridge.
Mathematical Reviews (MathSciNet): MR1962511
Zentralblatt MATH: 0608.62145
[16] Huang, J., Horowitz, J. L. and Wei, F. (2010). Variable selection in nonparametric additive models. Ann. Statist. 38 2282–2313.
Mathematical Reviews (MathSciNet): MR2676890
Zentralblatt MATH: 1202.62051
Digital Object Identifier: doi:10.1214/09-AOS781
Project Euclid: euclid.aos/1278861249
[17] Huang, J. and Zhang, T. (2010). The benefit of group sparsity. Ann. Statist. 38 1978–2004.
Mathematical Reviews (MathSciNet): MR2676881
Zentralblatt MATH: 1202.62052
Digital Object Identifier: doi:10.1214/09-AOS778
Project Euclid: euclid.aos/1278861240
[18] Koltchinskii, V. (2011). Oracle Inequalities in Empirical Risk Minimization and Sparse Recovery Problems. Lecture Notes in Math. 2033. Springer, Berlin.
[19] Koltchinskii, V. and Yuan, M. (2008). Sparse recovery in large ensembles of kernel machines. In 21st Annual Conference on Learning Theory—COLT 2008, Helsinki, Finland, July 9-12, 2008 (R. A. Servedio and T. Zhang, eds.) 229–238. Omnipress.
[20] Lenk, P. J., DeSarbo, W. S., Green, P. E. and Young, M. R. (1996). Hierarchical Bayes conjoint analysis: Recovery of partworth heterogeneity from reduced experimental designs. Marketing Science 15 173–191.
[21] Lounici, K. (2008). Sup-norm convergence rate and sign concentration property of Lasso and Dantzig estimators. Electron. J. Stat. 2 90–102.
Mathematical Reviews (MathSciNet): MR2386087
Zentralblatt MATH: 05274636
Digital Object Identifier: doi:10.1214/08-EJS177
Project Euclid: euclid.ejs/1202844625
[22] Lounici, K., Pontil, M., Tsybakov, A. B. and van de Geer, S. A. (2009). Taking advantage of sparsity in multi-task learning. In Proc. of the 22nd Annual Conference on Learning Theory (COLT 2009) 73–82. Omnipress.
[23] Maurer, A. (2006). Bounds for linear multi-task learning. J. Mach. Learn. Res. 7 117–139.
Mathematical Reviews (MathSciNet): MR2274364
Zentralblatt MATH: 1222.68260
[24] Meier, L., van de Geer, S. and Bühlmann, P. (2008). The group Lasso for logistic regression. J. R. Stat. Soc. Ser. B Stat. Methodol. 70 53–71.
Mathematical Reviews (MathSciNet): MR2412631
Zentralblatt MATH: 05563343
Digital Object Identifier: doi:10.1111/j.1467-9868.2007.00627.x
[25] Meier, L., van de Geer, S. and Bühlmann, P. (2009). High-dimensional additive modeling. Ann. Statist. 37 3779–3821.
Mathematical Reviews (MathSciNet): MR2572443
Zentralblatt MATH: 05644256
Digital Object Identifier: doi:10.1214/09-AOS692
Project Euclid: euclid.aos/1256303527
[26] Nardi, Y. and Rinaldo, A. (2008). On the asymptotic properties of the group lasso estimator for linear models. Electron. J. Stat. 2 605–633.
Mathematical Reviews (MathSciNet): MR2426104
Digital Object Identifier: doi:10.1214/08-EJS200
Project Euclid: euclid.ejs/1217450797
[27] Nemirovski, A. (2000). Topics in non-parametric statistics. In Lectures on Probability Theory and Statistics (Saint-Flour, 1998). Lecture Notes in Math. 1738 85–277. Springer, Berlin.
Mathematical Reviews (MathSciNet): MR1775640
Zentralblatt MATH: 0998.62033
[28] Obozinski, G., Wainwright, M. J. and Jordan, M. I. (2011). Support union recovery in high-dimensional multivariate regression. Ann. Statist. 39 1–47.
Mathematical Reviews (MathSciNet): MR2797839
Zentralblatt MATH: 05874488
Digital Object Identifier: doi:10.1214/09-AOS776
Project Euclid: euclid.aos/1291388368
[29] Petrov, V. V. (1995). Limit Theorems of Probability Theory: Sequences of Independent Random Variables. Oxford Studies in Probability 4. The Clarendon Press, New York.
Mathematical Reviews (MathSciNet): MR1353441
[30] Raskutti, G., Wainwright, M. J. and Yu, B. (2009). Minimax rates of estimation for high-dimensional linear regression over ℓq-balls. Available at arXiv:0910.2042.
Mathematical Reviews (MathSciNet): MR2729873
Digital Object Identifier: doi:10.1109/TIT.2009.2016018
[31] Ravikumar, P., Liu, H., Lafferty, J. and Wasserman, L. (2008). Spam: Sparse additive models. In Advances in Neural Information Processing Systems (NIPS) (J. C. Platt, D. Koller, Y. Singer and S. Roweis, eds.) 22 1201–1208. MIT Press, Cambridge, MA.
Mathematical Reviews (MathSciNet): MR2750255
Digital Object Identifier: doi:10.1111/j.1467-9868.2009.00718.x
[32] Rigollet, P. and Tsybakov, A. (2010). Exponential Screening and optimal rates of sparse estimation. Available at arXiv:1003.2654.
Mathematical Reviews (MathSciNet): MR2816337
Zentralblatt MATH: 1215.62043
Digital Object Identifier: doi:10.1214/10-AOS854
Project Euclid: euclid.aos/1299680953
[33] Rio, E. (2009). Moment inequalities for sums of dependent random variables under projective conditions. J. Theoret. Probab. 22 146–163.
Mathematical Reviews (MathSciNet): MR2472010
Zentralblatt MATH: 1160.60312
Digital Object Identifier: doi:10.1007/s10959-008-0155-9
[34] Srivastava, V. K. and Giles, D. E. A. (1987). Seemingly Unrelated Regression Equations Models: Estimation and Inference. Statistics: Textbooks and Monographs 80. Dekker, New York.
Mathematical Reviews (MathSciNet): MR930104
Zentralblatt MATH: 0638.62108
[35] Tsybakov, A. B. (2009). Introduction to Nonparametric Estimation. Springer, New York. Revised and extended from the 2004 French original, Translated by Vladimir Zaiats.
Mathematical Reviews (MathSciNet): MR2724359
[36] van de Geer, S. A. (2008). High-dimensional generalized linear models and the lasso. Ann. Statist. 36 614–645.
Mathematical Reviews (MathSciNet): MR2396809
Zentralblatt MATH: 1138.62323
Digital Object Identifier: doi:10.1214/009053607000000929
Project Euclid: euclid.aos/1205420513
[37] van der Vaart, A. W. and Wellner, J. A. (1996). Weak Convergence and Empirical Processes. Springer, New York.
Mathematical Reviews (MathSciNet): MR1385671
Zentralblatt MATH: 0862.60002
[38] Wooldridge, J. M. (2002). Econometric Analysis of Cross Section and Panel Data. MIT Press, Cambridge, MA.
Mathematical Reviews (MathSciNet): MR2768559
[39] Yuan, M. and Lin, Y. (2006). Model selection and estimation in regression with grouped variables. J. R. Stat. Soc. Ser. B Stat. Methodol. 68 49–67.
Mathematical Reviews (MathSciNet): MR2212574
Zentralblatt MATH: 1141.62030
Digital Object Identifier: doi:10.1111/j.1467-9868.2005.00532.x
[40] Zellner, A. (1962). An efficient method of estimating seemingly unrelated regressions and tests for aggregation bias. J. Amer. Statist. Assoc. 57 348–368.
Mathematical Reviews (MathSciNet): MR139235
Zentralblatt MATH: 0113.34902
Digital Object Identifier: doi:10.2307/2281644
[41] Zhang, C.-H. and Huang, J. (2008). The sparsity and bias of the LASSO selection in high-dimensional linear regression. Ann. Statist. 36 1567–1594.
Mathematical Reviews (MathSciNet): MR2435448
Zentralblatt MATH: 1142.62044
Digital Object Identifier: doi:10.1214/07-AOS520
Project Euclid: euclid.aos/1216237292
[42] Zhao, P., Rocha, G. and Yu, B. (2009). The composite absolute penalties family for grouped and hierarchical variable selection. Ann. Statist. 37 3468–3497.
Mathematical Reviews (MathSciNet): MR2549566
Zentralblatt MATH: 05644286
Digital Object Identifier: doi:10.1214/07-AOS584
Project Euclid: euclid.aos/1250515393

2013 © Institute of Mathematical Statistics
