## The Annals of Applied Statistics

### Sparsity with sign-coherent groups of variables via the cooperative-Lasso

#### Abstract

We consider the problems of estimation and selection of parameters endowed with a known group structure, when the groups are assumed to be sign-coherent, that is, when each group gathers either nonnegative, nonpositive or null parameters. To tackle this problem, we propose the cooperative-Lasso penalty. We derive the optimality conditions defining the cooperative-Lasso estimate for generalized linear models, and propose an efficient active set algorithm suited to high-dimensional problems. We study the asymptotic consistency of the estimator in the linear regression setup and derive its irrepresentable conditions, which are milder than those of the group-Lasso regarding the matching of groups with the sparsity pattern of the true parameters. We also address the problem of model selection in linear regression by deriving an approximation of the degrees of freedom of the cooperative-Lasso estimator. Simulations comparing the proposed estimator to the group-Lasso and the sparse group-Lasso agree with our theoretical results, showing consistent improvements in support recovery for sign-coherent groups. Finally, we present two examples illustrating the wide applicability of the cooperative-Lasso: first, the processing of ordinal variables, where the penalty acts as a monotonicity prior; second, the processing of genomic data, where the set of differentially expressed probes is enriched by incorporating all the probes of the microarray that are related to the corresponding genes.
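
To make the notion of sign-coherence concrete, the sketch below evaluates a cooperative-Lasso-style penalty, taken here to be the sum over groups of the Euclidean norms of the positive and negative parts of each group's coefficients, next to the usual group-Lasso norm. The function names, the example coefficients and the optional per-group `weights` argument are illustrative assumptions and are not taken from the article's software; this is a minimal sketch of the penalty only, not of the estimation algorithm.

```python
import math


def _l2(values):
    """Euclidean norm of an iterable of numbers."""
    return math.sqrt(sum(v * v for v in values))


def group_lasso_penalty(beta, groups):
    """Sum over groups of the Euclidean norm of each group's coefficients."""
    labels = sorted(set(groups))
    return sum(
        _l2(b for b, g in zip(beta, groups) if g == label) for label in labels
    )


def coop_lasso_penalty(beta, groups, weights=None):
    """Sum over groups of ||positive part|| + ||negative part||.

    `weights` (hypothetical argument) maps group label -> weight, default 1.
    """
    labels = sorted(set(groups))
    if weights is None:
        weights = {label: 1.0 for label in labels}
    total = 0.0
    for label in labels:
        coefs = [b for b, g in zip(beta, groups) if g == label]
        pos = _l2(max(b, 0.0) for b in coefs)   # norm of the positive part
        neg = _l2(max(-b, 0.0) for b in coefs)  # norm of the negative part
        total += weights[label] * (pos + neg)
    return total


# Illustrative coefficients: group 0 is sign-coherent, group 1 mixes signs.
beta = [1.0, 2.0, 0.0, 1.0, -1.0]
groups = [0, 0, 0, 1, 1]

coop = coop_lasso_penalty(beta, groups)
group = group_lasso_penalty(beta, groups)
```

On the sign-coherent group the two penalties coincide, whereas on the mixed-sign group the cooperative variant is larger (here 1 + 1 instead of the square root of 2), which is how sign-incoherent groups end up more heavily penalized.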

#### Article information

Source
Ann. Appl. Stat. Volume 6, Number 2 (2012), 795-830.

Dates
First available in Project Euclid: 11 June 2012

Permanent link to this document
https://projecteuclid.org/euclid.aoas/1339419617

Digital Object Identifier
doi:10.1214/11-AOAS520

Mathematical Reviews number (MathSciNet)
MR2976492

Zentralblatt MATH identifier
1243.62101

#### Citation

Chiquet, Julien; Grandvalet, Yves; Charbonnier, Camille. Sparsity with sign-coherent groups of variables via the cooperative-Lasso. Ann. Appl. Stat. 6 (2012), no. 2, 795–830. doi:10.1214/11-AOAS520. https://projecteuclid.org/euclid.aoas/1339419617
