Bayesian Analysis

Bayesian Variable Selection Regression of Multivariate Responses for Group Data

B. Liquet, K. Mengersen, A. N. Pettitt, and M. Sutton

Full-text: Open access

Abstract

We propose two multivariate extensions of the Bayesian group lasso for variable selection and estimation for data with high-dimensional predictors and multi-dimensional response variables. The methods utilize spike and slab priors to yield solutions that are sparse at either a group level or both a group and individual feature level. The incorporation of group structure in a predictor matrix is a key factor in obtaining better estimators and identifying associations between multiple responses and predictors. The approach is suited to many biological studies where the response is multivariate and each predictor is embedded in some biological grouping structure such as gene pathways. Our Bayesian models are connected with penalized regression, and we prove both oracle and asymptotic distribution properties under an orthogonal design. We derive efficient Gibbs sampling algorithms for our models and provide the implementation in a comprehensive R package called MBSGS available on the Comprehensive R Archive Network (CRAN). The performance of the proposed approaches is compared to state-of-the-art variable selection strategies on simulated datasets. The proposed methodology is illustrated on a genetic dataset in order to identify markers grouping across chromosomes that explain the joint variability of gene expression in multiple tissues.
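
As a rough illustration of what "sparse at a group level" means under a spike and slab prior, the sketch below draws a coefficient matrix in which each predictor group is either entirely zero (the spike) or entirely nonzero (the slab). It is a simplified toy, not the authors' model and not the MBSGS interface: the function name `draw_group_spike_slab`, the parameters `pi0` and `slab_sd`, and the plain Gaussian slab are all assumptions made for this example.

```python
import numpy as np

def draw_group_spike_slab(group_sizes, n_responses, pi0=0.5, slab_sd=1.0, seed=0):
    """Draw a (p x q) coefficient matrix from a toy group-level spike-and-slab prior.

    Each group of rows is set entirely to zero with probability pi0 (spike);
    otherwise its entries are i.i.d. N(0, slab_sd^2) draws (slab).
    The Gaussian slab is an illustrative assumption, not the paper's prior.
    """
    rng = np.random.default_rng(seed)
    blocks, included = [], []
    for size in group_sizes:
        keep = bool(rng.random() >= pi0)  # True -> group is drawn from the slab
        if keep:
            block = rng.normal(0.0, slab_sd, size=(size, n_responses))
        else:
            block = np.zeros((size, n_responses))
        blocks.append(block)
        included.append(keep)
    # Stack the per-group blocks into one coefficient matrix with p = sum(group_sizes) rows
    return np.vstack(blocks), included

# Hypothetical setup: three predictor groups of sizes 3, 2 and 4, and two responses
group_sizes = [3, 2, 4]
B, included = draw_group_spike_slab(group_sizes, n_responses=2, pi0=0.5, seed=1)
print(B.shape)
print(included)
```

Each row block of `B` corresponds to one predictor group and each column to one response, so a group flagged `False` in `included` contributes nothing to any response. Sparsity at the individual feature level would need a second layer of indicators inside each retained group, which this sketch leaves out.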

Article information

Source
Bayesian Anal. Volume 12, Number 4 (2017), 1039-1067.

Dates
First available in Project Euclid: 26 October 2017

Permanent link to this document
https://projecteuclid.org/euclid.ba/1508983455

Digital Object Identifier
doi:10.1214/17-BA1081

Keywords
Bayesian variable selection; multivariate regression; sparsity; spike and slab

Rights
Creative Commons Attribution 4.0 International License.

Citation

Liquet, B.; Mengersen, K.; Pettitt, A. N.; Sutton, M. Bayesian Variable Selection Regression of Multivariate Responses for Group Data. Bayesian Anal. 12 (2017), no. 4, 1039–1067. doi:10.1214/17-BA1081. https://projecteuclid.org/euclid.ba/1508983455


