Bayesian Analysis

Bayesian Sparse Multivariate Regression with Asymmetric Nonlocal Priors for Microbiome Data Analysis

Kurtis Shuler, Marilou Sison-Mangus, and Juhee Lee

Full-text: Open access


We propose a Bayesian sparse multivariate regression method to model the relationship between microbe abundance and environmental factors for microbiome data. We model abundance counts of operational taxonomic units (OTUs) with a negative binomial distribution and relate covariates to the counts through regression. Extending conventional nonlocal priors, we construct asymmetric nonlocal priors for regression coefficients to efficiently identify relevant covariates and their effect directions. We build a hierarchical model to facilitate pooling of information across OTUs that produces parsimonious results with improved accuracy. We present simulation studies that compare variable selection performance under the proposed model to that under Bayesian sparse regression models with asymmetric and symmetric local priors and under two frequentist models. The simulations show that the proposed model identifies important covariates and yields coefficient estimates with favorable accuracy compared with the alternatives. The proposed model is applied to analyze an ocean microbiome dataset collected over time to study the association of harmful algal bloom conditions with microbial communities.
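To make the two modeling ingredients named in the abstract more concrete, here is a minimal Python sketch. It assumes a log-link negative binomial regression for one OTU with a mean/size parameterization and a per-sample log offset (for example, a log library size), and it uses the univariate product-moment (pMOM) nonlocal prior of Johnson and Rossell (2012) with the scale fixed at 1. All function names are illustrative assumptions, and `asymmetric_pmom_density` in particular is a hypothetical direction-weighted variant (it reweights the two halves of a symmetric pMOM by a positive-direction weight `w_pos`); it is not the asymmetric prior constructed in the paper.

```python
import math


def nb_log_pmf(y, mu, r):
    """Negative binomial log-pmf with mean mu > 0 and size r > 0.

    Under this parameterization Var(y) = mu + mu**2 / r, so a smaller r
    means stronger overdispersion relative to a Poisson.
    """
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))


def nb_regression_loglik(counts, X, beta, beta0, r, log_offset):
    """Log-likelihood of one OTU's counts under an assumed log-link NB model.

    log mu_i = log_offset[i] + beta0 + sum_p X[i][p] * beta[p]
    """
    total = 0.0
    for y, x_row, off in zip(counts, X, log_offset):
        eta = off + beta0 + sum(xp * bp for xp, bp in zip(x_row, beta))
        total += nb_log_pmf(y, math.exp(eta), r)
    return total


def pmom_density(beta, tau):
    """Univariate pMOM nonlocal prior: (beta**2 / tau) * N(beta; 0, tau).

    The density is exactly zero at beta = 0, the defining nonlocal property.
    """
    normal = math.exp(-beta ** 2 / (2.0 * tau)) / math.sqrt(2.0 * math.pi * tau)
    return beta ** 2 / tau * normal


def asymmetric_pmom_density(beta, tau, w_pos):
    """HYPOTHETICAL direction-weighted pMOM, not the paper's construction.

    Each half of the symmetric pMOM integrates to 1/2, so doubling each half
    and weighting it by w_pos (beta > 0) or 1 - w_pos (beta <= 0) gives a
    proper density with prior probability w_pos of a positive effect.
    """
    weight = w_pos if beta > 0 else 1.0 - w_pos
    return 2.0 * weight * pmom_density(beta, tau)
```

Unlike a normal (local) prior, `pmom_density(0.0, tau)` returns exactly zero, so the prior places no density on a null effect. Splitting the mass by sign, as in the hypothetical variant, is one simple way a prior can encode a preference for an effect direction.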

Article information

Bayesian Anal., Volume 15, Number 2 (2020), 559-578.

First available in Project Euclid: 19 June 2019

Digital Object Identifier: doi:10.1214/19-BA1164

Keywords: count data; harmful algal bloom; microbiome; negative binomial; next-generation sequencing; nonlocal prior; stochastic search variable selection

Creative Commons Attribution 4.0 International License.


Shuler, Kurtis; Sison-Mangus, Marilou; Lee, Juhee. Bayesian Sparse Multivariate Regression with Asymmetric Nonlocal Priors for Microbiome Data Analysis. Bayesian Anal. 15 (2020), no. 2, 559–578. doi:10.1214/19-BA1164.

