Annals of Applied Statistics

Modeling microbial abundances and dysbiosis with beta-binomial regression

Bryan D. Martin, Daniela Witten, and Amy D. Willis

Using a sample from a population to estimate the proportion of the population with a certain category label is a broadly important problem. In the context of microbiome studies, this problem arises when researchers wish to use a sample from a population of microbes to estimate the population proportion of a particular taxon, known as the taxon’s relative abundance. In this paper, we propose a beta-binomial model for this task. Like existing models, our model allows for a taxon’s relative abundance to be associated with covariates of interest. However, unlike existing models, our proposal also allows for the overdispersion in the taxon’s counts to be associated with covariates of interest. We exploit this model in order to propose tests not only for differential relative abundance, but also for differential variability. The latter is particularly valuable in light of speculation that dysbiosis, the perturbation from a normal microbiome that can occur in certain disease conditions, may manifest as a loss of stability, or increase in variability, of the counts associated with each taxon. We demonstrate the performance of our proposed model using a simulation study and an application to soil microbial data.
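
As a rough illustration of the kind of model the abstract describes, the Python sketch below evaluates a beta-binomial log-likelihood in which both the relative abundance and the overdispersion of a taxon depend on one covariate. The logit links, the mean/overdispersion parameterization, the single-covariate design, and all names (inv_logit, beta_binomial_logpmf, log_likelihood, beta, beta_star) are assumptions made for illustration. The abstract does not specify them, and this is not the corncob implementation.

```python
import math


def inv_logit(x):
    """Map a real number to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def log_beta(a, b):
    """Logarithm of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_binomial_logpmf(w, m, mu, phi):
    """Log-probability of observing w counts of a taxon out of m total.

    mu  : relative abundance (mean proportion), in (0, 1)
    phi : overdispersion, in (0, 1)
    Assumed parameterization: Beta shapes a = mu(1-phi)/phi, b = (1-mu)(1-phi)/phi.
    """
    a = mu * (1.0 - phi) / phi
    b = (1.0 - mu) * (1.0 - phi) / phi
    log_choose = math.lgamma(m + 1) - math.lgamma(w + 1) - math.lgamma(m - w + 1)
    return log_choose + log_beta(w + a, m - w + b) - log_beta(a, b)


def log_likelihood(counts, depths, x, beta, beta_star):
    """Sum of log-probabilities over samples.

    beta      : (intercept, slope) for the relative abundance (assumed logit link)
    beta_star : (intercept, slope) for the overdispersion (assumed logit link)
    """
    total = 0.0
    for w, m, xi in zip(counts, depths, x):
        mu = inv_logit(beta[0] + beta[1] * xi)
        phi = inv_logit(beta_star[0] + beta_star[1] * xi)
        total += beta_binomial_logpmf(w, m, mu, phi)
    return total
```

Under this assumed parameterization, a count W out of M has mean M·mu and variance M·mu·(1 − mu)·(1 + (M − 1)·phi), so phi near zero approaches the binomial case. A nonzero slope in beta corresponds to differential relative abundance, and a nonzero slope in beta_star corresponds to differential variability.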

Article information

Ann. Appl. Stat., Volume 14, Number 1 (2020), 94-115.

Received: January 2019
Revised: June 2019
First available in Project Euclid: 16 April 2020

Keywords: Relative abundance; microbiome; correlated data; overdispersion; high throughput sequencing; beta-binomial

Martin, Bryan D.; Witten, Daniela; Willis, Amy D. Modeling microbial abundances and dysbiosis with beta-binomial regression. Ann. Appl. Stat. 14 (2020), no. 1, 94--115. doi:10.1214/19-AOAS1283.

Supplemental materials

  • Supplement A: corncob R package. We provide an R package implementing all methods proposed in this paper.
  • Supplement B: Figure code. We provide code to reproduce all simulations and data analyses in this paper.