The Annals of Applied Probability
The diversity of a distributed genome in bacterial populations
The distributed genome hypothesis states that the set of genes in a population of bacteria is distributed over all individuals that belong to the specific taxon. It implies that certain genes can be gained and lost from generation to generation. We use the random genealogy given by a Kingman coalescent in order to superimpose events of gene gain and loss along ancestral lines. Gene gains occur at a constant rate along ancestral lines. We assume that gained genes have never been present in the population before. Gene losses occur at a rate proportional to the number of genes present along the ancestral line. In this infinitely many genes model we derive moments for several statistics within a sample: the average number of genes per individual, the average number of genes differing between individuals, the number of incongruent pairs of genes, the total number of different genes in the sample and the gene frequency spectrum. We demonstrate that the model gives a reasonable fit with gene frequency data from marine cyanobacteria.
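The model described in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the authors' code: the function names (`poisson`, `simulate_gene_sets`, `summarize`) and the parameters `gain_rate` and `loss_rate` are hypothetical, time is measured in coalescent units (each pair of lineages merges at rate 1), the root genome is assumed to start at the stationary Poisson distribution with mean `gain_rate / loss_rate`, and "genes differing between individuals" is taken here to mean the symmetric difference of two gene sets.

```python
import math
import random
from collections import Counter
from itertools import combinations, count


def poisson(rng, mean):
    """Draw a Poisson variate by Knuth's multiplication method (fine for small means)."""
    threshold = math.exp(-mean)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def simulate_gene_sets(n, gain_rate, loss_rate, seed=0):
    """Return the gene sets of n sampled individuals (n >= 2) under the sketched model."""
    rng = random.Random(seed)
    new_gene = count()  # every gained gene gets a fresh id: it was never present before

    # Kingman coalescent, built backwards in time from the n sampled leaves.
    time, next_id = 0.0, n
    active = list(range(n))
    node_time = {i: 0.0 for i in range(n)}
    children = {}
    while len(active) > 1:
        k = len(active)
        time += rng.expovariate(k * (k - 1) / 2)  # each pair merges at rate 1
        a, b = rng.sample(active, 2)
        active.remove(a)
        active.remove(b)
        node_time[next_id] = time
        children[next_id] = (a, b)
        active.append(next_id)
        next_id += 1
    root = active[0]

    # Gene gain and loss, superimposed forwards in time from the root to the leaves.
    root_size = poisson(rng, gain_rate / loss_rate)  # assumed stationary start
    genomes = {root: {next(new_gene) for _ in range(root_size)}}
    stack = [root]
    while stack:
        parent = stack.pop()
        for child in children.get(parent, ()):
            branch = node_time[parent] - node_time[child]
            keep = math.exp(-loss_rate * branch)  # chance one gene survives the branch
            survivors = {g for g in genomes[parent] if rng.random() < keep}
            # Genes gained on the branch that are still present at its lower end.
            gained = poisson(rng, gain_rate / loss_rate * (1.0 - keep))
            genomes[child] = survivors | {next(new_gene) for _ in range(gained)}
            stack.append(child)
    return [genomes[i] for i in range(n)]


def summarize(gene_sets):
    """Mean genes per individual, mean pairwise difference, total genes, frequency spectrum."""
    n = len(gene_sets)
    freq = Counter(g for s in gene_sets for g in s)
    spectrum = [sum(1 for c in freq.values() if c == k) for k in range(1, n + 1)]
    mean_genes = sum(len(s) for s in gene_sets) / n
    pairs = list(combinations(gene_sets, 2))
    mean_diff = sum(len(a ^ b) for a, b in pairs) / len(pairs)
    return mean_genes, mean_diff, len(freq), spectrum
```

Calling `summarize(simulate_gene_sets(10, gain_rate=20.0, loss_rate=2.0))` gives one random realization of four of the statistics named above; the number of incongruent gene pairs is not computed in this sketch. Because each lineage stays at the stationary distribution under these assumptions, the mean number of genes per individual, averaged over many seeds, should approach `gain_rate / loss_rate`.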
Ann. Appl. Probab., Volume 20, Number 5 (2010), 1567-1606.
First available in Project Euclid: 25 August 2010
Primary: 92D15: Problems related to evolution; 60J70: Applications of Brownian motions and diffusion theory (population genetics, absorption problems, etc.) [See also 92Dxx]; 92D20: Protein sequences, DNA sequences
Secondary: 60K35: Interacting random processes; statistical mechanics type models; percolation theory [See also 82B43, 82C43]
Baumdicker, F.; Hess, W. R.; Pfaffelhuber, P. The diversity of a distributed genome in bacterial populations. Ann. Appl. Probab. 20 (2010), no. 5, 1567-1606. doi:10.1214/09-AAP657. https://projecteuclid.org/euclid.aoap/1282747394