The Annals of Applied Probability

The diversity of a distributed genome in bacterial populations

F. Baumdicker, W. R. Hess, and P. Pfaffelhuber
Source: Ann. Appl. Probab. Volume 20, Number 5 (2010), 1567-1606.

Abstract

The distributed genome hypothesis states that the set of genes in a population of bacteria is distributed over all individuals that belong to the specific taxon. It implies that certain genes can be gained and lost from generation to generation. We use the random genealogy given by a Kingman coalescent in order to superimpose events of gene gain and loss along ancestral lines. Gene gains occur at a constant rate along ancestral lines. We assume that gained genes have never been present in the population before. Gene losses occur at a rate proportional to the number of genes present along the ancestral line. In this infinitely many genes model we derive moments for several statistics within a sample: the average number of genes per individual, the average number of genes differing between individuals, the number of incongruent pairs of genes, the total number of different genes in the sample and the gene frequency spectrum. We demonstrate that the model gives a reasonable fit with gene frequency data from marine cyanobacteria.
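The model in the abstract can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the authors' implementation. The function names (`simulate_gene_sets`, `summarize`), the parameters `gain` (gain rate per ancestral line) and `loss` (loss rate per gene), and the choice to measure time in coalescent units are all introduced here for illustration. The sketch also assumes that the root of the genealogy starts from the stationary gene count of a single line, which is Poisson with mean `gain / loss`, and it reads "genes differing between individuals" as the size of the symmetric difference, which is one possible interpretation.

```python
import math
import random
from collections import Counter
from itertools import count


def poisson(rng, mean):
    """Draw a Poisson(mean) variate by counting rate-1 arrivals in [0, mean)."""
    k, t = 0, rng.expovariate(1.0)
    while t < mean:
        k += 1
        t += rng.expovariate(1.0)
    return k


def simulate_gene_sets(n, gain, loss, rng):
    """Return n sets of gene ids for a sample of size n (hypothetical helper)."""
    # Backward in time: build a Kingman coalescent tree for n lineages.
    node_time = {i: 0.0 for i in range(n)}
    children = {}
    active, next_id, now = list(range(n)), n, 0.0
    while len(active) > 1:
        k = len(active)
        now += rng.expovariate(k * (k - 1) / 2)  # each pair merges at rate 1
        a, b = rng.sample(active, 2)
        active.remove(a)
        active.remove(b)
        node_time[next_id] = now
        children[next_id] = (a, b)
        active.append(next_id)
        next_id += 1
    root = active[0]

    # Forward in time: superimpose gene gain and loss along the branches.
    fresh = count()  # every gained gene is new (infinitely many genes)
    genes = {root: {next(fresh) for _ in range(poisson(rng, gain / loss))}}
    stack = [root]
    while stack:
        parent = stack.pop()
        for child in children.get(parent, ()):
            t = node_time[parent] - node_time[child]
            keep = math.exp(-loss * t)  # survival probability of one gene
            inherited = {g for g in genes[parent] if rng.random() < keep}
            # Genes gained on this branch that survive to its lower end.
            gained = poisson(rng, gain / loss * (1.0 - keep))
            genes[child] = inherited | {next(fresh) for _ in range(gained)}
            stack.append(child)
    return [genes[i] for i in range(n)]


def summarize(gene_sets):
    """Compute some of the sample statistics named in the abstract."""
    n = len(gene_sets)
    carriers = Counter(g for s in gene_sets for g in s)
    spectrum = Counter(carriers.values())
    pairs = [(a, b) for i, a in enumerate(gene_sets) for b in gene_sets[i + 1:]]
    return {
        "mean_genes": sum(len(s) for s in gene_sets) / n,
        "mean_pairwise_diff": (
            sum(len(a ^ b) for a, b in pairs) / len(pairs) if pairs else 0.0
        ),
        "total_genes": len(carriers),
        # spectrum[k - 1] = number of genes carried by exactly k individuals
        "spectrum": [spectrum.get(k, 0) for k in range(1, n + 1)],
    }


if __name__ == "__main__":
    rng = random.Random(2010)
    print(summarize(simulate_gene_sets(n=8, gain=20.0, loss=2.0, rng=rng)))
```

Averaging `mean_genes` over many independent runs should approach `gain / loss`, since that is the stationary mean on a single line under the assumptions above. The number of incongruent pairs of genes is not computed in this sketch.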

Primary Subjects: 92D15, 60J70, 92D20
Secondary Subjects: 60K35
Links and Identifiers

Permanent link to this document: http://projecteuclid.org/euclid.aoap/1282747394
Digital Object Identifier: doi:10.1214/09-AAP657
Zentralblatt MATH identifier: 05795064
Mathematical Reviews number (MathSciNet): MR2724396


2012 © Institute of Mathematical Statistics
