Electronic Journal of Probability

The infinitely many genes model with horizontal gene transfer

Franz Baumdicker and Peter Pfaffelhuber

Full-text: Open access


The genomes of bacterial species are much more flexible than those of eukaryotes. Moreover, the distributed genome hypothesis for bacteria states that the total number of genes present in a bacterial population is greater than the number of genes in the genome of any single individual. The pangenome, i.e. the set of all genes of a bacterial species (or a sample), comprises the core genes, which are present in all living individuals, and the accessory genes, which are carried only by some individuals. In order to use accessory genes for adaptation to environmental forces, genes can be transferred horizontally between individuals. Here, we extend the infinitely many genes model of Baumdicker, Hess and Pfaffelhuber (2010) to include horizontal gene transfer. We take a genealogical view and give a construction, called the Ancestral Gene Transfer Graph, of the joint genealogy of all genes in the pangenome. As an application, we compute moments of several statistics (e.g. the number of differences between two individuals and the gene frequency spectrum) under the infinitely many genes model with horizontal gene transfer.
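To make the statistics named in the abstract concrete, the following minimal sketch computes the empirical gene frequency spectrum and the number of gene differences between two individuals from a sample. It assumes each sampled genome is given as a Python set of gene labels; the function names and the toy sample are hypothetical illustrations, not taken from the paper, and the code evaluates the statistics on data rather than their moments under the model.

```python
from itertools import combinations


def gene_frequency_spectrum(genomes):
    """Return a list g where g[k - 1] counts the genes carried by exactly k individuals."""
    n = len(genomes)
    carriers = {}
    for genome in genomes:
        for gene in genome:
            carriers[gene] = carriers.get(gene, 0) + 1
    spectrum = [0] * n
    for k in carriers.values():
        spectrum[k - 1] += 1
    return spectrum


def pairwise_differences(genome_a, genome_b):
    """Number of genes present in exactly one of the two individuals."""
    return len(genome_a ^ genome_b)


def average_pairwise_differences(genomes):
    """Average number of gene differences over all pairs in the sample."""
    pairs = list(combinations(genomes, 2))
    return sum(pairwise_differences(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy sample: c1 and c2 are core genes, a1..a3 are accessory genes.
sample = [
    {"c1", "c2", "a1"},
    {"c1", "c2", "a2", "a3"},
    {"c1", "c2", "a1", "a3"},
]

spectrum = gene_frequency_spectrum(sample)
pangenome_size = sum(spectrum)   # all genes seen in the sample
core_size = spectrum[-1]         # genes present in every sampled individual
avg_diff = average_pairwise_differences(sample)
```

In this sketch the last entry of the spectrum is the number of core genes in the sample and the sum of all entries is the sample pangenome size, matching the definitions given in the abstract.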


Article information

Electron. J. Probab., Volume 19 (2014), paper no. 115, 27 pp.

Accepted: 14 December 2014
First available in Project Euclid: 4 June 2016

Primary: 92D15: Problems related to evolution; 60J70: Applications of Brownian motions and diffusion theory (population genetics, absorption problems, etc.) [See also 92Dxx]; 92D20: Protein sequences, DNA sequences
Secondary: 60K35: Interacting random processes; statistical mechanics type models; percolation theory [See also 82B43, 82C43]

Keywords: Prokaryote; bacterial evolution; coalescent; gene frequency spectrum; pangenome

This work is licensed under a Creative Commons Attribution 3.0 License.


Baumdicker, Franz; Pfaffelhuber, Peter. The infinitely many genes model with horizontal gene transfer. Electron. J. Probab. 19 (2014), paper no. 115, 27 pp. doi:10.1214/EJP.v19-2642. https://projecteuclid.org/euclid.ejp/1465065757


  • Artalejo, J. R.; Lopez-Herrero, M. J. Analysis of the busy period for the $M/M/c$ queue: an algorithmic approach. J. Appl. Probab. 38 (2001), no. 1, 209–222.
  • Baumdicker, F.; Hess, W. R.; Pfaffelhuber, P. The diversity of a distributed genome in bacterial populations. Ann. Appl. Probab. 20 (2010), no. 5, 1567–1606.
  • Baumdicker, F., W. R. Hess, and P. Pfaffelhuber (2012). The infinitely many genes model for the distributed genome of bacteria. Genome Biol. Evol. 4(4), 443–456.
  • Berg, O. G. and C. G. Kurland (2002). Evolution of microbial genomes: sequence acquisition and loss. Mol. Biol. Evol. 19(12), 2265–2276.
  • Billingsley, Patrick. Convergence of probability measures. Second edition. Wiley Series in Probability and Statistics: Probability and Statistics. A Wiley-Interscience Publication. John Wiley & Sons, Inc., New York, 1999. x+277 pp. ISBN: 0-471-19745-9.
  • Collins, R. E. and P. G. Higgs (2012). Testing the infinitely many genes model for the evolution of the bacterial core genome and pangenome. Mol. Biol. Evol. online first, 1–15.
  • Dagan, T. (2011). Phylogenomic networks. Trends Microbiol. 19(10), 483–491.
  • Dagan, T. and W. Martin (2006). The tree of one percent. Genome Biol. 7(10), 118–118.
  • Dagan, T. and W. Martin (2007). Ancestral genome sizes specify the minimum rate of lateral gene transfer during prokaryote evolution. Proc. Natl. Acad. Sci. U.S.A. 104(3), 870–875.
  • Daley, D. J.; Vere-Jones, D. An introduction to the theory of point processes. Vol. I. Elementary theory and methods. Second edition. Probability and its Applications (New York). Springer-Verlag, New York, 2003. xxii+469 pp. ISBN: 0-387-95541-0.
  • de la Cruz, F. and J. Davies (2000). Horizontal gene transfer and the origin of species: lessons from bacteria. Trends Microbiol. 8(3), 128–133.
  • Didelot, X., D. Lawson, A. Darling, and D. Falush (2010). Inference of homologous recombination in bacteria using whole-genome sequences. Genetics 186(4), 1435–1449.
  • Doolittle, W. F. (1999). Lateral genomics. Trends Cell Biol. 9(12), 5–8.
  • Durrett, Richard. Probability models for DNA sequence evolution. Second edition. Probability and its Applications (New York). Springer, New York, 2008. xii+431 pp. ISBN: 978-0-387-78168-6
  • Ehrlich, G. D., F. Z. Hu, K. Shen, P. Stoodley, and J. C. Post (2005). Bacterial plurality as a general mechanism driving persistence in chronic infections. Clin. Orthop. Relat. Res. 437, 20–24.
  • Ewens, Warren J. Mathematical population genetics. I. Theoretical introduction. Second edition. Interdisciplinary Applied Mathematics, 27. Springer-Verlag, New York, 2004. xx+417 pp. ISBN: 0-387-20191-2
  • Fisher, R. (1930). The distribution of gene ratios for rare mutations. Proc. Roy. Soc. Edinburgh 50, 205–220.
  • Fraser, C., E. J. Alm, M. F. Polz, B. G. Spratt, and W. P. Hanage (2009). The bacterial species challenge: making sense of genetic and ecological diversity. Science 323(5915), 741–746.
  • Gogarten, J. P., W. F. Doolittle, and J. G. Lawrence (2002). Prokaryotic evolution in light of gene transfer. Mol. Biol. Evol. 19(12), 2226–2238.
  • Griffiths, Robert C.; Marjoram, Paul. An ancestral recombination graph. Progress in population genetics and human evolution (Minneapolis, MN, 1994), 257–270, IMA Vol. Math. Appl., 87, Springer, New York, 1997.
  • Haegeman, B. and J. S. Weitz (2012). A neutral theory of genome evolution and the frequency distribution of genes. BMC Genomics 13, 196–196.
  • Hudson, R. R. (1983). Properties of a neutral allele model with intragenic recombination. Theoretical Population Biology 23, 183–201.
  • Huson, D. H. and C. Scornavacca (2011). A survey of combinatorial methods for phylogenetic networks. Genome Biol. Evol. 3, 23–35.
  • Huson, D. H. and M. Steel (2004). Phylogenetic trees based on gene content. Bioinformatics 20(13), 2044–2049.
  • Kallenberg, Olav. Foundations of modern probability. Second edition. Probability and its Applications (New York). Springer-Verlag, New York, 2002. xx+638 pp. ISBN: 0-387-95313-2
  • Karlin, Samuel; Taylor, Howard M. A first course in stochastic processes. Second edition. Academic Press [A subsidiary of Harcourt Brace Jovanovich, Publishers], New York-London, 1975. xvii+557 pp.
  • Kimura, Motoo. Diffusion models in population genetics. J. Appl. Probability 1 1964 177–232.
  • Kimura, M. (1969). The number of heterozygous nucleotide sites maintained in a finite population due to steady flux of mutations. Genetics 61, 893–903.
  • Kingman, J. F. C. On the genealogy of large populations. Essays in statistical science. J. Appl. Probab. 1982, Special Vol. 19A, 27–43.
  • Koonin, E. V., K. S. Makarova, and L. Aravind (2001). Horizontal gene transfer in prokaryotes: quantification and classification. Annu. Rev. Microbiol. 55, 709–742.
  • Koonin, E. V., A. R. Mushegian, M. Y. Galperin, and D. R. Walker (1997). Comparison of archaeal and bacterial genomes: computer analysis of protein sequences predicts novel functions and suggests a chimeric origin for the archaea. Mol. Microbiol. 25(4), 619–637.
  • Koonin, E. V. and Y. I. Wolf (2008). Genomics of bacteria and archaea: the emerging dynamic view of the prokaryotic world. Nucleic Acids Research 36(21), 6688–6719.
  • Koonin, E. V. and Y. I. Wolf (2012). Evolution of microbes and viruses: a paradigm shift in evolutionary biology? Front Cell Infect. Microbiol. 2, 119–119.
  • Krone, S. and C. Neuhauser (1997). Ancestral processes with selection. Theo. Pop. Biol. 51, 210–237.
  • Kunin, V. and C. A. Ouzounis (2003). The balance of driving forces during genome evolution in prokaryotes. Genome Research 13(7), 1589–1594.
  • Lawrence, J. G. and H. Ochman (2002). Reconciling the many faces of lateral gene transfer. Trends in Microbiology 10(1), 1–4.
  • Linz, S., A. Radtke, and A. von Haeseler (2007). A likelihood framework to measure horizontal gene transfer. Molecular Biology and Evolution 24(6), 1312–1319.
  • Lobovsky, A., Y. Wolf, and E. Koonin (2013). Gene frequency distributions reject a neutral model of genome evolution. Genome Biology and Evolution 5(1), 233–242.
  • McDaniel, L. D., E. Young, J. Delaney, F. Ruhnau, K. B. Ritchie, and J. H. Paul (2010). High frequency of horizontal gene transfer in the oceans. Science 330, 50.
  • Medini, D., C. Donati, H. Tettelin, V. Masignani, and R. Rappuoli (2005). The microbial pan-genome. Curr. Opin. Genet. Dev. 15(6), 589–594.
  • Mozhayskiy, V. and I. Tagkopoulos (2012). Horizontal gene transfer dynamics and distribution of fitness effects during microbial in silico evolution. BMC Bioinformatics 13 Suppl 10, 1–17.
  • Nakhleh, Luay; Ruths, Derek; Wang, Li-San. RIATA-HGT: a fast and accurate heuristic for reconstructing horizontal gene transfer. Computing and combinatorics, 84–93, Lecture Notes in Comput. Sci., 3595, Springer, Berlin, 2005.
  • Neuhauser, C. and S. Krone (1997). The genealogy of samples in models with selection. Genetics 145, 519–534.
  • Novozhilov, A. S., G. P. Karev, and E. V. Koonin (2005). Mathematical modeling of evolution of horizontally transferred genes. Mol. Biol. Evol. 22(8), 1721–1732.
  • NIST handbook of mathematical functions. Edited by Frank W. J. Olver, Daniel W. Lozier, Ronald F. Boisvert and Charles W. Clark. With 1 CD-ROM (Windows, Macintosh and UNIX). U.S. Department of Commerce, National Institute of Standards and Technology, Washington, DC; Cambridge University Press, Cambridge, 2010. xvi+951 pp. ISBN: 978-0-521-14063-8.
  • Perna, N. T., G. Plunkett, V. Burland, B. Mau, J. D. Glasner, D. J. Rose, G. F. Mayhew, P. S. Evans, J. Gregor, H. A. Kirkpatrick, G. Posfai, J. Hackett, S. Klink, A. Boutin, Y. Shao, L. Miller, E. J. Grotbeck, N. W. Davis, A. Lim, E. T. Dimalanta, K. D. Potamousis, J. Apodaca, T. S. Anantharaman, J. Lin, G. Yen, D. C. Schwartz, R. A. Welch, and F. R. Blattner (2001). Genome sequence of enterohaemorrhagic Escherichia coli O157:H7. Nature 409, 529–533.
  • Price, M. N., P. S. Dehal, and A. P. Arkin (2008). Horizontal gene transfer and the evolution of transcriptional regulation in Escherichia coli. Genome Biology 9(1), R4.
  • Tazzyman, S. J. and S. Bonhoeffer (2013). Fixation probability of mobile genetic elements such as plasmids. Theo. Pop. Biol. 90, 49–55.
  • Tettelin, H., V. Masignani, M. J. Cieslewicz, C. Donati, D. Medini, N. L. Ward, S. V. Angiuoli, J. Crabtree, A. L. Jones, A. S. Durkin, R. T. DeBoy, T. M. Davidsen, M. Mora, M. Scarselli, J. D. Peterson, C. R. Hauser, J. P. Sundaram, W. C. Nelson, R. Madupu, L. M. Brinkac, R. J. Dodson, M. J. Rosovitz, S. A. Sullivan, S. C. Daugherty, D. H. Haft, J. Selengut, M. L. Gwinn, L. Zhou, N. Zafar, H. Khouri, D. Radune, G. Dimitrov, K. Watkins, K. J. B. O'Connor, S. Smith, T. R. Utterback, O. White, C. E. Rubens, G. Grandi, L. C. Madoff, D. L. Kasper, J. L. Telford, M. R. Wessels, R. Rappuoli, and C. M. Fraser (2005). Genome analysis of multiple pathogenic isolates of Streptococcus agalactiae: Implications for the microbial pan-genome. Proc. Natl. Acad. Sci. U.S.A. 102(39), 13950–13955.
  • Tettelin, H., D. Riley, C. Cattuto, and D. Medini (2008). Comparative genomics: the bacterial pan-genome. Current Opinion Microbiol. 11(5), 472–477.
  • Vogan, A. A. and P. G. Higgs (2011). The advantages and disadvantages of horizontal gene transfer and the emergence of the first species. Biol. Direct 6, 1–14.
  • Wright, S. (1938). The distribution of gene frequencies under irreversible mutation. Proc. Natl. Acad. Sci. U.S.A. 24, 253–259.