Annals of Applied Statistics

Lateral transfer in Stochastic Dollo models

Luke J. Kelly and Geoff K. Nicholls



Lateral transfer, a process whereby species exchange evolutionary traits through nonancestral relationships, is a frequent source of model misspecification in phylogenetic inference. Lateral transfer obscures the phylogenetic signal in the data as the histories of affected traits are mosaics of the overall phylogeny. We control for the effect of lateral transfer in a Stochastic Dollo model and a Bayesian setting. Our likelihood is highly intractable, as the parameters are the solution of a sequence of large systems of differential equations representing the expected evolution of traits along a tree. We illustrate our method on a data set of lexical traits in Eastern Polynesian languages, and obtain an improved fit over the corresponding model without lateral transfer.

Article information

Ann. Appl. Stat., Volume 11, Number 2 (2017), 1146-1168.

Received: January 2016
Revised: March 2017
First available in Project Euclid: 20 July 2017


Keywords: Bayesian phylogenetics; lateral trait transfer; Stochastic Dollo model


Kelly, Luke J.; Nicholls, Geoff K. Lateral transfer in Stochastic Dollo models. Ann. Appl. Stat. 11 (2017), no. 2, 1146–1168. doi:10.1214/17-AOAS1040.




Supplemental materials

  • Supplemental Materials: Lateral transfer in Stochastic Dollo models. The supplement contains a proof of Theorem 1 and supporting material for the analyses in Sections 7 and 8.