## Brazilian Journal of Probability and Statistics

### Probabilistic models for the (sub)tree(s) of life

Amaury Lambert

#### Abstract

The goal of these lectures is to review some mathematical aspects of random tree models used in evolutionary biology to model species trees.

We start with stochastic models of tree shapes (finite trees without edge lengths), culminating in the $\beta$-family of Aldous’ branching models.
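As an illustration of the $\beta$-family, the sketch below computes the split distribution at the root of a tree with $n$ tips, assuming the standard parametrization of Aldous' $\beta$-splitting model, $q_n(i) \propto \Gamma(\beta+i+1)\,\Gamma(\beta+n-i+1) / (\Gamma(i+1)\,\Gamma(n-i+1))$ for $1 \le i \le n-1$ and $\beta > -2$. The function name `beta_split_probs` is hypothetical and is not taken from the lectures.

```python
from math import exp, lgamma


def beta_split_probs(n, beta):
    """Return [q_n(1), ..., q_n(n-1)]: the probability that the root split
    sends i tips to the left subtree, under the assumed beta-splitting weights.
    """
    if n < 2:
        raise ValueError("need at least two tips")
    if beta <= -2:
        raise ValueError("the model is assumed to require beta > -2")
    # Work with log-weights so that large n does not overflow.
    log_w = [
        lgamma(beta + i + 1) + lgamma(beta + n - i + 1)
        - lgamma(i + 1) - lgamma(n - i + 1)
        for i in range(1, n)
    ]
    shift = max(log_w)
    weights = [exp(x - shift) for x in log_w]
    total = sum(weights)
    return [w / total for w in weights]


# beta = 0 makes every split size equally likely (the Yule case);
# beta = -3/2 is the value usually associated with uniform cladograms.
yule_like = beta_split_probs(6, 0.0)
unbalanced = beta_split_probs(6, -1.5)
```

Under this parametrization, smaller values of $\beta$ put more mass on extreme splits and therefore produce more unbalanced tree shapes.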

We next introduce real trees (trees as metric spaces) and show how to study them through their contour, provided they are properly measured and ordered.

We then focus on the reduced tree, or coalescent tree, which is the tree spanned by species alive at the same fixed time. We show how reduced trees, like any compact ultrametric space, can be represented in a simple way via the so-called comb metric. Beautiful examples of random combs include the Kingman coalescent and coalescent point processes.
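A minimal sketch of the comb idea for finitely many tips, under the assumption that tips are ordered $0, 1, \dots, n-1$ and that `heights[k]` is the height of the tooth separating tips $k$ and $k+1$. The distance between two tips is taken here to be the tallest tooth between them; some conventions use twice this value. The function names are hypothetical.

```python
from itertools import combinations


def comb_distance(heights, i, j):
    """Distance between tips i and j: the tallest tooth strictly between them."""
    if i == j:
        return 0.0
    lo, hi = min(i, j), max(i, j)
    return max(heights[lo:hi])


def is_ultrametric(heights):
    """Check the strong triangle inequality on every triple of tips.

    In an ultrametric space the two largest of the three pairwise
    distances within any triple are equal.
    """
    n = len(heights) + 1
    for i, j, k in combinations(range(n), 3):
        d = sorted(
            [
                comb_distance(heights, i, j),
                comb_distance(heights, j, k),
                comb_distance(heights, i, k),
            ]
        )
        if d[1] != d[2]:
            return False
    return True


# Four tips separated by three teeth (illustrative values only).
teeth = [3.0, 1.0, 2.0]
d_03 = comb_distance(teeth, 0, 3)  # tallest tooth between tips 0 and 3
ok = is_ultrametric(teeth)
```

In a coalescent point process the tooth heights would be independent and identically distributed random variables; the list above is fixed only to keep the example deterministic.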

We conclude by presenting some recent biological applications of coalescent point processes to the inference of species diversification, to conservation biology and to epidemiology.

#### Article information

Source
Braz. J. Probab. Stat., Volume 31, Number 3 (2017), 415-475.

Dates
Accepted: April 2016
First available in Project Euclid: 22 August 2017

https://projecteuclid.org/euclid.bjps/1503388824

Digital Object Identifier
doi:10.1214/16-BJPS320

Mathematical Reviews number (MathSciNet)
MR3693976

Zentralblatt MATH identifier
1373.05178

#### Citation

Lambert, Amaury. Probabilistic models for the (sub)tree(s) of life. Braz. J. Probab. Stat. 31 (2017), no. 3, 415–475. doi:10.1214/16-BJPS320. https://projecteuclid.org/euclid.bjps/1503388824
