Statistical Science

Stochastic models and descriptive statistics for phylogenetic trees, from Yule to today

David J. Aldous


In 1924 Yule observed that distributions of the number of species per genus were typically long-tailed, and proposed a stochastic model to fit these data. Modern taxonomists often prefer to represent relationships between species via phylogenetic trees; the counterpart to Yule’s observation is that actual reconstructed trees look surprisingly unbalanced. The imbalance can readily be seen via a scatter diagram of the sizes of clades involved in the splits of published large phylogenetic trees. Attempting stochastic modeling leads to two puzzles. First, two somewhat opposite possible biological descriptions of what dominates the macroevolutionary process (adaptive radiation; “neutral” evolution) lead to exactly the same mathematical model (Markov or Yule or coalescent). Second, neither this nor any other simple stochastic model predicts the observed pattern of imbalance. This essay represents a probabilist’s musings on these puzzles, complementing the more detailed survey of biological literature by Mooers and Heard, Quart. Rev. Biol. 72 (1997) 31–54.
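As a rough illustration of the kind of data behind such a scatter diagram, the following minimal sketch grows a tree under one common reading of the Markov (Yule) model, in which each step splits a uniformly chosen current lineage and there is no extinction, and then records for every split the size of the clade and of its smaller daughter clade. The function names `yule_tree` and `split_sizes`, and the representation of the tree as a dictionary, are assumptions made for this example and are not taken from the article.

```python
import random


def yule_tree(n_leaves, rng):
    """Grow a binary tree by repeatedly splitting a uniformly chosen leaf.

    Returns a dict mapping each internal node id to its two child ids.
    Assumption: equal splitting rates for all lineages, no extinction.
    """
    children = {}
    leaves = [0]          # node 0 is the root lineage
    next_id = 1
    while len(leaves) < n_leaves:
        # Pick a uniformly random current leaf and remove it in O(1).
        idx = rng.randrange(len(leaves))
        parent = leaves[idx]
        leaves[idx] = leaves[-1]
        leaves.pop()
        left, right = next_id, next_id + 1
        next_id += 2
        children[parent] = (left, right)
        leaves.extend([left, right])
    return children


def split_sizes(children):
    """Return (clade size, smaller daughter size) for every internal node."""
    size = {}
    splits = []
    # Child ids are always larger than their parent's id, so visiting ids in
    # decreasing order sizes both daughters before their parent.
    for node in sorted(children, reverse=True):
        left, right = children[node]
        a = size.get(left, 1)    # a node with no recorded size is a leaf
        b = size.get(right, 1)
        size[node] = a + b
        splits.append((a + b, min(a, b)))
    return splits


if __name__ == "__main__":
    splits = split_sizes(yule_tree(200, random.Random(1)))
    for clade, smaller in sorted(splits, reverse=True)[:5]:
        print(clade, smaller)
```

Plotting the pairs returned by `split_sizes` for a simulated tree next to the same pairs extracted from a published tree is one way to compare the model's balance with the observed imbalance.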

Article information

Statist. Sci., Volume 16, Number 1 (2001), 23-34.

First available in Project Euclid: 27 August 2001

Keywords: descriptive statistics; phylogenetic tree; stochastic model; tree balance; Yule process


Aldous, David J. Stochastic models and descriptive statistics for phylogenetic trees, from Yule to today. Statist. Sci. 16 (2001), no. 1, 23-34. doi:10.1214/ss/998929474.


  • Asmussen, S. and Hering, H. (1983). Branching Processes. Birkhäuser, Boston.
  • Athreya, K. B. and Ney, P. (1972). Branching Processes. Springer, Berlin.
  • Bininda-Emonds, O. R. P. and Russell, A. P. (1996). A morphological perspective on the phylogenetic relationships of the extant phocid seals (Mammalia: Carnivora: Phocidae). Bonner Zoologische Monographien 41 1-256.
  • Breiman, L. (1994). The 1991 census adjustment: undercount or bad data? Statist. Sci. 9 458-475.
  • Chase, M. W. et al. (1993). Phylogenetics of seed plants: an analysis of nucleotide sequences from the plastid gene rbcL. Ann. Missouri Botanical Garden 80 528-580.
  • de Queiroz, A. (1998). Interpreting sister-group tests of key innovation hypotheses. Systematic Biology 47 710-718.
  • Eldredge, N. and Cracraft, J. (1980). Phylogenetic Patterns and the Evolutionary Process. Columbia Univ. Press.
  • Ewens, W. J. (1972). The sampling theory of selectively neutral alleles. Theoret. Population Biol. 3 87-112.
  • Ewens, W. J. (1979). Mathematical Population Genetics. Springer, Berlin.
  • Ewens, W. J. (1990). Population genetics theory – the past and the future. In Mathematical and Statistical Developments of Evolutionary Theory (S. Lessard, ed.) 177-227. Kluwer, Dordrecht.
  • Gould, S. J. (1977). Ever Since Darwin: Reflections in Natural History. Norton, New York.
  • Gould, S. J. (1989). Wonderful Life: The Burgess Shale and the Nature of History. Norton, New York.
  • Gould, S. J., Raup, D. M., Sepkoski, J. J., Schopf, T. J. M. and Simberloff, D. S. (1977). The shape of evolution: a comparison of real and random clades. Paleobiology 3 23-40.
  • Guyer, C. and Slowinski, J. B. (1991). Comparisons between observed phylogenetic topologies with null expectations among three monophyletic lineages. Evolution 45 340-350.
  • Guyer, C. and Slowinski, J. B. (1993). Adaptive radiation and the topology of large phylogenies. Evolution 47 253-263.
  • Harrington, B. J. (1980). A generic level revision and cladistic analysis of the Myodochini of the world (Hemiptera, Lygaeidae, Rhyparochrominae). Bull. Amer. Museum Natural History 167 45-166.
  • Harris, T. E. (1963). The Theory of Branching Processes. Springer, New York.
  • Heard, S. B. (1996). Patterns in phylogenetic tree balance with variable and evolving speciation rates. Evolution 50 2141-2148.
  • Holmes, S. P. (1998). Phylogenies: an overview. In Statistics in Genetics (B. Halloran and S. Geisser, eds.) 81-118. Springer, New York.
  • Jagers, P. (1975). Branching Processes with Biological Applications. Wiley, New York.
  • Karlin, S. and Taylor, H. M. (1975). A First Course in Stochastic Processes, 2nd ed. Academic Press, New York.
  • Kingman, J. F. C. (1980). Mathematics of Genetic Diversity. SIAM, Philadelphia.
  • Kingman, J. F. C. (1982). The coalescent. Stochastic Process. Appl. 13 235-248.
  • Kirkpatrick, M. and Slatkin, M. (1993). Searching for evolutionary pattern in the shape of a phylogenetic tree. Evolution 47 1171-1181.
  • Kotz, S. and Johnson, N. L. (1989). Yule distributions. In Encyclopedia of Statistical Sciences 9 191. Wiley, New York.
  • Lawler, G. F. (1995). Introduction to Stochastic Processes. Chapman and Hall, London.
  • Maddox, J. (1998). What Remains to Be Discovered. Free Press, New York.
  • Mooers, A. O. and Heard, S. B. (1997). Inferring evolutionary process from phylogenetic tree shape. Quart. Rev. Biol. 72 31-54.
  • Paradis, E. (1998). Detecting shifts in diversification rates without fossils. American Naturalist 152 176-187.
  • Raup, D. M. (1991). Extinction: Bad Genes or Bad Luck? Norton, New York.
  • Rogers, J. S. (1996). Central moments and probability distributions of three measures of phylogenetic tree balance. Systematic Biology 45 99-110.
  • Ross, S. (1983). Stochastic Processes. Wiley, New York.
  • Savage, H. M. (1983). The shape of evolution: systematic tree topology. Biol. J. Linnean Soc. 20 225-244.
  • Shao, K. and Sokal, R. R. (1990). Tree balance. Systematic Zoology 39 266-276.
  • Slowinski, J. B. and Guyer, C. (1989). Testing the stochasticity of patterns of organismal diversity: an improved null model. American Naturalist 134 907-921.
  • Stanley, S. M., Signor, P. W. III, Lidgard, S. and Karr, A. F. (1981). Natural clades differ from "random" clades: simulations and analyses. Paleobiology 7 115-127.
  • Stoyan, D., Stoyan, H. and Fiksel, T. (1983). Modelling the evolution of the number of genera in animal groups (Yule's problem revisited). Biometrical J. J. Math. Methods Biosci. 25 443-451.
  • Tavaré, S. (1984). Line-of-descent and genealogical processes and their applications in population genetics. Theoret. Population Biol. 26 119-164.
  • Yule, G. U. (1924). A mathematical theory of evolution, based on the conclusions of Dr. J. C. Willis. Philos. Trans. Roy. Soc. London Ser. B 213 21-87.