The Annals of Applied Probability

The mean, variance and limiting distribution of two statistics sensitive to phylogenetic tree balance

Michael G. B. Blum, Olivier François, and Svante Janson



For two decades, the Colless index has been the most frequently used statistic for assessing the balance of phylogenetic trees. In this article, this statistic is studied under the Yule and uniform models of phylogenetic trees. The main tool of analysis is a coupling argument with another well-known index called the Sackin statistic. Asymptotics for the mean, variance and covariance of these two statistics are obtained, as well as their limiting joint distribution for large phylogenies. Under the Yule model, the limiting distribution arises as a solution of a functional fixed point equation. Under the uniform model, the limiting distribution is the Airy distribution. The cornerstone of this study is the fact that the probabilistic models for phylogenetic trees are strongly related to the random permutation and the Catalan models for binary search trees.
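
The abstract names the two statistics without defining them. As a rough illustration only, the sketch below assumes the usual definitions: the Colless index sums, over the internal nodes of a binary tree, the absolute difference between the leaf counts of the two subtrees, and the Sackin index sums the depths of all leaves (equivalently, the leaf counts below every internal node). It also assumes the standard description of the Yule model in which the root splits n leaves into subtrees of sizes k and n - k with k uniform on 1, ..., n - 1, recursively. The function name `yule_colless_sackin` is hypothetical and does not come from the article.

```python
import random


def yule_colless_sackin(n, rng):
    """Return (colless, sackin) for one simulated Yule tree with n leaves.

    Assumption: under the Yule model the left subtree of a node with
    `size` leaves receives a uniformly chosen number of leaves in
    1..size-1, independently at every node.
    """
    colless = 0
    sackin = 0
    stack = [n]  # leaf counts of subtrees that still have to be split
    while stack:
        size = stack.pop()
        if size < 2:
            continue  # a single leaf contributes nothing
        left = rng.randint(1, size - 1)  # uniform split at this node
        right = size - left
        colless += abs(left - right)  # imbalance at this internal node
        sackin += size  # each leaf below gains one unit of depth here
        stack.append(left)
        stack.append(right)
    return colless, sackin


if __name__ == "__main__":
    rng = random.Random(0)
    samples = [yule_colless_sackin(100, rng) for _ in range(1000)]
    mean_colless = sum(c for c, _ in samples) / len(samples)
    mean_sackin = sum(s for _, s in samples) / len(samples)
    print(mean_colless, mean_sackin)
```

Because both indices are accumulated at the same internal nodes of the same tree, the simulation returns them as a pair, which is convenient for estimating their covariance empirically. It is not a substitute for the asymptotic results stated in the abstract.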

Article information

Ann. Appl. Probab., Volume 16, Number 4 (2006), 2195-2214.

First available in Project Euclid: 17 January 2007

Primary: 05C05: Trees
Secondary: 60F05: Central limit and other weak theorems; 60C05: Combinatorial probability; 92D15: Problems related to evolution

Random phylogenetic trees; Yule process; Catalan trees; shape statistics; contraction method; central limit theorem; Airy distribution


Blum, Michael G. B.; François, Olivier; Janson, Svante. The mean, variance and limiting distribution of two statistics sensitive to phylogenetic tree balance. Ann. Appl. Probab. 16 (2006), no. 4, 2195–2214. doi:10.1214/105051606000000547.


