The Annals of Applied Probability

Limitations of Markov chain Monte Carlo algorithms for Bayesian inference of phylogeny

Elchanan Mossel and Eric Vigoda

Source: Ann. Appl. Probab. Volume 16, Number 4 (2006), 2215-2234.

Abstract

Markov chain Monte Carlo algorithms play a key role in the Bayesian approach to phylogenetic inference. In this paper, we present the first theoretical work analyzing the rate of convergence of several Markov chains widely used in phylogenetic inference. We analyze simple, realistic examples where these Markov chains fail to converge quickly. In particular, the data studied are generated from a pair of trees, under a standard evolutionary model. We prove that many of the popular Markov chains take exponentially long to reach their stationary distribution. Our construction is pertinent since it is well known that phylogenetic trees for genes may differ within a single organism. Our results shed a cautionary light on phylogenetic analysis using Bayesian inference and highlight future directions for potential theoretical work.

Primary Subjects: 60J10, 92D15
Keywords: Markov chain Monte Carlo; phylogeny; tree space


Links and Identifiers

Permanent link to this document: http://projecteuclid.org/euclid.aoap/1169065222
Digital Object Identifier: doi:10.1214/105051606000000538
Mathematical Reviews number (MathSciNet): MR2288719
Zentralblatt MATH identifier: 1121.60078

References

Bhatnagar, N. and Randall, D. (2004). Torpid mixing of simulated tempering on the Potts model. In Proceedings of the Fifteenth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA) 478--487.
Mathematical Reviews (MathSciNet): MR2291087
Cavender, J. A. (1978). Taxonomy with confidence. Math. Biosci. 40 271--280.
Mathematical Reviews (MathSciNet): MR0503936
Digital Object Identifier: doi:10.1016/0025-5564(78)90089-5
Chor, B., Hendy, M. D., Holland, B. R. and Penny, D. (2000). Multiple maxima of likelihood in phylogenetic trees: An analytic approach. Mol. Biol. Evol. 17 1529--1541.
Durbin, R., Eddy, S., Krogh, A. and Mitchison, G. (1998). Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge Univ. Press.
Zentralblatt MATH: 0929.92010
Dyer, M., Frieze, A. and Jerrum, M. (2002). On counting independent sets in sparse graphs. SIAM J. Comput. 31 1527--1541.
Mathematical Reviews (MathSciNet): MR1936657
Digital Object Identifier: doi:10.1137/S0097539701383844
Diaconis, P. and Holmes, S. P. (2002). Random walks on trees and matchings. Electron. J. Probab. 7.
Mathematical Reviews (MathSciNet): MR1887626
Develin, M. and Sturmfels, B. (2004). Tropical convexity. Doc. Math. 9 1--27.
Mathematical Reviews (MathSciNet): MR2054977
Farris, J. S. (1973). A probability model for inferring evolutionary trees. Syst. Zool. 22 250--256.
Felsenstein, J. (2004). Inferring Phylogenies. Sinauer Associates, Inc., Sunderland, MA.
Geyer, C. J. (1991). Markov chain Monte Carlo maximum likelihood. Computing Science and Statistics: Proc. 23rd Symp. on the Interface 156--163. Interface Foundation, Fairfax Station, VA.
Graur, D. and Li, W.-H. (1999). Fundamentals of Molecular Evolution, 2nd ed. Sinauer Associates, Inc., Sunderland, MA.
Huelsenbeck, J. P., Larget, B., Miller, R. E. and Ronquist, F. (2002). Potential applications and pitfalls of Bayesian inference of phylogeny. Syst. Biol. 51 673--688.
Huelsenbeck, J. P. and Ronquist, F. (2001). MRBAYES: Bayesian inference of phylogenetic trees. Bioinformatics 17 754--755.
Huelsenbeck, J. P., Ronquist, F., Nielsen, R. and Bollback, J. P. (2001). Bayesian inference of phylogeny and its impact on evolutionary biology. Science 294 2310--2314.
Janson, S., Łuczak, T. and Ruciński, A. (2000). Random Graphs. Wiley, New York.
Mathematical Reviews (MathSciNet): MR1782847
Larget, B. and Simon, D. L. (1999). Markov chain Monte Carlo algorithms for the Bayesian analysis of phylogenetic trees. Mol. Biol. Evol. 16 750--759.
Li, S., Pearl, D. K. and Doss, H. (2000). Phylogenetic tree construction using Markov chain Monte Carlo. J. Amer. Statist. Assoc. 95 493--508.
Nei, M. and Kumar, S. (2000). Molecular Evolution and Phylogenetics. Oxford Univ. Press.
Neyman, J. (1971). Molecular studies of evolution: A source of novel statistical problems. In Statistical Decision Theory and Related Topics (S. S. Gupta and J. Yackel, eds.) 1--27. Academic Press, New York.
Mathematical Reviews (MathSciNet): MR0327321
Zentralblatt MATH: 0231.62010
Mossel, E. and Vigoda, E. (2005). Phylogenetic MCMC algorithms are misleading on mixtures of trees. Science 309 2207--2209.
Rannala, B. and Yang, Z. (1996). Probability distribution of molecular evolutionary trees: A new method of phylogenetic inference. J. Mol. Evol. 43 304--311.
Simon, D. L. and Larget, B. (2000). Bayesian analysis in molecular biology and evolution (BAMBE). Version 2.03 beta, Dept. Mathematics and Computer Science, Duquesne Univ., Pittsburgh, PA.
Speyer, D. and Sturmfels, B. (2004). The tropical Grassmannian. Adv. Geom. 4 389--411.
Mathematical Reviews (MathSciNet): MR2071813
Digital Object Identifier: doi:10.1515/advg.2004.023
Yang, Z. (2000). Complexity of the simplest phylogenetic estimation problem. Proc. R. Soc. Lond. B Biol. Sci. 267 109--116.
Yang, Z. and Rannala, B. (1997). Bayesian phylogenetic inference using DNA sequences: A Markov chain Monte Carlo method. Mol. Biol. Evol. 14 717--724.

2010 © Institute of Mathematical Statistics