Inference of evolutionary trees and rates from biological
sequences is commonly performed using continuous-time Markov
models of character change. The Markov process evolves along an
unknown tree while observations arise only from the tips of the
tree. Rate heterogeneity is present in most real data sets and is
accounted for by flexible mixture models in which each
site is allowed its own rate. Very little has been rigorously
established concerning the identifiability of the models currently
in common use in data analysis, although nonidentifiability was
proven for a semiparametric model and an incorrect proof of
identifiability was published for a general parametric model
(GTR + Γ + I). Here we prove that one of the most widely
used models (GTR + Γ) is identifiable for generic
parameters, and for all parameter choices in the case of
four-state (DNA) models. This is the first proof of
identifiability of a phylogenetic model with a continuous
distribution of rates.
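A minimal sketch of how Γ-distributed rates enter a model of this kind. It is not the authors' construction, and it uses only the Jukes-Cantor (JC69) model, the simplest special case of GTR. Each site's rate r is assumed to follow a gamma distribution with shape α and mean 1. Averaging the site's transition matrix exp(rQt) over r then reduces, through the gamma moment generating function, to replacing each eigenvalue term e^{λt} with (1 − λt/α)^{−α}. The function names `jc_gamma_transition` and `jc_gamma_numeric` are hypothetical, and the midpoint-rule quadrature is included only as a sanity check.

```python
import math


def jc_gamma_transition(t: float, alpha: float) -> tuple[float, float]:
    """JC69 transition probabilities averaged over gamma-distributed rates.

    Assumes the per-site rate r ~ Gamma(shape=alpha, rate=alpha), so E[r] = 1,
    and that Q is scaled to one expected substitution per unit time. JC69's
    nonzero eigenvalue is -4/3, so a site with rate r has
    P_same = 1/4 + 3/4 * exp(-4 r t / 3). The gamma moment generating function
    E[exp(x r)] = (1 - x / alpha) ** (-alpha), valid for x < alpha, gives the
    average over r in closed form.
    Returns (probability of no change, probability of one specific change).
    """
    if t < 0 or alpha <= 0:
        raise ValueError("need t >= 0 and alpha > 0")
    decay = (1.0 + 4.0 * t / (3.0 * alpha)) ** (-alpha)  # E[exp(-4 r t / 3)]
    return 0.25 + 0.75 * decay, 0.25 - 0.25 * decay


def jc_gamma_numeric(t: float, alpha: float,
                     upper: float = 60.0, steps: int = 100_000) -> float:
    """Midpoint-rule estimate of P_same by integrating over the gamma density.

    Intended only as a sanity check for alpha >= 1, where the density is bounded.
    """
    h = upper / steps
    log_norm = alpha * math.log(alpha) - math.lgamma(alpha)
    expected_decay = 0.0
    for k in range(steps):
        r = (k + 0.5) * h  # midpoint of the k-th subinterval
        density = math.exp(log_norm + (alpha - 1.0) * math.log(r) - alpha * r)
        expected_decay += density * math.exp(-4.0 * r * t / 3.0) * h
    return 0.25 + 0.75 * expected_decay


if __name__ == "__main__":
    p_same, p_diff = jc_gamma_transition(0.5, 2.0)
    print(p_same, p_diff)              # approximately 0.671875 0.109375
    print(jc_gamma_numeric(0.5, 2.0))  # should agree closely with p_same
```

Because (1 + 4t/(3α))^(−α) decays more slowly than e^(−4t/3), gamma rate variation makes sequences look less diverged than a rate-homogeneous model with the same branch length. As α grows, the two models agree.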