Identifiability of a Markovian model of molecular evolution with Gamma-distributed rates



Advances in Applied Probability

Elizabeth S. Allman, Cécile Ané, and John A. Rhodes

Source: Adv. in Appl. Probab. Volume 40, Number 1 (2008), 229-249.

Abstract

Inference of evolutionary trees and rates from biological sequences is commonly performed using continuous-time Markov models of character change. The Markov process evolves along an unknown tree while observations arise only from the tips of the tree. Rate heterogeneity is present in most real data sets and is accounted for by the use of flexible mixture models where each site is allowed its own rate. Very little has been rigorously established concerning the identifiability of the models currently in common use in data analysis, although nonidentifiability was proven for a semiparametric model and an incorrect proof of identifiability was published for a general parametric model (GTR + Γ + I). Here we prove that one of the most widely used models (GTR + Γ) is identifiable for generic parameters, and for all parameter choices in the case of four-state (DNA) models. This is the first proof of identifiability of a phylogenetic model with a continuous distribution of rates.
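As a rough illustration of what "Gamma-distributed rates" means for the transition probabilities along a single edge, the sketch below uses the Jukes-Cantor model, the simplest special case of GTR, rather than a general GTR rate matrix. The function names, the normalization to one expected substitution per unit time, and the mean-one Gamma(shape α, rate α) parameterization are assumptions made for this example; they are not taken from the article. The closed form follows from the Gamma moment generating function and is compared against direct numerical integration.

```python
import math


def jc_gamma_probs(t, alpha):
    """Return (p_same, p_diff) for Jukes-Cantor with Gamma(alpha, alpha) rates.

    Illustrative assumptions: the rate matrix is scaled to one expected
    substitution per unit time, so its only nonzero eigenvalue is -4/3,
    and the site rate r has mean 1.
    """
    # E[exp(-4 r t / 3)] for r ~ Gamma(shape=alpha, rate=alpha)
    decay = (1.0 + 4.0 * t / (3.0 * alpha)) ** (-alpha)
    p_same = 0.25 + 0.75 * decay   # probability of observing the same state
    p_diff = 0.25 - 0.25 * decay   # probability of each specific other state
    return p_same, p_diff


def jc_gamma_probs_numeric(t, alpha, steps=20000, r_max=60.0):
    """Same quantities by midpoint-rule integration over the Gamma density.

    Intended only as a sanity check for alpha >= 1, where the density is
    bounded near r = 0.
    """
    h = r_max / steps
    norm = alpha ** alpha / math.gamma(alpha)
    decay = 0.0
    for i in range(steps):
        r = (i + 0.5) * h
        density = norm * r ** (alpha - 1.0) * math.exp(-alpha * r)
        decay += density * math.exp(-4.0 * r * t / 3.0) * h
    return 0.25 + 0.75 * decay, 0.25 - 0.25 * decay


if __name__ == "__main__":
    closed = jc_gamma_probs(0.5, 2.0)
    numeric = jc_gamma_probs_numeric(0.5, 2.0)
    print("closed form:", closed)
    print("numeric    :", numeric)
```

The identifiability question addressed in the article is whether the joint distribution of states at the tips, built from edge probabilities of this kind, determines the tree and the numerical parameters. The sketch only shows how a single edge's probabilities depend on the edge length and α.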

Primary Subjects: 60J25
Secondary Subjects: 92D15, 92D20
Keywords: Phylogenetics; identifiability

Links and Identifiers

Permanent link to this document: http://projecteuclid.org/euclid.aap/1208358894
Digital Object Identifier: doi:10.1239/aap/1208358894

2008 © Applied Probability Trust