Open Access
December 2020
Incorporating compositional heterogeneity into Lie Markov models for phylogenetic inference
Naomi E. Hannaford, Sarah E. Heaps, Tom M. W. Nye, Tom A. Williams, T. Martin Embley
Ann. Appl. Stat. 14(4): 1964-1983 (December 2020). DOI: 10.1214/20-AOAS1369

Abstract

Phylogenetics uses alignments of molecular sequence data to learn about evolutionary trees. Substitutions in sequences are modelled through a continuous-time Markov process, characterised by an instantaneous rate matrix, which standard models assume is time-reversible and stationary. These assumptions are biologically questionable and induce a likelihood function which is invariant to a tree’s root position. This hampers inference because a tree’s biological interpretation depends critically on where it is rooted. Relaxing both assumptions, we introduce a model whose likelihood can distinguish between rooted trees. The model is nonstationary with step changes in the instantaneous rate matrix at each speciation event. Exploiting recent theoretical work, each rate matrix belongs to a nonreversible family of Lie Markov models. These models are closed under matrix multiplication, so our extension offers the conceptually appealing property that a tree and all its subtrees could have arisen from the same family of nonstationary models.

We adopt a Bayesian approach, describe an MCMC algorithm for posterior inference and provide software. The biological insight that our model can provide is illustrated through an analysis in which neither nonreversible but stationary models nor nonstationary but reversible models can identify a plausible root.
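
As a rough illustration of the mechanics the abstract describes, the following Python sketch builds two instantaneous rate matrices, converts each to a transition probability matrix over a branch, and composes them to mimic a step change in the rate matrix at a speciation event. It is not the authors' software and does not implement Lie Markov models: the function names, the rates and the branch lengths are all invented for illustration, and the matrix exponential is approximated with a simple truncated Taylor series. It only shows the general facts that the product of two transition matrices is again a transition matrix and that the sequence composition changes along the path when the process is nonstationary.

```python
# Illustrative sketch only: all rates and branch lengths are invented,
# and nothing here is taken from the paper's software.

def mat_mul(a, b):
    """Multiply two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def rate_matrix(off_diag):
    """Keep the off-diagonal rates and set each diagonal entry so that
    every row sums to zero, as required of an instantaneous rate matrix."""
    n = len(off_diag)
    q = [row[:] for row in off_diag]
    for i in range(n):
        q[i][i] = -sum(off_diag[i][j] for j in range(n) if j != i)
    return q

def transition_matrix(q, t, terms=20, squarings=10):
    """Approximate P(t) = exp(Q t) by scaling and squaring with a
    truncated Taylor series (adequate for small illustrative matrices)."""
    n = len(q)
    scale = 2 ** squarings
    a = [[q[i][j] * t / scale for j in range(n)] for i in range(n)]
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    result = [row[:] for row in identity]
    term = [row[:] for row in identity]
    for k in range(1, terms + 1):
        # term becomes A^k / k!
        term = [[x / k for x in row] for row in mat_mul(term, a)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result

# Two different, asymmetric rate matrices over the states A, C, G, T
# (hypothetical values), one for each of two consecutive branches.
q_branch1 = rate_matrix([[0.0, 0.2, 0.5, 0.1],
                         [0.3, 0.0, 0.1, 0.4],
                         [0.6, 0.1, 0.0, 0.2],
                         [0.1, 0.5, 0.2, 0.0]])
q_branch2 = rate_matrix([[0.0, 0.1, 0.3, 0.1],
                         [0.4, 0.0, 0.2, 0.2],
                         [0.5, 0.2, 0.0, 0.1],
                         [0.3, 0.3, 0.1, 0.0]])

p1 = transition_matrix(q_branch1, 0.3)  # hypothetical branch length 0.3
p2 = transition_matrix(q_branch2, 0.5)  # hypothetical branch length 0.5

# Transition probabilities along the whole path, with a step change in
# the rate matrix where the two branches meet.
p_path = mat_mul(p1, p2)

# Composition at the end of the path, starting from a uniform composition
# (row-vector convention: distribution times transition matrix).
start = [0.25, 0.25, 0.25, 0.25]
end = [sum(start[i] * p_path[i][j] for i in range(4)) for j in range(4)]

print("row sums of path matrix:", [round(sum(row), 10) for row in p_path])
print("composition at end of path:", [round(x, 4) for x in end])
```

Under these assumed rates, the composition at the end of the path differs from the uniform starting composition, which is the kind of compositional heterogeneity that a stationary model cannot represent.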

Citation

Naomi E. Hannaford, Sarah E. Heaps, Tom M. W. Nye, Tom A. Williams, T. Martin Embley. "Incorporating compositional heterogeneity into Lie Markov models for phylogenetic inference." Ann. Appl. Stat. 14(4): 1964-1983, December 2020. https://doi.org/10.1214/20-AOAS1369

Information

Received: 1 June 2020; Published: December 2020
First available in Project Euclid: 19 December 2020

MathSciNet: MR4194256
Digital Object Identifier: 10.1214/20-AOAS1369

Keywords: compositional heterogeneity, Lie Markov models, phylogenetics, rooting

Rights: Copyright © 2020 Institute of Mathematical Statistics
