Open Access
October 2017
Distance-based species tree estimation under the coalescent: Information-theoretic trade-off between number of loci and sequence length
Elchanan Mossel, Sebastien Roch
Ann. Appl. Probab. 27(5): 2926-2955 (October 2017). DOI: 10.1214/16-AAP1273

Abstract

We consider the reconstruction of a phylogeny from multiple genes under the multispecies coalescent. We establish a connection with the sparse signal detection problem, where one seeks to distinguish between a distribution and a mixture of the distribution and a sparse signal. Using this connection, we derive an information-theoretic trade-off between the number of genes, $m$, needed for an accurate reconstruction and the sequence length, $k$, of the genes. Specifically, we show that to detect a branch of length $f$, one needs $m=\Theta(1/[f^{2}\sqrt{k}])$ genes.
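The trade-off stated in the last sentence can be made concrete with a few lines of Python. This is a minimal sketch, not the authors' method. The helper `genes_needed` and the constant `c` are hypothetical. A $\Theta(\cdot)$ bound fixes only the growth rate, and the abstract does not give the hidden constant, so `c` is an arbitrary placeholder. Ratios between two settings cancel `c`, which is why the example compares settings instead of reporting absolute gene counts.

```python
import math


def genes_needed(f: float, k: int, c: float = 1.0) -> float:
    """Scaling m ~ c / (f**2 * sqrt(k)) from the Theta bound in the abstract.

    Hypothetical helper: c stands in for the unspecified constant hidden
    in Theta(.), so only ratios between calls are meaningful.
    f: length of the branch to detect; k: sequence length of each gene.
    """
    if f <= 0 or k <= 0:
        raise ValueError("f and k must be positive")
    return c / (f ** 2 * math.sqrt(k))


# Compare settings relative to a baseline; the unknown constant c cancels.
base = genes_needed(0.1, 100)
print(round(genes_needed(0.1, 400) / base, 6))   # quadruple k: prints 0.5
print(round(genes_needed(0.05, 100) / base, 6))  # halve f: prints 4.0
```

Under this scaling, quadrupling the sequence length $k$ halves the required number of genes $m$, while halving the branch length $f$ quadruples it.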

Citation


Elchanan Mossel. Sebastien Roch. "Distance-based species tree estimation under the coalescent: Information-theoretic trade-off between number of loci and sequence length." Ann. Appl. Probab. 27 (5) 2926-2955, October 2017. https://doi.org/10.1214/16-AAP1273

Information

Received: 1 August 2015; Revised: 1 September 2016; Published: October 2017
First available in Project Euclid: 3 November 2017

zbMATH: 1379.92040
MathSciNet: MR3719950
Digital Object Identifier: 10.1214/16-AAP1273

Subjects:
Primary: 60K35, 92D15

Keywords: coalescent theory, phylogenetics, sequence-length requirement

Rights: Copyright © 2017 Institute of Mathematical Statistics
