Abstract
We consider the reconstruction of a phylogeny from multiple genes under the multispecies coalescent. We establish a connection with the sparse signal detection problem, where one seeks to distinguish between a distribution and a mixture of the distribution and a sparse signal. Using this connection, we derive an information-theoretic trade-off between the number of genes, $m$, needed for an accurate reconstruction and the sequence length, $k$, of the genes. Specifically, we show that to detect a branch of length $f$, one needs $m=\Theta(1/[f^{2}\sqrt{k}])$ genes.
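As a rough illustration of this scaling, one can read the $\Theta$ bound as a proportionality, $m \approx C/(f^{2}\sqrt{k})$, where the constant $C$ is a hypothetical placeholder that the abstract does not specify. Under that simplifying assumption:

$$\frac{m(f,4k)}{m(f,k)} = \frac{\sqrt{k}}{\sqrt{4k}} = \frac{1}{2}, \qquad \frac{m(f/2,k)}{m(f,k)} = \frac{f^{2}}{(f/2)^{2}} = 4.$$

So quadrupling the sequence length roughly halves the number of genes needed, while halving the branch length roughly quadruples it.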
Citation
Elchanan Mossel, Sebastien Roch. "Distance-based species tree estimation under the coalescent: Information-theoretic trade-off between number of loci and sequence length." Ann. Appl. Probab. 27 (5) 2926-2955, October 2017. https://doi.org/10.1214/16-AAP1273