Journal of Applied Probability

Computational inference beyond Kingman's coalescent

Jere Koskela, Paul Jenkins, and Dario Spanò

Abstract

Full likelihood inference under Kingman's coalescent is a computationally challenging problem to which importance sampling (IS) and the product of approximate conditionals (PAC) methods have been applied successfully. Both methods can be expressed in terms of families of intractable conditional sampling distributions (CSDs), and rely on principled approximations for accurate inference. Recently, more general Λ- and Ξ-coalescents have been observed to provide better modelling fits to some genetic data sets. We derive families of approximate CSDs for finite sites Λ- and Ξ-coalescents, and use them to obtain 'approximately optimal' IS and PAC algorithms for Λ-coalescents, yielding substantial gains in efficiency over existing methods.
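The PAC idea can be illustrated in a deliberately simple setting: the likelihood of a sample is written as a product of conditional sampling probabilities, each giving the chance of the next haplotype given those already drawn. The sketch below assumes Kingman's coalescent with infinitely-many-alleles mutation at rate θ > 0. In that case the CSD is known exactly (Hoppe's urn): the next lineage copies an existing type j with probability n_j/(n + θ) or carries a new type with probability θ/(n + θ). It is not the paper's finite-sites Λ- or Ξ-coalescent CSD. The function names `hoppe_csd` and `pac_likelihood` are hypothetical, and the `csd` argument stands in for any approximate CSD.

```python
from fractions import Fraction
from typing import Callable, Hashable, Sequence

# Hypothetical signature for a CSD: (types seen so far, next type, theta) -> probability.
CSD = Callable[[Sequence[Hashable], Hashable, Fraction], Fraction]


def hoppe_csd(history: Sequence[Hashable], next_type: Hashable, theta: Fraction) -> Fraction:
    """Exact CSD for Kingman's coalescent under infinitely-many-alleles mutation.

    With n types already sampled, the next lineage copies type j with
    probability n_j / (n + theta), or is a new mutant type with probability
    theta / (n + theta). Assumes theta > 0.
    """
    n = len(history)
    count = sum(1 for t in history if t == next_type)
    if count == 0:
        # A label not seen before is interpreted as a novel mutant type.
        return theta / (n + theta)
    return Fraction(count) / (n + theta)


def pac_likelihood(sample: Sequence[Hashable], theta: Fraction, csd: CSD = hoppe_csd) -> Fraction:
    """Product of (approximate) conditionals: pi(h_1) * prod_k pi(h_{k+1} | h_1..h_k)."""
    likelihood = Fraction(1)
    for k in range(len(sample)):
        likelihood *= csd(sample[:k], sample[k], theta)
    return likelihood


if __name__ == "__main__":
    # Ordered sample A, A, B with theta = 1: 1 * 1/2 * 1/3 = 1/6.
    print(pac_likelihood(["A", "A", "B"], Fraction(1)))
```

Because the CSD here is exact, the product does not depend on the order in which haplotypes are added (`["B", "A", "A"]` also gives 1/6). With an approximate CSD, the product generally does depend on that order.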

Article information

Source
J. Appl. Probab., Volume 52, Number 2 (2015), 519–537.

Dates
First available in Project Euclid: 23 July 2015

Permanent link to this document
https://projecteuclid.org/euclid.jap/1437658613

Digital Object Identifier
doi:10.1239/jap/1437658613

Mathematical Reviews number (MathSciNet)
MR3372090

Zentralblatt MATH identifier
1347.60120

Subjects
Primary: 60G09: Exchangeability
Secondary: 93E10: Estimation and detection [See also 60G35]; 92D25: Population dynamics (general)

Keywords
Lambda-coalescent; xi-coalescent; product of approximate conditionals; importance sampling; conditional sampling distribution; population genetics

Citation

Koskela, Jere; Jenkins, Paul; Spanò, Dario. Computational inference beyond Kingman's coalescent. J. Appl. Probab. 52 (2015), no. 2, 519–537. doi:10.1239/jap/1437658613. https://projecteuclid.org/euclid.jap/1437658613

References

  • Árnason, E. (2004). Mitochondrial cytochrome b DNA variation in the high-fecundity Atlantic cod: trans-Atlantic clines and shallow gene genealogy. Genetics 166, 1871–1885.
  • Birkner, M. and Blath, J. (2008). Computing likelihoods for coalescents with multiple collisions in the infinitely many sites model. J. Math. Biol. 57, 435–465.
  • Birkner, M. and Blath, J. (2009). Measure-valued diffusions, general coalescents and population genetic inference. In Trends in Stochastic Analysis (London Math. Soc. Lecture Notes Ser. 353), Cambridge University Press, pp. 329–363.
  • Birkner, M., Blath, J. and Eldon, B. (2013). An ancestral recombination graph for diploid populations with skewed offspring distribution. Genetics 193, 255–290.
  • Birkner, M., Blath, J. and Steinrücken, M. (2011). Importance sampling for Lambda-coalescents in the infinitely many sites model. Theoret. Pop. Biol. 79, 155–173.
  • Birkner, M. et al. (2009). A modified lookdown construction for the Xi–Fleming–Viot process with mutation and populations with recurrent bottlenecks. ALEA Lat. Amer. J. Prob. Math. Statist. 6, 25–61.
  • Boom, J. D. G., Boulding, E. G. and Beckenbach, A. T. (1994). Mitochondrial DNA variation in introduced populations of Pacific oyster, Crassostrea gigas, in British Columbia. Canad. J. Fish. Aquat. Sci. 51, 1608–1614.
  • De Iorio, M. and Griffiths, R. C. (2004). Importance sampling on coalescent histories. I. Adv. Appl. Prob. 36, 417–433.
  • De Iorio, M. and Griffiths, R. C. (2004). Importance sampling on coalescent histories. II. Subdivided population models. Adv. Appl. Prob. 36, 434–454.
  • De Iorio, M., Griffiths, R. C., Leblois, R. and Rousset, F. (2005). Stepwise mutation likelihood computation by sequential importance sampling in subdivided population models. Theoret. Pop. Biol. 68, 41–53.
  • Donnelly, P. and Kurtz, T. G. (1999). Particle representations for measure-valued population models. Ann. Prob. 27, 166–205.
  • Eldon, B. and Wakeley, J. (2006). Coalescent processes when the distribution of offspring number among individuals is highly skewed. Genetics 172, 2621–2633.
  • Fearnhead, P. and Donnelly, P. (2001). Estimating recombination rates from population genetic data. Genetics 159, 1299–1318.
  • Felsenstein, J., Kuhner, M. K., Yamato, J. and Beerli, P. (1999). Likelihoods on Coalescents: A Monte Carlo Sampling Approach to Inferring Parameters from Population Samples of Molecular Data (IMS Lect. Notes Monogr. Ser. 33), Institute of Mathematical Statistics, Hayward, CA, pp. 163–185.
  • Görür, D. and Teh, Y. W. (2008). An efficient sequential Monte Carlo algorithm for coalescent clustering. In Advances in Neural Information Processing Systems 21 (NIPS 2008), 8pp.
  • Griffiths, R. C. and Marjoram, P. (1996). Ancestral inference from samples of DNA sequences with recombination. J. Comput. Biol. 3, 479–502.
  • Griffiths, R. C. and Tavaré, S. (1994). Ancestral inference in population genetics. Statist. Sci. 9, 307–319.
  • Griffiths, R. C. and Tavaré, S. (1994). Sampling theory for neutral alleles in a varying environment. Phil. Trans. R. Soc. London B 344, 403–410.
  • Griffiths, R. C. and Tavaré, S. (1994). Simulating probability distributions in the coalescent. Theoret. Pop. Biol. 46, 131–159.
  • Griffiths, R. C. and Tavaré, S. (1999). The ages of mutations in gene trees. Ann. Appl. Prob. 9, 567–590.
  • Griffiths, R. C., Jenkins, P. A. and Song, Y. S. (2008). Importance sampling and the two-locus model with subdivided population structure. Adv. Appl. Prob. 40, 473–500.
  • Hobolth, A., Uyenoyama, M. K. and Wiuf, C. (2008). Importance sampling for the infinite sites model. Statist. Appl. Genet. Mol. Biol. 7, Article 32.
  • Jenkins, P. A. (2012). Stopping-time resampling and population genetic inference under coalescent models. Statist. Appl. Genet. Mol. Biol. 11, Article 9.
  • Jenkins, P. A. and Griffiths, R. C. (2011). Inference from samples of DNA sequences using a two-locus model. J. Comput. Biol. 18, 109–127.
  • Kingman, J. F. C. (1982). The coalescent. Stoch. Process. Appl. 13, 235–248.
  • Li, N. and Stephens, M. (2003). Modeling linkage disequilibrium and identifying recombination hotspots using single-nucleotide polymorphism data. Genetics 165, 2213–2233.
  • Meng, X.-L. and Wong, W. H. (1996). Simulating ratios of normalizing constants via a simple identity: a theoretical exploration. Statist. Sinica 6, 831–860.
  • Möhle, M. (2006). On sampling distributions for coalescent processes with simultaneous multiple collisions. Bernoulli 12, 35–53.
  • Möhle, M. and Sagitov, S. (2001). A classification of coalescent processes for haploid exchangeable population models. Ann. Prob. 29, 1547–1562.
  • Möhle, M. and Sagitov, S. (2003). Coalescent patterns in diploid exchangeable population models. J. Math. Biol. 47, 337–352.
  • Paul, J. S. and Song, Y. S. (2010). A principled approach to deriving approximate conditional sampling distributions in population genetic models with recombination. Genetics 186, 321–338.
  • Paul, J. S., Steinrücken, M. and Song, Y. S. (2011). An accurate sequentially Markov conditional sampling distribution for the coalescent with recombination. Genetics 187, 1115–1128.
  • Pitman, J. (1999). Coalescents with multiple collisions. Ann. Prob. 27, 1870–1902.
  • Sagitov, S. (1999). The general coalescent with asynchronous mergers of ancestral lines. J. Appl. Prob. 36, 1116–1125.
  • Sargsyan, O. and Wakeley, J. (2008). A coalescent process with simultaneous multiple mergers for approximating the gene genealogies of many marine organisms. Theoret. Pop. Biol. 74, 104–114.
  • Schweinsberg, J. (2000). Coalescents with simultaneous multiple collisions. Electron. J. Prob. 5, 50pp.
  • Schweinsberg, J. (2003). Coalescent processes obtained from supercritical Galton–Watson processes. Stoch. Process. Appl. 106, 107–139.
  • Sheehan, S., Harris, K. and Song, Y. S. (2013). Estimating variable effective population sizes from multiple genomes: a sequentially Markov conditional sampling distribution approach. Genetics 194, 647–662.
  • Steinrücken, M., Birkner, M. and Blath, J. (2013). Analysis of DNA sequence variation within marine species using Beta-coalescents. Theoret. Pop. Biol. 87, 15–24.
  • Steinrücken, M., Paul, J. S. and Song, Y. S. (2013). A sequentially Markov conditional sampling distribution for structured populations with migration and recombination. Theoret. Pop. Biol. 87, 51–61.
  • Stephens, M. and Donnelly, P. (2000). Inference in molecular population genetics. J. R. Statist. Soc. B 62, 605–655.
  • Taylor, J. E. and Véber, A. (2009). Coalescent processes in subdivided populations subject to recurrent mass extinctions. Electron. J. Prob. 14, 242–288.