Electronic Journal of Probability

Tractable diffusion and coalescent processes for weakly correlated loci

Paul Fearnhead, Paul Jenkins, and Yun Song



Widely used models in genetics include the Wright-Fisher diffusion and its moment dual, Kingman's coalescent. Each has a multilocus extension, but under neither extension is the sampling distribution available in closed form, and computing it is extremely difficult. In this paper we derive two new multilocus population genetic models, one a diffusion and the other a coalescent process, which are much simpler than the standard models but capture their key properties for large recombination rates. The diffusion model is based on a central limit theorem for density-dependent population processes, and we show that its sampling distribution is a linear combination of moments of Gaussian distributions and hence available in closed form. The coalescent process is based on a probabilistic coupling of the ancestral recombination graph to a simpler genealogical process that exposes the leading dynamics of the former. We further show that, when the sampling distribution is expressed as an asymptotic expansion in inverse powers of the recombination parameter, the sampling distributions of the new models agree with those of the standard models up to the first two orders.
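The closed-form claim rests on the fact that every moment of a multivariate Gaussian can be computed exactly. The sketch below is not the authors' construction; it only illustrates two generic ingredients, under stated assumptions. First, for any random vector X of population haplotype frequencies, the probability of observing haplotype counts (n_1, ..., n_K) in a sample drawn multinomially given X is the multinomial coefficient times E[X_1^{n_1} ... X_K^{n_K}]. Second, if X is hypothetically taken to be Gaussian with mean `mean` and covariance `cov`, that moment follows from Isserlis' (Wick's) theorem in its standard extension to nonzero means: sum over all ways of splitting the factors into unpaired singletons, each contributing its mean, and pairs, each contributing a covariance. The function names `gaussian_moment` and `sampling_probability` are illustrative only.

```python
from math import factorial, prod
from typing import Sequence


def gaussian_moment(indices: Sequence[int],
                    mean: Sequence[float],
                    cov: Sequence[Sequence[float]]) -> float:
    """Return E[X_{i_1} * ... * X_{i_k}] for X ~ N(mean, cov).

    Repeated indices are allowed, so E[X_0**2 * X_1] is gaussian_moment([0, 0, 1], ...).
    Uses the Isserlis/Wick expansion with nonzero means: the first factor is
    either left unpaired (contributing its mean) or paired with one of the
    remaining factors (contributing their covariance).
    """
    idx = list(indices)
    if not idx:
        return 1.0  # empty product
    first, rest = idx[0], idx[1:]
    # First factor unpaired: contributes mean[first].
    total = mean[first] * gaussian_moment(rest, mean, cov)
    # First factor paired with the factor at position j of the rest.
    for j, other in enumerate(rest):
        remaining = rest[:j] + rest[j + 1:]
        total += cov[first][other] * gaussian_moment(remaining, mean, cov)
    return total


def sampling_probability(counts: Sequence[int],
                         mean: Sequence[float],
                         cov: Sequence[Sequence[float]]) -> float:
    """Multinomial-moment sampling probability of haplotype counts.

    Assumes (hypothetically) that the population haplotype frequencies X are
    Gaussian with the given mean and covariance, and that the sample is drawn
    by multinomial sampling given X.
    """
    n = sum(counts)
    coef = factorial(n) // prod(factorial(c) for c in counts)
    # Expand X_h**n_h into a list with index h repeated n_h times.
    indices = [h for h, c in enumerate(counts) for _ in range(c)]
    return coef * gaussian_moment(indices, mean, cov)


if __name__ == "__main__":
    print(gaussian_moment([0, 0, 0], [2.0], [[1.0]]))  # 14.0
    # Two haplotypes whose Gaussian frequencies sum to 1 almost surely.
    m, v = [0.3, 0.7], 0.01
    S = [[v, -v], [-v, v]]
    print(sum(sampling_probability((k, 2 - k), m, S) for k in range(3)))
```

With one coordinate of mean 2 and variance 1, the third moment is 14.0, matching the textbook formula μ³ + 3μσ². The recursion enumerates every pairing and so grows factorially with the sample size; it is meant to show why the answer is available in closed form, not to serve as an efficient evaluator.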

Article information

Electron. J. Probab., Volume 20 (2015), paper no. 58, 25 pp.

Accepted: 29 May 2015
First available in Project Euclid: 4 June 2016

Mathematical Subject Classification: Primary 92D15 (Problems related to evolution)

Keywords: diffusion; sampling distribution; coupling; population genetics; recombination

This work is licensed under a Creative Commons Attribution 3.0 License.


Fearnhead, Paul; Jenkins, Paul; Song, Yun. Tractable diffusion and coalescent processes for weakly correlated loci. Electron. J. Probab. 20 (2015), paper no. 58, 25 pp. doi:10.1214/EJP.v20-3564. https://projecteuclid.org/euclid.ejp/1465067164

References
  • Baake, Ellen; Herms, Inke. Single-crossover dynamics: finite versus infinite populations. Bull. Math. Biol. 70 (2008), no. 2, 603–624.
  • Bhaskar, Anand; Song, Yun S. Closed-form asymptotic sampling distributions under the coalescent with recombination for an arbitrary number of loci. Adv. in Appl. Probab. 44 (2012), no. 2, 391–407.
  • Bhaskar, Anand; Kamm, John A.; Song, Yun S. Approximate sampling formulae for general finite-alleles models of mutation. Adv. in Appl. Probab. 44 (2012), no. 2, 408–428.
  • Birkner, M.; Blath, J.; Eldon, B. An ancestral recombination graph for diploid populations with skewed offspring distribution. Genetics 193 (2013), 255–290.
  • Boitard, S.; Loisel, P. Probability distribution of haplotype frequencies under the two-locus Wright-Fisher model by diffusion approximation. Theoretical Population Biology 71 (2007), 380–391.
  • Chan, A. H.; Jenkins, P. A.; Song, Y. S. Genome-wide fine-scale recombination rate variation in Drosophila melanogaster. PLoS Genetics 8 (2012), no. 12, e1003090.
  • Etheridge, A. M.; Griffiths, R. C. A coalescent dual process in a Moran model with genic selection. Theoretical Population Biology 75 (2009), 320–330.
  • Ethier, S. N. A limit theorem for two-locus diffusion models in population genetics. J. Appl. Probab. 16 (1979), no. 2, 402–408.
  • Ethier, S. N.; Griffiths, R. C. On the two-locus sampling distribution. J. Math. Biol. 29 (1990), no. 2, 131–159.
  • Ethier, S. N.; Nagylaki, Thomas. Diffusion approximations of Markov chains with two time scales and applications to population genetics. Adv. in Appl. Probab. 12 (1980), no. 1, 14–49.
  • Ethier, S. N.; Nagylaki, Thomas. Diffusion approximations of Markov chains with two time scales and applications to population genetics. II. Adv. in Appl. Probab. 20 (1988), no. 3, 525–545.
  • Ethier, S. N.; Nagylaki, Thomas. Diffusion approximations of the two-locus Wright-Fisher model. J. Math. Biol. 27 (1989), no. 1, 17–28.
  • Ethier, Stewart N.; Kurtz, Thomas G. Markov processes. Characterization and convergence. Wiley Series in Probability and Mathematical Statistics: Probability and Mathematical Statistics. John Wiley & Sons, Inc., New York, 1986. x+534 pp. ISBN: 0-471-08186-8
  • Ewens, W. J. The sampling theory of selectively neutral alleles. Theoret. Population Biology 3 (1972), 87–112; erratum, ibid. 3 (1972), 240; erratum, ibid. 3 (1972), 376.
  • Ewens, Warren J. Mathematical population genetics. Biomathematics, 9. Springer-Verlag, Berlin-New York, 1979. xii+325 pp. ISBN: 3-540-09577-2
  • Fearnhead, P.; Donnelly, P. Estimating recombination rates from population genetic data. Genetics 159 (2001), 1299–1318.
  • Feder, A. F.; Kryazhimskiy, S.; Plotkin, J. B. Identifying signatures of selection in genetic time series. Genetics 196 (2014), 509–522.
  • Feller, William. Diffusion processes in genetics. Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability, 1950, pp. 227–246. University of California Press, Berkeley and Los Angeles, 1951.
  • Golding, G. B. The sampling distribution of linkage disequilibrium. Genetics 108 (1984), 257–274.
  • Griffiths, R. C. The two-locus ancestral graph. Selected Proceedings of the Sheffield Symposium on Applied Probability (Sheffield, 1989), 100–117, IMS Lecture Notes Monogr. Ser., 18, Inst. Math. Statist., Hayward, CA, 1991.
  • Griffiths, R. C.; Marjoram, P. Ancestral inference from samples of DNA sequences with recombination. Journal of Computational Biology 3 (1996), no. 4, 479–502.
  • Griffiths, Robert C.; Jenkins, Paul A.; Song, Yun S. Importance sampling and the two-locus model with subdivided population structure. Adv. in Appl. Probab. 40 (2008), no. 2, 473–500.
  • Jenkins, P. A.; Griffiths, R. C. Inference from samples of DNA sequences using a two-locus model. Journal of Computational Biology 18 (2011), no. 1, 109–127.
  • Jenkins, P. A.; Song, Y. S. Closed-form two-locus sampling distributions: accuracy and universality. Genetics 183 (2009), 1087–1103.
  • Jenkins, Paul A.; Song, Yun S. An asymptotic sampling formula for the coalescent with recombination. Ann. Appl. Probab. 20 (2010), no. 3, 1005–1028.
  • Jenkins, P. A.; Song, Y. S. The effect of recurrent mutation on the frequency spectrum of a segregating site and the age of an allele. Theoretical Population Biology 80 (2011), no. 2, 158–173.
  • Jenkins, Paul A.; Song, Yun S. Padé approximants and exact two-locus sampling distributions. Ann. Appl. Probab. 22 (2012), no. 2, 576–607.
  • Kang, Hye-Won; Kurtz, Thomas G.; Popovic, Lea. Central limit theorems and diffusion approximations for multiscale Markov chain models. Ann. Appl. Probab. 24 (2014), no. 2, 721–759.
  • Kaplan, N.; Darden, T.; Hudson, R. R. The coalescent process in models with selection. Genetics 120 (1988), 819–829.
  • Kingman, J. F. C. The coalescent. Stochastic Process. Appl. 13 (1982), no. 3, 235–248.
  • Kuhner, M. K.; Yamato, J.; Felsenstein, J. Maximum likelihood estimation of recombination rates from population data. Genetics 156 (2000), 1393–1401.
  • Kurtz, T. G. Limit theorems for sequences of jump Markov processes approximating ordinary differential processes. J. Appl. Probability 8 (1971), 344–356.
  • Michalowicz, J. V.; Nichols, J. M.; Bucholtz, F.; Olson, C. C. A general Isserlis theorem for mixed-Gaussian random variables. Statist. Probab. Lett. 81 (2011), no. 8, 1233–1240.
  • Miura, C. On an approximate formula for the distribution of 2-locus 2-allele model with mutual mutations. Genes and Genetic Systems 86 (2011), 207–214.
  • Möhle, M. A convergence theorem for Markov chains arising in population genetics and the coalescent with selfing. Adv. in Appl. Probab. 30 (1998), no. 2, 493–512.
  • Nagylaki, Thomas. The Gaussian approximation for random genetic drift. Evolutionary processes and theory (Israel, 1985), 629–642, Academic Press, Orlando, FL, 1986.
  • Nagylaki, Thomas. Models and approximations for random genetic drift. Theoret. Population Biol. 37 (1990), no. 1, 192–212.
  • Nielsen, R. Estimation of population parameters and recombination rates from single nucleotide polymorphisms. Genetics 154 (2000), 931–942.
  • Norman, M. Frank. Markov processes and learning models. Mathematics in Science and Engineering, Vol. 84. Academic Press, New York-London, 1972. xiii+274 pp.
  • Norman, M. Frank. Approximation of stochastic processes by Gaussian diffusions, and applications to Wright-Fisher genetic models. SIAM J. Appl. Math. 29 (1975), no. 2, 225–242.
  • Ohta, T.; Kimura, M. Linkage disequilibrium due to random genetic drift. Genetical Research 13 (1969), no. 1, 47–55.
  • Ohta, T.; Kimura, M. Linkage disequilibrium at steady state determined by random genetic drift and recurrent mutations. Genetics 63 (1969), 229–238.
  • Rasmussen, M. D.; Hubisz, M. J.; Gronau, I.; Siepel, A. Genome-wide inference of ancestral recombination graphs. PLoS Genetics 10 (2014), no. 5, e1004342.
  • Wakeley, J. The limits of theoretical population genetics. Genetics 169 (2005), 1–7.
  • Wakeley, J. Coalescent theory: an introduction. Roberts & Company Publishers, Greenwood Village, Colorado, 2008.
  • Wakeley, J.; Sargsyan, O. The conditional ancestral selection graph with strong balancing selection. Theoretical Population Biology 75 (2009), 355–364.
  • Wang, Y.; Rannala, B. Bayesian inference of fine-scale recombination rates using population genomic data. Philosophical Transactions of the Royal Society B 363 (2008), no. 1512, 3921–3930.
  • Wright, S. Adaptation and selection. In Genetics, Paleontology and Evolution (G. L. Jepson, E. Mayr, and G. G. Simpson, eds.), 365–389. Princeton University Press, Princeton, 1949.