The Annals of Applied Probability

An asymptotic sampling formula for the coalescent with recombination

Paul A. Jenkins and Yun S. Song
Source: Ann. Appl. Probab. Volume 20, Number 3 (2010), 1005-1028.

Abstract

The Ewens sampling formula (ESF) is a one-parameter family of probability distributions with a number of intriguing combinatorial connections. This elegant closed-form formula first arose in biology as the stationary probability distribution of a sample configuration at one locus under the infinite-alleles model of mutation. Since its discovery in the early 1970s, the ESF has been used in various biological applications and has sparked several interesting mathematical generalizations. In the population genetics community, extending the underlying random-mating model to include recombination has received much attention, but no general closed-form sampling formula is currently known even for the simplest extension, that is, a model with two loci. In this paper, we show that it is possible to obtain useful closed-form results when the population-scaled recombination rate ρ is large but not necessarily infinite. Specifically, we consider an asymptotic expansion of the two-locus sampling formula in inverse powers of ρ and obtain closed-form expressions for the first few terms in the expansion. Our asymptotic sampling formula applies to arbitrary sample sizes and configurations.
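
For readers unfamiliar with the one-locus ESF mentioned in the abstract, the following is a minimal sketch of its standard form: for a sample of size n in which a_j allele types are each observed exactly j times, the probability is n! / (θ(θ+1)...(θ+n-1)) multiplied by the product over j of θ^(a_j) / (j^(a_j) a_j!), where θ is the population-scaled mutation rate. The function names (`ewens_probability`, `integer_partitions`, and so on) are hypothetical and chosen only for illustration. The sketch covers the one-locus formula alone and does not attempt the paper's two-locus expansion in inverse powers of ρ.

```python
from fractions import Fraction
from math import factorial


def rising_factorial(theta, n):
    """Return theta * (theta + 1) * ... * (theta + n - 1) exactly."""
    result = Fraction(1)
    for i in range(n):
        result *= theta + i
    return result


def ewens_probability(counts, theta):
    """One-locus Ewens sampling formula.

    counts[j - 1] is the number of allele types observed exactly j times
    (illustrative convention); theta is the scaled mutation rate.
    """
    theta = Fraction(theta)
    n = sum(j * a for j, a in enumerate(counts, start=1))
    prob = Fraction(factorial(n)) / rising_factorial(theta, n)
    for j, a in enumerate(counts, start=1):
        prob *= theta ** a / (Fraction(j) ** a * factorial(a))
    return prob


def integer_partitions(n, largest=None):
    """Yield each partition of n as a non-increasing list of parts."""
    if largest is None:
        largest = n
    if n == 0:
        yield []
        return
    for part in range(min(n, largest), 0, -1):
        for rest in integer_partitions(n - part, part):
            yield [part] + rest


def to_counts(partition, n):
    """Convert a list of parts into the count vector (a_1, ..., a_n)."""
    counts = [0] * n
    for part in partition:
        counts[part - 1] += 1
    return counts


def total_probability(n, theta):
    """Sum the ESF over every sample configuration of size n."""
    return sum(
        ewens_probability(to_counts(p, n), theta)
        for p in integer_partitions(n)
    )
```

Exact rational arithmetic is used so that the normalization can be checked without rounding error: `total_probability(n, theta)` should equal 1 for any positive θ, and for n = 2 the formula reduces to probability θ/(θ+1) that the two sampled alleles differ.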

Primary Subjects: 92D15
Secondary Subjects: 65C50, 92D10
Links and Identifiers

Permanent link to this document: http://projecteuclid.org/euclid.aoap/1276867305
Digital Object Identifier: doi:10.1214/09-AAP646
Zentralblatt MATH identifier: 1193.92077
Mathematical Reviews number (MathSciNet): MR2680556

2013 © Institute of Mathematical Statistics
