Advances in Applied Probability

Approximate sampling formulae for general finite-alleles models of mutation

Anand Bhaskar, John A. Kamm, and Yun S. Song
Source: Adv. in Appl. Probab. Volume 44, Number 2 (2012), 408-428.

Abstract

Many applications in genetic analyses utilize sampling distributions, which describe the probability of observing a sample of DNA sequences randomly drawn from a population. In the one-locus case with special models of mutation, such as the infinite-alleles model or the finite-alleles parent-independent mutation model, closed-form sampling distributions under the coalescent have been known for many decades. However, no exact formula is currently known for more general models of mutation that are of biological interest. In this paper, models with finitely many alleles are considered, and an urn construction related to the coalescent is used to derive approximate closed-form sampling formulae for an arbitrary irreducible recurrent mutation model or for a reversible recurrent mutation model, depending on whether the number of distinct observed allele types is at most three or four, respectively. It is demonstrated empirically that the formulae derived here are highly accurate when the per-base mutation rate is low, which holds for many biological organisms.
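As an illustration of the kind of closed-form sampling distribution the abstract refers to for the infinite-alleles model, the following is a minimal sketch of the Ewens sampling formula (Ewens, 1972). It is not taken from the paper and does not implement the approximate formulae derived there. The function names `rising_factorial` and `ewens_probability`, and the encoding of a sample as a list of allele-frequency counts, are assumptions made for this example.

```python
from math import factorial


def rising_factorial(x, n):
    """Return x * (x + 1) * ... * (x + n - 1), with value 1 when n == 0."""
    result = 1.0
    for i in range(n):
        result *= x + i
    return result


def ewens_probability(counts, theta):
    """Ewens sampling formula under the infinite-alleles model.

    counts[j - 1] is the number of allele types seen exactly j times
    in the sample (an illustrative encoding, assumed here).
    theta is the scaled mutation rate.
    """
    # Sample size implied by the frequency counts.
    n = sum(j * a for j, a in enumerate(counts, start=1))
    prob = factorial(n) / rising_factorial(theta, n)
    for j, a in enumerate(counts, start=1):
        prob *= (theta / j) ** a / factorial(a)
    return prob


# The three possible configurations for a sample of size 3.
configs = [[3, 0, 0], [1, 1, 0], [0, 0, 1]]
total = sum(ewens_probability(c, theta=2.0) for c in configs)
```

For a sample of size two, the formula reduces to the familiar result that the two sequences carry different alleles with probability theta / (1 + theta), and the probabilities over all configurations of a fixed sample size sum to one.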

Primary Subjects: 92D15
Secondary Subjects: 65C50, 92D10, 41A58
Links and Identifiers

Permanent link to this document: http://projecteuclid.org/euclid.aap/1339878718
Digital Object Identifier: doi:10.1239/aap/1339878718
Zentralblatt MATH identifier: 06055128
Mathematical Reviews number (MathSciNet): MR2977402


2013 © Applied Probability Trust
