## Advances in Applied Probability

- Adv. in Appl. Probab.
- Volume 44, Number 2 (2012), 408-428.

### Approximate sampling formulae for general finite-alleles models of mutation

Anand Bhaskar, John A. Kamm, and Yun S. Song

#### Abstract

Many applications in genetic analyses utilize sampling distributions, which describe the probability of observing a sample of DNA sequences randomly drawn from a population. In the one-locus case with special models of mutation, such as the infinite-alleles model or the finite-alleles parent-independent mutation model, closed-form sampling distributions under the coalescent have been known for many decades. However, no exact formula is currently known for more general models of mutation that are of biological interest. In this paper, models with finitely many alleles are considered, and an urn construction related to the coalescent is used to derive approximate closed-form sampling formulae for an arbitrary irreducible recurrent mutation model or for a reversible recurrent mutation model, depending on whether the number of distinct observed allele types is at most three or four, respectively. It is demonstrated empirically that the formulae derived here are highly accurate when the per-base mutation rate is low, which holds for many biological organisms.
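
As background for the abstract's reference to known closed forms, the following is a minimal sketch of the classical sampling formula for the finite-alleles parent-independent mutation (PIM) model. It is not one of the approximate formulae derived in the paper. The sketch assumes the standard parametrization in which allele i arises at scaled rate θ_i = θπ_i, so that stationary allele frequencies are Dirichlet(θ_1, ..., θ_K) and the unordered sample counts follow a Dirichlet-multinomial distribution. The function names and the example rates are illustrative assumptions.

```python
from math import exp, lgamma


def log_rising_factorial(x, n):
    """Return log of x * (x + 1) * ... * (x + n - 1), for x > 0 and n >= 0."""
    return lgamma(x + n) - lgamma(x)


def pim_sampling_probability(counts, thetas):
    """Probability of an unordered sample under the PIM model.

    counts[i] is the number of sampled copies of allele i.
    thetas[i] is the scaled mutation rate towards allele i (assumed > 0).
    """
    n = sum(counts)
    theta = sum(thetas)
    # Ordered-sample probability: prod_i (theta_i)_(n_i) / (theta)_(n),
    # where (x)_(n) denotes the rising factorial.
    log_p = -log_rising_factorial(theta, n)
    for n_i, theta_i in zip(counts, thetas):
        log_p += log_rising_factorial(theta_i, n_i)
    # Multinomial coefficient converts the ordered probability
    # into the probability of the unordered configuration.
    log_p += lgamma(n + 1) - sum(lgamma(n_i + 1) for n_i in counts)
    return exp(log_p)


# Example: three alleles, sample of size 5 (rates chosen for illustration only).
p = pim_sampling_probability([3, 2, 0], [0.1, 0.2, 0.7])
```

Working in log space with `lgamma` keeps the computation stable for large samples. For mutation models that are not parent-independent, no such exact product form is known, which is the gap the paper's approximations address.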

#### Article information

**Source**

Adv. in Appl. Probab. Volume 44, Number 2 (2012), 408-428.

**Dates**

First available in Project Euclid: 16 June 2012

**Permanent link to this document**

http://projecteuclid.org/euclid.aap/1339878718

**Digital Object Identifier**

doi:10.1239/aap/1339878718

**Mathematical Reviews number (MathSciNet)**

MR2977402

**Zentralblatt MATH identifier**

1241.92053

**Subjects**

Primary: 92D15: Problems related to evolution

Secondary: 65C50: Other computational problems in probability; 92D10: Genetics {For genetic algebras, see 17D92}; 41A58: Series expansions (e.g. Taylor, Lidstone series, but not Fourier series)

**Keywords**

Sampling probability; coalescent theory; urn model; martingale

#### Citation

Bhaskar, Anand; Kamm, John A.; Song, Yun S. Approximate sampling formulae for general finite-alleles models of mutation. Adv. in Appl. Probab. 44 (2012), no. 2, 408-428. doi:10.1239/aap/1339878718. http://projecteuclid.org/euclid.aap/1339878718.