## The Annals of Applied Probability

### On Poisson approximations for the Ewens sampling formula when the mutation parameter grows with the sample size

Koji Tsukuda

#### Abstract

The Ewens sampling formula was first introduced in the context of population genetics by Warren John Ewens in 1972, and has since appeared in many other scientific fields. There are abundant approximation results associated with the Ewens sampling formula, especially when one of the parameters, the sample size $n$ or the mutation parameter $\theta$ (which denotes the scaled mutation rate), tends to infinity while the other is fixed. By contrast, the case in which $\theta$ grows with $n$ has been considered in relatively few works, although this asymptotic setup is also natural. In this paper, we advance the study of the asymptotic properties of the total number of alleles and of the component counts in the allelic partition under the Ewens sampling formula when $\theta$ grows with $n$, from the viewpoint of Poisson approximations. Specifically, the main contributions are Poisson approximations of the total number of alleles, an independent process approximation of small component counts, and functional central limit theorems, under the asymptotic regime in which both $n$ and $\theta$ tend to infinity.
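As a numerical illustration (not taken from the paper), the quality of a Poisson approximation to the total number of alleles can be explored with a short script. The sketch below relies on the standard fact that, under the Ewens sampling formula, the number of alleles $K_n$ is distributed as a sum of independent Bernoulli variables with success probabilities $\theta/(\theta+i-1)$, $i=1,\dots,n$. Comparing against a Poisson law with the same mean is an assumption made here for simplicity; the approximating laws studied in the paper may be parametrized differently. The function names `allele_count_pmf`, `poisson_pmf`, and `tv_to_poisson` are hypothetical.

```python
import math


def allele_count_pmf(n, theta):
    """Exact pmf of K_n as a sum of independent Bernoulli(theta / (theta + i - 1))."""
    pmf = [1.0]  # distribution of the empty sum: P(K_0 = 0) = 1
    for i in range(1, n + 1):
        p = theta / (theta + i - 1)
        new = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            new[k] += mass * (1.0 - p)   # i-th sampled gene is an existing allele
            new[k + 1] += mass * p       # i-th sampled gene is a new allele
        pmf = new
    return pmf  # pmf[k] = P(K_n = k) for k = 0, ..., n


def poisson_pmf(lam, kmax):
    """Poisson(lam) probabilities for k = 0, ..., kmax, computed in log space."""
    return [math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
            for k in range(kmax + 1)]


def tv_to_poisson(n, theta):
    """Total variation distance between K_n and a mean-matched Poisson law.

    Mean matching is an illustrative assumption, not the paper's construction.
    """
    pmf = allele_count_pmf(n, theta)
    mean = sum(theta / (theta + i - 1) for i in range(1, n + 1))
    pois = poisson_pmf(mean, n)
    tail = max(0.0, 1.0 - sum(pois))  # Poisson mass above n, where K_n has none
    return 0.5 * (sum(abs(a - b) for a, b in zip(pmf, pois)) + tail)


if __name__ == "__main__":
    # Fixed theta versus theta growing with n (values chosen arbitrarily).
    for n in (50, 200, 800):
        print(n, tv_to_poisson(n, 1.0), tv_to_poisson(n, math.sqrt(n)))
```

Running the script for a range of $n$ with $\theta$ fixed and with $\theta$ growing gives a rough feel for how the two regimes differ; it is only a finite-sample check and says nothing about rates.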

#### Article information

Source
Ann. Appl. Probab., Volume 29, Number 2 (2019), 1188–1232.

Dates
Received: August 2017
Revised: August 2018
First available in Project Euclid: 24 January 2019

Permanent link to this document
https://projecteuclid.org/euclid.aoap/1548298939

Digital Object Identifier
doi:10.1214/18-AAP1433

Mathematical Reviews number (MathSciNet)
MR3910026

Zentralblatt MATH identifier
07047447

#### Citation

Tsukuda, Koji. On Poisson approximations for the Ewens sampling formula when the mutation parameter grows with the sample size. Ann. Appl. Probab. 29 (2019), no. 2, 1188–1232. doi:10.1214/18-AAP1433. https://projecteuclid.org/euclid.aoap/1548298939

#### References

• Antoniak, C. E. (1974). Mixtures of Dirichlet processes with applications to Bayesian nonparametric problems. Ann. Statist. 2 1152–1174.
• Arratia, R., Barbour, A. D. and Tavaré, S. (1992). Poisson process approximations for the Ewens sampling formula. Ann. Appl. Probab. 2 519–535.
• Arratia, R., Barbour, A. D. and Tavaré, S. (2000). Limits of logarithmic combinatorial structures. Ann. Probab. 28 1620–1644.
• Arratia, R., Barbour, A. D. and Tavaré, S. (2016). Exploiting the Feller coupling for the Ewens sampling formula [comment on MR3458585]. Statist. Sci. 31 27–29.
• Arratia, R. and DeSalvo, S. (2016). Probabilistic divide-and-conquer: A new exact simulation method, with integer partitions as an example. Combin. Probab. Comput. 25 324–351.
• Arratia, R., Stark, D. and Tavaré, S. (1995). Total variation asymptotics for Poisson process approximations of logarithmic combinatorial assemblies. Ann. Probab. 23 1347–1388.
• Arratia, R. and Tavaré, S. (1992a). Limit theorems for combinatorial structures via discrete process approximations. Random Structures Algorithms 3 321–345.
• Arratia, R. and Tavaré, S. (1992b). The cycle structure of random permutations. Ann. Probab. 20 1567–1591.
• Arratia, R., Barbour, A. D., Ewens, W. J. and Tavaré, S. (2018). Dual diffusions, killed diffusions, and the age distribution problem in population genetics. Theor. Popul. Biol. 122 5–11.
• Barbour, A. D. (1992). Refined approximations for the Ewens sampling formula. Random Structures Algorithms 3 267–276.
• Barbour, A. D. and Hall, P. (1984). On the rate of Poisson convergence. Math. Proc. Cambridge Philos. Soc. 95 473–480.
• Barbour, A. D., Holst, L. and Janson, S. (1992). Poisson Approximation. Oxford Studies in Probability 2. The Clarendon Press, New York.
• Crane, H. (2016). The ubiquitous Ewens sampling formula. Statist. Sci. 31 1–19.
• DeLaurentis, J. M. and Pittel, B. G. (1985). Random permutations and Brownian motion. Pacific J. Math. 119 287–301.
• DeSalvo, S. (2018). Probabilistic divide-and-conquer: Deterministic second half. Adv. in Appl. Math. 92 17–50.
• Ewens, W. J. (1972). The sampling theory of selectively neutral alleles. Theor. Popul. Biol. 3 87–112; erratum, ibid. 3 (1972), 240; erratum, ibid. 3 (1972), 376.
• Favaro, S. and James, L. F. (2016). Relatives of the Ewens sampling formula in Bayesian nonparametrics [comment on MR3458585]. Statist. Sci. 31 30–33.
• Feng, S. (2007). Large deviations associated with Poisson–Dirichlet distribution and Ewens sampling formula. Ann. Appl. Probab. 17 1570–1595.
• Feng, S. (2010). The Poisson–Dirichlet Distribution and Related Topics: Models and Asymptotic Behaviors. Springer, Heidelberg.
• Feng, S. (2016). Diffusion processes and the Ewens sampling formula [comment on MR3458585]. Statist. Sci. 31 20–22.
• Ferguson, T. S. (1973). A Bayesian analysis of some nonparametric problems. Ann. Statist. 1 209–230.
• Flajolet, P. and Soria, M. (1990). Gaussian limiting distributions for the number of components in combinatorial structures. J. Combin. Theory Ser. A 53 165–182.
• Goncharov, V. L. (1944). Some facts from combinatorics. Izv. Akad. Nauk SSSR, Ser. Mat. 8 3–48.
• Hansen, J. C. (1990). A functional central limit theorem for the Ewens sampling formula. J. Appl. Probab. 27 28–43.
• Johnson, N. L., Kotz, S. and Balakrishnan, N. (1997). Discrete Multivariate Distributions. Wiley, New York.
• Knuth, D. E. and Wilf, H. S. (1989). A short proof of Darboux’s lemma. Appl. Math. Lett. 2 139–140.
• Mano, S. (2017). Extreme sizes in Gibbs-type exchangeable random partitions. Ann. Inst. Statist. Math. 69 1–37.
• Pitman, J. (1995). Exchangeable and partially exchangeable random partitions. Probab. Theory Related Fields 102 145–158.
• Shepp, L. A. and Lloyd, S. P. (1966). Ordered cycle lengths in a random permutation. Trans. Amer. Math. Soc. 121 340–357.
• Teh, Y. W. (2016). Bayesian nonparametric modeling and the ubiquitous Ewens sampling formula [comment on MR3458585]. Statist. Sci. 31 34–36.
• Tsukuda, K. (2017a). A change detection procedure for an ergodic diffusion process. Ann. Inst. Statist. Math. 69 833–864.
• Tsukuda, K. (2017b). Estimating the large mutation parameter of the Ewens sampling formula. J. Appl. Probab. 54 42–54. Correction: to appear in J. Appl. Probab. 55, no. 3.
• Tsukuda, K. (2018). Functional central limit theorems in $L^{2}(0,1)$ for logarithmic combinatorial assemblies. Bernoulli 24 1033–1052.
• van der Vaart, A. W. (1998). Asymptotic Statistics. Cambridge Series in Statistical and Probabilistic Mathematics 3. Cambridge Univ. Press, Cambridge.
• Varron, D. (2014). Donsker and Glivenko–Cantelli theorems for a class of processes generalizing the empirical process. Electron. J. Stat. 8 2296–2320.
• Watterson, G. A. (1974a). Models for the logarithmic species abundance distributions. Theor. Popul. Biol. 6 217–250.
• Watterson, G. A. (1974b). The sampling theory of selectively neutral alleles. Adv. in Appl. Probab. 6 463–488.
• Yamato, H. (2013). Edgeworth expansions for the number of distinct components associated with the Ewens sampling formula. J. Japan Statist. Soc. 43 17–28.
• Yannaros, N. (1991). Poisson approximation for random sums of Bernoulli random variables. Statist. Probab. Lett. 11 161–165.