## The Annals of Applied Probability

### Ordered and size-biased frequencies in GEM and Gibbs’ models for species sampling

#### Abstract

We describe the distribution of frequencies ordered by sample values in a random sample of size $n$ from the two-parameter $\mathsf{GEM}(\alpha,\theta)$ random discrete distribution on the positive integers. These frequencies are a (size-$\alpha$)-biased random permutation of the sample frequencies in either ranked order, or in the order of appearance of values in the sampling process. This generalizes a well-known identity in distribution due to Donnelly and Tavaré [Adv. in Appl. Probab. 18 (1986) 1–19] for $\alpha=0$ to the case $0\le\alpha<1$. This description extends to sampling from $\operatorname{Gibbs}(\alpha)$ frequencies obtained by suitable conditioning of the $\mathsf{GEM}(\alpha,\theta)$ model, and yields a value-ordered version of the Chinese restaurant construction of $\mathsf{GEM}(\alpha,\theta)$ and $\operatorname{Gibbs}(\alpha)$ frequencies in the more usual size-biased order of their appearance. The proofs are based on a general construction of a finite sample $(X_{1},\dots,X_{n})$ from any random frequencies in size-biased order from the associated exchangeable random partition $\Pi_{\infty}$ of $\mathbb{N}$ which they generate.
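As an informal illustration of the sampling setup (not code from the paper), the sketch below assumes the standard stick-breaking representation of $\mathsf{GEM}(\alpha,\theta)$, namely $P_j = W_j\prod_{i<j}(1-W_i)$ with independent $W_j\sim\mathrm{Beta}(1-\alpha,\theta+j\alpha)$, and draws a sample $(X_1,\dots,X_n)$ from one realisation of these frequencies. It then tabulates the sample frequencies in the three orders mentioned above: by value, by appearance, and ranked. All function names are hypothetical and chosen only for this example.

```python
import random
from collections import Counter


def sample_gem_values(n, alpha, theta, rng=None):
    """Draw X_1..X_n i.i.d. given one GEM(alpha, theta) realisation.

    Assumes the stick-breaking form P_j = W_j * prod_{i<j} (1 - W_i)
    with W_j ~ Beta(1 - alpha, theta + j * alpha), generated lazily.
    """
    if not (0 <= alpha < 1) or theta <= -alpha:
        raise ValueError("need 0 <= alpha < 1 and theta > -alpha")
    rng = rng or random.Random()
    w = []  # stick-breaking factors W_1, W_2, ... shared by all draws
    values = []
    for _ in range(n):
        j = 0
        while True:
            if j == len(w):
                # Extend the sequence of factors only when a draw reaches it.
                w.append(rng.betavariate(1 - alpha, theta + (j + 1) * alpha))
            # Stop at index j with probability W_j, so that
            # P(X = j + 1 | W) = W_{j+1} * prod_{i <= j} (1 - W_i).
            if rng.random() < w[j]:
                break
            j += 1
        values.append(j + 1)  # values live on the positive integers
    return values


def value_ordered_frequencies(values):
    """Counts of the distinct sample values, listed by increasing value."""
    counts = Counter(values)
    return [counts[v] for v in sorted(counts)]


def appearance_ordered_frequencies(values):
    """Counts listed in order of first appearance in the sample."""
    counts = Counter(values)  # keeps first-insertion order (Python 3.7+)
    return list(counts.values())


def ranked_frequencies(values):
    """Counts listed in decreasing order."""
    return sorted(Counter(values).values(), reverse=True)


if __name__ == "__main__":
    xs = sample_gem_values(20, alpha=0.5, theta=1.0, rng=random.Random(0))
    print("value order:     ", value_ordered_frequencies(xs))
    print("appearance order:", appearance_ordered_frequencies(xs))
    print("ranked order:    ", ranked_frequencies(xs))
```

The three lists always contain the same multiset of counts. The abstract's result concerns how the value-ordered list is distributed relative to the other two, which this sketch lets one explore by simulation but does not itself verify.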

#### Article information

Source
Ann. Appl. Probab., Volume 28, Number 3 (2018), 1793–1820.

Dates
Revised: August 2017
First available in Project Euclid: 1 June 2018

Permanent link to this document
https://projecteuclid.org/euclid.aoap/1527840032

Digital Object Identifier
doi:10.1214/17-AAP1343

Mathematical Reviews number (MathSciNet)
MR3809477

Zentralblatt MATH identifier
06919738

Subjects
Primary: 60C05: Combinatorial probability
Secondary: 60G09: Exchangeability

#### Citation

Pitman, Jim; Yakubovich, Yuri. Ordered and size-biased frequencies in GEM and Gibbs’ models for species sampling. Ann. Appl. Probab. 28 (2018), no. 3, 1793–1820. doi:10.1214/17-AAP1343. https://projecteuclid.org/euclid.aoap/1527840032

#### References

• [1] Arratia, R., Barbour, A. D. and Tavaré, S. (2003). Logarithmic Combinatorial Structures: A Probabilistic Approach. European Mathematical Society (EMS), Zürich.
• [2] Bacallado, S., Favaro, S. and Trippa, L. (2015). Looking-backward probabilities for Gibbs-type exchangeable random partitions. Bernoulli 21 1–37.
• [3] Cerquetti, A. (2008). On a Gibbs characterization of normalized generalized Gamma processes. Statist. Probab. Lett. 78 3123–3128.
• [4] Cerquetti, A. (2009). A generalized sequential construction of exchangeable Gibbs partitions with application. In S. Co. 2009. Sixth Conference. Complex Data Modeling and Computationally Intensive Statistical Methods for Estimation and Prediction 115–120. Maggioli Editore, Santarcangelo di Romagna.
• [5] Cerquetti, A. (2013). Marginals of multivariate Gibbs distributions with applications in Bayesian species sampling. Electron. J. Stat. 7 697–716.
• [6] Cerquetti, A. (2013). Some contributions to the theory of conditional Gibbs partitions. In Complex Models and Computational Methods in Statistics. 77–89. Physica-Verlag/Springer, Milan.
• [7] Cesari, O., Favaro, S. and Nipoti, B. (2014). Posterior analysis of rare variants in Gibbs-type species sampling models. J. Multivariate Anal. 131 79–98.
• [8] Costantini, C., De Blasi, P., Ethier, S. N., Ruggiero, M. and Spanò, D. (2017). Wright–Fisher construction of the two-parameter Poisson–Dirichlet diffusion. Ann. Appl. Probab. 27 1923–1950.
• [9] Crane, H. (2016). The ubiquitous Ewens sampling formula. Statist. Sci. 31 1–19.
• [10] Crane, H. (2016). Rejoinder: The ubiquitous Ewens sampling formula. Statist. Sci. 31 37–39.
• [11] De Blasi, P., Favaro, S., Lijoi, A., Mena, R. H., Prünster, I. and Ruggiero, M. (2015). Are Gibbs-type priors the most natural generalization of the Dirichlet process? IEEE Trans. Pattern Anal. Mach. Intell. 37 212–229.
• [12] Donnelly, P. (1991). The heaps process, libraries, and size-biased permutations. J. Appl. Probab. 28 321–335.
• [13] Donnelly, P. and Joyce, P. (1991). Consistent ordered sampling distributions: Characterization and convergence. Adv. in Appl. Probab. 23 229–258.
• [14] Donnelly, P. and Tavaré, S. (1986). The ages of alleles and a coalescent. Adv. in Appl. Probab. 18 1–19.
• [15] Ethier, S. N. (1990). The distribution of the frequencies of age-ordered alleles in a diffusion model. Adv. in Appl. Probab. 22 519–532.
• [16] Favaro, S. and James, L. F. (2015). A note on nonparametric inference for species variety with Gibbs-type priors. Electron. J. Stat. 9 2884–2902.
• [17] Feng, S. (2010). The Poisson–Dirichlet Distribution and Related Topics: Models and Asymptotic Behaviors. Springer, Heidelberg.
• [18] Gnedin, A., Haulk, C. and Pitman, J. (2010). Characterizations of exchangeable partitions and random discrete distributions by deletion properties. In Probability and Mathematical Genetics. London Mathematical Society Lecture Note Series 378 264–298. Cambridge Univ. Press, Cambridge.
• [19] Gnedin, A. and Pitman, J. (2004). Regenerative partition structures. Electron. J. Combin. 11 Research Paper 12.
• [20] Gnedin, A. and Pitman, J. (2005). Regenerative composition structures. Ann. Probab. 33 445–479.
• [21] Gnedin, A. and Pitman, J. (2005). Exchangeable Gibbs partitions and Stirling triangles. Zap. Nauchn. Sem. S.-Peterburg. Otdel. Mat. Inst. Steklov. (POMI) 325 83–102, 244–245.
• [22] Gnedin, A. V. (1997). The representation of composition structures. Ann. Probab. 25 1437–1450.
• [23] Gnedin, A. V. (2010). Regeneration in random combinatorial structures. Probab. Surv. 7 105–156.
• [24] Griffiths, R. C. and Spanò, D. (2007). Record indices and age-ordered frequencies in exchangeable Gibbs partitions. Electron. J. Probab. 12 1101–1130.
• [25] Halmos, P. R. (1944). Random alms. Ann. Math. Stat. 15 182–189.
• [26] Ho, M.-W., James, L. F. and Lau, J. W. (2007). Gibbs partitions (EPPF’s) derived from a stable subordinator are Fox $H$- and Meijer $G$-transforms. Preprint. Available at arXiv:0708.0619v2.
• [27] James, L. F. (2006). Poisson calculus for spatial neutral to the right processes. Ann. Statist. 34 416–440.
• [28] Kerov, S. (2005). Coherent random allocations, and the Ewens–Pitman formula. Zap. Nauchn. Sem. S.-Peterburg. Otdel. Mat. Inst. Steklov. (POMI) 325 127–145, 246. PDMI Preprint, Steklov Math. Institute, St. Petersburg, 1995.
• [29] Kingman, J. F. C. (1978). The representation of partition structures. J. Lond. Math. Soc. (2) 18 374–380.
• [30] Lijoi, A., Prünster, I. and Walker, S. G. (2008). Investigating nonparametric priors with Gibbs structure. Statist. Sinica 18 1653–1668.
• [31] Lijoi, A., Prünster, I. and Walker, S. G. (2008). Bayesian nonparametric estimators derived from conditional Gibbs structures. Ann. Appl. Probab. 18 1519–1547.
• [32] Perman, M., Pitman, J. and Yor, M. (1992). Size-biased sampling of Poisson point processes and excursions. Probab. Theory Related Fields 92 21–39.
• [33] Petrov, L. A. (2009). A two-parameter family of infinite-dimensional diffusions on the Kingman simplex. Funktsional. Anal. i Prilozhen. 43 45–66.
• [34] Pitman, J. (1995). Exchangeable and partially exchangeable random partitions. Probab. Theory Related Fields 102 145–158.
• [35] Pitman, J. (1996). Random discrete distributions invariant under size-biased permutation. Adv. in Appl. Probab. 28 525–539.
• [36] Pitman, J. (2003). Poisson–Kingman partitions. In Statistics and Science: A Festschrift for Terry Speed. Institute of Mathematical Statistics Lecture Notes—Monograph Series 40 1–34. IMS, Beachwood, OH.
• [37] Pitman, J. (2006). Combinatorial Stochastic Processes. Lecture Notes in Math. 1875. Springer, Berlin.
• [38] Pitman, J. and Yakubovich, Y. (2017). Extremes and gaps in sampling from a GEM random discrete distribution. Electron. J. Probab. 22 Paper No. 44.
• [39] Sawyer, S. and Hartl, D. (1985). A sampling theory for local selection. J. Genet. 64 21–29.
• [40] Watterson, G. A. (1976). Reversibility and the age of an allele. I. Moran’s infinitely many neutral alleles model. Theor. Popul. Biol. 10 239–253.
• [41] Watterson, G. A. (1977). Reversibility and the age of an allele. II. Two-allele models, with selection and mutation. Theor. Popul. Biol. 12 179–196.
• [42] Watterson, G. A. and Guess, H. A. (1977). Is the most frequent allele the oldest? Theor. Popul. Biol. 11 141–160.
• [43] Whittaker, E. T. and Watson, G. N. (1996). A Course of Modern Analysis: An Introduction to the General Theory of Infinite Processes and of Analytic Functions; with an Account of the Principal Transcendental Functions. Cambridge Univ. Press, Cambridge. Reprint of the fourth (1927) edition.