## Electronic Journal of Probability

### Extremes and gaps in sampling from a GEM random discrete distribution

#### Abstract

We show that in a sample of size $n$ from a $\mathsf{GEM}(0,\theta)$ random discrete distribution, the gaps $G_{i:n} := X_{n-i+1:n} - X_{n-i:n}$ between order statistics $X_{1:n} \le \cdots \le X_{n:n}$ of the sample, with the convention $G_{n:n} := X_{1:n} - 1$, are distributed like the first $n$ terms of an infinite sequence of independent geometric$(i/(i+\theta))$ variables $G_i$. This extends a known result for the minimum $X_{1:n}$ to other gaps in the range of the sample, and implies that the maximum $X_{n:n}$ has the distribution of $1 + \sum_{i=1}^n G_i$, hence the known result that $X_{n:n}$ grows like $\theta \log(n)$ as $n \to \infty$, with an asymptotically normal distribution. Other consequences include most known formulas for the exact distributions of $\mathsf{GEM}(0,\theta)$ sampling statistics, including the Ewens and Donnelly–Tavaré sampling formulas. For the two-parameter $\mathsf{GEM}(\alpha,\theta)$ distribution we show that the maximal value grows like a random multiple of $n^{\alpha/(1-\alpha)}$ and find the limit distribution of the multiplier.
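The identity for the maximum can be checked numerically. The Python sketch below is an illustration, not the authors' method, and it rests on two labeled assumptions: $\mathsf{GEM}(0,\theta)$ is realized by the standard stick-breaking construction with i.i.d. $\mathrm{Beta}(1,\theta)$ sticks, and "geometric$(p)$" counts failures before the first success (support $\{0,1,\dots\}$), which matches the convention $G_{n:n} = X_{1:n} - 1 \ge 0$. Under that convention $\mathbb{E}[G_i] = \theta/i$, so $\mathbb{E}[X_{n:n}] = 1 + \theta H_n$ with $H_n = \sum_{i=1}^n 1/i$. The function names `gem_sample`, `max_via_gaps`, and `exact_mean_max` are hypothetical.

```python
import math
import random


def gem_sample(n, theta, rng):
    """Draw n values i.i.d. from one random realization of GEM(0, theta).

    Assumed construction (stick-breaking): P_j = W_j * prod_{k<j} (1 - W_k)
    with W_1, W_2, ... i.i.d. Beta(1, theta). Given the sticks, one draw equals j
    with probability P_j: it is the first index j at which a fresh Uniform(0, 1)
    variable falls below W_j. Sticks are generated lazily and shared by all n draws.
    """
    sticks = []
    sample = []
    for _ in range(n):
        j = 0
        while True:
            if j == len(sticks):
                sticks.append(rng.betavariate(1.0, theta))
            if rng.random() < sticks[j]:
                break
            j += 1
        sample.append(j + 1)  # values live on {1, 2, ...}
    return sample


def max_via_gaps(n, theta, rng):
    """Return 1 + G_1 + ... + G_n with independent G_i ~ geometric(i / (i + theta)).

    Each G_i counts failures before the first success (support {0, 1, ...}) and is
    sampled by inversion, using P(G_i >= k) = q**k with q = theta / (i + theta).
    """
    total = 1
    for i in range(1, n + 1):
        q = theta / (i + theta)  # failure probability 1 - i/(i + theta)
        u = 1.0 - rng.random()   # uniform on (0, 1], so log(u) is finite
        total += math.floor(math.log(u) / math.log(q))
    return total


def exact_mean_max(n, theta):
    """E[X_{n:n}] = 1 + sum_i E[G_i] = 1 + theta * H_n, since E[G_i] = q/(1-q) = theta/i."""
    return 1 + theta * sum(1.0 / i for i in range(1, n + 1))


if __name__ == "__main__":
    rng = random.Random(2017)
    n, theta, reps = 50, 2.0, 4000
    direct = sum(max(gem_sample(n, theta, rng)) for _ in range(reps)) / reps
    via_gaps = sum(max_via_gaps(n, theta, rng) for _ in range(reps)) / reps
    print(f"direct sampling: {direct:.3f}")
    print(f"gap sum:         {via_gaps:.3f}")
    print(f"1 + theta * H_n: {exact_mean_max(n, theta):.3f}")
```

Both Monte Carlo averages should agree with $1 + \theta H_n$ up to sampling error. Comparing full histograms of the two maxima, or the sample minimum against $1 + G_n$, would give a sharper check of the distributional identity than the means alone.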

#### Article information

Source
Electron. J. Probab., Volume 22 (2017), paper no. 44, 26 pp.

Dates
Accepted: 21 April 2017
First available in Project Euclid: 3 May 2017

https://projecteuclid.org/euclid.ejp/1493777020

Digital Object Identifier
doi:10.1214/17-EJP59

Mathematical Reviews number (MathSciNet)
MR3646070

Zentralblatt MATH identifier
1364.60069

#### Citation

Pitman, Jim; Yakubovich, Yuri. Extremes and gaps in sampling from a GEM random discrete distribution. Electron. J. Probab. 22 (2017), paper no. 44, 26 pp. doi:10.1214/17-EJP59. https://projecteuclid.org/euclid.ejp/1493777020

#### References

• [1] Gerold Alsmeyer, Alexander Iksanov, and Alexander Marynych, Functional limit theorems for the number of occupied boxes in the Bernoulli sieve, Stochastic Process. Appl. 127 (2017), no. 3, 995–1017.
• [2] Charles E. Antoniak, Mixtures of Dirichlet processes with applications to Bayesian nonparametric problems, Ann. Statist. 2 (1974), 1152–1174.
• [3] Richard Arratia, A. D. Barbour, and Simon Tavaré, Poisson process approximations for the Ewens sampling formula, Ann. Appl. Probab. 2 (1992), no. 3, 519–535.
• [4] Richard Arratia, A. D. Barbour, and Simon Tavaré, Logarithmic combinatorial structures: a probabilistic approach, EMS Monographs in Mathematics, European Mathematical Society (EMS), Zürich, 2003.
• [5] Richard Arratia, A. D. Barbour, and Simon Tavaré, A tale of three couplings: Poisson–Dirichlet and GEM approximations for random permutations, Combin. Probab. Comput. 15 (2006), nos 1–2, 31–62.
• [6] Richard Arratia, A. D. Barbour, and Simon Tavaré, Exploiting the Feller coupling for the Ewens sampling formula [comment on MR3458585], Statist. Sci. 31 (2016), no. 1, 27–29.
• [7] Yuliy Baryshnikov, Bennett Eisenberg, and Gilbert Stengle, A necessary and sufficient condition for the existence of the limiting probability of a tie for first place, Statist. Probab. Lett. 23 (1995), no. 3, 203–209.
• [8] Arup Bose, Anirban Dasgupta, and Herman Rubin, A contemporary review and bibliography of infinitely divisible distributions and processes, Sankhyā Ser. A 64 (2002), no. 3, part 2, 763–819, Special issue in memory of D. Basu.
• [9] Jos J. A. M. Brands, Frederik W. Steutel, and Roeland J. G. Wilms, On the number of maxima in a discrete sample, Statist. Probab. Lett. 20 (1994), no. 3, 209–217.
• [10] Franz Thomas Bruss and Rudolf Grübel, On the multiplicity of the maximum in a discrete random sample, Ann. Appl. Probab. 13 (2003), no. 4, 1252–1263.
• [11] Cristina Costantini, Pierpaolo De Blasi, Stewart N. Ethier, Matteo Ruggiero, and Dario Spano, Wright–Fisher construction of the two-parameter Poisson–Dirichlet diffusion, arXiv preprint, 2016, arXiv:1601.06064.
• [12] Harry Crane, The ubiquitous Ewens sampling formula, Statist. Sci. 31 (2016), no. 1, 1–19.
• [13] Harry Crane, Rejoinder: The ubiquitous Ewens sampling formula [MR3458586; MR3458587; MR3458588; MR3458589; MR3458590; MR3458585], Statist. Sci. 31 (2016), no. 1, 37–39.
• [14] Herbert A. David and Haikady N. Nagaraja, Order statistics, third ed., Wiley Series in Probability and Statistics, Wiley-Interscience [John Wiley & Sons], Hoboken, NJ, 2003.
• [15] Peter Donnelly, Partition structures, Pólya urns, the Ewens sampling formula, and the ages of alleles, Theoret. Population Biol. 30 (1986), no. 2, 271–288.
• [16] Peter Donnelly, The heaps process, libraries, and size-biased permutations, J. Appl. Probab. 28 (1991), no. 2, 321–335.
• [17] Peter Donnelly and Paul Joyce, Continuity and weak convergence of ranked and size-biased permutations on the infinite simplex, Stochastic Process. Appl. 31 (1989), no. 1, 89–103.
• [18] Peter Donnelly and Simon Tavaré, The ages of alleles and a coalescent, Adv. in Appl. Probab. 18 (1986), no. 1, 1–19.
• [19] Bennett Eisenberg, The number of players tied for the record, Statist. Probab. Lett. 79 (2009), no. 3, 283–288.
• [20] Steinar Engen, Stochastic abundance models, Chapman and Hall, London; Halsted Press [John Wiley & Sons], New York, 1978, With emphasis on biological communities and species diversity, Monographs on Applied Probability and Statistics.
• [21] Steinar Engen, A note on the geometric series as a species frequency model, Biometrika 62 (1975), no. 3, 697–699.
• [22] Arthur Erdélyi, Wilhelm Magnus, Fritz Oberhettinger, and Francesco G. Tricomi, Higher transcendental functions. Vols. I, II, McGraw-Hill Book Company, Inc., New York-Toronto-London, 1953, Based, in part, on notes left by Harry Bateman.
• [23] Warren J. Ewens, The sampling theory of selectively neutral alleles, Theoret. Population Biol. 3 (1972), 87–112; erratum, ibid. 3 (1972), 240; erratum, ibid. 3 (1972), 376.
• [24] Warren J. Ewens, Mathematical population genetics. I, second ed., Interdisciplinary Applied Mathematics, vol. 27, Springer-Verlag, New York, 2004, Theoretical introduction.
• [25] William Feller, An introduction to probability theory and its applications. Vol. I, Third edition, John Wiley & Sons, Inc., New York-London-Sydney, 1968.
• [26] Shui Feng, The Poisson–Dirichlet distribution and related topics, Probability and its Applications (New York), Springer, Heidelberg, 2010, Models and asymptotic behaviors.
• [27] Thomas S. Ferguson, On characterizing distributions by properties of order statistics, Sankhyā Ser. A 29 (1967), 265–278.
• [28] Alexander Gnedin, Ben Hansen, and Jim Pitman, Notes on the occupancy problem with infinitely many boxes: general asymptotics and power laws, Probab. Surv. 4 (2007), 146–171.
• [29] Alexander Gnedin, Alex Iksanov, and Uwe Roesler, Small parts in the Bernoulli sieve, Fifth Colloquium on Mathematics and Computer Science, Discrete Math. Theor. Comput. Sci. Proc., AI, Assoc. Discrete Math. Theor. Comput. Sci., Nancy, 2008, pp. 235–242.
• [30] Alexander Gnedin, Alexander Iksanov, and Alexander Marynych, The Bernoulli sieve: an overview, 21st International Meeting on Probabilistic, Combinatorial, and Asymptotic Methods in the Analysis of Algorithms (AofA’10), Discrete Math. Theor. Comput. Sci. Proc., AM, Assoc. Discrete Math. Theor. Comput. Sci., Nancy, 2010, pp. 329–341.
• [31] Alexander Gnedin and Jim Pitman, Regenerative composition structures, Ann. Probab. 33 (2005), no. 2, 445–479.
• [32] Alexander Gnedin and Jim Pitman, Exchangeable Gibbs partitions and Stirling triangles, Zap. Nauchn. Sem. S.-Peterburg. Otdel. Mat. Inst. Steklov. (POMI) 325 (2005), no. Teor. Predst. Din. Sist. Komb. i Algoritm. Metody. 12, 83–102, 244–245.
• [33] Alexander Gnedin and Jim Pitman, Self-similar and Markov composition structures, Zap. Nauchn. Sem. S.-Peterburg. Otdel. Mat. Inst. Steklov. (POMI) 326 (2005), no. Teor. Predst. Din. Sist. Komb. i Algoritm. Metody. 13, 59–84, 280–281.
• [34] Alexander Gnedin and Jim Pitman, Poisson representation of a Ewens fragmentation process, Combin. Probab. Comput. 16 (2007), no. 6, 819–827.
• [35] Alexander V. Gnedin, On convergence and extensions of size-biased permutations, J. Appl. Probab. 35 (1998), no. 3, 642–650.
• [36] Alexander V. Gnedin, The Bernoulli sieve, Bernoulli 10 (2004), no. 1, 79–96.
• [37] Alexander V. Gnedin, Alexander M. Iksanov, Pavlo Negadajlov, and Uwe Rösler, The Bernoulli sieve revisited, Ann. Appl. Probab. 19 (2009), no. 4, 1634–1655.
• [38] Louis Gordon, A stochastic approach to the gamma function, Amer. Math. Monthly 101 (1994), no. 9, 858–865.
• [39] Ronald L. Graham, Donald E. Knuth, and Oren Patashnik, Concrete mathematics, Addison-Wesley Publishing Company, Advanced Book Program, Reading, MA, 1989, A foundation for computer science.
• [40] Robert C. Griffiths, Unpublished notes, Monash Univ., Melbourne, Australia, 1980.
• [41] Rudolf Grübel and Paweł Hitczenko, Gaps in discrete random samples, J. Appl. Probab. 46 (2009), no. 4, 1038–1051.
• [42] Paul R. Halmos, Random alms, Ann. Math. Statistics 15 (1944), 182–189.
• [43] Paweł Hitczenko and Arnold Knopfmacher, Gap-free compositions and gap-free samples of geometric random variables, Discrete Math. 294 (2005), no. 3, 225–239.
• [44] Tsvetan Ignatov, A constant arising in the asymptotic theory of symmetric groups, and Poisson–Dirichlet measures, Teor. Veroyatnost. i Primenen. 27 (1982), no. 1, 129–140.
• [45] Alexander M. Iksanov, Alexander V. Marynych, and Vladimir A. Vatutin, Weak convergence of finite-dimensional distributions of the number of empty boxes in the Bernoulli sieve, Theory Probab. Appl. 59 (2015), no. 1, 87–113.
• [46] Alexander Iksanov, On the number of empty boxes in the Bernoulli sieve II, Stochastic Process. Appl. 122 (2012), no. 7, 2701–2729.
• [47] Alexander Iksanov, On the number of empty boxes in the Bernoulli sieve I, Stochastics 85 (2013), no. 6, 946–959.
• [48] Lancelot F. James, Lamperti-type laws, Ann. Appl. Probab. 20 (2010), no. 4, 1303–1340.
• [49] John F. C. Kingman, Random discrete distributions, J. Roy. Statist. Soc. Ser. B 37 (1975), 1–22.
• [50] John F. C. Kingman, The representation of partition structures, J. London Math. Soc. (2) 18 (1978), no. 2, 374–380.
• [51] Fabrizio Leisen, Antonio Lijoi, and Christian Paroissin, Limiting behavior of the search cost distribution for the move-to-front rule in the stable case, Statist. Probab. Lett. 81 (2011), no. 12, 1827–1832.
• [52] John William McCloskey, A model for distribution of individuals by species in an environment, Ann. Math. Statistics 35 (1964), no. 4, 1839–1840, Abstract of Ph.D. Thesis, Michigan State University, 1965.
• [53] Valery B. Nevzorov, Records: Mathematical theory, Translations of Mathematical Monographs, vol. 194, American Mathematical Society, Providence, RI, 2001, Translated from the Russian manuscript by D. M. Chibisov.
• [54] Fritz Oberhettinger, Tables of Mellin transforms, Springer-Verlag, New York-Heidelberg, 1974.
• [55] Anthony G. Pakes, The laws of some random series of independent summands, Advances in the theory and practice of statistics, Wiley Ser. Probab. Statist. Appl. Probab. Statist., Wiley, New York, 1997, pp. 499–516.
• [56] Mihael Perman, Jim Pitman, and Marc Yor, Size-biased sampling of Poisson point processes and excursions, Probab. Theory Related Fields 92 (1992), no. 1, 21–39.
• [57] Leonid A. Petrov, A two-parameter family of infinite-dimensional diffusions on the Kingman simplex, Funktsional. Anal. i Prilozhen. 43 (2009), no. 4, 45–66.
• [58] Jim Pitman, Exchangeable and partially exchangeable random partitions, Probab. Theory Related Fields 102 (1995), no. 2, 145–158.
• [59] Jim Pitman, Random discrete distributions invariant under size-biased permutation, Adv. in Appl. Probab. 28 (1996), no. 2, 525–539.
• [60] Jim Pitman, Combinatorial stochastic processes, Lecture Notes in Mathematics, vol. 1875, Springer-Verlag, Berlin, 2006, Lectures from the 32nd Summer School on Probability Theory held in Saint-Flour, July 7–24, 2002, With a foreword by Jean Picard.
• [61] Jim Pitman, Extremes and gaps in sampling from a residual allocation model, 2017, In preparation.
• [62] Jim Pitman and Yuri Yakubovich, Ordered and size-biased frequencies in GEM and Gibbs models for species sampling, arXiv preprint, 2017, arXiv:1704.04732.
• [63] Jim Pitman and Marc Yor, The two-parameter Poisson–Dirichlet distribution derived from a stable subordinator, Ann. Probab. 25 (1997), no. 2, 855–900.
• [64] Yongcheng Qi, A note on the number of maxima in a discrete sample, Statist. Probab. Lett. 33 (1997), no. 4, 373–377.
• [65] Alfréd Rényi, On the theory of order statistics, Acta Math. Acad. Sci. Hungar. 4 (1953), 191–231.
• [66] Philippe Robert and Florian Simatos, Occupancy schemes associated to Yule processes, Adv. in Appl. Probab. 41 (2009), no. 2, 600–622.
• [67] Ian W. Saunders, Simon Tavaré, and G. A. Watterson, On the genealogy of nested subsamples from a haploid population, Adv. in Appl. Probab. 16 (1984), no. 3, 471–491.
• [68] Stanley Sawyer and Daniel Hartl, A sampling theory for local selection, J. Genet. 64 (1985), no. 1, 21–29.
• [69] George P. Yanev and Santanu Chakraborty, A characterization of exponential distribution and the Sukhatme–Rényi decomposition of exponential maxima, Statist. Probab. Lett. 110 (2016), 94–102.