Asymptotics for the Number of Blocks in a Conditional Ewens-Pitman Sampling Model

Electronic Journal of Probability

Abstract

The study of random partitions has been an active research area in probability over the last twenty years. A quantity that has attracted a lot of attention is the number of blocks in the random partition. Depending on the area of application, this quantity could represent the number of species in a sample from a population of individuals, the number of cycles in a random permutation, etc. In the context of Bayesian nonparametric inference such a quantity is associated with the exchangeable random partition induced by sampling from certain prior models, for instance the Dirichlet process and the two parameter Poisson-Dirichlet process. In this paper we generalize some existing asymptotic results from this prior setting to the so-called posterior, or conditional, setting. Specifically, given an initial sample from a two parameter Poisson-Dirichlet process, we establish conditional fluctuation limits and conditional large deviation principles for the number of blocks generated by a large additional sample.


Introduction
Among the various definitions of the Ewens-Pitman sampling model, a simple and intuitive one arises from Zabell [27] in terms of the following urn model. See also Feng and Hoppe [10]. Let $\mathbb{X}$ be a complete and separable metric space and let $\nu$ be a nonatomic probability measure on $\mathbb{X}$. Let $\alpha \in [0, 1)$ and consider an urn that initially contains a black ball with mass $\theta > 0$. Balls are drawn from the urn successively with probabilities proportional to their masses. When a black ball is drawn, it is returned to the urn together with a black ball of mass $\alpha$ and a ball of a new color, which is sampled from $\nu$, with mass $(1 - \alpha)$. When a non-black ball is drawn, it is returned to the urn with an additional ball of the same color with mass one. If $(X_i)_{i \geq 1}$ denotes the sequence of non-black colors, then for any $i \geq 1$,

$$\mathbb{P}[X_{i+1} \in \cdot \mid X_1, \ldots, X_i] = \frac{\theta + j\alpha}{\theta + i}\,\nu(\cdot) + \frac{1}{\theta + i} \sum_{l=1}^{j} (n_l - \alpha)\,\delta_{X_l^*}(\cdot) \qquad (1.1)$$

with $X_1^*, \ldots, X_j^*$ being the $j$ distinct colors in $(X_1, \ldots, X_i)$ with frequencies $\mathbf{n} = (n_1, \ldots, n_j)$. The predictive distribution (1.1) was first introduced in Pitman [21] for any $\alpha \in (0, 1)$ and $\theta > -\alpha$, and it is referred to as the Ewens-Pitman sampling model with parameter $(\alpha, \theta, \nu)$. In particular, Pitman [21] showed that the sequence $(X_i)_{i \geq 1}$ generated by (1.1) is exchangeable and its de Finetti measure $\Pi$ is the distribution of the two parameter Poisson-Dirichlet process $P_{\alpha,\theta,\nu}$ in Perman et al. [20]. Accordingly, we can write

$$X_i \mid P_{\alpha,\theta,\nu} \overset{\text{iid}}{\sim} P_{\alpha,\theta,\nu}, \quad i = 1, \ldots, n, \qquad P_{\alpha,\theta,\nu} \sim \Pi, \qquad (1.2)$$

for any $n \geq 1$.
See Pitman and Yor [23] for details on $P_{\alpha,\theta,\nu}$. For $\alpha = 0$ the urn model generating the $X_i$'s reduces to the one in Hoppe [15], and the Ewens-Pitman sampling model reduces to the celebrated sampling model by Ewens [5]. Accordingly, for $\alpha = 0$ the two parameter Poisson-Dirichlet process reduces to the Dirichlet process by Ferguson [11]. The Ewens sampling model and its two parameter generalization play an important role in many research areas such as population genetics, machine learning, Bayesian nonparametrics, combinatorics and statistical physics. We refer to the monograph by Pitman [25] and references therein for a comprehensive account of these sampling models.
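The urn scheme described above can be simulated directly. The following is a minimal sketch, not taken from the paper: the function name `ewens_pitman_partition` and its arguments are hypothetical, colors are replaced by block indices (the actual draws from $\nu$ are not needed to track the partition), and the first observation is always treated as a new block so that the sketch also covers $-\alpha < \theta \leq 0$ when $\alpha > 0$.

```python
import random


def ewens_pitman_partition(n, alpha, theta, seed=None):
    """Simulate the block frequencies induced by n draws from the urn scheme.

    Hypothetical helper for illustration. Returns a list of block sizes in
    order of appearance; its length plays the role of K_n.
    """
    if not 0.0 <= alpha < 1.0 or theta <= -alpha:
        raise ValueError("need 0 <= alpha < 1 and theta > -alpha")
    rng = random.Random(seed)
    freqs = []
    for i in range(n):
        j = len(freqs)
        if i == 0:
            # The first observation always opens a new block.
            freqs.append(1)
            continue
        # After i draws the total mass is theta + i: the black ball carries
        # theta + j * alpha and block l carries freqs[l] - alpha.
        u = rng.random() * (theta + i)
        new_mass = theta + j * alpha
        if u < new_mass:
            freqs.append(1)
            continue
        u -= new_mass
        chosen = j - 1  # fallback guards against floating-point round-off
        for l, n_l in enumerate(freqs):
            u -= n_l - alpha
            if u < 0:
                chosen = l
                break
        freqs[chosen] += 1
    return freqs


if __name__ == "__main__":
    sizes = ewens_pitman_partition(1000, alpha=0.5, theta=1.0, seed=0)
    print("number of blocks:", len(sizes))
```

Setting `alpha=0` in this sketch corresponds to the Hoppe urn and the Ewens sampling model mentioned in the text.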
According to (1.1) and (1.2), a sample $(X_1, \ldots, X_n)$ from $P_{\alpha,\theta,\nu}$ induces a random partition of the set $\{1, \ldots, n\}$ into $K_n$ blocks with corresponding frequencies $\mathbf{N}_n = (N_1, \ldots, N_{K_n})$. The exchangeable sequence $(X_i)_{i \geq 1}$, then, induces an exchangeable random partition of the set of natural numbers $\mathbb{N}$. See Pitman [21] for details. Such a random partition has been the subject of a rich and active literature and, in particular, there have been several studies on the large $n$ asymptotic behaviour of $K_n$. For any $\alpha \in (0, 1)$ and $q > -1$, let $S_{\alpha,q\alpha}$ be a positive and finite random variable with density function of the form

$$g_{S_{\alpha,q\alpha}}(y) = \frac{\Gamma(q\alpha + 1)}{\alpha\,\Gamma(q + 1)}\, y^{q - 1 - 1/\alpha} f_\alpha(y^{-1/\alpha}),$$

where $f_\alpha$ denotes the density function of a positive $\alpha$-stable random variable. $S_{\alpha,q\alpha}$ is referred to as polynomially tilted $\alpha$-stable. For any $\alpha \in (0, 1)$ and $\theta > -\alpha$ Pitman [22] showed that

$$\lim_{n \to +\infty} \frac{K_n}{n^\alpha} = S_{\alpha,\theta} \quad \text{a.s.} \qquad (1.3)$$

See Pitman [25] and references therein for various generalizations and refinements of the fluctuation limit (1.3). In contrast, for $\alpha = 0$ and $\theta > 0$, Korwar and Hollander [17] showed that

$$\lim_{n \to +\infty} \frac{K_n}{\log n} = \theta \quad \text{a.s.} \qquad (1.4)$$

See Arratia et al. [1] for details. Weak convergence versions of (1.4) and (1.3) can also be derived from general asymptotic results for urn models with weighted balls. The reader is referred to Proposition 16 in Flajolet et al. [12] and Theorem 5 in Janson [16] for details. Fluctuation limits (1.3) and (1.4) show how the parameters $\alpha$ and $\theta$ affect both the clustering structure and the large $n$ asymptotic behaviour of $K_n$. In general, $\alpha \in (0, 1)$ controls the flatness of the distribution of $K_n$: the bigger $\alpha$, the flatter the distribution of $K_n$.

Feng and Hoppe [10] further investigated the large $n$ asymptotic behaviour of $K_n$ and established a large deviation principle for $n^{-1} K_n$. Specifically, for any $\alpha \in (0, 1)$ and $\theta > -\alpha$, they showed that $n^{-1} K_n$ satisfies a large deviation principle with speed $n$ and rate function of the form

$$I_\alpha(x) = \sup_{\lambda} \{\lambda x - \Lambda_\alpha(\lambda)\}, \qquad (1.5)$$

where $\Lambda_\alpha(\lambda) = -\log\bigl(1 - (1 - e^{-\lambda})^{1/\alpha}\bigr)$ for $\lambda > 0$ and $\Lambda_\alpha(\lambda) = 0$ otherwise. Equation (1.3) shows that $K_n$ fluctuates at the scale of $n^\alpha$. This is analogous to a central limit type theorem where the fluctuation occurs at the scale of $\sqrt{n}$. Then the large deviation scaling of $n$ can be understood through a comparison with the classical Cramér theorem where the law of large numbers is at the scale of $n$. In contrast, for $\alpha = 0$ and $\theta > 0$, Equation (1.4) is analogous to a law of large numbers type limit. In particular, it was shown in Feng and Hoppe [10] that $(\log n)^{-1} K_n$ satisfies a large deviation principle with speed $\log n$ and rate function of the form

$$I_\theta(x) = \begin{cases} x \log(x/\theta) - x + \theta & \text{if } x > 0, \\ \theta & \text{if } x = 0, \\ +\infty & \text{if } x < 0. \end{cases}$$

It is worth pointing out that the rate function (1.5) depends only on the parameter $\alpha$, which displays the different roles of the two parameters at different scales. We refer to Feng and Hoppe [10] for an intuitive explanation in terms of an embedding scheme for the Ewens-Pitman sampling model. See also Tavaré [26] for a similar embedding scheme in the Ewens sampling model.
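The $n^\alpha$ and $\log n$ scales can be made concrete at the level of the mean. Under the urn scheme, a new block appears at step $i + 1$ with conditional probability $(\theta + \alpha K_i)/(\theta + i)$, so $E[K_{i+1}] = E[K_i] + (\theta + \alpha E[K_i])/(\theta + i)$. The sketch below iterates this recursion and compares it with the closed form obtained by solving it; the function names are hypothetical, and the sketch assumes $\theta > 0$ so that the recursion can start from an empty sample.

```python
import math


def expected_blocks(n, alpha, theta):
    """E[K_n] by iterating the one-step recursion (assumes theta > 0)."""
    e = 0.0
    for i in range(n):
        # Probability of a new block at step i + 1, averaged over K_i.
        e += (theta + alpha * e) / (theta + i)
    return e


def expected_blocks_closed_form(n, alpha, theta):
    """Closed form of the same recursion (assumes theta > 0).

    For alpha > 0: (theta/alpha) * (prod_{i<n} (theta+alpha+i)/(theta+i) - 1).
    For alpha = 0: sum_{i<n} theta / (theta + i).
    """
    if alpha == 0.0:
        return sum(theta / (theta + i) for i in range(n))
    log_ratio = (math.lgamma(theta + alpha + n) - math.lgamma(theta + alpha)
                 - math.lgamma(theta + n) + math.lgamma(theta))
    return (theta / alpha) * (math.exp(log_ratio) - 1.0)


if __name__ == "__main__":
    alpha, theta = 0.5, 1.0
    for n in (10**2, 10**3, 10**4):
        mean_kn = expected_blocks(n, alpha, theta)
        print(n, mean_kn, mean_kn / n**alpha)
```

By the asymptotics of the Gamma function ratio in the closed form, the rescaled mean $E[K_n]/n^\alpha$ approaches $\Gamma(\theta + 1)/(\alpha\,\Gamma(\theta + \alpha))$ for $\alpha \in (0, 1)$, while for $\alpha = 0$ the mean grows like $\theta \log n$, in line with the two scalings discussed above.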
In this paper we present conditional counterparts of the aforementioned asymptotic results. The problem of studying conditional properties of exchangeable random partitions was first considered in Lijoi et al. [19]. This problem consists in evaluating, conditionally on an initial sample $(X_1, \ldots, X_n)$ from $P_{\alpha,\theta,\nu}$, the distribution of statistics of an additional sample $(X_{n+1}, \ldots, X_{n+m})$, for any $m \geq 1$. Lijoi et al. [19] mainly focused on statistics of the so-called new $X_{n+i}$'s, namely $X_{n+i}$'s that do not coincide with observations in $(X_1, \ldots, X_n)$. Note that, according to (1.1), for any $\alpha \in (0, 1)$ and $\theta > -\alpha$ these statistics depend on $(X_1, \ldots, X_n)$ only through $K_n$. For $\alpha = 0$ and $\theta > 0$ these statistics are independent of $K_n$. A representative example is given by the number $K_m^{(n)}$ of new blocks generated by $(X_{n+1}, \ldots, X_{n+m})$, given $K_n$. As discussed in Lijoi et al. [18], this statistic has direct applications in Bayesian nonparametric inference for species sampling problems arising from ecology, biology, genetics, linguistics, etc. In such a statistical context the distribution $\mathbb{P}[K_m^{(n)} \in \cdot \mid K_n = j]$ takes on the interpretation of the posterior distribution of the number of new species in the additional sample, given an observed sample featuring $j$ species. Hence, the expected value with respect to $\mathbb{P}[K_m^{(n)} \in \cdot \mid K_n = j]$ provides the corresponding Bayesian nonparametric estimator. See, e.g., Griffiths and Spanò [14], Favaro et al. [6], Favaro et al. [7] and Bacallado et al. [2] for other contributions at the interface between Bayesian nonparametrics and species sampling problems.
For any $m \geq 1$, let $(X_1, \ldots, X_n, X_{n+1}, \ldots, X_{n+m})$ be a sample from $P_{\alpha,\theta,\nu}$. Within the conditional framework of Lijoi et al. [19], we investigate the large $m$ asymptotic behaviour of the number $T_m^{(n)}$ of blocks generated by $(X_{n+1}, \ldots, X_{n+m})$, conditionally on the initial part $(X_1, \ldots, X_n)$. With a slight abuse of notation, throughout the paper we write $X \mid Y$ to denote the random variable whose distribution corresponds to the conditional distribution of $X$ given $Y$. The random variable $T_m^{(n)}$ is the sum of $K_m^{(n)}$ and the number $R_m^{(n)}$ of old blocks, namely blocks generated by the $X_{n+i}$'s that coincide with observations in $(X_1, \ldots, X_n)$. Hence, differently from $K_m^{(n)}$, the distribution of $T_m^{(n)}$ depends on the initial sample through both $K_n$ and $\mathbf{N}_n$. In other words, $K_n$ does not provide a sufficient statistic for $T_m^{(n)}$. Intuitively, $K_n$ and $T_m^{(n)} \mid (K_n, \mathbf{N}_n)$ might be expected to have different asymptotic behaviour as $n$ and $m$ tend to infinity, respectively. This turns out to be the case in terms of fluctuation limits. But in terms of large deviations $K_n$ and $T_m^{(n)} \mid (K_n, \mathbf{N}_n)$ have the same asymptotic behaviour. In order to detect the impact on large deviations of the given initial sample one may have to consider different limiting mechanisms. In Bayesian nonparametric inference for species sampling problems, large $m$ conditional asymptotic analyses are typically motivated by the need to approximate quantities of interest from the posterior distribution. See Favaro et al. [6] for a thorough discussion. In this regard, our fluctuation limit provides a useful tool since, as we will see, the computational burden of an exact evaluation of the posterior distribution of $T_m^{(n)} \mid (K_n, \mathbf{N}_n)$ becomes overwhelming for large $j$, $n$ and $m$.
In Section 2 we introduce the random variable $T_m^{(n)} \mid (K_n, \mathbf{N}_n)$ and we present some distributional results for a finite sample size $m$. In Section 3 we study the large $m$ asymptotic behaviour of $T_m^{(n)} \mid (K_n, \mathbf{N}_n)$ in terms of fluctuation limits and large deviation principles. In Section 4 we discuss our results with a view toward Bayesian nonparametric inference for species sampling problems. Some open problems are also discussed.
In addition, let $R_m^{(n)}$ be the number of old blocks detected among the $m$ additional observations. Hence, we introduce

$$T_m^{(n)} = R_m^{(n)} + K_m^{(n)},$$

which is the number of blocks generated by the additional sample. Hereafter we investigate the conditional distribution of $T_m^{(n)}$ given the random partition $(K_n, \mathbf{N}_n)$. We start by deriving falling factorial moments of $T_m^{(n)} \mid (K_n, \mathbf{N}_n)$. The resulting moment formulae are expressed in terms of noncentral generalized factorial coefficients $\mathscr{C}$ and noncentral Stirling numbers of the first kind $s$. Furthermore, we denote by $S$ the noncentral Stirling numbers of the second kind. See Charalambides [3] for an account of these numbers.
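To illustrate the decomposition of the number of blocks generated by the additional sample into new blocks and old blocks, the sketch below continues the urn scheme from a given initial partition and records the number of new blocks, the number of old blocks that are observed again, and their sum. It is an illustrative simulation under the predictive rule of the Ewens-Pitman sampling model, not the exact distributional formulae of this section; the function name and signature are hypothetical, and a non-empty initial sample is assumed.

```python
import random


def simulate_additional_sample(freqs, m, alpha, theta, seed=None):
    """Simulate m additional draws given initial block frequencies `freqs`.

    Hypothetical helper. Returns (k_new, r_old, t_total): the number of new
    blocks, the number of old blocks hit at least once, and their sum.
    """
    if not freqs or any(f < 1 for f in freqs):
        raise ValueError("initial sample must be non-empty with positive sizes")
    if not 0.0 <= alpha < 1.0 or theta <= -alpha:
        raise ValueError("need 0 <= alpha < 1 and theta > -alpha")
    rng = random.Random(seed)
    counts = list(freqs)      # old blocks first, new blocks appended later
    j = len(freqs)            # number of old blocks
    n = sum(freqs)            # initial sample size
    hit_old = [False] * j
    for i in range(n, n + m):
        k = len(counts)
        u = rng.random() * (theta + i)
        new_mass = theta + k * alpha
        if u < new_mass:
            counts.append(1)  # a new block is created
            continue
        u -= new_mass
        chosen = k - 1        # fallback guards against round-off
        for l, c in enumerate(counts):
            u -= c - alpha
            if u < 0:
                chosen = l
                break
        counts[chosen] += 1
        if chosen < j:
            hit_old[chosen] = True
    k_new = len(counts) - j
    r_old = sum(hit_old)
    return k_new, r_old, k_new + r_old


if __name__ == "__main__":
    print(simulate_additional_sample([5, 3, 1, 1], m=500, alpha=0.5, theta=1.0, seed=0))
```

Repeating the simulation over many seeds gives a Monte Carlo approximation of the conditional distribution; the number of old blocks recorded can never exceed the number of blocks in the initial sample.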
Proposition 2.1. Let $(X_1, \ldots, X_n)$ be a sample from $P_{\alpha,\theta,\nu}$ featuring $K_n = j$ blocks with frequencies $\mathbf{N}_n = \mathbf{n}$. Then
i) for any $\alpha \in (0, 1)$ and $\theta > -\alpha$
ii) for $\alpha = 0$ and $\theta > 0$
where we defined

Proof. The random variables K See Proposition 1 and Corollary 1 in Lijoi et al. [19] for details. Then, by a direct application of the Vandermonde identity, we can factorize the falling factorial moment as follows

EJP 19 (2014), paper 21.

and (2.7) Let $\alpha \in (0, 1)$ and $\theta > -\alpha$. We first consider the falling factorial moment of the number of new blocks. For any $r \geq 0$, Equation 25 in Lijoi et al. [19] and Proposition 1 in Favaro et al. [6] lead to

where the last identity is obtained by means of Equation 2.57 and Equation 2.60 in Charalambides [3]. With regard to the falling factorial moment of the number of old blocks, for any $r \geq 0$, Equation 25 in Lijoi et al. [19] and Theorem 1 in Bacallado et al. [2] lead to

The proof of part i) is completed by combining expressions (2.8) and (2.9) with (2.6) and then by integrating with respect to the distribution (2.7). Specifically, we can write

where in the second equality the sum over the index $0 \leq s \leq m$ is obtained by exploiting the fact that (n for any $x \geq 0$ and $0 \leq y \leq x$, for any $a > 0$, $b > 0$, $c > 0$ and for any real number $d$. For $\alpha = 0$ and $\theta > 0$ the result follows by taking the limit of (2.10) as $\alpha \to 0$. Specifically, in taking such a limit we make use of Equation 2.63 in Charalambides [3]. The proof is completed.
$T_m^{(n)} \mid (K_n, \mathbf{N}_n)$. Indeed, by exploiting the relationship between probabilities and falling factorial moments in the case of discrete distributions, formulae (2.4) and (2.5) lead to the following expressions

and

and by means of well-known relationships between falling factorial moments and moments.

Asymptotics for the conditional number of blocks
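We start our conditional asymptotic analysis by establishing a fluctuation limit, as $m$ tends to infinity, for $T_m^{(n)} \mid (K_n, \mathbf{N}_n)$.

Theorem 3.1. Let $S^{(n,j)}_{\alpha,\theta}$ be the product of independent random variables $S_{\alpha,\theta+n}$ and $B_{j+\theta/\alpha,\, n/\alpha-j}$. Then
• for any $\alpha \in (0, 1)$ and $\theta > -\alpha$

$$\lim_{m \to +\infty} \frac{T_m^{(n)}}{m^\alpha} \,\Big|\, (K_n = j, \mathbf{N}_n = \mathbf{n}) = S^{(n,j)}_{\alpha,\theta} \quad \text{a.s.} \qquad (3.1)$$

• for $\alpha = 0$ and $\theta > 0$

$$\lim_{m \to +\infty} \frac{T_m^{(n)}}{\log m} \,\Big|\, (K_n = j, \mathbf{N}_n = \mathbf{n}) = \theta \quad \text{a.s.} \qquad (3.2)$$

As for the unconditional fluctuation limits in (1.3) and (1.4), weak convergence versions of (3.1) and (3.2) can alternatively be derived from general asymptotic results for urn models. See Proposition 16 in Flajolet et al. [12] and Theorem 5 in Janson [16] for details. For any $\alpha \in (0, 1)$ and $\theta > -\alpha$, if $n = j = 0$ then we recover (1.3) as a special case of (3.1). Note that the dependence on $n$ and $j$ in the limiting random variable $S^{(n,j)}_{\alpha,\theta}$ indicates a long lasting impact of the given initial sample $(X_1, \ldots, X_n)$ on the fluctuations. Furthermore, it is clear from Theorem 3.1 that one has $\lim_{m \to +\infty} m^{-1} T_m^{(n)} \mid (K_n, \mathbf{N}_n) = 0$ almost surely. Hereafter we establish a large deviation principle associated with this limiting procedure.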
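As a consistency check on the limiting random variable in Theorem 3.1, its mean can be worked out from the product structure. The computation below is a sketch that assumes the standard negative moment formula $E[Z^{-s}] = \Gamma(1 + s/\alpha)/\Gamma(1 + s)$ for a positive $\alpha$-stable random variable $Z$ with Laplace transform $e^{-\lambda^\alpha}$, which gives $E[S_{\alpha,q\alpha}] = \Gamma(q\alpha + 1)/(\alpha\,\Gamma(q\alpha + \alpha))$, together with the mean $a/(a + b)$ of a Beta random variable with parameter $(a, b)$.

```latex
\begin{align*}
E\bigl[S^{(n,j)}_{\alpha,\theta}\bigr]
  &= E\bigl[S_{\alpha,\theta+n}\bigr]\,
     E\bigl[B_{j+\theta/\alpha,\,n/\alpha-j}\bigr] \\
  &= \frac{\Gamma(\theta+n+1)}{\alpha\,\Gamma(\theta+n+\alpha)}
     \cdot \frac{j+\theta/\alpha}{(\theta+n)/\alpha} \\
  &= \Bigl(j+\frac{\theta}{\alpha}\Bigr)
     \frac{\Gamma(\theta+n)}{\Gamma(\theta+n+\alpha)} .
\end{align*}
```

For $n = j = 0$ this reduces to $\Gamma(\theta + 1)/(\alpha\,\Gamma(\theta + \alpha))$, the mean of $S_{\alpha,\theta}$, which agrees with the remark that (1.3) is recovered as a special case of (3.1). The mean increases with $j$ for fixed $n$, which is one way to see the lasting effect of the initial sample at the $m^\alpha$ scale.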
The study of large deviations for $m^{-1} T_m^{(n)} \mid (K_n, \mathbf{N}_n)$ reduces to the study of large deviations for $m^{-1} K_m^{(n)} \mid K_n$. Indeed, since the number of old blocks $R_m^{(n)}$ is bounded by $K_n$, the random variables $m^{-1} T_m^{(n)} \mid (K_n, \mathbf{N}_n)$ and $m^{-1} K_m^{(n)} \mid K_n$ satisfy the same large deviation principle. As in Feng and Hoppe [10], we establish a large deviation principle. We focus on $\alpha \in (0, 1)$ and $\theta > 0$. For $\alpha = 0$ and $\theta > 0$ the random variables $K_m^{(n)}$ and $K_n$ are independent and, therefore, the large deviation principle for $m^{-1} K_m^{(n)}$ coincides with the large deviation principle for $n^{-1} K_n$ recalled in the Introduction. We start with two lemmas on the moment generating function.

Lemma 3.2. Let $(X_1, \ldots, X_n)$ be a sample from $P_{\alpha,\theta,\nu}$ featuring $K_n = j$ blocks. Then, for any $\alpha \in (0, 1)$ and $\theta > -\alpha$

Proof. The proof reduces to a straightforward application of Proposition 2.1. Indeed, the right-hand side of (3.3) can be expanded in terms of falling factorial moments of

where the falling factorial moment E[(K ) with $r = t$ at the index $i = 0$. Then, by Equation 2.60 and Equation 2.15 in Charalambides [3], we can write

where the last equality is obtained by means of Equation (2.57) in Charalambides [3]. The proof is completed by combining (3.4) with (3.5) and by standard algebraic manipulations.
Lemma 3.3. For any $\alpha \in (0, 1)$ and $\theta = 0$, lim sup

Proof. Let $(a_n)_{n \geq 1}$ be a sequence of increasing positive numbers satisfying $a_n/n \to 1$ as $n \to +\infty$. Then we can find two increasing sequences of positive integers, say $(b_n)_{n \geq 1}$ and $(c_n)_{n \geq 1}$, such that $b_n \leq a_n \leq c_n$ and $\lim_{n \to +\infty} b_n/n = \lim_{n \to +\infty} c_n/n = 1$. Then, by combining Lemma 3.1 with Equation (3.5) in Feng and Hoppe [10], for any 0 < α and x < 1 one obtains

Consider the moment generating function $G_{K_m^{(n)}}(x; \alpha, 0)$. By direct calculation one obtains the identity

where $C_0(n, j, \alpha, v)$ is uniformly bounded in $v$ from above and below by positive constants. Then, lim sup

where the last equality is obtained by a direct application of (3.6). The proof is completed.
Proposition 2.1, Lemma 3.2 and Lemma 3.3 are exploited in order to derive the large We can state the following theorem.

Proof. We only need to prove the large deviation principle for $m^{-1} K_m^{(n)} \mid K_n$. For any $\lambda > 0$, we start by considering $G_{K_m^{(n)}}(x; \alpha, 0)$ and then we move to the general case $\theta > -\alpha$. For any $n \geq 1$ and $1 \leq j \leq n$ let

$$H_m(x; \alpha, 0) = 1 + \sum_{v \geq 1} x^v\, v^{j-n} \binom{n + v\alpha + m - 1}{n + m - 1}.$$
If $n = j$, then $H_m(x; \alpha, 0)$ can be estimated as in (3.6). On the other hand, for $n > j$ the $(n - j)$-th order derivative of $H_m(x; \alpha, 0)$ with respect to $x$ coincides with the following expression

where $g(n, j) = (n - j)!/(n - j)^{n-j}$. For $x \in (0, 1)$ and $\varepsilon \in (0, x)$, integrating $(n - j)$ times over $(0, x)$ leads to

where we used the monotonicity of the function $\sum_{v \geq 0} x^v \binom{n + v\alpha + m - 1}{n + m - 1}$ in the last inequality. Also, since $\lim_{m \to +\infty} \frac{1}{m} \log g(n, j)$ Accordingly, by means of Lemma 3.3, the proof is completed for the case of $\alpha \in (0, 1)$ and $\theta = 0$. Now we consider the case $\theta \neq 0$.

where $C_\theta(n, j, \alpha, v)$ is uniformly bounded in $v$ from above and below and it has a strictly positive lower bound. Accordingly, by choosing a small $\varepsilon$ and two positive constants $c_1$ and $c_2$ such that $x e^{\varepsilon} < 1$ and such that $c_1 e^{-\varepsilon v} \leq C_\theta(n, j, \alpha, v)\, v^{j + \theta/\alpha - n} \leq c_2 e^{\varepsilon v}$, it follows that

The proof is completed by letting $m \to +\infty$ and $\varepsilon \to 0$ in (3.9). Indeed by taking these limits we obtain lim m→+∞ log Then, the large deviation principle for $m^{-1} K_m^{(n)} \mid K_n$ follows by the Gärtner-Ellis theorem. See Dembo and Zeitouni [4] for details.
According to Theorem 3.4, $K_n$ and its conditional counterpart $T_m^{(n)} \mid (K_n, \mathbf{N}_n)$ have the same asymptotic behaviour in terms of large deviations. However, in terms of fluctuation limits, Theorem 3.1 shows that the initial sample $(X_1, \ldots, X_n)$ has a long lasting effect. This is caused by the two different scalings involved, namely $m^{-1}$ for large deviations and $m^{-\alpha}$ for the fluctuations. Since the given initial sample leads to an estimate of the parameters, one would expect the large deviation results to be dramatically different if the sample size $n$ is allowed to grow and leads to large parameters. This kind of behaviour is discussed in Feng [8], where the parameter $\theta$ and the sample size $n$ grow together and the large deviation result depends on the relative growth rate between $n$ and $\theta$.
Note that, if $m$ depends on $n$ and both approach infinity then one can expect very different behaviours. We intend to pursue this study further in a subsequent project. Here, we conclude by providing an explicit expression for (3.10). As in Lemma 3.2, this expression follows by applying Proposition 2.1.

Lemma 3.5. Let $(X_1, \ldots, X_n)$ be a sample from $P_{\alpha,\theta,\nu}$ featuring $K_n = j$ blocks with frequencies $\mathbf{N}_n = \mathbf{n}$. Then
i) for any $\alpha \in (0, 1)$ and $\theta > -\alpha$
ii) for $\alpha = 0$ and $\theta > 0$
where we defined C j,0 = ∅ and C

Proof. We expand the right-hand side of (3.10) in terms of falling factorial moments of $T_m^{(n)} \mid (K_n, \mathbf{N}_n)$ and we apply Proposition 2.1 in which an expression for these moments is given. Specifically, for any $\alpha \in (0, 1)$ and $\theta > -\alpha$ the falling factorial moment E[(T

The proof of i) is completed by combining (3.11) with (3.12) and by standard algebraic manipulations. Finally, for $\alpha = 0$ and $\theta > 0$ the result in ii) follows by exploiting similar arguments.

Discussion
Our results contribute to the study of conditional properties of exchangeable random partitions induced by the Ewens-Pitman sampling model. While focusing on the number

Under the Gibbs-type sampling model with $\alpha \in (0, 1)$, we derived an explicit expression for the distribution of $T_m^{(n)} \mid (K_n, \mathbf{N}_n)$ and a fluctuation limit as $m$ tends to infinity. The corresponding unconditional results for $K_n$ are known from Gnedin and Pitman [13] and Pitman [25]. Work on unconditional and conditional large deviation principles is ongoing. For any $\alpha \in (0, 1)$ our conjecture is that $n^{-1} K_n$ and $m^{-1} T_m^{(n)} \mid (K_n, \mathbf{N}_n)$ satisfy a large deviation principle with speed $n$ and $m$, respectively, and with the same rate function $I_\alpha$ in (1.5). In other words, we conjecture that the large deviation principles for $n^{-1} K_n$ and $m^{-1} T_m^{(n)} \mid (K_n, \mathbf{N}_n)$ are invariant in the class of Gibbs-type sampling models with $\alpha \in (0, 1)$.

$K_m^{(n)}$ the number of new blocks generated by the $L_m^{(n)}$ observations and by $X^*_{K_n+1}, \ldots, X^*_{K_n+K_m^{(n)}}$ their identifying labels.

$\mathbb{1}_{\{S_i > 0\}}$ be the number of old blocks detected among the $m - L_m^{(n)}$ observations in the additional sample. These blocks are termed "old" to be distinguished from the new blocks detected among the $L_m^{(n)}$ observations.

y + c; d, a + b) = y; d, a)C (x − j, c; d, b).
For any $\alpha \in (0, 1)$ and $\theta > -\alpha$, $\lim_{m \to +\infty} m^{-\alpha} R_m^{(n)} \mid (K_n, \mathbf{N}_n) = 0$ almost surely. Hence, the fluctuation limit for $T_m^{(n)} \mid (K_n, \mathbf{N}_n)$ reduces to the fluctuation limit for $K_m^{(n)} \mid K_n$; such a fluctuation limit was established in Proposition 2 in Favaro et al. [6]. Similarly, for $\alpha = 0$ and $\theta > 0$ one has $\lim_{m \to +\infty} (\log m)^{-1} R_m^{(n)} \mid (K_n, \mathbf{N}_n) = 0$ almost surely and, furthermore, $K_m^{(n)}$ is independent of $K_n$. Hence, the fluctuation limit for $K_m^{(n)}$ coincides with the fluctuation limit for $K_n$ in (1.4). For any $a, b > 0$ let $B_{a,b}$ be a random variable distributed according to a Beta distribution with parameter $(a, b)$. Then, we can state the following theorem. Theorem 3.1. Let S

1 =
By means of arguments similar to (3.7) we can write j + θ α v↑1 v! n+θ+vα−1 n+θ−C θ (n, j, α, v)v j+ θ α −n

different behaviours in terms of law of large numbers and fluctuations. The large deviation principle for $m^{-1} T_m^{(n)} \mid (K_n, \mathbf{N}_n)$ may not be easily derived from that of $m^{-1} K_m^{(n)} \mid K_n$ by a direct comparison argument. Hence, it is helpful to study the moment generating function of

$K_m^{(n)}$ of new blocks generated by the additional sample, Lijoi et al. [19] left open the problem of studying the total number $T_m^{(n)}$ of blocks generated by the additional sample. In this paper we presented a comprehensive analysis of distributional properties of $T_m^{(n)} \mid (K_n, \mathbf{N}_n)$ for a finite sample size $m$ and for large $m$. Hereafter we briefly discuss our results with a view toward Bayesian nonparametric inference for species sampling problems. As pointed out in the Introduction, the distribution of $T_m^{(n)} \mid (K_n, \mathbf{N}_n)$ takes on the interpretation of the posterior distribution of the number of species generated by the additional sample, given an initial observed sample featuring $K_n$ species with frequencies $\mathbf{N}_n$.
$S_i = \sum_{l=1}^{m} \mathbb{1}_{\{X_i^*\}}(X_{n+l})$, for any $i = 1, \ldots, K_n$, are the frequencies of the blocks detected among the $m - L_m^{(n)}$ observations in the additional sample. Specifically, (2.3) provides the updating for the random