Notes on the occupancy problem with infinitely many boxes: general asymptotics and power laws

This paper collects facts about the number of occupied boxes in the classical balls-in-boxes occupancy scheme with infinitely many positive frequencies: equivalently, about the number of species represented in samples from populations with infinitely many species. We present moments of this random variable, discuss asymptotic relations among them and with related random variables, and draw connections with regular variation, which appears in various manifestations.


Introduction
We consider the classical multinomial occupancy scheme in which balls are thrown independently at a fixed infinite series of boxes, with probability $p_j$ of hitting the $j$th box. The frequencies $(p_j, j = 1, 2, \ldots)$ are assumed nonincreasing, strictly positive and satisfying $\sum_j p_j = 1$. As $n$ balls are thrown, their allocation is captured by the array $X_n = (X_{n,j}, j = 1, 2, \ldots)$, where $X_{n,j}$ is the number of balls out of the first $n$ that fall in box $j$.
In concrete applications, instead of boxes one has types or species of sampling units, and the sample array of types, $X_n$, is of interest for what it reveals about the population frequencies $(p_j)$. Such species sampling problems arise in ecology, to be sure, but also in database query optimization, where the sampling units may be entries in columns of a database while the species consist of all distinct values appearing in the column [10]; in literature, where the sampling units may be words appearing in a given author's known works while the species consist of all words known to that author [14]; in disclosure risk limitation, where the sampling units may be people or firms listed in a microdata file, without names or other overtly identifying information, while the types are unique combinations of values of variables with which the people or firms might be implicitly identified [34]; and in many other areas [9]. Models positing infinitely many boxes or species may approximate sampling from large, finite populations, or they may be useful as models of 'superpopulations' from which both samples and background populations are notionally drawn.
A functional of $X_n$ which appears in many contexts is the number of nonempty boxes $K_n = \#\{j : X_{n,j} > 0\}$.
$K_n$ is sometimes regarded as a measure of diversity of the sample. More detailed information is carried by the counts of boxes occupied by exactly $r$ balls, $K_{n,r} = \#\{j : X_{n,j} = r\}$ ($r = 1, 2, \ldots$), so that $K_n = \sum_r K_{n,r}$ and $\sum_r r K_{n,r} = n$. The combinatorial object encoded into the array of counts $(K_{n,1}, \ldots, K_{n,n})$ is a random partition of the integer $n$; this partition has $K_n$ parts, which correspond to the positive entries of $X_n$. The variables $K_n$ and $K_{n,r}$ have been intensively studied in the occupancy scheme with finitely many positive frequencies, which in some models may vary in a certain way with $n$. The most studied is, of course, the classical case of $m$ equal frequencies (e.g. in the familiar 'birthday paradox' one is interested in the probability of the event $K_{n,1} < n$). Kolchin, Sevast'yanov and Chistyakov [29] identified five distinct asymptotic regimes for $m = m(n) \to \infty$ to secure either a Poisson or a normal limit distribution for $K_n$ (more precisely, they discussed the number of empty boxes $m - K_n$); for example, when $m = n$, both the mean and the variance of $K_n$ grow approximately linearly with $n$, and the limit distribution is normal. See [27,26,29] for extensions, surveys and many references.
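The definitions of $K_n$ and $K_{n,r}$ can be made concrete with a small simulation. The following is only an illustrative sketch: the geometric frequencies $p_j = (1-q)q^{j-1}$, the seed, and the function names `occupancy_counts` and `throw_balls` are assumptions made for the example, not part of the text.

```python
import random
from collections import Counter


def occupancy_counts(sample):
    """Return (K_n, {r: K_{n,r}}) for a sequence of box labels."""
    balls_per_box = Counter(sample)                    # X_{n,j} for the occupied boxes j
    boxes_per_count = Counter(balls_per_box.values())  # K_{n,r} for r >= 1
    return len(balls_per_box), dict(boxes_per_count)


def throw_balls(n, q=0.5, seed=0):
    """Throw n balls at boxes 1, 2, ... with assumed p_j = (1 - q) * q**(j - 1)."""
    rng = random.Random(seed)
    sample = []
    for _ in range(n):
        j = 1
        while rng.random() < q:   # move on to the next box with probability q
            j += 1
        sample.append(j)
    return sample


sample = throw_balls(1000)
k_n, k_nr = occupancy_counts(sample)
print(k_n, sorted(k_nr.items()))
```

Whatever the sample, the two identities $K_n = \sum_r K_{n,r}$ and $\sum_r r K_{n,r} = n$ hold by construction, which makes them convenient sanity checks.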
In contrast, the literature on the problem with infinitely many fixed positive frequencies is sparse. For fixed frequencies $(p_j)$ the asymptotic growth of each $X_{n,j}$ (as $n \to \infty$) is linear, and the growth of $K_n$ is always sublinear, since the main contribution to $K_n$ comes from the boxes whose frequencies are arbitrarily small. More delicate asymptotics of the moments of $K_n$ and $K_{n,r}$ are determined by the way the $p_j$ approach 0. The first systematic study of the asymptotics appeared in a remarkable paper by Karlin [28], in which he proved a central limit theorem under a condition of regular variation on the frequencies $(p_j)$.
Recently, two new sources of interest in the infinite model have emerged. On the one hand, it has been observed that 'power laws' for $K_n$ are quite common in partition-valued processes of coagulation and fragmentation (see [32,22,24,5]); these fall in the range of regular variation with positive index. On the other hand, the case of a geometric sequence $(p_j)$ (which is an instance of slow variation) has been intensively studied in connection with the analysis of algorithms [2,30].
These notes are a survey of sorts which extends and updates Karlin's results on the infinite occupancy scheme. The results in Sections 2-6 and 10 are of a general nature, while in Sections 7-9 we work under the assumption of regular variation. In particular, we record various guises of regular variation, and mention some recent developments. Some of the results are new, while others are scattered in the literature and have been rediscovered several times (especially the results in Section 10).
Notation. Throughout, $c$ and $c_j$ denote positive constants whose values are not important and may change from line to line. We use $f \sim g$, $f \ll g$, $f \gg g$ and $f \asymp g$ for $f/g \to 1$, $f/g \to 0$, $f/g \to \infty$ and $c_1 < f/g < c_2$, respectively. When either of $f$ and $g$ is a random quantity, the notation $f \sim_{\rm a.s.} g$, $f \ll_{\rm a.s.} g$ etc. means that the asymptotic relation holds with probability one. Convergence in distribution is denoted $\to_d$.

Moments and poissonization
We start by recalling the familiar fact that $X_n := (X_{n,1}, X_{n,2}, \ldots)$ has a multinomial distribution with parameters $(n, (p_j))$:
$$P(X_{n,j} = n_j,\ j = 1, 2, \ldots) = \frac{n!}{\prod_j n_j!} \prod_j p_j^{n_j}, \qquad n_j \ge 0,\ \sum_j n_j = n.$$
From this the distribution of the partition $(K_{n,1}, \ldots, K_{n,n})$ is recovered by summation. Specifically, for $(k_1, \ldots, k_n)$ a fixed partition of $n$,
$$P((K_{n,1}, \ldots, K_{n,n}) = (k_1, \ldots, k_n)) = \frac{n!}{n_1! \cdots n_k!} \sum_{\rm distinct} p_{j_1}^{n_1} \cdots p_{j_k}^{n_k},$$
where $n_1, \ldots, n_k$ is a sequence of length $k = \sum_i k_i$, with $k_i$ terms equal to $i$ for $i = 1, \ldots, n$, and the sum extends over distinct monomials. The infinite sum is called the monomial symmetric function in the variables $p_j$. Formulas for the distribution of $K_n$ and the marginal distributions of the $K_{n,r}$'s follow by further summation over partitions of $n$. In terms of the generating function, the probability of $K_n = m$ is equal to the coefficient at $x^n y^m/n!$ in the series expansion of the infinite product $\prod_j (1 + y(e^{p_j x} - 1))$.
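The generating-function description of the law of $K_n$ can be turned into a direct computation when there are only finitely many boxes. The sketch below makes that simplifying assumption (a finite list `p` of frequencies, here a hypothetical three-box example) and expands $\prod_j (1 + y(e^{p_j x} - 1))$ as a truncated power series; the function name is illustrative.

```python
from math import factorial


def occupied_boxes_distribution(p, n):
    """Return [P(K_n = m) for m = 0..len(p)] for finitely many boxes.

    Expands prod_j (1 + y (exp(p_j x) - 1)) up to degree n in x and reads
    off n! times the coefficient of x^n y^m.
    """
    # coef[a][m] = coefficient of x^a y^m in the partial product
    coef = [[0.0] * (len(p) + 1) for _ in range(n + 1)]
    coef[0][0] = 1.0
    for pj in p:
        # series of exp(pj x) - 1: the coefficient of x^b is pj^b / b! for b >= 1
        e = [0.0] + [pj ** b / factorial(b) for b in range(1, n + 1)]
        new = [row[:] for row in coef]            # contribution of the "1" term
        for a in range(n + 1):
            for m in range(len(p)):
                c = coef[a][m]
                if c == 0.0:
                    continue
                for b in range(1, n - a + 1):     # contribution of "y (exp(pj x) - 1)"
                    new[a + b][m + 1] += c * e[b]
        coef = new
    return [factorial(n) * coef[n][m] for m in range(len(p) + 1)]


dist = occupied_boxes_distribution([0.5, 0.3, 0.2], n=4)
print(dist)
```

Two checks follow from the definitions: the probabilities sum to 1 when the frequencies do, and $P(K_n = 1) = \sum_j p_j^n$.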
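Formulas for the moments follow from the representation
$$K_n = \sum_j 1(X_{n,j} > 0), \qquad K_{n,r} = \sum_j 1(X_{n,j} = r) \qquad (1)$$
(where $1(\cdots)$ equals 1 if $\cdots$ is true and equals 0 otherwise). Denoting
$$\Phi_n := E[K_n], \qquad \Phi_{n,r} := E[K_{n,r}],$$
we easily see that
$$\Phi_n = \sum_j \left(1 - (1 - p_j)^n\right), \qquad \Phi_{n,r} = \binom{n}{r} \sum_j p_j^r (1 - p_j)^{n-r}.$$
These are related by the formulas
$$\Phi_{n,r} = (-1)^{r+1} \binom{n}{r} \Delta^r \Phi_n,$$
where $\Delta^r$ is the $r$th iterate of the difference operator $\Delta \Phi_n = \Phi_n - \Phi_{n-1}$. Lengthy but straightforward computations yield formulas for the variance
$$V_n := {\rm Var}[K_n] = \sum_j \left[(1 - p_j)^n - (1 - p_j)^{2n}\right] + \sum_{i \ne j} \left[(1 - p_i - p_j)^n - (1 - p_i)^n (1 - p_j)^n\right], \qquad (3)$$
$${\rm Var}[K_{n,r}] = \sum_j \left[b_j - b_j^2\right] + \sum_{i \ne j} \left[\frac{n!}{r!\, r!\, (n-2r)!}\, p_i^r p_j^r (1 - p_i - p_j)^{n-2r} - b_i b_j\right], \qquad b_j := \binom{n}{r} p_j^r (1 - p_j)^{n-r}, \qquad (4)$$
where the first term in the double sum of (4) is absent for $n < 2r$. Formulas for higher moments and covariances can be derived in the same way, but they seem to be of little practical use in explicit form. See [29, Section 3.1] for such moment computations, which are also valid in the case of infinitely many positive frequencies.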
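The relation between the means and iterated differences can be checked numerically. The sketch below uses the standard expressions $\Phi_n = \sum_j (1 - (1-p_j)^n)$ and $\Phi_{n,r} = \binom{n}{r} \sum_j p_j^r (1-p_j)^{n-r}$ and verifies $\Phi_{n,r} = (-1)^{r+1} \binom{n}{r} \Delta^r \Phi_n$ with the backward difference $\Delta \Phi_n = \Phi_n - \Phi_{n-1}$. The geometric frequencies truncated at 59 boxes and the function names are assumptions for illustration.

```python
from math import comb


def phi_n(p, n):
    """E[K_n] = sum_j (1 - (1 - p_j)^n)."""
    return sum(1.0 - (1.0 - pj) ** n for pj in p)


def phi_nr(p, n, r):
    """E[K_{n,r}] = C(n, r) * sum_j p_j^r (1 - p_j)^(n - r)."""
    return comb(n, r) * sum(pj ** r * (1.0 - pj) ** (n - r) for pj in p)


def backward_difference(p, n, r):
    """r-th iterate of (Delta Phi)_n = Phi_n - Phi_{n-1}, evaluated at n."""
    return sum((-1) ** i * comb(r, i) * phi_n(p, n - i) for i in range(r + 1))


# Assumed example: p_j = 2^(-j), truncated at j = 59 (total mass 1 - 2^(-59)).
p = [2.0 ** -j for j in range(1, 60)]
n = 10
for r in (1, 2, 3):
    lhs = phi_nr(p, n, r)
    rhs = (-1) ** (r + 1) * comb(n, r) * backward_difference(p, n, r)
    print(r, lhs, rhs)
```

A further check is $\sum_r r\, \Phi_{n,r} = n \sum_j p_j$, the expectation of the identity $\sum_r r K_{n,r} = n$.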
One major obstacle in the study of the counts $K_n$ and $K_{n,r}$ is that the indicators in (1) are not independent. A common recipe to circumvent this difficulty is to first consider a closely related model in which the balls are thrown in continuous time at the epochs of a unit rate Poisson process $(P(t), t \ge 0)$, which is independent of $(X_n, n = 1, 2, \ldots)$. The advantage of this randomization is that one can exploit independence, since the balls then fall in the boxes according to independent Poisson processes $(X_j(t), t \ge 0)$, at rate $p_j$ for box $j$. Once the properties of the Poisson allocation scheme are established, one still needs to translate them into fixed-$n$ results by using a depoissonization technique.
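Remark. Poissonization liberates us from the constraints of the fixed-$n$ scheme. The Poisson model is well defined for arbitrary positive rates $p_j$ which need not satisfy $\sum_j p_j < \infty$ or even $p_j < 1$. If $\sum_j p_j < \infty$, a reduction to the normalized case $\sum_j p_j = 1$ is maintained by the obvious time-change $t \to t/\sum_j p_j$. If $\sum_j p_j = \infty$, $K(t)$ is infinite with probability one, though the $K_r(t)$'s can still be finite. One can also consider two-sided infinite sequences like geometric $(p_j = q^j, j \in \mathbb{Z})$ (with $0 < q < 1$), for which the $K_r(t)$'s and the sum of variances

The convention in this paper is that the quantities associated with the Poisson allocation scheme appear in functional notation, while the lower-index notation is reserved for the fixed-$n$ scheme. For instance, $X_j(t) = X_{P(t),j}$, $K(t) = K_{P(t)}$ and $K_r(t) = K_{P(t),r}$, so that
$$K(t) = \sum_j 1(X_j(t) > 0), \qquad K_r(t) = \sum_j 1(X_j(t) = r). \qquad (5)$$
The mean values are
$$\Phi(t) := E[K(t)] = \sum_j \left(1 - e^{-t p_j}\right), \qquad \Phi_r(t) := E[K_r(t)] = \frac{t^r}{r!} \sum_j p_j^r e^{-t p_j},$$
and these are related via
$$\Phi_r(t) = (-1)^{r+1} \frac{t^r}{r!} \Phi^{(r)}(t),$$
where $\Phi^{(r)}(t)$ is the $r$th derivative. Formulas for the variance are simpler than (3) and (4) because the cross-terms disappear due to independence:
$$V(t) := {\rm Var}[K(t)] = \sum_j \left(e^{-t p_j} - e^{-2 t p_j}\right) = \Phi(2t) - \Phi(t), \qquad {\rm Var}[K_r(t)] = \Phi_r(t) - 2^{-2r} \binom{2r}{r} \Phi_{2r}(2t). \qquad (6)$$
Similarly, for integer $r \ne s$, using $K_r(t) + K_s(t) = \sum_j 1(X_j(t) \in \{r, s\})$ the covariance is computed as
$${\rm Cov}(K_r(t), K_s(t)) = -2^{-(r+s)} \binom{r+s}{r} \Phi_{r+s}(2t).$$
For analytical reasons which will soon be clear it is convenient to encode the frequencies $(p_j)$ into an infinite counting measure $\nu(dx) := \sum_j \delta_{p_j}(dx)$ on $]0, 1]$, so that
$$\sum_j f(p_j) = \int_0^1 f(x)\, \nu(dx) \qquad (8)$$
holds for arbitrary $f \ge 0$. In particular,
$$\Phi_{n,r} = \binom{n}{r} \int_0^1 x^r (1 - x)^{n-r}\, \nu(dx), \qquad (10)$$
$$\Phi(t) = \int_0^1 \left(1 - e^{-tx}\right) \nu(dx), \qquad (11)$$
$$\Phi_r(t) = \frac{t^r}{r!} \int_0^1 x^r e^{-tx}\, \nu(dx) \qquad (r = 1, 2, \ldots). \qquad (12)$$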
Remarks. The formulas for expected values remain exactly the same when the frequencies $(p_j)$ are random, in which case the 'intensity measure' $\nu$ is defined by taking expectation in the left-hand side of (8). See the recent work on composition structures [22,23,4] for more in this direction. The summability constraint $\sum_j p_j = 1$ translates as $\int_0^1 x\, \nu(dx) = 1$, and implies that $\nu$ can also be interpreted as the Lévy measure of some subordinator, which jumps by $p_j$ at rate 1 for each $j$. In this interpretation (11) has the meaning of a Laplace exponent [20].
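Both $\Phi(t)$ and $\Phi_n$ (considered as functions of a real or complex-valued argument) are Bernstein functions which uniquely determine $\nu$, see [20]. They are related by the poissonization identity
$$\Phi(t) = e^{-t} \sum_{n=0}^{\infty} \frac{t^n}{n!}\, \Phi_n \qquad (\Phi_0 = 0).$$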

Some estimates of the moments
Monotonicity is a key feature of $K_n$. In fact, we have $K_n \uparrow_{\rm a.s.} \infty$ (as $n \to \infty$) and $K(t) \uparrow_{\rm a.s.} \infty$ (as $t \to \infty$), because each box is eventually discovered by a ball. By monotone convergence, also $\Phi_n \uparrow \infty$ and $\Phi(t) \uparrow \infty$. However, the growth is always sublinear: $\Phi_n \ll n$ ($n \to \infty$) and $\Phi(t) \ll t$ ($t \to \infty$). Indeed, if we ignore the first $J$ boxes, the mean number of discovered boxes among the remaining ones is at most $n \sum_{j>J} p_j$ (respectively, at most $t \sum_{j>J} p_j$ in the Poisson scheme). Thus $\Phi_n < J + n \sum_{j>J} p_j$ (respectively, $\Phi(t) < J + t \sum_{j>J} p_j$), and selecting $J$ arbitrarily large we see that $\limsup \Phi_n/n = \limsup \Phi(t)/t = 0$. Aside from these two general features, the growth properties of $K_n$ can be fairly arbitrary.
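The truncation argument for sublinear growth is easy to try out numerically. The sketch below evaluates $\Phi_n$ and the bound $J + n \sum_{j>J} p_j$ for a few values of $J$. The choice $p_j \propto j^{-2}$, the truncation to $10^5$ boxes, and the function names are assumptions made only to have something concrete to compute.

```python
def mean_occupied(p, n):
    """Phi_n = sum_j (1 - (1 - p_j)^n)."""
    return sum(1.0 - (1.0 - pj) ** n for pj in p)


def truncation_bound(p, n, J):
    """J + n * sum_{j > J} p_j: count the first J boxes in full, bound the rest by n * p_j."""
    return J + n * sum(p[J:])


# Assumed example: p_j proportional to j^(-2), truncated to 10^5 boxes.
weights = [j ** -2.0 for j in range(1, 100_001)]
total = sum(weights)
p = [w / total for w in weights]

n = 10_000
phi = mean_occupied(p, n)
bounds = {J: truncation_bound(p, n, J) for J in (0, 10, 100, 1000)}
print(phi, bounds)
```

In this example an intermediate $J$ gives the tightest bound: small $J$ leaves a heavy tail sum multiplied by $n$, while large $J$ pays for boxes that were certain to be discovered anyway.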
The next lemma gives general estimates of closeness of the moments in the fixed-n scheme and the Poisson scheme. Lemma 1. For n → ∞ the following estimates hold: Proof. The first two bounds follow from the elementary inequality 0 The last two bounds follow from this and estimates of the cross-terms in (3) and (4), by using the expansion Taken together with Φ(t) ↑ ∞ the lemma implies Φ n ∼ Φ(n).
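The closeness of the fixed-$n$ and Poisson means can be observed directly from the sums $\Phi_n = \sum_j (1 - (1-p_j)^n)$ and $\Phi(t) = \sum_j (1 - e^{-t p_j})$. The sketch below compares them at $t = n$ for geometric frequencies $p_j = 2^{-j}$ truncated at 59 boxes. The example frequencies and function names are assumptions, and the code illustrates the conclusion $\Phi_n \sim \Phi(n)$ without reproducing the lemma's specific bounds.

```python
from math import exp


def phi_fixed(p, n):
    """Phi_n: mean number of occupied boxes after n balls."""
    return sum(1.0 - (1.0 - pj) ** n for pj in p)


def phi_poisson(p, t):
    """Phi(t): mean number of occupied boxes at time t in the Poisson scheme."""
    return sum(1.0 - exp(-t * pj) for pj in p)


# Assumed example: p_j = 2^(-j), truncated at j = 59.
p = [2.0 ** -j for j in range(1, 60)]
for n in (10, 100, 1000):
    print(n, phi_fixed(p, n), phi_poisson(p, n))
```

Since $(1-p)^n \le e^{-np}$, the fixed-$n$ mean is never smaller than the Poisson mean at $t = n$; in this example the gap shrinks roughly like $1/n$.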
Remark. It seems plausible that the relations V (n) ∼ V n and Φ r (n) ∼ Φ n,r are true for arbitrary (p j ). However, the estimates in the lemma are not strong enough to entail such a conclusion in full generality, although it has been shown under various circumstances (see e.g. Lemma 4 below and Section 6) and no counterexamples are known. Hwang and Janson [25,Proposition 4.3(ii)] show that always V (n) ≍ V n . The difficulty is that V (t) and Φ r (t) may exhibit rather irregular oscillatory behaviour. For instance V (t), V n , Φ r (t) and Φ n,r may approach 0 arbitrarily closely for some n or t, (though they cannot converge to 0, as is seen by selecting a subsequence n j ≍ 1/p j or t j ≍ 1/p j , respectively).
We denote the right tail of ν. Note that there are at most m frequencies not smaller than As x ↓ 0 the integral at left increases to 1, entailing by monotonicity that the integral at right converges, and by extension that x ν(x) converges to a limit also. Since lim x ν(x) > 0 would force the integral at right to diverge, x ν(x) → 0.

Laws of large numbers
The mean number of occupied boxes satisfies $\Phi(t+\tau) - \Phi(t) < \Phi(\tau)$ (for $\tau, t > 0$). One way to justify this is by noting that the mean number of distinct boxes hit during any time interval $[\tau, \tau + t]$ is $\Phi(t)$, but some of them have been discovered before time $\tau$ and do not contribute to $K(t+\tau) - K(\tau)$. The same follows from concavity of $\Phi(t)$, which implies Similar inequalities hold for $\Phi_n$. Using these the variance can be bounded via expectation as Applying Chebyshev's inequality and recalling that $\Phi_n \uparrow \infty$ and $\Phi(t) \uparrow \infty$, the bound on the variance allows one to conclude that both $K(t)/\Phi(t)$ and $K_n/\Phi_n$ converge to 1 in probability, which is a result due to Bahadur [3]. A similar analysis invoking (6), (4) and Lemma 1 shows that also $K_r(t)/\Phi_r(t)$ and $K_{n,r}/\Phi_{n,r}$ converge to 1 in probability, provided $\Phi_r(t) \to \infty$. For $K_n$ and $K(t)$ there is the following strengthening due to Karlin [28, Theorem 8].
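The subadditivity of $\Phi$ and the domination of the variance by the mean can be checked on an example. In the Poisson scheme the indicators are independent, so ${\rm Var}[K(t)] = \sum_j e^{-t p_j}(1 - e^{-t p_j})$, which equals $\Phi(2t) - \Phi(t)$ and is therefore smaller than $\Phi(t)$. The sketch below verifies this for geometric frequencies $p_j = 2^{-j}$ truncated at 59 boxes; the frequencies, the time points and the function names are assumptions.

```python
from math import exp


def phi(p, t):
    """Phi(t) = sum_j (1 - exp(-t p_j)), the mean of K(t)."""
    return sum(1.0 - exp(-t * pj) for pj in p)


def variance(p, t):
    """Var K(t) = sum_j exp(-t p_j) (1 - exp(-t p_j)) for independent indicators."""
    return sum(exp(-t * pj) * (1.0 - exp(-t * pj)) for pj in p)


# Assumed example: p_j = 2^(-j), truncated at j = 59.
p = [2.0 ** -j for j in range(1, 60)]
t, tau = 50.0, 20.0
print(phi(p, t + tau) - phi(p, t), phi(p, tau))   # increment vs. Phi(tau)
print(variance(p, t), phi(p, 2 * t) - phi(p, t))  # two expressions for V(t)
```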
Proof. The function Φ(t) is continuous, increasing and satisfies Φ ′ (t) < 1. Thus it is possible to select an increasing sequence (t m , m = 1, 2, . . .) such that m 2 < Φ(t m ) < m 2 + 1. We have then from the above estimate of the variance P( and by summability of the bound . This allows one to squeeze the ratio as where both sides converge to 1 almost surely, in consequence of the above and Φ(t m )/Φ(t m+1 ) → 1. The argument for K n is completely analogous.
Instead of using the Chebyshev inequality one can exploit a finer Bernstein-type large deviation bound for sums of independent bounded variables [16, p. 911]: where $\epsilon'$ depends on $\epsilon$. This allows one to choose a subsequence $\{t_m\}$ with smaller gaps, so that Using monotonicity of $\sum_{r \ge s} K_{n,r}$, we obtain along the same lines for every fixed integer $s$ $\sum_{r \ge s} K_{n,r} \sim_{\rm a.s.}$
Remark. The relations K n,r ∼ a.s. Φ n,r , K r (t)∼ a.s. Φ r (t) may fail, simply because Φ r (t) need not go to ∞, while the counting processes have unit jumps. It is natural to conjecture that these laws of large numbers are true under the condition Φ r (t) → ∞, but we do not know if this has been proved in full generality.

CLT for K n
Recall the representation (5) of $K(t)$ as a sum of independent indicators. Applying the Lindeberg-Feller condition, we see that $(K(t) - \Phi(t))/V(t)^{1/2}$ converges to the standard normal distribution provided $V(t) \to \infty$. The following depoissonization argument leading to the CLT for $K_n$ follows the line of Dutko [12]. Monotonicity of $K_n$ plays a central role here.
and if they hold, the law of (K n − a n )/b 1/2 n converges to the standard normal distribution, where Φ n or Φ(n) can be selected for the constant a n and V (n) or V n for b n .
The next lemma will imply that both choices for b n are good.
for every ǫ > 0. Thus by Cauchy-Schwarz (applied to the measure xν(dx)) and letting ǫ → 0 the first integral factor vanishes, which yields φ(t) 2 ≪ φ(2t). Using that φ is decreasing, as wanted. Now, if V (t) → ∞ then the statement of the lemma follows from the above and the third estimate in Lemma 1. (3) is negative, and by the first estimate in Lemma 1 also Φ(2t) − Φ(t) → ∞.
The rest of the argument for $K_n$ is as follows. From $\phi(t)^2 \ll \phi(2t)$ in the proof of Lemma 4 we get This implies provided $V(t) \to \infty$. Indeed, which tends to 1 by (13) and (13) and Lemma 1, In the very same way, and also using (14) Therefore $|K_{n \pm c n^{1/2}} - K_n|/V(n)^{1/2}$ converge to 0 in probability. Choosing $c$ sufficiently large we have for the Poisson process $n - c n^{1/2} < P(n) < n + c n^{1/2}$ and therefore $K_{n - c n^{1/2}} < K(n) < K_{n + c n^{1/2}}$ with probability larger than $1 - \epsilon$. It follows that $(K_n - K(n))/V(n)^{1/2}$ converges to 0 in probability. The CLT for $K_n$ now follows from this and the CLT for $K(n)$.
Remark. Hwang and Janson [25] have shown a more delicate local CLT for $K_n$ under $V(t) \to \infty$. If $\Phi_r(t)$ and ${\rm Var}[K_r(t)]$ tend to $\infty$, then a CLT holds for $K_r(t)$, and one can naturally suspect that the same is valid for $K_{n,r}$ (this seems not to have been discussed in the literature). Mikhailov [31] proves a CLT for $K_{n,r}$, but assuming that the $(p_j)$ vary with $n$ in a suitable way.

The variance
If $V(t)$ does not go to $\infty$ then $K(t)$ need not converge in distribution at all. Thus it is important to have criteria for $V(t) \to \infty$ and to understand other possible modes of behaviour of the variance. For various $(p_j)$ the variances $V(t)$ and $V_n$ can go to $\infty$, converge to a finite limit, oscillate within a bounded range, or even oscillate between 0 and $\infty$. In this section we sketch some recent results from [8]. The next lemma relates the variance to the mean number of singleton boxes.
Proposition 6. Each of the following conditions implies V (t) → ∞: Proof. Sufficiency of conditions (i) and (ii) is shown by rewriting (6) in the form For (iii) one exploits Lemma 5.
We turn next to conditions for the variance to be bounded or to converge to a finite limit. The case of geometric frequencies gives a clue.
then lim sup V (t) ≤ k, and this asymptotic bound is the best possible for frequencies satisfying (16) with given k.
Many examples of irregular behaviour of $V(t)$ or the $\Phi_r(t)$'s can be constructed using the following simple idea. Consider first a series of finitely many, say $m$, boxes with the same frequency $q$. In the Poisson scheme, the variance of the number of occupied boxes among these $m$ is $m(e^{-tq} - e^{-2tq})$, which is a unimodal function of $t$ with initial value 0, maximum value $m/4$ attained at $t = q^{-1} \log 2$, and exponential decay for larger $t$. The mean number of boxes occupied by exactly $r$ balls, which is $m (tq)^r e^{-tq}/r!$, has similar properties. Note that $m$ accounts for the maximum value, while varying $q$ amounts to just rescaling the time. Now, selecting $q_1 > q_2 > \ldots$ and taking $m_i$ frequencies equal to $q_i$ ($i = 1, 2, \ldots$) we obtain a superposition of functions of the above type, thus creating oscillations with fairly arbitrary highs and lows. In the following examples we focus on the Poisson scheme, hence can ignore the normalization and only require $\sum_j p_j < \infty$.
Example 10. Choosing $q_i = q^i$ with some $0 < q < 1/2$ and $m_i = i$ we obtain a collection of frequencies $(p_j)$ for which $V(t) \to \infty$ ($t \to \infty$) but ∇ν(x) oscillates between 0 and $\infty$. The example shows that condition (i) of Proposition 6 is not necessary for $V(t) \to \infty$.
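The superposition idea can be explored numerically. The sketch below evaluates the block variance $m(e^{-tq} - e^{-2tq})$ and sums it over blocks. As an assumption, the block parameters are taken to be $q_i = 2^{-2^i}$, $m_i = i$ (one reading of the double-exponential examples that follow), truncated at $i = 6$; the function names and the two evaluation times are likewise illustrative.

```python
from math import exp, log


def block_variance(t, m, q):
    """Variance of the number of occupied boxes among m boxes of rate q at time t."""
    return m * (exp(-t * q) - exp(-2.0 * t * q))


def superposed_variance(t, blocks):
    """V(t) for a scheme made of blocks (m_i, q_i): m_i boxes of rate q_i each."""
    return sum(block_variance(t, m, q) for m, q in blocks)


# Assumed parameters: q_i = 2^(-2^i), m_i = i, truncated at i = 6.
blocks = [(i, 2.0 ** -(2 ** i)) for i in range(1, 7)]
q4, q5 = blocks[3][1], blocks[4][1]

peak = superposed_variance(log(2.0) / q5, blocks)   # near the maximum of block 5
trough = superposed_variance(30.0 / q4, blocks)     # block 4 saturated, block 5 barely hit
print(peak, trough)
```

With well separated rates, at most one block contributes noticeably at any given time, so the highs are close to $m_i/4$ while the lows between consecutive blocks are close to 0.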
Example 11. [28, p. 384] Choosing q i = 2 −2 i and m i = i we obtain (p j ) for Thus V (t) oscillates between 0 and ∞, and the same applies to Φ 1 (t).
Example 12. [8] Choosing q i = 2 −2 i+1 and m i = 2 2 i we obtain (p j ) for which V (t) → ∞ and Φ 1 (t) → ∞, while lim inf Φ 2 (t) = 0 and lim sup Φ 2 (t) = ∞. Thus we have here a curious pathology, when the mean number of singleton boxes goes to ∞, but the mean number of doubletons does not.
Remark. The last example disproves the assertion of [28, Lemma 1] that if the convergence radius of the series j p j u j exceeds 1, then ν(x) is slowly varying for x → 0. Recall that slow variation means lim x→0 ν(cx)/ ν(x) = 1 for every c > 0. In the last example the convergence radius of j p j u j equals 2, because (p j ) 1/j oscillates between 1/4 and 1/2, as is readily checked. The multiplicity of each frequency q i = 2 −2 i is equal to the number of terms larger than this value (k = 1, 2, . . .), which together with c2 −2 k < 2 −2 k−1 (for fixed c > 1 and k = k(c) large enough) implies that hence slow variation fails.

Regularly varying frequencies
We shall establish equivalent forms of regular variation in the occupancy problem; in particular, we translate them into statements about the growth of the mean values of $K_n$ and the $K_{n,r}$'s. Following [28] we say that the frequencies $(p_j)$ are regularly varying if the right tail $\bar\nu(x) = \nu[x, 1]$ satisfies
$$\bar\nu(x) \sim \ell(1/x)\, x^{-\alpha} \qquad (x \downarrow 0) \qquad (17)$$
for some $0 \le \alpha \le 1$ and a function $\ell$ slowly varying at $\infty$, i.e. satisfying $\ell(cy)/\ell(y) \to 1$ as $y \to \infty$, for every $c > 0$. The case $\alpha = 0$ corresponds to slow variation, while in the case $\alpha = 1$ we shall speak of rapid variation. In the case $\alpha = 1$ the summability of frequencies forces the function $\ell(1/x)$ to approach 0 (as $x \downarrow 0$) sufficiently fast. Define for $r = 1, 2, \ldots$ the measures
$$\nu_r(dx) := x^r\, \nu(dx).$$
The measure $\nu_1$ is the distribution of the frequency of the first discovered box, also called the structural distribution or the law of the tagged fragment, especially when the frequencies are random [6,33].
To show the converse implication we use evaluate the integral by Karamata's theorem and note that the constant term c is dominated. Similarly, applying Karamata's theorem to the integral term in we conclude (19).
The cases of slow and rapid variation need special treatment.

Proposition 14.
For ℓ slowly varying the relation with ℓ 1 ≫ ℓ another function of slow variation defined for y > 1 by For r > 1 the relation (22) also implies In general, the relation (23) with some slowly varying ℓ 1 only implies and does not imply the regular variation of ν(x); however if the regular variation holds then ν(x) fulfills (22) with ℓ satisfying (24).
Proposition 15. For ℓ 0 slowly varying the relation with another slowly varying ℓ ≫ ℓ 0 defined for y > 1 by and also implies for r > 1 In general, the relation (28) with some ℓ only implies and does not imply the regular variation of ν 1 [0, x]; but if the regular variation holds, then (27) is satisfied with slowly varying ℓ 0 related to ℓ via (29).
Proof. The line of argument repeats the one in the previous proposition. Formula (30) is obtained by evaluating the terms in The last proposition shows that (28) (i.e. (17) with α = 0) is not strong enough to control ν r 's, but a slightly stronger assumption (27) is enough for that. The case of geometric frequencies demonstrates that (28) is indeed too weak.
We translate the above in terms of the mean values.
Proposition 17. For 0 < α < 1, condition (17) is equivalent to each of the following two relations and and for r ≥ 1 it implies Proof. Writing we see that the equivalence of (31) and (17) (19).
In the case of rapid variation we have: with ℓ 1 as in (24), and for r > 1 it implies Proof. Use Tauberian arguments and Proposition 14.
In the case of slow variation we have: where ℓ and ℓ 0 are related as in (29). Also, (27) is equivalent to the r = 1 instance of (36). The relation (28) Proof. Use Tauberian arguments and Proposition 15.
We summarize the above relations for $\Phi(t)$ and the $\Phi_r(t)$'s (the relations for $\Phi_n$ and the $\Phi_{n,r}$'s are completely analogous):
Corollary 20. In the case $0 < \alpha < 1$, for $r \ge 1$ all $\Phi_r(t)$ are of the same order of growth as $\Phi(t)$, and the ratios $\Phi_r(t)/\Phi(t)$ converge to $(-1)^{r+1} \binom{\alpha}{r}$ (these numbers comprise a probability distribution).
In the case of rapid variation the $\Phi_r(t)$'s are of the same order for $r > 1$, but $\Phi(t) \sim \Phi_1(t) \gg \Phi_r(t)$ for $r > 1$; that is, most of the occupied boxes are singletons.
In the case of slow variation under condition (27) we have $r\, \Phi_r(t) \sim \Phi_1(t) \ll \Phi(t)$, meaning that all $\Phi_r(t)$'s are again of the same order but each of them is much smaller than $\Phi(t)$.
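The limiting ratios in the case $0 < \alpha < 1$ can be illustrated with pure power-law frequencies. The sketch below takes $p_j \propto j^{-1/\alpha}$ with $\alpha = 1/2$ (so that the number of frequencies exceeding $x$ grows like $x^{-\alpha}$), computes $\Phi(t)$ and $\Phi_r(t)$ by direct summation, and compares $\Phi_r(t)/\Phi(t)$ with $(-1)^{r+1}\binom{\alpha}{r} = \alpha\,\Gamma(r-\alpha)/(r!\,\Gamma(1-\alpha))$. The truncation to $2 \cdot 10^5$ boxes, the value $t = 10^4$ and the function names are assumptions of the example.

```python
from math import exp, factorial, gamma


def poisson_means(p, t, r_max):
    """Return (Phi(t), [Phi_1(t), ..., Phi_{r_max}(t)])."""
    phi = 0.0
    phi_r = [0.0] * r_max
    for pj in p:
        x = t * pj
        e = exp(-x)
        phi += 1.0 - e
        term = e
        for r in range(1, r_max + 1):
            term *= x / r              # term is now x^r e^{-x} / r!
            phi_r[r - 1] += term
    return phi, phi_r


def limit_ratio(alpha, r):
    """(-1)^(r+1) binom(alpha, r), written as alpha Gamma(r - alpha) / (r! Gamma(1 - alpha))."""
    return alpha * gamma(r - alpha) / (factorial(r) * gamma(1.0 - alpha))


# Assumed example: alpha = 1/2, p_j proportional to j^(-2), truncated to 200000 boxes.
alpha = 0.5
J = 200_000
weights = [j ** (-1.0 / alpha) for j in range(1, J + 1)]
total = sum(weights)
p = [w / total for w in weights]

phi, phi_r = poisson_means(p, 1.0e4, r_max=3)
for r in (1, 2, 3):
    print(r, phi_r[r - 1] / phi, limit_ratio(alpha, r))
```

For $\alpha = 1/2$ the first limiting values are $1/2$, $1/8$ and $1/16$, so about half of the occupied boxes are expected to be singletons.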
Remark. The results about mean values are true also when $(p_j)$ are random, with $\nu$ understood as the intensity measure of the point process $\sum_j \delta_{p_j}$, and $\nu_1$ being the structural distribution. For instance, when $(p_j)$ are Poisson-Dirichlet($\theta$) frequencies we have $\nu_1(dx) = \theta (1 - x)^{\theta - 1} dx$, so we are in the case of slow variation (27) with $\nu_1[0, x] \sim \theta x$, hence $\Phi_r(t)/\Phi_1(t) \to 1/r$, meaning that in the long run the mean number of balls in all $r$-ton boxes is approximately the same for each $r$. The last fact has been long known, especially for $\theta = 1$ in the context of random permutations, which can be associated with a random sample from Poisson-Dirichlet(1), see [1].
We conclude as in [28]: Corollary 21. For 0 < α < 1 the condition of regular variation (17) is equivalent to any of the following conditions and it implies for r = 1, 2, . . .
Proof. Combine Proposition 2, the remarks after it, and Proposition 17.
We stress that in this situation the strong laws for K r (t)'s follow from the strong laws for increasing processes s>r K s (t) and the fact that Remark. Characterizations of regular variation through behaviour of ratios of integrals with distinct kernels are known as Mercerian Tauberian theorems [7]. For instance, Φ 1 (t)/Φ(t) → α for some constant 0 < α < 1 implies (17).
converges in distribution to a multivariate Gaussian array with zero mean and this covariance matrix. A similar result [28, theorem 5 ′ ] is valid also in the rapid variation case α = 1, but K n,1 requires a scaling different from the scaling of other K n,r 's with r > 1, since Var [K n,1 ] ∼ nℓ * (n) is of larger order than Var [K n,r ] ∼ c r nℓ(n) for r > 1.

Two uses of inversion
We recall further facts about regular variation. For a function $h : \mathbb{R}_+ \to \mathbb{R}_+$ regularly varying at infinity with index $\alpha > 0$ there exists an asymptotic inverse function $g$ which is regularly varying with index $1/\alpha$ and satisfies $h(g(y)) \sim g(h(y)) \sim y$ ($y \to \infty$). The function $g$ is unique up to asymptotic equivalence, see [7, Proposition 1.5.12].

Lemma 22.
Let h and g be asymptotic inverses of one another, then for α > 0 and ℓ slowly varying Let ℓ * be the reciprocal of the function appearing in the inversion formula (39): Keep in mind that ℓ * depends on α. This function allows one to formulate the property of regular variation directly in terms of individual frequencies.
In view of the function h is the generalized left-continuous inverse of g, and because y ≤ h(g(y)) < y + 1, these functions are also asymptotic inverses of one another. By Lemma 22, the relation h(y) ∼ y α ℓ(y) is equivalent to g(y) ∼ y 1/α {ℓ 1/α (y 1/α )} # , which is the same as (40).
For fixed-n allocation scheme define N k := min{n : K n = k} to be the times when new boxes are discovered. The analogous poissonized quantity is T k := min{t : K(t) = k}. By the definition Next proposition gives the relations inverse to that.
Moreover, the following asymptotic relations hold: Proof. Immediate from the uniqueness of the asymptotic inverse and de Bruijn conjugate, and Lemma 39.

The cumulative frequency of empty boxes
Functionals K n , K n,r are of the form j ψ j (X n,j ), called 'separable statistics' by some authors [26,17]. Here, ψ j 's are some functions for which the sum is well defined. In this section we discuss one more instance of this kind, under the regular variation assumption (17) with 0 < α < 1. These have the meaning of the cumulative frequency of yet undiscovered boxes. Defining ( p k ) to be a random arrangement of frequencies in the order as the boxes are discovered, we can also write The sequence ( p k ) is called a size-biased permutation of the frequencies (p j ), see [33]. By (10) and (12), These mean values control the geometric number of balls (respectively, the exponential time) needed to discover yet another box after some time. Clearly, Proof. From S(t) = j 1(X j (t) = 0) p j using the same asymptotic evaluations as for (33) we compute the variance as Thus we have with some positive constants c 1 , c 2 . Using Chebyshev's inequality and the Borel-Cantelli lemma it is not hard to show that the convergence S(t m )/E [S(t m )] → a.s. 1 is secured along a sequence t m ∼ m 2/α . But then S(t)/E [S(t)] → a.s. 1 follows for t → ∞ by the usual sandwich argument, since S(t) and E [S(t)] are nonincreasing and t m /t m+1 → 1 as m → ∞. The rest of (45) follows by checking that with probability one S(n(1 + ǫ)) < S n < S(n(1 − ǫ)) holds for all sufficiently large n.
To pass from (45) to (46) we first note that for n k as in (42) as is easily seen from which in turn is the second identity in (38). By Proposition 24 the bounds n k(1−ǫ) < N k < n k(1+ǫ) hold for sufficiently large k with probability one. Finally, the first identity in (44) and monotonicity imply that for large k S n k(1+ǫ) < S N k = R k < S n k(1−ǫ) , and now (46) follows from (47) by letting ǫ → 0.
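The cumulative frequency of empty boxes, $S(t) = \sum_j 1(X_j(t) = 0)\, p_j$ in the Poisson scheme, has a mean that is simple to compute: each box is empty at time $t$ with probability $e^{-t p_j}$, so $E[S(t)] = \sum_j p_j e^{-t p_j}$, which is the mean number of singletons divided by $t$. The sketch below checks this identity and the fixed-$n$ analogue $E[S_n] = \sum_j p_j (1 - p_j)^n$ on a hypothetical three-box example; the function names are illustrative.

```python
from math import exp


def missing_mass(p, counts):
    """S = total frequency of the boxes that received no balls."""
    return sum(pj for pj, x in zip(p, counts) if x == 0)


def mean_missing_mass_fixed(p, n):
    """E[S_n] = sum_j p_j (1 - p_j)^n."""
    return sum(pj * (1.0 - pj) ** n for pj in p)


def mean_missing_mass_poisson(p, t):
    """E[S(t)] = sum_j p_j exp(-t p_j)."""
    return sum(pj * exp(-t * pj) for pj in p)


def mean_singletons_poisson(p, t):
    """Phi_1(t) = sum_j t p_j exp(-t p_j)."""
    return sum(t * pj * exp(-t * pj) for pj in p)


# Hypothetical finite example.
p = [0.5, 0.3, 0.2]
t = 4.0
print(missing_mass(p, [2, 0, 1]))                  # only the second box is empty
print(mean_missing_mass_poisson(p, t), mean_singletons_poisson(p, t) / t)
```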
Remark. Comparing (46) with (41) we see that which supports the intuition that the ranked arrangement of frequencies $(p_j)$ decays faster than the size-biased permutation $(\tilde p_j)$. However, despite the fact that the tail sums of the sequences $(p_j)$ and $(\tilde p_j)$ decay with the same order, the analogue of (40) for $(\tilde p_j)$ does not hold. Indeed, by regular variation we have $p_i/p_{i+1} \to 1$ ($i \to \infty$). Pick $\epsilon$ small, and let $p_i, p_{i+1}$ be the two largest frequencies smaller than $\epsilon$, and such that $p_i/p_{i+1}$ is close to 1. In the Poisson setup, let $a$ be the time needed to discover one of the two boxes with these frequencies $p_i, p_{i+1}$, and $b$ be the time to discover both of them. These can be expressed through independent exponential variables, showing that the ratio $a/b$ assumes values smaller than, say, 1/2 with probability bounded away from zero. But this and the asymptotics of $K(t)$ readily imply that, with probability bounded away from 0, the positions of $p_i$ and $p_{i+1}$ in the rearranged sequence $(\tilde p_j)$ are not asymptotic to one another, which could not happen if $(\tilde p_j)$ were asymptotic to a regularly varying sequence.

Pure power laws
As has been mentioned in the Introduction, processes of coagulation and fragmentation of random masses [6,33] are related to occupancy schemes where the frequencies (p j ) are random. In many situations [22,24,32,5] one encounters a relation K n ∼ a.s. D n α (n → ∞) for the number of blocks of partition induced by sampling from (p j ), where 0 < α < 1 and D is a strictly positive random variable (in this context, a measure of 'α-diversity' of the partition [33]). The asymptotics (48) say that, given D, K n is regularly varying with constant slow variation factor. Conditioning on the frequencies and applying Proposition 2 we always have hence, by Proposition 17, (48) is equivalent to and by Proposition 23 it is also equivalent to Furthermore, any of these implies for r = 1, 2, . . .
and for size-biased frequencies The relations (48), (49), (50), (51), (52) appear in the literature under the folk name 'power laws'. Typically the starting point is (49), from which one arrives at the conclusion that $K_n/n^\alpha$ and $K_{n,r}/n^\alpha$ converge almost surely to multiples of the same random variable. But the other direction is also useful: from the asymptotics of $K_n$, established either by the analysis of moments or by some other method, one can draw conclusions about the behaviour of small frequencies.
Remark. Interestingly, the distribution of K n does not converge to normal, although this is true for K n conditioned on (p j ). An explanation for this phenomenon is that the randomness of (p j ) dominates the variability due to random sampling. The CLT for K n does hold for sampling from Poisson-Dirichlet(θ) (in which case K n ∼ a.s. θ log n), and has been also shown for some instances of random (p j ) under more general assumptions of slow variation [4,19,23].

Strong laws for large parts
We collect here explicit distributional formulas and a few ways to describe the multivariate asymptotics of the 'large parts'.
For X ↓ n := (X ↓ n,j , j = 1, 2, . . .) the sequence of X n,j 's arranged in nonincreasing order we have P(X ↓ n,j = n j , j = 1, . . . , k) = n! k j=1 n j ! distinct j 1 ,...,j k p n1 j1 · · · p n k j k (n 1 ≥ . . . ≥ n k > 0, n = k j=1 n j ), where the sum expands over all k-tuples of distinct positive integers j 1 , . . . , j k . Let X n := ( X n,j , j = 1, 2, . . .) be the sequence which starts with positive terms of X ↓ n arranged in the order as the boxes are discovered by the balls, and ends with infinitely many zeroes. Similarly to the above, P( X ↓ n,j = n j , j = 1, . . . , k) = n! k j=1 (n j − 1)!(n j + n j+1 + . . . + n k ) distinct j 1 ,...,j k p n1 j1 · · · p n k j k , where n 1 > 0, . . . , n k > 0, n = n 1 + . . . + n k . Let ( p j , j = 1, 2, . . .) be the size-biased permutation of (p j ). In particular, X n,1 is the number of balls out of the first n which fall in the same box as the first ball, and p 1 is the frequency of this box. The latter distribution conditionally given ( p j ) is P( X ↓ n,j = n j , j = 1, . . . , k | ( p j )) = n! k j=1 (n j − 1)!(n j + n j+1 + . . . + n k ) To proceed to the asymptotics, recall that the succession of hits in box j undergoes a Bernoulli process with success probability p j , hence by the classical law of large numbers n −1 X n,j → a.s. p j as n → ∞. The following multivariate extensions of this result involve various arrangements of the boxes, and feature the behaviour of 'large' parts which grow linearly with n. These results are of fundamental importance in Kingman's theory of exchangeable partitions [6,33]. Proposition 26. As n → ∞ we have n −1 X n → a.s. (p 1 , p 2 , . . .), n −1 X ↓ n → a.s. (p 1 , p 2 , . . .), n −1 X n → a.s. ( p 1 , p 2 , . . .), where the convergence is understood in the product topology.
Proof. The first relation amounts to the marginal convergence. The second relation follows from $\sum_j n^{-1} X_{n,j} = 1$ and the fact that ranking is a continuous mapping on the infinite-dimensional simplex $\{(x_1, x_2, \ldots) : x_j \ge 0, \sum_j x_j = 1\}$. The third relation is a known consequence of the second [33].
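The three arrangements of the occupancy counts (by box label, by rank, and by order of discovery) can be produced from a simulated sample. The sketch below does this for a hypothetical three-box example and compares the relative counts with the frequencies, in the spirit of the law of large numbers for large parts. The frequencies, seed, sample size and function name are assumptions.

```python
import random


def counts_in_discovery_order(sample):
    """Return [(box, count), ...] with boxes listed in order of first appearance."""
    counts = {}
    for j in sample:
        counts[j] = counts.get(j, 0) + 1   # dicts keep first-insertion order
    return list(counts.items())


rng = random.Random(1)
p = {1: 0.5, 2: 0.3, 3: 0.2}               # hypothetical finite example
n = 20_000
sample = rng.choices(list(p), weights=list(p.values()), k=n)

discovered = counts_in_discovery_order(sample)               # order of discovery
ranked = sorted((c for _, c in discovered), reverse=True)    # nonincreasing order
print([(j, c / n) for j, c in discovered])
print([c / n for c in ranked])
```

The order in which the boxes appear in `discovered` is one realization of the size-biased permutation of the frequencies: a box with a larger frequency tends to be discovered earlier, but not always.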
Many questions on random allocations also involve the ordering of the boxes. One can be interested in the last occupied box [13] or the first empty [11] etc. This theme lies outside the scope of this paper, and we only mention one generalization of (53) that appears in the theory of exchangeable ordered partitions [18,21].
Let $\triangleright$ be a strict total order on positive integers, thought of as some 'arrangement' of boxes. With each box $j$ we associate an open interval $]a_j, b_j[$ of length $p_j$, with endpoints $a_j = \sum_{i:\, i \triangleright j} p_i$ and $b_j = a_j + p_j$. Let $C = [0, 1] \setminus (\cup_j\, ]a_j, b_j[)$ be the complementary closed set. For each $n$ define a random finite set $C_n$ comprised of 0 and of all distinct elements of the sequence $(\sum_{i:\, i \triangleright j} n^{-1} X_{n,i},\ j = 1, 2, \ldots)$. Proof. This follows from (54) and the easily established convergence $\sum_{i:\, i \triangleright j} n^{-1} X_{n,i} \to_{\rm a.s.} a_j$.

Proposition 27.
For instance, define the order $\triangleright$ by choosing a sequence of distinct reals $(y_j, j = 1, 2, \ldots)$ and setting $i \triangleright j$ iff $y_i < y_j$. Consider the discrete distribution $\mu(dy) = \sum_j p_j \delta_{y_j}(dy)$ which places mass $p_j$ at the point $y_j$. For an independent sample of $n$ elements from $\mu$, the finite set $C_n$ encodes the nonzero counts of repeated values in the sample, in the natural (increasing) order of the values. The set $C$ can be identified with the quantile transform of $\mu$ [18]. For example, $\triangleright$ is the standard order on integers for $y_j = j$ ($j = 1, 2, \ldots$), in which case $C = \{0, p_1, p_1 + p_2, p_1 + p_2 + p_3, \ldots, 1\}$ and Proposition 27 amounts to (54). A more sophisticated example appears when $\{y_j\}$ is the set of all rational numbers, in which case $C$ is a Cantor set.