Small counts in the infinite occupancy scheme

The paper is concerned with the classical occupancy scheme with infinitely many boxes, in which $n$ balls are thrown independently into boxes $1,2,...$, with probability $p_j$ of hitting the box $j$, where $p_1\geq p_2\geq...>0$ and $\sum_{j=1}^\infty p_j=1$. We establish joint normal approximation as $n\to\infty$ for the numbers of boxes containing $r_1,r_2,...,r_m$ balls, standardized in the natural way, assuming only that the variances of these counts all tend to infinity. The proof of this approximation is based on a de-Poissonization lemma. We then review sufficient conditions for the variances to tend to infinity. Typically, the normal approximation does not mean convergence. We show that the convergence of the full vector of $r$-counts only holds under a condition of regular variation, thus giving a complete characterization of possible limit correlation structures.


Introduction
In the classical occupancy scheme with infinitely many boxes, balls are thrown independently into boxes $1, 2, \ldots$, with probability $p_j$ of hitting the box $j$, where $p_1 \geq p_2 \geq \ldots > 0$ and $\sum_{j=1}^\infty p_j = 1$. The most studied quantity is the number of boxes $K_n$ occupied by at least one out of the first $n$ balls thrown. It is known that for large $n$ the law of $K_n$ is asymptotically normal, provided that $\mathrm{Var}[K_n] \to \infty$; see [6,7] for references and a survey of this and related results. In this paper, we investigate the behaviour of the quantities $X_{n,r}$, the numbers of boxes hit by exactly $r$ out of the $n$ balls, $r \geq 1$.
Under a condition of regular variation, a multivariate CLT for the $X_{n,r}$'s was proved by Karlin [8]. Mikhailov [12] also studied the $X_{n,r}$'s, but in a situation where the $p_j$'s vary with $n$. In this paper, we establish joint normal approximation as $n \to \infty$ for the variables $X_{n,r_1}, \ldots, X_{n,r_m}$, centred and normalized, assuming only that $\lim_{n\to\infty} \mathrm{Var}\, X_{n,r_i} = \infty$ for each $i$. We also give examples to show that this condition is not enough to ensure convergence, since the correlation matrices need not converge as $n \to \infty$. The asymptotic behaviour of the moments of the $X_{n,r}$ is thus of key importance, and we discuss this under a number of simplifying assumptions.
The behaviour of these moments, as also of those of $K_n = \sum_{r=1}^\infty X_{n,r}$, depends on the way in which the frequencies $p_j$ decay to 0. In the case of power-like decay, $p_j \sim c j^{-1/\alpha}$ with $0 < \alpha < 1$, it is known that, for each fixed $k$, the moments $\mathbb{E} X_{n,r}^k$ have the same order of growth with $n$ for every $r$, and this is the same order of growth as that of $\mathbb{E} K_n^k$; moreover, the limit distributions of $K_n$ and of $X_n := (X_{n,1}, X_{n,2}, \ldots)$ are normal [6,8]. In contrast, for a sequence of geometric frequencies $p_j = cq^j$ ($0 < q < 1$), there is no way to scale the $X_{n,r}$'s to obtain a nontrivial limit distribution [10], and the moments of $K_n$ have oscillatory asymptotics. In a more general setting such that the $p_j$'s have exponential decay, the oscillatory behaviour of $\mathrm{Var}[K_n]$ is typical [3]. The spectrum of interesting possibilities is, however, much wider: for instance, frequencies $p_j \sim c e^{-j^\beta}$, with $0 < \beta < 1$, exhibit a decay intermediate between power and exponential.
Karlin's [8] multivariate CLT for $X_n$ applies when the index of regular variation is in the range $0 < \alpha < 1$. We complement this by the analysis of the cases $\alpha = 0$ and $\alpha = 1$, showing that for each $\alpha \in [0, 1]$ there is exactly one possible normal limit. Finally, we prove that these one-parameter normal laws are the only possible limits of naturally scaled and centred $X_n$. Specifically, we show that a regular variation condition holds if $\mathrm{Var}\, X_{n,r} \to \infty$ for all $r$ and if all the correlations $\{\mathrm{Corr}(X_{n,r}, X_{n,s}),\ r, s \geq 1\}$ converge.

Poissonization
As in much previous work, we shall rely on a closely related occupancy scheme, in which the balls are thrown into the boxes at the times of a unit Poisson process. The advantage of this model is that the processes $(N_j(t), t \geq 0)$, counting the numbers of balls in boxes $j = 1, 2, \ldots$, are independent. Let $Y_r(t)$ be the number of boxes occupied by exactly $r$ balls at time $t$. In view of the representation
$$Y_r(t) = \sum_{j=1}^\infty \mathbf{1}[N_j(t) = r] \qquad (2.1)$$
with independent Bernoulli terms, it follows that
$$\frac{Y_r(t) - \mathbb{E}\, Y_r(t)}{\sqrt{\mathrm{Var}[Y_r(t)]}} \to_d \mathcal{N}(0, 1) \quad \text{as } t \to \infty$$
if and only if $\mathrm{Var}[Y_r(t)] \to \infty$. This suggests that normal approximation can be approached most easily through the $Y_r(t)$, provided that the de-Poissonization can be accomplished. We now show that this is indeed the case. Let $\mathcal{L}(\cdot)$ denote the probability law of a random element, and $d_{TV}$ the distance in total variation.

Lemma 2.1 For any

Proof. We begin by noting that, in parallel to (2.1),
$$X_{n,r} = \sum_{j=1}^\infty \mathbf{1}[M_{n,j} = r], \qquad (2.3)$$
where $M_{n,j}$ represents the number of balls out of the first $n$ thrown that fall into box $j$. Our proof uses lower truncation of the sums (2.1) and (2.3) that define $Y_r(n)$ and $X_{n,r}$. Since $M_{n,j} \sim \mathrm{Binomial}(n, p_j)$, it follows from the Chernoff inequalities [5] , since the $p_j$ are decreasing, and $m \leq \tfrac12 np_k$; and the same bound holds also for $N_j(n) \sim \mathrm{Poisson}(np_j)$. Hence, defining it follows that But now, from an inequality of Le Cam [4] and Michel [11], we have and the $X_{n,k,r}$ are functions of $\{M_{n,j}, j \geq k+1\}$, the $Y_{k,r}(n)$ of $\{N_j(n), j \geq k+1\}$. The lemma now follows from (2.4), (2.5) and (2.6).
Proposition 2.2 Let $k(n)$ be any sequence satisfying Then, for any sequence $m(n)$ (2.7) for each $n$, it follows that Lemma 2.1 can be applied for each $n$. Since $k(n) \to \infty$, it follows that $\pi_{k(n)} \to 0$, so that the first element in its bound converges to zero; the second converges to zero also, by assumption.
Remark. Such sequences $k(n)$ always exist. For instance, one can take For this choice, it is immediate that $k(n) \to \infty$, and that $np_{k(n)} \geq 20 \log k(n) \to \infty$, entailing also that $k(n) e^{-np_{k(n)}/10} \leq 1/k(n) \to 0$. Hence there are always sequences $m(n) \to \infty$ for which (2.7) is satisfied.
Hence, in particular, any approximation to the distribution of a finite subset of the components of $Y(n) = (Y_1(n), Y_2(n), \ldots)$ (suitably scaled) remains valid for the corresponding components of $X_n$, at the cost of introducing an extra, asymptotically negligible, error in total variation of at most where $k(n)$ is any sequence satisfying the conditions of Proposition 2.2.
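The closeness of the fixed-$n$ and Poissonized schemes can be illustrated numerically at the level of means. The following is a minimal sketch, not part of the argument above: it compares $\mathbb{E} X_{n,r} = \sum_j \mathbb{P}[\mathrm{Binomial}(n, p_j) = r]$ with $\mathbb{E} Y_r(n) = \sum_j \mathbb{P}[\mathrm{Poisson}(np_j) = r]$. The frequencies (a power law $p_j \propto j^{-2}$ truncated to 5000 boxes and renormalized), the value $n = 1000$ and all function names are assumptions made only for this illustration. Agreement of means is much weaker than the total variation statement of Lemma 2.1.

```python
import math


def power_law_frequencies(num_boxes, exponent=2.0):
    """Truncated, renormalised frequencies p_j proportional to j**(-exponent).

    Assumption for illustration only: the scheme in the text has infinitely
    many boxes; here the sequence is cut off after num_boxes terms.
    """
    weights = [j ** (-exponent) for j in range(1, num_boxes + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def mean_fixed_n(n, r, p):
    """E[X_{n,r}]: sum over boxes of P[Binomial(n, p_j) = r]."""
    coeff = math.comb(n, r)
    return sum(coeff * pj ** r * (1.0 - pj) ** (n - r) for pj in p)


def mean_poissonized(t, r, p):
    """E[Y_r(t)]: sum over boxes of P[Poisson(t * p_j) = r]."""
    r_fact = math.factorial(r)
    return sum(math.exp(-t * pj) * (t * pj) ** r / r_fact for pj in p)


p = power_law_frequencies(5000)
n = 1000
comparison = {
    r: (mean_fixed_n(n, r, p), mean_poissonized(n, r, p)) for r in (1, 2, 3)
}
for r, (ex, ey) in comparison.items():
    print(f"r={r}: E X_(n,r) = {ex:.4f}   E Y_r(n) = {ey:.4f}")
```

For these assumed frequencies the two columns should agree closely for each $r$, which is consistent with, but does not prove, the de-Poissonization result.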

Normal approximation
As noted above, the distribution of $Y_r(t)$ is asymptotically normal as $t \to \infty$ whenever $\mathrm{Var}\, Y_r(t) \to \infty$.
Here, we consider the joint normal approximation of any finite set of counts $Y_{r_1}(t), \ldots, Y_{r_m}(t)$ such that $r_i \geq 1$ and $\lim_{t\to\infty} \mathrm{Var}\, Y_{r_i}(t) = \infty$ for each $1 \leq i \leq m$. We measure the closeness of two probability measures $P$ and $Q$ on $\mathbb{R}^m$ in terms of differences between the probabilities assigned to arbitrary convex sets: and $\Sigma_R(t)$ denotes the $m \times m$ matrix with elements $\{\Sigma_{rs}(t),\ r, s \in R := \{r_1, \ldots, r_m\}\}$. Applying this result, we obtain the following theorem.
, where $1 \leq r_1 < \ldots < r_m$, then, as $t$ and $n$ tend to $\infty$, where $k(n)$ is any sequence chosen as for Proposition 2.2 and satisfying $\max_{1\leq j\leq m}$

Proof. All that we need to do is to control the quantity $\beta_t$. This in turn involves bounding the smallest eigenvalue of $\Sigma_R(t)$ away from 0. Now direct calculation shows that, for any column vector $a \in \mathbb{R}^m$, Using the definition of $Y'_{l,r}(t)$, this gives where $p_{l,R}(t) := \sum_{r\in R} p_{l,r}(t)$ and, under the measure $P_{l,R,t}$, $U$ takes the value $a_j / V_{r_j}(t)$ with probability $p_{l,r_j}(t)/p_{l,R}(t)$, $1 \leq j \leq m$. This in turn implies that and since . However, for each $l$, $p_{l,R}(t) \leq 1 - p_{l,0}(t) - \sum_{j>r_m} p_{l,j}(t)$, and $(p_{l,r}(t), r \geq 1)$ are just the Poisson probabilities (3.1). Hence 1 |x|, and hence, since taking expectations and adding over $l \geq 1$ gives

Moments
For normal approximation, in view of Theorem 3.1, we are particularly interested in conditions under which $V_r(t) \to \infty$.
For the moments we have the formulas where, as above, $p_{j,r} = e^{-tp_j}(tp_j)^r/r!$. From (4.1) and (4.2) we obtain with $k_r > 0$, as is seen from the inequalities hence, as long as only the convergence to infinity of $V_r(t)$ is concerned, we can deal with the simpler quantity $\Phi_r(t)$. This facilitates the proof of the following theorem, showing how the asymptotic behaviour of $V_r(t)$ for different values of $r$ is structured.
Theorem 4.1 The asymptotic behaviour of the quantities $V_r(t)$ as $t \to \infty$ follows one of the following four regimes: 2. $\limsup_{t\to\infty} V_r(t) = \infty$ for all $r \geq 1$, and there exists an $r_0 \geq 1$ such that $\liminf_{t\to\infty} V_r(t) = \infty$ for all $1 \leq r \leq r_0$, and $\liminf_{t\to\infty} V_r(t) < \infty$ for all $r > r_0$;

Proof. Replacing $V_r$ with $\Phi_r$ for the argument, the formula (4.1) yields For $s < r$, the ratio of the individual terms is given by Hence, for all $s < r$, It now follows that if, for some $r$, $\lim_{t\to\infty} V_r(t) = \infty$, then $\lim_{t\to\infty} V_s(t) = \infty$ for all $1 \leq s \leq r$ also; and that, if $\sup_t V_r(t) < \infty$ for some $r$, then $\sup_t V_s(t) < \infty$ for all $s > r$. Hence, to complete the proof, we just need to show that, if For this last part, write $\Phi_r(t) = L_r(t) + R_r(t)$, where Suppose that $\sup_t \Phi_r(t) = K < \infty$. Then, for every $t > 0$, It thus remains to bound $R_1(t)$, which in turn can be reduced to finding a bound for Let $a_0 \geq a_1 \geq \ldots \geq 0$ be any decreasing sequence such that $a_j/a_{j+h} \geq 2$ holds for some $h \geq 1$ and all $j \geq 1$. Then $a_{ih+m} \leq a_m 2^{-i}$ for every $i \geq 0$ and $0 \leq m < h$. Splitting the $a_j$'s into $h$ subsequences that are dominated by the geometric series, we thus have Now if, for some $h \geq 1$, the frequencies $p_j$ satisfy
$$p_j/p_{j+h} \geq 2 \quad \text{for all } j \geq 1, \qquad (4.7)$$
then applying the above result to the sequence $a_j = tp_{j+\min\{i:\, tp_i<1\}}$ for any $t$ yields the bound $R_1(t) < S(t) < 2h$, since $a_0 < 1$.
On the other hand, if $p_j/p_{j+h} < 2$ for some $j$ and $h$, then it follows from Thus, for any $h$ such that $e^{-2}(h+1) > Kr!$, we see that (4.7) must hold, since otherwise (4.6) would be violated for $t = 2/p_j$. Hence it follows that $R_1(t) < S(t) < 2e^2 Kr!$, and the final part of the theorem is proved.
In particular, in Theorem 3.1, the quantity $\min_{1\leq i\leq m} V_{r_i}(t)$ can thus be replaced in the error estimates by $\Phi_{r_m}(2t)$.
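The comparison between $V_r(t)$ and $\Phi_r(t)$ can be checked numerically. The sketch below takes $\Phi_r(t) = \sum_j p_{j,r}$ with $p_{j,r} = e^{-tp_j}(tp_j)^r/r!$, and computes $V_r(t) = \sum_j p_{j,r}(1 - p_{j,r})$, the variance of a sum of independent Bernoulli indicators. Two further facts used in the code are direct computations and are not quoted from the displayed formulas: the identity $\sum_j p_{j,r}(t)^2 = \binom{2r}{r} 4^{-r}\, \Phi_{2r}(2t)$, and the lower bound $V_r(t) \geq (1 - e^{-r}r^r/r!)\,\Phi_r(t)$, which gives one admissible positive constant (not necessarily the $k_r$ of the text). The truncated power-law frequencies and all function names are assumptions for illustration.

```python
import math


def poisson_prob(r, mean):
    """P[Poisson(mean) = r]; with mean = t * p_j this is p_{j,r}."""
    return math.exp(-mean) * mean ** r / math.factorial(r)


def phi(r, t, p):
    """Phi_r(t): sum over boxes of p_{j,r}."""
    return sum(poisson_prob(r, t * pj) for pj in p)


def var_count(r, t, p):
    """V_r(t): variance of a sum of independent Bernoulli(p_{j,r}) indicators."""
    total = 0.0
    for pj in p:
        q = poisson_prob(r, t * pj)
        total += q * (1.0 - q)
    return total


# Assumed frequencies: p_j proportional to j**-2, truncated to 20000 boxes.
weights = [j ** -2.0 for j in range(1, 20001)]
norm = sum(weights)
p = [w / norm for w in weights]

checks = []
for r in (1, 2, 3):
    for t in (10.0, 100.0, 1000.0):
        f = phi(r, t, p)
        v = var_count(r, t, p)
        # Phi_r(t) minus the sum of squares, rewritten through Phi_{2r}(2t).
        identity = f - math.comb(2 * r, r) / 4 ** r * phi(2 * r, 2.0 * t, p)
        checks.append((r, t, f, v, identity))
        print(f"r={r} t={t:g}: Phi={f:.4f} V={v:.4f} V/Phi={v / f:.4f}")
```

The ratio $V_r(t)/\Phi_r(t)$ stays between a positive constant and 1 in this example, which is why divergence of $V_r(t)$ can be studied through $\Phi_r(t)$.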
We now turn to finding conditions sufficient for distinguishing the asymptotic behaviour of the $V_r(t)$. To do so, introduce the measures Two special cases are $\nu_0$, a counting measure, and $\nu_1$, the probability distribution of a size-biased pick from the $p_j$'s. For $r > 0$ write (4.1) as Comparing with standard gamma integrals, it is then immediate that lim inf This, together with Theorem 4.1, enables us to conclude the following conditions for the convergence to infinity of $\Phi_r(t)$, and hence equivalently of $V_r(t)$, expressed in terms of the accessible quantities

Lemma 4.2 (a) $\sup_{t\geq 0} \Phi_s(t) < \infty$ for all $s \geq 1$ if and only if, for some (and then for all) $r \geq 1$, $\sup_j \rho_{j,r} < \infty$.
Proof. If $p_{j+1} \leq x < p_j$ then Hence (4.9) can be replaced by the inequalities Then it is immediate that $2^{-r} h(j) \leq \rho_{j,r} \leq h^* \sum_{l\geq 1} 2^{-(l-1)} = 2h^*$, so that $h^* < \infty$ if and only if $\sup_j \rho_{j,r} < \infty$ for some, and then for all, $r \geq 1$. We now conclude the proof by showing that $\sup_{t\geq 0} \Phi_s(t) < \infty$ for all $s \geq 1$ if and only if $h^* < \infty$. Defining $L_r(t)$ and $R_r(t)$ as in (4.5), we observe that, if so that $\Phi_r(t) = L_r(t) + R_r(t) < \infty$ for all $r \geq 1$. On the other hand, $L_r(1/p_{j+h(j)}) \geq e^{-2} h(j)/r!$, implying that, if $h^* = \infty$, then $\limsup_{t\to\infty} \Phi_r(t) = \infty$ for all $r \geq 1$.
The familiar ratio test yields simpler sufficient conditions. Thus $\sup_t \Phi_r(t) < \infty$ for all $r \geq 1$ if For instance, for $p_j = cq^j$, the geometric distribution with $0 < q < 1$, we have $p_{j+1}/p_j = q$; hence $\sup_t \Phi_r(t) < \infty$ for all $r$, and normal approximation is not adequate for any $r$. This illustrates possibility 4 in Theorem 4.1. For the Poisson distribution $p_j = c\lambda^j/j!$, we even have $p_{j+1}/p_j \to 0$, and so normal approximation is no good here, either. Continuing this line, we obtain a further set of conditions. $[e^{-y} y^r/r!] \to \infty$.
As $x$ decreases, the piecewise-constant function $\nu_0(\lambda x, x)$ may have downward jumps only at the values $x \in \{p_j\}$, hence the assumption is equivalent to $\nu_0(\lambda p_j, p_j) \to \infty$ (as $j \to \infty$), which in turn is readily translated into (4.11).
For part (b), the same estimate with any $0 < \lambda < 1/2$ shows that the condition (4.12) is necessary. In the other direction, suppose that $p_{j+h}/p_j < 3/4$ for all $j \geq J$. Split $(p_j, j \geq J)$ into $h$ subsequences $(p_{J+s+ih}, i \geq 0)$, with $0 \leq s \leq h-1$. Each of the subsequences has the property that the ratio of any two consecutive elements is at most $3/4$. Hence, as above, the sum of the terms $e^{-p_j t}(tp_j)^r/r!$ along a subsequence yields a uniformly bounded contribution to $\Phi_r$.
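The contrast between geometric and power-like frequencies described above can be seen in a short numerical sketch. It evaluates $\Phi_1(t) = \sum_j tp_j e^{-tp_j}$ on a grid of times for geometric frequencies $p_j = q^j$ with $q = 1/2$ (so $c = 1$) and for a power law $p_j \propto j^{-2}$. The truncation levels, the time grid and the function names are assumptions chosen for illustration. A finite grid cannot establish boundedness or divergence; it only shows the qualitative difference.

```python
import math


def phi(r, t, p):
    """Phi_r(t): sum over boxes of exp(-t p_j) (t p_j)^r / r!."""
    r_fact = math.factorial(r)
    return sum(math.exp(-t * pj) * (t * pj) ** r / r_fact for pj in p)


# Geometric frequencies p_j = q^j with q = 1/2, truncated at 200 boxes
# (assumption: terms beyond that are negligible on the time grid used here).
geometric = [0.5 ** j for j in range(1, 201)]

# Power-law frequencies p_j proportional to j**-2, truncated at 20000 boxes.
weights = [j ** -2.0 for j in range(1, 20001)]
norm = sum(weights)
power_law = [w / norm for w in weights]

times = [10.0 ** e for e in range(1, 5)]  # t = 10, 100, 1000, 10000
geo_values = [phi(1, t, geometric) for t in times]
pow_values = [phi(1, t, power_law) for t in times]

for t, g, w in zip(times, geo_values, pow_values):
    print(f"t={t:>8g}  geometric Phi_1={g:.4f}  power-law Phi_1={w:.4f}")
```

In this example the geometric column stays of order one while the power-law column keeps growing with $t$, in line with the ratio-test discussion.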
Examples of irregular behaviour of moments may be constructed by breaking the sequence $(p_j, j \geq 1)$ into finite blocks of sizes $m_1, m_2, \ldots$, and setting the $p_j$'s within the $i$'th block all equal to some $q_i$. We use the notation $V(t) := \mathrm{Var} \sum_{r\geq 1} Y_r(t)$ to denote the variance of the number of occupied boxes.
Then both $V(t)$ and $\Phi_1(t)$ oscillate between 0 and $\infty$, approaching the extremes arbitrarily closely. This illustrates possibility 3 in Theorem 4.1.
Example 2. As in [3, Example 4.4], take oscillates between 0 and $\infty$ as $t$ varies; thus $Y_1(t)$ is asymptotically normal, but $Y_2(t)$ is not, and the ratios $p_{j+1}/p_j$ have accumulation points at 0 and 1. This illustrates possibility 2 in Theorem 4.1. We now extend this example, showing among other things that one can have any value for $r_0$ in behaviour 2 in Theorem 4.1.
Proof. Once again, we work with $\Phi_r$ instead of $V_r$, now writing For part (i), it is enough to consider the subsequence $t_l := 1/q_l$, $l \geq 1$.
For part (ii), split $\mathbb{R}_+$ into intervals Indeed, for $t \in J_l$, taking just the term with $i = l+1$ in (4.13), we obtain , where we write $t = \phi/q_l$ with $1 \leq \phi \leq q_l/q_{l+1} \sim m_{l+1}^{\beta(1+\alpha)}$, and use the fact that $\phi q_{l+1}/q_l \leq 1$ in this range. For rβ(1 → ∞ as $l \to \infty$. For $r\beta(1+\alpha) = 1$, take also the term with $i = l$ in (4.13), giving a combined contribution of at least $m_l e^{-\phi}\phi^r/r! + K\phi^r$, for some $K > 0$. It is easily checked that the minimum value of this sum for $\phi > 1$ goes to $\infty$ with $l$, hence, once again, $\lim_{l\to\infty} \inf_{t\in J_l} \Phi_r(t) = \infty$.

1 In fact, the Poisson sampling model makes sense for arbitrary $p_j$'s, and the enumeration of small counts makes sense if
For $r\beta(1+\alpha) > 1$, these two terms contribute an amount of order φ r {m to (4.13), which is small as $l \to \infty$, for example, for $\phi = 2\log m_{l+1}$. The sum of the terms in (4.13) for $i \geq l+2$ is of order ), where $\eta > 0$, and hence asymptotically smaller than the second element of (4.14). The sum of the terms in (4.13) for $i \leq l-1$ is of order at most , largest for $\phi = 1$ for all $l$ large enough, when it is of order asymptotically small as $l \to \infty$. Hence, for $t'_l = 2q_l^{-1}\log m_{l+1}$, it follows that $\lim_{l\to\infty} \Phi_r(t'_l) = 0$, and therefore that $\Phi_r(t)$ does not converge to infinity as $t \to \infty$.
For part (iii), writing $M_i := \sum_{l=1}^i m_l$, we have $\rho_{j,r} \geq q_i^{-r} \sum_{l\geq i+1}$ with equality for $j = M_i$. Now For part (iv), we note that, for $t = \phi/q_l$, the quantity behaves asymptotically, as $l$ becomes large, in the same way as for the Poisson occupancy scheme with a single block of $m_l$ boxes with equal frequencies $q_l$. Computing the limit, where $m_l$ cancels because of the additivity of the moments. As $\phi$ varies, this limit value varies too, and hence, for $r \neq s$, the quantities $\Sigma_{rs}(t)$ do not converge as $t \to \infty$.
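The single-block comparison in part (iv) can be made concrete with a small computation. The sketch below considers $m$ independent boxes, each holding a Poisson($\phi$) number of balls, as happens at time $t = \phi/q$ for a block with common frequency $q$. For $r \neq s$ the indicators of "exactly $r$ balls" and "exactly $s$ balls" in the same box are disjoint events, so $\mathrm{Cov}(Y_r, Y_s) = -m\,p_r p_s$ and $\mathrm{Var}\, Y_r = m\,p_r(1 - p_r)$, where $p_r = e^{-\phi}\phi^r/r!$. These formulas are derived here from the independence of boxes and are not copied from the displayed limit; the function names and the values of $\phi$ and $m$ are illustrative assumptions.

```python
import math


def poisson_prob(r, mean):
    """P[Poisson(mean) = r]."""
    return math.exp(-mean) * mean ** r / math.factorial(r)


def block_correlation(r, s, phi_value, m):
    """Corr(Y_r, Y_s), r != s, for m independent boxes with Poisson(phi_value) counts."""
    pr = poisson_prob(r, phi_value)
    ps = poisson_prob(s, phi_value)
    cov = -m * pr * ps            # disjoint events within a box, independent boxes
    var_r = m * pr * (1.0 - pr)
    var_s = m * ps * (1.0 - ps)
    return cov / math.sqrt(var_r * var_s)


# The block size m cancels, while the value depends on phi.
table = {
    phi_value: (
        block_correlation(1, 2, phi_value, 10),
        block_correlation(1, 2, phi_value, 10 ** 6),
    )
    for phi_value in (1.0, 2.0, 5.0)
}
for phi_value, (small_block, large_block) in table.items():
    print(f"phi={phi_value:g}: m=10 -> {small_block:.4f}, m=10^6 -> {large_block:.4f}")
```

The correlation is the same for both block sizes but changes with $\phi$, which is the mechanism behind the non-convergence of $\Sigma_{rs}(t)$ for $r \neq s$.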
It follows from parts (ii) and (iii) of Proposition 4.4 that the implication in part (b) of Lemma 4.2 cannot be reversed, and from part (iv) that the correlations between different components of $Y(t)$ need not converge, even when their variances tend to infinity. Hence the approximation in Theorem 3.1 does not necessarily imply convergence. Yet another kind of pathology appears when $Y_1(t)$ is asymptotically independent of $(Y_r(t), r > 1)$, as in the following example.
Example 3. Suppose that the frequencies in the block construction satisfy k → ∞ for each $r$, we have $\lim_{j\to\infty} \rho_{j,r} = \infty$, and hence all the variances $V_r(t)$ go to $\infty$ by Lemma 4.2 (b). On the other hand, $m_i q_i \big/ \sum_{k=i+1}^\infty m_k q_k \to 0$, and it follows that
$$\frac{\Phi_{1+s}(2t)}{\Phi_1(t)} = \frac{2^{s+1} \sum_i m_i q_i e^{-tq_i} \{e^{-tq_i} t^s q_i^s\}}{(s+1)! \sum_i m_i q_i e^{-tq_i}} \to 0$$
as $t \to \infty$. Since $\Phi_{1+s}(2t)/\Phi_s(t)$ is bounded above by (4.4), we conclude that $\Sigma_{1,s}(t) \to 0$ for $s \geq 2$. It follows that every pair $(Y'_1(t), Y'_s(t))$, $s \geq 2$, converges in distribution to the standard bivariate normal distribution with independent components. Because the variances go to $\infty$, Theorem 3.1 guarantees increasing quality of the normal approximation for any finite collection of components $Y'_{r_i}(t)$. However, the full vector $(Y'_r, r = 1, 2, \ldots)$ does not converge: see more on this example in Sections 5 and 6.
Part (ii) of Proposition 4.4 also demonstrates that $\liminf_{j\to\infty} p_{j+1}/p_j = 0$ does not exclude that $\Phi_r(t) \to \infty$, hence the condition (4.11) in Lemma 4.3 is not necessary. Finally, by [3, Eqn. 3.1], we have meaning that $\Phi_1(t)$ is always of the same order as the variance of the number of occupied boxes $V(t)$. The examples above show that this need not be the case for $\Phi_r(t)$, when $r \geq 2$.
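The statement that $\Phi_1(t)$ and $V(t)$ are of the same order can be explored numerically. In the sketch below, $V(t)$ is computed as $\sum_j e^{-tp_j}(1 - e^{-tp_j})$, which is the variance of the number of occupied boxes when the box counts are independent Poisson($tp_j$) variables; this expression is derived from that independence and is not quoted from [3]. The two frequency sequences, their truncation levels, the time grid and the function names are assumptions for illustration only.

```python
import math


def phi1(t, p):
    """Phi_1(t): sum over boxes of t p_j exp(-t p_j)."""
    return sum(t * pj * math.exp(-t * pj) for pj in p)


def occupied_variance(t, p):
    """V(t): variance of the number of occupied boxes under independent Poisson counts."""
    total = 0.0
    for pj in p:
        empty = math.exp(-t * pj)      # probability that box j is empty at time t
        total += empty * (1.0 - empty)
    return total


geometric = [0.5 ** j for j in range(1, 201)]        # p_j = q^j with q = 1/2
weights = [j ** -2.0 for j in range(1, 20001)]       # p_j proportional to j**-2
norm = sum(weights)
power_law = [w / norm for w in weights]

ratios = {}
for name, p in (("geometric", geometric), ("power_law", power_law)):
    ratios[name] = [
        phi1(t, p) / occupied_variance(t, p) for t in (10.0, 100.0, 1000.0, 10000.0)
    ]
    print(name, [f"{x:.3f}" for x in ratios[name]])
```

In both assumed examples the ratio $\Phi_1(t)/V(t)$ stays within a narrow band, even though $\Phi_1(t)$ itself is bounded in one case and divergent in the other.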

Regular variation
We henceforth assume that $\Phi_r(t) \to \infty$ for all $r \geq 1$. The CLT for each component of $Y(t)$ then holds, as observed above, and normal approximation becomes progressively more accurate for the joint distribution of any finite collection of components. A joint normal limit for any collection of the standardized components also holds, provided that the corresponding covariances converge. From (4.3) we have The RHS converges to a nonzero limit for each pair $r, s$ if, for each $r$, $\Phi_r \approx f \in R_\alpha$, where $R_\alpha$ denotes the class of functions regularly varying at $\infty$ with index $\alpha$, and where, here and subsequently, we write $a \approx b$ if $a(t)/b(t) \to c$ as $t \to \infty$ with $0 < c < \infty$. If $\Phi_r \in R_\alpha$, then the index belongs to the range $0 \leq \alpha \leq 1$, because $\Phi_r(t)$ cannot converge to 0, and because $\Phi_r(t)/t \to 0$. The results in the next section show that, if the covariances converge for a sufficiently large set of pairs $r, s$, then this is in fact the only possibility. More formally, we say that then regular variation holds in the occupancy problem, meaning that, for some $0 \leq \alpha \leq 1$ and some rate function $f \in R_\alpha$,
$$\Phi_r \approx f \quad \text{for all } r \geq 2. \qquad (5.2)$$
This setting of regular variation extends the original approach by Karlin [8] in the special case $\alpha = 0$, and, moreover, it covers all possible limiting covariance structures (Theorem 6.4).
Observe that the functions t thus, in particular, they are completely monotone. This taken together with the standard properties of regularly varying functions [2] implies that, if $\Phi_r \in R_\alpha$ for some $0 \leq \alpha < 1$ and $r \geq 1$, then the same is true for all $r \geq 1$, and we can choose the rate function $f = \Phi_1$. The case $\alpha = 1$ is special. If $\Phi_r \in R_1$ for some $r \geq 2$, then all $\Phi_r$ for $r \geq 2$ are of the same order of growth and $\Phi_1 \in R_1$, but $\Phi_1 \gg \Phi_2$ (this motivates the choice $r \geq 2$ in (5.2)).
A necessary condition for (5.2) is $\lim_{j\to\infty} p_{j+1}/p_j = 1$, as follows from the next lemma.
Lemma 5.1 If $\liminf_{j\to\infty} p_{j+1}/p_j < 1$ then $\Phi_r$ is not regularly varying for $r \geq 2$, and $\Phi_1$ is not regularly varying with index $\alpha < 1$.
Proof. We have for any positive $a < b$. However, the assumption of the lemma allows us to choose $a < b < 1$ such that $\nu_2[ap_j, bp_j] = 0$ for infinitely many $j = j_k$, so (5.4) fails for $t = 1/p_{j_k} \to \infty$. The contradiction shows that $t^{-2}\Phi_2(t)$ cannot be regularly varying. The assertions regarding $r \neq 2$ can be derived in the same way.
The example below shows that $\Phi_r$ may be regularly varying for $r = 1$ alone.
Proof. A monotone density result which dates back to von Mises and Lamperti [9] says that the convergence $tg'(t)/g(t) \to \beta$ implies $g \in R_\beta$ (this holds for arbitrary $\beta$, including $\pm\infty$). This result applied to $g(t) = t^{-r}\Phi_r(t)$ yields the regular variation $\Phi_r \in R_\alpha$, with some $0 \leq \alpha \leq 1$. The rest follows from (5.3), monotonicity and the general behaviour of regularly varying functions under integration and differentiation [2].
To apply the lemma, we need to pass from the convergence of covariances (5.1) to the convergence of a ratio as in (6.1). To this end, it is useful to exclude zero limits.

Lemma 6.2 If $\limsup_t \Phi_s(t) = \infty$ for any $s \geq 1$, then no correlation $\Sigma_{r,r'}(t)$ with $2 \leq r < r'$ can converge to zero.
Thus multivariate normal approximation is always good if the variances of the (unstandardized) components $Y_r(t)$ are large. However, convergence typically does not take place: see the series of examples in Proposition 4.4.