Normal approximation for isolated balls in an urn allocation model

Consider throwing $n$ balls at random into $m$ urns, each ball landing in urn $i$ with probability $p_i$. Let $S$ be the resulting number of singletons, i.e., urns containing just one ball. We give an error bound for the Kolmogorov distance from $S$ to the normal, and estimates on its variance. These show that if $n$, $m$ and $(p_i, 1 \leq i \leq m)$ vary in such a way that $\sup_i p_i = O(n^{-1})$, then $S$ satisfies a CLT if and only if $n^2 \sum_i p_i^2$ tends to infinity, and demonstrate an optimal rate of convergence in the CLT in this case. In the uniform case $(p_i \equiv m^{-1})$ with $m$ and $n$ growing proportionately, we provide bounds with better asymptotic constants. The proofs of the error bounds are based on Stein's method via size-biased couplings.


Introduction
Consider the classical occupancy scheme, in which each of $n$ balls is placed independently at random in one of $m$ urns, with probability $p_i$ of going into the $i$th urn ($p_1 + p_2 + \cdots + p_m = 1$). If $N_i$ denotes the number of balls placed in the $i$th urn, then $(N_1, \ldots, N_m)$ has the multinomial distribution $\mathrm{Mult}(n; p_1, p_2, \ldots, p_m)$. A special case of interest is the so-called uniform case where all the $p_i$ are equal to $1/m$.
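As an illustration of the occupancy scheme just described, the following minimal Python sketch throws $n$ balls into $m$ urns and tabulates the occupancy counts $N_i$. The function names `throw_balls` and `occupancy_counts`, the seed, and the values of `m` and `n` are hypothetical choices made for this example only.

```python
import random
from collections import Counter

def throw_balls(n, probs, rng):
    """Return the urn index chosen by each of n independent balls."""
    return rng.choices(range(len(probs)), weights=probs, k=n)

def occupancy_counts(locations, m):
    """N_i = number of balls in urn i, for i = 0, ..., m-1."""
    counts = Counter(locations)
    return [counts.get(i, 0) for i in range(m)]

rng = random.Random(0)            # fixed seed, for reproducibility only
m, n = 10, 25                     # illustrative sizes
probs = [1.0 / m] * m             # the uniform case p_i = 1/m
counts = occupancy_counts(throw_balls(n, probs, rng), m)

occupied = sum(1 for c in counts if c > 0)     # urns holding at least one ball
singletons = sum(1 for c in counts if c == 1)  # urns holding exactly one ball
```

Replacing `probs` by any other probability vector gives the non-uniform case; the vector `counts` is then one draw from the multinomial distribution.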
A much-studied quantity is the number of occupied urns, i.e. the sum $\sum_i \mathbf{1}\{N_i > 0\}$. This quantity, scaled and centred, is known to be asymptotically normal as $n \to \infty$ in the uniform case with $m \propto n$, and a Berry-Esséen bound for the discrepancy from the normal, tending to zero at the optimum rate, was obtained for the uniform case by Englund [4], and for the general (nonuniform) case, with a less explicit error bound, by Quine and Robinson [12]. More recently, Hwang and Janson [9] have obtained a local limit theorem. A variety of applications are mentioned in [9] ('coupon collector's problem, species trapping, birthday paradox, polynomial factorization, statistical linguistics, memory allocation, statistical physics, hashing schemes and so on'). Also noteworthy are the monographs by Johnson and Kotz [10] and by Kolchin et al. [11]; the latter is mainly concerned with models of this type, giving results for a variety of limiting regimes for the growth of $m$ with $n$ (in the uniform case) and also in some of the non-uniform cases. There has also been recent interest in the case of infinitely many urns with the probabilities $p_i$ independent of $n$ [1,6].
In this paper we consider the number of isolated balls, that is, the sum $\sum_i \mathbf{1}\{N_i = 1\}$. This quantity seems just as natural an object of study as the number of occupied urns, if one thinks of the model in terms of the balls rather than in terms of the urns. For example, in the well-known birthday paradox, this quantity represents the number of individuals in the group who have a unique birthday.
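For the birthday example, the mean number of isolated balls has a simple closed form: since $N_i$ is binomial with parameters $n$ and $p_i$, we have $E \sum_i \mathbf{1}\{N_i = 1\} = \sum_i n p_i (1 - p_i)^{n-1}$. The short sketch below evaluates this for a group of 23 people and 365 equally likely birthdays; the function name `expected_isolated` and the group size are illustrative assumptions.

```python
def expected_isolated(n, probs):
    """E[sum_i 1{N_i = 1}] = sum_i n p_i (1 - p_i)^(n-1), as N_i ~ Bin(n, p_i)."""
    return sum(n * p * (1.0 - p) ** (n - 1) for p in probs)

group_size, days = 23, 365
# Expected number of people whose birthday is shared with nobody else.
unique_birthdays = expected_isolated(group_size, [1.0 / days] * days)
```

Under these assumptions the expected number of unique birthdays falls a little short of the group size, since a few individuals are expected to share a birthday.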
In the uniform case, we obtain an explicit Berry-Esséen bound for the discrepancy of the number of isolated balls from the normal, tending to zero at the optimum rate when $m \propto n$. In the non-uniform case we obtain a similar result with a larger constant, also finding upper and lower bounds which show that the variance of the number of isolated balls is $\Theta(n^2 \sum_i p_i^2)$. The proof of these bounds, in Section 5, is based on martingale difference techniques and somewhat separate from the other arguments in the paper.
Our Berry-Esséen results for the number of isolated balls are analogous to the main results of [4] (in the uniform case) and [12] (in the non-uniform case) for the number of occupied urns. Our proofs, however, are entirely different. We adapt a method used recently by Goldstein and Penrose [8] for a problem in stochastic geometry (Theorem 2.1 of [8]).
Our method does not involve either characteristic functions, or first Poissonizing the total number of balls; in this, it differs from most of the approaches to problems of this type adopted in the past. As remarked in [9], 'almost all previous approaches rely, explicitly or implicitly, on the widely used Poissonization technique', and this remark also applies to [9] itself. One exception is Chatterjee [3], who uses a method not involving Poissonization to give an error bound with the optimal rate of decay (with unspecified constant) for the Kantorovich-Wasserstein distance (rather than the Kolmogorov distance, as here) between the distribution of the number of occupied urns and the normal, in the uniform case.
We believe that our approach can be adapted to the number of urns containing $k$ balls, for arbitrary fixed $k$, but this might require a significant amount of extra work, so we restrict ourselves here to the case $k = 1$.
Our approach is based on size-biased couplings. Given a nonnegative random variable $W$ with finite mean $\mu = EW$, we say $W'$ has the $W$ size biased distribution if $P[W' \in dw] = (w/\mu) P[W \in dw]$, or more formally, if $E[W f(W)] = \mu E f(W')$ for all bounded continuous functions $f$. Lemma 3.1 below, due to Goldstein [7], tells us that if one can find coupled realizations of $W$ and $W'$ which are in some sense close, then one may be able to find a good Berry-Esséen bound for $W$. It turns out that this can be done for the number of non-isolated balls.
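For a discrete $W$, size biasing simply reweights each atom $w$ by $w/\mu$. The sketch below, which uses a made-up three-point distribution and a hypothetical helper `size_biased_pmf`, checks the identity $E[W f(W)] = \mu E f(W')$ numerically for one test function $f$.

```python
def size_biased_pmf(pmf):
    """pmf maps each value w >= 0 to P[W = w]; returns the law of W'."""
    mean = sum(w * q for w, q in pmf.items())
    return {w: w * q / mean for w, q in pmf.items() if w > 0}

w_pmf = {0: 0.2, 1: 0.5, 3: 0.3}           # illustrative law of W
w_biased = size_biased_pmf(w_pmf)          # law of W'
mean_w = sum(w * q for w, q in w_pmf.items())

def f(w):
    """An arbitrary bounded test function."""
    return 1.0 / (1.0 + w)

lhs = sum(w * f(w) * q for w, q in w_pmf.items())          # E[W f(W)]
rhs = mean_w * sum(f(w) * q for w, q in w_biased.items())  # mu E f(W')
```

Note that the atom at 0 disappears under size biasing, which is why the size-biased version of a count of non-isolated balls always has at least one non-isolated ball.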

Let $p^{(n)} = (p^{(n)}_x, x \in [m])$ be a probability mass function on $[m] := \{1, 2, \ldots, m\}$, with $p^{(n)}_x > 0$ for all $x \in [m]$. Let $X$ and $X_i$, $1 \leq i \leq n$, be independent and identically distributed random variables with probability mass function $p = p^{(n)}$ (we shall often suppress the superscript $(n)$). Define $Y = Y(n)$ by
$$M_i := -1 + \sum_{j=1}^{n} \mathbf{1}\{X_j = X_i\}, \qquad Y := \sum_{i=1}^{n} \mathbf{1}\{M_i > 0\}. \qquad (2.1)$$
In terms of the urn scheme described in Section 1, the probability of landing in Urn $x$ is $p_x$ for each ball, $X_i$ represents the location of the $i$th ball, $M_i$ represents the number of other balls located in the same urn as the $i$th ball, and $Y$ represents the number of non-isolated balls, where a ball is said to be isolated if no other ball is placed in the same urn as it is. Thus $n - Y$ is the number of isolated balls, or in other words, the number of urns which contain a single ball. Let $Z$ denote a standard normal random variable, and let $\Phi(t) := P[Z \leq t]$. Given any random variable $W$ with finite mean $\mu_W$ and standard deviation $\sigma_W$ satisfying $0 < \sigma_W < \infty$, define $D_W := \sup_{t \in \mathbb{R}} |P[(W - \mu_W)/\sigma_W \leq t] - \Phi(t)|$, the so-called Kolmogorov distance between the distribution of $W$ and the normal. We are concerned with estimating $D_Y$. We refer to the case where $p_x = m^{-1}$ for each $x \in [m]$ as the uniform case. Our main result for the uniform case provides a normal approximation error bound for $Y$, which is explicit modulo computation of $\mu_Y$ and $\sigma_Y$, and goes as follows.
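The quantities $M_i$ and $Y$ are easy to compute from the vector of ball locations. The following sketch does so for a small hand-picked configuration; the helper names and the sample `locations` are hypothetical, and urns are labelled from 0 purely for convenience.

```python
from collections import Counter

def others_in_same_urn(locations):
    """M_i: number of balls other than ball i that share ball i's urn."""
    counts = Counter(locations)
    return [counts[x] - 1 for x in locations]

def non_isolated_balls(locations):
    """Y: number of balls i with M_i > 0."""
    return sum(1 for m_i in others_in_same_urn(locations) if m_i > 0)

locations = [2, 0, 2, 1, 3, 3, 3]       # X_1, ..., X_7 for seven balls
M = others_in_same_urn(locations)
Y = non_isolated_balls(locations)
# Urns containing exactly one ball; this should equal n - Y.
single_ball_urns = sum(1 for c in Counter(locations).values() if c == 1)
```

Here two balls sit alone (in urns 0 and 1), so $n - Y = 2$ agrees with the number of single-ball urns.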
For asymptotics in the uniform case, we allow $m = m(n)$ to vary with $n$. We concentrate on the case where $m = \Theta(n)$. In this case both $\mu_Y$ and $\sigma_Y^2$ turn out to be $\Theta(n)$ as $n \to \infty$, and thus Theorem 2.1 implies $D_Y$ is $O(n^{-1/2})$ in this regime. More formally, we have the following.
Theorem 2.2. Suppose $n$, $m$ both go to infinity in a linked manner, in such a way that $n/m \to \alpha \in (0, \infty)$. Then with $g(\alpha) := (e^{-\alpha} - e^{-2\alpha}(\alpha^2 - \alpha + 1))^{1/2}$, we have in the uniform case that $g(\alpha) > 0$ and
In the case $\alpha = 1$, the right hand side of (2.4), rounded up to the nearest integer, comes to 2236. Theorems 2.1 and 2.2 are proved in Section 4.
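The function $g$ can be related to the variance numerically. In the uniform case the variance of the number of single-ball urns (which equals $\mathrm{Var}\, Y$, since $Y$ is $n$ minus that number) can be written exactly using the first two factorial moments. The sketch below compares this exact variance, divided by $n$, with $g(\alpha)^2$ for $m = n/\alpha$; the reading of $n\, g(\alpha)^2$ as the leading-order variance is an assumption of this example rather than a statement quoted from the text, and the function names are hypothetical.

```python
import math

def g(alpha):
    """g(alpha) as defined in Theorem 2.2."""
    return math.sqrt(math.exp(-alpha)
                     - math.exp(-2 * alpha) * (alpha ** 2 - alpha + 1))

def exact_var_isolated_uniform(n, m):
    """Exact Var(S), S = number of urns with exactly one ball, p_i = 1/m."""
    mean = n * (1 - 1 / m) ** (n - 1)                     # E[S]
    # E[S(S-1)] = m(m-1) P[N_1 = 1, N_2 = 1]
    second_factorial = (m * (m - 1) * n * (n - 1) / m ** 2
                        * (1 - 2 / m) ** (n - 2))
    return second_factorial + mean - mean ** 2

n = 200_000
ratios = {alpha: exact_var_isolated_uniform(n, round(n / alpha)) / n
          for alpha in (0.5, 1.0, 2.0)}
gaps = {alpha: abs(ratios[alpha] - g(alpha) ** 2) for alpha in ratios}
```

For the three values of $\alpha$ tried, the gaps are small, consistent with $\sigma_Y^2 = \Theta(n)$ in this regime.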
We now state our results for the general (non-uniform) case.Given n we define the parameters For the large-n asymptotics we essentially assume that γ(n) remains bounded, or at least grows only slowly with n; see Corollary 2.1 below.First we give a non-asymptotic result.
Theorem 2.3. It is the case that and also
Corollary 2.1. Suppose $\sup_n \gamma(n) < \infty$. Then the following three conditions are equivalent: If these conditions hold, then
Remarks. In the uniform case, Theorem 2.2 provides an alternative proof of the central limit theorem for $Y$ when $m = \Theta(n)$ (see Theorem II.2.4 on page 59 of [11]), with error bounds converging to zero at the optimum rate. Corollary 2.1 shows that in the uniform case, if $n^2/m \to \infty$ and $n/m$ remains bounded, then $D_Y = \Theta((n^2/m)^{-1/2})$. Corollary 2.1 overlaps Theorem III.5.2 on page 147 of [11] but is under weaker conditions than those in [11], and provides error bounds not given in [11].
The condition that $\gamma(n)$ remain bounded, in Corollary 2.1, is also required by [12] for the analogous Berry-Esséen type result for the number of occupied boxes, though not by [9] for the local limit theorem for that quantity. In (2.7), which is used for the non-asymptotic bounds, the bound of $1/11$ could be replaced by any constant less than $1/3$ without changing anything except the constants in (2.9) and (2.12).
As always (see remarks in [9], [13], [1]), it might be possible to obtain similar results to those presented here by other methods. However, to do so appears to be a non-trivial task. In [11] the count of the number of isolated balls is treated separately, and differently, from the count of occupied urns or the count of urns with $k$ balls, $k = 0$ or $k \geq 2$. Poisson approximation methods might be of use in some limiting regimes (see [2], Chapter 6), but not when the ratio between $E[Y]$ and $\mathrm{Var}[Y]$ remains bounded but is not asymptotically 1, which is typically the case here.
Exact formulae can be written down for the probability mass function and cumulative distribution function of Y .For example, using (5.1) on page 99 of [5], the cumulative distribution of Y may be written as with S j a sum of probabilities that j of the urns contain one ball each, i.e.
We shall not use this formula in obtaining our normal approximation results.

Lemmas
A key tool in our proofs is the following result, which is a special case of Theorem 1.2 of [7], and is proved there via Stein's method.
Lemma 3.1. [7] Let $W \geq 0$ be a random variable with mean $\mu$ and variance $\sigma^2 \in (0, \infty)$, and let $W^s$ be defined on the same space, with the $W$-size biased distribution. where
Our next lemma is concerned with the construction of variables with size-biased distributions.
Lemma 3.2. Suppose $W$ is a sum of exchangeable indicator variables $\xi_1, \ldots, \xi_n$, with $P[W > 0] > 0$. Suppose $\xi'_1, \ldots, \xi'_n$ are variables with joint distribution Then the variable $W' = \sum_{i=1}^{n} \xi'_i$ has the $W$ size biased distribution.
Let Bin(n, p) denote the binomial distribution with parameters n ∈ N and p ∈ (0, 1).The following lemma will be used for constructing the desired close coupling of our variable of interest Y , and its size biased version Y ′ , so as to be able to use Lemma 3.1.
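To see how a binomial variable can be coupled closely to its version conditioned to be non-zero, the sketch below adds a Bernoulli increment to $N \sim \mathrm{Bin}(\nu, p)$, with increment probability $q_k$ when $N = k$ chosen so that $P[N + B > k] = P[N > k \mid N > 0]$. This forces $q_k = (P[N > k \mid N > 0] - P[N > k]) / P[N = k]$. The quantities $q_k$ and all function names are assumptions of this example: the definition (3.3) of $\pi_k$ is not reproduced in this section and may be normalised differently.

```python
from math import comb

def binomial_pmf(nu, p):
    """P[N = j] for j = 0, ..., nu, where N ~ Bin(nu, p)."""
    return [comb(nu, j) * p ** j * (1 - p) ** (nu - j) for j in range(nu + 1)]

def increment_probs(nu, p):
    """q_k = (P[N > k | N > 0] - P[N > k]) / P[N = k] (assumed form)."""
    pmf = binomial_pmf(nu, p)
    positive = 1.0 - pmf[0]                      # P[N > 0]
    probs = []
    for k in range(nu + 1):
        tail = sum(pmf[k + 1:])                  # P[N > k]
        probs.append((tail / positive - tail) / pmf[k])
    return probs

def law_of_incremented(nu, p):
    """Law of N + B, where P[B = 1 | N = k] = q_k."""
    pmf, q = binomial_pmf(nu, p), increment_probs(nu, p)
    law = [0.0] * (nu + 1)
    for k in range(nu + 1):
        law[k] += pmf[k] * (1 - q[k])            # B = 0: value stays at k
        if k < nu:
            law[k + 1] += pmf[k] * q[k]          # B = 1: value moves to k + 1
    return law

nu, p = 9, 0.1                                   # illustrative parameters
pmf = binomial_pmf(nu, p)
target = [0.0] + [pmf[k] / (1 - pmf[0]) for k in range(1, nu + 1)]
coupled = law_of_incremented(nu, p)
q = increment_probs(nu, p)
```

With these increment probabilities, $N + B$ differs from $N$ by at most one yet has the conditioned law; in particular $q_0 = 1$, so a zero count is always incremented.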
Our next lemma is a bound on correlations between variables associated with different balls in the urn model.Recall the definition of M i at (2.1) and a denotes the number of distinct values taken by x 2 , . . ., x k .We give a coupling of N 1 to another random variable N ′ 1 with the same distribution as N 1 that is independent of N 2 , for which we can give a useful bound on . Consider throwing a series of coloured balls so each ball can land in one of the three boxes, where the probabilities of landing in Boxes 1, 2, 3 are 1/m, a/m, (m − a − 1)/m respectively.First, throw n − k white balls and let N * 1 , N 2 , N * 3 be the number of white balls in Boxes 1, 2, 3 respectively.Then pick out the balls in Boxes 1 and 3, paint them red, and throw them again.Then throw enough green balls so the total number of green and red balls is n − 1. Finally take the red balls in Box 2 (of which there are of N 0 , say), paint them blue, and throw them again but condition them to land in Boxes 1 and 3 (or equivalently, throw each blue ball again and again until it avoids Box 2).Then (with obvious notation, superscripts denoting colours) set Combined with (3.4), and the fact that Next, we adapt Lemma 3.4 to the non-uniform setting.In this case, we need to allow ψ i to depend on the location as well as the occupation number associated with the ith ball.Consequently, some modification of the proof is required, and the constants in Lemma 3.4 are better than those which would be obtained by simply applying the next lemma to the uniform case.
x . (3.8)
Proof. We first prove (3.8). Throw $n$ balls according to the distribution $p$, with four of them distinguished as Ball 1, Ball 2, Ball 3 and Ball 4. For $i = 2, 3, 4$, let $Z_i$ be the location of Ball $i$ and let $N_i$ be the number of other balls in the same urn as Ball $i$. Set $A = \cup_{i=2}^{4} \{Z_i\}$, the union of the locations of Balls 2, 3 and 4. Now suppose the balls in $A$ are painted white. Let the balls not in $A$ (including Ball 1 if it is not in $A$) be re-thrown (again, according to the distribution $p$). Those which land in $A$ when re-thrown are painted yellow, and the others are painted red. Now introduce one green ball for each white ball, and if Ball 1 is white, let one of the green balls be labelled Ball G1. Throw the green balls using the same distribution $p$. Also, introduce a number of blue balls equal to the number of yellow balls, and if Ball 1 is yellow then label one of the blue balls as Ball B1. Throw the blue balls, but condition them to avoid $A$; that is, use the probability mass function $(p_x/(1 - \sum_{y \in A} p_y), x \in [m] \setminus A)$ for the blue balls.
Set Z 1 to be the location of Ball 1 (if it is white or red) or Ball B1 (if Ball 1 is yellow).Set Z ′ 1 to be the location of Ball 1, if it is red or yellow, or the location of Ball G1 (if Ball 1 is white).Let N w 1 , N r 1 , and N b 1 respectively denote the number of white, red, and blue balls at location Z 1 , not counting Ball 1 or Ball B1 itself.Let N y 1 , N r 1 , and N g 1 respectively denote the number of yellow, red, and green balls at location Z ′ 1 , not counting Ball 1 or Ball G1 itself.Set Now, Also, if N g denotes the number of green balls, not including Ball G1 if Ball 1 is green, then by (2.5), and also x , so that If N y denotes the number of yellow balls, other than Ball 1, then by (2.5), and by (2.7), so that Set W := 4 i=2 ψ i (Z i , N i ).By (3.9), (3.10), (3.11) and (3.13), ) with the same distribution as (X 1 , M 1 ), and and then (3.8) follows by (3.14).The proof of (3.
be the indicator of the event that ball $i$ is not isolated. Then $Y = \sum_{i=1}^{n} \xi_i$, and since $\{\xi_i\}$ are exchangeable, a random variable $Y'$ with the size-biased distribution of $Y$ can be obtained as follows; see Lemma 3.2. Let $I$ be a discrete uniform random variable over $[n]$, independent of $X_1, \ldots, X_n$. Given the value of $I$, let To apply Lemma denotes the number of entries $X'_j$ of $X'$ that are equal to $X'_I$, other than $X'_I$ itself, then (i) given $I$ and $X_I$, $M'_I$ has the distribution of a $\mathrm{Bin}(n-1, 1/m)$ variable conditioned to take a non-zero value, and (ii) given $I$, $X'_I$ and $M'_I$, the distribution of $X'$ is uniform over all possibilities consistent with the given values of $I$, $X'_I$ and $M'_I$. Define the random $n$-vector $X := (X_1, \ldots, X_n)$. We can manufacture a random vector $X'' = (X''_1, \ldots, X''_n)$, coupled to $X$ and (we assert) with the same distribution as $X'$, as follows.
• Sample a value of I from the discrete uniform distribution on [n], independent of X.
• Sample a Bernoulli random variable $B$ with
• Sample a value of $J$ from the discrete uniform distribution on $[n] \setminus \{I\}$.
• Define $(X''_1, \ldots, X''_n)$ by
Thus $X''$ is obtained from $X$ by changing a randomly selected entry of $X$ to the value of $X_I$, if $B = 1$, and leaving $X$ unchanged if $B = 0$. We claim that $\mathcal{L}(X'') = \mathcal{L}(X')$. To see this define $N := M_I$, and set Then $N$ has the $\mathrm{Bin}(n-1, m^{-1})$ distribution, while $N''$ always takes the value either $N$ or $N + 1$, taking the latter value in the case where $B = 1$ and also $X_J \neq X_I$. Thus for any $k \in \{0, 1, \ldots, n-1\}$, so by the definition (3.3) of $\pi_k$, $\mathcal{L}(N'') = \mathcal{L}(N \mid N > 0)$. This also applies to the conditional distribution of $N''$ given the values of $I$ and $X_I$.
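The coupling described in the steps above can be checked by brute force for small $n$ and $m$. The sketch below enumerates every configuration $X$, every choice of $I$ and $J$, and both values of $B$, and compares the resulting law of $Y''$ with the size-biased law of $Y$. Because the display giving the Bernoulli parameter is not reproduced here, the code assumes $P[B = 1 \mid X, I] = \pi_{M_I}$ with $\pi_k = (P[N > k \mid N > 0] - P[N > k]) / (P[N = k](1 - k/(n-1)))$ for $k < n - 1$ and $\pi_{n-1} = 0$; the factor $1 - k/(n-1)$ compensates for the chance that ball $J$ already shares the urn of ball $I$. This form, and all function names, are assumptions of the example.

```python
from itertools import product
from math import comb

def non_isolated(xs):
    """Y: number of balls sharing their urn with at least one other ball."""
    return sum(1 for i, x in enumerate(xs)
               if any(x == y for j, y in enumerate(xs) if j != i))

def bernoulli_param(k, nu, p):
    """Assumed pi_k for N ~ Bin(nu, p); see the lead-in for the formula."""
    if k == nu:
        return 0.0
    pmf = [comb(nu, j) * p ** j * (1 - p) ** (nu - j) for j in range(nu + 1)]
    return sum(pmf[k + 1:]) * pmf[0] / ((1 - pmf[0]) * pmf[k] * (1 - k / nu))

def coupled_laws(n, m):
    """Exact laws of Y and Y'' in the uniform case, and max |Y'' - Y|."""
    law_y, law_y2, max_jump = {}, {}, 0
    w = m ** (-n) / n                       # probability of one (X, I) pair
    for xs in product(range(m), repeat=n):
        y = non_isolated(xs)
        law_y[y] = law_y.get(y, 0.0) + w * n
        for i in range(n):
            k = sum(1 for j in range(n) if j != i and xs[j] == xs[i])  # M_I
            pi_k = bernoulli_param(k, n - 1, 1 / m)
            law_y2[y] = law_y2.get(y, 0.0) + w * (1 - pi_k)   # B = 0
            for j in (j for j in range(n) if j != i):         # B = 1
                moved = list(xs)
                moved[j] = xs[i]            # ball J is moved to the urn of I
                y2 = non_isolated(moved)
                max_jump = max(max_jump, abs(y2 - y))
                law_y2[y2] = law_y2.get(y2, 0.0) + w * pi_k / (n - 1)
    return law_y, law_y2, max_jump

law_y, law_y2, max_jump = coupled_laws(n=4, m=4)
mean_y = sum(y * q for y, q in law_y.items())
size_biased = {y: y * q / mean_y for y, q in law_y.items()}
gap = max(abs(law_y2.get(y, 0.0) - size_biased.get(y, 0.0))
          for y in set(law_y) | set(law_y2))
```

Under the stated assumption on $\pi_k$, the two laws agree up to rounding for this small case, and moving a single ball never changes the number of non-isolated balls by more than 2.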
Given the values of N ′′ , I and X ′′ I , the conditional distribution of X ′′ is uniform over all possibilities consistent with these given values.Hence, L(X ′′ ) = L(X ′ ).Therefore setting Proof.Let G be the σ-algebra generated by X.Then Y is G-measurable.By the conditional variance formula, as in e.g. the proof of Theorem 2.1 of [8], so it suffices to prove that For 1 ≤ i ≤ n, let V i denote the conditional probability that B = 1, given X and given that I = i, i.e.
Let $R_{ij}$ denote the increment in the number of non-isolated balls when the value of $X_j$ is changed to $X_i$. Then where $\sum_{(i,j): i \neq j}$ denotes summation over pairs of distinct integers $i, j$ in $[n]$.
For $1 \leq i \leq n$ and $j \neq i$, let Then we assert that $R_{ij}$, the increment in the number of non-isolated balls when ball $j$ is moved to the location of ball $i$, is given by Indeed, if $X_i \neq X_j$ then $S_i$ is the increment (if any) due to ball $i$ becoming non-isolated, while $T_j$ is the increment (if any) due either to ball $j$ becoming non-isolated, or to another ball at the original location of ball $j$ becoming isolated when ball $j$ is moved to the location of ball $i$.
where we set Put a := E [V i ] (this expectation does not depend on i).Then by (4.4), Since (x + y) 2 ≤ 2(x 2 + y 2 ) for any real x, y, it follows that From the definitions, the following inequalities hold almost surely: and hence Thus for the first term in the right hand side of (4.6), we have For the second term in the right hand side of (4.6), set Vi := V i − a.By (4.7), −a ≤ Vi ≤ 1 − a, and |T i | ≤ 1. Hence by the case k = 4 of Lemma 3.4, By (4.8), we can always bound Cov( Vi T j , Vi ′ T j ′ ) by 1.Hence, expanding Var (i,j):i =j Vi T j in the same manner as with (6.25) below, yields .
Using this with (4.6) and (4.9) yields This completes the proof of Proposition 4.1, and hence of Theorem 2.1.
Proof of Theorem 2.2.Suppose n, m both go to infinity in a linked manner, in such a way that n/m → α ∈ (0, ∞).Then it can be shown (see Theorem II.1.1 on pages 37-38 of [11]) that E Y ∼ n(1 − e −α ), and Substituting these asymptotic expressions into (2.2) and using the fact that in this asymptotic regime, (nη(n, m)) → 16 + 24α(2 + 2α), yields (2.4). 5 The non-uniform case: proof of Theorem 2.4 For this proof, we use the following notation.Given n, m, and the probability distribution p on [m], let X 1 , X 2 , . . ., X n+1 be independent [m]-valued random variables with common probability mass function p.Given i ≤ j ≤ n+1, set X j := (X 1 , . . ., X j ) and Given any sequence x = (x 1 , . . ., x k ), set which is the number of non-isolated entries in the sequence x, so that in particular, Y = H(X n ).We shall use the following consequence of Jensen's inequality: for all k ∈ N, 2) We shall also use several times the fact that −t −1 ln(1 − t) is increasing on t ∈ (0, 1) so that by ( 2 (5.4) Proof of (2.11).We use Steele's variant of the Efron-Stein inequality [14].This says, among other things, that when (as here) X 1 , . . ., X n+1 are independent and identically distributed random variables and H is a symmetric Hence, by the case k = 2 of (5.2), , so is nonnegative and bounded by 21{M n ≥ 1}.Therefore, Proof of (2.12).Construct a martingale as follows.Let F 0 be the trivial σ-algebra, and for i ∈ [n] let F i := σ(X 1 , . . ., X i ) and write E i for conditional expectation given F i .Define martingale differences i=0 ∆ i , and by orthogonality of martingale differences, (5.5) We look for lower bounds for where Recall from (2.1) that for i < n, M i+1 denotes the number of balls in the sequence of n balls, other than ball i + 1, in the same position as ball i + 1.
Similarly, define M n+1 and M i k (for k ∈ [n + 1]) by so that M n+1 is the number of balls, in the sequence of n balls, in the same location as ball n + 1, while M i k is similar to M k , but defined in terms of the first i balls, not the first n balls. Set For taking E i+1 -conditional expectations, it is convenient to approximate h 0 (M i+1 ) and h 0 (M n+1 ) by h 0 (M i i+1 ) and h 0 (M i n+1 ) respectively.To this end, define (5.9) Set δ := (288γe 1.05γ ) −1 .We shall show that for i close to n, in the sense that n − δn ≤ i ≤ n, the variances of the terms on the right of (5.9), other than E i+1 [W i ], are small compared to the variance of the left hand side, essentially because E i+1 [h 0 (M i n+1 )] is more smoothed out than h 0 (M i i+1 ), while P [Z i = 0] is small when i is close to n.These estimates then yield a lower bound on the variance of First consider the left hand side h 0 (M i i+1 ).This variable takes the value 0 when M i i+1 = 0, and takes a value at least 1 when M i i+1 ≥ 1.Hence, (5.10) For i ≤ n, by (5.3) and (2.5), x . (5.11) For i ≥ (1 − δ)n we have i ≥ n/2, so by (5.4) and the fact that γ ≥ 1 by (2.5), x . 
(5.12) Since γ ≥ 1, and e −1.05 < 1 − e −0.5 , the lower bound in (5.11) is always less than that in (5.12), so combining these two estimates and using (5.10) yields Now consider the second term E i+1 [h 0 (M i n+1 )] in the right hand side of (5.9).Set N i x := i j=1 1{X j = x}, and for 1 while by Lemma 3.5 and (2.5), Combining the last two estimates on (5.14) and using assumption (2.8) yields We turn to the third term in the right hand side of (5.9).As discussed just before (5.8), when X n+1 = X i+1 we have ), and it is clear from the definitions (5.6) and (5.7) that if X n+1 = X i+1 then both W i and h 0 (M i i+1 ) − h 0 (M i n+1 ) are zero, and therefore by (5.8), By the conditional Jensen inequality, The random variable h 0 (M n+1 ) − h 0 (M i n+1 ) lies in the range [−2, 2] and is zero unless 2] and is zero unless X j = X i+1 for some j ∈ (i + 1, n].Hence, using (5.2) and the definition of x .
So in Theorem 2.3, the difficulty lies entirely in proving the upper bound in (2.9), under assumptions (2.7) and (2.8) which we assume to be in force throughout the sequel. By (2.8) we always have $n \geq 1661$.
As before, set $h_0(k) := \mathbf{1}\{k \geq 1\} + \mathbf{1}\{k = 1\}$. Define for nonnegative integer $k$ the functions The function $h_1(k)$ may be interpreted as the increment in the number of non-isolated balls should a ball in an urn containing $k$ other balls be removed from that urn with the removed ball then deemed to be non-isolated itself. If $k = 0$ then the ball removed becomes non-isolated so the increment is 1, while if $k = 1$ then the other ball in the urn becomes isolated so the increment is $-1$.
The function $h_2(k)$ is chosen so that $h_2(k) + 2h_1(k)$ (for $k \geq 1$) is the increment in the number of non-isolated balls if two balls should be removed from an urn containing $k - 1$ other balls, with both removed balls deemed to be non-isolated. The interpretation of $h_3$ is given later.
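The interpretations of $h_0$ and $h_1$ can be checked by counting directly. In the sketch below, an urn holding $c$ balls contributes $c$ non-isolated balls if $c \geq 2$ and none otherwise. Two readings are assumptions of this example: that $h_0(k)$ is the increment when a ball joins an urn already holding $k$ balls, and that the removal rule described above is equivalent to $h_1(k) = \mathbf{1}\{k = 0\} - \mathbf{1}\{k = 1\}$. The function names are hypothetical.

```python
def non_isolated_in_urn(c):
    """Number of non-isolated balls in a single urn holding c balls."""
    return c if c >= 2 else 0

def h0(k):
    """h_0(k) = 1{k >= 1} + 1{k = 1}, as in the text."""
    return int(k >= 1) + int(k == 1)

def increment_on_add(k):
    """Change in the non-isolated count when a ball joins an urn of k balls."""
    return non_isolated_in_urn(k + 1) - non_isolated_in_urn(k)

def increment_on_remove(k):
    """Remove a ball from an urn with k other balls; the removed ball is
    then deemed non-isolated, so it contributes 1 afterwards."""
    before = non_isolated_in_urn(k + 1)
    after = non_isolated_in_urn(k) + 1
    return after - before

add_table = [increment_on_add(k) for k in range(6)]
remove_table = [increment_on_remove(k) for k in range(6)]
```

The removal increments are $+1$ for $k = 0$, $-1$ for $k = 1$ and zero for larger $k$, matching the two cases spelled out in the text.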
We shall need some further functions h i which we define here to avoid disrupting the argument later on.For x ∈ [m] and k ∈ {0} ∪ [n − 1], let π k (x) be given by the π k of (3.3) when ν = n − 1 and p = p x .With the convention 0 • π −1 (x) := 0 • h i (−1) := 0, define For i = 0, 1, 2, 3, 6 define h i (k, x) := h i (k).For each i define hi (k, x) := h i (k + 1, x)/(k + 1).(6.1) Sometimes we shall write hi (k) for hi (k, x) when i ∈ {0, 1, 2, 3, 6}.Define h i := sup k,x |h i (k, x)| and hi := sup k,x | hi (k, x)|.Now we estimate some of the h i functions.Since π 0 (x) = 1 we have h 4 (0, x) = h 7 (0, x) = 0 for all x, which we use later.Also, by Lemma 3.3, Also, h 3 (3) = h 1 (2) = 0 so that by (6.2) The strategy to prove Theorem 2.3 is similar to the one used in the uniform case, but the construction of a random variable with the distribution of Y ′ , where Y ′ is defined to have the Y size biased distribution, is more complicated.As in the earlier case, by Lemma 3.2, if I is uniform over [n] then the distribution of the sum Y conditional on M I > 0 has the distribution of Y ′ .However, in the non-uniform case the conditional information that M I > 0 affects the distribution of X I .Indeed, for each i, by Bayes' theorem Therefore the conditional distribution of (X 1 , . . ., X n ), given that M i > 0, is obtained by sampling X i with probability mass function p and then sampling {X j , j ∈ [n]\{i}} independently with probability mass function p, conditional on at least one of them taking the same value as X i .Equivalently, sample the value of X i , then M i according to the binomial Bin(n − 1, p X i ) distribution conditioned to be at least one, then select a subset J of [n] \ {i} uniformly at random from sets of size M i , let the values of X j , j ∈ J be equal to X i , and let the values of X j , j / ∈ J be independently sampled from the distribution with the probability mass function of X given that X = X i .
Thus a random variable Y ′′ , coupled to Y and with the same distribution as Y ′ , can be obtained as follows.First sample X 1 , . . ., X n independently from the original distribution p, and set X = (X 1 , . . ., X n ); then select I uniformly at random from [n].Then sample a further random variable X 0 with the probability mass function p. Next, change the value of X I to that of X 0 ; next let N denote the number of other values X j , j ∈ [n] \ {I} which are equal to X 0 , and let π k = π k (X 0 ) be defined by (3.3) with ν = n − 1 and p = p X 0 .Next, sample a Bernoulli random variable B with parameter π N , and if B = 1 change the value of one of the X j , j ∈ [n] \ {I} (j = J, with J sampled uniformly at random from all possibilities) to X 0 .Finally, having made these changes, define Y ′′ in the same manner as Y in the original sum (2.1) but in terms of the changed variables.Then Y ′′ has the same distribution as Y ′ by a similar argument to that given around (4.1) in the uniform case.
Having defined coupled variables $Y$, $Y''$ such that $Y''$ has the $Y$ size biased distribution, we wish to use Lemma 3.1. To this end, we need to estimate the quantities denoted $B$ and $\Delta$ in that lemma. The following lemma makes a start. Let $\mathcal{G}$ be the $\sigma$-algebra generated by the value of $X$, and for $x \in [m]$ let $N_x := \sum_{i=1}^{n} \mathbf{1}\{X_i = x\}$ be the number of balls in urn $x$.
Lemma 6.1.It is the case that and

.7)
Proof. We have where $E_x[\cdot \mid X]$ is conditional expectation given the value of $X$ and given also that $X_0 = x$. The formula for $E_x[Y'' - Y \mid X]$ will depend on $x$ through the value of $N_x$ and through the value of $p_x$. We distinguish between the cases where $I$ is selected with $X_I = x$ (Case I) and where $I$ is selected with $X_I \neq x$ (Case II). If $N_x = k$, then in Case I the value of $N$ that determines the probability $\pi_N(x)$ of importing a further ball to $x$ is $k - 1$, whereas in Case II this value of $N$ is $k$. The probability of Case I occurring is $k/n$.
The increment $Y'' - Y$ gets a contribution of $h_1(M_i)$ from the moving of Ball $i$ to $x$ in Case II, and gets a further contribution of $h_1(M_j) + h_2(M_i)\mathbf{1}\{X_i = X_j\}$ if $X_j$ is also imported to $x$ from a location distinct from $x$. Finally, if $N_x = k$ the increment gets a further contribution of $h_3(k)$ from the fact that if there is originally a single ball at $x$, then this ball will no longer be isolated after importing at least one of balls $I$ and $J$ to $x$ (note that $\pi_0(x) = 1$ so we never end up with an isolated ball at $x$). Combining these contributions, we have (6.6), and also that for values of $X$, $x$ with $N$ where in the right hand side, the first sum comes from Case I and the other two sums come from Case II. Hence, if $N$ Then by (6.8) we have (6.7).

2 Proof of Theorem 2 . 1 .
7) is similar, with the factors of 3 replaced by 1 in (3.10), (3.11) and (3.12).4 Proof of Theorems 2.1 and 2.Recall the definition (2.1) of M i and Y .Assume the uniform case, i.e. assume p = (m −1 , m −1 , . . ., m −1 ).Let 3.1 we need to find a random variable Y ′′ , coupled to Y , such that L(Y ′′ ) = L(Y ′ ) and for some constant B we have |Y ′ − Y | ≤ B (almost surely).To check L(Y ′′ ) = L(Y ), we shall use the fact that if M ′ I we have that L(Y ′′ ) = L(Y ′ ), i.e.Y ′′ has the size-biased distribution of Y .The definition of X ′′ in terms of X ensures that we always have |Y −Y ′′ | ≤ 2 (with equality if M I = M J = 0) ; this is explained further in the course of the proof of Proposition 4.1 below.Thus we may apply Lemma 3.1 with B = 2. Theorem 2.1 follows from that result, along with the following: Proposition 4.1.It is the case that Var(E [Y ′′ − Y |Y ]) ≤ η(n, m), where η(n, m) is given by (2.3).