Compound Poisson Approximation via Information Functionals

An information-theoretic development is given for the problem of compound Poisson approximation, which parallels earlier treatments for Gaussian and Poisson approximation. Let $P_{S_n}$ be the distribution of a sum $S_n=\sum_{i=1}^n Y_i$ of independent integer-valued random variables $Y_i$. Nonasymptotic bounds are derived for the distance between $P_{S_n}$ and an appropriately chosen compound Poisson law. In the case where all $Y_i$ have the same conditional distribution given $\{Y_i\neq 0\}$, a bound on the relative entropy distance between $P_{S_n}$ and the compound Poisson distribution is derived, based on the data-processing property of relative entropy and earlier Poisson approximation results. When the $Y_i$ have arbitrary distributions, corresponding bounds are derived in terms of the total variation distance. The main technical ingredient is the introduction of two "information functionals," and the analysis of their properties. These information functionals play a role analogous to that of the classical Fisher information in normal approximation. Detailed comparisons are made between the resulting inequalities and related bounds.


Introduction and main results
Roughly speaking, the information-theoretic approach to the CLT and associated normal approximation bounds consists of two steps: first, a strong version of Property (C) is used to show that $J_N(T_n)$ is close to zero for large $n$; then Property (D) is applied to obtain precise bounds on the relative entropy $D(f_{T_n}\|\phi)$.

Poisson approximation
More recently, an analogous program was carried out for Poisson approximation. The Poisson law was identified as having maximum entropy within a natural class of discrete distributions on $\mathbb{Z}_+ := \{0, 1, 2, \ldots\}$ [14,30,16], and Poisson approximation bounds in terms of relative entropy were developed in [21]; see also [19] for earlier related results. The approach of [21] follows a similar outline to the one described above for normal approximation. Specifically, for a random variable $Y$ with values in $\mathbb{Z}_+$ and distribution $P$, the scaled Fisher information of $Y$ was defined as
$$J_\pi(Y) := \lambda\, E\big[\rho_Y(Y)^2\big], \qquad (1.1)$$
where $\lambda$ is the mean of $Y$ and the scaled score function $\rho_Y$ is given by
$$\rho_Y(y) := \frac{(y+1)P(y+1)}{\lambda P(y)} - 1, \qquad y \ge 0. \qquad (1.2)$$
[Throughout, we use the term 'distribution' to refer to the discrete probability mass function of an integer-valued random variable.] As discussed briefly before the proof of Theorem 1.1 in Section 2, the functional $J_\pi(Y)$ was shown in [21] to satisfy Properties (A-D), exactly analogous to those of the Fisher information described above, with the Poisson law playing the role of the Gaussian distribution. These properties were employed to establish optimal or near-optimal Poisson approximation bounds for the distribution of sums of nonnegative integer-valued random variables [21].
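To make (1.1) and (1.2) concrete, here is a minimal Python sketch that evaluates $J_\pi$ for a pmf stored as a list `P[0], P[1], ...`. The function name `scaled_fisher_information` is our own, and we assume the list covers the whole support (mass beyond its end is zero). It illustrates that $J_\pi$ essentially vanishes for a (truncated) Poisson pmf, and that it equals $p^2/(1-p)$ for a Bernoulli($p$) law, a value easily checked by hand from (1.2): the score is $p/(1-p)$ at 0 and $-1$ at 1.

```python
import math

def scaled_fisher_information(pmf):
    """J_pi(Y) = lam * E[rho_Y(Y)^2], with rho_Y as in (1.2) and lam the mean of Y.

    pmf[y] = P(y) for y = 0, 1, ..., len(pmf) - 1; mass beyond the list is zero.
    """
    lam = sum(y * p for y, p in enumerate(pmf))
    total = 0.0
    for y, p in enumerate(pmf):
        if p == 0.0:
            continue  # points outside the support do not contribute
        p_next = pmf[y + 1] if y + 1 < len(pmf) else 0.0
        rho = (y + 1) * p_next / (lam * p) - 1.0  # scaled score (1.2)
        total += p * rho**2
    return lam * total

p = 0.2
print(scaled_fisher_information([1 - p, p]), p**2 / (1 - p))  # equal up to rounding
poisson = [math.exp(-3.0) * 3.0**y / math.factorial(y) for y in range(60)]
print(scaled_fisher_information(poisson))  # essentially zero
```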

Compound Poisson approximation
This work provides a parallel treatment for the more general (and technically significantly more difficult) problem of approximating the distribution $P_{S_n}$ of a sum $S_n = \sum_{i=1}^n Y_i$ of independent $\mathbb{Z}_+$-valued random variables by an appropriate compound Poisson law. This and related questions arise naturally in applications involving counting; see, e.g., [7,1,4,12]. As we will see, in this setting the information-theoretic approach not only gives an elegant alternative route to the classical asymptotic results (as was the case in the first information-theoretic treatments of the CLT), but it actually yields fairly sharp finite-$n$ inequalities that are competitive with some of the best existing bounds.
Given a distribution $Q$ on $\mathbb{N} = \{1, 2, \ldots\}$ and $\lambda > 0$, recall that the compound Poisson law $\mathrm{CPo}(\lambda, Q)$ is defined as the distribution of the random sum $\sum_{i=1}^{Z} X_i$, where $Z \sim \mathrm{Po}(\lambda)$ is Poisson distributed with parameter $\lambda$ and the $X_i$ are i.i.d. with distribution $Q$, independent of $Z$.
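The definition translates directly into a computation: the pmf of $\mathrm{CPo}(\lambda, Q)$ is the Poisson mixture $\sum_{z \ge 0} e^{-\lambda}\frac{\lambda^z}{z!}\, Q^{*z}$ of the convolution powers of $Q$. The sketch below (helper names `convolve` and `compound_poisson_pmf` are ours) truncates both the support, to $\{0, \ldots, \text{size}-1\}$, and the Poisson mixture, so it is only accurate when the neglected tail mass is negligible; the geometric $Q$ in the usage lines is an arbitrary illustrative choice.

```python
import math

def convolve(a, b, size):
    """Convolution of two pmfs, truncated to the index range {0, ..., size-1}."""
    out = [0.0] * size
    for i, ai in enumerate(a[:size]):
        if ai == 0.0:
            continue
        for j, bj in enumerate(b[: size - i]):
            out[i + j] += ai * bj
    return out

def compound_poisson_pmf(lam, q, size, max_terms=200):
    """pmf of sum_{i=1}^{Z} X_i with Z ~ Po(lam) and X_i i.i.d. ~ q on N (q[0] == 0),
    truncated to {0, ..., size-1}: a Poisson mixture of convolution powers Q^{*z}."""
    pmf = [0.0] * size
    power = [1.0] + [0.0] * (size - 1)  # Q^{*0}: point mass at 0 (the empty sum)
    weight = math.exp(-lam)  # Po(lam){0}
    for z in range(max_terms):
        for k in range(size):
            pmf[k] += weight * power[k]
        power = convolve(power, q, size)  # Q^{*(z+1)}
        weight *= lam / (z + 1)  # Po(lam){z+1}
    return pmf

alpha = 0.3  # illustrative geometric Q(j) = (1 - alpha) alpha^(j-1), j >= 1
q = [0.0] + [(1 - alpha) * alpha ** (j - 1) for j in range(1, 60)]
pmf = compound_poisson_pmf(2.0, q, 60)
print(pmf[:4], sum(pmf))
```

Since every $X_i \ge 1$, the mass at 0 is exactly $e^{-\lambda}$ and the mass at 1 is $e^{-\lambda}\lambda Q(1)$, which gives a quick sanity check on the output.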
Relevant results that can be seen as the intellectual background to the information-theoretic approach for compound Poisson approximation were recently established in [18,33], where it was shown that, like the Gaussian and the Poisson, the compound Poisson law has a maximum entropy property within a natural class of probability measures on $\mathbb{Z}_+$. Here we provide nonasymptotic, computable and accurate bounds for the distance between $P_{S_n}$ and an appropriately chosen compound Poisson law, partly based on extensions of the information-theoretic techniques introduced in [21] and [19] for Poisson approximation.
In order to state our main results we need to introduce some more terminology. When considering the distribution of $S_n = \sum_{i=1}^n Y_i$, we find it convenient to write each $Y_i$ as the product $B_i X_i$ of two independent random variables, where $B_i$ takes values in $\{0, 1\}$ and $X_i$ takes values in $\mathbb{N}$. This is done uniquely and without loss of generality, by taking $B_i \sim \mathrm{Bern}(p_i)$ with $p_i = \Pr\{Y_i \neq 0\}$, and $X_i \sim Q_i$ with $Q_i(k) = \Pr\{Y_i = k \mid Y_i \geq 1\}$ for $k \geq 1$.
In the special case of a sum $S_n = \sum_{i=1}^n Y_i$ of random variables $Y_i = B_i X_i$ where all the $X_i$ have the same distribution $Q$, it turns out that the problem of approximating $P_{S_n}$ by a compound Poisson law can be reduced to a Poisson approximation inequality. This is achieved by an application of the so-called "data-processing" property of the relative entropy, which then facilitates the use of a Poisson approximation bound established in [21]. The result is stated in Theorem 1.1 below; its proof is given in Section 2.
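Theorem 1.1. Consider a sum $S_n = \sum_{i=1}^n Y_i$ of independent random variables $Y_i = B_i X_i$, where $B_i \sim \mathrm{Bern}(p_i)$ and the $X_i$ are i.i.d. with common distribution $Q$ on $\mathbb{N}$, all independent, and let $\lambda = \sum_{i=1}^n p_i$. Then the relative entropy between the distribution $P_{S_n}$ of $S_n$ and the $\mathrm{CPo}(\lambda, Q)$ distribution satisfies
$$D(P_{S_n}\|\mathrm{CPo}(\lambda, Q)) \le \frac{1}{\lambda}\sum_{i=1}^n \frac{p_i^3}{1-p_i}.$$
Recall that, for distributions $P$ and $Q$ on $\mathbb{Z}_+$, the relative entropy, or Kullback-Leibler divergence, $D(P\|Q)$, is defined by
$$D(P\|Q) := \sum_{x \in \mathbb{Z}_+} P(x) \log \frac{P(x)}{Q(x)}.$$
Although not a metric, relative entropy is an important measure of closeness between probability distributions [10,11], and it can be used to obtain total variation bounds via Pinsker's inequality [11],
$$d_{TV}(P, Q) \le \sqrt{\tfrac{1}{2} D(P\|Q)},$$
where, as usual, the total variation distance is
$$d_{TV}(P, Q) := \frac{1}{2}\sum_{x \in \mathbb{Z}_+} |P(x) - Q(x)| = \max_{A \subset \mathbb{Z}_+} |P(A) - Q(A)|.$$
In the general case where the distributions $Q_i$ corresponding to the $X_i$ in the summands $Y_i = B_i X_i$ are not identical, the data-processing argument used in the proof of Theorem 1.1 can no longer be applied. Instead, the key idea in this work is the introduction of two "information functionals," or simply "informations," which, in the present context, play a role analogous to that of the Fisher information $J_N$ and the scaled Fisher information $J_\pi$ in Gaussian and Poisson approximation, respectively.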
In Section 3 we will define two such information functionals, $J_{Q,1}$ and $J_{Q,2}$, and use them to derive compound Poisson approximation bounds. Both $J_{Q,1}$ and $J_{Q,2}$ will be seen to satisfy natural analogs of Properties (A-D) stated above, except that only a weaker version of Property (D) will be established: when either $J_{Q,1}(Y)$ or $J_{Q,2}(Y)$ is close to zero, the distribution of $Y$ is close to a compound Poisson law in the sense of total variation rather than relative entropy. As in normal and Poisson approximation, combining the analogs of Properties (C) and (D) satisfied by the two new information functionals yields new compound Poisson approximation bounds.
where $P_{S_n}$ is the distribution of $S_n$, $q = \sum_{i=1}^n \frac{p_i}{\lambda} q_i$, $H(\lambda, Q)$ denotes the Stein factor defined in (1.4) below, and $D(\mathbf{Q})$ is a measure of the dissimilarity of the distributions $\mathbf{Q} = (Q_i)$, which vanishes when the $Q_i$ are identical:
Theorem 1.2 is an immediate consequence of the subadditivity property of $J_{Q,1}$ established in Corollary 4.2, combined with the total variation bound in Proposition 5.3. The latter bound states that, when $J_{Q,1}(Y)$ is small, the total variation distance between the distribution of $Y$ and a compound Poisson law is also appropriately small. As explained in Section 5, the proof of Proposition 5.3 uses a basic result that comes up in the proof of compound Poisson inequalities via Stein's method, namely, a bound on the sup-norm of the solution of the Stein equation. This explains the appearance of the Stein factor, defined next. But we emphasize that, apart from this point of contact, the overall methodology used in establishing the results in Theorems 1.2 and 1.4 is entirely different from that used in proving compound Poisson approximation bounds via Stein's method.
For general $Q$ and any $\lambda > 0$, the Stein factor $H(\lambda, Q)$ is defined as: (1.4)
Note that in the case when all the $Q_i$ are identical, Theorem 1.2 yields, where $q$ is the common mean of the $Q_i = Q$, whereas Theorem 1.1 combined with Pinsker's inequality yields a similar, though not generally comparable, bound,
$$d_{TV}(P_{S_n}, \mathrm{CPo}(\lambda, Q)) \le \Big(\frac{1}{2\lambda}\sum_{i=1}^n \frac{p_i^3}{1-p_i}\Big)^{1/2}. \qquad (1.6)$$
See Section 6 for detailed comparisons in special cases.
The third and last main result, Theorem 1.4, gives an analogous bound to that of Theorem 1.2, with only a single term on the right-hand side. It is obtained from the subadditivity property of the second information functional $J_{Q,2}$, Proposition 4.3, combined with the corresponding total variation bound in Proposition 5.1.
where each $X_i$ has distribution $Q_i$ on $\mathbb{N}$ with mean $q_i$, and each $B_i \sim \mathrm{Bern}(p_i)$. Assume all $Q_i$ have full support on $\mathbb{N}$, and let $\lambda = \sum_{i=1}^n p_i$, $Q = \sum_{i=1}^n \frac{p_i}{\lambda} Q_i$, and $P_{S_n}$ denote the distribution of $S_n$. Then,
where $Q_i^{*2}$ denotes the convolution $Q_i * Q_i$ and $H(\lambda, Q)$ denotes the Stein factor defined in (1.4) above.
The accuracy of the bounds in the three theorems above is examined in specific examples in Section 6, where the resulting estimates are compared with what are probably the sharpest known bounds for compound Poisson approximation. Although the main conclusion of these comparisons (namely, that in broad terms our bounds are competitive with some of the best existing bounds and, in certain cases, may even be the sharpest) is certainly encouraging, we wish to emphasize that the main objective of this work is the development of an elegant conceptual framework for compound Poisson limit theorems via information-theoretic ideas, akin to the remarkable information-theoretic framework that has emerged for the central limit theorem and Poisson approximation.
The rest of the paper is organized as follows. Section 2 contains basic facts, definitions and notation that will remain in effect throughout. It also contains a brief review of earlier Poisson approximation results in terms of relative entropy, and the proof of Theorem 1.1. Section 3 introduces the two new information functionals: the size-biased information $J_{Q,1}$, generalizing the scaled Fisher information of [21], and the Katti-Panjer information $J_{Q,2}$, generalizing a related functional introduced by Johnstone and MacGibbon in [19]. It is shown that, in each case, Properties (A) and (B), analogous to those stated in Section 1.1 for Fisher information, hold for $J_{Q,1}$ and $J_{Q,2}$. In Section 4 we consider Property (C) and show that both $J_{Q,1}$ and $J_{Q,2}$ satisfy natural subadditivity properties on convolution. Section 5 contains bounds analogous to Property (D) above, showing that both $J_{Q,1}(Y)$ and $J_{Q,2}(Y)$ dominate the total variation distance between the distribution of $Y$ and a compound Poisson law.
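The relative entropy and total variation distance recalled earlier in this Introduction, together with Pinsker's inequality $d_{TV}(P, Q) \le \sqrt{D(P\|Q)/2}$ (with natural logarithms) relating them, are straightforward to evaluate for pmfs on a finite range. A minimal sketch, with function names of our own, assuming both pmfs are equal-length lists over a common finite range; the relative entropy is returned as infinite when $P$ charges a point that $Q$ does not.

```python
import math

def relative_entropy(p, q):
    """D(P||Q) = sum_x P(x) log(P(x)/Q(x)) in nats; inf if P charges a point Q does not."""
    total = 0.0
    for px, qx in zip(p, q):
        if px == 0.0:
            continue  # 0 log 0 = 0
        if qx == 0.0:
            return math.inf
        total += px * math.log(px / qx)
    return total

def total_variation(p, q):
    """d_TV(P, Q) = (1/2) sum_x |P(x) - Q(x)|, which equals max_A |P(A) - Q(A)|."""
    return 0.5 * sum(abs(px - qx) for px, qx in zip(p, q))

p, q = [0.5, 0.5], [0.25, 0.75]
d, tv = relative_entropy(p, q), total_variation(p, q)
print(tv, math.sqrt(d / 2))  # Pinsker: the first value does not exceed the second
```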

Size-biasing, compounding and relative entropy
In this section we collect preliminary definitions and notation that will be used in subsequent sections, and we provide the proof of Theorem 1.1.
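The compounding operation in the definition of the compound Poisson law in the Introduction can be more generally phrased as follows. Given a distribution $P$ on $\mathbb{Z}_+$ and a distribution $Q$ on $\mathbb{N}$, the compound distribution $C_Q P$ is the distribution of the random sum $\sum_{i=1}^{Y} X_i$, where $Y \sim P$ and the $X_i$ are i.i.d. with distribution $Q$, independent of $Y$. [Throughout, the empty sum $\sum_{i=1}^{0}$ is taken to be zero.] For example, the compound Poisson law $\mathrm{CPo}(\lambda, Q)$ is simply $C_Q \mathrm{Po}(\lambda)$, and the com-
Next we recall the size-biasing operation, which is intimately related to the Poisson law. For any distribution $P$ on $\mathbb{Z}_+$ with mean $\lambda$, the (reduced) size-biased distribution $P^\#$ is
$$P^\#(y) = \frac{(y+1)P(y+1)}{\lambda}, \qquad y \ge 0.$$
Recalling that a distribution $P$ on $\mathbb{Z}_+$ satisfies the recursion
$$(k+1)P(k+1) = \lambda P(k), \qquad k \ge 0, \qquad (2.1)$$
if and only if $P = \mathrm{Po}(\lambda)$, it is immediate that $P = \mathrm{Po}(\lambda)$ if and only if $P = P^\#$. This also explains, in part, the definition (1.1) of the scaled Fisher information in [21]. Similarly, the Katti-Panjer recursion states that $P$ is the $\mathrm{CPo}(\lambda, Q)$ law if and only if
$$kP(k) = \lambda \sum_{j=1}^{k} jQ(j) P(k-j), \qquad k \ge 1; \qquad (2.2)$$
see the discussion in [18] for historical remarks on the origin of (2.2).
Before giving the proof of Theorem 1.1 we recall two results related to Poisson approximation bounds from [21]. First, for any random variable $X \sim P$ on $\mathbb{Z}_+$ with mean $\lambda$, a modified log-Sobolev inequality of [9] was used in [21, Proposition 2] to show that
$$D(P\|\mathrm{Po}(\lambda)) \le J_\pi(X), \qquad (2.3)$$
as long as $P$ has either full support or finite support. Combining this with the subadditivity property of $J_\pi$ and elementary computations yields [21, Theorem 1], which states: if $T_n$ is the sum of $n$ independent $B_i \sim \mathrm{Bern}(p_i)$ random variables, then
$$D(P_{T_n}\|\mathrm{Po}(\lambda)) \le \frac{1}{\lambda}\sum_{i=1}^n \frac{p_i^3}{1-p_i}, \qquad (2.4)$$
where $P_{T_n}$ denotes the distribution of $T_n$ and $\lambda = \sum_{i=1}^n p_i$.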
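Both the size-biasing operation and the Katti-Panjer recursion are one-liners in code. The sketch below (function names ours) builds the $\mathrm{CPo}(\lambda, Q)$ pmf from the recursion $kP(k) = \lambda\sum_{j=1}^k jQ(j)P(k-j)$, started at $P(0) = e^{-\lambda}$, the probability that the Poisson number of summands is zero. With $Q$ the unit mass at 1 it reproduces $\mathrm{Po}(\lambda)$, for which $P^\# = P$, whereas a Bernoulli law is mapped by size-biasing to the point mass $\delta_0$.

```python
import math

def size_biased(pmf):
    """Reduced size-biased pmf P#(y) = (y+1) P(y+1) / lam, where lam is the mean of P."""
    lam = sum(y * p for y, p in enumerate(pmf))
    return [(y + 1) * pmf[y + 1] / lam for y in range(len(pmf) - 1)]

def cpo_pmf_katti_panjer(lam, q, size):
    """CPo(lam, Q) pmf on {0, ..., size-1} from the Katti-Panjer recursion
    k P(k) = lam * sum_{j=1}^{k} j Q(j) P(k-j), k >= 1, started at P(0) = exp(-lam)."""
    pmf = [math.exp(-lam)] + [0.0] * (size - 1)
    for k in range(1, size):
        acc = sum(j * q[j] * pmf[k - j] for j in range(1, min(k, len(q) - 1) + 1))
        pmf[k] = lam * acc / k
    return pmf

unit_mass_at_1 = [0.0, 1.0]
poisson = cpo_pmf_katti_panjer(2.0, unit_mass_at_1, 30)  # CPo(2, delta_1) = Po(2)
print(max(abs(a - b) for a, b in zip(size_biased(poisson), poisson)))  # tiny: P# = P
print(size_biased([0.8, 0.2]))  # Bernoulli(0.2) is mapped to the point mass at 0
```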
Proof of Theorem 1.1. Let $Z_n \sim \mathrm{Po}(\lambda)$ and $T_n = \sum_{i=1}^n B_i$. Then the distribution of $S_n$ is also that of the sum $\sum_{i=1}^{T_n} X_i$; similarly, the $\mathrm{CPo}(\lambda, Q)$ law is the distribution of the sum $Z = \sum_{i=1}^{Z_n} X_i$. Thus, writing $X = (X_i)$, we can express $S_n = f(X, T_n)$ and $Z = f(X, Z_n)$, where the function $f$ is the same in both places. Applying the data-processing inequality and then the chain rule for relative entropy [11], and using the independence of $X$ from $T_n$ and from $Z_n$,
$$D(P_{S_n}\|\mathrm{CPo}(\lambda, Q)) \le D(P_{X, T_n}\|P_{X, Z_n}) = D(P_{T_n}\|\mathrm{Po}(\lambda)),$$
and the result follows from the Poisson approximation bound (2.4).
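The two steps of this proof can be checked numerically on a small example. The sketch below, with illustrative parameters of our own choosing (five Bernoulli probabilities and a geometric $Q$), computes $D(P_{T_n}\|\mathrm{Po}(\lambda))$ and $D(P_{S_n}\|\mathrm{CPo}(\lambda, Q))$, obtaining $P_{S_n}$ by compounding $P_{T_n}$ with $Q$ exactly as in the proof, and compares both with $\frac{1}{\lambda}\sum_i p_i^3/(1-p_i)$, the bound of [21, Theorem 1]. Supports are truncated at 60 points, which is harmless here because the neglected tail mass is negligible; helper names are hypothetical.

```python
import math

def poisson_binomial_pmf(ps):
    """pmf of T_n = B_1 + ... + B_n for independent B_i ~ Bern(p_i)."""
    pmf = [1.0]
    for p in ps:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += (1.0 - p) * mass
            nxt[k + 1] += p * mass
        pmf = nxt
    return pmf

def compound(r, q, size):
    """pmf of C_Q R, i.e. of sum_{i=1}^{Z} X_i with Z ~ r and X_i i.i.d. ~ q on N
    (q[0] == 0), truncated to {0, ..., size-1}."""
    out = [0.0] * size
    power = [1.0] + [0.0] * (size - 1)  # Q^{*0}
    for rz in r:
        for k in range(size):
            out[k] += rz * power[k]
        nxt = [0.0] * size  # next convolution power Q^{*(z+1)}
        for i, a in enumerate(power):
            if a != 0.0:
                for j in range(1, min(len(q), size - i)):
                    nxt[i + j] += a * q[j]
        power = nxt
    return out

def kl(p, q):
    """D(P||Q) in nats, summed over the points where P > 0."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0.0)

ps = [0.1, 0.2, 0.15, 0.05, 0.3]
lam = sum(ps)
size = 60
po = [math.exp(-lam) * lam**k / math.factorial(k) for k in range(size)]
alpha = 0.3  # illustrative geometric Q(j) = (1 - alpha) alpha^(j-1)
q = [0.0] + [(1 - alpha) * alpha ** (j - 1) for j in range(1, size)]

t_n = poisson_binomial_pmf(ps)
d_poisson = kl(t_n, po)  # D(P_{T_n} || Po(lam))
d_compound = kl(compound(t_n, q, size), compound(po, q, size))  # D(P_{S_n} || CPo(lam, Q))
bound = sum(p**3 / (1 - p) for p in ps) / lam  # bound of [21, Theorem 1]
print(d_compound, d_poisson, bound)
```

The first inequality, $D(P_{S_n}\|\mathrm{CPo}) \le D(P_{T_n}\|\mathrm{Po})$, is the data-processing step; the second is the Poisson bound.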

Information functionals
This section contains the definitions of two new information functionals for discrete random variables, along with some of their basic properties.

Size-biased information
For the first information functional we consider, some knowledge of the summation structure of the random variables concerned is required.
where the score function $r_1$ is defined by,
For simplicity, in the case of a single summand $S = Y_1 \sim P_1 = C_Q R$ we write $r_1(\cdot; P, Q)$ and $J_{Q,1}(Y)$ for the score and the size-biased information of $S$, respectively. [Note that the score function $r_1$ is only infinite at points $x$ outside the support of $P$, which do not affect the definition of the size-biased information functional.] Although at first sight the definition of $J_{Q,1}$ seems restricted to the case when all the summands $Y_i$ have distributions of the form $C_{Q_i} R_i$, we note that this can always be achieved by taking $p_i = \Pr\{Y_i \geq 1\}$ and letting $R_i \sim \mathrm{Bern}(p_i)$ and $Q_i(k) = \Pr\{Y_i = k \mid Y_i \geq 1\}$, for $k \geq 1$, as before.
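The decomposition used here, and in the Introduction to write each $Y_i = B_i X_i$, is easy to carry out numerically. A minimal sketch, with hypothetical helper names `decompose` and `recompose`, assuming the pmf of $Y_i$ is given as a finite list with $\Pr\{Y_i \neq 0\} > 0$:

```python
def decompose(pmf_y):
    """Write Y = B X: return (p, q) with p = Pr{Y != 0} and q[k] = Pr{Y = k | Y >= 1}.

    pmf_y[k] = Pr{Y = k}; assumes Pr{Y != 0} > 0. q[0] is 0 since X takes values in N.
    """
    p = 1.0 - pmf_y[0]
    q = [0.0] + [pk / p for pk in pmf_y[1:]]
    return p, q

def recompose(p, q):
    """pmf of B X with B ~ Bern(p) independent of X ~ q on N."""
    return [1.0 - p] + [p * qk for qk in q[1:]]

p, q = decompose([0.5, 0.3, 0.2])
print(p, q)  # p = 0.5 and Q = (0.6, 0.4) on {1, 2}, up to rounding
print(recompose(p, q))  # recovers the original pmf
```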
We collect below some of the basic properties of J Q,1 that follow easily from the definition.
1. Since $E[r_1(S; P, Q)] = 0$, the functional $J_{Q,1}(S)$ is in fact the variance of the score $r_1(S; P, Q)$.
4. In general, writing $F^{(i)}$ for the distribution of the leave-one-out sum $\sum_{j \neq i} Y_j$,
Hence, within the class of ultra log-concave $R_i$ (a class which includes compound Bernoulli sums), since the moments of $R_i$ are no smaller than the moments of $R_i^\#$, with equality if and only if $R_i$ is Poisson, the score $r_1(\cdot; P, Q) \equiv 0$ if and only if the $R_i$ are all Poisson, i.e., if and only if $P$ is compound Poisson.
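The moment comparison in item 4 can be illustrated numerically. In the sketch below (helper names ours), a pmf is tested for ultra log-concavity by checking that $(k+1)P(k+1)/P(k)$ is nonincreasing in $k$, assuming the support is $\{0, \ldots, \text{len}-1\}$. For a binomial law, a sum of Bernoulli variables, the moments of $R^\#$ come out strictly smaller than those of $R$, while for a Poisson law they agree up to truncation.

```python
import math

def is_ultra_log_concave(pmf, tol=1e-12):
    """True if (k+1) P(k+1) / P(k) is nonincreasing in k (support assumed {0, ..., len-1})."""
    ratios = [(k + 1) * pmf[k + 1] / pmf[k] for k in range(len(pmf) - 1)]
    return all(b <= a + tol for a, b in zip(ratios, ratios[1:]))

def moment(pmf, r):
    """E[R^r] for R with the given pmf."""
    return sum(k**r * pk for k, pk in enumerate(pmf))

def size_biased(pmf):
    """Reduced size-biased pmf R#(k) = (k+1) R(k+1) / E[R]."""
    mean = moment(pmf, 1)
    return [(k + 1) * pmf[k + 1] / mean for k in range(len(pmf) - 1)]

binom = [math.comb(5, k) * 0.3**k * 0.7 ** (5 - k) for k in range(6)]
poisson = [math.exp(-2.0) * 2.0**k / math.factorial(k) for k in range(40)]
for name, pmf in (("Bin(5, 0.3)", binom), ("Po(2)", poisson)):
    gaps = [moment(size_biased(pmf), r) - moment(pmf, r) for r in (1, 2, 3)]
    print(name, is_ultra_log_concave(pmf), gaps)
```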

Katti-Panjer information
Recall that the recursion (2.1) characterizing the Poisson distribution was used as part of the motivation for the definition of the scaled Fisher information $J_\pi$ in (1.1) and (1.2). In an analogous manner, we employ the Katti-Panjer recursion (2.2) that characterizes the compound Poisson law to define another information functional.
Definition 3.2. Given a $\mathbb{Z}_+$-valued random variable $Y \sim P$ and an arbitrary distribution $Q$ on $\mathbb{N}$, the Katti-Panjer information of $Y$ relative to $Q$ is defined as, where the score function $r_2$ is, and where $\lambda$ is the ratio of the mean of $Y$ to the mean of $Q$.
From the definition of the score function $r_2$ it is immediate that,
In the special case when $Q$ is the unit mass at 1, the Katti-Panjer information of $Y \sim P$ reduces to, where $\lambda, \sigma^2$ are the mean and variance of $Y$, respectively, and $I(Y)$ denotes the functional, proposed by Johnstone and MacGibbon [19] as a discrete version of the Fisher information (with the convention $P(-1) = 0$). Therefore, in view of (3.1) we can think of $J_{Q,2}(Y)$ as a generalization of the "Fisher information" functional $I(Y)$ of [19]. Finally, note that, although the definition of $J_{Q,2}$ is more straightforward than that of $J_{Q,1}$, the Katti-Panjer information suffers the drawback that, like its simpler version $I(Y)$ in [19], it is only finite for random variables $Y$ with full support on $\mathbb{Z}_+$. As noted in [20] and [21], the definition of $I(Y)$ cannot simply be extended to all $\mathbb{Z}_+$-valued random variables by just ignoring the points outside the support of $P$, where the integrand in (3.2) becomes infinite. This was, partly, the motivation for the definition of the scaled Fisher information $J_\pi$ in [21]. Similarly, in the present setting, the important properties of $J_{Q,2}$ established in the following sections fail unless $P$ has full support, unlike for the size-biased information $J_{Q,1}$.
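One way to see how the Katti-Panjer recursion (2.2) can serve as a test for being compound Poisson is to evaluate its residual directly. The helper below is hypothetical and purely illustrative; it is not claimed to coincide with the score $r_2$ of Definition 3.2. It computes $e(k) = \lambda\sum_{j=1}^k jQ(j)P(k-j) - kP(k)$ for $k \ge 1$, with $\lambda$ the ratio of the mean of $P$ to the mean of $Q$, as in Definition 3.2. The residual vanishes for a Poisson law when $Q$ is the unit mass at 1, but not for a compound Bernoulli law.

```python
import math

def katti_panjer_residual(pmf, q):
    """Hypothetical diagnostic: e(k) = lam * sum_{j=1}^{k} j Q(j) P(k-j) - k P(k), k >= 1,
    with lam = mean(P) / mean(Q). By (2.2), e vanishes exactly when P is CPo(lam, Q)."""
    mean_p = sum(k * pk for k, pk in enumerate(pmf))
    mean_q = sum(j * qj for j, qj in enumerate(q))
    lam = mean_p / mean_q
    residual = []
    for k in range(1, len(pmf)):
        acc = sum(j * q[j] * pmf[k - j] for j in range(1, min(k, len(q) - 1) + 1))
        residual.append(lam * acc - k * pmf[k])
    return residual

poisson = [math.exp(-2.0) * 2.0**k / math.factorial(k) for k in range(40)]
print(max(abs(e) for e in katti_panjer_residual(poisson, [0.0, 1.0])))  # rounding-level
# Y = B X with B ~ Bern(0.4), X uniform on {1, 2}: not compound Poisson
print(katti_panjer_residual([0.6, 0.2, 0.2], [0.0, 0.5, 0.5]))
```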

Subadditivity
The subadditivity property of Fisher information (Property (C) in the Introduction) plays a key role in the information-theoretic analysis of normal approximation bounds. The corresponding property for the scaled Fisher information (Proposition 3 of [21]) plays an analogous role in the case of Poisson approximation. Both of these results are based on a convolution identity for each of the two underlying score functions. In this section we develop natural analogs of the convolution identities and resulting subadditivity properties for the functionals $J_{Q,1}$ and $J_{Q,2}$.

Subadditivity of the size-biased information
The proposition below gives the natural analog of Property (C) in the Introduction, for the information functional $J_{Q,1}$. It generalizes the convolution lemma and Proposition 3 of [21].
and hence,
Proof. In the notation of Definition 3.1 and the subsequent discussion, writing $F^{(i)} = P_1 * \cdots * P_{i-1} * P_{i+1} * \cdots * P_m$, so that $P^{(i)} = F^{(i)} * C_{Q_i} R_i^\#$, the right-hand side of the projection identity (4.1) equals, as required. The subadditivity result follows using the conditional Jensen inequality, exactly as in the proof of Proposition 3 of [21].

Corollary 4.2. Under the assumptions of Proposition
Proof. Consider $Y = BX$, where $B \sim R = \mathrm{Bern}(p)$ and $X \sim Q$. Since $R^\# = \delta_0$, we have $C_Q(R^\#) = \delta_0$. Further, $Y$ takes the value 0 with probability $1 - p$ and the value $X$ with probability $p$. Thus, Consequently, and using Proposition 4.1 yields, as claimed.

Subadditivity of the Katti-Panjer information
When $S_n$ is supported on the whole of $\mathbb{Z}_+$, the score $r_2$ satisfies a convolution identity and the functional $J_{Q,2}$ is subadditive. The following proposition contains the analogs of (4.1) and (4.2) in Proposition 4.1 for the Katti-Panjer information $J_{Q,2}(Y)$. These can also be viewed as generalizations of the corresponding results for the Johnstone-MacGibbon functional $I(Y)$ established in [19].
and hence,
Proof. Write $r_{2,i}(\cdot)$ for $r_2(\cdot; P_i, Q_i)$, and note that $E(Y_i) = p_i q_i$ for each $i$. Therefore, $E(S_n) = \sum_i p_i q_i$, which equals $\lambda$ times the mean of $Q$. As before, let $F^{(i)}$ denote the distribution of the leave-one-out sum $\sum_{j \neq i} Y_j$, and decompose the distribution $P_{S_n}$ of $S_n$ as $P_{S_n}(s) = \sum_x P_i(x) F^{(i)}(s - x)$. We have, proving the projection identity. Using the conditional Jensen inequality, and noting that the cross-terms vanish because $E[r_2(X; P, Q)] = 0$ for any $X \sim P$ with full support (cf. the discussion in Section 3.2), the subadditivity result follows, as claimed.

Information functionals dominate total variation
In the case of Poisson approximation, the modified log-Sobolev inequality (2.3) directly relates the relative entropy to the scaled Fisher information $J_\pi$. However, the known (modified) log-Sobolev inequalities for compound Poisson distributions [32,22] only relate the relative entropy to functionals different from $J_{Q,1}$ or $J_{Q,2}$. Instead of developing subadditivity results for those other functionals, we build, in part, on some of the ideas from Stein's method and prove relationships between the total variation distance and the information functionals $J_{Q,1}$ and $J_{Q,2}$. (Note, however, that Lemma 5.4 does offer a partial result showing that the relative entropy can be bounded in terms of $J_{Q,1}$.)
To illustrate the connection between these two information functionals and Stein's method, we find it simpler to first examine the Katti-Panjer information. Recall that, for an arbitrary function $h: \mathbb{Z}_+ \to \mathbb{R}$, a function $g: \mathbb{Z}_+ \to \mathbb{R}$ satisfies the Stein equation for the compound Poisson measure $\mathrm{CPo}(\lambda, Q)$ if
$$\lambda \sum_{j=1}^{\infty} jQ(j)\, g(k+j) - k\, g(k) = h(k) - E[h(Z)], \qquad k \ge 0, \qquad (5.1)$$
where $Z \sim \mathrm{CPo}(\lambda, Q)$. [See, e.g., [13] for details as well as a general review of Stein's method for Poisson and compound Poisson approximation.] Letting $h = I_A$ for some $A \subset \mathbb{Z}_+$, writing $g_A$ for the corresponding solution of the Stein equation, and taking expectations with respect to an arbitrary random variable $Y \sim P$ on $\mathbb{Z}_+$,
$$E\Big[\lambda \sum_{j=1}^{\infty} jQ(j)\, g_A(Y+j) - Y g_A(Y)\Big] = P(A) - \mathrm{CPo}(\lambda, Q)\{A\}.$$
Then taking absolute values and maximizing over all $A \subset \mathbb{Z}_+$,
$$d_{TV}(P, \mathrm{CPo}(\lambda, Q)) \le \sup_{A \subset \mathbb{Z}_+} \Big| E\Big[\lambda \sum_{j=1}^{\infty} jQ(j)\, g_A(Y+j) - Y g_A(Y)\Big] \Big|. \qquad (5.2)$$
Noting that the expression in the expectation above is reminiscent of the Katti-Panjer recursion (2.2), it is perhaps not surprising that this bound relates directly to the Katti-Panjer information functional:
Proposition 5.1. For any random variable $Y \sim P$ on $\mathbb{Z}_+$, any distribution $Q$ on $\mathbb{N}$ and any $\lambda > 0$, where $H(\lambda, Q)$ is the Stein factor defined in (1.4).
Proof. We assume without loss of generality that $Y$ is supported on the whole of $\mathbb{Z}_+$, since, otherwise, $J_{Q,2}(Y) = \infty$ and the result is trivial. Continuing from the inequality in (5.2), where the first inequality follows from rearranging the first sum, the second inequality follows from Lemma 5.2 below, and the last step is simply the Cauchy-Schwarz inequality.
The following uniform bound on the sup-norm of the solution to the Stein equation (5.1) is the only auxiliary result we require from Stein's method. See [5] or [13] for a proof.
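The Stein equation (5.1) rests on a characterization that can be checked numerically: for $Z \sim \mathrm{CPo}(\lambda, Q)$ and any bounded $g$, $E[\lambda\sum_j jQ(j)g(Z+j) - Zg(Z)] = 0$, which follows from the Katti-Panjer recursion (2.2) by multiplying by $g(k)$ and summing over $k$. The sketch below, with our own helper names and with $g = \sin$ as an arbitrary bounded test function, evaluates this expectation for a truncated $\mathrm{CPo}$ pmf and for a compound Bernoulli law, where it does not vanish.

```python
import math

def cpo_pmf(lam, q, size):
    """CPo(lam, Q) pmf on {0, ..., size-1} via the Katti-Panjer recursion (2.2)."""
    pmf = [math.exp(-lam)] + [0.0] * (size - 1)
    for k in range(1, size):
        pmf[k] = lam / k * sum(j * q[j] * pmf[k - j] for j in range(1, min(k, len(q) - 1) + 1))
    return pmf

def stein_expectation(pmf, lam, q, g):
    """E[lam * sum_j j Q(j) g(Y + j) - Y g(Y)] for Y with the given (finite) pmf."""
    total = 0.0
    for y, py in enumerate(pmf):
        drift = lam * sum(j * q[j] * g(y + j) for j in range(1, len(q)))
        total += py * (drift - y * g(y))
    return total

q = [0.0, 0.6, 0.4]  # illustrative Q on {1, 2}
print(stein_expectation(cpo_pmf(1.5, q, 80), 1.5, q, math.sin))  # ~0 up to truncation/rounding
p = 0.3
print(stein_expectation([1 - p, p * q[1], p * q[2]], p, q, math.sin))  # clearly nonzero
```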

Size-biased information dominates total variation
Next we establish an analogous bound to that of Proposition 5.1 for the size-biased information $J_{Q,1}$. As this functional is not as directly related to the Katti-Panjer recursion (2.2) and the Stein equation (5.1), the proof is technically more involved.
Proof. For each $i$, let $T^{(i)} \sim F^{(i)}$ denote the leave-one-out sum $\sum_{j \neq i} Y_j$, and note that, as in the proof of Corollary 4.2, the distribution $F^{(i)}$ is the same as the distribution $P^{(i)}$ of the modified sum $S^{(i)}$ in Definition 3.1. Since $Y_i$ is nonzero with probability $p_i$, we have, where, for $A \subset \mathbb{Z}_+$ arbitrary, $g_A$ denotes the solution of the Stein equation (5.1) with $h = I_A$. Hence, summing over $i$ and substituting in the expression on the right-hand side of equation (5.2) with $S$ in place of $Y$, yields, By the Cauchy-Schwarz inequality, the first term in (5.3) is bounded in absolute value by, and for the second term, simply bound $\|g_A\|_\infty$ by $H(\lambda, Q)$ using Lemma 5.2, deducing a bound in absolute value of Combining these two bounds with the expression in (5.3) and the original total-variation inequality (5.2) completes the proof, upon substituting the uniform sup-norm bound given in Lemma 5.2.
Finally, recall from the discussion at the beginning of this section that the scaled Fisher information $J_\pi$ satisfies a modified log-Sobolev inequality (2.3), which gives a bound for the relative entropy in terms of the functional $J_\pi$. For the information functionals $J_{Q,1}$ and $J_{Q,2}$ considered in this work, we instead established analogous bounds in terms of total variation. However, the following partial result holds for $J_{Q,1}$: where $B \sim \mathrm{Bern}(p)$ and $X \sim Q$ on $\mathbb{N}$. Then:
Proof. Recall from (4.3) that $J_{Q,1}(Y) = \frac{p^2}{1-p}$. Further, since the $\mathrm{CPo}(p, Q)$ mass function equals $e^{-p}$ at 0 and is at least $e^{-p} p Q(s)$ for $s \geq 1$, we have
$$D(C_Q \mathrm{Bern}(p)\,\|\,\mathrm{CPo}(p, Q)) \le (1-p)\log\frac{1-p}{e^{-p}} + \sum_{s \ge 1} pQ(s)\log\frac{pQ(s)}{e^{-p}pQ(s)} = (1-p)\log(1-p) + p,$$
which yields the result.
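The displayed inequality is easy to check numerically. This sketch (helper names ours, and the finitely supported $Q$ an arbitrary illustrative choice) computes $D(C_Q\mathrm{Bern}(p)\|\mathrm{CPo}(p, Q))$ exactly, since $C_Q\mathrm{Bern}(p)$ puts mass $1-p$ at 0 and $pQ(s)$ at each $s \ge 1$, and compares it with $(1-p)\log(1-p) + p$.

```python
import math

def cpo_pmf(lam, q, size):
    """CPo(lam, Q) pmf on {0, ..., size-1} via the Katti-Panjer recursion (2.2)."""
    pmf = [math.exp(-lam)] + [0.0] * (size - 1)
    for k in range(1, size):
        pmf[k] = lam / k * sum(j * q[j] * pmf[k - j] for j in range(1, min(k, len(q) - 1) + 1))
    return pmf

def compound_bernoulli_vs_cpo(p, q):
    """Return (D(C_Q Bern(p) || CPo(p, Q)), (1-p) log(1-p) + p) for q[0] == 0."""
    cb = [1.0 - p] + [p * qs for qs in q[1:]]  # mass 1-p at 0 and pQ(s) at s >= 1
    cpo = cpo_pmf(p, q, len(q))  # only the points where C_Q Bern(p) has mass matter
    d = sum(a * math.log(a / b) for a, b in zip(cb, cpo) if a > 0.0)
    return d, (1.0 - p) * math.log(1.0 - p) + p

q = [0.0, 0.5, 0.3, 0.2]  # illustrative Q on {1, 2, 3}
for p in (0.05, 0.3, 0.7):
    print(p, compound_bernoulli_vs_cpo(p, q))
```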

Comparison with existing bounds
In this section, we compare the bounds obtained in our three main results, Theorems 1.1, 1.2 and 1.4, with inequalities derived by other methods. Throughout, $S_n = \sum_{i=1}^n B_i X_i$, where the $B_i$ and the $X_i$ are independent sequences of independent random variables, with $B_i \sim \mathrm{Bern}(p_i)$ for some $p_i \in (0, 1)$, and with $X_i \sim Q_i$ on $\mathbb{N}$; we write $\lambda = \sum_{i=1}^n p_i$. There is a large body of literature developing bounds on the distance between the distribution $P_{S_n}$ of $S_n$ and compound Poisson distributions; see, e.g., [13] and the references therein, or [29, Section 2] for a concise review.
We begin with the case in which all the $Q_i = Q$ are identical, when, in view of a remark of Le Cam [24, bottom of p. 187] and Michel [27], bounds computed for the case $X_i = 1$ a.s. for all $i$ are also valid for any $Q$. One of the earliest results is the following inequality of Le Cam [23], building on earlier results by Khintchine and Doeblin,
$$d_{TV}(P_{S_n}, \mathrm{CPo}(\lambda, Q)) \le \sum_{i=1}^n p_i^2. \qquad (6.1)$$
Barbour and Hall (1984) used Stein's method to improve the bound to
Roos [28] gives the asymptotically sharper bound
where $\theta = \lambda^{-1}\sum_{i=1}^n p_i^2$. In this setting, the bound (1.6) that was derived from Theorem 1.1 yields
The bounds (6.2)-(6.4) are all derived using the observation made by Le Cam and Michel, taking $Q$ to be degenerate at 1. For the application of Theorem 1.4, however, the distribution $Q$ must have support the whole of $\mathbb{N}$, so $Q$ cannot be replaced by the point mass at 1 in the formula; the bound that results from Theorem 1.4 can be expressed as
(6.5)
Illustration of the effectiveness of these bounds with geometric $Q$ and equal $p_i$ is given in Section 6.2.
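For equal $p_i = p$ and a geometric $Q$, the exact total variation distance can be computed and set against Le Cam's bound (6.1), which becomes $np^2$, and against (1.6), which in this case reduces to $p/\sqrt{2(1-p)}$. The following is a sketch under assumptions of our own: helper names are hypothetical, supports are truncated at 80 points, and the values $n = 20$, $p = 0.1$, $\alpha = 0.2$ are illustrative only.

```python
import math

def binomial_pmf(n, p):
    return [math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

def compound(r, q, size):
    """pmf of C_Q R (Z ~ r summands, X_i i.i.d. ~ q on N), truncated to size points."""
    out = [0.0] * size
    power = [1.0] + [0.0] * (size - 1)  # Q^{*0}
    for rz in r:
        for k in range(size):
            out[k] += rz * power[k]
        nxt = [0.0] * size  # Q^{*(z+1)}
        for i, a in enumerate(power):
            if a != 0.0:
                for j in range(1, min(len(q), size - i)):
                    nxt[i + j] += a * q[j]
        power = nxt
    return out

def cpo_pmf(lam, q, size):
    """CPo(lam, Q) pmf via the Katti-Panjer recursion (2.2)."""
    pmf = [math.exp(-lam)] + [0.0] * (size - 1)
    for k in range(1, size):
        pmf[k] = lam / k * sum(j * q[j] * pmf[k - j] for j in range(1, min(k, len(q) - 1) + 1))
    return pmf

def compare_bounds(n, p, alpha, size=80):
    """Exact d_TV(C_Q Bin(n, p), CPo(np, Q)) for geometric Q, Le Cam's (6.1), and (1.6)."""
    q = [0.0] + [(1 - alpha) * alpha ** (j - 1) for j in range(1, size)]
    lam = n * p
    cb, cpo = compound(binomial_pmf(n, p), q, size), cpo_pmf(lam, q, size)
    exact = 0.5 * sum(abs(a - b) for a, b in zip(cb, cpo))
    le_cam = n * p**2
    thm_1_1 = math.sqrt(n * p**3 / (1 - p) / (2 * lam))  # equals p / sqrt(2(1-p))
    return exact, le_cam, thm_1_1

print(compare_bounds(20, 0.1, 0.2))
```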
For non-equal $Q_i$, the bounds are more complicated. We compare those given in Theorems 1.2 and 1.4 with three other bounds. The first is Le Cam's bound (6.1), which still remains valid as stated in the case of non-equal $Q_i$. The second, from Stein's method, has the form where
We illustrate the effectiveness of these bounds in Section 6.3; in our examples, Roos's bounds are much the best.

Broad comparisons
Because of their apparent complexity and different forms, general comparisons between the bounds are not straightforward, so we consider two particular cases below in Sections 6.2 and 6.3. However, the following simple observation on approximating compound binomials by a compound Poisson gives a first indication of the strength of one of our bounds.
2. If p < 1/2, then the bound of Theorem 1.1 is stronger than the bound (6.2);
3. If 0.012 < p < 1/2 and n > ( √ 2p(1 − p)) −1 are satisfied, then the bound of Theorem 1.1 is stronger than all three bounds in (6.1), (6.2) and (6.3).
Proof. The first two observations follow by simple algebra, upon noting that, in this case, the bound (1.6) obtained from Theorem 1.1 reduces to $p/\sqrt{2(1-p)}$; the third is shown numerically, noting that here $\theta = p$.
One can also examine the rate of convergence of the total variation distance between the distribution $P_{S_n}$ and the corresponding compound Poisson distribution under simple asymptotic schemes. We think of situations in which the $p_i$ and $Q_i$ are not necessarily equal, but are all in some reasonable sense comparable with one another; we shall also suppose that $jQ(j)$ is more or less a fixed and decreasing sequence. Two ways in which $p$ varies with $n$ are considered:
Regime I. $p = \lambda/n$ for fixed $\lambda$, and $n \to \infty$;
Regime II. $p = \sqrt{\mu/n}$, so that $\lambda = \sqrt{\mu n} \to \infty$ as $n \to \infty$.
Under these conditions, the Stein factors $H(\lambda, Q)$ are of the same order as $1/\sqrt{np}$. Table 1 compares the asymptotic performance of the various bounds above.
The poor behaviour of the bound in Theorem 1.2 shown above occurs because, for large values of $\lambda$, the quantity $D(\mathbf{Q})$ behaves much like $\lambda$, unless the $Q_i$ are identical or near-identical.

Example. Compound binomial with equal geometrics
We now examine the finite-$n$ behavior of the approximation bounds (6.1)-(6.3) in the particular case of equal $p_i$ and equal $Q_i$, when $Q_i$ is geometric with parameter $\alpha \in (0, 1)$: $Q(j) = (1 - \alpha)\alpha^{j-1}$, $j \geq 1$.

Example. Sums with unequal geometrics
Here, we consider the finite-$n$ behavior of the approximation bounds (6.1), (6.6) and (6.7) in the particular case when the distributions $Q_i$ are geometric with parameters $\alpha_i \in (0, 1)$. The resulting bounds are plotted in Figures 4 and 5.
In this case, it is clear that the best bounds by a considerable margin are those of Roos [29] given in (6.7).
Figure 2: Bounds on the total variation distance $d_{TV}(C_Q\mathrm{Bin}(p, Q), \mathrm{CPo}(\lambda, Q))$ for $Q \sim \mathrm{Geom}(\alpha)$ as in Figure 1, here plotted against the parameter $\alpha$, with $n = 100$ and $\lambda = 5$ fixed.
Figure 3: Bounds on the total variation distance $d_{TV}(C_Q\mathrm{Bin}(p, Q), \mathrm{CPo}(\lambda, Q))$ for $Q \sim \mathrm{Geom}(\alpha)$ as in Figure 1, here plotted against the parameter $\lambda$, with $\alpha = 0.2$ and $n = 100$ fixed.
Figure 4: Bounds on the total variation distance $d_{TV}(P_{S_n}, \mathrm{CPo}(\lambda, Q))$ for $Q_i \sim \mathrm{Geom}(\alpha_i)$, where the $\alpha_i$ are uniformly spread between 0.15 and 0.25, $n$ varies, and $p$ is as in Regime I, $p = 5/n$. Again, bounds based on $J_{Q,1}$ are plotted as •; those based on $J_{Q,2}$ as △; Le Cam's bound in (6.1) as ▽; the Stein's method bound in (6.6) as ×; and Roos' bound from Theorem 2 of [29] as ⊠. The true total variation distances, computed numerically in each case, are plotted as ⋄.
Figure 5: Bounds on the total variation distance $d_{TV}(P_{S_n}, \mathrm{CPo}(\lambda, Q))$ for $Q_i \sim \mathrm{Geom}(\alpha_i)$ as in Figure 4, where the $\alpha_i$ are uniformly spread between 0.15 and 0.25, $n$ varies, and $p$ is as in Regime II, $p = \sqrt{0.5/n}$.