A Non-uniform Bound for Translated Poisson Approximation

Abstract: Let X_1, ..., X_n be independent, integer valued random variables, with pth moments, p > 2, and let W denote their sum. We prove bounds analogous to the classical non-uniform estimates of the error in the central limit theorem, but now for approximation of L(W) by a translated Poisson distribution. The advantage is that the error bounds, which are often of order no worse than in the classical case, measure the accuracy in terms of total variation distance. In order to have good approximation in this sense, it is necessary for L(W) to be sufficiently smooth; this requirement is incorporated into the bounds by way of a parameter α, which measures the average overlap between L(X_i) and L(X_i + 1), 1 ≤ i ≤ n.


Introduction
Let X_1, ..., X_n be independent, integer valued random variables. For each 1 ≤ i ≤ n, we centre X_i by requiring that 0 ≤ IEX_i = µ_i < 1, and we assume that IE|X_i|^p < ∞, where p ∈ (2, ∞) is the same for all i; we write Var X_i = σ_i². The Berry-Esseen theorem then bounds the error in the approximation to W := ∑_{i=1}^n X_i by a normal distribution with the same mean and variance as W:

sup_{x ∈ R} |IP[W ≤ µ + xσ] − Φ(x)| ≤ C Γ_3,   (1.1)

where µ := ∑_{i=1}^n µ_i = IEW, σ² := ∑_{i=1}^n σ_i² = Var W, Γ_s := σ^{−s} ∑_{i=1}^n IE{|X_i − µ_i|^s} for any 0 < s ≤ p, and Φ denotes the standard normal distribution function. However, for integer valued random variables, approximation by a distribution which, like the normal, has a smooth probability density is not necessarily the most natural choice. It was shown using Stein's method in Barbour and Xia (1999) and in Čekanavičius and Vaitkus (2001) that there are probability distributions on the integers, the simplest of which is the translated Poisson distribution, which allow an approximation whose error, with respect to the stronger total variation distance, is of essentially the same order (Barbour and Xia 1999, Corollary 4.5) as in (1.1). In such results, an extra condition is required to exclude the possibility that W is nearly concentrated on a lattice of span greater than 1, and this is also reflected in the bounds obtained. Analogous asymptotic expansions, with error again measured in total variation norm, were established in Barbour and Čekanavičius (2002), Theorem 5.1.
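As a back-of-envelope check of these definitions, Γ_s can be computed directly from the summand moments; a minimal sketch (the Bernoulli example and the function name are illustrative, not from the paper):

```python
import math

def gamma_s(abs_moments_s, variances, s):
    """Gamma_s = sigma^{-s} * sum_i E|X_i - mu_i|^s, with sigma^2 = sum_i Var X_i."""
    sigma = math.sqrt(sum(variances))
    return sum(abs_moments_s) / sigma ** s

# n independent Bernoulli(1/2) summands: Var X_i = 1/4, E|X_i - 1/2|^3 = 1/8.
n = 400
g3 = gamma_s([1 / 8] * n, [1 / 4] * n, 3)
print(g3)  # equals n^{-1/2} = 0.05 here
```

For n Bernoulli(1/2) summands this gives Γ_3 = n^{−1/2}, the familiar Berry-Esseen rate.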
In this paper, we also consider approximation in total variation, but in a non-uniform sense. For sets A ⊂ [µ + xσ, ∞) ∩ Z, x > 0, we show in Theorem 2.1 that, under mild conditions, the error in approximating the probability IP[W ∈ A] by the corresponding probability for a suitably chosen translated Poisson distribution is of order O((Γ_{3∧p} + Γ_p)(1 + x^p)^{−1}), becoming smaller as x increases. This result is a natural analogue of the non-uniform bounds for the error in the usual normal approximation (Bikelis (1966); Petrov (1975, Theorem 13, p. 125); Chen and Shao (2001)), but now once again with respect to total variation distance. The translated Poisson distribution is chosen to have the same mean as W, and also to have variance close to that of W; because only translations by integer amounts are appropriate, an exact match of both moments cannot usually be achieved.
We prove our result using Stein's method. We are able to make use of much of the standard theory associated with Poisson approximation by way of the Stein-Chen approach, but there are a number of significant differences. First, we show that the solutions to the Stein equation for sets A as above take very small values for arguments k well below µ + xσ, as do their first and second differences. For values of k comparable to µ + xσ, there are no better bounds than the standard bounds used in the Stein-Chen approach, and in order to get results of the required accuracy, it is necessary instead to use some smoothness of the distribution of W, expressed in the form of a bound on d_TV(L(W), L(W + 1)). Here, the procedure is much more delicate than for uniform bounds, since it must be shown that this smoothness is preserved well enough into the tails of the distribution of W. We do so using an argument based on the Mineka coupling, in Lemma 3.5.
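The key feature of the Mineka coupling used here is that the difference of the two coupled partial sum processes behaves as a lazy symmetric random walk started at 1 and absorbed at 0, so that the non-meeting probability IP[T > n] decays like (nα)^{−1/2}. The simulation below is a simplified sketch of this reduction only (constant overlaps, all names hypothetical), not the construction of the paper:

```python
import random

def no_meet_prob(overlaps, trials=20000, seed=1):
    """Estimate P[T > n] for the lazy-walk reduction of the Mineka coupling:
    the difference process starts at 1, at step i moves by +1 or -1 with
    probability (1 - d_i)/2 each, otherwise stays put, and T is its first
    visit to 0."""
    random.seed(seed)
    misses = 0
    for _ in range(trials):
        diff = 1
        for q in overlaps:  # q = 1 - d_i, the overlap at step i
            if diff == 0:
                break
            u = random.random()
            if u < q / 2:
                diff += 1
            elif u < q:
                diff -= 1
        if diff != 0:
            misses += 1
    return misses / trials

# With constant overlap 1 - d_i = 0.5, the estimate decays like (n * alpha)^{-1/2}:
print(no_meet_prob([0.5] * 100))
```

Doubling n roughly divides the estimate by √2, in line with the (nα)^{−1/2} rate.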
There is another approach to total variation approximation for sums of integer valued random variables, that of Chen and Suan (2003). Their argument, although based on Stein's method, is entirely different. They take Stein's method for normal approximation as their starting point, and give bounds on approximation with a discretized normal distribution. It seems likely that their approach will yield results comparable to ours in the setting of this paper; in the context of sums of dependent random variables, the existence of the two different methods may prove to be very useful.

Notation and results
We begin by specifying the translated Poisson distribution to be used in our approximation. To do this, we first define 0 ≤ δ < 1 by

δ := µ − σ² − ⌊µ − σ²⌋.

We then set γ := ⌊µ − σ²⌋ = µ − σ² − δ, and note that W − γ is an integer valued random variable whose mean and variance are almost equal: indeed, IE(W − γ) − Var(W − γ) = δ. This makes W − γ a good candidate for Poisson approximation; our choice is to approximate it by the Poisson distribution Po(λ) with mean λ := σ² + δ, having the same mean as W − γ, and variance larger by the amount δ.
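The parameters γ, δ and λ can be computed mechanically from µ and σ²; a minimal sketch, taking γ := ⌊µ − σ²⌋ so that 0 ≤ δ < 1 (the function name is illustrative):

```python
import math

def translated_poisson_params(mu, sigma2):
    """Parameters of the approximating translated Poisson distribution:
    gamma = floor(mu - sigma^2), delta = mu - sigma^2 - gamma in [0, 1),
    lambda = sigma^2 + delta, so that E(W - gamma) = lambda and
    Var(W - gamma) = lambda - delta."""
    gamma = math.floor(mu - sigma2)
    delta = mu - sigma2 - gamma
    lam = sigma2 + delta
    return gamma, delta, lam

gamma, delta, lam = translated_poisson_params(mu=10.7, sigma2=6.2)
print(gamma, delta, lam)  # 4, 0.5, 6.7 (up to floating-point rounding)
```

Note that IE(W − γ) = µ − γ = σ² + δ = λ, so the mean is matched exactly and the variance mismatch is exactly δ.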
We also need some way to ensure that the distribution of W has sufficient smoothness; clearly, if the distribution of W is concentrated on the odd integers, total variation approximation by a translated Poisson distribution has no hope of success, even though approximation with respect to Kolmogorov distance may be good. We therefore define

d_i := d_TV(L(X_i), L(X_i + 1)), 1 ≤ i ≤ n,   (2.1)

and set

α := n^{−1} ∑_{i=1}^n (1 − d_i).

The quantity 1 − d_i measures the overlap of the distribution of X_i and its translate by 1, and the average value α of this overlap appears as an ingredient in the bounds; the larger the value of α, the smoother the distribution of W. In order to get errors of approximation of the same order as the classical results for normal approximation, the quantity α should not become small with n. Note that, by combining successive summands, it can typically be arranged that d_i < 1 for all i, though the value of n is then correspondingly reduced. For neatness, we assume that λ ≥ n, which can again be achieved by grouping summands if need be.

Now consider the Stein equation for Poisson approximation with Po(λ). For each A ⊂ Z_+, there exists a unique function g_A : Z_+ → R with g_A(0) = 0 such that

λ g_A(w + 1) − w g_A(w) = 1[w ∈ A] − Po(λ)(A), w ∈ Z_+,

which satisfies ‖∆g_A‖ ≤ min{1, λ^{−1}}, and which, for w ≥ 1, is given by

g_A(w) = (λ Po(λ){w − 1})^{−1} ∑_{m=0}^{w−1} {1[m ∈ A] − Po(λ)(A)} Po(λ){m},

where Z ∼ Po(λ), Po(λ)(A) = IP[Z ∈ A] and Po(λ){m} = IP[Z = m]; see Barbour, Holst and Janson (1992, Chapter 1, (1.18)). We extend the definition of g_A to the whole of Z by setting g_A(w) := 0 for w < 0. We also define h_A by h_A(w) := g_{A−γ}(w − γ), so that

λ h_A(w + 1) − (w − γ) h_A(w) = 1[w ∈ A] − Po(λ)(A − γ), w − γ ∈ Z_+.

Recalling the definitions of λ and γ, taking expectations now expresses IP[W ∈ A] − Po(λ)(A − γ) in terms of expectations of differences of h_A, where W_i := W − X_i is independent of X_i. We write Newton's expansion in the following form: for l ∈ Z,

f(w + l) = f(w) + l ∆f(w) + ∑_{s=0}^{l−2} (l − 1 − s) ∆²f(w + s), l ≥ 0;
f(w + l) = f(w) + l ∆f(w) + ∑_{s=l}^{−1} (s − l + 1) ∆²f(w + s), l < 0.   (2.10)

By taking f = ∆h_A, w = W_i and l = X_i in (2.8), and by the independence of W_i and X_i, we obtain an expression for IE∆h_A(W); similarly, with f = h_A in (2.9) and (2.8), we obtain (2.11) and, in particular, its first-order consequence. After putting all the pieces together, we thus arrive at (2.12). This, and a similar expression based on (2.11), are used to prove the following theorem.
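The Stein identity can be checked numerically against the standard Stein-Chen representation of its solution, g_A(w) = (λ Po(λ){w−1})^{−1} ∑_{m<w} {1[m ∈ A] − Po(λ)(A)} Po(λ){m}; the sketch below truncates Poisson tail sums, and all names are illustrative:

```python
import math

def pois_pmf(lam, m):
    """Po(lam){m}, computed in log space for stability."""
    return math.exp(-lam + m * math.log(lam) - math.lgamma(m + 1))

def pois_prob(lam, A):
    """Po(lam)(A), truncating the tail far beyond the mean."""
    upper = int(lam + 40 * math.sqrt(lam) + 40)
    return sum(pois_pmf(lam, m) for m in A if m <= upper)

def g_A(lam, A, w):
    """Stein-Chen solution: g_A(w) = 0 for w <= 0, and for w >= 1
    g_A(w) = (lam * Po(lam){w-1})^{-1} * sum_{m<w} (1[m in A] - Po(lam)(A)) Po(lam){m}."""
    if w <= 0:
        return 0.0
    pA = pois_prob(lam, A)
    s = sum(((m in A) - pA) * pois_pmf(lam, m) for m in range(w))
    return s / (lam * pois_pmf(lam, w - 1))

# Check the Stein equation lam*g(w+1) - w*g(w) = 1[w in A] - Po(lam)(A) at w = 6:
lam, A, w = 5.0, set(range(8, 30)), 6
lhs = lam * g_A(lam, A, w + 1) - w * g_A(lam, A, w)
rhs = (w in A) - pois_prob(lam, A)
print(abs(lhs - rhs))  # numerically zero
```

The identity holds exactly by construction, so the printed residual is only floating-point error.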
Theorem 2.1 With the assumptions and definitions preceding (2.2), and supposing also that Γ_p ≤ 1 and that the smoothness parameter α is not too small (the condition under which Lemma 3.5 holds), we have, for every A ⊂ [µ + x√λ, ∞) ∩ Z,

|IP[W ∈ A] − Po(λ)(A − γ)| ≤ C_{2.1}(p) {Γ_{3∧p} + Γ_p}(1 + x^p)^{−1},

for some universal constant C_{2.1}(p), uniformly in x ≥ 8 and λ ≥ n.
Remark. In proving Theorem 2.1, we have not attempted to take advantage of situations in which the random variables X_i mostly take the value 0 and only occasionally other values, as was done, for instance, in Barbour and Xia (1999).
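The smoothness ingredients d_i and α of this section can be computed directly from the summand distributions; a minimal sketch with probability mass functions as dictionaries (all names illustrative):

```python
def d_tv_shift(pmf):
    """d_i = d_TV(L(X_i), L(X_i + 1)) = (1/2) * sum_k |P[X_i = k] - P[X_i = k - 1]|."""
    support = set(pmf) | {k + 1 for k in pmf}
    return 0.5 * sum(abs(pmf.get(k, 0.0) - pmf.get(k - 1, 0.0)) for k in support)

def alpha(pmfs):
    """Average overlap alpha = n^{-1} * sum_i (1 - d_i)."""
    return sum(1.0 - d_tv_shift(p) for p in pmfs) / len(pmfs)

# Bernoulli(1/2): the pmfs (1/2, 1/2, 0) and (0, 1/2, 1/2) give d_i = 1/2,
# so the average overlap is alpha = 1/2; a point mass gives overlap 0.
print(alpha([{0: 0.5, 1: 0.5}] * 10))
print(alpha([{0: 1.0}] * 10))
```

A point mass illustrates the degenerate case d_i = 1: its translate by 1 is disjoint from it, the overlap 1 − d_i is 0, and total variation approximation by a translated Poisson distribution fails, as described above.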

Bounding the differences
Our main tool for proving Theorem 2.1 is expression (2.12). In order to exploit it, we first need to bound the differences IE∆h_A(W) and IE∆²h_A(W_i + j), 1 ≤ i ≤ n; we devote this section to doing so. We shall from now on consider only subsets A ⊂ [λ + x√λ, ∞); when bounding IE∆²h_A(W_i + j), we restrict attention to the range −1 ≤ j ≤ √λ + 1. We proceed by way of a series of lemmas, starting with a simple observation. Throughout the section, the constant C may change from one use to another.
which is the statement of the lemma. □

Then, for any r ≥ 1, the stated identity holds, where ∆^r G is the rth forward difference of G and Z ∼ Po(λ).
Proof. For r = 1, we calculate directly, using the Stein identity λ IEf(Z + 1) = IE{Z f(Z)} in the third equality. Induction combined with analogous manipulations finishes the proof. □

The bounds hold uniformly in λ ≥ 1 and x ≥ 4.
For r = 0, we apply Lemma 3.1; for r = 1, we use Lemma 3.2. Since, for m ≥ λ, the relevant ratio can be controlled, where we use a result of Barbour and Jensen (1989, p. 78) in the last inequality, it follows that the claimed bound holds uniformly for all A ⊂ [λ + x√λ, ∞) and x ≥ 8.
Proof. Since ∆h_A(w) = 0 for w − γ < 0 and since ‖∆h_A‖ ≤ λ^{−1}, it follows from (3.5) that the bound holds uniformly in 1 ≤ l ≤ n and in j ≤ √λ + 1; c_p is a constant implied by the Rosenthal inequality. The lemma follows. □

Then there is a constant C_2 < ∞ such that the bound of Lemma 3.5 holds.

Proof. Observing that ∆h_A(w) = 0 for w − γ < −1 and that j ≤ √λ + 1, we obtain the decomposition in (3.8). In order to estimate the first term in (3.8), we use the Mineka coupling (Lindvall 1992, Section II.14). We construct two partial sum processes S = S^{(j)} and S′ = S′^{(j)}, having S_0 = j, S′_0 = j + 1 and increments Y_i and Y′_i for 1 ≤ i ≤ n. Letting T = min{k : S_k = S′_k} be the first time that S and S′ meet, it is then immediate that d_TV(L(S_n), L(S′_n)) ≤ IP[T > n], which is of essential importance when bounding (3.8).
The random variables Y_i and Y′_i are coupled as follows. We first define independent indicator random variables I_i with IP[I_i = 1] = 1 − d_i, where d_i is as in (2.1); note also that IE|J| = nα, for J as defined below. We then define three mutually independent sequences of independent random variables Z_i, Z′_i and ε_i, 1 ≤ i ≤ n, independent also of the I_i, with each ε_i taking the values ±1 with probability one half. If I_i = 1, we construct the increments Y_i and Y′_i using ε_i; with this construction, the event {T > n} is realized by the event E. We write J := {1 ≤ i ≤ n : I_i = 1}, and note that, conditional on J, the event E is independent of {(Z_i, Z′_i), 1 ≤ i ≤ n}. We now use these considerations to bound (3.9). Recalling that ‖∆h_A‖ ≤ λ^{−1} and noting that √λ + 1 ≤ x√λ/4 for x ≥ 8 and λ ≥ 1, it follows for any j ≤ √λ + 1 that the first term of (3.10) can be bounded, where the (ε′_i, 1 ≤ i ≤ n) have the same distribution as the ε_i's, but are independent of everything else; they are introduced so that the conditional expectation in the first term of (3.10) factorizes. As a result, we find (3.11). We now want to take expectations over J in (3.11), in order to bound the quantity of interest. Before doing so, it is convenient to collect a few useful facts. The first, (3.14), is an easy consequence of Chebyshev's and Rosenthal's inequalities. Then, from the reflection principle, we find that, on {|J| = s},

IP[∑_{i∈J} ε_i ≥ r | E, J] = Bi(s, 1/2){(s + r)/2, (s + r + 1)/2} / Bi(s, 1/2){s/2, (s + 1)/2} ≤ C exp{−r²/(s + 1/2)},   (3.16)

and, since |J| = ∑_{i=1}^n I_i, it follows from Chernoff bounds that small values of |J| are exponentially unlikely. The same bound can also be used for IP[∑_{i∈J} ε′_i > x√λ/8]. Putting these bounds into (3.14), it follows that, for Γ_p ≤ 1, the bound (3.18) holds. We actually require a bound for each W_l, 1 ≤ l ≤ n, but not for W itself.
However, the argument above applies essentially unchanged if all sums are taken with the index l omitted; (3.15) is still true as a bound for IP[W_l − IEW_l > x√λ/8], and the only alteration to the bounds arises in (3.12), where the new estimate replaces the corresponding one, which lacked the factor 2e. In this argument, the first step was to replace ∆h_A(·) by the upper bound λ^{−1} for its norm, and it was in fact λ^{−1} IE{I[S_n > 1 + γ + λ(x)] I[T > n]} that was then treated. Hence the same bound (3.18) can also be used here, because, on {T > n}, S_n < S′_n. To complete the estimate of |R| in (3.8), it therefore remains only to bound the remaining term. It follows from Lemma 3.3 that |∆h_A(1 + γ + λ(x))| ≤ λ^{−1}(1 + x) q_0(x, λ). For the second factor, we use the representation of W + j as S_n^{(j)}. This shows that, conditional on the I_i's, Z_i's and Z′_i's, the largest point probability taken by S_n^{(j)} cannot exceed the largest point probability of the distribution of ∑_{i∈J} ε_i, which is itself at most C(nα)^{−1/2} whenever |J| ≥ nα/2, from the properties of the binomial distribution. Hence, once again using the bound IP[|J| ≤ nα/2] ≤ exp{−nα(1 − log 2)/2} (with an extra factor of 2e if W is replaced by W_l) to deal with the possibility of small values of |J|, it follows that this term is of no larger order than the bound in (3.18). Collecting the various terms, it thus follows that the estimate (3.19) holds, uniformly in j ≤ √λ + 1, and the same is true for |IE∆²h_A(W_l + j)|, for each 1 ≤ l ≤ n. Finally, if 8 ≤ x ≤ 2n, the remaining condition follows by assumption, completing the proof of the lemma. □

Proof of Theorem 2.1. The proof is based on (2.12). All its constituents have been bounded in the previous section, if only values of j ≤ √λ + 1 are allowed in the sum. This is all that is necessary if we suppose that X_i ≤ √λ + 1 a.s. for all 1 ≤ i ≤ n, so we begin by assuming this to be true. In that case, the first estimate follows from Lemma 3.4; then, easily, the next follows from Chebyshev's inequality and Lemma 3.1.
For the remainder, if p ≥ 3, we have the uniform bound for |IE∆²h_A(W_i + j)| given by Lemma 3.5 whenever 8 ≤ x ≤ 2n, since X_i ≤ √λ + 1 a.s. for all i; in (2.12), this bound multiplies a factor of at most the required size, since 0 ≤ µ_i < 1 for all i and √λ Γ_3 ≥ λ/n ≥ 1, this last because λ ≥ n. Note also that, because X_i ≤ √λ + 1 for all i, we have IP[W > n(√λ + 1)] = 0, so we are only interested in x satisfying λ + x√λ ≤ n(√λ + 1), and such values of x indeed satisfy x ≤ 2n. Combining these results, we find under the conditions of Theorem 2.1 that, for p ≥ 3, the bound (4.1) holds as required, provided that X_i ≤ √λ + 1 for all i. We note that (4.1) is a wasteful bound if the X_i take only the values 0 or 1.
For 2 < p < 3, we bound the leading terms as before, and then use the more detailed expression in (2.11), combined with Lemmas 3.4 and 3.5. As before, from Lemma 3.5, we obtain the first estimate, the latter part being derived from (4.2); but then, taking (4.3), we can use Lemma 3.4 to bound the remainder.
Hence it follows that the analogous bound holds; and since, by Hölder's inequality, the moment quantities involved can be compared, we deduce from (4.4) and (4.5) the estimate (4.6), proving the required result when 2 < p < 3, again provided that X_i ≤ √λ + 1 for all i, since λ ≥ n and √λ Γ_p ≥ λ^{1/2} n^{−(p−2)/2} ≥ 1. To remove the restriction on the distributions of the X_i, we argue as in Chen and Shao (2003). We begin by defining X*_i := X_i I[X_i ≤ √λ + 1] and W* := ∑_{i=1}^n X*_i, so that W* satisfies the previous assumptions. We now note that IP[W > z] and IP[W* > z] differ only on the event that some X_i exceeds √λ + 1. Taking z = IEW + y, it follows that the difference can be bounded; by Chebyshev's and Rosenthal's inequalities and by the independence of W_i and X_i, each summand is controlled. Summing over i and replacing y by x√λ, it follows that probabilities of sets A ⊂ [IEW + x√λ, ∞), calculated using W* in place of W, are equal up to the required order. This is not, however, the full story, since the mean and variance of W* are not exactly the same as those of the original W; we need also to show that probabilities calculated with the translated Poisson distribution appropriate to W* are equivalent to those for the translated Poisson distribution appropriate to W. The parameters for the translated Poisson distributions are calculated from the respective first and second moments, and their differences can be bounded accordingly, first for the first moments and then for the second. For the random variable W and for any A ⊂ [µ + x√λ, ∞), our current approximation applies with µ* := IEW* and λ* := Var W* + δ*, the quantity δ* being defined from W* as δ was from W; this is because µ* ≤ µ and λ* ≤ λ + 1, and hence also A ⊂ [IEW* + x′√λ*, ∞) with x′ := x √(λ/(λ + 1)) ≥ x/√2. We thus need to compare Po(λ*){λ* − µ* + A} with Po(λ){λ − µ + A}. For each k ∈ A, it follows from Lemmas A1 and A2, together with (4.7) and (4.8), that the corresponding point probabilities are close. Hence it follows, for x ≤ √λ, that the difference summed over such k is of small enough order, and the sum over all larger k is exponentially small in λ, which, for such x, is also of small enough order. For x ≥ √λ, both Po(λ*){λ* − µ* + A} and Po(λ){λ − µ + A} are of order O(exp{−c x√λ}) = O(Γ_p (1 + x^p)^{−1}).
Hence, whatever the value of x ≥ 8, we can replace Po(λ*){λ* − µ* + A} with Po(λ){λ − µ + A} with an error of at most O(Γ_p (1 + x^p)^{−1}). Thus the theorem remains true without the restriction that X_i ≤ √λ + 1 a.s. for all i. □

For sets A ⊂ (−∞, λ − x√λ] in the lower range, we can simply replace W by −W, centre anew, and use the theorem as it stands. However, there is a difference between the distributions obtained by translating a Poisson distribution and translating the negative of a Poisson distribution. To show that this difference is also of small enough order, so that the same translated Poisson distribution is good for both ranges at once, Lemma A3 can be used in an argument similar to that above.
We now use simple bounds for the Taylor remainders in the logarithmic series (with |s|/n ≤ ε ≤ 1/4) to give