INCLUSION-EXCLUSION REDUX

We present a reordered version of the inclusion-exclusion principle, which is useful when computing the probability of a union of events that are close to independent. The advantages of this formulation are demonstrated in the context of three classic problems in combinatorics.


Introduction
The inclusion-exclusion principle is one of the fundamental results of combinatorics. If $A$ is the union of the events $A_1, A_2, \ldots, A_n$ then, writing $p_i$ for the probability of $A_i$, $p_{ij}$ for the probability of $A_i \cap A_j$, $p_{ijk}$ for the probability of $A_i \cap A_j \cap A_k$, etc., the probability of $A$ is given by
$$P(A) = \sum_i p_i - \sum_{i<j} p_{ij} + \sum_{i<j<k} p_{ijk} - \cdots \tag{1}$$
The inclusion-exclusion principle tells us that if we know the $p_i, p_{ij}, p_{ijk}, \ldots$ then we can find $P(A)$. In practice, though, we are unlikely to have full information on the $p_i, p_{ij}, p_{ijk}, \ldots$. We are then faced with the highly nontrivial task of approximating $P(A)$, taking into account whatever partial information we are given. The difficulty is that knowledge of any of the probabilities $p_i, p_{ij}, p_{ijk}, \ldots$ places constraints on the others; these constraints have been extensively studied for about 150 years [1]. Recent work of Kahn, Linial, Nisan and Samorodnitsky [2] has shown that if $n$ is large, and we are given all the probabilities $p_i, p_{ij}, p_{ijk}, \ldots$ with up to $r$ indices, then in general we will not be able to make any firm predictions about $P(A)$ unless $r$ is at least $O(\sqrt{n})$. On the other hand, in certain cases where the events $A_i$ are in some sense close to being independent, there are a number of known results bounding $P(A)$, such as the Lovász local lemma [3] and Janson's inequality [4] (see [5] for an exposition), and these bounds use just the $p_i$ and the $p_{ij}$. So although little can be said in general, in the case that the events $A_i$ are close to independent we might hope that inclusion-exclusion can be used to give good estimates for $P(A)$. The aim of this paper is to propose a method for this.
A first guess for approximating $P(A)$, if we are given just the "low order" probabilities (the $p$'s with few indices), might be simple truncation of the series in (1). In other words, if we know just the $p_i$ we might take just the first term, if we know both the $p_i$ and the $p_{ij}$ we might take just the first two terms, etc. This approach works poorly in general, but if the events $A_i$ are close to mutually exclusive (i.e. the probabilities of the multiple intersections $A_{i_1} \cap A_{i_2} \cap \cdots \cap A_{i_r}$ drop rapidly as $r$ increases), then such truncations do give good approximations. Of course, if the events $A_i$ actually are mutually exclusive, then any truncation of (1) will be exact.
The approach we give to approximating $P(A)$ when the events $A_i$ are close to being independent is based on a reordering of terms in the inclusion-exclusion formula, with the property that any truncation of the reordered formula is exact if the events $A_i$ are independent. Our reordering of the inclusion-exclusion principle is presented in section 2. In section 3 we present encouraging results from the use of our approximation schemes on three classic problems in combinatorics. In section 4 we discuss further questions arising out of our work. An appendix discusses the relationship with Janson's inequality.
As a last note in this introduction, we refer the reader to the work of Naiman and Wynn [6], who have shown that under certain circumstances there can be significant simplifications in the inclusion-exclusion principle, for example when it is used to calculate the volume, with respect to some measure, of a finite union of balls in $d$-dimensional Euclidean space.
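
The Reordered Inclusion-Exclusion Principle
Proposition 1. Write $\bar{A}$ for the complement of $A$ and $\bar{A}_i$ for the complement of $A_i$. Then
$$P(\bar{A}) = 1 - P(A) = \prod_i q_i \prod_{i<j} q_{ij} \prod_{i<j<k} q_{ijk} \cdots \tag{2}$$
where
$$q_i = P(\bar{A}_i), \tag{3}$$
$$q_{ij} = \frac{P(\bar{A}_i \cap \bar{A}_j)}{P(\bar{A}_i)\,P(\bar{A}_j)}, \tag{4}$$
$$q_{ijk} = \frac{P(\bar{A}_i \cap \bar{A}_j \cap \bar{A}_k)\,P(\bar{A}_i)\,P(\bar{A}_j)\,P(\bar{A}_k)}{P(\bar{A}_i \cap \bar{A}_j)\,P(\bar{A}_i \cap \bar{A}_k)\,P(\bar{A}_j \cap \bar{A}_k)}, \tag{5}$$
and so on: in $q_{i_1 \ldots i_r}$ the probabilities of intersections of $r, r-2, \ldots$ of the complements appear in the numerator and those of intersections of $r-1, r-3, \ldots$ of the complements appear in the denominator.
Proof. We count the number of times each factor occurs in the product of the $q$'s in (2). For fixed $r$, $P(\bar{A}_r)$ appears once in the product of the $q_i$, $n-1$ times in the denominator in the product of the $q_{ij}$, $\binom{n-1}{2}$ times in the numerator of the product of the $q_{ijk}$, etc. Thus in the full product of the $q$'s the number of factors of $P(\bar{A}_r)$ is
$$1 - (n-1) + \binom{n-1}{2} - \cdots = (1-1)^{n-1} = 0 \qquad (n \ge 2).$$
Likewise for fixed $r, s$ and $n \ge 2$, the factor $P(\bar{A}_r \cap \bar{A}_s)$ appears once in the product of the $q_{ij}$, $n-2$ times in the denominator in the product of the $q_{ijk}$, $\binom{n-2}{2}$ times in the numerator of the product of the $q_{ijkl}$, etc. Thus in the full product of the $q$'s the number of factors of $P(\bar{A}_r \cap \bar{A}_s)$ is
$$1 - (n-2) + \binom{n-2}{2} - \cdots = (1-1)^{n-2},$$
which vanishes unless $n = 2$. Continuing this way we see that the full product of the $q$'s is simply
$$P(\bar{A}_1 \cap \bar{A}_2 \cap \cdots \cap \bar{A}_n) = P(\bar{A}). \quad \bullet$$
By the standard inclusion-exclusion principle, each probability of an intersection of complements can be expanded in the $p$'s, for example $P(\bar{A}_i \cap \bar{A}_j) = 1 - p_i - p_j + p_{ij}$. Thus the $q_i$ can be written in terms of the $p_i$ (in fact $q_i = 1 - p_i$), the $q_{ij}$ can be written in terms of the $p_i$ and the $p_{ij}$, the $q_{ijk}$ can be written in terms of the $p_i$, the $p_{ij}$ and the $p_{ijk}$, etc. With the $q_i, q_{ij}, q_{ijk}, \ldots$ written this way in terms of the $p_i, p_{ij}, p_{ijk}, \ldots$, we call (2) the reordered inclusion-exclusion principle. If all the products were to be multiplied out it would reduce to the standard inclusion-exclusion principle (1). But for approximation purposes, at least when the events $A_i$ are close to independent, the form (2) turns out to be much more useful. In particular the reader will be able to verify the following result:
Proposition 2. The $A_i$ are independent if and only if for all $i, j, k, \ldots$,
$$q_{ij} = q_{ijk} = \cdots = 1.$$
In other words, for independent events $A_i$ we can "truncate" the product in (2) after as many terms as we wish, and the result will be exact. The $q_{ij}, q_{ijk}, \ldots$ provide a measure of the dependence of the events. In addition to the formulae given above for the $q_{ij}, q_{ijk}, \ldots$ we note
$$q_{ij} = \frac{P(\bar{A}_i \mid \bar{A}_j)}{P(\bar{A}_i)}, \qquad q_{ijk} = \frac{P(\bar{A}_i \mid \bar{A}_j \cap \bar{A}_k)\,P(\bar{A}_i)}{P(\bar{A}_i \mid \bar{A}_j)\,P(\bar{A}_i \mid \bar{A}_k)}.$$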
From these formulae (and their generalizations with more indices) we see that if the event $A_i$ is independent of the set of events $A_{j_1}, \ldots, A_{j_r}$ (in the sense that any information on $A_{j_1}, \ldots, A_{j_r}$ does not affect $A_i$), then
$$q_{i j_1 \ldots j_r} = 1.$$
What we have written up to here is probably sufficient to justify using truncations of (2), with the $q_i, q_{ij}, q_{ijk}, \ldots$ written out in terms of the $p_i, p_{ij}, p_{ijk}, \ldots$, as a method for approximating the probability of a union of events that are close to independent. But before moving to examples, we mention another property that reinforces the link between this and the standard inclusion-exclusion principle.
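The product formula (2) and the exactness of its truncations for independent events can be checked numerically on a small example. The sketch below is only an illustration: the helper names (`complement_prob`, `q_factor`, `reordered_product`) are hypothetical, the joint distribution is represented as a dictionary from 0/1 outcome tuples to probabilities, and the alternating-product rule used for a general index set is assumed to be the natural extension of the one, two and three index cases.

```python
from itertools import combinations, product
from math import prod, isclose

def complement_prob(joint, subset):
    """P(none of the events indexed by `subset` occurs).

    `joint` maps outcome tuples (1 = event occurs, 0 = it does not)
    to probabilities.
    """
    return sum(p for outcome, p in joint.items()
               if all(outcome[i] == 0 for i in subset))

def q_factor(joint, subset):
    """q for the index set `subset` (assumed general form).

    Complement probabilities of sub-collections of size r, r-2, ... go in
    the numerator and those of size r-1, r-3, ... in the denominator.
    """
    r = len(subset)
    value = 1.0
    for size in range(1, r + 1):
        sign = 1 if (r - size) % 2 == 0 else -1
        for sub in combinations(subset, size):
            value *= complement_prob(joint, sub) ** sign
    return value

def reordered_product(joint, n, max_order):
    """Product of all q's with at most `max_order` indices."""
    return prod(q_factor(joint, s)
                for r in range(1, max_order + 1)
                for s in combinations(range(n), r))

# A dependent example: arbitrary positive weights on the 8 outcomes of 3 events.
weights = [5, 3, 2, 4, 1, 6, 2, 7]
outcomes = list(product((0, 1), repeat=3))
dependent = {o: w / sum(weights) for o, w in zip(outcomes, weights)}

# An independent example built from marginal probabilities.
marginals = [0.2, 0.5, 0.7]
independent = {o: prod(m if bit else 1 - m for bit, m in zip(o, marginals))
               for o in outcomes}
```

With these definitions, `reordered_product(dependent, 3, 3)` should agree with `dependent[(0, 0, 0)]`, while for `independent` the first truncation `reordered_product(independent, 3, 1)` should already do so.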
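Definition. We say a function $f$ of the $p_i, p_{ij}, p_{ijk}, \ldots$ is homogeneous of order $n$ if
$$f(\lambda p_i, \lambda^2 p_{ij}, \lambda^3 p_{ijk}, \ldots) = \lambda^n f(p_i, p_{ij}, p_{ijk}, \ldots)$$
and, in greater generality, of order $n$ if
$$f(\lambda p_i, \lambda^2 p_{ij}, \lambda^3 p_{ijk}, \ldots) = O(\lambda^n) \quad \text{as } \lambda \to 0.$$
With this definition of order, the standard inclusion-exclusion principle writes $P(A)$ as a sum of terms which are, respectively, homogeneous of order $1, 2, 3, \ldots$ The reordered inclusion-exclusion principle has a similar property:
Proposition 3. For each $r$, $q_{i_1 i_2 \ldots i_r} - 1$ is of order $r$.
Before we explain the proof of this we demonstrate it explicitly in the first few cases. For the one index case it is obvious since
$$q_i = 1 - p_i.$$
For the two index case we have
$$q_{ij} = 1 + \frac{p_{ij} - p_i p_j}{(1-p_i)(1-p_j)}.$$
For the three index case, it can be checked that
$$q_{ijk} = 1 + \beta,$$
where $\beta$ is of order 3.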
Proof of Proposition 3. We consider the following perturbative approach to the reordered inclusion-exclusion principle. For independent events the first truncation
$$P(\bar{A}) = \prod_i (1 - p_i) \tag{6}$$
is exact. In general, (6) is not exact, but it is still correct to first order, that is, if we ignore terms of order greater than 1 on each side. Suppose now that we try to modify (6) to make it correct to second order by looking at formulae of the form
$$P(\bar{A}) \approx \prod_i (1 - p_i) \prod_{i<j} \left(1 + \alpha^{(2)}_{ij}\right), \tag{7}$$
where $\alpha^{(2)}_{ij}$ is homogeneous of order 2. A brief calculation shows this can be done if (and only if) we choose
$$\alpha^{(2)}_{ij} = p_{ij} - p_i p_j.$$
With this choice made, we can proceed to try to modify (7) to make it correct to order 3. This requires adding in two extra terms, both homogeneous of order 3,
$$P(\bar{A}) \approx \prod_i (1 - p_i) \prod_{i<j} \left(1 + \alpha^{(2)}_{ij} + \alpha^{(3)}_{ij}\right) \prod_{i<j<k} \left(1 + \alpha^{(3)}_{ijk}\right),$$
where
$$\alpha^{(3)}_{ij} = (p_i + p_j)(p_{ij} - p_i p_j), \qquad \alpha^{(3)}_{ijk} = -\left(p_{ijk} - p_i p_{jk} - p_j p_{ik} - p_k p_{ij} + 2 p_i p_j p_k\right).$$
We can continue to modify in this way to make the formula correct to arbitrary order; furthermore, in the second product all correction terms will depend only on $p_i, p_{ij}$, in the third product all correction terms will depend only on $p_i, p_{ij}, p_{ijk}$, etc. In this manner we build up the reordered inclusion-exclusion principle order by order, and in particular we deduce Proposition 3. •
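The "brief calculation" at second order can be spelled out as follows. This is a sketch under the assumption that the second-order correction enters as a factor $(1 + \alpha^{(2)}_{ij})$ for each pair, with $\alpha^{(2)}_{ij}$ depending only on $p_i$, $p_j$ and $p_{ij}$; terms of order 3 and higher are dropped on both sides.

```latex
\begin{align*}
\prod_i (1 - p_i) \prod_{i<j} \bigl(1 + \alpha^{(2)}_{ij}\bigr)
  &= 1 - \sum_i p_i + \sum_{i<j} p_i p_j + \sum_{i<j} \alpha^{(2)}_{ij}
     + (\text{order} \ge 3), \\
P(\bar{A}) = 1 - P(A)
  &= 1 - \sum_i p_i + \sum_{i<j} p_{ij} + (\text{order} \ge 3).
\end{align*}
% Matching the order-2 terms pair by pair:
\[
  p_i p_j + \alpha^{(2)}_{ij} = p_{ij}
  \quad\Longrightarrow\quad
  \alpha^{(2)}_{ij} = p_{ij} - p_i p_j .
\]
```

The result is the leading part of $q_{ij} - 1 = (p_{ij} - p_i p_j)/((1-p_i)(1-p_j))$, as one would expect if the order-by-order construction reproduces the $q$'s.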

Examples
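In the three examples below we approximate the probability of events using the first, second and third truncations of (2), i.e. the approximations
$$P(\bar{A}) \approx \prod_i q_i, \tag{8}$$
$$P(\bar{A}) \approx \prod_i q_i \prod_{i<j} q_{ij}, \tag{9}$$
$$P(\bar{A}) \approx \prod_i q_i \prod_{i<j} q_{ij} \prod_{i<j<k} q_{ijk}, \tag{10}$$
where $q_i, q_{ij}, q_{ijk}$ are given by (3)-(5).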
3a. The Derangement Problem. Suppose that in any pack of cards there are $k$ copies of $n$ different cards (for conventional cards $k = 4$, $n = 13$). Two players each take a pack of cards, and draw a card at random. We say a "match" occurs if they draw cards of the same kind. The players continue to draw cards at random and compare until their packs are used up. What is the probability that in the process there will be at least one match?
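No closed form solution is known for this problem, but it is possible to derive an expansion for the probability in powers of $\frac{1}{n}$, the first few terms of which are
$$1 - P(A) = e^{-k}\left[1 - \frac{k-1}{2n} + \frac{(k-1)(3k^2 - 11k + 4)}{24kn^2} + O\!\left(\frac{1}{n^3}\right)\right]; \tag{11}$$
see for example [7].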
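For readers who want to experiment, the first three reordered approximations for this problem, which are worked out in the remainder of this subsection, can be evaluated numerically. The sketch below is an assumption-laden illustration rather than the authors' code: the function name is hypothetical, the pair counts and pair probabilities follow the counting argument given in the text, and the triple counts and triple probabilities are obtained by extending that argument in the obvious way (three cards of one kind, two of one kind and one of another, three different kinds).

```python
from math import comb, exp, log

def reordered_match_approximations(n, k):
    """First, second and third reordered approximations to 1 - P(A).

    n kinds of card, k copies of each; A is the event of at least one match.
    Requires n >= 2 and n * k >= 3 so that every probability used is defined.
    """
    if n < 2 or n * k < 3:
        raise ValueError("need n >= 2 and n * k >= 3")
    total = n * k
    p = 1.0 / n                       # p_i for every draw
    none1 = 1.0 - p                   # P(no match on one given draw)
    # Pairs of draws, classified by whether player 1's two cards are of the
    # same kind or of different kinds.
    pair_prob = {"same": (k - 1) / (n * (total - 1)),
                 "diff": k / (n * (total - 1))}
    pair_count = {"same": n * comb(k, 2), "diff": comb(n, 2) * k * k}
    none2 = {kind: 1.0 - 2.0 * p + pair_prob[kind] for kind in pair_prob}
    # Triples of draws: (count, p_ijk, kinds of the three pairs inside it).
    denom = n * (total - 1) * (total - 2)
    triples = [
        (n * comb(k, 3), (k - 1) * (k - 2) / denom, ("same", "same", "same")),
        (n * (n - 1) * comb(k, 2) * k, k * (k - 1) / denom, ("same", "diff", "diff")),
        (comb(n, 3) * k ** 3, k * k / denom, ("diff", "diff", "diff")),
    ]
    log1 = total * log(none1)
    log2 = log1 + sum(count * log(none2[kind] / none1 ** 2)
                      for kind, count in pair_count.items() if count)
    log3 = log2
    for count, p3, kinds in triples:
        if count == 0:
            continue
        none3 = 1.0 - 3.0 * p + sum(pair_prob[t] for t in kinds) - p3
        q3 = none3 * none1 ** 3 / (none2[kinds[0]] * none2[kinds[1]] * none2[kinds[2]])
        log3 += count * log(q3)
    return exp(log1), exp(log2), exp(log3)
```

Calling `reordered_match_approximations(13, 4)` or `reordered_match_approximations(3, 8)` gives values that can be compared with the figures quoted in the text for conventional cards and for the case $n = 3$, $k = 8$.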
To apply our methods, let $A_i$, $i = 1, \ldots, nk$, be the event that the players draw the same card on draw number $i$. Clearly $p_i = \frac{1}{n}$ for every $i$, so using the first reordered approximation (8) we obtain
$$1 - P(A) \approx \left(1 - \frac{1}{n}\right)^{nk}. \tag{12}$$
In the limit of large $n$ we recover the correct limit $P(A) \to 1 - e^{-k}$. We emphasize that this comes from using just the first reordered approximation, i.e. just using the $p_i$; in contrast, if we were to take just the first term of the standard inclusion-exclusion formula we would obtain the absurd answer $P(A) \approx k$. Note that expanding (12) in powers of $\frac{1}{n}$ gives
$$1 - P(A) \approx e^{-k}\left[1 - \frac{k}{2n} + \frac{k(3k-8)}{24n^2} + O\!\left(\frac{1}{n^3}\right)\right]$$
and we see that the first reordered approximation in fact gives more than just the leading order term in $\frac{1}{n}$: for large $k$ we obtain the correct dominant contribution to the coefficients of both the $O\!\left(\frac{1}{n}\right)$ and $O\!\left(\frac{1}{n^2}\right)$ terms. This is a first hint that our methods work well not just in the large $n$ limit.
Moving to the second reordered approximation, there are a total of $\frac{1}{2}nk(nk-1)$ events $A_i \cap A_j$. In $\frac{1}{2}nk(k-1)$ of these player 1 draws identical cards on draws $i$ and $j$, and we have $p_{ij} = \frac{k-1}{n(nk-1)}$. In the remaining $\frac{1}{2}n(n-1)k^2$, player 1 draws distinct cards on draws $i$ and $j$, and we have $p_{ij} = \frac{k}{n(nk-1)}$. Thus (9) gives
$$1 - P(A) \approx \left(1 - \frac{1}{n}\right)^{nk} \left[1 - \frac{1}{(n-1)(nk-1)}\right]^{\frac{1}{2}nk(k-1)} \left[1 + \frac{1}{(n-1)^2(nk-1)}\right]^{\frac{1}{2}n(n-1)k^2}.$$
Expanding this in powers of $\frac{1}{n}$ gives
$$1 - P(A) \approx e^{-k}\left[1 - \frac{k-1}{2n} + \frac{3k^3 - 14k^2 + 15k + 12}{24kn^2} + O\!\left(\frac{1}{n^3}\right)\right],$$
where now the coefficient of $\frac{1}{n}$ is exact, and the coefficient of $\frac{1}{n^2}$ is improving (at least for large $k$).
The necessary information to apply the third reordered approximation (10) is as follows:
Type of events | Number of events | $p_{ijk}$
Player 1 draws 3 cards of one type | $\frac{1}{6}nk(k-1)(k-2)$ | $\frac{(k-1)(k-2)}{n(nk-1)(nk-2)}$
Player 1 draws 2 cards of one type and 1 of another type | $\frac{1}{2}n(n-1)k^2(k-1)$ | $\frac{k(k-1)}{n(nk-1)(nk-2)}$
Player 1 draws 3 cards of different types | $\frac{1}{6}n(n-1)(n-2)k^3$ | $\frac{k^2}{n(nk-1)(nk-2)}$
Due to its length we do not write down explicitly the formula obtained by putting this information into (10). As might by now be expected, if we expand the answer in powers of $\frac{1}{n}$ we obtain exact agreement with all three coefficients given in (11). In practice it seems the third reordered approximation outperforms the first three terms of the large $n$ expansion (11). For conventional cards, $n = 13$ and $k = 4$, the third reordered approximation gives $1 - P(A) \approx 0.01623287$, which has an error of less than 5% that of (11), which gives $1 - P(A) \approx 0.01622939$ (we compare against the reported value $1 - P(A) \approx 0.01623273$ given in [8]). For the case $n = 3$ and $k = 8$, the second and third reordered approximations give $1 - P(A) \approx 7.47 \times 10^{-5}$ and $1 - P(A) \approx 7.83 \times 10^{-5}$ respectively, the latter being in good accord with the result of a computer simulation of $10^9$ trials, in which we observed 78119 cases of "no matches". In this case, with $k > n$, we do not expect (11) to do well, and it gives $1 - P(A) \approx 9.09 \times 10^{-5}$.
In the case $k = 1$ the general derangement problem reduces to the famous "hatcheck" problem. In numerous probability textbooks, see for example [9], the full inclusion-exclusion principle is used to show that as $n \to \infty$ the probability of a match tends to $1 - e^{-1}$. We emphasize again that we have obtained this from just the first reordered approximation, using just the values of the $p_i$.
3b. Success Runs in Coin Flips. In a series of $x$ coin flips, what is the probability that there will be at least one run of $n$ successive "heads"? Again no closed form solution is known to this problem, but if we denote the probability $P^x_n$ then by conditioning on the possible outcomes of the first $n$ flips we obtain the recursion
$$P^x_n = 2^{-n} + \sum_{j=1}^{n} 2^{-j} P^{x-j}_n, \qquad x \ge n.$$
Using this, along with the starting values $P^x_n = 0$, $x = 0, 1, \ldots, n-1$, it is easy (for given $n$) to generate the probabilities numerically. Alternatively, writing $P^x_n = 1 - 2^{-x} R^x_n$, we obtain the recursion
$$R^x_n = \sum_{j=1}^{n} R^{x-j}_n, \qquad x \ge n,$$
with starting values $R^x_n = 2^x$, $x = 0, 1, \ldots, n-1$. For fixed $n$ one can solve numerically for the roots $\lambda_1, \lambda_2, \ldots, \lambda_n$ of the characteristic equation of the recursion,
$$\lambda^n = \lambda^{n-1} + \lambda^{n-2} + \cdots + \lambda + 1, \tag{13}$$
and also determine constants $C_1, C_2, \ldots, C_n$ such that $R^x_n = \sum_{i=1}^{n} C_i \lambda_i^x$. It is known [10] that (13) has one root, say $\lambda_1$, close to 2 and all other roots inside the unit circle, which for large $x$ will give very small contributions to the $P^x_n$. Thus when $x$ is large
$$P^x_n \approx 1 - C_1 \left(\frac{\lambda_1}{2}\right)^x, \tag{14}$$
where $\lambda_1$ and $C_1$ depend on $n$ alone and $\lambda_1$ is close to 2.
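The recursion for $P^x_n$ and the dominance of the root near 2 are easy to explore numerically. The following is a minimal sketch, assuming the recursion is the one obtained by conditioning on the position of the first tail among the first $n$ flips; the function names are hypothetical, and the brute-force enumerator is included only as an independent check for small $x$.

```python
from itertools import product

def run_probability(n, x):
    """P(at least one run of n heads in x fair flips), via the recursion."""
    probs = [0.0] * (x + 1)           # starting values: zero for x < n
    for m in range(n, x + 1):
        # Either the first n flips are all heads, or the first tail is at
        # flip j (1 <= j <= n) and the run must occur in the remaining m - j.
        probs[m] = 2.0 ** -n + sum(2.0 ** -j * probs[m - j]
                                   for j in range(1, n + 1))
    return probs[x]

def run_probability_brute(n, x):
    """Same probability by enumerating all 2**x sequences (small x only)."""
    hits = sum("H" * n in "".join(seq) for seq in product("HT", repeat=x))
    return hits / 2 ** x

def dominant_root(n, iterations=100):
    """Root in [1, 2] of lam**n = lam**(n-1) + ... + lam + 1, by bisection."""
    def f(lam):
        return lam ** n - sum(lam ** j for j in range(n))
    lo, hi = 1.0, 2.0                 # f(1) = 1 - n <= 0 and f(2) = 1 > 0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

For large $x$ the ratio `(1 - run_probability(n, x + 1)) / (1 - run_probability(n, x))` should settle at `dominant_root(n) / 2`, which is the content of the large $x$ approximation.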
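It is possible, with substantial effort, to obtain expansions of $\lambda_1$ and $C_1$ in powers of $2^{-n}$. As we will see, our methods produce such results with ease. To apply reordered inclusion-exclusion to this problem, define $A_i$ to be the event that there is a run of $n$ heads starting on the $i$th coin flip, $i = 1, 2, \ldots, x-n+1$. Here by "starting" we mean strictly starting, i.e. except when $i = 1$ there is a tail on the $(i-1)$th flip. Thus
$$p_1 = 2^{-n}, \qquad p_i = 2^{-(n+1)}, \quad i = 2, \ldots, x-n+1.$$
Applying (8) we have
$$P^x_n \approx 1 - \left(1 - 2^{-n}\right)\left(1 - 2^{-(n+1)}\right)^{x-n}.$$
This approximation is of the form (14), with
$$\lambda_1 = 2 - 2^{-n},$$
$$C_1 = \frac{1 - 2^{-n}}{\left(1 - 2^{-(n+1)}\right)^{n}} = 1 + \left(\frac{n}{2} - 1\right)2^{-n} + \cdots$$
Here in the last line we have indicated the start of the expansion of $C_1$ in powers of $2^{-n}$.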
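To get a feel for the quality of the first reordered approximation in this problem, it can be compared against the recursion. The sketch below is illustrative only: both function names are hypothetical, the recursion is assumed to be the one obtained by conditioning on the first tail, and the approximation uses $p_1 = 2^{-n}$ and $p_i = 2^{-(n+1)}$ for the remaining $x - n$ events.

```python
def exact_run_probability(n, x):
    """P(at least one run of n heads in x fair flips), via the recursion."""
    probs = [0.0] * (x + 1)
    for m in range(n, x + 1):
        probs[m] = 2.0 ** -n + sum(2.0 ** -j * probs[m - j]
                                   for j in range(1, n + 1))
    return probs[x]

def first_reordered_run_probability(n, x):
    """First reordered approximation: 1 minus the product of (1 - p_i)."""
    if x < n:
        return 0.0                    # no room for a run of n heads
    p_first = 2.0 ** -n               # run starting on flip 1
    p_later = 2.0 ** -(n + 1)         # tail on flip i-1, then n heads
    return 1.0 - (1.0 - p_first) * (1.0 - p_later) ** (x - n)

def absolute_errors(n, x_max):
    """Absolute error of the approximation for x = 0, ..., x_max."""
    return [abs(first_reordered_run_probability(n, x) - exact_run_probability(n, x))
            for x in range(x_max + 1)]
```

For example, `max(absolute_errors(5, 200))` gives the worst-case error of the first approximation for $n = 5$ over that range; the approximation is exact at $x = n$, where only the single event $A_1$ is present.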
To implement the second reordered approximation we need to look at the events $A_i \cap A_j$. Because we have defined $A_i$ to be the event that there is a success run starting strictly on the $i$th flip, $A_i$ and $A_j$ will be mutually exclusive if $|i-j| \le n$. But for $|i-j| > n$, $A_i$ and $A_j$ are independent. As explained in the introduction, when the events $A_i$ are close to being mutually exclusive, we expect the standard form of the inclusion-exclusion principle to give reliable approximations, and when the $A_i$ are close to independent we expect the reordered forms to be better. In the success run problem, for $x \le O(n)$ most pairs of events are mutually exclusive, but for $x \gg n$ most pairs of events are independent. So we expect truncation of the standard inclusion-exclusion principle to work well for low $x$ and the reordered approximations to work well for high $x$. This prediction is borne out, and in fact improved upon, in practice. Here, however, we just report some results using the second reordered approximation. For $x \ge 2n$ this is
$$P^x_n \approx 1 - \left(1 - 2^{-n}\right)\left(1 - 2^{-(n+1)}\right)^{x-n} \left[\frac{1 - 3 \cdot 2^{-(n+1)}}{\left(1 - 2^{-n}\right)\left(1 - 2^{-(n+1)}\right)}\right]^{n} \left[\frac{1 - 2^{-n}}{\left(1 - 2^{-(n+1)}\right)^2}\right]^{n(x-n) - \frac{1}{2}n(n+1)},$$
which is of the form (14), with
$$\lambda_1 = \frac{2\left(1 - 2^{-n}\right)^{n}}{\left(1 - 2^{-(n+1)}\right)^{2n-1}}$$
and
$$C_1 = \frac{\left(1 - 3 \cdot 2^{-(n+1)}\right)^{n}\left(1 - 2^{-(n+1)}\right)^{n(3n-1)}}{\left(1 - 2^{-n}\right)^{\frac{3}{2}n(n+1) - 1}}.$$
In Figure 1 we show the (absolute) errors in the second reordered approximation for $n = 5$ and $n = 8$. Even the low $x$ errors are reasonably small. At the peak of error $P^x_n$ is about 0.7 in the case $n = 5$ and 0.6 in the case $n = 8$.
3c. The Birthday Problem. Assume there are $D$ days in a year, and there is equal probability of being born on any particular day of the year. What is the probability $P$ that in a group of $N$ people ($0 \le N \le D$) there are (at least) two with the same birthday? This problem has a closed form solution, but it is nevertheless interesting to see what can be done with the reordered approximations to the inclusion-exclusion principle. Let $A_i$, $i = 1, \ldots, \frac{1}{2}N(N-1)$, be the event that pair $i$ share the same birthday. Clearly $p_i = \frac{1}{D}$. Thus the first reordered approximation gives
$$P \approx 1 - \left(1 - \frac{1}{D}\right)^{\frac{1}{2}N(N-1)}.$$
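The events $A_i$ are pairwise independent, so the second reordered approximation is the same as the first. There are, however, correlations between triples of events $A_i$: if persons 1 and 2 share a birthday and so do persons 2 and 3 then necessarily so do persons 1 and 3. There are $\frac{1}{6}N(N-1)(N-2)$ triples of events correlated in this way, each with $p_{ijk} = \frac{1}{D^2}$, and thus the third reordered approximation reads
$$P \approx 1 - \left(1 - \frac{1}{D}\right)^{\frac{1}{2}N(N-1)} \left[\frac{1 - \frac{2}{D}}{\left(1 - \frac{1}{D}\right)^2}\right]^{\frac{1}{6}N(N-1)(N-2)}.$$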
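Since the birthday problem has a closed form solution, the two approximations are straightforward to compare against it. The sketch below assumes the forms derived from $p_i = 1/D$ and from the $\frac{1}{6}N(N-1)(N-2)$ correlated triples, for each of which $q_{ijk} = (1 - 2/D)/(1 - 1/D)^2$; the function names are hypothetical.

```python
def birthday_exact(days, people):
    """Exact probability that at least two of `people` share a birthday."""
    prob_all_distinct = 1.0
    for i in range(people):
        prob_all_distinct *= (days - i) / days
    return 1.0 - prob_all_distinct

def birthday_first(days, people):
    """First reordered approximation: one factor (1 - 1/D) per pair."""
    pairs = people * (people - 1) // 2
    return 1.0 - (1.0 - 1.0 / days) ** pairs

def birthday_third(days, people):
    """Third reordered approximation: adds one q_ijk per correlated triple."""
    pairs = people * (people - 1) // 2
    triangles = people * (people - 1) * (people - 2) // 6
    q3 = (1.0 - 2.0 / days) / (1.0 - 1.0 / days) ** 2
    return 1.0 - (1.0 - 1.0 / days) ** pairs * q3 ** triangles
```

For $N = 3$ the third approximation coincides with the exact answer, and for $D = 365$, $N = 23$ it is noticeably closer to the exact value than the first approximation.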
In Figure 2 we show the error in the first and third reordered approximations for D = 365.

Comments and Further Directions
In this article we have given an introduction to the reordered inclusion-exclusion principle and shown it can be useful in approximating probabilities. We are hopeful that the set of applications we have presented here will be enlarged, and also that the theoretical developments needed to justify the approximation scheme will appear. Central to this is a better understanding of the quantities $q_{ijk}, q_{ijkl}, \ldots$. We suspect that the condition $q_{i_1 \ldots i_r} = 1$ might be interpretable as a criterion that all dependence between the $r$ events $A_{i_1}, \ldots, A_{i_r}$ is determined through the dependence of subsets of $r-1$ events. In [11] Savit and Green considered an ordered sequence of dependent events $A_{i_1}, \ldots, A_{i_r}$ and proposed the condition $P(A_{i_1} \mid A_{i_2} \cap \ldots \cap A_{i_r}) = P(A_{i_1} \mid A_{i_2} \cap \ldots \cap A_{i_{r-1}})$ as a suitable definition of "$(r-2)$-lag dependence" of the sequence, in the sense that event $A_{i_1}$ depends on the $(r-1)$ events before it "only through" the $(r-2)$ events before it. The condition $q_{i_1 \ldots i_r} = 1$ seems to be a symmetric version of this condition.
In addition to understanding the meaning of the $q_{ijk}, q_{ijkl}, \ldots$, it is necessary to compute the constraints on the "higher order" $q$'s induced by knowledge of the "lower order" $q$'s. This should allow estimation of the error in truncation of the reordered inclusion-exclusion principle. In suitable circumstances we expect the results of truncation of the reordered inclusion-exclusion principle to give bounds, not just approximations, for probabilities (see the appendix on the relationship with Janson's inequality). It would be extremely interesting if the Penrice inequalities for the derangement problem, as presented in [8], could be shown to come from a general result in probability.
A final open problem is whether the reordered inclusion-exclusion principle can be generalized, like the standard one [9], to give expressions for the probability that exactly $m$ of the $n$ events occur.

Figure 1: Absolute errors in the second reordered approximation for the success run problem, n = 5 and n = 8.

Figure 2: Absolute errors in the first and third reordered approximations for the birthday problem, D = 365.