The BK inequality for pivotal sampling a.k.a. the Srinivasan sampling process

The pivotal sampling algorithm, a.k.a. the Srinivasan sampling process, is a simply described recursive algorithm for sampling a fixed number of items from a finite population in such a way that each item is included in the sample with a prescribed inclusion probability. The algorithm has attracted quite some interest in recent years due to the fact that, despite its simplicity, it has been shown to satisfy strong properties of negative dependence, e.g. conditional negative association. In this paper it is shown that (tree-ordered) pivotal/Srinivasan sampling also satisfies the BK inequality. This is done via a mapping from increasing sets of samples to sets of match sequences and an application of the van den Berg-Kesten-Reimer inequality. The result is one of only very few non-trivial situations where the BK inequality is known to hold.


Introduction
Let n be a positive integer and let S = {0, 1}^n with the usual coordinatewise partial order. For ω ∈ S and K ⊆ [n], let ω_K = (ω_k)_{k∈K}. Define the subset [ω]_K as

[ω]_K = {α ∈ S : α_K = ω_K}.

The operation □ on pairs of subsets of S is given by

A □ B = {ω ∈ S : [ω]_K ⊆ A and [ω]_L ⊆ B for some disjoint K, L ⊆ [n]}.

Loosely speaking, A □ B is the set of ω's for which A and B occur disjointly. When [ω]_K ⊆ A we will sometimes say that ω_K guarantees A. Note that if A and B depend on disjoint sets of indices, then A □ B = A ∩ B. A subset A of S is said to be increasing if for all α, ω ∈ S, we have α ∈ A, α ≤ ω ⇒ ω ∈ A.
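For concreteness, the operation □ can be computed by brute force on a small S. The following sketch is the editor's own illustration, not part of the paper; the names `cylinder` and `box` are invented for this purpose.

```python
from itertools import product, combinations

def cylinder(omega, K, n):
    """[omega]_K: all alpha in S = {0,1}^n that agree with omega on K."""
    return {alpha for alpha in product((0, 1), repeat=n)
            if all(alpha[k] == omega[k] for k in K)}

def box(A, B, n):
    """A box B: the omegas admitting disjoint witness sets K and L
    with [omega]_K contained in A and [omega]_L contained in B."""
    result = set()
    indices = range(n)
    for omega in product((0, 1), repeat=n):
        for r in range(n + 1):
            for K in combinations(indices, r):
                if not cylinder(omega, K, n) <= A:
                    continue
                rest = [i for i in indices if i not in K]
                if any(cylinder(omega, L, n) <= B
                       for s in range(len(rest) + 1)
                       for L in combinations(rest, s)):
                    result.add(omega)
                    break
            if omega in result:
                break
    return result
```

When A and B depend on disjoint index sets, the computation confirms A □ B = A ∩ B, as noted above.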
Let X = (X_1, ..., X_n) be a family of binary random variables and let µ(·) = P(X ∈ ·) be its law. We say that X (or µ) is BK, or that X (or µ) satisfies the BK inequality, if for every pair of increasing events A and B,

P(X ∈ A □ B) ≤ P(X ∈ A)P(X ∈ B).    (1)

Recall also that X is said to be negatively associated (NA) whenever (1) holds for all A and B which depend on disjoint sets of indices (i.e. whenever P(X ∈ A ∩ B) ≤ P(X ∈ A)P(X ∈ B) for all such A and B). Hence BK is trivially a stronger property than NA.
The BK inequality is known to hold when the X_k's are independent; this is the classical BK inequality of van den Berg and Kesten [3], a result which has turned out to be of fundamental importance in e.g. percolation theory and reliability theory, see e.g. [10]. In fact, when the X_k's are independent, (1) holds for all sets A and B. This was a long-standing open problem until finally solved by David Reimer [13] (2000). Consequently, this fact is now known as the van den Berg-Kesten-Reimer inequality, a result that will be of fundamental importance in this paper.
Clearly, if a family has the BK property, this means that it is negatively dependent in some sense. For example, as noted above, any BK family is NA. In recent years, it has become a challenge to understand how the BK property fits into the theory of negative dependence. The chase for a theory of negative dependence started out a decade or so ago with the pioneering papers [12] and [9]. A major step forward was taken by Borcea, Brändén and Liggett in [4]. Their work was based on an algebraic/analytic approach involving the zeros of generating polynomials. This approach in turn was based on a series of papers of Borcea and Brändén, see the bibliography of [4]. The generating polynomial approach is powerful, see e.g. [5], where a number of important sampling techniques were easily shown to satisfy a strong form of negative dependence, the strong Rayleigh property. This property implies in particular CNA, i.e. that the conditional distribution of the remaining variables, given any subset of the variables, is NA. However, the BK property has so far resisted the analytic approach and it is unclear how it would fit into this framework. Markström [11] gave examples showing that the BK property is neither closed under conditioning, nor under external fields. He also showed that there are examples of NA families which are not BK. Whether CNA is sufficient for BK remains an open question.
A moment's thought reveals that the van den Berg-Kesten-Reimer inequality can only be satisfied by product measures. However, it is intuitively clear that (1) should hold for all increasing events for many classes of negatively dependent binary random variables. Until quite recently, however, the BK inequality was not known to hold for any substantial classes of measures apart from product measures, despite the efforts of several researchers (oral communication). The first substantial new contribution came in [2], where uniform samples of, say, k items from a population of size n were shown to be BK. This was also shown to be true for weighted versions of uniform k-out-of-n samples and for products of such measures. The still more recent work [1] proves that the anti-ferromagnetic Ising model satisfies the BK inequality. It is also shown there that if (1) is modified in a natural way, then it holds also for the ferromagnetic Ising model.
In this paper, we show that the important pivotal sampling procedure, also known as Srinivasan sampling, satisfies the BK inequality. As in [2], this is done via an application of Reimer's results. A difference, however, is that here we apply the van den Berg-Kesten-Reimer inequality directly, whereas [2] refers to the key ingredient of Reimer's proof, namely the set-theoretic fact known as Reimer's Butterfly Theorem.
The rest of the paper is organized as follows. In Section 2, we briefly introduce the sampling process and state the main result. Section 3 is then devoted to the proof.

Notation and statements
Pivotal sampling is an important algorithm in sampling theory. It was introduced by Deville and Tillé [7] in 1998. In the computer science community, which was generally not aware of the Deville-Tillé paper, the method was independently rediscovered by Srinivasan [14] and is consequently known there as Srinivasan sampling. The pivotal/Srinivasan algorithm is an efficient method for picking fixed-size samples with exactly the prescribed inclusion probabilities, and despite its simplicity it enjoys all the virtues of negative association. One drawback is that the entropy of the resulting sample is fairly low. For example, in a population of n items, there will typically be n − 1 pairs of items such that the two items in a given pair either cannot both be included in the sample or cannot both be outside the sample.
The algorithm is recursive and works as follows. Suppose that we have a population of n items from which we want to draw a sample of exactly k items, in such a way that for each item i, the probability that this item is included in the sample is exactly a pre-specified number π_i. (To make this possible, we of course need that ∑_i π_i = k.) Order the items linearly as item 1, item 2, ..., item n in some arbitrary way. Suppose that π_1 + π_2 ≤ 1. Play a "match" between items 1 and 2, with 1 as the winner with probability π_1/(π_1 + π_2) and 2 as the winner with the remaining probability π_2/(π_1 + π_2). The loser is now ruled out from being included in the sample (i.e. one sets X_1 = 0 if item 1 lost the match and X_2 = 0 otherwise), whereas the winner gets the new inclusion probability π′_1 = π_1 + π_2 and is relabelled as item 1′ in a new population. Let π′_i = π_{i+1} and relabel item i + 1 as i′ in the new population, i = 2, ..., n − 1. Now apply the algorithm recursively, independently of the result of the first match, on the new population 1′, 2′, ..., (n − 1)′ with inclusion probabilities π′_1, ..., π′_{n−1}. In case π_1 + π_2 > 1, declare instead 1 the winner with probability (1 − π_2)/(2 − π_1 − π_2) and declare 2 the winner with the remaining probability (1 − π_1)/(2 − π_1 − π_2). The winner is now given a secure place in the sample (i.e. one sets X_1 = 1 if item 1 won the match and X_2 = 1 otherwise), whereas the loser plays on, as above, in a new population with the new inclusion probability π′_1 = π_1 + π_2 − 1 and the new label 1′. For the remaining items, the new labels are of course 2′, 3′, ..., (n − 1)′ and the new inclusion probabilities are π′_i = π_{i+1}, i = 2, ..., n − 1.
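The recursion just described admits a minimal Python sketch. This is the editor's own illustration (the function name and bookkeeping are not from the paper); items are 0-indexed.

```python
import random

def pivotal_sample(pi, rng=random):
    """Draw a fixed-size sample with inclusion probabilities pi
    (sum(pi) must be an integer k) by the linear-order pivotal method."""
    pi = list(pi)
    idx = list(range(len(pi)))   # original labels of the items still in play
    included = set()
    while len(idx) > 1:
        p1, p2 = pi[0], pi[1]
        if p1 + p2 <= 1:
            # loser is excluded; winner carries probability p1 + p2
            win_first = rng.random() < (p1 / (p1 + p2) if p1 + p2 > 0 else 0.5)
            winner = idx[0] if win_first else idx[1]
            idx = [winner] + idx[2:]
            pi = [p1 + p2] + pi[2:]
        else:
            # winner is secured a place; loser carries probability p1 + p2 - 1
            win_first = rng.random() < (1 - p2) / (2 - p1 - p2)
            winner, loser = (idx[0], idx[1]) if win_first else (idx[1], idx[0])
            included.add(winner)
            idx = [loser] + idx[2:]
            pi = [p1 + p2 - 1] + pi[2:]
    if pi and pi[0] > 0.5:   # the last survivor's probability is 0 or 1 (up to rounding)
        included.add(idx[0])
    return included
```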
Note that the i'th match of the process is always played between the item initially labelled i + 1 and one item with a lower initial label, whose identity is determined by the results of the previous i − 1 matches.In particular the final sample is determined by the n − 1 independent matches.
That this indeed produces a sample of exactly k items with the desired inclusion probabilities follows from a short induction argument. Indeed, under the induction hypothesis that the algorithm works as claimed for populations of size n − 1, it suffices to note that the inclusion probability of item 1 is π_1. This, however, is obvious.
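The claim can also be checked mechanically for small n by summing over all match outcomes with exact arithmetic. This brute-force sketch (the editor's own, exponential in n) follows the recursion of the previous paragraph; items are 0-indexed.

```python
from fractions import Fraction

def inclusion_probs(pi):
    """Exact inclusion probabilities of linear-order pivotal sampling,
    computed by enumerating all match outcomes (feasible for small n)."""
    n = len(pi)
    incl = {i: Fraction(0) for i in range(n)}

    def rec(items, weight, secured):
        # items: remaining (label, prob) pairs; secured: labels already in the sample
        if len(items) == 1:
            (lab, p), = items
            final = secured | ({lab} if p == 1 else set())
            for lab2 in final:
                incl[lab2] += weight
            return
        (l1, p1), (l2, p2), *rest = items
        if p1 + p2 <= 1:
            tot = p1 + p2
            if tot == 0:                      # neither item can ever be sampled
                rec([(l1, tot)] + rest, weight, secured)
                return
            rec([(l1, tot)] + rest, weight * p1 / tot, secured)  # item l1 wins
            rec([(l2, tot)] + rest, weight * p2 / tot, secured)  # item l2 wins
        else:
            den = 2 - p1 - p2
            w1 = (1 - p2) / den if den else Fraction(1, 2)  # den == 0 only if p1 = p2 = 1
            rec([(l2, p1 + p2 - 1)] + rest, weight * w1, secured | {l1})
            rec([(l1, p1 + p2 - 1)] + rest, weight * (1 - w1), secured | {l2})

    rec([(i, Fraction(p)) for i, p in enumerate(pi)], Fraction(1), frozenset())
    return incl
```

For instance, π = (1/3, 1/3, 2/3, 2/3) is reproduced exactly as the inclusion probabilities of the four items.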
One variant of the pivotal sampling algorithm, which may raise the entropy, is to replace the linear order of the items with a tree order. I.e. place the items at the leaves of a binary tree (i.e. a tree where all vertices have degree 1 or 3) with n leaves, in some deterministic way. Then play the first match between the two items at two predetermined leaves with a common neighbor. Place the winner or the loser, depending on the total probability of the given match, at the common neighbor and erase the two leaves. Then repeat recursively as above. (Another variant, which suggests itself naturally, is to order the items linearly in a uniform random way. Then, of course, our results apply given the order, but unfortunately they do not apply to the whole procedure including the randomness in the order.) Pivotal sampling/Srinivasan's process was shown to be CNA under linear order in [8]. This was extended to the tree-ordered case in [6]. These results were further strengthened in [5], where it was shown that pivotal sampling is in fact strongly Rayleigh. Here we prove that it is also BK:

Theorem 2.1 Let X = (X_1, ..., X_n) be the indicator random variables of a pivotal sample on n items, either linearly ordered or tree-ordered. I.e. let π_1, ..., π_n be the given inclusion probabilities (satisfying ∑_i π_i = k for some k ∈ [n]) and let X_i = 1 if item i gets included in the sample and X_i = 0 otherwise. Then X is BK.
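The tree-ordered variant admits an equally short recursive sketch. Again this is the editor's own illustration: the nested-tuple tree encoding is an assumption, with leaves given as (label, probability) pairs and internal nodes as pairs of subtrees.

```python
import random

def tree_pivotal(tree, rng=random):
    """Play the matches of tree-ordered pivotal sampling bottom-up.
    Returns (secured labels, the surviving (label, prob) pair at the root)."""
    if isinstance(tree[1], (int, float)):          # a leaf: (label, prob)
        return set(), tree
    left, right = tree
    inc_l, (l1, p1) = tree_pivotal(left, rng)
    inc_r, (l2, p2) = tree_pivotal(right, rng)
    included = inc_l | inc_r
    if p1 + p2 <= 1:
        win1 = rng.random() < (p1 / (p1 + p2) if p1 + p2 else 0.5)
        return included, (l1 if win1 else l2, p1 + p2)      # loser is excluded
    den = 2 - p1 - p2
    win1 = rng.random() < ((1 - p2) / den if den else 0.5)  # den == 0: both end up sampled
    included.add(l1 if win1 else l2)                        # winner is secured
    return included, (l2 if win1 else l1, p1 + p2 - 1)      # loser plays on

def tree_sample(tree, rng=random):
    included, (lab, p) = tree_pivotal(tree, rng)
    if p > 0.5:                  # root survivor has probability 0 or 1 (up to rounding)
        included.add(lab)
    return included
```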

Proof of Theorem 2.1
The sample X is determined by n − 1 matches. Let the i'th match be denoted by m_i and let M = {m_1, ..., m_{n−1}} be the set of matches. Let Y = (Y_1, ..., Y_{n−1}) be the binary random variables given by Y_i = 1 if match number i is won by the item with the smallest label and Y_i = 0 otherwise. Let f : {0, 1}^M → S_k = {ω ∈ S : ∑_i ω_i = k} be the function given by letting f(y) be the pivotal sample that results from Y = y. In particular X = f(Y). (Note that f is neither injective nor surjective. E.g. if π_1 + π_2 > 1, then f(y)_1 + f(y)_2 ≥ 1 for all y, and it is easily seen that f(y)_1 = f(y)_2 = 1 for at least two different y's.) For an event A ⊆ S, let Ā := f^{-1}(A) = {y ∈ {0, 1}^M : f(y) ∈ A}. The key result is the following lemma.

Lemma 3.1 Let A and B be two increasing subsets of {0, 1}^n. Then f^{-1}(A □ B) ⊆ Ā □ B̄.

Proof. Pick an arbitrary y ∈ f^{-1}(A □ B). Let x := f(y). By definition we have x ∈ A □ B. Hence there are two disjoint index sets I, J ⊆ [n] such that [x]_I ⊆ A, [x]_J ⊆ B and x_I ≡ x_J ≡ 1. We want to show that y ∈ Ā □ B̄. Let K, L ⊆ M be two sets of matches such that f(z)_I ≡ 1 for all z ∈ [y]_K and f(z)_J ≡ 1 for all z ∈ [y]_L (such sets K and L can always be found, since one can always take them both to be the set of all matches). If K and L are disjoint, then y ∈ Ā □ B̄ and we are done, so assume that there exists a match m ∈ K ∩ L. From y_{M\m} we can read off which two items play against each other in match m. Denote these by r and s, with r the winner of m.

Observe now that for any sequence z ∈ {0, 1}^M of matches, changing a given bit z_w has no effect on f(z) other than, possibly, interchanging the values f(z)_u and f(z)_v for the two items u and v playing each other in match w. By this observation it is immediate that if r, s ∈ I ∪ J or r, s ∈ (I ∪ J)^c, then f(z)_I ≡ 1 for all z ∈ [y]_{K\m} and f(z)_J ≡ 1 for all z ∈ [y]_{L\m}.

If s ∈ I, r ∈ (I ∪ J)^c and z ∈ [y]_L, then changing z_m does not change f(z)_J, and hence f(z)_J ≡ 1 for all z ∈ [y]_{L\m}. Also, since changing z_m can only increase f(z)_I, we have f(z)_I ≡ 1 for all z ∈ [y]_{K\m}. Obviously, the same conclusions hold for s ∈ J and r ∈ (I ∪ J)^c. Finally, if s ∈ (I ∪ J)^c and r ∈ I, then changing z_m does not change f(z)_J. Hence f(z)_J ≡ 1 for all z ∈ [y]_{L\m}. (In this case, however, we cannot drop m from K, since changing the result of m may result in losing r from I.) Analogously, if s ∈ (I ∪ J)^c and r ∈ J, then f(z)_I ≡ 1 for all z ∈ [y]_{K\m}.

In all cases, whenever K ∩ L ≠ ∅, a match can be removed from at least one of K or L in this way. Iterating the argument, it follows that there are disjoint subsets K′ ⊆ K and L′ ⊆ L such that f(z)_I ≡ 1 for all z ∈ [y]_{K′} and f(z)_J ≡ 1 for all z ∈ [y]_{L′}, i.e. (since f(z)_I ≡ 1 means f(z) ∈ [x]_I ⊆ A, and analogously for B) [y]_{K′} ⊆ Ā and [y]_{L′} ⊆ B̄. This finishes the proof that f^{-1}(A □ B) ⊆ Ā □ B̄. ∎

Now the proof of the main result is very short. Since the Y_i's are independent,

P(X ∈ A □ B) = P(Y ∈ f^{-1}(A □ B)) ≤ P(Y ∈ Ā □ B̄) ≤ P(Y ∈ Ā)P(Y ∈ B̄) = P(X ∈ A)P(X ∈ B),

where the first inequality is Lemma 3.1 and the second inequality is the van den Berg-Kesten-Reimer inequality. This completes the proof. ∎

Remark. It may be tempting to believe that f^{-1}(A □ B) = Ā □ B̄. However, the inclusion Ā □ B̄ ⊆ f^{-1}(A □ B) fails. The following example is due to an anonymous referee. Let n = 4, π_1 = π_2 = 1/3 and π_3 = π_4 = 2/3. Let A be the event that X_2 + X_3 + X_4 ≥ 2 and let B be the event that X_1 + X_2 + X_3 ≥ 2. Then A □ B is the event that X_1 = X_2 = X_3 = X_4 = 1, and since all match sequences result in samples of size 2, f^{-1}(A □ B) = ∅. However, any match sequence z = (z_1, z_2, z_3) with z_1 = 0 entails X_1 = 0 and hence, since the sample has size 2, A occurs. Analogously, z_3 = 1 implies B. Hence Ā □ B̄ ⊇ {(0, 0, 1), (0, 1, 1)}.
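The referee's example can be verified by enumerating all eight match sequences. The following brute-force sketch is the editor's own (items 0-indexed, so label 0 is item 1); a match sequence z is encoded with z_i = 1 meaning match i is won by the lower-labelled item.

```python
from fractions import Fraction
from itertools import product

def sample_from_matches(pi, z):
    """Run linear-order pivotal sampling with the match results fixed by z.
    The carried survivor always has a lower label than its next opponent."""
    items = [(i, Fraction(p)) for i, p in enumerate(pi)]
    included = set()
    for bit in z:
        (l1, p1), (l2, p2) = items[0], items[1]
        winner, loser = (l1, l2) if bit else (l2, l1)
        if p1 + p2 <= 1:
            items = [(winner, p1 + p2)] + items[2:]   # loser is excluded
        else:
            included.add(winner)                      # winner is secured
            items = [(loser, p1 + p2 - 1)] + items[2:]
    if items[0][1] == 1:
        included.add(items[0][0])
    return included
```

Running this over all z ∈ {0, 1}^3 with π = (1/3, 1/3, 2/3, 2/3) confirms that every match sequence yields a sample of size 2, that z_1 = 0 forces X_1 = 0 (so A occurs), and that z_3 = 1 excludes item 4 (so B occurs).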