Rapid mixing of dealer shuffles and clumpy shuffles

A famous result of Bayer and Diaconis [2] is that the Gilbert-Shannon-Reeds (GSR) model for the riffle shuffle of n cards mixes in $\frac{3}{2}\log_2 n$ steps and that for 52 cards about 7 shuffles suffice to mix the deck. In this paper, we study variants of the GSR shuffle that have been proposed to model more realistically how people actually shuffle a deck of cards. The clumpy riffle shuffle and dealer riffle shuffle differ from the GSR model in that when a card is dropped from one hand, the conditional probability that the next card is dropped from the same hand is higher (clumpy) or lower (dealer) than under the GSR model. Until now, no nontrivial rigorous results have been known for the clumpy shuffle or dealer shuffle. In this paper we show that the mixing time is O(log n).


Introduction
The mixing time of Markov chains is a subject of great importance, both from a theoretical point of view and because of its applicability, and it has attracted much attention over the last decades. A very prominent subclass of mixing time problems is card shuffling, that is, Markov chains on the symmetric group $S_n$ of permutations of n items that one can think of as the cards of a deck. Perhaps the most famous of card shuffles is the Gilbert-Shannon-Reeds (GSR) model for the riffle shuffle, for which Bayer and Diaconis [2] proved a remarkably exact result: there is a sharp cutoff at $\frac{3}{2}\log_2 n$ shuffles, after which the deck is well mixed, and for a standard deck of 52 cards about 7 shuffles suffice for mixing. Prior to that, Aldous and Diaconis [1] had proved, via a striking strong uniform time argument, that $2\log_2 n$ shuffles is an upper bound on the mixing time.
The riffle shuffle is, together with the inefficient overhand shuffle, which mixes in order $n^2 \log n$ steps (see [7] and [4]), the most common way in which people actually shuffle a deck of cards. The model for one step of the GSR shuffle is the following. First the deck is cut into two packets, of which one goes into your right hand and the other into your left hand. The number of cards that go into your right (or left if you like) hand is a binomial random variable with parameters n and 1/2. Then the cards are dropped from the two hands in such a way that whenever there are A cards remaining in your right hand and B cards remaining in your left hand, the probability that the next card is dropped from your right hand is A/(A + B).
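To make the model concrete, here is a minimal sketch of one GSR step in Python; the function name `gsr_step` and the convention that dropped cards are collected from the top down are our own choices, not anything prescribed by the paper.

```python
import random

def gsr_step(deck):
    """One GSR riffle shuffle: a Binomial(n, 1/2) cut, then cards are dropped
    with probability proportional to the number remaining in each hand."""
    n = len(deck)
    k = sum(random.random() < 0.5 for _ in range(n))   # size of the cut
    piles = [list(deck[:k]), list(deck[k:])]
    out = []
    while piles[0] or piles[1]:
        a, b = len(piles[0]), len(piles[1])
        h = 0 if random.random() < a / (a + b) else 1  # drop from hand h
        out.append(piles[h].pop(0))
    return out
```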
An equivalent description of the GSR shuffle is as follows. At each step:
1. generate a uniform random binary sequence of length n;
2. if the binary sequence has k zeros and n − k ones, cut the deck so that the left pile has k cards and the right pile has n − k cards, and then interleave the two piles by reading the binary sequence from left to right, dropping from the left pile with each zero and from the right pile with each one.
For example, if n = 6 and the binary sequence is 001110, then we first cut the deck into two equal piles, then interleave the piles by dropping the first two cards from the left pile, the next three cards from the right pile, and the last card from the left pile again.
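The equivalent description is just as short to implement; the following sketch (with our own helper name `riffle_from_bits`) reproduces the example above:

```python
def riffle_from_bits(deck, bits):
    """Interleave according to a 0/1 sequence: zeros drop from the left pile,
    ones from the right pile."""
    k = bits.count(0)
    left, right = iter(deck[:k]), iter(deck[k:])
    return [next(left) if b == 0 else next(right) for b in bits]

# n = 6 with bits 001110: two piles of three, interleaved as L L R R R L.
print(riffle_from_bits([1, 2, 3, 4, 5, 6], [0, 0, 1, 1, 1, 0]))
# -> [1, 2, 4, 5, 6, 3]
```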
Note that according to the GSR model, when you drop from your right hand, you drop a single card with probability 1/2, a pair of cards with probability 1/4, a triple of cards with probability 1/8, and so on. However, if one analyzes riffle shuffles of a fresh deck of cards in practice, one finds that the shuffles are finer. Cards tend to be dropped in a more alternating fashion, especially with experienced dealers; see Remark (e) and open problem (i) of [1]. Such shuffles are named dealer riffle shuffles in [3] and we stick with this term. On the other hand, when the deck has been used for a long time and has become sticky, the opposite tends to occur, namely that cards are dropped in clumps. Hence we call these shuffles clumpy riffle shuffles.
A model that includes both the dealer and clumpy shuffles as special cases is the Markovian model, which appears in the "open problems" section of [3]. The Markovian model is driven by a two-state Markov chain with transition matrix
$$\begin{pmatrix} p_{00} & p_{01} \\ p_{10} & p_{11} \end{pmatrix}$$
and the transition rule is as follows. At each step:
1. run n steps of the two-state Markov chain in stationarity to generate a binary sequence of length n;
2. if the binary sequence has k zeros and n − k ones, cut the deck so that the left pile has k cards and the right pile has n − k cards, and then interleave the two piles by reading the binary sequence from left to right, dropping from the left pile with each zero and from the right pile with each one.
Note that the Markovian model includes the GSR model as a special case. It is natural to assume a symmetric cut (that is, $p_{01} = p_{10}$, so that the left and right piles have the same expected size) and we shall do this in the present paper. For $p \in (0, 1)$ consider the two-state Markov chain with transition matrix
$$\begin{pmatrix} p & 1-p \\ 1-p & p \end{pmatrix}.$$
We shall call this Markov chain the two-state chain with parameter p (or simply the two-state chain) and we define the p-riffle shuffle as the shuffle driven by this chain. When $p < \frac{1}{2}$ we call the shuffle dealer, and when $p > \frac{1}{2}$ we call it clumpy; $p = \frac{1}{2}$ recovers the GSR shuffle.
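Under these conventions, a step of the p-riffle shuffle only changes how the 0/1 sequence is generated. The following sketch reuses `riffle_from_bits` from above (again our own illustration, not code from the paper):

```python
import random

def two_state_chain(n, p):
    """n stationary steps of the two-state chain: the stationary distribution
    is uniform on {0, 1}, and each mark repeats the previous one w.p. p."""
    bits = [random.randint(0, 1)]
    for _ in range(n - 1):
        bits.append(bits[-1] if random.random() < p else 1 - bits[-1])
    return bits

def p_riffle_step(deck, p):
    return riffle_from_bits(deck, two_state_chain(len(deck), p))
```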

The time-reversed shuffle and mixing time
Recall that the mixing time of an (aperiodic, irreducible) Markov chain is defined in terms of the total variation distance between the distribution at a given time and the stationary distribution: if $X_t$ is the state of the Markov chain at time t and π is the stationary distribution, then the total variation distance is given by
$$\|\mathrm{P}(X_t \in \cdot) - \pi\|_{TV} := \max_{A \subseteq S} |\mathrm{P}(X_t \in A) - \pi(A)|,$$
where S is the state space and P is the underlying probability measure. The mixing time is then defined by
$$\tau_{mix} := \min\{t : \|\mathrm{P}(X_t \in \cdot) - \pi\|_{TV} \le 1/4\}.$$
As with the GSR shuffle before it, it turns out that the analysis of the p-riffle shuffle is more conveniently carried out for the time-reversed shuffle. Since the GSR shuffle and the p-riffle shuffle are random walks on groups (see [8]), each has the same mixing time as its time reversal.
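For small n one can compute the one-step distribution of the p-riffle shuffle exactly and watch the total variation distance fall. The following brute-force sketch is our own illustration (it reuses `riffle_from_bits` from above and enumerates all $2^n$ mark sequences, so it is only feasible for tiny decks):

```python
from itertools import product
from collections import defaultdict
from math import factorial

def step_distribution(n, p):
    """Exact one-step distribution of the p-riffle shuffle on S_n."""
    dist = defaultdict(float)
    for bits in product([0, 1], repeat=n):
        pr = 0.5
        for a, b in zip(bits, bits[1:]):
            pr *= p if a == b else 1 - p
        dist[tuple(riffle_from_bits(list(range(n)), list(bits)))] += pr
    return dist

def convolve(d1, d2):
    """Distribution of the composition: apply a draw from d1, then one from d2."""
    out = defaultdict(float)
    for s1, p1 in d1.items():
        for s2, p2 in d2.items():
            out[tuple(s1[j] for j in s2)] += p1 * p2
    return out

def tv_to_uniform(dist, n):
    u = 1 / factorial(n)
    return 0.5 * (sum(abs(pr - u) for pr in dist.values())
                  + u * (factorial(n) - len(dist)))

d = step_distribution(5, 0.7)          # a clumpy shuffle of a 5-card deck
cur = d
for k in range(1, 9):                  # TV distance after k shuffles
    print(k, round(tv_to_uniform(cur, 5), 4))
    cur = convolve(cur, d)
```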
For the GSR shuffle, the time reversal can be described as follows. First give each card an independent 0 or 1 mark, each with probability 1/2. Then put all cards marked 0 above the cards marked 1, without changing the internal order among cards with the same mark. If we repeat this process and keep track of all the markings that have been given to each card, then after k shuffles each card has an iid sequence of 0/1 marks of length k, independent across cards. A moment's thought reveals that the first time, τ, when all the cards have distinct mark sequences is a strong uniform time, i.e., $X_\tau$ is uniformly distributed and independent of τ. Since τ is highly concentrated around $2\log_2 n$, this implies an O(log n) upper bound for the mixing time. This argument, which first appeared in [1], relies heavily on the independence between the marks for different cards. The same goes for the more detailed analysis in [2].
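The strong uniform time is easy to simulate. In the following sketch (our own illustration), the average of `strong_uniform_time(52)` over many runs concentrates near $2\log_2 52 \approx 11.4$:

```python
import random

def strong_uniform_time(n):
    """Run the time-reversed GSR shuffle (fair 0/1 marks, stable sort with
    zeros on top) until all cards carry distinct mark sequences."""
    deck = list(range(n))
    seqs = {c: "" for c in range(n)}
    t = 0
    while len(set(seqs.values())) < n:
        marks = [random.randint(0, 1) for _ in deck]
        for c, m in zip(deck, marks):
            seqs[c] += str(m)
        deck = ([c for c, m in zip(deck, marks) if m == 0]
                + [c for c, m in zip(deck, marks) if m == 1])
        t += 1
    return t
```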
For the p-riffle shuffle, the time reversal has the following transition rule. First, generate marks by running n steps of the two-state Markov chain in stationarity. That is, the first card is given a mark according to a fair coin flip, and each subsequent card is given the same mark as the previous card with probability p and the opposite mark with probability 1 − p. Then put all cards marked 0 above the cards marked 1, without changing the internal order among cards with the same mark.
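In code, the time-reversed p-riffle step differs from the reversed GSR step only in the law of the marks; a sketch reusing `two_state_chain` from above (our own illustration):

```python
def inverse_p_riffle_step(deck, p):
    """Time-reversed p-riffle step: Markov 0/1 marks, then a stable sort
    that puts all cards marked 0 above all cards marked 1."""
    marks = two_state_chain(len(deck), p)
    return ([c for c, m in zip(deck, marks) if m == 0]
            + [c for c, m in zip(deck, marks) if m == 1])
```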
Our main result is:

Theorem 2.1. Fix $p \in (0, 1)$. The mixing time $\tau_{mix}$ for the p-riffle shuffle satisfies
$$\tau_{mix} = O(\log n).$$

Remark. Other models for finer riffle shuffles have been proposed. The most prominent is perhaps the Thorp shuffle, for which the best known upper bound to date is of order $\log^4 n$, due to Morris [5]. In the special case $n = 2^d$, there is an upper bound of $O(\log^3 n)$, also due to Morris [6]. Both of these papers rely on the same entropy technique from [5] as we do here.

Proof of Theorem 2.1
The proof of Theorem 2.1 relies on the entropy technique introduced in [5], so let us first review the parts needed. For two probability measures ν and π on a finite space S, the relative entropy of ν with respect to π is given by
$$\mathrm{ENT}(\nu \,\|\, \pi) := \sum_{s \in S} \nu(s) \log \frac{\nu(s)}{\pi(s)}.$$
Here we will only be concerned with the case when π is uniform. In that case one speaks simply of the relative entropy of ν and drops π from the notation, so that
$$\mathrm{ENT}(\nu) = \log |S| + \sum_{s \in S} \nu(s) \log \nu(s).$$
For a random variable X, we write ENT(X) for ENT(L(X)), where L(X) is the law of X. The notation ENT(X|Y = y) then of course stands for the entropy of the conditional law of X given Y = y, and ENT(X|Y) is the random variable that equals ENT(X|Y = y) when Y = y.
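As a concrete illustration of these definitions (our own sketch, reusing `step_distribution` and `tv_to_uniform` from above), one can compute the relative entropy of the deck after one p-riffle step and check it against the total variation bound of Lemma 3.1 below:

```python
from math import factorial, log

def ent_uniform(dist, n):
    """ENT(nu) = log|S_n| + sum nu(s) log nu(s): relative entropy of a
    distribution on S_n with respect to the uniform measure."""
    return log(factorial(n)) + sum(pr * log(pr) for pr in dist.values() if pr > 0)

d = step_distribution(4, 0.7)                       # one p-riffle step, n = 4
print(ent_uniform(d, 4))                            # relative entropy
print(tv_to_uniform(d, 4) <= (0.5 * ent_uniform(d, 4)) ** 0.5)  # Lemma 3.1: True
```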
The following lemma relates relative entropy to total variation. It can be proved by using the Cauchy-Schwarz inequality and solving a standard optimization problem.

Lemma 3.1. Let π be the uniform measure on S. Then
$$\|\nu - \pi\|_{TV} \le \sqrt{\tfrac{1}{2}\,\mathrm{ENT}(\nu)}.$$

Next, recall the chain rule for relative entropies:
$$\mathrm{ENT}(X_1, \ldots, X_n) = \sum_{i=1}^{n} \mathrm{E}\big[\mathrm{ENT}(X_i \mid X_{i+1}, \ldots, X_n)\big].$$
Note that the last term in the sum is just ENT($X_n$). We will be concerned with the case when X is a random permutation of n cards. We will write X(j) for the position of card j (i.e., the card that started in position j) after applying X. Consequently, $X^{-1}(j)$ is the initial position of the card in position j after applying X. Writing $E_j := \mathrm{E}[\mathrm{ENT}(X^{-1}(j) \mid \mathcal{F}_{j+1})]$, where $\mathcal{F}_j := \sigma(X^{-1}(j), X^{-1}(j+1), \ldots, X^{-1}(n))$, the chain rule takes on the form
$$\mathrm{ENT}(X) = \sum_{j=1}^{n} E_j.$$
In particular, $\mathrm{ENT}(X) \le \log n! \le n \log n$. The key result of [5] states that applying random permutations that involve collisions decreases relative entropy by a certain factor. For $a, b \in [n]$, we write c(a, b) for the random permutation that equals the identity with probability 1/2 and the transposition (a, b) with probability 1/2, and we refer to this random permutation as a collision of positions a and b. For permutations X and Y we write XY for Y ∘ X. Let Y be a random permutation that can be written as
$$Y = c(a_1, b_1)\, c(a_2, b_2) \cdots c(a_N, b_N)\, Z,$$
where Z is a random or fixed permutation, the $a_i$'s and $b_i$'s are all distinct, and the $c(a_i, b_i)$'s are mutually independent given Z. (However, the identities of the $a_i$'s and $b_i$'s and the
number of collisions typically depend on Z.) Let $Y_1, Y_2, \ldots$ be independent copies of Y and write $Y^{(t)} = Y_1 Y_2 \cdots Y_t$ for t = 1, 2, . . . . We say that the cards x and y collide at time t if there are two positions i and j such that $(Y^{(t-1)})^{-1}(i) = x$, $(Y^{(t-1)})^{-1}(j) = y$ and $Y_t$ contains the collision c(i, j). Fix t and let T ∈ [t] be a random time independent of the $Y_i$'s. For a given card x, let b(x) = y if y is the first card that x collides with in [T, t]. If also b(y) = x, then let m(x) = y (in which case we will also have m(y) = x). Otherwise set m(x) = x.
For the present paper it suffices to note that if x and y collide at time T then m(x) = y.
Theorem 3.2 ([5]). Let X be a random permutation independent of the $Y_i$'s, and for each card x define $A_x := \mathrm{P}(m(x) \ne x)$. Then
$$\mathrm{ENT}\big(XY^{(t)}\big) \le \mathrm{ENT}(X) - C \sum_{x=1}^{n} A_x E_x,$$
where C is a universal constant.
We will actually use Theorem 3.2 to analyze the time-reversed p-riffle shuffle. In order to do this we need to generate a step of the shuffle using collisions, and for this we need the following key fact. For binary sequences $M = (M_1, \ldots, M_n)$, let
$$p(M) := \tfrac{1}{2}\, p^{\#\{i \,:\, M_{i+1} = M_i\}}\, (1-p)^{\#\{i \,:\, M_{i+1} \ne M_i\}}$$
be the probability of generating M as a trajectory of the two-state chain. If we divide M into $\lfloor n/4 \rfloor$ blocks of length 4, plus possibly one additional smaller block, then reversing any block of the form ab(1 − b)a (e.g., 1011) does not change p(M). Furthermore, the effect of such a change in markings is to interchange the final positions of the middle two cards in the reversed block.
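This invariance is easy to check numerically (our own sketch):

```python
def traj_prob(M, p):
    """Probability that the stationary two-state chain produces the 0/1 string M."""
    pr = 0.5
    for a, b in zip(M, M[1:]):
        pr *= p if a == b else 1 - p
    return pr

# Reversing a block of the form a, b, 1-b, a preserves the probability:
M = [1, 0, 1, 1, 0, 1, 0, 0]          # second block 0,1,0,0 has the form ab(1-b)a
M2 = M[:4] + M[4:][::-1]              # reverse that block
print(abs(traj_prob(M, 0.3) - traj_prob(M2, 0.3)) < 1e-12)   # True
```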
Let M be the random binary sequence generated for a step of the shuffle. We say that positions j and j + 1 interact if they are the middle two positions of one of the blocks and the marks in that block take the form ab(1 − b)a. Let C = {j : j interacts with j + 1}. Note that if Z is the permutation generated from M, then the permutation Y defined by
$$Y := \Big(\prod_{j \in C} c(j, j+1)\Big) Z$$
has the same distribution as Z, so we can define a step of the shuffle to be the permutation Y. Now partition the positions in the deck as
$$I_l := \{x : 2^{l-1} \le x < 2^l\}, \qquad l = 1, 2, \ldots, \lceil \log_2(n+1) \rceil.$$
For each l, let $T = T_l$ be the random time for which $\mathrm{P}(T = 1) = 2^{-l+1}$ and $\mathrm{P}(T = l + 1 - r) = 2^{-r}$, r = 1, . . ., l − 1, so that l + 1 − T is a truncated geometric(1/2) random variable. Now let $t = \lceil \log_2 n \rceil$ and let $Y_1, Y_2, \ldots$ be independent copies of Y. The following lemma ensures that we can apply Theorem 3.2.

Lemma 3.3. In the above notation, with l fixed and $T = T_l$, there is a constant c > 0 independent of l and n such that
$$\mathrm{P}(x \text{ and } y \text{ collide at time } T) \ge \frac{c}{x}$$
for all $x \in I_l$ and all y < x.
The proof of Lemma 3.3 is deferred to Section 4.
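For concreteness, here is a small sampler for $T_l$ (our own sketch); note that the probabilities indeed sum to one, since $2^{-l+1} + \sum_{r=1}^{l-1} 2^{-r} = 1$.

```python
import random

def sample_T(l):
    """Sample T_l: here l + 1 - T_l is geometric(1/2) truncated at l, so that
    P(T_l = l + 1 - r) = 2^{-r} for r < l and P(T_l = 1) = 2^{-l+1}."""
    r = 1
    while r < l and random.random() < 0.5:
        r += 1
    return l + 1 - r
```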
Proof of Theorem 2.1 assuming Lemma 3.3. Let X be a random permutation independent of the $Y_i$'s. Use the chain rule to write
$$\mathrm{ENT}(X) = \sum_{m} \sum_{j \in I_m} E_j.$$
Since there are at most $\log_2 n + 1 \le 2 \log_2 n$ of the $I_m$'s, we must have that
$$\sum_{j \in I_{l^*}} E_j \ge \frac{\mathrm{ENT}(X)}{2 \log_2 n},$$
where $l^*$ is the l that maximizes the inner sum. Recall that for each x we define
$$A_x = \mathrm{P}(m(x) \ne x) \ge \sum_{y < x} \mathrm{P}(x \text{ and } y \text{ collide at time } T);$$
the inequality holds because the events on the right are disjoint and each of them forces m(x) = y. Applying Lemma 3.3 with $l = l^*$ shows that there is a constant c > 0 that depends only on p such that $A_i \ge c$ for $i \in I_{l^*}$. Thus Theorem 3.2 gives
$$\mathrm{ENT}\big(XY^{(t)}\big) \le \Big(1 - \frac{Cc}{2 \log_2 n}\Big)\, \mathrm{ENT}(X).$$
Now iterating this for X = id, $X = Y^{(t)}$, $X = Y^{(2t)}$, . . . and taking γ = Cc/2 shows that for K ≥ 1 we have
$$\mathrm{ENT}\big(Y^{(Kt)}\big) \le e^{-2}$$
as soon as, say, Kγ ≥ 2. Then, by Lemma 3.1, we have
$$\big\|\mathrm{P}\big(Y^{(Kt)} \in \cdot\big) - \pi\big\|_{TV} \le \sqrt{\tfrac{1}{2} e^{-2}} < \tfrac{1}{3},$$
and increasing K by a constant factor pushes this below 1/4. Since $t = \lceil \log_2 n \rceil$ and K can be taken to be a constant depending only on p, this proves Theorem 2.1.

The thinning process and the proof of Lemma 3.3

Recall that the time reversal of the p-riffle shuffle has the following transition rule. First, generate marks by running n steps of the two-state Markov chain in stationarity. Then put all cards marked 0 above the cards marked 1, without changing the internal order among cards with the same mark.
Fix two cards x and y with x < y. Note that if x and y are given the same marks, then their distance will typically decrease by a factor of roughly one half after the shuffle: a card between them stays between them only if it receives the same mark as x and y, which happens with probability roughly one half.
(The deciding coin determines whether the step from the current state will be successful or not.) We call W the good sequence in the construction of V′ from V.
The main idea of the proof is to use the second moment method to show that, under the assumptions of the lemma, if we condition on the event that the deciding coin repeatedly lands heads (that is, the good sequence W is chosen repeatedly instead of W′), then with probability bounded away from zero we have $0 < L_t < C$.
Fix a state V of the thinning process, let L = |V|, and let W be the good sequence in the construction of the next state V′ from V. The key step of the proof is to bound the mean and second moment of S := |W|. We claim that
$$\mathrm{E}(S) \ge L/2 \qquad (4.1)$$
and
$$\mathrm{E}(S^2) \le \frac{L^2}{4} + cL \qquad (4.2)$$
for a constant c that depends only on p. First, we verify (4.1). Since
• the sequence $W_0, \ldots, W_{T-1}$ has at least as many ones as zeros;
• given T = k, where k ≤ L − 1, the value of $\sum_{i=k}^{L-1} W_i$ has the same distribution as the number of ones in the first L − k states of the two-state chain starting from 1;
equation (4.1) follows. Next we verify (4.2). Note that
$$\mathrm{E}(S^2) = \sum_{i=0}^{L-1} \mathrm{P}(W_i = 1) + 2 \sum_{0 \le i < j < L} \mathrm{P}(W_i = W_j = 1). \qquad (4.3)$$
The first sum can be trivially bounded above by L. For the second sum, note that if T ≤ i then $W_i = W_j = 1$ only if $Z_i = Z_j = Z_{L-1}$, which occurs with probability
$$\Big(\frac{1}{2} + \frac{1}{2}(p-q)^{j-i}\Big)\Big(\frac{1}{2} + \frac{1}{2}(p-q)^{L-1-j}\Big),$$
where q = 1 − p. (Recall that the probability that a coin of bias q shows an even number of heads after m flips is $\frac{1}{2} + \frac{1}{2}(p - q)^m$; see the sketch following this proof.) Combining this with the fact that P(T > i) decays geometrically in i shows that the terms of the second sum in (4.3) are at most 1/4 plus error terms that decay geometrically in i and in j − i. Summing this over i and j with 0 ≤ i < j < L gives at most $\frac{L^2}{4} + c'L$, for a constant c′ that depends only on p. This verifies (4.2). Now let $V_0, V_1, \ldots$ be a thinning process constructed using deciding coins and let E be the event that the deciding coin lands heads for each step up to time t. We write $\tilde{\mathrm{P}}$ and $\tilde{\mathrm{E}}$ for the conditional probability and expectation, respectively, given E. Hence, induction and the fact that f is concave imply a lower bound on $\tilde{\mathrm{E}}(L_k)$ in terms of $f_k$, the kth iterate of f. Another straightforward calculation and induction yield a complementary second-moment bound
in terms of the function h(x) = x + B√x, where B ≥ c is a sufficiently large constant (e.g., B = 3c² suffices), provided that $x/4^k \ge 1$; here we use that $l_0/2^k \ge 1$, since $k \le t < \log_2 l_0$. Combining (4.1) with induction, and then (4.9) and the definition of h, gives (4.12). Let $T_k$ be the forget time in the construction of $V_{k+1}$ from $V_k$. Recall that, on the event E, the step is successful unless $T_k \ge L_k$. Hence, on E, the step is unsuccessful only if $B_k := \{T_k \ge a_k\}$ occurs, where we write $a_k$ for $l_0/2^k$. Combining this with the fact that $L_t = 0$ only if $B_t$ occurs, we get that $\tilde{\mathrm{P}}(G_t^c \cup \{L_t = 0\})$ is at most $\sum_{k \le t} \tilde{\mathrm{P}}(B_k)$. Since $T_k$ is a geometric random variable with parameter α := 1 − |1 − 2p|, we have
$$\sum_{k \le t} \tilde{\mathrm{P}}(B_k) \le 1 - \gamma$$
for a universal constant γ > 0. Finally, note that (4.9) implies that $\mathrm{E}(L_t^2) \le \beta$ for a constant β that depends only on p. Choosing $C > 2\beta^{1/2}$ gives $\mathrm{P}(L_t \ge C) = \mathrm{P}(L_t^2 \ge C^2) \le \beta/C^2 < 1/4$, which completes the proof.
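The parity fact invoked in the proof is easy to confirm numerically (our own sketch):

```python
from itertools import product

def even_heads_prob(m, q):
    """P(even number of heads in m independent flips of a bias-q coin)."""
    return sum(q ** sum(bits) * (1 - q) ** (m - sum(bits))
               for bits in product([0, 1], repeat=m) if sum(bits) % 2 == 0)

p, q = 0.3, 0.7
for m in range(1, 8):
    assert abs(even_heads_prob(m, q) - (0.5 + 0.5 * (p - q) ** m)) < 1e-12
```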
Finally, we use Lemma A to prove Lemma 3.3.
Proof of Lemma 3.3. Suppose $x \in I_l$ and y < x, and define d = x − y + 1. It suffices to find a lower bound for the probability that x and y collide at time T, since this implies that m(x) = y. For k = 0, 1, . . ., let $S_k$ be the set consisting of y, x and the cards between them after k shuffles have been performed. Note that we can couple $\{S_k : k \ge 0\}$ with a thinning process $\{V_k : k \ge 0\}$ in such a way that if $V_k$ is successful then $|S_k| = |V_k|$. It follows from Lemma A (with C the constant appearing in its statement) that the probability that x and y are within a distance C from each other after $\lceil \log_2 d \rceil - l$ steps is at least γ/d for a universal constant γ. Furthermore, if x and y are within distance C of each other, there is probability bounded away from 0 that in the next step all the cards between them will be removed and that x and y will collide in the step following that. Since $\mathrm{P}(T_l - 2 = \lceil \log_2 d \rceil - l) = 2^{\lceil \log_2 d \rceil - 2l + 1}$, it follows that the probability that x and y collide at time $T_l$ is at least
$$\gamma' \, \frac{2^{\lceil \log_2 d \rceil - 2l + 1}}{d}$$
for a universal constant γ′ > 0. This expression is at least c/x for a constant c that depends only on p.
