Markov chains on finite fields with deterministic jumps

We study the Markov chain on $\mathbf{F}_p$ obtained by applying a function $f$ and adding $\pm\gamma$ with equal probability. When $f$ is a linear function, this is the well-studied Chung--Diaconis--Graham process. We consider two cases: when $f$ is the extension of a rational function which is bijective, and when $f(x)=x^2$. In the latter case, the stationary distribution is not uniform and we characterize it when $p\equiv 3\pmod{4}$. In both cases, we give an almost linear bound on the mixing time, showing that the deterministic function dramatically speeds up mixing. The proofs involve establishing bounds on exponential sums over the union of short intervals.


Introduction
Recent work of Chatterjee and Diaconis studied how Markov chains can be sped up by interspersing a deterministic function between steps of the Markov chain [5]. For "most" bijective functions, the mixing time becomes logarithmic in the size of the state space. They asked for concrete examples where this speedup takes place. Currently, one of the only known examples is the Chung-Diaconis-Graham process, which is a Markov chain on the finite field $\mathbf{F}_p$ given by
$$X_{n+1}=aX_n+\varepsilon_{n+1},\qquad(1.1)$$
where the $\varepsilon_n$ are uniform on $\{-1,0,1\}$ and independent. While the walk $X_{n+1}=X_n+\varepsilon_{n+1}$ mixes in order $p^2$ steps, for certain $a\in\mathbf{F}_p$ and almost all $p$, the walk defined by (1.1) mixes in order $\log(p)$ steps [11].
The purpose of this paper is to study the mixing times of non-linear analogues of (1.1). In particular, for bijections which are extensions of rational functions on $\mathbf{F}_p$, we are able to show that the mixing time is at most of order $p^{1+\varepsilon}$ for any $\varepsilon>0$. This provides more examples of Markov chains which mix faster after adding deterministic jumps. We also consider the case when $f(x)=x^2$. We consider a slightly different version of the Markov chains considered in [5], and so we also explain how to apply our methods to the Markov chains appearing there.
1.1. Main results. Let us now define the class of functions $f$ that we will work with. For a prime $p$ and $d\in\mathbf{N}$, let $B(p,d)$ denote the set of functions $f:\mathbf{F}_p\to\mathbf{F}_p$ which are bijections, and for which there exist coprime polynomials $P,Q\in\mathbf{F}_p[x]$ of degree at most $d$ such that $f=P/Q$ except at the zeroes of $Q$, and such that $P/Q$ is not a linear or constant function.
Remark 1.1. Some examples of functions $f$ to keep in mind are permutation polynomials, which are polynomials $P\in\mathbf{F}_p[x]$ that are also bijections, and the function $f(x)=x^{-1}$ for $x\neq 0$ and $f(0)=0$. A specific family of permutation polynomials is given by $f(x)=x^3$ for primes $p>3$ with $\gcd(3,p-1)=1$.
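As a quick sanity check on the examples in Remark 1.1, the following Python sketch tests bijectivity by brute force. The helper names `is_bijection_mod_p` and `inverse_or_zero` are ours and purely illustrative, and the primes tested are an arbitrary small selection.

```python
from math import gcd

def is_bijection_mod_p(f, p):
    """Return True if x -> f(x) mod p permutes {0, ..., p-1}."""
    return len({f(x) % p for x in range(p)}) == p

def inverse_or_zero(x, p):
    """x -> x^{-1} for x != 0 and 0 -> 0, via Fermat's little theorem (p odd prime)."""
    return pow(x, p - 2, p)

# x -> x^3 should be a bijection exactly when gcd(3, p - 1) = 1.
cube_results = {p: is_bijection_mod_p(lambda x: x ** 3, p) for p in (5, 7, 11, 13, 17)}

# The extended inverse map should be a bijection for every odd prime tested.
inverse_results = {
    p: is_bijection_mod_p(lambda x, p=p: inverse_or_zero(x, p), p) for p in (5, 7, 11)
}

for p, ok in cube_results.items():
    print(p, ok, gcd(3, p - 1) == 1)
```

Both maps are given by rational functions of degree at most 3 that are neither linear nor constant, so whenever the check succeeds the map lies in $B(p,3)$.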
Theorem 1.2. Let $p$ be a prime and $d\in\mathbf{N}$. Let $f\in B(p,d)$, and let $\gamma\in\mathbf{F}_p$ be non-zero. Let $\varepsilon_i$ denote independent random variables uniform on $\pm\gamma$. Consider the lazy Markov chain defined by
$$X_{n+1}=\begin{cases}f(X_n)+\varepsilon_{n+1} & \text{with probability } 1/2,\\ X_n & \text{with probability } 1/2.\end{cases}$$
Then if $\pi$ denotes the uniform distribution on $\mathbf{F}_p$, for any $\varepsilon>0$, there is some constant $C=C(\varepsilon,d)$ such that
$$\sup_{X_0\in\mathbf{F}_p}\|\mathbf{P}(X_n\in\cdot)-\pi\|_{TV}\le e^{-C\frac{n}{p^{1+\varepsilon}}}.$$
Remark 1.3. The mixing time $t_{mix}(\varepsilon)$ of an ergodic Markov chain $P$ is the minimum $n$ such that
$$\sup_{X_0}\|\mathbf{P}(X_n\in\cdot)-\pi\|_{TV}\le\varepsilon,$$
where $\pi$ is the stationary distribution. The total variation bound in Theorem 1.2 (and the corresponding bounds in Theorems 1.8 and 4.5) implies that $t_{mix}(\varepsilon)$ is $O(p^{1+\delta})$ for any $\delta>0$, assuming that $f$ is the extension of a rational function of bounded degree. We also remark that the $O(p^{1+\delta})$ could be improved to $O(p\log^C p)$ for some large constant $C$ (probably $C=100$ would work). Since the bound we obtain is most likely far from optimal (see Remark 1.6), we do not bother to explicitly state the stronger bound.
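To make the definition of the mixing time concrete, the following Python sketch computes $t_{mix}(1/4)$ exactly (up to floating point) for the lazy chain on a small field by evolving the distribution from every starting point. It is only an illustration under our own choices: $p=31$, $\gamma=1$, the extended inverse map as the bijection, and the lazy walk with $f(x)=x$ as a linear baseline. The function names are hypothetical.

```python
def lazy_step(dist, f, gamma, p):
    """One step of the lazy chain: stay w.p. 1/2, else move to f(x) +/- gamma."""
    new = [0.5 * mass for mass in dist]
    for x, mass in enumerate(dist):
        fx = f(x) % p
        new[(fx + gamma) % p] += 0.25 * mass
        new[(fx - gamma) % p] += 0.25 * mass
    return new

def tv_to_uniform(dist):
    """Total variation distance between dist and the uniform distribution."""
    p = len(dist)
    return 0.5 * sum(abs(m - 1.0 / p) for m in dist)

def mixing_time(f, gamma, p, eps=0.25, max_steps=100000):
    """Smallest n such that the worst-case TV distance to uniform is <= eps."""
    dists = [[1.0 if y == x else 0.0 for y in range(p)] for x in range(p)]
    for n in range(1, max_steps + 1):
        dists = [lazy_step(d, f, gamma, p) for d in dists]
        if max(tv_to_uniform(d) for d in dists) <= eps:
            return n
    raise RuntimeError("did not mix within max_steps")

p = 31
t_inverse = mixing_time(lambda x: pow(x, p - 2, p), 1, p)  # f(x) = 1/x, f(0) = 0
t_identity = mixing_time(lambda x: x, 1, p)                # no deterministic jump
print(t_inverse, t_identity)
```

The uniform distribution is used as the target because $f$ is a bijection in both cases, so the transition matrix is doubly stochastic.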
Remark 1.4. The assumption that the $\varepsilon_i$ are uniform on $\pm\gamma$ is easily relaxed. By considering the bijection $f+\gamma_0$, we can choose the $\varepsilon_i$ to be supported on any two point set. Since the bound from Cheeger's inequality can only get worse as edges are removed, this extends to any distribution $\mu$ for the $\varepsilon_i$ as long as $\mu$ is supported on at least two points, with the constant depending on $\min_{\gamma\in\operatorname{supp}\mu}\mu(\gamma)$.
The laziness can also be removed at the cost of some assumptions on the distribution of the $\varepsilon_i$, see Theorem 4.5.
Remark 1.5. This random walk is a slight modification of the walks considered in [5], which always apply the function $f$, but possibly add 0 instead of $\pm\gamma$. A small modification to the argument allows a similar result to be established in this setting as well, see Theorem 4.5. This gives a large class of examples where adding a deterministic bijection dramatically speeds up the mixing time. Finding such examples was a question raised in [5].
Remark 1.6. By an entropy argument, it's easy to see that at least order log(p) steps are necessary in order for the random walk to be close to uniform. This leaves a large gap between the upper and lower bounds and determining the correct order of the mixing time seems to be a challenging problem.
The proof of Theorem 1.2 involves applying a version of Cheeger's inequality for directed graphs to reduce the problem to giving a lower bound for the number of solutions to $y=f(x)\pm\gamma$ with $x\in S$ and $y\in S^c$. This bound is obtained using some ideas from analytic number theory, namely exponential sum bounds, with a key ingredient being the Weil bound. While applications of the Weil bound have previously appeared in the study of expanders (see [6] for example), our approach does not require the eigenvalues to be exactly identified. This provides a more flexible approach, but gives a weaker bound.
The main technical tools are exponential sum bounds of the form
where the $I_j$ are disjoint intervals of length $L$, and $C$ is some constant depending on $f$, along with bounds for linear exponential sums. These bounds appear to be new, although the proof of (1.2) closely follows ideas of Browning and Haynes [4], who proved the bound when $f(x)=1/x$.
Finally, we also give some results for the square-and-add Markov chain, which is the case when $f(x)=x^2$. This is obviously not a bijection, but when $p\equiv 3\pmod 4$, the stationary distribution can be determined and a similar mixing time bound can be established.
Theorem 1.7. Let $p$ be a prime, $p\equiv 3\pmod 4$, and let $\gamma\in\mathbf{F}_p$ be non-zero. Let $\varepsilon_i$ denote independent random variables uniform on $\pm\gamma$. Define a Markov chain on $\mathbf{F}_p$ by $X_{n+1}=X_n^2+\varepsilon_{n+1}$. Then the chain has a unique recurrent communicating class, and thus a unique stationary distribution. The stationary distribution is given by
$$\pi(\alpha)=\frac{|\{\beta\in\mathbf{F}_p\mid\beta^2=\alpha-\gamma\}|+|\{\beta\in\mathbf{F}_p\mid\beta^2=\alpha+\gamma\}|}{2p}.$$
Furthermore, the Markov chain is aperiodic if $\gamma=1$.
Then if $\pi$ denotes the stationary distribution on $\mathbf{F}_p$, for any $\varepsilon$, there is some constant
These results partially answer some questions raised in [10]. The case of $p\equiv 1\pmod 4$ seems to be more mysterious and even the stationary distribution is not well-understood. Some heuristics and a conjecture are given in Section 2.2.
1.2. Related work. The Markov chains considered in this paper can be viewed as a non-linear analogue of the Chung-Diaconis-Graham process, defined by $X_{n+1}=aX_n+\varepsilon_{n+1}$ on $\mathbf{F}_p$ (or more generally on $\mathbf{Z}/(p)$ for composite $p$). This was first introduced in [8], and studied in [3, 15-17, 20]. Recent work of Eberhard and Varjú established cutoff for this Markov chain for many values of $a$ [11]. Specifically, they show that when $a=2$, the Markov chain has cutoff at $c\log_2 p$ for an explicit $c\approx 1.01136$ for almost all $p$, and extend this to many other values of $a$. These works all rely heavily on the linear nature of the problem.
Chatterjee and Diaconis studied how deterministic jumps can increase the speed of convergence for Markov chains [5]. They showed that for a state space of size n, applying a (fixed) random bijection gives a mixing time of order log(n) with high probability, but noted that specific examples were hard to find. The random walks treated in this paper provide some progress in this direction. This fits into a broader theme of finding ways to accelerate the convergence of Markov chains to their stationary distribution [1,2,14].
The Markov chain when $f(x)=x^2$ studied in this paper can be viewed as a random version of a discrete dynamical system on $\mathbf{F}_p$. The dynamical system defined by iterating the map $x\mapsto x^2+c$ has been well-studied although many questions remain, see [18,21,22].
The exponential sum bounds are obtained using arguments that closely follow those of [4], who considered the case when $f(x)=1/x$. Their work is based on an argument of Heath-Brown [13]. These arguments work for any finite field, not just prime fields. It would be interesting to see if these ideas could be useful in studying the mixing times of random walks on $\mathbf{F}_q$, for $q$ a prime power.
1.3. Outline. Section 2 establishes some basic properties of the square-and-add Markov chain, including the proof of Theorem 1.7. In Section 3, we prove the exponential sum bounds needed, which are the main technical tool. Finally, in Section 4, we apply the bounds from Section 3 to prove Theorem 1.2, and explain how to extend the arguments to establish Theorem 1.8 and to study some of the random walks defined in [5].
1.4. Notation. Let $e_p(x)=\exp(2\pi i x/p)$. We will use $C$ to denote a constant that may change from line to line.

The square-and-add Markov chain
In this section, we establish some basic properties of the Markov chain when $f(x)=x^2$. The reason this case is more involved is that $f$ is not a bijection, and so even the stationary distribution is not obvious. While the case of $p\equiv 1\pmod 4$ seems wild, surprisingly the case of $p\equiv 3\pmod 4$ is tractable.
2.1. The case $p\equiv 3\pmod 4$. The reason that the case of $p\equiv 3\pmod 4$ is much easier is that in this case, for non-zero $\alpha$, exactly one of $\alpha$ and $-\alpha$ is a quadratic residue mod $p$. Using this fact, we are able to prove all essential properties about the Markov chain.
If $\beta=\pm\gamma$, the only difference is that there is an extra term $|\{\rho\mid\rho^2=\pm\gamma\}|=2$ on the left hand side of (2.1) corresponding to $\alpha=0$, which is not paired. But of course this corresponds to the solution to $\rho^2=0$ on the right hand side of (2.1).
Lemma 2.2. The Markov chain defined in Theorem 1.7 has a unique recurrent communicating class, and the restriction to this component is aperiodic if $\gamma=1$.
Proof. Any recurrent state must be of the form $\alpha^2\pm\gamma$, and so must be contained in $\operatorname{supp}\pi$. Conversely, since $\pi$ is a stationary distribution, every element of $\operatorname{supp}\pi$ is recurrent. It thus suffices to show that $\operatorname{supp}\pi$ contains a single communicating class. We do so as follows.
We claim for any α, β ∈ supp π, we can find a sequence of moves α → −α and α → α ± 2γ taking us from α to β, while staying entirely inside supp π. We'll then show that these moves stay within a single communicating class, from which we can conclude that supp π contains a single communicating class.
To see that we can go from $\alpha$ to $\beta$ using these moves while staying in $\operatorname{supp}\pi$, we argue as follows. At least one of $\alpha$ and $-\alpha$ lies in $\operatorname{supp}\pi$, since either $\alpha+\gamma$ or $-\alpha-\gamma$ is a quadratic residue as $p\equiv 3\pmod 4$. Similarly, either $\alpha,\alpha+2\gamma\in\operatorname{supp}\pi$, or $-\alpha,-\alpha-2\gamma\in\operatorname{supp}\pi$, as one of $\alpha+\gamma$ and $-\alpha-\gamma$ will be a quadratic residue. A similar statement holds for $\alpha$ and $\alpha-2\gamma$. We thus partition $\mathbf{F}_p$ into sets $\{\pm\alpha\}$ and $\{0\}$, and note that each set contains an element of $\operatorname{supp}\pi$. Starting from $\alpha\in\operatorname{supp}\pi$, we can then move to an element in $\{\pm(\alpha+2\gamma)\}$ while staying inside $\operatorname{supp}\pi$, by either moving directly from $\alpha$ to $\alpha+2\gamma$ if it lies in $\operatorname{supp}\pi$, or moving from $\alpha$ to $-\alpha$, and then to $-\alpha-2\gamma$, which are both guaranteed to lie in $\operatorname{supp}\pi$ if $\alpha+2\gamma$ does not. Similarly, we can move to an element in $\{\pm(\alpha-2\gamma)\}$. In this way, we can reach an element in any set of the form $\{\pm(\alpha+2k\gamma)\}$. But this exhausts $\mathbf{F}_p$, since $2\gamma$ is non-zero as $p\neq 2$, so we can reach either $\beta$ or $-\beta$. If we reach $-\beta$, then as $\beta$ by assumption also lies in $\operatorname{supp}\pi$, we can move from $-\beta$ to $\beta$.
Now we show the claims that if $\alpha,-\alpha\in\operatorname{supp}\pi$, then $\alpha$ and $-\alpha$ belong to the same communicating class, and that if $\alpha,\alpha+2\gamma\in\operatorname{supp}\pi$, then they belong to the same communicating class.
Suppose that $\alpha$ and $-\alpha$ both lie in $\operatorname{supp}\pi$. Then as there is a path from $\alpha$ to $\alpha^2+\gamma$, and $\alpha\in\operatorname{supp}\pi$ so it must be recurrent, there must be a path from $\alpha^2+\gamma$ to $\alpha$. But then there is a path from $-\alpha$ to $\alpha^2+\gamma$, and then to $\alpha$. Thus, $\alpha$ and $-\alpha$ must belong to the same communicating class.
Now suppose that $\alpha$ and $\alpha+2\gamma$ both lie in $\operatorname{supp}\pi$. If $\alpha+\gamma$ is a quadratic residue, then note that one of its square roots lies in $\operatorname{supp}\pi$ (it was already shown that one of $\alpha$ and $-\alpha$ lies in $\operatorname{supp}\pi$ for any $\alpha$). This square root must then be recurrent, and so from $\alpha$, we must be able to reach it, after which we can move to $\alpha+2\gamma$. If $\alpha+\gamma$ is not a quadratic residue, then $-\alpha-\gamma$ is a quadratic residue, and so both $-\alpha$ and $-\alpha-2\gamma$ lie in $\operatorname{supp}\pi$ and by the same argument lie in the same communicating class. We can then use the fact that $-\alpha$ and $\alpha$ lie in the same communicating class, as well as $-\alpha-2\gamma$ and $\alpha+2\gamma$, to conclude that $\alpha$ and $\alpha+2\gamma$ lie in the same communicating class.
To see that the random walk is aperiodic if $\gamma=1$, first note that there is a cycle of length 2, namely $0\to 1\to 0$, and so it suffices to find an odd cycle. Now if a path from $\alpha^2+1$ to $\alpha$ were of even length, we would immediately be done because then after taking the step $\alpha\to\alpha^2+1$, we would obtain an odd cycle, so assume otherwise. That is, assume for all $\alpha$ that there are only paths of odd length from $\alpha^2+1$ to $\alpha$. Then moving from $-\alpha$ to $\alpha^2+1$, and then back to $\alpha$, is a path of even length. If $\alpha+1$ is a quadratic residue, we obtain an odd length path going from $\alpha+2$ to $\sqrt{\alpha+1}$, and then moving to $\alpha$ gives an even length path from $\alpha+2$ to $\alpha$. If $-\alpha-1$ were a quadratic residue, we would obtain an even length path from $-\alpha-2$ to $-\alpha$. We can similarly assume paths from $\alpha^2-1$ to $\alpha$ are of odd length, and obtain even length paths from either $\alpha-2$ to $\alpha$ or $-\alpha+2$ to $-\alpha$. But then using these we can construct an even length path from $\alpha^2+1$ back to $\alpha$, since we proved above that we can move from any element in $\operatorname{supp}\pi$ to any other using these moves. Thus, there must be an odd cycle.
and it can be checked that the numerators are indeed given by counting solutions to $\beta^2\pm 1=\alpha$.
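The description of the stationary distribution by counting solutions can be checked in exact arithmetic for small primes. The Python sketch below assumes the normalization $\pi(\alpha)=N(\alpha)/(2p)$, where $N(\alpha)$ counts pairs $(\beta,\pm)$ with $\beta^2\pm\gamma=\alpha$. The denominator $2p$ and all function names are our assumptions for illustration. The last line shows that the same candidate is not stationary for $p=13$, which is $1 \pmod 4$.

```python
from fractions import Fraction

def candidate_stationary(p, gamma):
    """pi(a) = #{(b, sign) : b^2 + sign*gamma = a mod p} / (2p)."""
    counts = [0] * p
    for b in range(p):
        for sign in (1, -1):
            counts[(b * b + sign * gamma) % p] += 1
    return [Fraction(c, 2 * p) for c in counts]

def one_step(dist, p, gamma):
    """Push a distribution through X -> X^2 +/- gamma (each sign w.p. 1/2)."""
    new = [Fraction(0)] * p
    for x, mass in enumerate(dist):
        for sign in (1, -1):
            new[(x * x + sign * gamma) % p] += mass / 2
    return new

def is_stationary(p, gamma):
    """Check that the candidate is a probability vector fixed by one step."""
    pi = candidate_stationary(p, gamma)
    return sum(pi) == 1 and one_step(pi, p, gamma) == pi

# Primes congruent to 3 mod 4, several non-zero values of gamma.
checks_3_mod_4 = {(p, g): is_stationary(p, g) for p in (7, 11, 19, 23) for g in (1, 2, 3)}

# A prime congruent to 1 mod 4, where the pairing argument is unavailable.
check_1_mod_4 = is_stationary(13, 1)
print(all(checks_3_mod_4.values()), check_1_mod_4)
```

Exact rational arithmetic is used so that the equality $\pi P=\pi$ is tested without rounding error.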

2.2. The case $p\equiv 1\pmod 4$. The case when $p\equiv 1\pmod 4$ is more wild, but there is some evidence to suggest that much of the behavior is similar to that of a random directed graph with certain degree constraints.
The following heuristic was suggested by Alex Cowan (private communication). For convenience, we work with the graph whose edges are given by $(\alpha,\beta)$ for $\beta=(\alpha\pm 1)^2$. Let $\pi'$ denote the stationary distribution of this walk. It's clear that the stationary distribution of the original walk can be recovered from the new one by
$$\pi(\alpha)=\tfrac12\bigl(\pi'(\alpha-1)+\pi'(\alpha+1)\bigr),$$
since one step of the original walk from $X_n$ lands at $X_n^2\pm 1$ and the new walk tracks $X_n^2$.
The idea is that the degree distribution of the graph is determined, with about half the vertices having indegree 0 and half having indegree 4 (we just ignore 0, 1, and $-1$). Now the key observation is that the vertices with indegree 0 come in pairs, as if $\alpha$ is not a quadratic residue, then neither is $-\alpha$. This means that after removing all vertices with indegree 0, if the edges were random subject to this constraint, then 1/4 of the vertices have indegree 0, 1/2 have indegree 2 and 1/4 have indegree 4. If we assume that the resulting graph is random, then by applying work of Cooper and Frieze [9], we can obtain a precise conjecture for the size of the support.
This conjecture agrees with computer computations done by Steve Butler, which suggest that asymptotically the support of $\pi$ contains 58% of the elements in $\mathbf{F}_p$.
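For small primes the size of the support can be explored by brute force, since the support of a stationary distribution consists of recurrent states. The Python sketch below is our own illustration and not the computation referred to above: it takes $\gamma=1$, computes reachability by depth-first search, and calls a state recurrent when every state reachable from it can reach it back. The function name and the primes chosen are hypothetical.

```python
def recurrent_fraction(p, gamma=1):
    """Fraction of states of x -> x^2 +/- gamma (mod p) that are recurrent."""
    succ = [{(x * x + gamma) % p, (x * x - gamma) % p} for x in range(p)]
    reach = []  # reach[x] = set of states reachable from x in at least one step
    for start in range(p):
        seen = set(succ[start])
        stack = list(seen)
        while stack:
            x = stack.pop()
            for y in succ[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        reach.append(seen)
    # x is recurrent iff x can be reached back from everything it can reach.
    recurrent = [x for x in range(p) if all(x in reach[y] for y in reach[x])]
    return len(recurrent) / p

# Primes congruent to 1 mod 4; the values are only small-p data points.
fractions_1_mod_4 = {p: recurrent_fraction(p) for p in (101, 257, 401)}
print(fractions_1_mod_4)
```

The running time is quadratic in $p$, so this is only suitable for small experiments. No claim is made that these small primes already exhibit the asymptotic proportion.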

Counting solutions with exponential sum bounds
To obtain lower bounds on the Cheeger constant, after a reduction we will need to show that $f(x)\pm\gamma=y$ has many solutions for $x\in S$ and $y\in S'$, where $S$ and $S'$ are disjoint unions of arithmetic progressions. This will be shown following the argument of Browning-Haynes [4], who actually proved the needed result in the case of $f(x)=x^{-1}$ (the sum was restricted to $\mathbf{F}_p^\times$ but this is unimportant). We give the modifications needed to generalize to the class of functions $B(p,d)$ as well as the function $f(x)=x^2$. The key is the Weil bound for exponential sums. We use the following convenient formulation given in [12, Lemma 3].
where the constant C depends only on the degree of P and Q.
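A classical special case of the Weil bound can be tested numerically. The Python sketch below is an illustration only: it evaluates Kloosterman sums $\sum_{x\neq 0}e_p(ax+bx^{-1})$, which are complete exponential sums of the rational function $(ax^2+b)/x$, and compares them with the classical bound $2\sqrt p$ valid for $ab\neq 0$. The constant 2 is the one for Kloosterman sums and is not taken from the formulation in [12]. The prime $p=101$ and the function names are our choices.

```python
import cmath
from math import sqrt

def e_p(x, p):
    """The additive character e_p(x) = exp(2*pi*i*x/p)."""
    return cmath.exp(2j * cmath.pi * (x % p) / p)

def kloosterman_sum(a, b, p):
    """Sum over non-zero x of e_p(a*x + b/x), with 1/x computed as x^(p-2)."""
    return sum(e_p(a * x + b * pow(x, p - 2, p), p) for x in range(1, p))

p = 101
# Largest modulus over all non-zero a and a few non-zero b.
worst = max(abs(kloosterman_sum(a, b, p)) for a in range(1, p) for b in (1, 2, 3))
print(worst, 2 * sqrt(p))
```

The point of the comparison is the square-root cancellation: the trivial bound for a sum of $p-1$ unit vectors is $p-1$.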
We now give bounds for averaged exponential sums over intervals. A separate elementary argument is given when f (x) = x 2 . Let f ∈ B(p, d). Then for any interval I, and any k ∈ F × p , p n=1 x∈I where the constant C depends only on d.
Proof. We have p n=1 x∈I where the second equality follows from the fact that the sum over α enforces the These averaged estimates feed into an argument originally due to Heath-Brown [13], to give the following bound. We include the proof for completeness.
where the constant depends only on d.
Proof. By replacing $f(x)$ with $f(\delta x)$ (or in the case of $f(x)=x^2$, absorbing $\delta^2$ into $k$), we may work with intervals $I_j$ instead of arithmetic progressions. The proof then proceeds exactly as in the proof of Theorem 2 in [4] (this argument is originally due to Heath-Brown [13]). Let
$$S(n,h)=\sum_{x=n+1}^{n+h}e_p(kf(x)).$$
Let $x_j$ denote the first element in the interval $I_j$. Then the sum we are interested in is simply
Then by Cauchy's inequality,
and summing over $j$ gives
where we use the fact that the intervals are disjoint. Now pick $t$ an integer so that $2L\le 2^t\le 4L$, and for each $n\in\mathbf{F}_p$, pick $l^*$ depending on $n$ so that $|S(n,l^*)|=\max_{l\le 2L}|S(n,l)|$. Then writing the binary expansion $l^*=\sum_{s\in D}2^{t-s}$, we have
where $v_{n,s}=\sum_{u\in D,\,u<s}2^{s-u}<2^s$.
Then we have
Finally, we can sum over $n$ and use Lemma 3.2 or Lemma 3.3 to obtain
We now prove a bound for linear exponential sums over a union of intervals.
for some constant $C>0$.
Proof. First, note that the sum is the same if we replace the arithmetic progressions $I_j$ by intervals $I_j/\delta$, so we may assume that $\delta=1$ and the $I_j$ are all intervals. Let $x_j$ denote the first element in each interval $I_j$, so $I_j=I_0+x_j$ for $I_0=\{1,2,\dots,L\}$. By relabeling the intervals if necessary, we will assume that the $x_j$ are ordered. Since the intervals are disjoint, if $j\neq j'$ then $|x_j-x_{j'}|\ge L$.
The idea is to now break the sum over $k$ up into intervals of length $p/L$, and take advantage of the different scales at which the oscillations for the sums over $j$ and
where we use $\bigl|\sum_{x\in I_0}e_p(kx)\bigr|\le\min(L,p/2|k|)$. Now note that for any interval $I'$ of length $p/L$,
where $x_j-x_{j'}$ is the representative mod $p$ between $-p/2$ and $p/2$. But now by the spacing condition and the ordering of the $x_j$, we have $|x_j-x_{j'}|\ge|j-j'|L$ (where again $j-j'$ is the representative between $-J/2$ and $J/2$ mod $J$). Thus,
and combined with (3.2) this gives
Finally, this together with (3.1) gives the desired inequality.
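The elementary bound for a linear exponential sum over an interval used in this proof, $|\sum_{x\in I_0}e_p(kx)|\le\min(L,p/2|k|)$ with $|k|$ the absolute value of the representative of $k$ between $-p/2$ and $p/2$, can be confirmed numerically. The following Python sketch does so for one arbitrary choice of $p$ and $L$; the helper names are ours.

```python
import cmath

def linear_sum(k, L, p):
    """Modulus of sum_{x=1}^{L} e_p(k*x)."""
    return abs(sum(cmath.exp(2j * cmath.pi * k * x / p) for x in range(1, L + 1)))

def centered(k, p):
    """Representative of k mod p lying in (-p/2, p/2]."""
    k %= p
    return k - p if k > p // 2 else k

p, L = 103, 20
# Collect every non-zero frequency k for which the bound would fail.
violations = [
    k for k in range(1, p)
    if linear_sum(k, L, p) > min(L, p / (2 * abs(centered(k, p)))) + 1e-9
]
print(violations)
```

The bound follows from summing the geometric series and using $|\sin(\pi k/p)|\ge 2|k|/p$, so the list of violations is expected to be empty.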
Using the exponential sum bounds established in this section, we can prove the following estimate for the number of solutions to $f(x)=y$ with $x\in S$ and $y\in S'$ when $S$ and $S'$ are unions of arithmetic progressions.
Proposition 3.6. Let $S\subseteq\mathbf{F}_p$ be a disjoint union of $J$ arithmetic progressions $I_j$ of length $L$ and common difference $\delta\in\mathbf{F}_p^\times$, and let $S'$ be a disjoint union of $J$ arithmetic progressions of length $L'$ and common difference $\delta$. Suppose that $f\in B(p,d)$ or $f(x)=x^2$, and suppose that $JLL'\ge p^{3/2+\varepsilon}$ for some $\varepsilon>0$. Then for large enough $p$,
where $c>0$ is some constant depending only on $\varepsilon$ and $d$.

Mixing time bounds
The mixing time bounds for the cases when $f\in B(p,d)$ and when $f(x)=x^2$ are both established using the same overall argument. This involves using Cheeger's inequality for directed graphs to reduce to a problem of counting edges between two sets, and then utilizing the results in Section 3 to show that there are many edges. The argument for the case when $f(x)=x^2$ is complicated by the fact that the stationary distribution does not have full support, and we explain how to work around this.
Let $P$ be an ergodic Markov chain on $X$ with stationary distribution $\pi$. Define the Cheeger constant $h(P)$ by
$$h(P)=\min_{S\subseteq X}\frac{\sum_{x\in S,\,y\in S^c}\pi(x)P(x,y)}{\min(\pi(S),\pi(S^c))},$$
where $S$ ranges over all non-trivial subsets of $X$. Note that if $P$ is simple random walk on a $k$-regular (i.e. both in and out-degrees are $k$) directed graph $G$, then this definition reduces to
$$h(P)=\min_{S\subseteq X}\frac{e(S,S^c)}{k\min(|S|,|S^c|)},\qquad(4.1)$$
where $e(S,S^c)$ is the number of edges going from $S$ to $S^c$ in $G$. When $f$ is a bijection, our random walk falls into this case. Note also that because $\pi$ is the stationary distribution,
$$\sum_{x\in S,\,y\in S^c}\pi(x)P(x,y)=\sum_{x\in S^c,\,y\in S}\pi(x)P(x,y),$$
and in particular for a random walk on a regular directed graph, $e(S,S^c)=e(S^c,S)$. Thus, we may consider edges in both directions up to a factor of 2.
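For very small $p$ the Cheeger constant of the walk $x\to f(x)\pm\gamma$ can be computed by enumerating subsets. The Python sketch below assumes the regular-graph normalization $e(S,S^c)/(k\min(|S|,|S^c|))$ with $k=2$, and restricts to $|S|\le p/2$, which loses nothing because $e(S,S^c)=e(S^c,S)$. The choices $p=11$, $\gamma=1$ and the function name are ours, and the identity map is included only as a linear baseline outside the class $B(p,d)$.

```python
from itertools import combinations

def cheeger_constant(p, f, gamma):
    """Brute-force Cheeger constant of the walk x -> f(x) +/- gamma on F_p.

    Assumes f is a bijection, so the directed graph is 2-regular.
    """
    edges = [(x, (f(x) + s * gamma) % p) for x in range(p) for s in (1, -1)]
    k = 2  # out-degree; the in-degree is also 2 because f is a bijection
    best = None
    for size in range(1, p // 2 + 1):
        for subset in combinations(range(p), size):
            S = set(subset)
            crossing = sum(1 for x, y in edges if x in S and y not in S)
            ratio = crossing / (k * size)
            if best is None or ratio < best:
                best = ratio
    return best

p, gamma = 11, 1
h_inverse = cheeger_constant(p, lambda x: pow(x, p - 2, p), gamma)  # f(x) = 1/x
h_identity = cheeger_constant(p, lambda x: x, gamma)                # f(x) = x
print(h_inverse, h_identity)
```

The enumeration visits on the order of $2^{p}$ subsets, which is exactly why the proofs in this section reduce to structured sets instead.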
The key tool for the bounds on the mixing times is Cheeger's inequality for nonreversible Markov chains. The following theorem follows from some standard facts about the spectral theory of Markov chains. See for example page 23 of [19], noting that $(I+P)/2$ is lazy, and $h((I+P)/2)=h(P)/2$ (see also [7] for the special case of random walk on a directed graph).
Theorem 4.1. Let $P$ be an ergodic Markov chain on $X$ with stationary distribution $\pi$. Consider the lazy chain $X_n$ with transition matrix $(I+P)/2$, and starting from some deterministic $X_0$. Then if
$$n\ge 4h(P)^{-2}\Bigl(\max_{x\in X}\log(\pi(x)^{-1})+2c\Bigr)$$
for some $c>0$, we have
$$\sup_{X_0\in X}\|\mathbf{P}(X_n\in\cdot)-\pi\|_{TV}\le e^{-c}.$$
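The arithmetic behind the almost linear bound can be made explicit: plugging a Cheeger constant of order $p^{-1/2-\varepsilon/2}$ and the uniform distribution into Theorem 4.1 gives a number of steps of order $p^{1+\varepsilon}\log p$. The Python sketch below evaluates the threshold; the function name and the sample values of $p$, $\varepsilon$ and $c$ are hypothetical.

```python
from math import ceil, log

def lazy_mixing_steps(h, pi_min, c):
    """Smallest integer n with n >= 4 * h^-2 * (log(1/pi_min) + 2c)."""
    return ceil(4 / h ** 2 * (log(1 / pi_min) + 2 * c))

# Uniform stationary distribution on F_p and a Cheeger bound p^(-1/2 - eps/2).
p, eps, c = 10007, 0.1, 1.0
h_lower = p ** (-0.5 - eps / 2)
steps = lazy_mixing_steps(h_lower, 1 / p, c)
print(steps, steps / p ** (1 + eps))
```

The ratio printed last is $4(\log p+2c)$ up to rounding, which is the logarithmic factor absorbed into $p^{1+\varepsilon}$ in the statement of Theorem 1.2.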

4.1. Proof of Theorem 1.2. We show that in the minimum over $S$, the set $S$ can be assumed to have some structure, at the cost of a constant. Since we are only interested in the order of the mixing time, the loss of a constant is okay.
First, we define a decomposition of any set $S\subseteq\mathbf{F}_p$ into certain types of arithmetic progressions. We will call an arithmetic progression with difference $d$ a $d$-AP. A $2\gamma$-AP decomposition of $S$ is a decomposition $S=\bigcup I_k$ where the $I_k$ are $2\gamma$-APs, such that the number of $I_k$'s is minimal. Such a decomposition always exists and is unique.
It is easy to see that if S has a 2γ-AP decomposition into J 2γ-APs, then the same is true of S c .
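The decomposition is easy to compute: the progressions are the maximal runs of $S$ along the cycle $x\to x+2\gamma$. The Python sketch below is one way to do this, with a hypothetical function name and an arbitrary example in $\mathbf{F}_{13}$ with $\gamma=1$. It also illustrates that a non-trivial set and its complement decompose into the same number of progressions.

```python
def ap_decomposition(S, step, p):
    """Split S (a subset of F_p) into maximal arithmetic progressions with difference step."""
    S = {x % p for x in S}
    if len(S) == p:
        # The whole field is a single progression wrapping around the cycle.
        return [[(i * step) % p for i in range(p)]]
    # Left endpoints are the x in S whose predecessor x - step is not in S.
    starts = [x for x in sorted(S) if (x - step) % p not in S]
    progressions = []
    for start in starts:
        run, x = [], start
        while x in S:
            run.append(x)
            x = (x + step) % p
        progressions.append(run)
    return progressions

p, gamma = 13, 1
S = {0, 2, 4, 7, 9, 1}
complement = set(range(p)) - S
parts = ap_decomposition(S, 2 * gamma, p)
co_parts = ap_decomposition(complement, 2 * gamma, p)
print(parts, co_parts)
```

Since $2\gamma\neq 0$ and $p$ is prime, the map $x\to x+2\gamma$ is a single $p$-cycle, so every run starting in a proper subset terminates.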
We now show that when computing $\min_S\frac{e(S,S^c)}{\min(|S|,|S^c|)}$, we may assume that there are at most $|S|p^{-1/2-\varepsilon/2}$ $2\gamma$-APs in the $2\gamma$-AP decomposition of $S$. Thus, let $\mathcal{P}$ denote the set of $S$ whose $2\gamma$-AP decomposition contains at most $|S|p^{-1/2-\varepsilon/2}$ $2\gamma$-APs.
Proof. Assume that $|S|\le|S^c|$. Write $S=\bigcup I_k$ for the $2\gamma$-AP decomposition. Note that each right endpoint of a $2\gamma$-AP contributes an edge from either $S$ to $S^c$ or from $S^c$ to $S$, because if $x$ is a right endpoint, then $x+2\gamma\in S^c$ and $x$ and $x+2\gamma$ are both connected to $f^{-1}(x+\gamma)$. If there were more than $|S|p^{-1/2-\varepsilon/2}$ many $2\gamma$-APs, then this would imply
$$\frac{e(S,S^c)}{\min(|S|,|S^c|)}\ge p^{-1/2-\varepsilon/2}.$$
The $2\gamma$-AP decompositions of sets in $\mathcal{P}$ have arithmetic progressions of different lengths. The following lemma lets us reduce to the case when all arithmetic progressions have the same length.
Lemma 4.3. Let $S=\bigcup_{j=1}^{J}I_j\subseteq\mathbf{F}_p$ be a disjoint union of arithmetic progressions with common difference $\delta\in\mathbf{F}_p^\times$. Let $L=|S|/J$ denote the average length of the $I_j$. Then $S$ contains $J$ disjoint arithmetic progressions $I'_j$ with common difference $\delta$ and length $\lfloor L/4\rfloor$.
Proof. First, note that it suffices to prove the result for intervals. We have $\sum_{|I_j|\ge L/2}|I_j|\ge|S|/2$, and by splitting the intervals $I_j$ with $|I_j|\ge L/2$ up into intervals of length exactly $\lfloor L/4\rfloor$, we throw away at most $JL/4$ points. Thus, we obtain disjoint intervals of length $\lfloor L/4\rfloor$, whose total length is at least $|S|/4$, and so there must be at least $J$ such intervals.
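The splitting step in this proof can be sketched in a few lines of Python. The representation of intervals as `(start, length)` pairs of integers and the function name are our own conventions, and the sketch assumes $\lfloor L/4\rfloor\ge 1$. For simplicity it splits every interval, which can only produce more pieces than splitting the long ones alone.

```python
def equal_length_pieces(intervals):
    """Cut disjoint intervals into pieces of common length floor(L/4), L = average length.

    intervals: list of (start, length) pairs. Returns (piece_length, pieces).
    """
    J = len(intervals)
    total = sum(length for _, length in intervals)
    piece = total // (4 * J)  # floor(L / 4) with L = total / J
    if piece == 0:
        raise ValueError("average length is below 4; no pieces of positive length")
    pieces = []
    for start, length in intervals:
        # Each interval yields floor(length / piece) pieces; the remainder is discarded.
        for i in range(length // piece):
            pieces.append((start + i * piece, piece))
    return piece, pieces

intervals = [(0, 3), (10, 40), (60, 5), (100, 32)]
piece, pieces = equal_length_pieces(intervals)
print(piece, len(pieces), len(intervals))
```

In this example the average length is 20, so the pieces have length 5, and the count of pieces comfortably exceeds the number of original intervals, as the lemma guarantees.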
We are now in a position to prove Theorem 1.2 using the bounds given by Proposition 3.6.
Proof of Theorem 1.2. By Cheeger's inequality, it suffices to show that the Cheeger constant is bounded from below by $p^{-1/2-\varepsilon/2}$. By Lemma 4.2, it suffices to give a lower bound for $\min_{S\in\mathcal{P}}\frac{e(S,S^c)}{\min(|S|,|S^c|)}$ of order $p^{-1/2-\varepsilon/2}$ (in fact when $S\in\mathcal{P}$ we'll give a constant order lower bound). Thus, take $S\in\mathcal{P}$ and assume that $|S|\le|S^c|$.

4.2. Proof of Theorem 1.8. We now sketch the adjustments that have to be made to handle the case of $f(x)=x^2$. The main difficulty lies in the fact that the support of the stationary distribution $\operatorname{supp}\pi$ is not all of $\mathbf{F}_p$. To adapt the argument from the bijective case, we take advantage of the fact that when $p\equiv 3\pmod 4$, at least one of $\alpha$ and $-\alpha$ lies in $\operatorname{supp}\pi$, and that $\pi$ is close to uniform on its support. Consider now the random walk restricted to $\operatorname{supp}\pi$. Since the walk enters this set after a single step, it suffices to bound the mixing time for the restricted walk. Now although the walk is not a simple random walk on a regular graph, the transition probabilities are of constant order in $p$ and by Theorem 1.7 the stationary distribution is essentially constant up to a factor of 4. Thus, (4.1) holds up to some constant factor, and so we provide a lower bound for $e(S,S^c)/\min(|S|,|S^c|)$.
We now adapt the $2\gamma$-AP decomposition to this setting. A set $S\subseteq\operatorname{supp}\pi$ is symmetric if it is of the form $\tilde S\cap\operatorname{supp}\pi$ for a set $\tilde S\subseteq\mathbf{F}_p$ with $\tilde S=-\tilde S$. A symmetric $2\gamma$-AP is a set of the form $(J\cup -J)\cap\operatorname{supp}\pi$, where $J$ is a $2\gamma$-AP contained in $\{0,1,\dots,(p-1)/2\}$. A $2\gamma$-AP decomposition of a symmetric set $S\subseteq\operatorname{supp}\pi$ is a decomposition $S=\bigcup I_k$ where the $I_k$ are of the form $I_k=\operatorname{supp}\pi\cap(J_k\cup -J_k)$, and the $J_k$ are $2\gamma$-APs contained in $\{0,1,\dots,(p-1)/2\}$, such that the number of $I_k$'s is minimal. It is clear that this exists and is unique, and moreover the $2\gamma$-APs $J_k$ are uniquely determined as well.
The idea is to use this as a replacement for the 2γ-AP decomposition used in the proof for f ∈ B(p, d).
Let $\mathcal{P}$ denote the set of subsets $S\subseteq\operatorname{supp}\pi$ which are symmetric, and whose $2\gamma$-AP decomposition contains at most $J\le|S|p^{-1/2-\varepsilon/2}$ $2\gamma$-APs.
Proof. Assume that $|S|\le|S^c|$, which can be done since $e(S,S^c)$ and $e(S^c,S)$ differ by at most a multiplicative constant, since $\pi$ is essentially constant on $\operatorname{supp}\pi$ up to a factor of 4. Note that if $\alpha\in S$ and $-\alpha\in S^c$, then as both are connected to $\alpha^2\pm\gamma$, this contributes 2 edges from $S$ to $S^c$. Thus, if there are more than $|S|^{1/2-\varepsilon/2}$ such pairs, we have
$\frac{e(S,S^c)}{\min(|S|,|S^c|)}$
Otherwise, replace $S$ with the set $S'=S\cup\{\alpha\in S^c\mid-\alpha\in S\}$. Let $n=|S'|-|S|$, and note that $n\le|S|^{1/2-\varepsilon/2}$ and $e(S,S^c)\ge 2n$. For large enough $p$, we have
This implies that the minimum can be restricted to symmetric $S$, at the cost of the 1/16 factor.
As in the proof of Lemma 4.2, each right endpoint of a $2\gamma$-AP appearing in the decomposition of $S$ contributes an edge from $S$ to $S^c$. This is because if $x$ is the right endpoint of a $2\gamma$-AP $J$ appearing in the decomposition, one of $x+\gamma$ and $-x-\gamma$ is a quadratic residue, and thus either $x\in S$ and $x+2\gamma\in S^c$ contributes an edge, or $-x\in S$ and $-x-2\gamma\in S^c$ contributes an edge.
With this replacement for Lemma 4.2, we can now prove Theorem 1.8.
Proof of Theorem 1.8. As in the proof of Theorem 1.2, it suffices to give a lower bound for $e(S,S^c)/|S|$ when $S\in\mathcal{P}$ and $|S|\le|S^c|$. A small extension of Lemma 4.3 then reduces to sets $S_1$ and $S_2$ both of comparable size to $S$ and $S^c$ respectively, with $S_1$ and $S_2$ being disjoint unions of symmetric $2\gamma$-APs of lengths $L_1$ and $L_2$ respectively.
Note that if $J$ is a $2\gamma$-AP appearing in the decomposition of $S_1$, then if $x\in J$, at least one of $x$ and $-x$ belongs to $S_1$. If there is an edge from $x\in J$ to $S_2$, there is also an edge from $-x$ to $S_2$, so if $S_1=\bigcup_j(J_j\cup -J_j)\cap\operatorname{supp}\pi$, then $e(S_1,S_2)\ge e(\bigcup_j J_j,S_2)$. Further, if $S_2=\bigcup_j(K_j\cup -K_j)\cap\operatorname{supp}\pi$, then any edge landing in $\bigcup_j K_j$ must also land in $\operatorname{supp}\pi$, because $\operatorname{supp}\pi$ contains all elements of the form $\alpha^2\pm\gamma$, and so $e(\bigcup_j J_j,S_2)\ge e(\bigcup_j J_j,\bigcup_j K_j)$.
Since the sets have been written as a disjoint union of 2γ-APs, we can now apply Proposition 3.6 and proceed as before.

4.3. A variant of the lazy chain. Here, we sketch the adjustments needed to deal with the original Markov chains appearing in [5]. In the Markov chains considered there, the function $f$ is always applied, but $\varepsilon_n=0$ with positive probability. Note that the arguments in [5] only work when $f$ is bijective, even though the Markov chain when $f(x)=x^2$ still makes sense, so we restrict to the setting where $f\in B(p,d)$.
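Theorem 4.5. Let $p$ be a prime and $d\in\mathbf{N}$. Let $f\in B(p,d)$, and let $\gamma\in\mathbf{F}_p$ be non-zero. Let $\varepsilon_i$ denote independent random variables uniform on $\{-\gamma,0,\gamma\}$. Consider the Markov chain defined by
$$X_{n+1}=f(X_n)+\varepsilon_{n+1}.$$
Then if $\pi$ denotes the uniform distribution on $\mathbf{F}_p$, for any $\varepsilon$, there is some constant $C=C(\varepsilon,d)$ such that
$$\sup_{X_0\in\mathbf{F}_p}\|\mathbf{P}(X_n\in\cdot)-\pi\|_{TV}\le e^{-C\frac{n}{p^{1+\varepsilon}}}.$$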
By the arguments of Section 2 in [5], it suffices to bound the second largest eigenvalue of a symmetrization using Cheeger's inequality, which reduces to showing that $\frac{e(S,S^c)}{\min(|S|,|S^c|)}$ is large in the graph determined by the random walk $Pf^{-1}P^2fP$, where $P$ is the transition matrix for taking a random step of either 0 or $\pm\gamma$ with equal probability, and $f$ is the permutation matrix corresponding to the bijection $f$. Actually, to make a later argument work, we instead work with $PfP^2f^{-1}P$, which has the same (non-zero) spectrum. Now certainly, it cannot hurt to throw away edges, and so we instead consider the bijective function $g(x)=f(f^{-1}(x)+\gamma)$, and consider the graph with edges connecting $x$ to $g(x)\pm\gamma$ (we are forcing a lazy first step from $P$, and after applying $f^{-1}$ one lazy step from $P$ and then adding $\gamma$). Now from the arguments in Section 4.1, if the set $S$ has a $2\gamma$-AP decomposition with too many intervals, then $e(S,S^c)$ will be large since every interval contributes an edge, and otherwise we can apply the exponential sum bounds of Section 3, with an analogous version of Lemma 3.2.
We first need the following result to apply the Weil bound in this situation.
Lemma 4.6. Let $P,Q\in\mathbf{F}_p[x]$ be coprime and of degree at most $p/4$, and suppose that $P(x)/Q(x)$ is not constant or linear. Then for all $\alpha,\beta,\gamma\in\mathbf{F}_p$ with $\beta,\gamma\neq 0$, the function $\alpha P(x+\gamma)/Q(x+\gamma)+\beta P(x)/Q(x)$ is not a constant function.
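The auxiliary map $g(x)=f(f^{-1}(x)+\gamma)$ used in this reduction is easy to tabulate for a concrete bijection. The Python sketch below does this for $f(x)=x^3$ on $\mathbf{F}_{11}$ with $\gamma=1$. These choices and the function name are ours, made only for illustration. It confirms that $g$ is again a bijection and that $g\circ f$ agrees with $x\mapsto f(x+\gamma)$.

```python
def conjugated_shift(f, p, gamma):
    """Table of g(x) = f(f^{-1}(x) + gamma) for a bijection f of F_p."""
    table = [f(x) % p for x in range(p)]
    inverse = [0] * p
    for x, y in enumerate(table):
        inverse[y] = x  # f(x) = y, so f^{-1}(y) = x
    return [table[(inverse[x] + gamma) % p] for x in range(p)]

p, gamma = 11, 1
cube = lambda x: pow(x, 3, p)  # a permutation polynomial since gcd(3, 10) = 1
g_table = conjugated_shift(cube, p, gamma)

# Out-neighbours of x in the graph considered above are g(x) + gamma and g(x) - gamma.
neighbours = {x: ((g_table[x] + gamma) % p, (g_table[x] - gamma) % p) for x in range(p)}
print(g_table)
```

Because $g$ is a composition of bijections, the resulting graph is again 2-regular, so the edge-counting form of the Cheeger constant applies to it.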