The q-voter model on the torus

In the $q$-voter model, the voter at $x$ changes its opinion at rate $f_x^q$, where $f_x$ is the fraction of neighbors with the opposite opinion. Mean-field calculations suggest that there should be coexistence between opinions if $q<1$ and clustering if $q>1$. This model has been extensively studied by physicists, but we do not know of any rigorous results. In this paper, we use the machinery of voter model perturbations to show that the conjectured behavior holds for $q$ close to 1. More precisely, we show that if $q<1$, then for any $m<\infty$ the process on the three-dimensional torus with $n$ points survives for time $n^m$, and after an initial transient phase has a density that is always close to 1/2. If $q>1$, then the process rapidly reaches fixation on one opinion. It is interesting to note that in the second case the limiting ODE (on its sped-up time scale) reaches 0 at time $\log n$ but the stochastic process on the same time scale dies out at time $(1/3)\log n$.


Introduction
In the linear voter model, the state at time $t$ is $\xi_t : \mathbb{Z}^d \to \{0, 1\}$, where 0 and 1 are two opinions. The individual at $x$ changes opinion at a rate equal to the fraction $f_x$ of its neighbors with the opposite opinion. For the last decade physicists have studied the $q$-voter model, in which the flip rate at $x$ is $f_x^q$. When $q$ is an integer, the dynamics may be thought of as: select $q$ neighbors of $x$ uniformly with replacement, and change the opinion of $x$ if all $q$ neighbors disagree with $x$. However, there is no reason to restrict $q$ to be an integer. Abrams and Strogatz [1] introduced this system in 2003 as a model of language death, and argued based on data on languages in 42 regions that $q = 1.31 \pm 0.25$. In the physics literature there have been many studies of the system on lattices, complex networks, and even on graphs that co-evolve with the state of individuals. See [6,17,21,24,25,27,28] and references therein. According to [24], for finite but large systems, the process with $q < 1$ can remain in a dynamically active phase for observation times that grow exponentially with $n$, while for $q > 1$ the transition into an absorbing state is 'abrupt'.
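The dynamics just described can be sketched as a small Gillespie-type simulation. This is only an illustration under stated assumptions: the nearest-neighbor neighborhood (six neighbors), the function names, and the parameter values are hypothetical choices and are not taken from the paper.

```python
import random

def neighbors(site, L):
    """Six nearest neighbors of `site` on the L x L x L torus (assumed neighborhood)."""
    x, y, z = site
    return [((x + 1) % L, y, z), ((x - 1) % L, y, z),
            (x, (y + 1) % L, z), (x, (y - 1) % L, z),
            (x, y, (z + 1) % L), (x, y, (z - 1) % L)]

def flip_rate(state, site, L, q):
    """Flip rate f_x^q, where f_x is the fraction of neighbors that disagree with x."""
    nbrs = neighbors(site, L)
    f = sum(state[nb] != state[site] for nb in nbrs) / len(nbrs)
    return f ** q if f > 0 else 0.0

def simulate_q_voter(L, q, t_max, seed=0):
    """Run the q-voter model from a random initial state; return the final density of 1's."""
    rng = random.Random(seed)
    sites = [(x, y, z) for x in range(L) for y in range(L) for z in range(L)]
    state = {s: rng.randint(0, 1) for s in sites}
    t = 0.0
    while True:
        # Recompute all rates each step: O(n) per event, fine for tiny tori only.
        rates = [flip_rate(state, s, L, q) for s in sites]
        total = sum(rates)
        if total == 0.0:          # absorbed in all 0's or all 1's
            break
        t += rng.expovariate(total)
        if t > t_max:
            break
        site = rng.choices(sites, weights=rates)[0]
        state[site] = 1 - state[site]
    return sum(state.values()) / len(sites)
```

For example, `simulate_q_voter(4, 0.8, 50.0)` and `simulate_q_voter(4, 1.3, 50.0)` can be compared to get a rough feel for the two regimes, although a torus this small says nothing about the asymptotic results.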
The difference between $q < 1$ and $q > 1$ is due to the different types of frequency dependence in the two models. When $q < 1$, rare opinions spread more rapidly than in the voter model, while for $q > 1$ they spread more slowly. A more quantitative viewpoint is provided by mean-field theory. This analysis is often done by writing an equation that pretends sites are always independent of each other. Here, we will instead consider the system on the complete graph, in which each site interacts equally with all the others. In this case, the frequency of 1's, $u$, satisfies $du/dt = -u(1-u)^q + (1-u)u^q = u(1-u)g(u)$, where $g(u) = u^{q-1} - (1-u)^{q-1}$. This system has three fixed points: 0, 1/2 and 1.
• If $q < 1$, $g(u)$ decreases from $\infty$ to $-\infty$ as $u$ increases from 0 to 1. So the fixed points 0 and 1 are unstable and the interior one is attracting. In this case it is expected that coexistence occurs.
• If $q > 1$, $g(u)$ increases from $-1$ to $1$ as $u$ increases from 0 to 1. So the fixed points 0 and 1 are stable and the interior one is unstable. In this case it is expected that clustering occurs. That is, we will see larger and larger regions occupied by one type.
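The stability claims in the two cases above can be checked numerically with a minimal sketch that integrates the mean-field equation by the forward Euler method. The step size, time horizon, and function names are assumptions chosen for illustration.

```python
def g(u, q):
    """g(u) = u^(q-1) - (1-u)^(q-1), defined for 0 < u < 1."""
    return u ** (q - 1) - (1 - u) ** (q - 1)

def drift(u, q):
    """Right-hand side of the mean-field ODE: -u(1-u)^q + (1-u)u^q."""
    return -u * (1 - u) ** q + (1 - u) * u ** q

def integrate(u0, q, t_max=200.0, dt=0.01):
    """Forward Euler integration of du/dt = drift(u, q), clamped to [0, 1]."""
    u = u0
    for _ in range(int(t_max / dt)):
        u += dt * drift(u, q)
        u = min(max(u, 0.0), 1.0)
    return u

# q < 1: trajectories started away from 0 and 1 approach the interior fixed point 1/2.
u_coexist = integrate(0.1, 0.9)
# q > 1: a trajectory started below 1/2 is driven toward 0.
u_fixate = integrate(0.4, 1.1)
```

Under these assumptions `u_coexist` ends up close to 1/2 and `u_fixate` close to 0, in line with the heuristic picture of coexistence versus clustering.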
For more on the heuristics that lead to these conclusions, see the 1994 paper by Durrett and Levin [12]. In most of the papers in the physics literature, the analysis is done by using the pair-approximation, which is equivalent to supposing that the state of the system is always a Markov chain.
Recently, Vasconcelos, Levin, and Pinheiro [29] have considered a version of the $q$-voter model in which the powers $q_1$ and $q_0$ for flipping to 1 and 0 can be different. They did this to study complex contagions, which have been used to model the spread of idioms and hashtags on Twitter [26] and in many other situations; see the book by Centola [7]. When $q_1 \neq q_0$, there arise situations in which one opinion dominates the other, see Figure 2a in [29], but the case $q_1 = q_0$ seems to capture all of the interesting behavior.

Voter model perturbations
The linear voter model has a rich theory due to its duality with coalescing random walk. This duality exists because the process can be constructed from a graphical representation. See Section 2.1 for details. However, the inherent asymmetry between 1's and 0's in the graphical representation makes it impossible to construct nonlinear voter models where the flip rates depend only on f x . See Section 2.2 for a proof.
To get around this difficulty, we will suppose $q$ is close to 1 and view the system as a voter model perturbation in the sense of Cox, Durrett, and Perkins [10]. On $\mathbb{Z}^d$, this theory requires $d \ge 3$ so that the voter model has a one-parameter family of stationary distributions $\nu_u$, $0 \le u \le 1$. For this and other elementary facts about the voter model that we use, see Liggett's 1999 book [23].
In general, the rate of flipping from $i$ to $j \neq i$ in a voter perturbation has the form $f_j + \varepsilon h_{i,j}(x, \xi)$, where $f_j$ is the fraction of neighbors in state $j$, and $h_{i,j}(x, \xi)$ is the perturbation to the rate of flipping from $i$ to $j$. Usually the perturbation variable is $\varepsilon$, but here it will be convenient to let $\varepsilon = \delta^2$. To simplify formulas we will assume $h_{i,j}(x, \xi) = 0$ when $\xi(x) \neq i$. Here we will consider the special case in which the neighborhood has size $k$ and the flip rate only depends on the number of neighbors $n(x)$ in state $j$: for $1 \le n(x) \le k$.
The $r^k_i$ do not have to be nonnegative, see (1.7) in [10], but we will suppose $r^k_0 = 0$ so that $\xi \equiv 0$ and $\xi \equiv 1$ are absorbing states. For simplicity, we will restrict our attention to three dimensions. In that context, we will consider neighborhoods $x + \mathcal{N}$ with $0 \notin \mathcal{N}$ and $|\mathcal{N}| \ge 3$ chosen so that the group generated by $\mathcal{N}$ is $\mathbb{Z}^3$.
q-voter model. The rate at which a site $x$ flips to 0 in the $q$-voter model is $f_x^q$, where $f_x$ is the fraction of neighbors with the opposite opinion. Suppose for the moment that $q < 1$. In this case, if we write $f_x^q = f_x + (f_x^q - f_x)$, then the term in parentheses is $\ge 0$. Let $q = 1 - \delta^2$ and write $u$ instead of $f_x$. Then $u^q - u = u(u^{-\delta^2} - 1) = u(\exp(\delta^2 \log(1/u)) - 1) \approx \delta^2 u \log(1/u)$.
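From this we see that if $q < 1$, then the perturbation is $(i/k)\log(k/i)$ when $i$ of the $k$ neighbors have the opposite opinion, which vanishes when $i = 0$ or $k$. If we let $q = 1 + \delta^2$ and again write $u$ instead of $f_x$, then $u^q - u = u(u^{\delta^2} - 1) = u(\exp(\delta^2 \log u) - 1) \approx -\delta^2 u \log(1/u)$.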
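The first-order approximations $u^q - u \approx \pm\delta^2 u \log(1/u)$ can be checked numerically. The sketch below compares the exact difference with the approximation at the neighbor fractions $u = i/k$; the values $k = 6$ and $\delta = 0.01$ and the function names are assumptions chosen for illustration.

```python
import math

def exact_perturbation(u, delta, sign):
    """u^q - u with q = 1 - delta^2 (sign = +1) or q = 1 + delta^2 (sign = -1)."""
    q = 1 - sign * delta ** 2
    return u ** q - u

def first_order(u, delta, sign):
    """Leading-order term: sign * delta^2 * u * log(1/u)."""
    return sign * delta ** 2 * u * math.log(1 / u)

def max_relative_error(k, delta, sign):
    """Largest relative error over the fractions u = i/k with 0 < i < k."""
    worst = 0.0
    for i in range(1, k):
        u = i / k
        exact = exact_perturbation(u, delta, sign)
        approx = first_order(u, delta, sign)
        worst = max(worst, abs(exact - approx) / abs(approx))
    return worst
```

The relative error is of order $\delta^2 \log(1/u)$, so it shrinks as $q \to 1$; at $u = 1$ (that is, $i = k$) both expressions vanish.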

ODE limit
Following the approach of Cox and Durrett [8], who used the voter perturbation machinery to study evolutionary games on the torus in dimension $d \ge 3$, we will consider the $q$-voter model in what they called the weak-selection regime. (For results in the strong-selection regime see Section 1.4.) Let $\mathbb{T}_n$ be the three-dimensional torus with $n$ points and hence side length $L = n^{1/3}$. Let $\varepsilon_n = \delta_n^2$. The first thing to do is to prove convergence of the density of 1's, to the solution of an ODE. Let $\rho^i_m$ denote the probability that in $\nu_u$ the origin is in state $i$ while exactly $m$ of the neighbors are in state $1 - i$. We write $a_n \ll b_n$ for positive quantities $a_n$ and $b_n$ to indicate $a_n/b_n \to 0$ as $n \to \infty$.
Theorem 1. Suppose $q = 1 - \varepsilon_n$ with $n^{-1} \ll \varepsilon_n \ll n^{-2/3}$. If $U_n(0) \to u_0$ then $U_n(t)$ converges uniformly on compact sets to the solution of the ODE
Intuitively, Theorem 1 holds due to a separation of time scales. The voter model runs at a fast rate, so when the density is $u$ on the torus, the system has distribution $\approx \nu_u$. The rate of change of the density can then be computed by looking at the expected rate of change when the state is $\nu_u$. Writing $\langle \cdot \rangle_u$ for expected value with respect to $\nu_u$, the right-hand side of the ODE is
This result will be proved by constructing the process on a graphical representation and then defining a dual that is a coalescing branching random walk. The voter part of the process leads to a coalescing random walk. When a perturbation event occurs at a point $x$, the dual branches to include all of the points in $x + \mathcal{N}$. This will be described in detail in Section 2.3. The proof of Theorem 1 is almost identical to the proof of Theorem 6 in Cox and Durrett [8] so we will only outline the proof, referring to [8] for details. When $\varepsilon_n \ll n^{-2/3}$ the particles in the dual have time to wrap around the torus and come to equilibrium in between branching events.
It is known that on the torus if we start two random walks from independent randomly chosen locations, then the time to coalesce is of order $n$. Thus the assumption $\varepsilon_n \gg n^{-1}$ is needed for the perturbation to have an effect. Computing the $r^i_m(u)$, see Section 5, leads to the following ODE.
Theorem 2. In three dimensions, when the neighborhood has size $k$, the limiting ODE is where $f_k(u)$ is a polynomial that is positive on $[0, 1]$ and $f_k(0) = f_k(1) = 1$. We have $+$ for $q < 1$ and $-$ for $q > 1$.
When $q < 1$, the fixed point at 1/2 is attracting and we have
Theorem 3. Suppose $q = 1 - \varepsilon_n$ and $\varepsilon_n \sim Cn^{-a}$ for some $a \in (2/3, 1)$. There is a $T_0$ that only depends on $u_0$, so that for any $\gamma > 0$ and $m < \infty$, if $n$ is large then with high probability
Here and in what follows "with high probability" means with probability $\to 1$ as $n \to \infty$. To prove Theorem 3, we will follow the approach of Huo and Durrett [20], who proved a similar result for the latent voter model on a random graph generated by the configuration model. Although the random graph has a more complicated geometry than the torus, the proof in that setting is simpler than the one given here, since on the graph random walks mix in time $O(\log n)$ rather than in time $O(n^{2/3})$.
Outline of the proof of Theorem 3.
• Section 3.1 introduces a general result for proving convergence of stochastic processes to limiting ODEs, due to Darling and Norris [11], which is the key to the proofs of the persistence results for our model (and for the latent voter model). The main difficulty is to bound the difference between the drift in the density $U_n$ of the particle system and the drift in the ODE. In particular, one must prove that the drift of $U_n$, which is a function of the configuration, is almost a function of the overall density.
• In Section 3.2 we take the first step in the proof, which is to show that if $2/3 < b < a$ then we can ignore the perturbation on $[t/\varepsilon_n - n^b, t/\varepsilon_n]$, i.e., the process will evolve like the voter model. This has the consequence that if there are $n \cdot u$ 1's at time $t/\varepsilon_n - n^b$, then at time $t/\varepsilon_n$ the process is close to the voter equilibrium $\nu_u$. The argument here is an improvement over the one in Section 3.1 of [20]. We use Azuma's inequality to get error estimates that are stretched exponentially small, i.e., $\le C\exp(-cn^{\alpha})$ with $\alpha > 0$, rather than polynomial, i.e., $\le Ct^{-p}$.
• In Section 3.3 we introduce a result about "renormalizing" the voter model that comes from work of Bramson and Griffeath [4] in $d = 3$ and Zähle [30] in $d \ge 3$. They show that if we consider the number of 1's in the voter model equilibrium with density $\lambda$, $\xi^\lambda$, in a cube $Q(r)$ of side $r$, then We use this to obtain information about a similar normalized sum $T_r$ of the number of ones in a cube of side $r$ on the torus at time $t/\varepsilon_n$ when the number of 1's at time $t/\varepsilon_n - n^b$ is $\lambda n$. To be specific, we let $\bar S_n$ be the normalized sum of $\xi^\lambda_{\sigma(n)}(x)$ in the process that starts at time 0 from product measure with density $\lambda$ and is run for time $\sigma(n) = n^{0.6}$. We show that $\bar S_r \le \bar T_r \le S_r$, where $\bar T_r$ is a small modification of $T_r$.
• In Section 3.4 we bound the difference between $T_r$ and $\bar T_r$. This in turn gives us a bound on the largest coalescing random walk cluster in $T_r$ in $Q(r)$, see (26), and a bound on the fluctuations of the density in the cubes, which is important for completing the next step.
• In Section 3.5 we bound the difference between the drifts in the particle system and the ODE. To do this, we have to show that the empirical finite-dimensional distributions on the torus $\mathbb{T}_n$ are close to the values that come from $\nu_u$. In doing this we rely on the result about the density in cubes proved in Section 3.3 to divide space at time $t/\varepsilon_n + s_n$ into cubes. Here $s_n = n^{(2+\alpha)b(2)/3}$ with $\alpha$ small, so that the empirical f.d.d.'s in cubes of volume $n^{b(3)}$ that do not touch are almost independent. This leads to errors of size $C\exp(-n^{1-b(3)-2\alpha})$.
• In Section 3.6 we put the pieces together to prove the result. As in Section 3.5 of [20] we do this by showing that if the density $U_t$ reaches $|U_t - 1/2| = 4\varepsilon$ then with very high probability (i.e., for any $k$, with probability $\ge 1 - n^{-k}$ for large $n$) it will return to $|U_t - 1/2| \le \varepsilon$ before we have $|U_t - 1/2| > 5\varepsilon$. Taking $\delta = 5\varepsilon$ gives the desired result.
In all of our estimates except those in Sections 3.3 and 3.4, the errors are bounded by stretched exponentially small quantities, so we make the following conjecture.
Conjecture. When $q < 1$ the process persists for time $\exp(n^\beta)$ for some $\beta > 0$.
The conjecture could be proved with a rather small value of $\beta$ if the errors in (24) and (26) could be improved to be stretched exponentially small. Readers familiar with long-time survival results for the contact process, see e.g. Section 3 in Part I of Liggett [23], might expect the conjecture to say survival occurs for time $\exp(\gamma n)$ with $\gamma > 0$. However, the conjecture above cannot hold for $\beta > 1/3$. If we run time backwards from $t/\varepsilon_n$ to $t/\varepsilon_n - n^{2/3}$ then the $n$ initial particles in the coalescing random walk (CRW) will have coalesced to $n^{1/3}$ particles. If all of these happen to land on sites in state 0 at time $t/\varepsilon_n - n^{2/3}$ the process will go extinct at time $t/\varepsilon_n$.

Rapid Extinction when q > 1
When $q > 1$, the fixed point at 1/2 is unstable while the ones at 0 and 1 are locally attracting. To get rid of the constant $c_k$ in the ODE limit we consider
Theorem 4. Suppose $q = 1 + \varepsilon_n$ and $\varepsilon_n \sim Cn^{-a}$ for some $a \in (2/3, 1)$. If $U_n(0) = u_0 < 1/2$ and $\alpha > 1/3$ then $P(U_n(\alpha \log n) = 0) \to 1$ as $n \to \infty$.
This is proved in Section 5. Much of the work for the proof of Theorem 4 has already been done in the proof of Theorem 3. Those results imply that the density in the particle system stays close to the solution of the ODE. To be precise, we can show that with high probability.

Results for strong selection
Let $\xi_t$ be a voter model perturbation on $\mathbb{Z}^d$ with flip rates where $f_j$ is the fraction of neighbors in state $j$ and the second term is the perturbation. As before we let $\varepsilon_n = \delta_n^2$. In this section we will examine the case $\varepsilon_n \gg n^{-2/3}$, which we call the strong selection regime.
Intuitively, the next result says that if we rescale space to δ n T n (recall T n is the three dimensional torus) and speed up time by δ −2 n , then the process converges to the solution of a partial differential equation on R 3 . The torus turns into R 3 in the limit because δ n n −1/3 while the torus has side n 1/3 . To make a precise statement, the first thing we have to do is to define the mode of convergence. To simplify the writing we drop the subscript n on δ. Given r ∈ (0, 1), let a δ = δ r−1 δ, Q δ = [0, a δ ) 3 , and |Q δ | the number of points in Q δ . For x ∈ a δ Z d and ξ ∈ Ω δ , the space of all functions from δZ 3 to S, let We endow Ω δ with the σ-field F δ generated by the finite-dimensional distributions. Given a sequence of measures λ δ on (Ω δ , F δ ) and continuous functions w i , we say that λ δ has asymptotic densities w i if for all 0 < η, R < ∞ and all i ∈ S lim δ→0 sup x∈a δ Z 3 ,|x|≤R Suppose the initial conditions ξ δ 0 have laws λ δ with asymptotic densities w i and let x) the solution of the system of partial differential equations: with initial condition u i (0, x) = w i (x). The reaction term where the brackets are expected value with respect to the voter model stationary distribution ν u in which the densities are given by the vector u.
This result is Theorem 2 in [8]. For more details see that paper. The intuition is similar to that for the ODE limit in Theorem 1. On the fast time scale the voter model runs at rate $\delta^{-2}$ versus the perturbation at rate 1, so the state of the sites near $x$ at time $t$ is always close to the voter equilibrium $\nu_{u(t,x)}$. Thus, we can compute the rate of change of $u_i(t, x)$ by assuming the nearby sites are distributed according to the voter model equilibrium $\nu_{u(t,x)}$.
Cox and Durrett considered evolutionary games on the torus in $d \ge 3$ with game matrix $\mathbf{1} + wG$, where $\mathbf{1}$ is a matrix of 1's. Their $w$ corresponds to our $\varepsilon_n$. When $w = 0$ the system reduces to the voter model. They found convergence to an ODE when $n^{-1} \ll w \ll n^{-2/d}$ and convergence to a PDE when $w \gg n^{-2/d}$. Their results can be used to prove a PDE limit for our system when $\varepsilon_n \gg n^{-2/d}$. Since there are only two opinions we only need one variable $u_1$, which corresponds to our $u$. The $\phi$ in (7) is the same as the right-hand side of our ODE, which should be clear from (4).
In the case of a $2 \times 2$ game with a stable mixed strategy equilibrium that uses strategy 1 with probability $\rho$ and strategy 2 with probability $1 - \rho$, the limiting $\phi(u) = cu(\rho - u)(1 - u)$ with $c > 0$. Here, as in the case $q < 1$, the fixed point $\rho$ is attracting. To translate Theorem 4 in [8] to our situation, we note that $w = \varepsilon_L^2$ and $n = L^d$.
Theorem 6. Suppose that $\varepsilon_n \sim Cn^{-2\alpha/3}$, where $0 < \alpha < 1$, and that we start from a product measure in which each type has positive density. Let $N_1(t)$ be the number of sites occupied by 1's at time $t$. There is a $c > 0$ so that for any $\eta > 0$, if $n$ is large and $\log n \le t \le \exp(cn^{1-\alpha})$, then $N_1(t)/n \in (\rho - \eta, \rho + \eta)$ with high probability.
The intuition behind the answer is that after space is rescaled the volume of the torus is asymptotically $n^{1-\alpha}$. Theorem 6 is a lower bound so it does not rule out survival for time $\exp(cn)$. However, Cox and Durrett proved the following for the contact process with fast voting introduced by Durrett, Liggett, and Zhang [13].
Theorem 7. There is a $C < \infty$ so that extinction in the contact process plus fast voting occurs by time exp
Theorem 6 can be generalized to the $q$-voter model with $q < 1$ since it only relies on the hydrodynamic limit in Theorem 5 and a block construction. Theorem 7 does not extend, because $\xi \equiv 1$ is an absorbing state, and this limits our ability to suddenly kill the process.

Voter model
We begin by describing the graphical representation and duality for the voter model in which the neighbors of $x$ are $x + \mathcal{N}$ and $\mathcal{N} = \{y_1, \ldots, y_k\}$. The state $\xi_t(x)$ of the voter model gives the opinion of the individual at $x$ at time $t$. We write $y \sim x$ to indicate that $y$ is a neighbor of $x$. In the usual voter model, the rate at which the voter at $x$ changes its opinion from $i$ to $j$ is $f_j(x, \xi)$, the fraction of neighbors of $x$ in state $j$. To study the voter model, it is convenient to construct the process on a graphical representation, introduced by Harris [18] and further developed by Griffeath [16]. For each $x \in \mathbb{Z}^d$ and $y \in x + \mathcal{N}$ let $T^{x,y}_m$, $m \ge 1$, be the arrival times of a Poisson process with rate $1/k$. At the times $T^{x,y}_m$, $m \ge 1$, the voter at $x$ decides to change its opinion to match the one at $y$. To indicate this, at time $T^{x,y}_m$ we write a $\delta$ at $x$ and draw an arrow from $y$ to $x$. To calculate the state of the voter model on a finite set, we start at the bottom and work our way up. We think of the 1's in the initial configuration as sources of fluid, the $\delta$'s as dams that block the fluid, while the arrows move the fluid in the direction indicated. Arrows from $y$ to $x$ arrive just after the $\delta$. A nice feature of this approach is that it simultaneously constructs the process for all initial conditions, so that if $\xi_0(x) \le \xi_0'(x)$ for all $x$, then for all $t > 0$ we have $\xi_t(x) \le \xi_t'(x)$ for all $x$.

Figure 3: Voter model graphical representation
To define the dual process starting from $x$ at time $t$, we set $\zeta^{x,t}_0 = x$ and work down the graphical representation. A particle stays at its current location until the first time that it encounters a $\delta$. At this point it jumps across the edge in the direction opposite its orientation. A little thought reveals that the path of a single particle in $\zeta^{x,t}_s$, $0 \le s \le t$, is a random walk that at rate 1 jumps to a randomly chosen neighbor. Intuitively, $\zeta^{x,t}_s$ gives the source at time $t - s$ of the opinion at $x$ at time $t$. That is, $\xi_t(x) = \xi_{t-s}(\zeta^{x,t}_s)$. The example in Figure 3 should help explain the definitions. Here we work backwards to determine the states of the two sites marked by '?'. The dark lines indicate the locations of the two dual particles. The family of particles $\zeta^{x,t}_s$ are coalescing random walks. That is, if a particle $\zeta^{x,t}_s$ lands on the site occupied by $\zeta^{y,t}_s$, the two particles coalesce to form a single particle, and we know that $\xi_t(x) = \xi_t(y)$.
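The relation between the forward construction and the dual can be illustrated with a short sketch. To keep it small, the example uses a cycle with two neighbors per site instead of $\mathbb{Z}^d$; that choice, the function names, and the parameters are assumptions made for illustration. Each event `(time, x, y)` plays the role of a $\delta$ at $x$ together with an arrow from $y$ to $x$.

```python
import random

def sample_events(n_sites, t_max, rng):
    """Voter events (time, x, y): at `time`, site x copies the opinion at y.
    Each ordered neighbor pair on the cycle fires at rate 1/k with k = 2."""
    events = []
    for x in range(n_sites):
        for y in ((x - 1) % n_sites, (x + 1) % n_sites):
            t = rng.expovariate(0.5)
            while t < t_max:
                events.append((t, x, y))
                t += rng.expovariate(0.5)
    events.sort()
    return events

def run_forward(initial, events):
    """Work up the graphical representation from time 0 to t_max."""
    state = list(initial)
    for _, x, y in events:
        state[x] = state[y]
    return state

def trace_dual(x, events):
    """Follow the dual particle started at (x, t_max) down to time 0.
    Whenever it meets an event at its current site it jumps to the source site."""
    pos = x
    for _, site, source in reversed(events):
        if site == pos:
            pos = source
    return pos

rng = random.Random(0)
n_sites = 10
events = sample_events(n_sites, 5.0, rng)
initial = [rng.randint(0, 1) for _ in range(n_sites)]
final = run_forward(initial, events)
# Duality: the opinion at x at time t_max is the initial opinion at the dual's endpoint.
dual_ok = all(final[x] == initial[trace_dual(x, events)] for x in range(n_sites))
```

Because `trace_dual` is a deterministic function of the current position and the remaining events, two dual particles that reach the same site follow the same path from then on, which is the coalescence described above.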
To illustrate the power of duality, we analyze the asymptotic behavior of the voter model on $\mathbb{Z}^d$, proving a result of Holley and Liggett [19]. In dimensions 1 and 2, nearest neighbor random walk is recurrent, so the voter model clusters, i.e., does not depend on $t$ so we drop the superscript $t$. Duality implies The probabilities on the left-hand side of (8) are enough to determine the distribution of the limit $\xi^u_\infty$. Since the limit exists, it is a stationary distribution that we denote by $\nu_u$. Before moving on, we note that the duality equation can be written as $P(\xi^A_t \cap B \neq \emptyset) = P(\zeta^B_t \cap A \neq \emptyset)$, where $\xi^A_t$ is the voter model starting with 1's on $A$ and $\zeta^B_t$ is the coalescing random walk starting with particles on $B$. This holds because the left-hand side is the probability of a path from $A \times \{0\}$ up to $B \times \{t\}$, while the right-hand side is the probability of a path from $B \times \{t\}$ down to $A \times \{0\}$. There are several types of duality. This one is called additive because $\xi^{A \cup B}_t = \xi^A_t \cup \xi^B_t$, a property that holds because $\xi^A_t$ is defined to be the set of sites at time $t$ that can be reached from a path starting in $A$.

Nonlinear voter models
Though it is tempting to try to find a duality like the one between the voter model and coalescing random walk to help analyze the $q$-voter model, in this section we will prove
Claim. Using the graphical representation described in the previous section we cannot construct a voter model in which the flip rates depend only on the number of neighbors with the opposite opinion $n_x$ and are nonlinear.
Proof. For simplicity, we only prove the result when the neighborhood has size 4. Consulting Griffeath's book we see that the only gadgets that can be used in the graphical representation are combinations of arrows and $\delta$'s. To begin, we will consider the set of processes that can be constructed by only using gadgets that have a $\delta$ at $x$ and a number of arrows that point to $x$ from its neighbors. We call these objects arrow-$\delta$s. Since the flip rates only depend on the number of sites, all arrow-$\delta$s with $k$ arrows have the same rate, $a_k$.
• When there is a 1 at x the δ will cause the 1 to flip to a 0. However, the site will only stay a 0 if all neighbors connected to x by arrows are in state 0.
• When there is a 0 at x then the δ does nothing, and the site will flip to 1 if there is at least one neighbor in state 1 connected to x by an arrow.
The number of k-arrow gadgets is k 2 so the flip rates are as follows If we add δ's with no arrows then they will flip 1s even when all their neighbors are 1. If a 2 , a 3 , or a 4 is positive the rate of flipping 1 → 0 is < the rate of flipping 0 → 1. when n x = 1, 2, 3. Adding arrows with no δs will only further increase the rates of flips 0 → 1.

Duality for voter model perturbations
In the previous section we have shown that the q-voter does not have an additive dual. In this section we will introduce a generalization of the graphical representation used in Section 2.1 that allows us to construct voter model perturbations. This idea goes back to [11]. See also Section 2 in [10]. Calculating the state of the process is not as simple as in the additive case, but it does allow us to compute the state of the process on a finite set B at time t by working backwards from time t.
Voter model perturbations have flip rates where $f_j$ is the fraction of neighbors in state $j$. The perturbation function $h_{i,j}$, $j \neq i$, may be negative (and this happens when $q > 1$) but in order for the analysis in [10] to work, there must be a law $q$ of $(Y_1, \ldots, Y_k) \in (\mathbb{Z}^d)^k$ and functions $g_{i,j} \ge 0$, so that for some $\gamma < \infty$, we have In our situation $Y_1, \ldots, Y_k$ are $k$ neighbors in $\mathcal{N}$ and $g_{i,j}$, which does not depend on $\varepsilon$, is the fraction of sites $x + Y_1, \ldots, x + Y_k$ in state $j = 1 - i$ raised to the $q$th power. Suppose now that we have a voter model perturbation of the form (10) which satisfies (11). We construct the voter model portion as in Section 2.1. We call the arrow-$\delta$s voter events. To add the perturbation we let then we set $\xi_t(x) = j$. The uniform random variables slow down the transition rate from the maximum possible rate $r_{i,j}$ to the one appropriate for the current configuration.
To define the dual, we proceed as before. When a particle encounters a $\delta$ associated with a voter event, it jumps to the other end of the arrow. When a particle encounters the head of an arrow associated with a branching event it gives birth to new particles at the other ends of all of the arrows. If either action results in two particles on the same site they coalesce into one. Let $I^{B,t}_s$ be the set of particles at time $t - s$ when we start with particles on $B$ at time $t$. Durrett and Neuhauser [14] called $I^{B,t}_s$ the influence set because
Lemma 1. If we know the values of $\xi_{t-s}$ on $I^{B,t}_s$, then using the graphical representation (including the associated uniform random variables) we can compute the values of $\xi_t$ in $B$ by working our way up the graphical representation starting from time $t - s$ and determining the changes that should be made in the configuration at each jump time.
This fact should be clear from the construction. A formal proof can be found in Section 2.6 of [10]. The computation process, as it is called in [10], is complicated, but is useful because up to time $t/\varepsilon_n$ there will only be $O(1)$ branching events affecting particles in the dual.

Prolonged persistence
In this section, we will prove Theorem 3. The key is to bound the difference between the density of the particle system and the ODE, using a result of Darling and Norris [11]. Section 3.1 describes this result and the work needed to apply it to finish the proof of Theorem 3. Sections 3.2, 3.3, 3.4, and 3.5 complete this work and Section 3.6 gives the final details.

Darling-Norris theorem
To state the result from [11] we need to introduce some notation. Let $\xi_t$ be a continuous-time Markov chain with countable state space $S$ and jump rates $q(\xi, \xi')$. In our case $\xi_t$ will be the state of the $q$-voter model on the torus. We are interested in proving an ODE limit For each $\xi \in S$ we define the infinitesimal drift We let $b$ be the drift of the proposed deterministic limit $x_t$. In our case and only depends on the number of neighbors $k$. The sign is $+$ for $q = 1 - \varepsilon_n$ and $-$ for $q = 1 + \varepsilon_n$. The crucial theorem from [11] is To make this statement meaningful we need more definitions. To measure the size of the jumps we let $\sigma_\theta(y) = e^{\theta|y|} - 1 - \theta|y|$ and let The good sets $\Omega_i$, $i = 0, 1, 2$ are given by The parameters in these events are coupled by the following relationships. If we let $K$ be the Lipschitz constant of the drift $b$ and $\eta$ be the upper bound on the error in the approximation by the differential equation in Theorem 8, then It is clear that our $b(x)$ is Lipschitz continuous. Our assumption that $U_n(0) \to u_0$ implies that $\Omega_0^c = \emptyset$ for large $n$. To bound $P(\Omega_2^c)$, we will choose an $A > 0$ that works well. We begin with a useful lemma: Taking $\theta = \log 2$, we have $E\exp(Z \log 2) = \exp(\lambda)$, so using Chebyshev's inequality we have which proves the result with $\gamma(2) = 2\ln 2 - 1$.
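The exponential-moment computation at the end of this subsection can be checked numerically. The sketch assumes that $Z$ is Poisson($\lambda$) and that the bound in question is $P(Z \ge 2\lambda) \le \exp(-\gamma(2)\lambda)$, which is what Markov's inequality applied to $\exp(Z \log 2)$ gives: $P(Z \ge 2\lambda) \le \exp(\lambda)/2^{2\lambda} = \exp(-(2\ln 2 - 1)\lambda)$. The function names are hypothetical.

```python
import math

def poisson_upper_tail(lam, m):
    """P(Z >= m) for Z ~ Poisson(lam), computed as 1 minus the sum of the pmf below m."""
    term = math.exp(-lam)        # P(Z = 0)
    cdf = 0.0
    for j in range(m):
        cdf += term              # add P(Z = j)
        term *= lam / (j + 1)    # advance to P(Z = j + 1)
    return max(0.0, 1.0 - cdf)

def chernoff_bound(lam):
    """exp(-gamma(2) * lam) with gamma(2) = 2 ln 2 - 1."""
    gamma2 = 2 * math.log(2) - 1
    return math.exp(-gamma2 * lam)

# Compare the exact tail P(Z >= 2*lam) with the bound for a few integer values of lam.
comparison = [(lam, poisson_upper_tail(lam, 2 * lam), chernoff_bound(lam))
              for lam in (1, 5, 20, 50)]
```

In each row of `comparison` the exact tail sits below the bound, and both decay exponentially in $\lambda$.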

Ignoring branching
The remainder of Section 3 is devoted to bounding $P(\Omega_1^c)$. To begin to do this, we return to the original time scale. We define $\bar\xi_s$ to be the same as $\xi_s$ at time $s = t/\varepsilon_n - n^b$, while on the time interval $[t/\varepsilon_n - n^b, t/\varepsilon_n]$, $\bar\xi_s$ only has voter events, ignoring the perturbation. The value $b \in (2/3, a)$ is chosen so that lineages in the dual coalescing random walk will have time to wrap around the torus but, as we will now show, the perturbation will not have much effect. Let $\bar X$ be the density of this new process $\bar\xi$.
We will now show that ignoring the perturbation changes the values of more than $\eta n$ sites only with stretched exponentially small probability.
Step 1. The number of perturbation events $M$ in time $n^b$ is bounded by a Poisson($\lambda$) random variable with $\lambda = Cn^{1+b-a}$, since perturbation events occur at total rate of order $n\varepsilon_n$. Lemma 2 implies that since $\lambda \ge Cn^b$.
Step 2. Let $\eta_t(x) = |\xi_t(x) - \bar\xi_t(x)|$, so that $\eta_t(x) = 1$ means there is a discrepancy between the two processes $\xi_t$ and $\bar\xi_t$ at position $x$. We want to prove that $\sum_x \eta_{t/\varepsilon_n}(x)$ exceeds $\eta n$ only with stretched exponentially small probability. To do this, note that when an edge $(x, y)$ with $\eta_s(x) = 0$ and $\eta_s(y) = 1$ is hit by a voter event (that is, there is an arrival in the Poisson process $T^{x,y}$ or $T^{y,x}$), then the 0 is changed to a 1 with probability 1/2 (when the arrival is in $T^{x,y}$, so that $x$ copies $y$) and the 1 is changed to 0 with probability 1/2 (when the arrival is in $T^{y,x}$). Thus, the change in the number of discrepancies due to voter events is a martingale. The change is always $\le 1$ so if there are $N$ jumps, then by Azuma's inequality If $N$ is the number of changes due to voter events in the time interval $[t/\varepsilon_n - n^b, t/\varepsilon_n]$, then $N \le$ Poisson($n^{b+1}$). By Lemma 2, Note that if $n_0 < 2n^{1+b}$, then $2\exp(-z^2/2n_0) < 2\exp(-z^2/4n^{1+b})$. So, taking $z = \eta n$ and $N = 2n^{1+b}$, we get
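For a martingale whose increments are bounded by 1 in absolute value, the Azuma-Hoeffding inequality gives $P(|M_N| \ge z) \le 2\exp(-z^2/2N)$, which is the form of the expression $2\exp(-z^2/2n_0)$ used in Step 2. As a sanity check, the sketch below compares this bound with the exact two-sided tail of a simple $\pm 1$ random walk. The random walk is a hypothetical stand-in for the discrepancy martingale, chosen because its tail can be computed exactly; the function names are assumptions.

```python
import math

def srw_two_sided_tail(N, z):
    """Exact P(|S_N| >= z) for a simple +/-1 random walk with N steps."""
    total = 0
    for heads in range(N + 1):
        # With `heads` up-steps the walk ends at 2 * heads - N.
        if abs(2 * heads - N) >= z:
            total += math.comb(N, heads)
    return total / 2 ** N

def azuma_bound(N, z):
    """Azuma-Hoeffding bound 2 exp(-z^2 / (2N)) for increments bounded by 1."""
    return 2 * math.exp(-z ** 2 / (2 * N))

# (N, z) pairs with z well above sqrt(N), where the bound is informative.
checks = [(N, z, srw_two_sided_tail(N, z), azuma_bound(N, z))
          for N, z in [(100, 20), (400, 60), (1000, 100)]]
```

With $z = \eta n$ and $N$ of order $n^{1+b}$, the exponent $z^2/2N$ is of order $n^{1-b}$, which is where the stretched exponential decay comes from.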

Bounding the density
The results in the previous section show that on the interval [t/ n − n b , t/ n ] we can ignore the perturbation and assume that the process evolves like the voter model. To understand the distribution of 1's at time t/ n we will use results of Bramson and Griffeath [4], and Zähle [30]. The first reference only treats d = 3. The second covers d ≥ 3 and is more detailed, so we will follow it. Let ζ λ : Z d → {0, 1} have the distribution of the equilibrium of a finite range voter model on Z d with density ν λ . For an explanation of this and the other basic facts about the voter model that we will use, see Liggett's book [23]. For simplicity we will do calculations for the nearest neighbor case. The results are the same in the finite range case, but are more awkward to write since, for example, the limiting normal has a general covariance matrix, we cannot use the reflection principle, etc. To formulate the limit theorem in [30], we will write the process at a fixed time as a random field where φ is a member of a suitable class of test functions. To rescale space, we let Theorem 1 on pages 1265-1266 of [30] shows that in our nearest neighbor case where ⇒ denotes weak convergence as r → ∞, Normal(µ, σ 2 ) is a one-dimensional normal distribution with mean µ and variance σ 2 , and B is the bilinear function Restricting our attention now to d = 3, Zähle's result implies that Bramson and Griffeath [4] prove (18) by the method of moments, which gives In our situation, we need a slightly different result. In particular, these results are for the voter model on Z 3 , and we need a result for the voter model on the 3-d torus. Let where λ is the fraction of sites in state 1 at time t/ n − n b , and Q(r) is a fixed cube with side r = n β with β < 1/3. To prove a limit result for T r we will sandwich it between S r and whereζ λ σ(n) is the voter model on the torus starting from product measure with density λ and run for time σ(n) = n 0.6 . 
To couple this with $T_r$ we create $\bar S_r$ by running coalescing random walks starting at time $t/\varepsilon_n$ from points in $Q(r)$ backwards in time for $\sigma(n)$, and then use independent coin flips with probability $\lambda$ of heads (1) and $1 - \lambda$ of tails (0) to determine the states of the sites.
(i) Except on an event of stretched exponentially small probability, no coalescing random walk will move more than $n^{0.33}$ in any coordinate by time $\sigma(n) = n^{0.6}$.
Proof. We will use a special case of (7.3) on page 553 in Feller volume II [15].
Taking $k = n^{0.6}$ and $x = n^{0.03}$, so that $x\sqrt{k} = n^{0.33}$, it follows that the probability some coalescing random walk starting inside the cube $Q(r)$ and run for time $\sigma(n)$ moves by more than $n^{0.33}$ in any coordinate is $\le 2 \cdot 6r^3 \exp(-(1 - \varepsilon)n^{0.06}/2)$.
Here the 2 comes from using the reflection principle to relate the maximum to the value at time $n^{0.6}$, and 6 is 3 coordinates times 2 signs.
The result (i) implies that with very high probability there is no difference between the coalescing starting from Q(r) with r = n β for β < 1/3, run to time σ(n) = n 0.6 on the torus or on Z 3 .
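The factor 2 in the proof of (i) comes from the reflection principle. The following is a minimal sketch of that step for a discrete-time one-dimensional simple symmetric random walk, which is a stand-in chosen for illustration (the walks in the text run in continuous time on the torus). The function names `endpoint_tail` and `max_tail` are hypothetical. The code computes both probabilities exactly, so the inequality $P(\max_{j \le k} S_j \ge a) \le 2 P(S_k \ge a)$ can be checked for small $k$.

```python
from fractions import Fraction
from math import comb


def endpoint_tail(k, a):
    """P(S_k >= a) for a simple symmetric random walk after k steps."""
    favourable = 0
    for ups in range(k + 1):
        if 2 * ups - k >= a:
            favourable += comb(k, ups)
    return Fraction(favourable, 2 ** k)


def max_tail(k, a):
    """P(max_{j<=k} S_j >= a) for a >= 1, by path counting with absorption at a."""
    alive = {0: 1}      # position -> number of paths that have stayed below a
    absorbed = 0        # number of length-k paths that reach level a
    for step in range(k):
        nxt_alive = {}
        for pos, cnt in alive.items():
            for nxt in (pos - 1, pos + 1):
                if nxt >= a:
                    # every continuation of an absorbed path counts
                    absorbed += cnt * 2 ** (k - step - 1)
                else:
                    nxt_alive[nxt] = nxt_alive.get(nxt, 0) + cnt
        alive = nxt_alive
    return Fraction(absorbed, 2 ** k)


if __name__ == "__main__":
    k, a = 10, 3
    print(max_tail(k, a), 2 * endpoint_tail(k, a))
```

For this discrete walk the reflection principle gives the exact identity $P(\max \ge a) = P(S_k \ge a) + P(S_k > a)$, which is at most $2P(S_k \ge a)$.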
(ii) There is a $\gamma > 0$ so that at all times $t \ge (k+1) n^{2/3}$, the total variation distance between the distribution of a nearest neighbor random walk on the torus and the uniform distribution is $\le (1-\gamma)^k$.
Proof. To prove the result, we use a simple coupling. At time $n^{2/3}$ the distribution of each particle has a density that is $\ge \gamma/n$ at each point of the torus. Hence at time $n^{2/3}$ the distribution has the form $\gamma \mu_n + (1-\gamma) q_n$, where $\mu_n$ is uniform on the torus and $q_n$ is some transition probability. Uncoupled mass at time $(k-1) n^{2/3}$ can be coupled to the uniform distribution with probability $\ge \gamma$ at time $k n^{2/3}$, and the desired result follows.
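The coupling argument in (ii) is a minorization (Doeblin) bound, and it can be checked numerically on a toy example. The sketch below makes several assumptions that are not in the text: it uses a discrete-time lazy walk on a one-dimensional cycle of `m` sites instead of the continuous-time walk on the three-dimensional torus, and the names `lazy_step`, `distribution`, `tv_to_uniform` and `doeblin_gamma` are hypothetical. With $\gamma = m \cdot \min_x p_{t_0}(0,x)$, the total variation distance at time $k t_0$ should be at most $(1-\gamma)^k$.

```python
def lazy_step(p):
    """One step of the lazy walk on the cycle: stay w.p. 1/2, move +-1 w.p. 1/4 each."""
    m = len(p)
    return [0.5 * p[x] + 0.25 * p[(x - 1) % m] + 0.25 * p[(x + 1) % m]
            for x in range(m)]


def distribution(m, t):
    """Distribution at time t of the lazy walk on Z_m started at 0."""
    p = [0.0] * m
    p[0] = 1.0
    for _ in range(t):
        p = lazy_step(p)
    return p


def tv_to_uniform(p):
    """Total variation distance between p and the uniform distribution."""
    m = len(p)
    return 0.5 * sum(abs(px - 1.0 / m) for px in p)


def doeblin_gamma(m, t0):
    """Largest gamma with p_{t0}(0, x) >= gamma / m for every site x."""
    return m * min(distribution(m, t0))


if __name__ == "__main__":
    m, t0 = 11, 40
    gamma = doeblin_gamma(m, t0)
    for k in range(1, 5):
        print(k, tv_to_uniform(distribution(m, k * t0)), (1 - gamma) ** k)
```

Translation invariance of the walk is what lets a single starting point determine $\gamma$ for all starting points.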
Definition of T n . We continue the construction of T r : from the end of the construction of S r at time σ(n), we run the coalescing random walk particles on Z 3 . To assign values to the lineages at time n b we extend the configuration on the torus at that time to be periodic on Z 3 . It follows from (ii) that with very high probability there is no difference between flipping coins at time n 0.6 to determine the states of the sites in the sumS n or continuing to run the coalescing random walks on Z 3 until time n b . Having done this, we no longer perfectly reproduce T n , so we call the result T n . The good news is that when we run the coalescing random walk on Z 3 starting at σ(n), we will have T r ≺ S n . That is, the coalescing random walk clusters in T r are contained in clusters in S r .
To prove the result in (18), Zähle defines a cluster to be a set of sites that coalesce to the same limiting particle, and lets Z r,k , 1 ≤ k ≤ K(r) be the cluster sizes and lets η r,k be independently = 1 with probability λ and = 0 with probability 1 − λ. As she notes in (3.6) on page 1274, If we condition on the Z r,k , then we have a sum of independent random variables. If we let v 2 n = k Z 2 n,k , then using Lyapunov's theorem (see the bottom of page 1275) it follows that where Z is the σ-field generated by the Z r,k and χ is a standard normal. In Lemma 1 on page 1276 in [30] she shows that v 2 n converges in probability to a constant, so if we remove the conditioning we get the same limit. Lemma 2 computes the limit of Ev 2 n and (18) follows. The last argument can be applied toS n to conclude that it converges to a normal distribution. To find the limiting variance we compute x,y∈Q(r) When the coalescing random walks starting from x and y do not coalesce, the states at x and y are independent; otherwise, they are equal. Thus, if we let τ x,y be the time the two coalescing random walks hit, then the above sum is x,y∈Q(r) λ(1 − λ)P (τ x,y ≤ n 0.6 ) Using the local central limit theorem, The right-hand side gives the expected amount of time the two particles spend together. When they hit they spend an exponential rate 2 amount of time together. In addition, they will hit a geometric number of times with success probability β d . Changing variables t = |x − y| 2 /2s, dt = −|x − y| 2 /(2s 2 ) the integral becomes

Consulting Lemma 4 in [30] we find
Using the formula for c 3 it follows that the asymptotic variance forS r is the same as for S r .
Limit theorem for $T_r$. Let $X_{r,k} \prec Y_{r,k} \prec Z_{r,k}$ be the cluster sizes in $\tilde S_r$, $T_r$, and $S_r$. The limiting variances of the unnormalized sums are Since the top and bottom sums have the same asymptotics, this gives us the Gaussian limit theorem for $T_n$. Replacing 2 by $2m$ and recalling that Bramson and Griffeath [4] proved their result for $S_r$ by the method of moments gives the desired results for $T_r$: The last result implies so if we let $\tilde D_r = [\lambda(1-\lambda)]^{1/2} r^{5/2} T_r$ (i.e., we remove the scaling) then This is the concentration result we desired for the modified version of $T_n$. Recall that it was constructed as a slight modification of $T_n$, which is the true rescaled and centered density that we wish to prove results about.

Controlling the difference between T n and T n
The goal in this section is to generalize (24) to T r .
Bounding the number of extra coalescences in T n . When we went from the torus to Z 3 we may have eliminated some coalescence in T n at times in [n 0.6 , n b ]. For this to happen the difference in two particles positions must have wrapped around the torus, an event we call G, and the particles projected back to the torus must have hit, an event we call H. To bound this event we note that Let α = 2(1 − )/3. Lemma 4 implies that the probability G happens during [n 0.6 , n α ] is ≤ exp(−n η ) for some η > 0. On [n α , n b ], the probability that a random walk is at a fixed site is ≤ 1/n 1− . Thus, for a fixed pair of particles, If r = n b(2)/3 , then n b(2) is a trivial upper bound for the number of particles at time σ(n), which holds with probability 1. We will now estimate the number of collisions of a fixed particle with all of the others. This number is increased if we ignore coalescence, and run the particles as independent. We do this so that Lemma 5. If m ≥ 1 and a particle belongs to a cluster of size 2m or 2m + 1 with m ≥ 1 formed by coalescence during [n α , n b ], then there are at least m disjoint pairs of particles that have coalesced.
Proof. Recall that on this time interval we are running the lineages on $\mathbb{Z}^3$. We will prove the result by induction. To be able to disentangle the graph constructed by coalescence we will number the particles. Once two particles hit, the two future trajectories could be assigned to either particle, so we allow ourselves the liberty of exchanging the labels at any collision. If the cluster has size 2 or 3, this is trivial. Suppose now that $m \ge 2$. Locate the time $t_0$ at which the first two particles coalesced. Call them $x$ and $y$, and let $t_1$ be the first time after $t_0$ that the coalesced particle collided with another one, which we call $z$. Remove the Y-shaped part of the genealogy leading from $x$ and $y$ to the coalescence at time $t_1$. Label the lineage coming out of $t_1$ the same as the one coming in on $z$'s trajectory. We have identified one pair of coalescing particles and reduced the number of sites in the cluster by 2, so the result follows by induction.
Given Lemma 5, our next task is to estimate the probability that $m$ disjoint pairs will coalesce. Using the trivial upper bound $n^{b(2)}$ on the number of lineages, the number of coalescing pairs is $N \le \mathrm{Binomial}(n^{2b(2)}, C n^{b}/n^{1-\epsilon})$.
Note that this bounds the number of coalescing pairs that coalesce in the system, not just those that form one cluster. The expected number is Cn b+2b(2)+ −1 , where b is larger than 2/3 and can be assumed to be ≤ 0.7. If b(2) ≤ 0.1, then −ν = b + 2b(2) + − 1 < 0 when < 0.5. In this case, Bounding the size of clusters inŜ r . Formula (19) tells us that . From this we see that when r is large Combining (25) and (26) we see that if Y r,k are cluster sizes in T n , then Combining (25) with k = m/2ν and (27) we see that the combined size of the clusters in T n but not in T n is Using this with (24) and letting D r = [λ(1 − λ] 1/2 r 5/2 T n it follows that Suppose r = n b(2)/3 where 0 < b(2) < 1, then Now, partition the torus into cubes of side n b(2)/3 . Letting N i be the number of 1's in the ith cube we have For fixed β > 0, given a k < ∞ we can pick m large enough then the right hand side is ≤ n −(1−b(2))−k . Then we have,

Bounding the difference in the drifts
Thus far we have been concerned with the overall density of particles on the torus. However, to successfully bound P (Ω c 1 ) we need to show that if u is the density of ones in the voter model at time t/ n − n b , then the empirical finite dimensional distributions on the torus are close to those of the voter model equilibrium ν u at time t/ n + s n , where The reasoning for introducing this extra time s n is described below. For x, y 1 , . . .
be a finite dimensional event. For simplicity, we do not display the dependence on the sites y and the states i. The first step is to partition the torus at time t/ n into boxes with side r = n b(2)/3 . Using (30), we can conclude that with high probability the density in each box is close to u, the density of 1's at time t/ n − n b . We divide the torus at time t/ n + s n into cubes with side (2). The β in the time guarantees that if we work backwards from time t/ n + s n to t/ n , the probability a random walk particle will move by an amount much larger than n b(2)/3 , the size of the boxes at time t/ n , is stretched exponentially small. See Lemma 4. As in [14] and [10] this implies the conditional distribution of the position given that the lineage ends in a specific box is almost uniform, and hence the probability it lands on a 1 will be close to u. A second consequence is that Lemma 6. With very high probability, the empirical finite dimension distributions at time t/ n + s n will be close to ν u (G x,y,v ).
Proof. To see this, note that we compute the probabilities of finite dimensional sets in the voter model equilibrium $\nu_u$ by starting the CRW with points at $y_0, \ldots, y_m$, and running time to $\infty$. The sets of particles that coalesce form a partition of the original set. We then flip a coin with probability $u$ of heads (state 1) for each block of the partition to determine the states. Here we are only running time to $s_n$, so our partition is finer, but the final particles are roughly independent and uniform on the torus, so whether they land on 1 or 0 are roughly independent coin flips.
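The recipe in the proof (coalesce, then flip one coin per block) translates directly into a formula for the probability of a finite dimensional event given the coalescence partition. The sketch below is only an illustration: the function name `fdd_probability` and the data layout (a list of blocks and a dictionary of target states) are assumptions made here, and averaging over the law of the partition is left to the caller.

```python
def fdd_probability(partition, states, u):
    """Probability that every site takes its target state, given the partition.

    partition: list of blocks, each a list of site labels that coalesced
    states:    dict mapping site label -> target state (0 or 1)
    u:         probability that a block receives state 1
    """
    prob = 1.0
    for block in partition:
        values = {states[site] for site in block}
        if len(values) > 1:
            # sites in one block share a single coin flip, so they must agree
            return 0.0
        prob *= u if values.pop() == 1 else 1.0 - u
    return prob


if __name__ == "__main__":
    # sites 0 and 1 coalesced, site 2 stayed separate
    print(fdd_probability([[0, 1], [2]], {0: 1, 1: 1, 2: 0}, 0.3))
```

A finer partition (as when time is only run to $s_n$) simply means more blocks and therefore more independent coin flips in the product.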
The last paragraph shows that the probabilities of the f.d.d.'s are close to those of the voter model equilibrium $\nu_u$. This enables us to conclude that the expected value of the drift of our process when the density is $x$ is close to $b(x)$. The next step is to control the fluctuations about the mean. Using normal tail bounds on random walks in Lemma 4, it follows that if $B_n$ is the event that some coalescing random walk at time $t/\epsilon_n + s_n$ moves by more than $n^{b(3)/3}$ in time $s_n$, then for any $\gamma > 0$ we have for large $n$

Figure 5: Picture summarizing the proof. Here $s_n = n^{(2+\beta)b(2)/3}$. The words at the top indicate the quantity that is "good" at each time, i.e., close to its average value on the cubes. The dark line at time $t/\epsilon_n$ shows the interval in which we will with high probability find the lineage of the black dot when it is worked backwards in time.
For the last inequality to be useful we need to choose β so that 2b(3) − (2 + β)b(2) > 0. The estimate in (32) implies that the states of sites in cubes in the decomposition at time t/ n + s n that do not touch are independent on B c n . We can divide our collection of cubes into 27 subcollections C i of size n 1−b(3) /27 so that no two cubes in the subcollection touch. For 1 ≤ i ≤ 27, let N i be the number of times G x,y,v occurs in the union of the cubes in C i , let N i,j be the number of times G x,y,v occurs for x in the jth cube in C i . If x is close to the edge of the cube then some of the x + y i may be outside. However, the y i are fixed, so for large n they will at worst be in an adjacent cube.
For fixed i, the N i,j are independent on the event B c n , and 0 using the independence of the N i,j across j. So, we have Since we do not know much about ψ i,j (θ), we will let η n = n −α , and later choose θ n so that lim n→∞ θ n = 0. Expanding log ψ i,j around 0: .
When $\theta = 0$, we have $\psi_{i,j}(0) = 1$ by definition, and also So, if $\theta_{i,n} \to 0$, then we have the approximation Since $X_{i,j} \in [-\rho_{i,j}, 1-\rho_{i,j}]$ and $E X_{i,j} = 0$, To optimize the bound in (33) we differentiate the term in square brackets in (33) with respect to $\theta$ to get which says we want to take . This gives the following large deviations bound since $2\tau_i \le 1$. The same reasoning can be used to get a bound on the other deviation. Since we have expanded the moment generating function around 0, the bound is the same, giving the final result

and then use the triangle inequality to get
The last task is to relate this to the difference of the drifts. To do this, we note that Let p n x,y,v be the probability of G x,y,v when we work backwards in the coalescing random walk starting from x, x + y 1 , . . . x + y k then we have In the three neighbor case we only have to consider: y 1 = e 1 , y 2 = e 2 , and y 3 = e 3 . When there are more neighbors, we have to consider a number of other possibilities, see the calculations in Section 5. Let r(v) = r(v 0 , v 1 , v 2 , v 3 ) be the jump rate of vertex x when the states are v i . Multiplying by r(v), summing over the relevant values of y, v we have The choice of s n guarantees that as we work backwards in time the particles in the CRW move by an amount n b (3) . The bound in (30) implies that each particle in the CRW lands on a 1 with probability close to u. It follows that with very high probability. The bounds derived above only works for fixed t. However, it is easy to extend them so that they hold uniformly on [0, t 0 ] and hence are valid for the integral. To do this, we subdivide the interval into subintervals of length 1/n 1/2 n . Within each interval the probability there are more than 2n 1/2 flips is ≤ exp(−c √ n). If we add this to previous error probability and multiply by the number of subinterval we still have a result that holds with very high probability.

Final details
To get long time survival, we will iterate. Let T 0 = inf{t : |x t − 1/2| < η} and note that x t is the solution of the ODE so this is not random. Theorem 8 implies that |X(T 0 ) − 1/2| ≤ 2η with very high probability. Let and note that on [T 0 , T 1 ] we have |X t − 1/2| ≤ 4η. There is a constant t η so that if x(0) = 1/2 + 4η or x(0) = 1/2 − 4η then |x(t η ) − 1/2| ≤ η. Let S 1 = T 1 + t η . Since T 1 is random, S 1 is a random time. However, due to the Markov process, we can translate time to apply Theorem 8 again. That is, considerX t := X t+T 1 . Then since |X 0 − 1/2| = 4η, Theorem 8 implies that with high probability |X tη − 1/2| = |X( We can with high probability iterate the construction n k times before it fails. Since each cycle takes at least t 0 units of time, taking η = γ/5 the proof of Theorem 3 is complete.

Rapid extinction for q > 1
In this section we will prove Theorem 4. There are two steps to the proof. First, we use the results in Section 4 to show that the fraction of 1's in the random process is close to the solution of the ODE until time τ = min{t : where $b(0)$ will be defined in the proof of Lemma 7. The second step is to prove that when we start with $\le n^{b(0)}$ ones, fluctuations in the voter model will cause it to hit 0 in time $\le C n^{b(0)}$. This time is $< n^{b}$ for large $n$, so by results in Section 3.2, it is legitimate to assume that the process acts like the voter model. The proof of the second step is based on a Green's function calculation and estimates for the rate of change of the number of ones in the voter model.

First step
Lemma 7. Suppose $X_0 < 1/2$ and let $\tau$ be defined in (36). Then, for any $\eta > 0$, as $n \to \infty$, Proof. We use (30) from Section 3.4. If $X_0 = u$ and we divide the torus at time $t/\epsilon_n$ into boxes of side $r = n^{b(2)/3}$, then taking $m$ large in (30) gives for any $\beta > 0$ and $k < \infty$. Since $u^{1/2} > (u(1-u))^{1/2}$, we can change this to For this estimate to be useful, we need $u \gg u^{1/2} n^{-b(2)/6+\beta}$, which is equivalent to $u \gg n^{-b(2)/3+2\beta}$. If $b(2)$ is close to 1 and $\beta$ is small, we can define $b(0)$ by so that $b(0) < \min\{b, 1-\alpha\}$ where $\alpha > 1/3$ is the quantity from Theorem 4. Combining these estimates and using results from the previous section we have that if $x_0 < 1/2$ and $\eta > 0$ then as $n \to \infty$, $P(|X_t - x_t| \le \eta x_t \text{ for all } t \le \tau) \to 1$.

Lemma 7 follows.
This result shows that the number of 1's gets driven to ≤ (1 + )n b(0) at the deterministic time τ . To complete the process of extinction we will rely on fluctuations in the voter model.

Green's function calculation
To motivate the calculation in the next lemma we note that the voter model is a time change of simple random walk.
Lemma 8. Let S t be continuous-time simple random walk on {0, . . . , n} with jump-rate r(j) at position j. Let 0 < x < z ≤ n be integers, and T 0,z the first time that S t hits 0 or z. Then, 2xy zr(y) .
Since $P_x(T_z < T_0) = x/z$, this is enough to bound the extinction time if $x/z \to 0$.
Proof. First consider the embedded discrete-time chain of $S_t$. For $0 \le y \le z$, let $N_x(y)$ be the number of times the random walk visits $y$ before hitting 0 or $z$, starting from position $x$. Consider the Green's function $G_0(x,y) = E[N_x(y)]$. Fix $y$ and write $g(x) = G_0(x,y)$. Then we have that $g$ satisfies
$g(0) = 0$,
$g(x) = \frac{1}{2}(g(x+1) + g(x-1))$, $x \ne 0, y, z$,
$g(y) = 1 + \frac{1}{2}(g(y+1) + g(y-1))$,
$g(z) = 0$.
From this it is clear that $g$ should be linear and increasing on $[0,y]$ and linear and decreasing on $[y,z]$. That is, To satisfy the conditions for $g(x)$ and $g(y)$, the constants must be The walk will spend an average of $1/r(y)$ units of time at position $y$ before jumping. Thus, if $G(x,y)$ is defined to be the expected amount of time the continuous time walk spends at $y$, started from $x$, before hitting 0 or $z$, we have: Thus, the expected total time before being absorbed, started from $x$, is 2xy zr(y) , which establishes (39).
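The equations for $g$ in the proof can be solved explicitly, and it may help to see the solution checked by machine. The sketch below rests on a derivation made here rather than a formula quoted from the text: solving the linear system for the embedded walk gives the standard gambler's ruin Green's function $g(x) = 2x(z-y)/z$ for $x \le y$ and $g(x) = 2y(z-x)/z$ for $x \ge y$. The function names are hypothetical, and `expected_absorption_time` simply sums $g(x)/r(y)$ over $y$, following the remark that the walk spends mean time $1/r(y)$ per visit to $y$.

```python
from fractions import Fraction


def green_closed_form(x, y, z):
    """Expected visits to y before hitting 0 or z, for the embedded walk from x."""
    if x <= y:
        return Fraction(2 * x * (z - y), z)
    return Fraction(2 * y * (z - x), z)


def green_by_solving(y, z):
    """Solve g(x) = 1{x=y} + (g(x-1) + g(x+1))/2 on 0<x<z with g(0)=g(z)=0.

    Uses the Thomas algorithm with exact rational arithmetic.
    Requires z >= 2 and 1 <= y <= z-1. Returns the list [g(0), ..., g(z)].
    """
    n = z - 1
    half = Fraction(1, 2)
    rhs = [Fraction(1) if x == y else Fraction(0) for x in range(1, z)]
    cp = [Fraction(0)] * n
    dp = [Fraction(0)] * n
    cp[0] = -half
    dp[0] = rhs[0]
    for i in range(1, n):
        denom = 1 - half * (-cp[i - 1])      # diagonal minus sub * cp[i-1]
        cp[i] = -half / denom
        dp[i] = (rhs[i] + half * dp[i - 1]) / denom
    g = [Fraction(0)] * n
    g[n - 1] = dp[n - 1]
    for i in range(n - 2, -1, -1):
        g[i] = dp[i] - cp[i] * g[i + 1]
    return [Fraction(0)] + g + [Fraction(0)]


def expected_absorption_time(x, z, rate):
    """Sum over y of (expected visits to y) times (mean holding time 1/rate(y))."""
    return sum(green_closed_form(x, y, z) / rate(y) for y in range(1, z))


if __name__ == "__main__":
    print(green_by_solving(3, 7))
    print(expected_absorption_time(3, 10, lambda y: 1))
```

With `rate(y) = 1` the sum reduces to the familiar $x(z-x)$ for the expected duration of the symmetric gambler's ruin game.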

Boundary size calculations
To use (39) to bound the extinction time, we need to understand the size of the boundary of the voter model: $\partial\xi = \{\{x,y\} : x \sim y, \xi(x) \ne \xi(y)\}$. Here $x \sim y$ means that $x$ and $y$ are neighbors and $\{x,y\}$ is the unoriented edge that connects them. For a voter model configuration $\xi$, let $|\xi| = \sum_x \xi(x)$ be the number of 1's. The next result gives trivial upper and lower bounds on $|\partial\xi|$ when $|\xi| = k$: Using (39), we see that if $x = n^{p}$ and $z = n^{q}$ for some $0 < p < q < 1$, then for $r(y) = y$, If $p = b(0)$ and $q > p$, this gives us what we want, an extinction time $n^{b}$. On the other hand, if we use the lower bound and plug in $r(y) = y^{1/3}$, then If we take $x = n^{b(0)}$ and $z = n^{c}$ then this is $\le C n^{5b(0)/3}$, which is much longer than the interval of length $n^{b}$ over which the process behaves like the voter model. Combining (40) and (42) gives
Lemma 9. If $x = n^{p}$ with $p < 3b/5$ and $z = n^{q}$ with $q > p$ and $2p/3 + q < b$ then This will let us show that the time spent at small values of $|\partial\xi_t|$ can be ignored. For larger values, we need a more precise statement about the size of the boundary. This has been done by Cox, Durrett, and Perkins [9], in order to show that in $d \ge 2$ the rescaled voter model converges in distribution to super-Brownian motion. This was later used by Bramson, Cox, and Le Gall [3] to prove a result for the voter model in $d \ge 3$ started at 0. See Theorem 4 on page 1012 in [3].
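The boundary $\partial\xi$ is easy to compute for a concrete configuration, which makes the trivial upper bound $|\partial\xi| \le 6|\xi|$ in three dimensions concrete. The sketch below assumes a configuration stored as a set of sites in state 1 on an $L \times L \times L$ torus with $L \ge 3$ (so that the six neighbors of a site are distinct); the function name `boundary_size` is hypothetical.

```python
NEIGHBOR_STEPS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))


def boundary_size(ones, L):
    """Number of unoriented nearest neighbor edges {x, y} with xi(x) != xi(y).

    ones: set of sites (i, j, k) in state 1 on the L^3 torus, L >= 3.
    Each discordant edge has exactly one endpoint in `ones`, so counting
    from the 1 side counts every such edge once.
    """
    count = 0
    for (i, j, k) in ones:
        for (di, dj, dk) in NEIGHBOR_STEPS:
            neighbor = ((i + di) % L, (j + dj) % L, (k + dk) % L)
            if neighbor not in ones:
                count += 1
    return count


if __name__ == "__main__":
    cube = {(i, j, k) for i in (0, 1) for j in (0, 1) for k in (0, 1)}
    print(boundary_size({(0, 0, 0)}, 5), boundary_size(cube, 5))
```

An isolated 1 attains the upper bound, while a solid block has a boundary of the order of its surface area.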
To prepare for stating our lemma we describe the result from [9]. They use a general probability kernel p(z). In our case p(z) = 1/6 for the nearest neighbors of 0.
If $\xi_t(x) = 0$ we set $V_t(x) = 0$. This part of the definition is not really needed in the statement since $X^N_s$ is supported by points on the rescaled lattice in state 1. On page 202 of their paper one finds the following result.
(I1) There is a finite $\gamma > 0$ so that for all $\phi \in C_0^\infty(\mathbb{R}^d)$ and $T > 0$ Here $X^N_t$ is the voter model with space scaled by $\sqrt{N}$ and time scaled by $N$ and turned into a measure by assigning mass $1/N$ to sites in state 1, see (1.4), and $V_{N,s}(x)$ is a suitably rescaled version of $V_t(x)$. The formula on page 202 has V because they want to write the formula so that it is valid for $d = 2$ and $d \ge 3$.
In our situation γ = 2dβ d . However, in this proof we need control on the size of the error. The reader should think of s as a point in the time interval [t/ n − n b /2, t/ n ] over which our process behaves like the voter model. This result is often known as Doob's h-transform. Since the lineage will wrap around the torus in the remaining ≥ n b /2 units of time, the ratio is close to 1 and can be ignored. For each neighbor y of an x with ξ t (x) = 1, let V x,y = 1 if it does not coalesce with x by time r and 0 otherwise. For any α > 0, if k is large and the density of 1's is u which is small then Here we are using the hydrodynamic limit Lemma 6 to conclude that the distribution of the process is close to ν u at time r.
Let $W_x = \sum_{y \sim x} V_{x,y}$, $\mu(x) = \sum_{y \sim x} E V_{x,y}$, and where $\Sigma_x$ is short for $\sum_{x : \xi^0_t(x) = 1}$. Arguments in Section 3.5 imply that if $|x - x'| > s$ then the correlation between $W_x$ and $W_{x'}$ is small enough to be ignored so since $|W_x| \le 6$ and for a given $x$ there are at most $C r^3$ values of $y$ with $|x - y| \le r$. If we use Chebyshev's inequality If $\alpha < 1/10$ this gives the desired result.

Extinction time
The results about the boundary of the voter model can now be applied to the Green's function calculation to get the following result.
Lemma 11. Consider the voter model started from a configuration with $|\xi_0| = x$ and let $T_{0,z}$ be the first time the configuration hits 0 or $z$. If $x = n^{b(0)}$ and $z = n^{c}$ with $c > b(0)$ then
Proof. We can divide the sum in (39) into the pieces where Lemma 9 can be applied. That is, define $x' = n^{p} < x$ so that $p < 3b(0)/5$ and $2p/3 + c < b(0)$. Then, The first term is less than a constant times $n^{b(0)}$ by Lemma 9. To bound the second hitting time, we use (41) and Lemma 10 to conclude that the expected amount of time when $|\partial\xi_s|/|\xi_s|$ is not within $\epsilon$ of $2d\beta_d$ is which finally completes the proof.
Theorem 4 now follows immediately: apply Lemma 7 to get that $U_n(\alpha \log n) < n^{-(1-b(0))}$ with high probability. Next, use Section 3.2 so that with high probability we can assume the $q$-voter model only experiences voter branching events for the remainder of the time. Lemma 11 then proves that with high probability the unscaled voter model started with $n^{b(0)}$ occupied sites will hit 0 or $n^{c}$ in an additional time of $C n^{b(0)}$. The probability that the process hits 0 first is simply $(n^{c} - n^{b(0)})/n^{c} \to 1$. Since $b(0) > 2/3$, this additional time is $o(1)$ for the time-scaled process $U_n(t)$. Thus, $P(U_n(\alpha \log n) = 0) \to 1$ as $n \to \infty$.

Computing the perturbation
In this section, Theorem 2 is proved. Recall that Theorem 1 states that the limiting ODE for the model with a $k$-sized neighborhood is where $\rho^i_m(u)$ is the probability under the voter model equilibrium $\nu_u$ that the origin is in state $i$ and exactly $m$ of the neighbors are in state $1-i$. In this section, we analyze these quantities. Before giving the proof for general $k$, we first give an explicit proof for a neighborhood of size 3 to give a flavor of how the individual terms are computed, while introducing some necessary notation in an organic manner.

k=3
To compute $\rho^0_i$ we have to compute the coalescence fate of $0, e_1, e_2, e_3$. There are 7 possibilities:

one:    0; 3      1; 2      2; 1      3; 0
two:    0; 2, 1      1; 1, 1
three:  0; 1, 1, 1

The first number in each string gives the number of neighbors that coalesce with 0. The others give the sizes of the limiting coalescing clusters formed by the remaining neighbors. The word at the beginning of the row is the number of numbers after the semicolon. We can ignore 3; 0 because in that case all the neighbors have the same state as 0.
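The list of coalescence fates can be generated mechanically, which becomes useful once $k$ is larger than 3. The following sketch encodes a fate as a pair `(s0, parts)`, where `s0` is the number of neighbors that coalesce with 0 and `parts` is an integer partition of the remaining `k - s0` neighbors; this encoding and the function names `partitions` and `coalescence_fates` are choices made here for illustration. It records only cluster sizes, as in the table, and not which neighbors belong to which cluster.

```python
def partitions(n, largest=None):
    """Yield the integer partitions of n as non-increasing tuples."""
    if largest is None:
        largest = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, largest), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest


def coalescence_fates(k):
    """All fates (s0, parts): s0 neighbors join 0, the rest split into clusters."""
    fates = []
    for s0 in range(k + 1):
        for parts in partitions(k - s0):
            fates.append((s0, parts))
    return fates


if __name__ == "__main__":
    for s0, parts in coalescence_fates(3):
        print(s0, ";", ", ".join(str(p) for p in parts) if parts else "0")
```

For `k = 3` this produces the 7 fates listed in the table, with the empty partition playing the role of the entry 3; 0.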
Let $\rho^0_i$ be the probability that in the voter equilibrium $\nu_u$ the origin is 0 while exactly $i$ of the neighbors are 1. Factoring out the probability that the origin is 0, we have $\rho^0_i = (1-u) q_i(u)$. To compute the $q_i(u)$ we use the following table.
• The coefficients of u come from the "one" terms.

General k
In this case we have to compute the coalescence fate of 0 with $k$ neighbors. Again $\rho^0_i = (1-u) q_i(u)$, where the functions $q_i(u)$, $i \le k-1$, defined as before, are polynomials with terms of the type $u^a (1-u)^b$. First let us look at the difference $\Delta_{a,b}(u)$ of these terms, where $\Delta_{a,b}(u) = \rho^0_i - \rho^1_i = u^a (1-u)^{b+1} - u^{b+1} (1-u)^a$. Note that $\Delta_{a,b}(u) = 0$ if $a = b+1$. In the case $a \le b$ we have To see the last step write $1 - 2u = (1-u) - u$ and then telescope the sum. In the case $a > b+1$ Since $\sum_{j=0}^{n} u^j (1-u)^{n-j} > 0$ on $[0,1]$ we have that 0, 1 and 1/2 are the only roots of $\Delta_{a,b}(u)$. Also note that $\Delta_{a,b}(u) = -\Delta_{b+1,a-1}(u)$. We claim where $f(u)$ is a positive polynomial in $u$ with no real roots. To prove this, given a coalescence fate $s_0; s_1, s_2, s_3, \ldots, s_j$ where $\sum_j s_j = k$, we look at the number of ways to obtain $a$ clusters with opinion 1 (which gives the coefficients of the terms $u^a (1-u)^b$, $a > b+1$) and compare it with the number of ways to obtain $b+1$ clusters with opinion 1 (which gives the coefficients of the terms $u^{b+1} (1-u)^{a-1}$).
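The telescoping remark suggests one explicit way to factor $\Delta_{a,b}(u)$ when $a \le b$, namely $\Delta_{a,b}(u) = u^a (1-u)^a (1-2u) \sum_{j=0}^{b-a} u^j (1-u)^{b-a-j}$. This form is a reconstruction made here from the definition of $\Delta_{a,b}$ and may differ in presentation from the displayed formula in the text. The sketch below checks it, together with the antisymmetry $\Delta_{a,b} = -\Delta_{b+1,a-1}$, in exact rational arithmetic; the function names are hypothetical.

```python
from fractions import Fraction


def delta(a, b, u):
    """Delta_{a,b}(u) = u^a (1-u)^(b+1) - u^(b+1) (1-u)^a."""
    return u ** a * (1 - u) ** (b + 1) - u ** (b + 1) * (1 - u) ** a


def delta_factored(a, b, u):
    """Factored form for a <= b, using (1-u)^(n+1) - u^(n+1) with n = b - a."""
    n = b - a
    geometric = sum(u ** j * (1 - u) ** (n - j) for j in range(n + 1))
    return u ** a * (1 - u) ** a * (1 - 2 * u) * geometric


if __name__ == "__main__":
    u = Fraction(2, 7)
    print(delta(1, 3, u), delta_factored(1, 3, u))
    print(delta(4, 1, u), -delta(2, 3, u))
```

Since the geometric-type sum is strictly positive on $[0,1]$, the factored form makes the roots 0, 1 and 1/2 visible (with 0 and 1 present only when $a \ge 1$).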
First, suppose $b = 0$ and $a \ge 2$. Let $s_0$ be the number of neighbors that have coalesced with 0, and $s_1, s_2, \ldots, s_a$ be the sizes of the limiting coalescing clusters formed by the rest of the neighbors, where we assume that the sizes are arranged in increasing order, i.e., $s_1 \le s_2 \le \cdots \le s_a$. The coefficient of $\Delta_{a,0}(u)$ in $\phi(u)$ is given by $r_{s_1+\cdots+s_a}\, p_{s_0;s_1,\ldots,s_a}$ (since all the clusters have opinion 1, there is only one way to choose). Similarly, the coefficient of $\Delta_{1,a-1}(u)$ in $\phi(u)$ is given by $(r_{s_1} + \cdots + r_{s_a})\, p_{s_0;s_1,\ldots,s_a}$ (since exactly one of the clusters has opinion 1, there are $a$ different choices, and the coefficient of each of the clusters needs to be added individually).
Since $\Delta_{a,0}(u) = -\Delta_{1,a-1}(u)$, if we only look at terms of the type $\Delta_{1,a-1}(u)\, p_{s_0;s_1,\ldots,s_a}$ (which is non-negative) in $\phi(u)$, we get a non-negative polynomial in $u$ with no roots other than 0, 1 and 1/2. Now suppose $b \ne 0$ and $a \ge b+2$. As explained in the previous case, let $s_0$ be the number of neighbors that coalesce with 0, and $s_1, s_2, \ldots, s_{a+b}$ be the sizes of the limiting coalescing clusters formed by the rest of the neighbors, where we assume that the sizes are arranged in increasing order, i.e., $s_1 \le s_2 \le \cdots \le s_{a+b}$. There are $\binom{a+b}{a}$ ways of choosing $a$ clusters out of the $a+b$ clusters. Denote the total size of each of these collections of clusters by $x_i$, where $1 \le i \le \binom{a+b}{a}$, where wlog we assume that the sizes are arranged in ascending order. The coefficient of $\Delta_{a,b}(u)$ in $\phi(u)$ is given by $p_{s_0;s_1,s_2,\ldots,s_{a+b}} \sum_{i=1}^{\binom{a+b}{a}} r_{x_i}$. Given $1 \le i \le a+b$, the number of collections in which cluster $s_i$ has opinion 1 is given by $\binom{a+b-1}{a-1}$. Hence the total size of all the collections, where $a$ of the clusters have opinion 1, is given by $\sum_i x_i = \binom{a+b-1}{a-1}(s_1 + s_2 + \cdots + s_{a+b})$.
Using a similar argument there are $\binom{a+b}{b+1}$ ways of choosing $b+1$ clusters out of the $a+b$ clusters. Denote the total size of each of these collections of clusters by $y_i$, where $1 \le i \le \binom{a+b}{b+1}$, where wlog we assume that the sizes are arranged in ascending order. The coefficient of $\Delta_{b+1,a-1}(u)$ in $\phi(u)$ is given by $p_{s_0;s_1,s_2,\ldots,s_{a+b}} \sum_{i=1}^{\binom{a+b}{b+1}} r_{y_i}$. Given $1 \le i \le a+b$, the number of collections in which cluster $s_i$ has opinion 1 is given by $\binom{a+b-1}{b} = \binom{a+b-1}{a-1}$. Hence the total size of all the collections, where $b+1$ of the clusters have opinion 1, is given by $\sum_i y_i = \binom{a+b-1}{a-1}(s_1 + s_2 + \cdots + s_{a+b})$.
For ease of notation, let us denote $\binom{a+b}{a}$ by $n$ and $\binom{a+b}{b+1}$ by $m$. Then $m > n$ since $m/n = a/(b+1)$ and $a > b+1$. Since $\sum_{i=1}^{n} x_i = \sum_{i=1}^{m} y_i$, and the $x_i$ as well as the $y_i$ are arranged in ascending order, we have $x_i > y_i + m - n$, for $1 \le i \le n$.
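The two counting facts used above, that the collection sums $x_i$ and $y_i$ have the same total $\binom{a+b-1}{a-1}(s_1 + \cdots + s_{a+b})$ and that there are more $y_i$ than $x_i$ when $a \ge b+2$, can be checked by brute-force enumeration. The sketch below is only a sanity check on a small example; the helper name `subset_sums` and the sample cluster sizes are assumptions made for illustration.

```python
from itertools import combinations
from math import comb


def subset_sums(sizes, r):
    """Sorted list of total sizes over all ways of choosing r of the clusters."""
    return sorted(sum(choice) for choice in combinations(sizes, r))


if __name__ == "__main__":
    sizes = [1, 1, 2, 3, 5]          # hypothetical cluster sizes, a + b = 5
    a, b = 4, 1                      # satisfies a >= b + 2
    x = subset_sums(sizes, a)        # a clusters have opinion 1
    y = subset_sums(sizes, b + 1)    # b + 1 clusters have opinion 1
    print(len(x), len(y))
    print(sum(x), sum(y), comb(a + b - 1, a - 1) * sum(sizes))
```

Each cluster appears in $\binom{a+b-1}{a-1}$ of the size-$a$ collections and in $\binom{a+b-1}{b}$ of the size-$(b+1)$ collections, and these two binomial coefficients are equal, which is why the totals agree.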

Now using the definition of r
r k y i . Now using the above inequality along with the fact that ∆ a,b = −∆ b+1,a−1 , if we only look at terms of the type ∆ b+1,a−1 (u)p s 0 ;s 1 ,··· ,s a+b (which is non-negative) in φ(u), we get a nonnegative polynomial in u with no roots other than 0, 1 and 1/2. This proves Theorem 2 for q < 1.
where f k (u) is a strictly positive polynomial in u.
Proof. Recalling the perturbation from (1) and (2), note that the perturbation when q > 1 has the same value as the perturbation when q < 1 but with the opposite sign. This along with the above work proves the corollary.