Bounding Fastest Mixing

In a series of recent works, Boyd, Diaconis, and their co-authors have introduced a semidefinite programming approach for computing the fastest mixing Markov chain on a graph of allowed transitions, given a target stationary distribution. In this paper, we show that standard mixing-time analysis techniques--variational characterizations, conductance, canonical paths--can be used to give simple, nontrivial lower and upper bounds on the fastest mixing time. To test the applicability of this idea, we consider several detailed examples including the Glauber dynamics of the Ising model--and get sharp bounds.


Introduction
Sampling from a complex collection of objects is a basic procedure in physics, statistics and computer science. A widely used technique, known as Markov chain Monte Carlo (MCMC), consists in designing a Markov chain on the set to be sampled such that the law of the chain converges to the desired distribution. The chain is run long enough for a sample to be picked from a good approximation of the stationary distribution. The time one has to wait in order for this approximation to be satisfactory is known as the mixing time. In practice, it is crucial that this parameter be small. See e.g. [J03] for a survey of theoretical results on MCMC.
One way to picture a Markov chain (MC) on a combinatorial structure is to think of the states as nodes and of the transitions as edges. For a chain to be implementable, the neighbourhood structure surrounding each node must be relatively simple. Under this constraint, one has to choose a set of allowed transitions that is most likely to produce fast convergence. This is usually done in a heuristic manner.
Once a graph of transitions has been chosen, there is still room for improvement. Indeed, one has some freedom in assigning transition probabilities to each edge, under the requirement that the stationary distribution be of the right form. It turns out that choosing those probabilities appropriately can lead to a sizable decrease in the mixing time.
In this context, Boyd et al. [BDX04] have recently observed that minimizing the mixing time of an MC on a graph of transitions with a given stationary distribution can be formulated as a semidefinite program (SDP), a well-known generalization of linear programming to matrices; see e.g. [BV03]. This enables the numerical computation of the fastest mixing chain on a graph, and Boyd et al. [BDX04] have solved a number of simple examples numerically.
A further benefit of this approach is that it provides a tight lower bound on the optimal mixing time through the dual of the SDP. In a follow-up paper, Boyd et al. [BDSX04] have used this bound to exhibit an analytic expression for the fastest chain, and prove its optimality, when the graph is a simple path under the uniform distribution.
However, a weakness of the SDP formulation is that only small graphs can be studied thoroughly because numerical solvers run in time polynomial in the size of the graph; in practice, chains have prohibitively large state spaces. As for the dual, it is potentially useful from a theoretical point of view even for complex chains, but Boyd et al. [BDX04] give no intuitive interpretation of it, making it difficult to apply.
Our goal in this paper is to provide evidence that those shortcomings can be overcome by a simpler approach. Our claim arises from the following observation: one can obtain lower and upper bounds on the mixing time of completely specified chains by way of well-known techniques such as path coupling, conductance, canonical paths etc. [J03]; formally, those bounds are parameterized by transition probabilities. This prompts the questions: can one optimize those bounds as functions of the transition probabilities, and how close to optimum can one get by doing so?

Our results
We show through general results and examples that for well-structured problems, the above scheme can be implemented, and that it is capable of providing nontrivial, sharp bounds.
On the lower bound side, we use a standard extremal characterization to derive a general lower bound which has a simple geometrical interpretation. It consists in embedding the nodes of the graph into a Euclidean space so as to spread the nodes as much as possible under constraints on the distance separating nodes connected by an edge. We show through convex optimization arguments that it is actually tight. The simple interpretation makes it much easier to apply than the dual SDP mentioned above. Our result is similar to a bound obtained recently by Sun et al. [SBXD04] in a different context. We also specialize the usual conductance bound to the context of fastest mixing. We apply those general results to several examples, obtaining close-to-optimal lower bounds.
On the upper bound side, it seems much harder to derive useful, general results. A trivial bound can be obtained by considering any chain on the graph, e.g. a canonical Metropolis-Hastings chain, and computing an upper bound on its mixing time. But as was shown by Boyd et al. [BDX04], there can be a large (unbounded) gap between standard and optimal chains. Instead, we show through examples that one can obtain almost tight bounds by closely studying standard canonical paths arguments and minimizing the bound over transition probabilities. Put differently, our technique consists in identifying bottleneck edges and increasing the flow on them. The fact that this scheme can work on nontrivial Markov chains is not obvious a priori, and this constitutes our main result in the upper bound case. Moreover, this technique is constructive and it allows one to design a chain which might be close to the fastest one. Our scheme is likely to work only on well-structured problems but, even in that case, there is no other non-numerical approach known, and the numerical approach breaks down on large-scale problems.
Our main example is the Glauber dynamics of the Ising model, a problem which is beyond the reach of the numerical SDP approach. In the case of the tree, by a judicious choice of rates at which nodes are updated, we improve the mixing time by an optimal factor.

Organization of the paper
We begin in Section 2 with a description of the setting and approach of [BDX04]. We introduce our main techniques in Sections 3 and 4. Section 5 is devoted to optimal rates of the Glauber dynamics of the Ising model.

Setting
We are given an undirected graph $G = (V, E)$ and a probability distribution $\pi$ defined on the nodes of $G$. We seek to sample from $\pi$ and do so by running a reversible Markov chain $(X_t)_{t \geq 0}$ on the state space $V$ with stationary distribution $\pi$, i.e. if $P = (P(i,j))_{i,j \in V}$ denotes the transition matrix of $(X_t)_{t \geq 0}$, we must have $\pi(i)P(i,j) = \pi(j)P(j,i)$, $\forall i, j \in V$. We also require that the only transitions allowed are those given by edges of $G$, i.e. $P(i,j) = 0$, $\forall (i,j) \notin E$. For convenience, we assume that all self-loops are present.
The time to reach stationarity is governed by the second largest eigenvalue of $P$. More precisely, let $n = |V|$ and $1 = \lambda_1(P) > \lambda_2(P) \geq \cdots \geq \lambda_n(P) \geq -1$ be the eigenvalues of $P$. We measure the speed at which stationarity is reached by the relaxation time $\tau_2(P) = \frac{1}{1 - \lambda_2(P)}$. See [AF04] for a thorough discussion of other related quantities. The smaller $\lambda_2(P)$, and therefore $\tau_2(P)$, is, the faster $(X_t)$ approaches $\pi$. Given this observation, it is natural to define the fastest mixing chain on $(G, \pi)$ as the solution of the optimization problem
$$\min_P\ \lambda_2(P) \quad \text{s.t.} \quad P(i,j) \geq 0,\ \ \sum_{j \in V} P(i,j) = 1,\ \ \pi(i)P(i,j) = \pi(j)P(j,i)\ \ \forall i,j \in V, \quad P(i,j) = 0\ \ \forall (i,j) \notin E. \qquad (1)$$
In the remainder of this paper, we reserve the notation $P^\star$ for a solution of (1), which might not be unique, and let $\lambda_2^\star = \lambda_2(P^\star)$ and $\tau_2^\star = \tau_2(P^\star)$. Note that our definition of fastest mixing differs slightly from that in [BDX04]. Here, we take the usual approach of ignoring the smallest eigenvalue by considering the possibility of adding a constant probability to each self-loop afterwards in order to bound the smallest eigenvalue away from $-1$.
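To make the relaxation time concrete, here is a minimal sketch that estimates $\tau_2(P)$ for a small reversible chain. The function names `relaxation_time` and `cycle_walk` are ours, and the method (power iteration on the symmetrized, shifted matrix after removing the top eigenvector) is a convenience choice for illustration, not the procedure used in [BDX04].

```python
import math

def relaxation_time(P, pi, iters=2000):
    """Estimate tau_2 = 1 / (1 - lambda_2) for a pi-reversible chain P."""
    n = len(P)
    root = [math.sqrt(p) for p in pi]
    # S = D^{1/2} P D^{-1/2} is symmetric when P is pi-reversible.
    S = [[root[i] * P[i][j] / root[j] for j in range(n)] for i in range(n)]
    # Shift to (S + I) / 2 so that every eigenvalue lies in [0, 1].
    A = [[0.5 * (S[i][j] + (1.0 if i == j else 0.0)) for j in range(n)]
         for i in range(n)]

    def deflate(v):
        # Remove the component along sqrt(pi), the eigenvector of eigenvalue 1.
        c = sum(v[i] * root[i] for i in range(n))
        return [v[i] - c * root[i] for i in range(n)]

    v = deflate([float(i + 1) for i in range(n)])
    mu = 0.0
    for _ in range(iters):
        w = deflate([sum(A[i][j] * v[j] for j in range(n)) for i in range(n)])
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
        # Rayleigh quotient of the current unit vector.
        mu = sum(v[i] * sum(A[i][j] * v[j] for j in range(n)) for i in range(n))
    lam2 = 2.0 * mu - 1.0
    return 1.0 / (1.0 - lam2)

def cycle_walk(n):
    """Symmetric nearest-neighbour walk on the n-cycle."""
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        P[i][(i + 1) % n] = 0.5
        P[i][(i - 1) % n] = 0.5
    return P
```

On the 6-cycle the second eigenvalue is $\cos(2\pi/6) = 1/2$, so the estimate should be close to 2.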

Fastest mixing via SDP
The main observation in [BDX04] is that (1) is actually a semidefinite program (SDP). See e.g. [BV03] for background on convex and semidefinite programming. This observation makes possible the numerical computation of optimal transition matrices. Unfortunately, since the running time of SDP algorithms is at best polynomial in the size of the state space, this allows one to study only small graphs, for which sampling is actually quite trivial. One idea put forward by Boyd et al. [BDX04] is to solve the SDP on small instances of large combinatorial problems and try to guess the structure of the optimal matrix from the results. This is the approach used in [BDSX04] to identify the optimal chain on the path. The prospect of reproducing this type of exact result in other cases seems limited.
From a theoretical point of view, an interesting consequence of the SDP formulation is the existence of a dual which can be used to give lower bounds on the optimal mixing time. Let $\|Y\|_*$ be the sum of the singular values of $Y$. Then, in the case of the uniform stationary distribution, the dual (of the more general version taking into account the smallest eigenvalue) has the form (2). Any feasible solution of (2) provides a lower bound on the best mixing time achievable on $(G, \pi)$. Moreover, strong duality holds. In [BDSX04], this is used to prove optimality of a conjectured fastest chain when the graph is a path. Note that giving an intuitive interpretation of this optimization problem is not straightforward. This is a potential obstacle to devising good feasible solutions.

Lower bounds
In this section, we discuss general lower bounds on fastest mixing that can be derived from common techniques for completely specified chains. We apply our bounds to several examples.

Variational characterization
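The standard lower bound for completely specified chains is based on a variational characterization of the second eigenvalue of the transition matrix; see e.g. [AF04]. To reveal the geometric flavor of our result, we will consider a more general bound. Let $\psi_1, \dots, \psi_n : V \to \mathbb{R}$ be functions with zero expectation under $\pi$, i.e. $\sum_{i \in V} \pi(i)\psi_l(i) = 0$ for all $l$ (where, as before, $n$ is the number of nodes). For all $i \in V$, think of $\Psi(i) = (\psi_1(i), \dots, \psi_n(i))$ as a vector associated to node $i$. Thus $\Psi(1), \dots, \Psi(n)$ is an embedding of the graph into $\mathbb{R}^n$. For each $l$ separately, we have the inequality
$$(1 - \lambda_2(P)) \sum_{i \in V} \pi(i)\psi_l(i)^2 \leq \frac{1}{2} \sum_{i,j \in V} Q(i,j)\,(\psi_l(i) - \psi_l(j))^2,$$
where $Q(i,j) = \pi(i)P(i,j)$. Summing over $l$ we get the bound
$$1 - \lambda_2(P) \leq \frac{\frac{1}{2}\sum_{i,j \in V} Q(i,j)\,\|\Psi(i) - \Psi(j)\|^2}{\sum_{i \in V} \pi(i)\,\|\Psi(i)\|^2},$$
where $\|\cdot\|$ denotes the Euclidean norm in $\mathbb{R}^n$. To turn the r.h.s. into a bound on $1 - \lambda_2^\star$, we maximize over $Q$. But note that, for $\psi_1, \dots, \psi_n$ fixed, the r.h.s. is linear in $Q$, so this can be expressed as the linear program
$$\max_Q\ \frac{\frac{1}{2}\sum_{i,j \in V} Q(i,j)\,\|\Psi(i) - \Psi(j)\|^2}{\sum_{k \in V} \pi(k)\,\|\Psi(k)\|^2} \quad \text{s.t.} \quad Q(i,j) = Q(j,i) \geq 0,\ \ \sum_{j \in V} Q(i,j) = \pi(i)\ \ \forall i \in V, \quad Q(i,j) = 0\ \ \forall (i,j) \notin E. \qquad (3)$$
The dual of this linear program is
$$\min_z\ \sum_{i \in V} \pi(i)z(i) \quad \text{s.t.} \quad z(i) + z(j) \geq \frac{\|\Psi(i) - \Psi(j)\|^2}{\sum_{k \in V} \pi(k)\,\|\Psi(k)\|^2}\ \ \forall (i,j) \in E. \qquad (4)$$
Note the similarity with (2). Note also that we can now minimize over $\psi_1, \dots, \psi_n$ as well to get the best bound possible. Make the change of variables $w(i) = z(i) \sum_{k \in V} \pi(k)\|\Psi(k)\|^2$ for all $i \in V$, assume w.l.o.g. that $\sum_{i \in V} \pi(i)w(i) = 1$ (one can always rescale the $\Psi$'s so that this holds) and take the multiplicative inverse of the objective function. This finally leads to:
Proposition 1 The optimal relaxation time satisfies
$$\tau_2^\star \geq \max\ \sum_{i \in V} \pi(i)\,\|\Psi(i)\|^2 \quad \text{s.t.} \quad \|\Psi(i) - \Psi(j)\|^2 \leq w(i) + w(j)\ \ \forall (i,j) \in E, \quad \sum_{i \in V} \pi(i)w(i) = 1, \quad \sum_{i \in V} \pi(i)\Psi(i) = 0. \qquad (5)$$
Moreover, this bound is tight, i.e. we have equality above.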
Informally, we seek to embed the graph into $\mathbb{R}^n$ so as to spread the nodes as much as possible under local constraints on the distances separating nodes connected by edges. The $w$'s give some slack in choosing which edges are bound by stronger or weaker constraints. See the examples. This bound is similar to that obtained recently by [SBXD04] in a continuous-time context. There, however, the r.h.s. in the inter-node distance constraint is a fixed weight $d_{ij}$ (instead of $w(i) + w(j)$), giving rise to a quite different problem.
Proof (of tightness): This follows from convex optimization duality. To see this, we go back to formulation (4). Note that w.l.o.g., we can assume that Then, using the Gram matrix representation for symmetric positive semidefinite matrices (an $n \times n$ matrix $M$ is symmetric positive semidefinite if and only if there is a set $x_1, \dots, x_n$ of vectors in $\mathbb{R}^n$ such that $M_{ij} = x_j^T x_i$; see e.g. [HJ85]), we get the equivalent bound where $A \succeq 0$ indicates that $A$ is positive semidefinite. One can check that the dual of this convex optimization problem is equivalent to minimizing the second largest eigenvalue over reversible transition matrices on $(G, \pi)$.
Contrary to the standard setting, the multidimensionality of the embedding seems necessary in the fastest mixing context. In particular, plugging the eigenvector corresponding to the second largest eigenvalue of the optimal matrix as $\psi_1$ (with all other coordinates 0) into (5) does not necessarily give a tight bound because there is no guarantee that the optimal $w$'s will allow enough room for a 1-dimensional embedding to spread sufficiently.

Remark 1
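The above bound is actually very similar to that in the case of completely specified chains, which can be reformulated as
$$\tau_2(P) = \max\ \sum_{i \in V} \pi(i)\,\|\Psi(i)\|^2 \quad \text{s.t.} \quad \frac{1}{2}\sum_{i,j \in V} Q(i,j)\,\|\Psi(i) - \Psi(j)\|^2 \leq 1, \quad \sum_{i \in V} \pi(i)\Psi(i) = 0.$$
Here the "slack" takes the form of a fixed weighted average over inter-node distances. The multidimensionality turns out not to be necessary in this case.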

Remark 2
The same scheme can be applied to the log-Sobolev constant.
In that case, one maximizes the entropy instead of the variance. See also [BDX04].

Remark 3
The smallest eigenvalue has its own geometry. There, the bound is the same with the term $\|\Psi(i) - \Psi(j)\|^2$ in the inter-node distance constraint replaced by $\|\Psi(i) + \Psi(j)\|^2$. The formulation (2) is equivalent to a combination of the two geometries (smallest and second largest eigenvalues).

Conductance
As an illustration of Proposition 1 we give a simple adaptation of the conductance bound to the context of fastest mixing.
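Proposition 2 Let $\Upsilon$ be the weighted vertex expansion of $(G, \pi)$,
$$\Upsilon = \min_{\emptyset \neq S \subsetneq V} \frac{\pi(\delta S)}{\pi(S) \wedge \pi(S^c)},$$
where $a \wedge b = \min\{a, b\}$ and $\delta S$ is the set of nodes $i \in S^c$ such that there is a $j \in S$ with $(i,j) \in E$. We have the following bound
$$\tau_2^\star \geq \frac{1}{2\Upsilon}, \quad \text{i.e.} \quad \lambda_2^\star \geq 1 - 2\Upsilon.$$
This bound is actually folklore. It is easily derived from the usual conductance bound and is often used to obtain lower bounds on completely specified chains. Here we give a direct proof.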
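For small graphs the vertex expansion can be evaluated by brute force. The sketch below assumes that $\Upsilon$ is the minimum over nonempty proper subsets $S$ of $\pi(\delta S)/(\pi(S) \wedge \pi(S^c))$, which is the reading consistent with the value $\Upsilon = 1/n$ quoted for the $K_n - K_n$ example; the helper names `vertex_expansion` and `barbell` are ours.

```python
from itertools import combinations

def vertex_expansion(nodes, edges, pi):
    """Brute-force vertex expansion; exponential in the number of nodes."""
    adj = {v: set() for v in nodes}
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    best = float("inf")
    for size in range(1, len(nodes)):
        for subset in combinations(nodes, size):
            S = set(subset)
            # delta S: nodes outside S with at least one neighbour in S.
            boundary = [i for i in nodes if i not in S and adj[i] & S]
            p_S = sum(pi[v] for v in S)
            ratio = sum(pi[v] for v in boundary) / min(p_S, 1.0 - p_S)
            best = min(best, ratio)
    return best

def barbell(n):
    """Two n-node complete graphs joined by the edge between their first nodes."""
    left = [("L", i) for i in range(n)]
    right = [("R", i) for i in range(n)]
    edges = [(a, b) for side in (left, right) for a, b in combinations(side, 2)]
    edges.append((left[0], right[0]))
    return left + right, edges
```

For `barbell(3)` with uniform weights this gives $\Upsilon = 1/3$, hence the lower bound $1/(2\Upsilon) = 3/2 = n/2$.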

Examples
K_n − K_n
This is the graph made of two $n$-node complete graphs joined by an edge. We denote the nodes on one side of the linking edge by $1, \dots, n$ and those on the other side by $1', \dots, n'$. The linking edge is $(1, 1')$. The stationary distribution is uniform. The vertex expansion bound gives $\Upsilon = \frac{1/(2n)}{n/(2n)} = \frac{1}{n}$ and $\tau_2^\star \geq n/2$. To get something sharper, we appeal to our more general bound. The bottleneck in this graph is intrinsically one-dimensional, so we take all coordinates except the first one to be 0, i.e. we consider only $\psi_1$. By symmetry, it is natural to map the nodes to $\psi_1(1) = -\psi_1(1') = x_0$ and $\psi_1(i) = -\psi_1(i') = x_1$, for $i \neq 1$, with $0 \leq x_0 \leq x_1$. The main insight here is that we should make the distance between $1$ and $1'$ as large as possible because that pushes all the other points away from 0 at the same time (because of the local constraints). So we take $w(i) = w(i') = 0$, for all $i \neq 1$, and $w(1) = w(1') = n$, which gives $x_0 = \frac{\sqrt{2}}{2}\sqrt{n}$ and $x_1 = \frac{\sqrt{2} + 2}{2}\sqrt{n}$. Summing the squares leads to a lower bound asymptotic to $\left(\frac{3}{2} + \sqrt{2}\right)n \geq 2.914n$. In Section 4, we give an almost matching upper bound. See also [BDPX04] for a similar upper bound.
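The arithmetic of this example can be checked numerically. The sketch below assumes the constraints are $\|\Psi(i) - \Psi(j)\|^2 \leq w(i) + w(j)$ on edges with the normalization $\sum_i \pi(i)w(i) = 1$, as described in the variational section; the function name `barbell_embedding_bound` is ours.

```python
import math

def barbell_embedding_bound(n):
    """Value of sum_i pi(i) * psi_1(i)^2 for the one-dimensional embedding."""
    x0 = math.sqrt(2 * n) / 2        # psi_1(1) = -psi_1(1')
    x1 = x0 + math.sqrt(n)           # psi_1(i) = -psi_1(i') for i != 1
    w_hub, w_other = float(n), 0.0   # w(1) = w(1') = n, all other w's are 0
    pi = 1.0 / (2 * n)               # uniform distribution on 2n nodes
    # Normalisation: sum_i pi(i) w(i) = 1.
    assert abs(pi * (2 * w_hub + 2 * (n - 1) * w_other) - 1.0) < 1e-12
    # Linking edge (1, 1'): squared distance (2 x0)^2 <= w(1) + w(1').
    assert (2 * x0) ** 2 <= 2 * w_hub + 1e-9
    # Edges (1, i): squared distance (x1 - x0)^2 <= w(1) + w(i).
    assert (x1 - x0) ** 2 <= w_hub + w_other + 1e-9
    # Edges (i, j) with i, j != 1 join coincident points, so they are satisfied.
    return pi * (2 * x0 ** 2 + 2 * (n - 1) * x1 ** 2)
```

The returned value equals $\frac{1}{2} + (n - 1)\left(\frac{3}{2} + \sqrt{2}\right)$, which is asymptotic to $\left(\frac{3}{2} + \sqrt{2}\right)n$.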

n-cycle and d-dimensional torus
In contrast to our preceding example, the $n$-cycle gives rise naturally to a multidimensional embedding. We let the stationary distribution be uniform. By symmetry we choose all $w$'s equal. So all pairs of consecutive nodes have to be embedded to points at distance (at most) $\sqrt{2}$. Our goal of maximizing the sum of the squared norms, together with the natural symmetry, leads to spreading the points evenly on a circle centered at the origin (in any 2-dimensional subspace of $\mathbb{R}^n$). That is, we take all coordinates except the first two to be 0 and, numbering the nodes from 1 to $n$ in order of traversal, we let $(\psi_1(i), \psi_2(i)) = (R\cos(2\pi i/n), R\sin(2\pi i/n))$, $i = 1, \dots, n$, for a value of $R$ which remains to be determined. The distance between consecutive points has to be $\sqrt{2}$, so a little geometry suggests $R = \frac{\sqrt{2}}{2\sin(\pi/n)} \geq \frac{\sqrt{2}\,n}{2\pi}$. Thus the lower bound is $\tau_2^\star \geq \frac{n^2}{2\pi^2}$, matching the relaxation time of the symmetric walk. See e.g. [AF04].
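As a quick numerical check of the circle embedding, the sketch below places the points, verifies that consecutive nodes are at squared distance 2 (all $w$'s equal to 1), and compares $R^2$ with the relaxation time $1/(1 - \cos(2\pi/n))$ of the symmetric nearest-neighbour walk. The function names are ours.

```python
import math

def cycle_embedding_bound(n):
    """Sum_i pi(i) * |Psi(i)|^2 for n points spread evenly on a circle."""
    R = math.sqrt(2) / (2 * math.sin(math.pi / n))
    pts = [(R * math.cos(2 * math.pi * i / n), R * math.sin(2 * math.pi * i / n))
           for i in range(1, n + 1)]
    for i in range(n):
        (ax, ay), (bx, by) = pts[i], pts[(i + 1) % n]
        # Edge constraint with w(i) + w(j) = 2 holds with equality.
        assert abs((ax - bx) ** 2 + (ay - by) ** 2 - 2.0) < 1e-9
    return sum(x * x + y * y for x, y in pts) / n

def symmetric_walk_relaxation(n):
    """Relaxation time of the walk moving to each neighbour with probability 1/2."""
    return 1.0 / (1.0 - math.cos(2 * math.pi / n))
```

Since $1 - \cos(2\pi/n) = 2\sin^2(\pi/n)$, the two quantities coincide for every $n$, not just asymptotically.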

Geometric random graphs
In their analysis of random walks on geometric random graphs, Boyd et al. [BGPS04] consider, in a key step, a variant of the $d$-dimensional grid of the previous example. Let $k$ be a fixed integer smaller than $m$. Again, our graph is made of the $m^d$ points of the $d$-dimensional torus $\mathbb{Z}_m^d$ (integers modulo $m$) with uniform stationary distribution. Two nodes $(i_1, \dots, i_d)$ and $(j_1, \dots, j_d)$ are connected by an edge if $i_l - j_l$ modulo $m$ is less than or equal to $k$ for all $1 \leq l \leq d$ (the points are at most $k$ cells apart in every dimension). Because of the "diagonal" edges, it seems natural to collapse all nodes onto a single $m$-cycle. More precisely, we map $(i_1, \dots, i_d)$ to $(R\cos(2\pi i_1/m), R\sin(2\pi i_1/m))$. We take uniform $w$'s. Because some edges connect nodes $k$ steps apart, the radius (which is constrained by the fact that points connected by an edge are at most $\sqrt{2}$ apart) is $R = \frac{\sqrt{2}}{2\sin(k\pi/m)} \geq \frac{\sqrt{2}\,m}{2k\pi}$ (assume that $k$ divides $n$ for convenience). Thus $\tau_2^\star \geq \frac{m^2}{2k^2\pi^2} = \Theta((n/D_d)^{2/d})$, where $D_d$ is the degree of each node and $n$ is the number of nodes. This bound matches the lower bound in [BGPS04]. There, exact expressions for the eigenvalues of tensor products of circulant matrices and the analysis of a linear program lead to a lower bound on fastest mixing on this graph. Our geometric method is much simpler. (7) gives tight lower bounds on the symmetric walks. More generally, the lower bound in Proposition 1 applies to any completely specified chain, as do all lower bounds on fastest mixing, and it could prove useful as an alternative to the standard variational characterization when the precise details of the transition matrix appear too cumbersome.

Upper bounds
It seems difficult to give general upper bounds on fastest mixing. An obvious technique is to pick an arbitrary chain and compute an upper bound on its relaxation time. For example, one might use the canonical (max-degree like) chain defined by the transition probabilities $P_d(i,j) = \pi(j)/\pi_*$ if $(i,j) \in E$ (and 0 otherwise) with $\pi_* = \max\{\sum_{j : (i,j) \in E} \pi(j) : i \in V\}$. Let $\pi_0 = \min_{i \in V} \pi(i)$ and recall the definition of vertex expansion $\Upsilon$ from Proposition 2.
Noting that for any subset $S \subseteq V$, and applying the standard Cheeger inequality to $P_d$ leads to, A different chain would have provided a different, and possibly better, bound. In any case, this Cheeger-type bound is very unlikely to lead to useful results, and moreover it tells us nothing about the optimal chain. Instead, the goal of this section is to illustrate the computation of a nontrivial upper bound through a canonical paths argument. The underlying idea is similar to that used in the lower bound above. That is, we think of a standard upper bound for completely specified chains as parameterized by transition probabilities and attempt to minimize the bound over those probabilities. Because of its straightforward dependence on the transition matrix, the canonical paths bound appears to be the most manageable. In this section and the next one, we show by way of examples that it can actually lead to sharp results.
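The canonical max-degree chain $P_d$ mentioned above is easy to build explicitly. The following sketch assumes, as in the Setting section, that self-loops are present, and it places on each self-loop whatever mass is left after the off-diagonal entries $\pi(j)/\pi_*$; that handling of the diagonal and the function name `max_degree_chain` are our choices for illustration.

```python
def max_degree_chain(nodes, edges, pi):
    """Transition probabilities P[i][j] of the max-degree-like chain."""
    nbrs = {v: {v} for v in nodes}          # every node carries a self-loop
    for i, j in edges:
        nbrs[i].add(j)
        nbrs[j].add(i)
    # pi_* is the largest total weight of a closed neighbourhood.
    pi_star = max(sum(pi[j] for j in nbrs[i]) for i in nodes)
    P = {i: {j: pi[j] / pi_star for j in nbrs[i] if j != i} for i in nodes}
    for i in nodes:
        # Remaining mass stays on the self-loop so that rows sum to one.
        P[i][i] = 1.0 - sum(P[i].values())
    return P
```

Reversibility is immediate since $\pi(i)P_d(i,j) = \pi(i)\pi(j)/\pi_*$ is symmetric in $i$ and $j$.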

Canonical paths: K n − K n example continued
We consider again the $K_n - K_n$ graph with uniform distribution. This chain is analyzed in detail in [BDPX04], where, using a sophisticated group-theoretic symmetry analysis, all eigenvalues are computed. Here, we give a very different, much more elementary, treatment. Also, being simpler, our approach has the potential of being applicable more generally. We proceed as follows: we write down the canonical paths upper bound as a function of $P$; we then choose $P$ among $\pi$-reversible chains so as to minimize the bound. Given a set $\Gamma$ of paths $\gamma_{xy}$ in $G$ for all pairs of nodes $x, y$, the canonical paths upper bound is $\tau_2(P) \leq \bar\rho(P, \Gamma)$, with
$$\bar\rho(P, \Gamma) = \max_e \frac{\sum_{\gamma_{xy} \ni e} \pi(x)\pi(y)\,|\gamma_{xy}|}{Q(e)}, \qquad (8)$$
where $|\gamma_{xy}|$ is the number of edges in $\gamma_{xy}$. Notice that the choice of paths depends, crucially, only on the graph and is therefore valid for any transition matrix consistent with $(G, \pi)$. Let $W(e)$ be the numerator in (8). On $K_n - K_n$, the natural choice of paths is to let $\gamma_{xy}$ be the shortest path (in terms of number of edges) between $x$ and $y$. Then
$$W(1, 1') = \frac{3n - 2}{4n}, \qquad W(1, i) = \frac{3}{4n}, \qquad W(i, j) = \frac{1}{4n^2}, \quad i, j \neq 1,\ i \neq j.$$
Similar values hold for the other complete subgraph. The largest contribution to the maximum above clearly comes from $W(1, 1')$. In order to decrease the ratio in $\bar\rho(P, \Gamma)$, we need to choose a large value for $Q(1, 1')$. But as we increase $Q(1, 1')$, the $Q(1, i)$'s and $Q(1', i')$'s have to be lowered accordingly. We do so until congestion is the same on edges $(1, 1')$, $(1, i)$'s and $(1', i')$'s. That is, we require
$$\frac{W(1, 1')}{Q(1, 1')} = \frac{W(1, i)}{Q(1, i)}, \qquad Q(1, 1') + (n - 1)\,Q(1, i) = \frac{1}{2n},$$
and similarly for the other side. The solution is
$$Q(1, 1') = \frac{3n - 2}{2n(6n - 5)}, \qquad Q(1, i) = \frac{3}{2n(6n - 5)}.$$
We extend this to all edges by The upper bound becomes $\tau_2^\star \leq 3n(1 - 5/(6n))$. Recall that our lower bound was $\tau_2^\star \geq 2.9n$. Note that the standard chain would have consisted in choosing a neighbour uniformly at random at each step. The same calculation gives an upper bound of order $n^2$ in that case.
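The congestion computation can be reproduced mechanically. The sketch below enumerates shortest paths on $K_n - K_n$, accumulates $W(e)$ over ordered pairs and directed edges, and evaluates the ratio $W(e)/Q(e)$ for weights that equalize congestion on the linking edge and on the edges at its endpoints. The way the remaining mass is spread evenly over the other edges (`q_inner`), the absence of self-loop mass, and all function names are our assumptions for illustration.

```python
from fractions import Fraction

def barbell_congestion(n):
    """Return max_e W(e) / Q(e) on K_n - K_n for the equalised weights (n >= 3)."""
    left = [("L", i) for i in range(n)]
    right = [("R", i) for i in range(n)]
    hub_l, hub_r = left[0], right[0]
    pi = Fraction(1, 2 * n)

    def path(x, y):
        # Shortest path: a direct edge inside a clique, else through both hubs.
        if x[0] == y[0]:
            return [x, y]
        hx, hy = (hub_l, hub_r) if x[0] == "L" else (hub_r, hub_l)
        p = [x] if x == hx else [x, hx]
        p.append(hy)
        if y != hy:
            p.append(y)
        return p

    W = {}
    for x in left + right:
        for y in left + right:
            if x == y:
                continue
            p = path(x, y)
            for e in zip(p, p[1:]):
                W[e] = W.get(e, 0) + pi * pi * (len(p) - 1)

    q_bridge = Fraction(3 * n - 2, 2 * n * (6 * n - 5))
    q_spoke = Fraction(3, 2 * n * (6 * n - 5))
    q_inner = (pi - q_spoke) / (n - 2)   # assumed: leftover mass spread evenly

    def Q(e):
        a, b = e
        if {a, b} == {hub_l, hub_r}:
            return q_bridge
        if a in (hub_l, hub_r) or b in (hub_l, hub_r):
            return q_spoke
        return q_inner

    return max(W[e] / Q(e) for e in W)
```

Under these assumptions the maximum equals $(6n - 5)/2 = 3n(1 - 5/(6n))$, attained on the linking edge and on the edges incident to its endpoints.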

Remark 5 In summary, our upper bound technique consists in two steps: identify transitions contributing to slow mixing by computing the congestion ratio in (8); then increase as much as possible the probability of transition on those bottleneck edges. Instead, one might try to use the same idea with conductance (or other upper bounds). But in that case, the fact that all cuts, instead of edges, have to be accounted for simultaneously makes the task more difficult.

Optimal rates for Glauber dynamics
In this section, we show that the framework discussed so far can be applied to large, well-structured combinatorial problems where the numerical SDP method has little chance of being helpful.

Glauber dynamics
where $C$ is a finite set. Typically, $\sigma$ is a spin or a color. We consider the following stationary distribution on $C^V$ where $Z$ is a normalization constant and $(v, w)$ is an undirected edge with endpoints $v, w$. Let $S \subseteq C^V$ be the subset of $C^V$ on which $\pi$ is nonzero. We wish to sample from $\pi$ by running a reversible MC on $S$, but allow only transitions that change the state of one node at a time, i.e. the transition graph has node set $S$, and $(\sigma, \sigma')$ is an edge if and only if $\sigma(v) = \sigma'(v)$ for all but at most one node $v \in V$. Let $\sigma_v^a$ be the configuration One such "local" MC is the so-called Glauber dynamics which, at each step, picks a node $v$ of $G$ uniformly at random and updates the value $\sigma(v)$ according to the transition probability distribution σ(w)) .
One can check that K is π-reversible. We actually consider a generalization of the Glauber dynamics by allowing the update rates to vary. More precisely, at each step, we pick a node v of G with probability ρ(v) for some distribution ρ : V → [0, 1], and we update σ(v) according to K as above. The standard chain corresponds to uniform ρ.
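To illustrate that varying the update rates preserves reversibility, here is a small sketch for the Ising model. It assumes the stationary distribution $\pi(\sigma) \propto \exp(\beta \sum_{(v,w)} \sigma(v)\sigma(w))$ and the usual heat-bath update for $K$; both are standard choices that we adopt for illustration, and the function name `glauber_matrix` is ours.

```python
import math
from itertools import product

def glauber_matrix(n_nodes, edges, beta, rho):
    """Full transition matrix of Glauber dynamics with update rates rho."""
    states = list(product((-1, 1), repeat=n_nodes))
    nbrs = {v: [] for v in range(n_nodes)}
    for v, w in edges:
        nbrs[v].append(w)
        nbrs[w].append(v)
    weight = {s: math.exp(beta * sum(s[v] * s[w] for v, w in edges))
              for s in states}
    Z = sum(weight.values())
    pi = {s: weight[s] / Z for s in states}
    P = {s: {t: 0.0 for t in states} for s in states}
    for s in states:
        for v in range(n_nodes):
            field = sum(s[w] for w in nbrs[v])
            for a in (-1, 1):
                # Configuration equal to s except for value a at node v.
                t = s[:v] + (a,) + s[v + 1:]
                # Heat-bath probability of setting node v to a.
                k = math.exp(beta * a * field) / (2 * math.cosh(beta * field))
                P[s][t] += rho[v] * k
    return states, pi, P
```

Detailed balance holds node by node, so it survives any choice of the distribution `rho`; the standard chain is recovered with `rho[v] = 1 / n_nodes`.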
Predictably the question we ask is: can we compute the rates ρ minimizing the mixing time? Or at least can we get reasonable lower and upper bounds on fastest mixing in this restricted setting? We do so by following the methodology put forward in the previous sections.
We first give an elementary bound on the best achievable improvement. This observation is essentially due to [BDX04].
Proof: By the variational characterization of $\lambda_2(P^\star)$ and the fact that A similar argument gives the second inequality. Thus, if $K$ is $O(1)$, then the best improvement over $P_U$ one can hope for is a factor of $O(|V|)$.
We now use a canonical paths argument similar to that in Section 4 to obtain a general upper bound on fastest mixing for Glauber dynamics.
Proposition 4 Let $\Gamma$ be a set of paths $\gamma_{\sigma,\sigma'}$ in $G$ for each pair $\sigma, \sigma'$ in $S$. Assume we have a bound $B_v$ (depending only on $v$) on the ratio appearing in the canonical paths bound (8) for edges of the form $(\sigma, \sigma_v^a)$ in the uniform rates case. Then,
$$\tau_2(P_U) \leq \max_v B_v \quad \text{and} \quad \tau_2(P_{\bar\rho}) \leq \frac{1}{|V|}\sum_v B_v, \quad \text{where } \bar\rho(v) = \frac{B_v}{\sum_u B_u}.$$
Proof: The first inequality is the canonical paths bound. For the second one, note that the ratio in (8) is multiplied by $(|V| B_v / \sum_u B_u)^{-1}$ when replacing uniform rates with $\bar\rho(v)$. We then apply the canonical paths bound to $P_{\bar\rho}$ using the bound $B_v$ and the previous observation. Note that $\bar\rho$ is the choice of rates that makes all bounds on the ratio in (8) equal. The point of Proposition 4 is that optimal improvement can be attained if most $B_v$'s are small compared to $\max_v B_v$. We give such an example in the next subsection.
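The rebalancing step in the proof can be sketched in a few lines. We assume, as the multiplicative factor $(|V| B_v / \sum_u B_u)^{-1}$ suggests, that the rates are taken proportional to the per-node bounds, $\bar\rho(v) = B_v / \sum_u B_u$; the function name `balanced_rates` and the sample values are hypothetical.

```python
def balanced_rates(B):
    """Rates proportional to B_v, with the bounds before and after rebalancing."""
    total = sum(B)
    rho = [b / total for b in B]
    uniform_bound = max(B)             # worst per-node bound under uniform rates
    balanced_bound = total / len(B)    # common value of all bounds under rho
    return rho, uniform_bound, balanced_bound

# Hypothetical per-node bounds: one bottleneck node and three easy ones.
B = [8.0, 1.0, 1.0, 2.0]
rho, before, after = balanced_rates(B)
# Moving from rate 1/|V| to rho[v] rescales the bound B_v by 1 / (|V| * rho[v]).
rescaled = [b / (len(B) * r) for b, r in zip(B, rho)]
```

Every entry of `rescaled` coincides with the average of the $B_v$'s, which is how the maximum gets replaced by the mean.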
As shown in [KMP01], the mixing time of the Glauber dynamics on a graph depends on its cut-width.
Definition 1 The cut-width $\xi(G)$ of a graph $G$ is the smallest integer such that there exists a labeling $v_1, \dots, v_{|V|}$ of the vertices such that for all $1 \leq k \leq |V|$ the number of edges from $\{v_1, \dots, v_k\}$ to $\{v_{k+1}, \dots, v_{|V|}\}$ is at most $\xi(G)$.
To use Proposition 4, we have to define the width of each node. Let $I : V \to \{1, \dots, |V|\}$ be some ordering of the nodes (not necessarily optimal), then we let $\xi_I(v)$ be the number of edges from $\{w : I(w) \leq I(v)\}$ to $\{w : I(w) > I(v)\}$. Let $\Delta$ be the maximum degree of $G$. Then it follows from [KMP01] that a bound as required in Proposition 4 is $B_v = |V|^2 e^{(4\xi_I(v) + 2\Delta)\beta}$. One can try to compute $\sum_v B_v / |V|$ in special cases. A rather uninteresting graph is the $s \times s$ grid. There, a natural ordering is to start from a corner, move horizontally as far as one can, then go to the next line and start over. In this ordering, the width of most nodes, including the maximum-width node, is approximately $s$ and therefore using non-uniform rates has essentially no effect.
Here is a more interesting example. Let $T_r^{(b)} = (V_r, E_r)$ be the complete rooted $b$-ary tree with $r$ levels (the root is at level 0 and the leaves at level $r$). Let $n_r$ be the number of vertices in $T_r^{(b)}$. Proposition 5 For $\beta$ large enough, an appropriate choice of rates leads to the estimate $\tau_2(P_{\rho^\star}) = O(n_r e^{4(b-1)\beta r})$, as $r$ tends to $+\infty$. In contrast, the best known upper bound on the uniform Glauber dynamics [KMP01] is $\tau_2(P_U) = O(n_r^2 e^{4(b-1)\beta r})$.
Proof: A good ordering of nodes of $T_r^{(b)}$, say $I$, is given by a depth-first search (DFS) traversal of the tree starting from the root. This implies that [KMP01]. Note that the width of a node $v$ is the number of unvisited neighbours of previously visited vertices when the DFS search reaches $v$. Therefore, the width of the root is $b$. Then, say vertex $v$ is on level $1 \leq l < r$ and is the $q$-th child of its parent $w$ (in the DFS traversal order). Then $\xi_I(v) = \xi_I(w) + b - q$ because (1) $v$ has $b$ children, (2) $q$ children of $w$ have now been visited, and (3) all descendants of the first $q - 1$ children of $w$ have been visited, so these add nothing to the width. As for nodes on level $r$, we have similarly $\xi_I(v) = \xi_I(w) - q$ if $v$ is the $q$-th child of $w$. Thus, the contribution to $\sum_v B_v$ of the $l$-th level, $1 \leq l < r$, is
$$B^{(l)} = B^{(l-1)}\left[e^{4(b-1)\beta} + \cdots + e^{4(0)\beta}\right] = B^{(l-1)}\,\frac{e^{4b\beta} - 1}{e^{4\beta} - 1} \equiv B^{(l-1)}\,\zeta(b, \beta),$$
with a similar expression for $l = r$. Summing over all levels, we get
$$\frac{\sum_v B_v}{|V|} = \frac{n_r^2 e^{(4b + 2\Delta)\beta}}{n_r}\left[1 + \zeta(b,\beta) + \cdots + \zeta(b,\beta)^{r-1} + e^{-4b\beta}\zeta(b,\beta)^r\right] = n_r e^{(4b + 2\Delta)\beta}\left[\frac{\zeta(b,\beta)^r - 1}{\zeta(b,\beta) - 1} + e^{-4b\beta}\zeta(b,\beta)^r\right].$$
In the low-temperature regime, i.e. for $\beta$ large (we actually assume $e^{4\beta} \gg 1$), this is $O(n_r e^{4(b-1)\beta r})$. Therefore, we get an optimal improvement of $O(n_r)$ over the usual Glauber dynamics. For a lower bound, we have the following result where we assume $b = 3$ for convenience.
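The width recursion used in this proof can be verified by brute force on a small tree. In the sketch below, nodes are encoded as tuples of child indices (the root is the empty tuple), the ordering $I$ is the DFS preorder, and $\xi_I(v)$ is counted directly from its definition; the encoding and function names are ours.

```python
def dfs_preorder(b, r):
    """Nodes of the complete b-ary tree with levels 0..r, in DFS preorder."""
    order = []

    def visit(v):
        order.append(v)
        if len(v) < r:
            for q in range(1, b + 1):
                visit(v + (q,))

    visit(())
    return order

def widths(b, r):
    """xi_I(v): edges from nodes ordered up to v to nodes ordered after v."""
    order = dfs_preorder(b, r)
    rank = {v: k for k, v in enumerate(order)}
    xi = {}
    for v in order:
        # Each non-root node u contributes the tree edge (parent of u, u);
        # it crosses the cut at v when the parent is visited and u is not.
        xi[v] = sum(1 for u in order
                    if u and rank[u[:-1]] <= rank[v] < rank[u])
    return xi
```

For `b = 3` and `r = 3` the root has width 3, internal nodes satisfy $\xi_I(v) = \xi_I(w) + b - q$, leaves satisfy $\xi_I(v) = \xi_I(w) - q$, and the largest width is $(b - 1)r + 1 = 7$.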
Proof: Kenyon et al. [KMP01] use recursive majority to define a cut in the space of configurations and apply the conductance bound. The recursive majority $m(\sigma)$ of a configuration $\sigma$ is computed as follows: set $M(v) = \sigma(v)$ for all $v$ on level $r$; starting from level $r - 1$ and up, compute $M$ on each node by taking the majority of the values of $M$ at the children of that node; output the value of $M$ at the root. Let $S$ be the set of configurations $\sigma$ with $m(\sigma) = +1$. It follows from [KMP01] that, under $\pi$, the probability that a configuration is such that its recursive majority is flipped by changing the value at a fixed leaf is at most $(2\epsilon + 8\epsilon^2)^{r-1}$. The union bound and the $\{-1, +1\}$ symmetry imply that $\pi(\delta S^c) \leq \frac{3^r}{2}(2\epsilon + 8\epsilon^2)^{r-1}$ and $\pi(S) = \frac{1}{2}$. By Proposition 2, we deduce $\lambda_2^\star \geq 1 - 2(3)^r(2\epsilon + 8\epsilon^2)^{r-1}$. On the other hand, the usual conductance bound applied to the uniform case gives that $\lambda_2(P_U) \geq 1 - 2\Phi_S$, with
$$\Phi_S = \pi(S)^{-1}\sum_{\sigma \in S,\ \tau \in S^c} \pi(\sigma)P_U(\sigma, \tau) \leq 2(3)^{-r}\sum_{\substack{\sigma \in S,\ \tau \in S^c \\ (\sigma,\tau) \in E}} \pi(\sigma) \leq (2\epsilon + 8\epsilon^2)^{r-1},$$
where we have used that $P_U(\sigma, \tau) \leq 3^{-r}$ for neighbours $\sigma, \tau$ [KMP01]. Since $3^r = O(n_r)$, our lower bound on fastest mixing is $O(n_r)$ times smaller than that on the standard Glauber dynamics. Obtaining tighter bounds would require a sharper analysis in the standard setting.
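The recursive majority described in this proof depends only on the leaf values and can be sketched as follows for $b = 3$. We assume the leaves are listed left to right so that the three children of each node are contiguous; the function name `recursive_majority` is ours.

```python
def recursive_majority(leaves):
    """Recursive majority of +1/-1 leaf values on a complete ternary tree."""
    level = list(leaves)
    while len(level) > 1:
        assert len(level) % 3 == 0, "number of leaves must be a power of 3"
        # Replace each block of three siblings by their majority value.
        level = [1 if sum(level[i:i + 3]) > 0 else -1
                 for i in range(0, len(level), 3)]
    return level[0]
```

For instance, with nine leaves `[1, 1, -1, -1, -1, 1, 1, -1, 1]` the level-1 values are `[1, -1, 1]` and the root value is `+1`; flipping the seventh leaf changes the third block to `-1` and therefore flips the recursive majority.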

Remark 6
We are not claiming that this choice of rates leads to the fastest sampling algorithm for this model. Indeed, in the case of the Ising model on a tree, a very simple propagation algorithm is much faster [EKPS00]. Rather, our point is to establish that fastest mixing analysis is feasible on nontrivial large-scale chains, a fact that was not immediate from previous works. It remains to be seen whether fastest mixing ideas will find useful applications in sampling.