A symmetric entropy bound on the non-reconstruction regime of Markov chains on Galton-Watson trees

We give a criterion of the form Q(d)c(M) < 1 for the non-reconstructability of tree-indexed q-state Markov chains obtained by broadcasting a signal from the root with a given transition matrix M. Here c(M) is an explicit function, convex over the set of M's with a given invariant distribution, that is defined in terms of a (q−1)-dimensional variational problem over symmetric entropies. Further, Q(d) is the expected number of offspring on the Galton-Watson tree. This result is equivalent to proving the extremality of the free boundary condition Gibbs measure within the corresponding Gibbs simplex. Our theorem holds for possibly non-reversible M, and its proof is based on a general Recursion Formula for expectations of a symmetrized relative entropy function, which suggests using this function as a Lyapunov function. In the case of the Potts model, the present theorem reproduces earlier results of the authors with a simplified proof. In the case of the symmetric Ising model (where the argument becomes similar to the approach of Pemantle and Peres), the method produces the correct reconstruction threshold. In the case of the (strongly) asymmetric Ising model, where the Kesten-Stigum bound is known not to be sharp, the method provides improved numerical bounds.


Introduction
The problem of reconstruction of Markov chains on d-ary trees has enjoyed much interest in recent years. There are multiple reasons for this, one of them being that it is a topic where information theorists, researchers in mathematical statistical mechanics, pure probabilists, and theoretical physicists working in statistical mechanics can meet and make contributions.
Indeed, starting with the symmetric Ising channel, for which the reconstruction threshold was settled in [4,8,9] using different methods and with increasing generality w.r.t. the underlying tree, there have been publications by, among others, Borgs, Chayes, Janson, Mossel and Peres on the mathematics side [12,13,11,2], deriving upper and lower bounds on reconstruction thresholds for certain models of finite-state tree-indexed Markov chains. On the theoretical physics side, let us highlight [16] on trees (see also [7] on graphs), which contains a discussion of the potential relevance of the reconstruction problem to the glass problem. That paper also provides numerical values for the Potts model on the basis of extensive simulations. The Potts model is interesting because, unlike the Ising model, its true reconstruction threshold behaves (respectively is expected to behave) non-trivially as a function of the degree d of the underlying d-ary tree and of the number of states q. For a discussion of this see the conjectures in [16], the rigorous bounds in [5], and in particular the proof in [17] showing that, for large enough d, the Kesten-Stigum bound is not sharp if q ≥ 5 and sharp if q = 3. We refer to [1] for a general computational method to obtain non-trivial rigorous bounds for reconstruction on trees.
The generality of our setup is motivated by the questions raised and the type of results given in [15], and our technique is somewhat inspired by [14,5]. Indeed, for the Potts model the present paper reproduces the result of [5] (where, moreover, we provide numerical estimates of the reconstruction inverse temperature, also in the small-q, small-d regime). In the present paper, however, the focus is on generality, that is, on the universality of the type of estimate, and on the structural clarity of the proof. The condition we provide can easily be implemented in any given model to produce numerical estimates of reconstruction thresholds.
The remainder of the paper is organized as follows. Section 2 contains the definition of the model and the statement of the theorem. Section 3 contains the proof.

Model and result
Consider an infinite rooted tree T having no leaves. For v, w ∈ T we write v → w if w is a child of v, and we denote by |v| the distance of a vertex v to the root. We write $T_N$ for the subtree of all vertices at distance ≤ N from the root.
To each vertex v there is associated a (spin) variable σ(v) taking values in a finite space which, without loss of generality, will be denoted by {1, 2, ..., q}. Our model will be defined in terms of a stochastic matrix with non-zero entries,
$$M = (M(i,j))_{i,j=1,\dots,q}, \qquad M(i,j) > 0 .$$
By the Perron-Frobenius theorem there is a unique single-site measure α = (α(j))_{j=1,...,q} which is invariant under the application of the transition matrix M, meaning that
$$\sum_{i=1}^q \alpha(i) M(i,j) = \alpha(j) \quad \text{for all } j = 1, \dots, q .$$
The object of our study is the corresponding tree-indexed Markov chain in equilibrium. This is the probability distribution P on $\{1, \dots, q\}^T$ whose restrictions $P_{T_N}$ to the state spaces $\{1, \dots, q\}^{T_N}$ of the finite trees are given by
$$P_{T_N}(\sigma_{T_N}) = \alpha(\sigma(0)) \prod_{v, w \in T_N:\ v \to w} M(\sigma(v), \sigma(w)),$$
where 0 denotes the root. The notion "equilibrium" refers to the fact that all single-site marginals are given by the invariant measure α.
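The construction of P can be sketched numerically. The following Python sketch assumes NumPy; the function names `invariant_distribution` and `broadcast` and the 3-state example matrix are illustrative choices of ours, and states are labelled 0, ..., q−1 instead of 1, ..., q. It computes α as the left Perron eigenvector of M and samples the chain on a d-ary tree of depth N by broadcasting from the root, level by level.

```python
import numpy as np

def invariant_distribution(M):
    """Left Perron eigenvector of a positive stochastic matrix M, normalized to sum 1."""
    eigvals, eigvecs = np.linalg.eig(M.T)
    k = np.argmin(np.abs(eigvals - 1.0))   # eigenvalue 1 is simple by Perron-Frobenius
    alpha = np.real(eigvecs[:, k])
    return alpha / alpha.sum()

def broadcast(M, d, N, rng):
    """Sample sigma on the d-ary tree of depth N: root ~ alpha, each child ~ M(parent, .).
    Returns a list whose n-th entry holds the d**n spins at distance n from the root."""
    q = M.shape[0]
    alpha = invariant_distribution(M)
    levels = [np.array([rng.choice(q, p=alpha)])]
    for _ in range(N):
        parents = np.repeat(levels[-1], d)          # each vertex has d children
        children = np.array([rng.choice(q, p=M[s]) for s in parents])
        levels.append(children)
    return levels

# Example: a hypothetical non-reversible 3-state chain
M = np.array([[0.6, 0.3, 0.1],
              [0.2, 0.5, 0.3],
              [0.3, 0.1, 0.6]])
alpha = invariant_distribution(M)
levels = broadcast(M, d=2, N=4, rng=np.random.default_rng(0))
print(alpha, [len(level) for level in levels])
```

For this matrix α = (17/44, 13/44, 14/44), and the level sizes are 1, 2, 4, 8, 16.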
A probability measure µ on $\{1, 2, \dots, q\}^T$ is called a Gibbs measure if it has the same finite-volume conditional probabilities as P. This means that, for all finite subsets V ⊂ T, we have
$$\mu(\sigma_V \mid \sigma_{T \setminus V}) = P_{T_N}(\sigma_V \mid \sigma_{T_N \setminus V})$$
for all N sufficiently large, µ-almost surely. The Gibbs measures, being defined in terms of a linear equation, form a simplex, and we would like to understand its structure and exhibit its extremal elements [6]. Multiple Gibbs measures (phase transitions) may occur if the loss of memory in the transition described by M is small enough compared to the proliferation of offspring along the tree T. Uniqueness of the Gibbs measure trivially implies extremality of the measure P, but, interestingly, the converse is not true. Parametrizing M by a temperature-like parameter may lead to two different transition temperatures, one where P becomes extremal and one where the Gibbs measure becomes unique. Broadly speaking, statistical mechanics models with two transition temperatures are peculiar to trees (and, more generally, to models indexed by non-amenable graphs [10]). This is one of the reasons for our interest in models on trees. Our present aim is to provide a general criterion, depending on the model only in a local (finite-dimensional) way, which implies the extremality of P and which works also in regimes of non-uniqueness. Readers with a statistical mechanics background may think of it as an analogue of Dobrushin's theorem, which says that c(γ) < 1 implies the uniqueness of the Gibbs measure of a local specification γ, where c(γ) is determined in terms of local (single-site) quantities.
In fact, Martinelli et al. [15] (see Theorem 9.3, and also Theorems 9.3' and 9.3'') give such a theorem. Their criterion for non-reconstruction of Markov chains on d-ary trees has the form $d\lambda_2\kappa < 1$, where κ is the Dobrushin constant [6] of the system of conditional probabilities described by P, and $\lambda_2$ is the second eigenvalue of M. Our theorem takes a different form. To formulate it we need the following notation.
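We write
$$\mathcal{P} = \Big\{ p = (p(i))_{i=1,\dots,q} :\ p(i) \ge 0,\ \sum_{i=1}^q p(i) = 1 \Big\}$$
for the simplex of length-q probability vectors, and we denote the relative entropy between probability vectors p, α ∈ $\mathcal{P}$ by $S(p|\alpha) = \sum_{i=1}^q p(i) \log \frac{p(i)}{\alpha(i)}$. We introduce the symmetrized entropy between p and α and write
$$L(p) = S(p|\alpha) + S(\alpha|p) = \sum_{i=1}^q \big(p(i) - \alpha(i)\big) \log \frac{p(i)}{\alpha(i)} .$$
While the symmetrized entropy is not a metric (the triangle inequality fails), it serves us as a "distance" to the invariant measure α.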
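As a small illustration, the symmetrized entropy can be evaluated directly from its closed form Σ(p(i) − α(i)) log(p(i)/α(i)). The sketch assumes NumPy and strictly positive probability vectors; the helper names `rel_entropy` and `sym_entropy` are ours. It also exhibits a violation of the triangle inequality for three Bernoulli vectors.

```python
import numpy as np

def rel_entropy(p, a):
    """S(p|a) = sum_i p(i) log(p(i)/a(i)) for strictly positive vectors."""
    p, a = np.asarray(p, float), np.asarray(a, float)
    return float(np.sum(p * np.log(p / a)))

def sym_entropy(p, a):
    """S(p|a) + S(a|p), written as sum_i (p(i) - a(i)) log(p(i)/a(i))."""
    p, a = np.asarray(p, float), np.asarray(a, float)
    return float(np.sum((p - a) * np.log(p / a)))

alpha = np.array([0.5, 0.3, 0.2])
p = np.array([0.2, 0.3, 0.5])
print(sym_entropy(p, alpha), rel_entropy(p, alpha) + rel_entropy(alpha, p))

# Triangle inequality fails: x, y, z are Bernoulli(0.1), Bernoulli(0.5), Bernoulli(0.9)
x, y, z = np.array([0.1, 0.9]), np.array([0.5, 0.5]), np.array([0.9, 0.1])
print(sym_entropy(x, z), sym_entropy(x, y) + sym_entropy(y, z))  # about 3.52 vs 1.76
```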
Let us define the constant, depending solely on the transition matrix M, in terms of the following supremum over probability vectors:
$$c(M) = \sup_{p \in \mathcal{P},\ p \neq \alpha} \frac{L(p M^{rev})}{L(p)}, \qquad (6)$$
where $M^{rev}(i,j) = \alpha(j) M(j,i)/\alpha(i)$ is the transition matrix of the reversed chain. Note that numerator and denominator vanish when we take for p the invariant distribution α. Consider a Galton-Watson tree with i.i.d. offspring distribution concentrated on {1, 2, ...} and denote the corresponding expected number of offspring by Q(d).
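Definition (6) lends itself to direct numerical evaluation. The sketch below is a crude Monte Carlo search and not necessarily the procedure used for the numerics in this paper: it evaluates L(pM^rev)/L(p) at random points of the simplex and at small perturbations of α (where the ratio approaches a quadratic-form limit), and returns the largest value found. It therefore only gives a lower estimate of the supremum; a careful optimizer would be needed for rigorous numerics. NumPy and all helper names (`c_estimate` and so on) are assumptions.

```python
import numpy as np

def invariant_distribution(M):
    eigvals, eigvecs = np.linalg.eig(M.T)
    alpha = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1.0))])
    return alpha / alpha.sum()

def reversed_chain(M, alpha):
    """M_rev(i, j) = alpha(j) M(j, i) / alpha(i)."""
    return M.T * alpha[None, :] / alpha[:, None]

def sym_entropy_rows(P, alpha):
    """L(p) = sum_i (p(i) - alpha(i)) log(p(i)/alpha(i)) for each row p of P."""
    return np.sum((P - alpha) * np.log(P / alpha), axis=1)

def c_estimate(M, n_samples=20000, seed=0):
    """Largest sampled value of L(p M_rev) / L(p): a lower estimate of c(M)."""
    alpha = invariant_distribution(M)
    Mrev = reversed_chain(M, alpha)
    q = len(alpha)
    rng = np.random.default_rng(seed)
    far = rng.dirichlet(np.ones(q), size=n_samples)      # spread over the simplex
    U = rng.normal(size=(n_samples, q))
    U -= U.mean(axis=1, keepdims=True)                   # directions tangent to the simplex
    U /= np.abs(U).max(axis=1, keepdims=True)
    near = alpha + 1e-3 * alpha.min() * U                # small perturbations of alpha
    P = np.vstack([far, near])
    num = sym_entropy_rows(P @ Mrev, alpha)
    den = sym_entropy_rows(P, alpha)
    ok = den > 1e-14                                     # discard p numerically equal to alpha
    return float(np.max(num[ok] / den[ok]))

beta = 0.5
e = np.exp(2 * beta)
M_ising = np.array([[e, 1.0], [1.0, e]]) / (e + 1.0)     # symmetric two-state chain
print(c_estimate(M_ising), np.tanh(beta) ** 2)
```

For the criterion of Theorem 2.1 one would then compare Q(d) with 1/c(M).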
Here is our main result.
Theorem 2.1 If Q(d)c(M ) < 1 then the tree-indexed Markov chain P on the Galton-Watson tree T is extremal for Q-almost every tree T .Equivalently, in information theoretic language, there is no reconstruction.
Remark 1. The computation of the constant c(M) for a given transition matrix M is a simple numerical task. Note that fast mixing of the Markov chain corresponds to small c(M). In this sense c(M) is an effective quantity depending on the interaction M that parallels the definition of the Dobrushin constant $c_D(\gamma)$ in the theory of Gibbs measures, which measures the degree of dependence in a local specification. While the latter depends on the structure of the interaction graph, this is not the case for c(M).
Remark 2. Non-uniqueness of the Gibbs measures corresponds to the existence of boundary conditions which will cause the corresponding finite-volume conditional probabilities to converge to different limits.Extremality of the measure P means that conditioning the measure P to acquire a configuration ξ at a distance larger than N will cease to have an influence on the state at the root if ξ is chosen according to the measure P itself and N is tending to infinity.In the language of information theory this is called non-reconstructability (of the state at the origin on the basis of noisy observations far away).
Remark 3 (On irreversibility). If M is any transition matrix reversible for the equidistribution and, for a permutation π of the numbers from 1 to q, we define $M^\pi(i,j) = M(i, \pi^{-1} j)$, then $c(M) = c(M^\pi)$ for all permutations π. This is seen by a simple computation. We can say that an irreversibility in the Markov chain which is caused by a deterministic stepwise renaming of labels (by π) is not seen by the constant.
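The simple computation behind Remark 3 can be checked numerically. For uniform α one has $M^{rev} = M^T$, and for symmetric M one finds $p (M^\pi)^{rev} = p' M$ with $p'(k) = p(\pi(k))$; since L is invariant under permutations of coordinates when α is uniform, the ratio L(pM^rev)/L(p) for $M^\pi$ at p equals the ratio for M at p', so the suprema coincide. The sketch (NumPy assumed; matrix and permutation are arbitrary illustrative choices) verifies this pointwise identity.

```python
import numpy as np

def sym_entropy(p, a):
    return float(np.sum((p - a) * np.log(p / a)))

def ratio(M, p):
    """L(p M_rev)/L(p) for a doubly stochastic M: alpha is uniform, so M_rev = M^T."""
    alpha = np.full(len(p), 1.0 / len(p))
    return sym_entropy(p @ M.T, alpha) / sym_entropy(p, alpha)

M = np.array([[0.5, 0.3, 0.2],     # symmetric, hence reversible for the equidistribution
              [0.3, 0.4, 0.3],
              [0.2, 0.3, 0.5]])
perm = np.array([1, 2, 0])          # pi(k) = perm[k]
M_pi = M[:, np.argsort(perm)]       # M_pi(i, j) = M(i, pi^{-1}(j))

rng = np.random.default_rng(2)
pairs = [(ratio(M_pi, p), ratio(M, p[perm])) for p in rng.dirichlet(np.ones(3), size=5)]
print(np.allclose(M_pi, M_pi.T), pairs)   # M_pi is not symmetric, yet the ratios agree
```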
Remark 4 (On convexity). For every fixed probability vector α, the function M ↦ c(M) is convex on the set of transition matrices which have α as their invariant distribution, i.e. αM = α. This is a consequence of the fact that, for $M_1, M_2$ with $\alpha M_1 = \alpha$, $\alpha M_2 = \alpha$ and λ ∈ [0,1], we have $(\lambda M_1 + (1-\lambda) M_2)^{rev} = \lambda M_1^{rev} + (1-\lambda) M_2^{rev}$, and that the relative entropy is convex in its first and in its second argument. Hence, for fixed p, the map $M \mapsto L(pM^{rev})$ is convex, and so is the supremum over p. This implies that, for each α and fixed degree d, the set {M : αM = α, dc(M) < 1}, for which the criterion ensures non-reconstruction, is convex.
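The two facts used in Remark 4 can be illustrated numerically: with α fixed, the reversal map M ↦ M^rev is linear, and for fixed p the function λ ↦ L(p(λM_1 + (1 − λ)M_2)^rev) is convex. The sketch takes $M_2 = M_1^{rev}$, which has the same invariant distribution as $M_1$; the matrix and p are arbitrary choices and NumPy is assumed.

```python
import numpy as np

def invariant_distribution(M):
    eigvals, eigvecs = np.linalg.eig(M.T)
    alpha = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1.0))])
    return alpha / alpha.sum()

def reversed_chain(M, alpha):
    """M_rev(i, j) = alpha(j) M(j, i) / alpha(i); linear in M for fixed alpha."""
    return M.T * alpha[None, :] / alpha[:, None]

def sym_entropy(p, a):
    return float(np.sum((p - a) * np.log(p / a)))

M1 = np.array([[0.6, 0.3, 0.1],
               [0.2, 0.5, 0.3],
               [0.3, 0.1, 0.6]])
alpha = invariant_distribution(M1)
M2 = reversed_chain(M1, alpha)           # alpha is invariant for M2 as well
p = np.array([0.7, 0.2, 0.1])

lams = np.linspace(0.0, 1.0, 21)
f = np.array([sym_entropy(p @ reversed_chain(l * M1 + (1 - l) * M2, alpha), alpha)
              for l in lams])
second_diff = f[:-2] - 2 * f[1:-1] + f[2:]   # non-negative for a convex function
print(second_diff.min() >= -1e-12)
```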
We conclude this section with a discussion of two main types of test examples.
Example 1 (Symmetric Potts and Ising model). The Potts model with q states at inverse temperature β is defined by the transition matrix
$$M_\beta(i,j) = \frac{e^{2\beta \mathbf{1}_{i=j}}}{e^{2\beta} + q - 1}, \qquad i, j = 1, \dots, q .$$
This Markov chain is reversible for the equidistribution. In the case q = 2, the Ising model, one computes $c(M_\beta) = (\tanh \beta)^2$, which yields the correct reconstruction threshold.
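A quick numerical sanity check of Example 1, as a sketch assuming NumPy (helper names are ours). For the Potts matrix one has $pM_\beta = \theta p + (1-\theta)\alpha$ with θ = (e^{2β} − 1)/(e^{2β} + q − 1) and α uniform, from which L(pM_β)/L(p) equals θ times Σ_i (qp(i) − 1) log(1 + (e^{2β} − 1)p(i)) / Σ_i (qp(i) − 1) log(qp(i)); the supremum of this fraction over p is therefore c(M_β)/θ. For q = 2, θ = tanh β, L(p) = 2x artanh(2x) along p = (1/2 + x, 1/2 − x), and the ratio equals θ artanh(2θx)/artanh(2x). By convexity of artanh this is at most θ² and tends to θ² as x → 0, which gives $c(M_\beta) = (\tanh\beta)^2$.

```python
import numpy as np

def potts(beta, q):
    """M_beta(i, j) = exp(2 beta 1{i=j}) / (exp(2 beta) + q - 1)."""
    e = np.exp(2 * beta)
    return np.where(np.eye(q, dtype=bool), e, 1.0) / (e + q - 1)

def sym_entropy(p, a):
    return float(np.sum((p - a) * np.log(p / a)))

def potts_fraction(p, beta):
    q = len(p)
    num = np.sum((q * p - 1) * np.log1p(np.expm1(2 * beta) * p))
    den = np.sum((q * p - 1) * np.log(q * p))
    return float(num / den)

# Potts, q = 4: the ratio L(p M)/L(p) factorizes as theta * potts_fraction(p)
beta, q = 0.7, 4
M = potts(beta, q)
alpha = np.full(q, 1.0 / q)
theta = np.expm1(2 * beta) / (np.exp(2 * beta) + q - 1)
ps = np.random.default_rng(1).dirichlet(np.ones(q), size=5)
ratios = [sym_entropy(p @ M, alpha) / sym_entropy(p, alpha) for p in ps]
factored = [theta * potts_fraction(p, beta) for p in ps]

# Ising, q = 2: closed form of the ratio along p = (1/2 + x, 1/2 - x)
beta2 = 0.5
th = np.tanh(beta2)
xs = np.linspace(1e-6, 0.5 - 1e-6, 100001)
closed_form = th * np.arctanh(2 * th * xs) / np.arctanh(2 * xs)
M2, a2 = potts(beta2, 2), np.array([0.5, 0.5])
direct = [sym_entropy(np.array([0.5 + x, 0.5 - x]) @ M2, a2)
          / sym_entropy(np.array([0.5 + x, 0.5 - x]), a2) for x in xs[::20000]]
print(np.allclose(ratios, factored), closed_form.max(), th ** 2)
```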
Theorem 2.1 is a generalization of the main result of our paper [5] for the specific case of the Potts model. That paper also contains comparisons of numerical values with the (presumed) exact transition values. Our discussion of the cases q = 3, 4, 5 shows closeness up to a few percent, and for q = 3, 4 and small d these are the best rigorous bounds as of today. To see the connection between the present paper and [5], we rewrite
$$c(M_\beta) = \frac{e^{2\beta} - 1}{e^{2\beta} + q - 1}\, c(\beta, q)$$
and note that the main theorem of [5] was formulated in terms of the quantity
$$c(\beta, q) = \sup_{p \in \mathcal{P}} \frac{\sum_{i=1}^q \big(q p(i) - 1\big) \log\big(1 + (e^{2\beta} - 1) p(i)\big)}{\sum_{i=1}^q \big(q p(i) - 1\big) \log\big(q p(i)\big)} .$$
Example 2 (Non-symmetric Ising model). Consider the following transition matrix:
$$M = \begin{pmatrix} 1 - \delta_1 & \delta_1 \\ 1 - \delta_2 & \delta_2 \end{pmatrix}. \qquad (9)$$
The chain is not symmetric when $1 - \delta_1 \neq \delta_2$. Let us focus on regular trees. Mossel and Peres [13] prove that, on a regular tree of degree d, the reconstruction problem defined by the matrix (9) is unsolvable under an explicit condition, (10), while Martin [3] gives a different condition, (11), for non-reconstructibility. By the Kesten-Stigum bound it is known that there is reconstruction when $d(\delta_2 - \delta_1)^2 > 1$. When $\delta_1 + \delta_2 = 1$, the matrix M is symmetric and the Kesten-Stigum bound is sharp. Recently, Borgs, Chayes, Mossel and Roch [2] have shown with an elegant proof that the Kesten-Stigum threshold is tight for roughly symmetric binary channels, i.e. when $|1 - (\delta_1 + \delta_2)| < \delta$ for some small δ. Although the threshold we give is very close to the Kesten-Stigum bound when the chain has a small asymmetry, we are so far not able to recover this sharp estimate with our method. For large asymmetry the Kesten-Stigum bound is known not to be sharp: Mossel proves as Theorem 1 in [12] that, for any λ > 1/d, there exists a δ(λ) such that there is reconstruction for $(\delta_1, \delta_2) = (\delta_1, \lambda + \delta_1)$ whenever $\delta_1 < \delta(\lambda)$. On a Cayley tree with coordination number d, non-reconstruction for the Markov chain (9) with $\delta_2 = 0$ (or $1 - \delta_1 = 0$) is equivalent to the extremality of the corresponding Gibbs measure of the hard-core model, whose activity is determined by the remaining parameter and d.
Restricted to this specific case, Martin proves a better condition than the one obtained by taking $\delta_2 = 0$ both in (11) and in (6).
Our entropy method provides a better bound than (11) and considerably improves (10) for values of $\delta_1$ and $\delta_2$ giving a strongly asymmetric chain.

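Since every two-state chain is reversible, here $M^{rev} = M$, and c(M) reduces to a supremum over the one-parameter family p = (x, 1 − x), 0 < x < 1. It is therefore quite simple to compute the constant c(M) numerically; the numerical outputs and the comparisons with (10), (11) and the Kesten-Stigum bound are given in Table 1.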
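For a two-state chain the supremum in (6) is one-dimensional, and since every two-state chain is reversible, M^rev = M. The sketch below assumes NumPy, uses a plain grid search with hypothetical parameter values, and reads the matrix as having rows (1 − δ_1, δ_1) and (1 − δ_2, δ_2), so that its second eigenvalue is δ_2 − δ_1, in line with the Kesten-Stigum quantity d(δ_2 − δ_1)^2. It compares the degree 1/c(M) below which non-reconstruction is guaranteed with the Kesten-Stigum degree 1/(δ_2 − δ_1)^2. Because the ratio tends to (δ_2 − δ_1)^2 as p → α, one always has c(M) ≥ (δ_2 − δ_1)^2, consistent with the Kesten-Stigum bound.

```python
import numpy as np

def asym_ising(d1, d2):
    """Two-state chain with rows (1 - d1, d1) and (1 - d2, d2); second eigenvalue d2 - d1."""
    return np.array([[1 - d1, d1], [1 - d2, d2]])

def c_two_state(M, n_grid=200001):
    """Grid estimate of sup_p L(pM)/L(p); two-state chains are reversible, so M_rev = M."""
    alpha = np.array([M[1, 0], M[0, 1]]) / (M[0, 1] + M[1, 0])
    x = np.linspace(0.0, 1.0, n_grid)[1:-1]          # interior of the simplex
    P = np.stack([x, 1.0 - x], axis=1)

    def L(R):
        return np.sum((R - alpha) * np.log(R / alpha), axis=1)

    num, den = L(P @ M), L(P)
    ok = den > 1e-14                                  # skip the point p = alpha
    return float(np.max(num[ok] / den[ok])), alpha

d1, d2 = 0.1, 0.7                                     # hypothetical values
M = asym_ising(d1, d2)
c, alpha = c_two_state(M)
print("c(M) =", c, "  (d2 - d1)^2 =", (d2 - d1) ** 2)
print("non-reconstruction if Q(d) <", 1 / c, "; KS reconstruction if d >", 1 / (d2 - d1) ** 2)
```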

Proof
We denote by $T_N$ the tree rooted at 0 of depth N. The notation $T^v_N$ indicates the subtree of $T_N$ rooted at v, obtained by "looking to the outside" on the tree $T_N$. We denote by $P^N_v$ the measure on $T^v_N$ with free boundary conditions or, equivalently, the Markov chain obtained by broadcasting on the subtree with root v with the same transition kernel, starting in α. We denote by $P^{N,\xi}_v$ the corresponding measure on $T^v_N$ with boundary condition on $\partial T^v_N$ given by $\xi = (\xi_i)_{i \in \partial T^v_N}$. It is obtained by conditioning the free boundary condition measure $P^N_v$ to take the value ξ on the boundary.
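We write
$$\pi^{N,\xi}_v = \big(P^{N,\xi}_v(\sigma(v) = j)\big)_{j=1,\dots,q} \in \mathcal{P}$$
for the distribution of the spin at v under the boundary condition ξ, and simply $\pi^N_v$ when the dependence on ξ is not displayed; for ξ distributed according to P, $\pi^N_v$ is a random element of $\mathcal{P}$. To control a recursion for these quantities along the tree, we find it useful to make explicit the notion of a linear stochastic Lyapunov function, Definition 3.1.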
Proposition 3.2 Consider a tree-indexed Markov chain P, with transition kernel M (i, j) and invariant measure α(i).
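Then the function
$$L(p) = S(p|\alpha) + S(\alpha|p) = \sum_{i=1}^q \big(p(i) - \alpha(i)\big) \log \frac{p(i)}{\alpha(i)}$$
is a linear stochastic Lyapunov function with center α w.r.t. the measure P, with the constant c(M) of (6).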
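Proposition 3.2 follows immediately from the following invariance property of the recursion, which is the main result of our paper, since $L(pM^{rev}) \le c(M) L(p)$ for all p ∈ $\mathcal{P}$ by (6).
Proposition 3.3 (Main Recursion Formula for expected symmetrized entropy)
$$E_P\big[L(\pi^N_v)\big] = \sum_{w:\ v \to w} E_P\big[L(\pi^N_w M^{rev})\big],$$
where $E_P$ denotes expectation over boundary conditions ξ distributed according to P.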
Warning: pointwise, that is, for a fixed boundary condition ξ, this identity fails, and in general
$$L(\pi^{N,\xi}_v) \neq \sum_{w:\ v \to w} L(\pi^{N,\xi}_w M^{rev}) .$$
In this sense the proposition should be seen as an invariance property which limits the possible behavior of the recursion.
Proof of Proposition 3.3. We need the measure on boundary configurations at distance N from the root on the tree emerging from v which is obtained by conditioning the spin at the site v to take the value j, namely
$$P^N_{v,j}(\xi) = P^N_v\big(\sigma_{\partial T^v_N} = \xi \,\big|\, \sigma(v) = j\big) .$$
Then the double expected value, w.r.t. the a priori measure α, of boundary relative entropies can be written as an expected value over boundary conditions, distributed according to the free boundary condition measure, of the symmetrized entropy between the distribution at v and α, in the following form.
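Lemma 3.4
$$\sum_{j,k=1}^q \alpha(j)\alpha(k)\, S\big(P^N_{v,j} \,\big|\, P^N_{v,k}\big) = E_P\big[L(\pi^N_v)\big] .$$
Proof of Lemma 3.4: In the first step we express the relative entropy as an expected value,
$$S\big(P^N_{v,j} \,\big|\, P^N_{v,k}\big) = E_P\Big[\frac{\pi^N_v(j)}{\alpha(j)} \log \frac{\pi^N_v(j)/\alpha(j)}{\pi^N_v(k)/\alpha(k)}\Big] .$$
Here we have used that, with obvious notations, $P^N_{v,j}(\xi) = P^N_v(\xi)\, \pi^{N,\xi}_v(j)/\alpha(j)$. Further we have used that
$$\frac{P^N_{v,x_1}(\xi)}{P^N_{v,x_2}(\xi)} = \frac{\pi^{N,\xi}_v(x_1)/\alpha(x_1)}{\pi^{N,\xi}_v(x_2)/\alpha(x_2)}$$
for $x_1, x_2 \in \{1, \dots, q\}$. Summing over j, k with weights α(j)α(k) and using $\sum_j \pi^N_v(j) = \sum_k \alpha(k) = 1$ gives
$$\sum_{j,k} \alpha(j)\alpha(k)\, S\big(P^N_{v,j} \,\big|\, P^N_{v,k}\big) = E_P\big[S(\pi^N_v|\alpha) + S(\alpha|\pi^N_v)\big] = E_P\big[L(\pi^N_v)\big],$$
which finishes the proof of Lemma 3.4.
Let us continue with the proof of the Main Recursion Formula. We need two more ingredients, formulated in the next two lemmas. The first gives the recursion of the probability vectors $\pi^N_v$ in terms of the values $\pi^N_w$ of their children w, which is valid for any fixed choice of the boundary condition ξ.
Lemma 3.5 (Deterministic recursion)
$$\pi^N_v(j) = \frac{\alpha(j) \prod_{w:\ v \to w} \frac{(\pi^N_w M^{rev})(j)}{\alpha(j)}}{\sum_{k=1}^q \alpha(k) \prod_{w:\ v \to w} \frac{(\pi^N_w M^{rev})(k)}{\alpha(k)}},$$
or, equivalently, for all pairs of values j, k we have
$$\frac{\pi^N_v(j)/\alpha(j)}{\pi^N_v(k)/\alpha(k)} = \prod_{w:\ v \to w} \frac{(\pi^N_w M^{rev})(j)/\alpha(j)}{(\pi^N_w M^{rev})(k)/\alpha(k)} . \qquad (21)$$
The proof of this lemma follows from an elementary computation with conditional probabilities and is omitted here.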
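Lemma 3.5 is what one would implement to compute $\pi^{N,\xi}_v$ from the boundary upward. The following sketch (NumPy assumed; helper names and the 3-state matrix are hypothetical; states labelled from 0) applies the product formula level by level on a d-ary tree whose depth-N spins are fixed to ξ, and checks it for a binary tree of depth 2 against a brute-force sum over the two unobserved spins at depth 1.

```python
import itertools
import numpy as np

def invariant_distribution(M):
    eigvals, eigvecs = np.linalg.eig(M.T)
    alpha = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1.0))])
    return alpha / alpha.sum()

def parent_marginal(child_marginals, M, alpha):
    """pi_v(j) proportional to alpha(j) * prod_w (pi_w M_rev)(j) / alpha(j)."""
    Mrev = M.T * alpha[None, :] / alpha[:, None]
    weights = alpha.copy()
    for pi_w in child_marginals:
        weights = weights * (pi_w @ Mrev) / alpha
    return weights / weights.sum()

def root_marginal(xi, M, alpha, d):
    """Apply the recursion upward from the boundary spins xi (listed left to right)."""
    q = len(alpha)
    level = [np.eye(q)[s] for s in xi]        # boundary vertices: point masses
    while len(level) > 1:
        level = [parent_marginal(level[k:k + d], M, alpha)
                 for k in range(0, len(level), d)]
    return level[0]

def brute_force(xi, M, alpha):
    """Depth-2 binary tree: P(sigma(root) = j | xi) by summing over the depth-1 spins a, b."""
    q = len(alpha)
    w = np.zeros(q)
    for j, a, b in itertools.product(range(q), repeat=3):
        w[j] += (alpha[j] * M[j, a] * M[j, b]
                 * M[a, xi[0]] * M[a, xi[1]] * M[b, xi[2]] * M[b, xi[3]])
    return w / w.sum()

M = np.array([[0.6, 0.3, 0.1],
              [0.2, 0.5, 0.3],
              [0.3, 0.1, 0.6]])
alpha = invariant_distribution(M)
max_err = max(np.abs(root_marginal(xi, M, alpha, 2) - brute_force(xi, M, alpha)).max()
              for xi in itertools.product(range(3), repeat=4))
print(max_err)   # at the level of rounding errors
```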
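We also need to take into account the forward propagation of the distribution of boundary conditions from the parents to the children, formulated in the next lemma.
Lemma 3.6 (Propagation of the boundary measure) For v → w and each j, the image of $P^N_{v,j}$ under restriction to the boundary of $T^w_N$ is $\sum_{l=1}^q M(j,l)\, P^N_{w,l}$. Equivalently, using $P^N_{w,l}(\xi) = P^N_w(\xi)\, \pi^{N,\xi}_w(l)/\alpha(l)$ and the definition of $M^{rev}$, for every function f we have
$$E_{P^N_{v,j}}\big[f(\pi^N_w)\big] = E_P\Big[\frac{(\pi^N_w M^{rev})(j)}{\alpha(j)}\, f(\pi^N_w)\Big] .$$
This statement follows from the definition of the model. Now we are ready to head for the Main Recursion Formula.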
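We use the second form (21) of the deterministic recursion, Lemma 3.5, together with the Propagation-of-the-boundary-measure Lemma 3.6, to write the boundary entropy in the form
$$S\big(P^N_{v,j} \,\big|\, P^N_{v,k}\big) = \sum_{w:\ v \to w} E_P\Big[\frac{(\pi^N_w M^{rev})(j)}{\alpha(j)} \log \frac{(\pi^N_w M^{rev})(j)/\alpha(j)}{(\pi^N_w M^{rev})(k)/\alpha(k)}\Big], \qquad (27)$$
using in the last step the definition of the reversed Markov chain. Finally, applying the sum $\sum_{j,k} \alpha(j)\alpha(k) \cdots$ to both sides of (27), we get the Main Recursion Formula. To see this, note that the l.h.s. of (27) together with this sum becomes the double sum of relative entropies in Lemma 3.4, hence equals $E_P[L(\pi^N_v)]$. For the r.h.s. of (27) we note that, since $\sum_j (\pi^N_w M^{rev})(j) = 1$,
$$\sum_{j,k} \alpha(j)\alpha(k)\, \frac{(\pi^N_w M^{rev})(j)}{\alpha(j)} \log \frac{(\pi^N_w M^{rev})(j)/\alpha(j)}{(\pi^N_w M^{rev})(k)/\alpha(k)} = L(\pi^N_w M^{rev}) .$$
This finishes the proof of the Main Recursion Formula, Proposition 3.3.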
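To make Lemma 3.4, Proposition 3.3 and the Warning concrete, the following sketch (NumPy assumed; the 3-state matrix is an arbitrary non-reversible example) enumerates all 3^4 boundary configurations of a binary tree of depth 2. It computes $E_P[L(\pi_{root})]$, the right-hand side $\sum_w E_P[L(\pi_w M^{rev})]$ of the Main Recursion Formula, and the double sum of relative entropies of Lemma 3.4. The three numbers agree up to rounding, while the pointwise identity fails for some ξ.

```python
import itertools
import numpy as np

def invariant_distribution(M):
    eigvals, eigvecs = np.linalg.eig(M.T)
    alpha = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1.0))])
    return alpha / alpha.sum()

def L(p, a):
    """Symmetrized entropy sum_i (p(i) - a(i)) log(p(i)/a(i))."""
    return float(np.sum((p - a) * np.log(p / a)))

def S(p, a):
    """Relative entropy sum_i p(i) log(p(i)/a(i))."""
    return float(np.sum(p * np.log(p / a)))

M = np.array([[0.6, 0.3, 0.1],      # hypothetical non-reversible chain
              [0.2, 0.5, 0.3],
              [0.3, 0.1, 0.6]])
q = M.shape[0]
alpha = invariant_distribution(M)
Mrev = M.T * alpha[None, :] / alpha[:, None]

configs = list(itertools.product(range(q), repeat=4))   # boundary spins at depth 2
P_cond = np.zeros((q, len(configs)))    # P_cond[j, n] = P(xi_n | sigma(root) = j)
lhs = rhs = 0.0                         # E_P[L(pi_root)] and sum_w E_P[L(pi_w Mrev)]
pointwise_gap = 0.0
for n, xi in enumerate(configs):
    # unnormalized laws of the two depth-1 spins given their own two boundary spins
    wa = alpha * M[:, xi[0]] * M[:, xi[1]]
    wb = alpha * M[:, xi[2]] * M[:, xi[3]]
    # P(sigma(root) = j, xi) = alpha(j) * sum_a M(j,a) wa(a)/alpha(a) * sum_b M(j,b) wb(b)/alpha(b)
    joint = alpha * (M @ (wa / alpha)) * (M @ (wb / alpha))
    p_xi = joint.sum()
    pi_root, pi_a, pi_b = joint / p_xi, wa / wa.sum(), wb / wb.sum()
    P_cond[:, n] = joint / alpha
    lhs += p_xi * L(pi_root, alpha)
    children_term = L(pi_a @ Mrev, alpha) + L(pi_b @ Mrev, alpha)
    rhs += p_xi * children_term
    pointwise_gap = max(pointwise_gap, abs(L(pi_root, alpha) - children_term))

lemma_34 = sum(alpha[j] * alpha[k] * S(P_cond[j], P_cond[k])
               for j in range(q) for k in range(q))
print(lhs, rhs, lemma_34, pointwise_gap)
```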
Definition 3.1 We call a real-valued function L on $\mathcal{P}$ a linear stochastic Lyapunov function with center p* if there is a constant c such that
• L(p) ≥ 0 for all p ∈ $\mathcal{P}$, with equality if and only if p = p*;
• $E_P\big[L(\pi^N_v)\big] \le c \sum_{w:\ v \to w} E_P\big[L(\pi^N_w)\big]$ for all vertices v and all N.
Finally, Theorem 2.1 follows from Proposition 3.2 with the aid of the Wald equality with respect to the expectation over Galton-Watson trees, since the contraction of the recursion and the Lyapunov function properties yield
$$\lim_{N \uparrow \infty} P\big(\xi :\ |\pi^{N,\xi}(s) - \alpha(s)| \ge \varepsilon\big) = 0$$
for all s and all ε > 0, and this implies the extremality of the measure P. This ends the proof of Theorem 2.1.
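The mechanism of the final step can be observed numerically. The sketch below is a Monte Carlo illustration on a regular binary tree rather than a Galton-Watson tree, with hypothetical parameter choices and NumPy assumed. It samples boundary conditions from P for the Ising chain, computes $\pi^{N,\xi}$ at the root with the recursion of Lemma 3.5, and averages $L(\pi^{N,\xi})$. With β = 0.5 one has 2 tanh²β ≈ 0.43 < 1, and the averages decay roughly geometrically in N, in line with the bound $E_P[L(\pi^N)] \le (2c(M))^{N-1} E_P[L(\pi^1)]$ obtained by iterating Proposition 3.2.

```python
import numpy as np

def expected_L_root(M, d, N, n_samples, rng):
    """Monte Carlo estimate of E_P[L(pi^N_root)] on the d-ary tree of depth N."""
    q = M.shape[0]
    eigvals, eigvecs = np.linalg.eig(M.T)
    alpha = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1.0))])
    alpha = alpha / alpha.sum()
    Mrev = M.T * alpha[None, :] / alpha[:, None]
    cum = np.cumsum(M, axis=1)
    # broadcast from the root; at level n, spins has shape (n_samples, d**n)
    spins = rng.choice(q, size=(n_samples, 1), p=alpha)
    for _ in range(N):
        parents = np.repeat(spins, d, axis=1)          # children of a parent are adjacent
        u = rng.random(parents.shape)
        spins = np.minimum((u[..., None] > cum[parents]).sum(axis=-1), q - 1)
    # recursion of Lemma 3.5 from the boundary up to the root
    pi = np.eye(q)[spins]                              # (n_samples, d**N, q)
    for _ in range(N):
        factors = (pi @ Mrev) / alpha                  # (pi_w Mrev)(j) / alpha(j)
        factors = factors.reshape(n_samples, -1, d, q).prod(axis=2)
        pi = alpha * factors
        pi /= pi.sum(axis=-1, keepdims=True)
    root = pi[:, 0, :]
    return float(np.mean(np.sum((root - alpha) * np.log(root / alpha), axis=1)))

beta = 0.5
e = np.exp(2 * beta)
M = np.array([[e, 1.0], [1.0, e]]) / (e + 1.0)
rng = np.random.default_rng(3)
estimates = {N: expected_L_root(M, 2, N, 4000, rng) for N in range(1, 7)}
print(estimates)
```

For N = 1 the exact value is tanh β · log((1 + tanh β)/(1 − tanh β)) = 2β tanh β, which the first estimate should approximate.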

Table 1: Comparison of the non-reconstruction bounds obtained from c(M) with (10), (11) and the Kesten-Stigum bound for the asymmetric chain (9).
For the particular pairs of values $(\delta_1, \delta_2)$ we checked, the Kesten-Stigum upper bound on the non-reconstruction threshold for asymmetric chains is quite close to our lower bound; for $\delta_1 = 0.3$ the two are very close.