Reconstruction on Trees: Exponential Moment Bounds for Linear Estimators

Consider a Markov chain $(\xi_v)_{v \in V} \in [k]^V$ on the infinite $b$-ary tree $T = (V,E)$ with irreducible edge transition matrix $M$, where $b \geq 2$, $k \geq 2$ and $[k] = \{1,\ldots,k\}$. We denote by $L_n$ the level-$n$ vertices of $T$. Assume $M$ has a real second-largest (in absolute value) eigenvalue $\lambda$ with corresponding real eigenvector $\nu \neq 0$. Letting $\sigma_v = \nu_{\xi_v}$, we consider the following root-state estimator, which was introduced by Mossel and Peres (2003) in the context of the "reconstruction problem" on trees: \begin{equation*} S_n = (b\lambda)^{-n} \sum_{x\in L_n} \sigma_x. \end{equation*} As noted by Mossel and Peres, when $b\lambda^2>1$ (the so-called Kesten-Stigum reconstruction phase) the quantity $S_n$ has uniformly bounded variance. Here, we give bounds on the moment-generating functions of $S_n$ and $S_n^2$ when $b\lambda^2>1$. Our results have implications for the inference of evolutionary trees.


Introduction
We first state our main theorem. Related results and applications are discussed at the end of the section.
Basic setup. For $b \geq 2$, let $T = (V, E)$ be the infinite $b$-ary tree rooted at $\rho$. Denote by $T_n$ the first $n \geq 0$ levels of $T$. Let $M = (M_{ij})_{i,j=1}^{k}$ be a $k \times k$ irreducible stochastic matrix with stationary distribution $\pi > 0$. Assume $M$ has a real second-largest (in absolute value) eigenvalue $\lambda$ and let $\nu \neq 0$ be a real right eigenvector corresponding to $\lambda$, normalized so that $\sum_{i=1}^{k} \pi_i \nu_i^2 = 1$.
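As an illustration of the setup above, the following is a minimal numerical sketch, assuming NumPy, of how $\pi$, $\lambda$ and the normalized $\nu$ could be computed for a given matrix. The function name `spectral_data` is hypothetical and not from the paper; the sketch also assumes that the chain is aperiodic, so that $1$ is the only eigenvalue of modulus one.

```python
import numpy as np

def spectral_data(M):
    """Return (pi, lam, nu) for an irreducible, aperiodic stochastic matrix M.

    pi  : stationary distribution (left eigenvector for eigenvalue 1)
    lam : second-largest eigenvalue in absolute value (must be real)
    nu  : right eigenvector for lam, scaled so that sum_i pi_i * nu_i**2 == 1
    """
    M = np.asarray(M, dtype=float)
    # Stationary distribution: left eigenvector for the eigenvalue closest to 1.
    w_left, v_left = np.linalg.eig(M.T)
    pi = np.real(v_left[:, np.argmin(np.abs(w_left - 1.0))])
    pi = pi / pi.sum()
    # Right eigenpairs sorted by decreasing modulus; position 1 is the second largest.
    w, v = np.linalg.eig(M)
    order = np.argsort(-np.abs(w))
    lam, nu = w[order[1]], v[:, order[1]]
    if abs(lam.imag) > 1e-12:
        raise ValueError("second eigenvalue is not real")
    lam, nu = float(lam.real), np.real(nu)
    # Normalize nu in the pi-weighted sense; its overall sign remains arbitrary.
    nu = nu / np.sqrt(np.sum(pi * nu**2))
    return pi, lam, nu
```

For the binary symmetric channel with flip probability $0.1$, this gives $\pi = (1/2, 1/2)$, $\lambda = 0.8$ and $\nu = \pm(1, -1)$; with $b = 2$ one has $b\lambda^2 = 1.28 > 1$.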
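Let $[k] = \{1, \ldots, k\}$. Consider the following Markov process on $T$: pick a root state $\xi_\rho$ in $[k]$ according to $\pi$; moving away from the root, apply the channel $M$ to each edge independently. Denote by $(\xi_v)_{v \in V}$ the state assignment so obtained and let, for all $v \in V$, \begin{equation*} \sigma_v = \nu_{\xi_v}. \end{equation*}

Reconstruction. In the so-called "reconstruction problem," one seeks, roughly speaking, to infer the state at the root from the states at level $n$, as $n \to \infty$. This problem has been studied extensively in probability theory and statistical physics. See e.g. [EKPS00] for background and references. Here, we are interested in the following root-state estimator introduced in [MP03]. For $n \geq 0$, let $L_n$ be the vertices of $T$ at level $n$. Consider the following quantity \begin{equation} S_n = (b\lambda)^{-n} \sum_{x \in L_n} \sigma_x. \tag{1} \end{equation} It is easy to show that for all $n \geq 0$ \begin{equation*} \mathbb{E}[S_n \mid \xi_\rho] = \sigma_\rho, \end{equation*} that is, $S_n$ is "unbiased." Moreover, it was shown in [MP03] that in the so-called Kesten-Stigum reconstruction phase, that is, when $b\lambda^2 > 1$, it holds that for all $n \geq 0$ \begin{equation*} \max_{i \in [k]} \mathrm{Var}[S_n \mid \xi_\rho = i] \leq C < +\infty, \end{equation*} where $C$ does not depend on $n$.

Main results. For $n \geq 0$, $i = 1, \ldots, k$, and $\zeta \in \mathbb{R}$, let \begin{equation*} \Gamma^i_n(\zeta) = \mathbb{E}[\exp(\zeta S_n) \mid \xi_\rho = i] \end{equation*} and \begin{equation*} \tilde{\Gamma}^i_n(\zeta) = \mathbb{E}[\exp(\zeta S_n^2) \mid \xi_\rho = i]. \end{equation*} We prove the following.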
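Theorem 1 (Exponential Moment Bound) Assume $M$ is such that $b\lambda^2 > 1$. Then, there is $c = c(M) < +\infty$ such that for all $n \geq 0$, $i = 1, \ldots, k$, and $\zeta \in \mathbb{R}$, it holds that \begin{equation*} \mathbb{E}[\exp(\zeta S_n) \mid \xi_\rho = i] \leq \exp(\nu_i \zeta + c \zeta^2). \end{equation*}

The proofs of Theorem 1 and Corollary 1 can be found in Section 2.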
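For small examples, the conditional moment-generating function of $S_n$ can be evaluated exactly, which may help build intuition for the theorem. The sketch below is an illustration, not taken from the paper: it assumes the one-level recursion that follows from the conditional independence of the $b$ subtrees below the root, namely that the level-$n$ value at $\zeta$ for root state $i$ equals $\big(\sum_j M_{ij}\, g_j\big)^b$, where $g_j$ is the level-$(n-1)$ value at $\zeta/(b\lambda)$ for root state $j$. The function name `mgf_exact` is hypothetical.

```python
import numpy as np

def mgf_exact(M, nu, lam, b, n, zeta):
    """Vector (E[exp(zeta * S_n) | root state = i])_i, via a one-level recursion."""
    M = np.asarray(M, dtype=float)
    nu = np.asarray(nu, dtype=float)
    if n == 0:
        # S_0 equals nu_i when the root is in state i.
        return np.exp(zeta * nu)
    # Assumed recursion: each of the b subtrees below the root contributes an
    # independent copy of S_{n-1}, with zeta rescaled by 1 / (b * lam).
    inner = mgf_exact(M, nu, lam, b, n - 1, zeta / (b * lam))
    return (M @ inner) ** b
```

A central difference of `mgf_exact` in `zeta` around zero recovers $\nu$, in line with the unbiasedness of $S_n$.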
Related results. Moment-generating functions of random variables similar to (1) have been studied in the context of multi-type branching processes. In particular, Athreya and Vidyashankar [AV95] have obtained large-deviation results for quantities of the type (in our setting) \begin{equation*} \frac{w \cdot Z_n}{b^n}, \end{equation*} where $w \in \mathbb{R}^k$ and $Z_n = (Z_n^{(1)}, \ldots, Z_n^{(k)})$ is the "census" vector, that is, $Z_n^{(i)}$ is the number of vertices $x \in L_n$ with $\xi_x = i$. However, note that we are interested in the degenerate case $w = \nu \perp \pi$ (see e.g. [HJ85]) and our results cannot be deduced from [AV95]. Note moreover that our bounds cannot hold when $b\lambda^2 < 1$. Indeed, in that case, a classical CLT of Kesten and Stigum [KS66] for multi-type branching processes implies that the quantity \begin{equation*} b^{-n/2} \sum_{x \in L_n} \sigma_x \end{equation*} converges in distribution to a centered Gaussian with a finite variance (independently of the root state). See [MP03] for more on the Kesten-Stigum CLT and its relation to the reconstruction problem.
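The census-vector viewpoint also suggests a cheap way to simulate the estimator without storing the tree. The following is a sketch under our own assumptions (NumPy, hypothetical function names, census vector written `Z`): since all children of vertices in state $i$ draw their states independently from row $i$ of $M$, the census at the next level is a sum of multinomial draws, and $S_n = (b\lambda)^{-n}\, \nu \cdot Z_n$.

```python
import numpy as np

def sample_census(M, b, n, root, rng):
    """Sample the level-n census vector given the root state (0-based index)."""
    M = np.asarray(M, dtype=float)
    k = M.shape[0]
    Z = np.zeros(k, dtype=np.int64)
    Z[root] = 1
    for _ in range(n):
        nxt = np.zeros(k, dtype=np.int64)
        for i in range(k):
            if Z[i] > 0:
                # The b * Z[i] children of state-i vertices draw states i.i.d. from row i.
                nxt += rng.multinomial(b * Z[i], M[i])
        Z = nxt
    return Z

def sample_S(M, nu, lam, b, n, root, rng):
    """One draw of S_n = (b * lam)**(-n) * (nu . Z_n) given the root state."""
    Z = sample_census(M, b, n, root, rng)
    return float(np.dot(nu, Z)) / (b * lam) ** n
```

Averaging many draws of `sample_S` for a fixed root state gives a Monte Carlo estimate of the conditional mean, which should be close to the corresponding entry of $\nu$ when $b\lambda^2 > 1$.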
Motivation. The motivation behind our results comes from mathematical biology. More particularly, our main theorem has recently played a role in the solution of important questions in mathematical phylogenetics, which we now briefly discuss.
As mentioned above, the quantity $S_n$ arises naturally in the reconstruction problem as a simple "linear" estimator of the root state [EKPS00, MP03]. In the past few years, deep connections have been established between the reconstruction problem and the inference of phylogenies, a central problem in computational biology [SS03, Fel04]. A phylogeny is a tree representing the evolutionary history of a group of organisms, where the leaves are modern species and the branchings correspond to past speciation events. To reconstruct phylogenies, biologists extract (aligned) biomolecular sequences from extant species. It is standard in evolutionary biology to model such collections of sequences as $\ell$ independent samples (2) from the leaves of a Markov chain on a finite tree, where $\ell$ is the sequence length. The goal of phylogenetics is to infer the leaf-labelled tree that generated these samples. In particular, developing reconstruction techniques that require as few samples as possible is of practical importance. An insightful conjecture of Steel [Ste01] suggests that the reconstruction of phylogenies can be achieved from much shorter sequences when the reconstruction problem is "solvable," in particular in the Kesten-Stigum reconstruction phase. This conjecture has been established in the binary symmetric case (equivalent to the ferromagnetic Ising model), that is, the case $k = 2$ and $M$ symmetric, by Mossel [Mos04] and Daskalakis et al. [DMR09]. The main idea behind these results is to "boost" standard tree-building techniques by inferring ancestral sequences. See [Mos04, DMR09] for details. Establishing Steel's conjecture under more realistic models of sequence evolution (i.e., more general transition matrices $M$) is a major open problem in mathematical phylogenetics.

Roughly, to reconstruct a phylogeny from samples at level $n$ one iteratively joins the most correlated pairs of nodes, starting from level $n$ and moving towards the root. To estimate the correlation between internal nodes $u$ and $v$ on level $m < n$ using only the samples (2), it is natural to consider quantities such as [...] where $L^u_n$ is the set of nodes on level $n$ below $u$. In words, we estimate the correlation between the reconstructed states at $u$ and $v$. Proving concentration of such quantities necessitates uniform bounds on the moment-generating functions of $S_n$ and $S_n^2$, which is our main result. We note in particular that our main theorem was recently used by Roch [Roc09], building on [Roc08], to prove Steel's conjecture for general $k$ and reversible transition matrices of the form $M = e^{tQ}$ in the Kesten-Stigum phase. Moreover, this result was established using a surprisingly simple algorithm known in phylogenetics as a "distance-based method," thereby contradicting a conjecture regarding the weakness of this widely used class of methods. See [Roc08] for background.
Organization. The proof of our results can be found in Section 2.

Proof
We first prove our main theorem in a neighbourhood around zero.
Lemma 1 Assume $M$ is such that $b\lambda^2 > 1$. Then, there is $c' = c'(M) < +\infty$ and $\zeta_0 \in (0, +\infty)$ such that for all $n \geq 0$, $i = 1, \ldots, k$, and $|\zeta| < \zeta_0$, it holds that \begin{equation*} \mathbb{E}[\exp(\zeta S_n) \mid \xi_\rho = i] \leq \exp(\nu_i \zeta + c' \zeta^2). \end{equation*}

Proof: We prove the result by induction on $n$. For $n = 0$, note that \begin{equation*} \mathbb{E}[\exp(\zeta S_0) \mid \xi_\rho = i] = \exp(\nu_i \zeta), \end{equation*} so the first step of the induction holds for all $c' > 0$ and all $\zeta \in \mathbb{R}$. Now assume the result holds for some $n \geq 0$ with $c'$ and $\zeta_0$ to be determined later. For $n \geq 0$, $i = 1, \ldots, k$, and $\zeta \in \mathbb{R}$, let [...] Let $\alpha_1, \ldots, \alpha_b$ be the children of $\rho$ and, for $\omega = 1, \ldots, b$, denote by $L^{\omega}_{n+1}$ the descendants of $\alpha_\omega$ on the $(n+1)$st level. For $\omega = 1, \ldots, b$, let [...] Note that, conditioned on $\xi_\rho$, the random vectors $(\sigma_x)_{x \in L^{\omega}_{n+1}}$, $\omega = 1, \ldots, b$, are independent and identically distributed. Hence, the variables $S^1_{n+1}, \ldots, S^b_{n+1}$ are also conditionally independent and identically distributed. Applying the channel to the first level of the tree and using the induction hypothesis, we have for [...] where we used that by assumption [...]. By a Taylor expansion, as $\zeta_0$ goes to zero (in particular $\zeta_0 < 1$), we have [...] Choose $c' > 0$ large enough so that [...] Note that $c'$ is well defined when $b\lambda^2 > 1$. Then there is [...]

The following lemma deals with values of $\zeta$ away from zero.
Let $n \geq 0$ and $\zeta$ with $|\zeta| \geq \zeta_0$ be fixed. Note that, when we relate the exponential moment at level $m$ to that at level $m - 1$ with a recursion as in the proof of Lemma 1, the value of $\zeta$ is effectively divided by $b\lambda$. Therefore, there are two cases in the proof: either we reach the interval $(-\zeta_0, \zeta_0)$ by the time we reach $m = 0$ in the recursion; or we do not.
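To make the rescaling of $\zeta$ explicit, here is a sketch of the one-level recursion as we read it; it is a reconstruction from the definitions, not a quotation of the original displays. The normalization of $S^{\omega}_{m}$ below is an assumption, with $L^{\omega}_{m}$ denoting the level-$m$ descendants of the $\omega$-th child of the root. Splitting the sum in (1) over the $b$ subtrees below the root and using their conditional independence given $\xi_\rho$ gives
\begin{align*}
S_m &= \frac{1}{b\lambda} \sum_{\omega=1}^{b} S^{\omega}_{m}, \qquad S^{\omega}_{m} = (b\lambda)^{-(m-1)} \sum_{x \in L^{\omega}_{m}} \sigma_x, \\
\mathbb{E}\big[e^{\zeta S_m} \mid \xi_\rho = i\big] &= \Big( \sum_{j=1}^{k} M_{ij}\, \mathbb{E}\big[e^{(\zeta/(b\lambda))\, S_{m-1}} \mid \xi_\rho = j\big] \Big)^{b}.
\end{align*}
If the level-$(m-1)$ moments are bounded by $\exp(\nu_j s + c' s^2)$ at $s = \zeta/(b\lambda)$, the right-hand side is at most
\begin{equation*}
\exp\Big(\frac{c' \zeta^2}{b\lambda^2}\Big) \Big( \sum_{j=1}^{k} M_{ij}\, e^{\nu_j \zeta/(b\lambda)} \Big)^{b},
\qquad \text{where} \qquad
\sum_{j=1}^{k} M_{ij}\, e^{\nu_j s} = 1 + \lambda \nu_i s + O(s^2),
\end{equation*}
using $M\nu = \lambda\nu$ and the fact that the rows of $M$ sum to one. The quadratic coefficient is thus multiplied by $1/(b\lambda^2) < 1$ at each level, which is one way to see why the condition $b\lambda^2 > 1$ enters.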
Assume first that [...], that is, we do not reach $(-\zeta_0, \zeta_0)$. We prove the result by induction on the level $m = 0, \ldots, n$. At $m = 0$, we have [...], by (5) and (6), for all $i = 1, \ldots, k$. Assume for the sake of the induction that [...], for all $i = 1, \ldots, k$. Using the calculations of Lemma 1, we have [...] where we used $b\lambda^2 > 1$ on the last line. The proof of the first case follows by induction, that is, we have [...] for all $i = 1, \ldots, k$.

Assume now that [...]
Let $m^*$ be the largest value in $0, \ldots, n$ such that [...] The purpose of Assumption (4) above is to make sure that we never "jump" entirely over the subset of $(-\zeta_0, \zeta_0)$ where (5) holds. Indeed, by (4) and [...] it follows that we must also have [...] The last expectation is finite for $\zeta$ small enough.