Sufficient condition for root reconstruction by parsimony on binary trees with general weights

We consider the problem of inferring an ancestral state from observations at the leaves of a tree, assuming the state evolves along the tree according to a two-state symmetric Markov process. We establish a general branching rate condition under which maximum parsimony, a common reconstruction method requiring only knowledge of the tree topology (but not of the substitution rates or other parameters), performs better than random guessing, uniformly in the depth of the tree. We thereby generalize previous results of [13, 37]. Our results apply to both deterministic and i.i.d. edge weights.


Introduction
Ancestral reconstruction. In biology, the inferred evolutionary history of organisms is depicted by a phylogenetic tree, that is, a rooted tree whose branchings indicate past speciation events, with the leaves representing living species. The evolution of features, such as the nucleotide at a given position in the genome of a species, the presence or absence of a protein, or the number of horns in a lizard, is commonly assumed to follow Markovian dynamics along the tree [35]. That is, on each edge, the state of the feature changes according to a Markov process; at bifurcations, two independent copies of the feature evolve along the outgoing edges starting from the state at the branching point.
Here we consider the problem of inferring an ancestral state from observations of a feature at the leaves of a known phylogenetic tree. We refer to this problem, which has important applications in biology [26,8,31], as the ancestral reconstruction problem. Many rigorous results have been obtained in previous work; see, e.g., [20,4,19,36,24,9,25,27,5,13,37,33,28,29,3,21,10,23,11,15,1] for a partial list. Typically, one seeks an estimator of the root state which is strictly superior to random guessing, uniformly in the depth of the tree, under a uniform prior on the root [25]. Whether such […]

Let $T = (V, E)$ be an infinite complete binary tree rooted at $\rho$ with vertex set $V$ and edge set $E$. That is, all vertices of $T$ have exactly two children; in particular, $T$ has no leaf. Every edge $e \in E$ is assigned a weight $\theta_e \in [0, 1]$. If $e = (x, y)$ where $x$ is the parent of $y$, we also write $\theta_y = \theta_e$. We use the notation $x \le y$ to indicate that $x$ is an ancestor of $y$, and we write $x < y$ for $x \le y$ and $x \neq y$. We also let $s(x)$ be the sibling of $x \neq \rho$.
Under the Cavender-Farris-Neyman (CFN) model over $T$ and $\theta = (\theta_y)_{y \in V}$, also known as the Neyman 2-state model, we associate to each vertex $x \in V$ a state $\sigma_x \in \{0, 1\}$ as follows. The state at the root, $\sigma_\rho$, is picked uniformly at random in $\{0, 1\}$. Recursively, if $y$ has parent $x$, the state $\sigma_y$ is equal to $\sigma_x$ with probability $\theta_y$; otherwise it is picked uniformly at random in $\{0, 1\}$. We let $p_y = \mathbb{P}_{T,\theta}[\sigma_x \neq \sigma_y] = \frac{1 - \theta_y}{2}$, where $x$ is the parent of $y$. […] cutsets of $T$. We denote by $T_\pi = (V_\pi, E_\pi)$ the finite tree obtained from $T$ after removing all descendants of the vertices in $\pi$.
As mentioned above, our focus in this work is on a root state estimator known in phylogenetics as maximum parsimony. Fix a cutset $\pi$ and assume that the states on $\pi$ are observed. The parsimony principle dictates that one assigns to each vertex $x$ (ancestor to the observed cutset $\pi$) a state $\hat\sigma_x$ such that the overall number of changes along the edges of $T_\pi$, namely $\sum_{(x,y) \in E_\pi} \mathbf{1}\{\hat\sigma_x \neq \hat\sigma_y\}$, is minimized, where by default we let $\hat\sigma_z = \sigma_z$ for all $z \in \pi$. In case both 0 and 1 can be obtained in this way as root state, a uniformly random value in $\{0, 1\}$ is returned. We let $\mathrm{RA}^\pi_{T,\theta}$ be the reconstruction accuracy of parsimony, i.e., the probability that it correctly reconstructs the root state.
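To make the parsimony principle concrete on a small $T_\pi$, the following brute-force sketch enumerates every assignment $\hat\sigma$ of the vertices above the cutset, counts the changes $\sum_{(x,y)\in E_\pi}\mathbf{1}\{\hat\sigma_x\neq\hat\sigma_y\}$, and returns the minimum together with the set of root states attaining it. The encoding is an illustrative assumption: a cutset vertex is its observed state 0 or 1, and any other vertex is the pair of its two subtrees. All function names are hypothetical. The enumeration is exponential in the number of internal vertices, so it is only meant for toy trees.

```python
import itertools


def internal_vertices(tree, path=""):
    """Paths (0/1 strings, root = '') of the vertices strictly above the cutset."""
    if isinstance(tree, int):
        return []
    return [path] + internal_vertices(tree[0], path + "0") + internal_vertices(tree[1], path + "1")


def changes(tree, labels, path=""):
    """Number of edges whose two endpoints carry different states under `labels`."""
    if isinstance(tree, int):
        return 0
    total = 0
    for i, child in enumerate(tree):
        child_path = path + str(i)
        child_state = child if isinstance(child, int) else labels[child_path]
        total += int(labels[path] != child_state) + changes(child, labels, child_path)
    return total


def parsimony_root_states(tree):
    """Minimum number of changes and the set of root states attaining it."""
    vertices = internal_vertices(tree)
    best, roots = None, set()
    for states in itertools.product((0, 1), repeat=len(vertices)):
        labels = dict(zip(vertices, states))
        cost = changes(tree, labels)
        if best is None or cost < best:
            best, roots = cost, {labels[""]}
        elif cost == best:
            roots.add(labels[""])
    return best, roots


# Two cherries with different observed states: one change suffices and both root states are optimal.
print(parsimony_root_states(((0, 0), (1, 1))))  # (1, {0, 1})
```

When both root states are optimal, as in the example, the estimator described above returns a uniformly random value.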
Main result: deterministic weights. In our main result, we give conditions under which the reconstruction accuracy of maximum parsimony is uniformly bounded away from 1/2. […] (1.1) Then $\inf_{\pi \in \mathcal{C}(T)} \mathrm{RA}^\pi_{T,\theta} > \frac{1}{2}$, i.e., the reconstruction accuracy of parsimony on $T$ is bounded away from $1/2$.
Condition (1.1) involves the branching number, a generalized notion of branching rate which plays a key role in the analysis of many stochastic processes on trees and tree-like graphs; see, e.g., [22]. The following example provides some intuition in a special case. See Lemma 4.1 for another illustration.

Example 1.3 (Fixed edge weights). As a simple illustration, observe that, when all weights are equal to $\theta_* \in (0, 1]$, the supremum in (1.1) is attained for $\kappa = 2\theta_*$. Indeed, when $\kappa = 2\theta_*$, the sum in (1.1) simplifies to
$$\sum_{x \in \pi} \prod_{\rho \neq z \le x} \frac{\theta_*}{2\theta_*} = \sum_{x \in \pi} 2^{-|x|} = 1$$
for any cutset $\pi$, where $|x|$ is the graph distance from $\rho$ to $x$ and the last equality can be proved by induction on the graph distance from the root to the furthest vertex in $\pi$. On the other hand, letting $\pi_n$ be the cutset of all vertices at graph distance $n$ from $\rho$, for any $\varepsilon > 0$ it holds that
$$\sum_{x \in \pi_n} \prod_{\rho \neq z \le x} \frac{\theta_*}{2\theta_* + \varepsilon} = \left(\frac{2\theta_*}{2\theta_* + \varepsilon}\right)^n \to 0$$
as $n \to +\infty$. Hence, in this case, condition (1.1) reduces to $2\theta_* > 3/2$, that is, $\theta_* > 3/4$. In terms of substitution probability, this is $p_* = \frac{1 - \theta_*}{2} < 1/8$.

The argument in the example above leads to the following corollary.

Corollary 1.4 (All substitution probabilities below the threshold). Let the substitution probabilities $(p_z)_{z \in V}$ satisfy $p_z \in [0, p_*]$ for all $z \in V$, for some $p_* < 1/8$. Then the reconstruction accuracy of maximum parsimony on $T$ is bounded away from $1/2$.

Zhang et al. [37] also established the special case of Theorem 1.2 when $p_z = p_* < 1/8$ for all $z$ and $\pi = \pi_n$. Their proof proceeds through a careful analysis of the limit of a recurrence for $\mathrm{RA}^{\pi_n}_{T,\theta}$ first derived in [34]. Our more general result follows from a softer argument which relies on the instability of a fixed point of this recurrence corresponding to asymptotic reconstruction accuracy $1/2$. A more detailed proof sketch is given in Section 2 following some preliminaries.
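The two sums in Example 1.3 can be checked numerically. This is a minimal sketch that assumes the sum in (1.1) has the form $\sum_{x\in\pi}\prod_{\rho\neq z\le x}\kappa^{-1}\theta_z$, as the martingale $M_n$ of Section 4 suggests. Vertices are encoded as 0/1 strings, with the root as the empty string, so $|x|$ is the string length. The helpers `random_cutset`, `cutset_sum` and `level_sum` are hypothetical names introduced for this illustration. Exact rational arithmetic makes the identity at $\kappa = 2\theta_*$ visible without rounding.

```python
import random
from fractions import Fraction


def random_cutset(rng, max_depth, p_stop=0.4):
    """Random minimal cutset of the infinite binary tree (the root '' is never chosen)."""
    cutset, stack = [], [""]
    while stack:
        v = stack.pop()
        if len(v) == max_depth or (v and rng.random() < p_stop):
            cutset.append(v)
        else:
            stack.extend([v + "0", v + "1"])
    return cutset


def cutset_sum(cutset, theta, kappa):
    """Sum over x in the cutset of prod_{rho != z <= x} theta / kappa, for constant weights."""
    return sum((theta / kappa) ** len(x) for x in cutset)


def level_sum(n, theta, kappa):
    """The same sum over the level cutset pi_n, which has 2**n vertices at depth n."""
    return (2 * theta / kappa) ** n


def substitution_probability(theta):
    """p = (1 - theta) / 2 under the CFN model."""
    return (1 - theta) / 2


rng = random.Random(1)
theta_star = Fraction(4, 5)
for _ in range(5):
    # With kappa = 2 * theta_star, each cutset gives exactly sum of 2^{-|x|} = 1.
    print(cutset_sum(random_cutset(rng, 10), theta_star, 2 * theta_star))
```

With $\kappa = 2\theta_* + \varepsilon$, `level_sum` decays geometrically in $n$. This is why the supremum in (1.1) equals $2\theta_*$ for fixed weights, and `substitution_probability(0.75)` recovers the threshold $1/8$.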
We note that our proof method may be of more general interest, e.g., to extend the results beyond the two-state case, where the higher dimensionality of the system may significantly complicate the derivation of an explicit limit even when edge weights are constant.
The previous theorem covers, in particular, the case of the pure birth process, or Yule tree, which is a popular random model of phylogenetic trees; see, e.g., [35]. In that case, $\theta_z = e^{-2T_z}$, where $T_z$ is an exponential random variable with rate $\lambda$. To derive the corresponding threshold, we note that
$$\mu = \mathbb{E}\left[e^{-2T_z}\right] = \int_0^\infty \lambda e^{-\lambda t} e^{-2t}\,dt = \frac{\lambda}{\lambda + 2}.$$
Then $\mu > 3/4 \iff \lambda > 6$, which is consistent with the results of [13, Theorem 2.3].
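The Yule threshold can be checked numerically. This sketch assumes that $T_z$ is exponential with rate $\lambda$; Python's `random.expovariate` is parametrized by the rate. It compares the closed form $\mu = \lambda/(\lambda+2)$ with a Monte Carlo estimate. The function names are illustrative.

```python
import math
import random


def mu_closed_form(lam):
    """E[exp(-2 T)] for T ~ Exponential(rate lam), i.e. lam / (lam + 2)."""
    return lam / (lam + 2.0)


def mu_monte_carlo(lam, n=100_000, seed=0):
    """Monte Carlo estimate of E[exp(-2 T)]; expovariate takes the rate, not the mean."""
    rng = random.Random(seed)
    return sum(math.exp(-2.0 * rng.expovariate(lam)) for _ in range(n)) / n


for lam in (4.0, 6.0, 8.0):
    # The threshold mu > 3/4 is crossed exactly at lam = 6.
    print(lam, mu_closed_form(lam), round(mu_monte_carlo(lam), 3))
```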

Preliminaries
Computing parsimony. Our proofs are based on a recurrence for the reconstruction accuracy. Maximum parsimony can be computed efficiently by dynamic programming; the resulting procedure is known as the Fitch method.
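Definition 2.1 (Parsimony recursion). The Fitch method recursively constructs a set $S^\pi_z$ of possible states for each vertex $z \in V_\pi$, starting from $\pi$, as follows. If $z \in \pi$, then $S^\pi_z = \{\sigma_z\}$. If $z \notin \pi$ and has children $x$ and $y$, then
$$S^\pi_z = \begin{cases} S^\pi_x \cap S^\pi_y, & \text{if } S^\pi_x \cap S^\pi_y \neq \emptyset,\\ S^\pi_x \cup S^\pi_y, & \text{otherwise.} \end{cases}$$
The method returns the maximum parsimony estimator $\hat\sigma_\rho$, which is equal to the unique state in $S^\pi_\rho$ if $|S^\pi_\rho| = 1$, and is otherwise a uniformly random value in $\{0, 1\}$.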
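The bottom-up recursion of Definition 2.1 translates directly into a short recursive function. This is a sketch of our own, not a reference implementation. A cutset vertex is encoded by its observed state and any other vertex by the pair of its children. Ties at the root are broken uniformly at random, as in the definition.

```python
import random


def fitch_set(node):
    """Bottom-up Fitch set S_z: a cutset vertex is its state 0/1, any other vertex a pair (x, y)."""
    if isinstance(node, int):
        return frozenset({node})
    s_x, s_y = fitch_set(node[0]), fitch_set(node[1])
    common = s_x & s_y
    return common if common else s_x | s_y  # intersection if non-empty, otherwise union


def parsimony_estimate(tree, rng=random):
    """Maximum parsimony root estimate: the unique state of S_rho, or a uniform guess if S_rho = {0, 1}."""
    s_root = fitch_set(tree)
    if len(s_root) == 1:
        return next(iter(s_root))
    return rng.choice((0, 1))


print(sorted(fitch_set(((0, 1), 0))))  # [0]
print(sorted(fitch_set(((0, 0), (1, 1)))))  # [0, 1]
```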
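What is described above is the bottom-up phase of the Fitch method. (A top-down phase, which we will not require here, then assigns a state to each vertex in $V_\pi$ consistent with a maximum parsimony solution; see, e.g., [35].) Let $\pi$ be an arbitrary cutset on $T$ with states $\sigma_u$, $u \in \pi$, and let $S^\pi_z$, $z \in V_\pi$, be the corresponding reconstructed sets under the Fitch method. We define
$$\alpha^\pi_z = \mathbb{P}_{T,\theta}\left[S^\pi_z = \{\sigma_z\}\right], \qquad \beta^\pi_z = \mathbb{P}_{T,\theta}\left[S^\pi_z = \{1 - \sigma_z\}\right].$$
Under our randomization rule, the reconstruction accuracy of maximum parsimony $\mathrm{RA}^\pi_{T,\theta}$ is given by
$$\mathrm{RA}^\pi_{T,\theta} = \alpha^\pi_\rho + \frac{1}{2}\left(1 - \alpha^\pi_\rho - \beta^\pi_\rho\right). \tag{2.1}$$

Proof sketch. Fix $\theta$ satisfying the assumptions of Theorem 1.2 and fix a cutset $\pi$. To analyze $(\alpha^\pi_z, \beta^\pi_z)$, it is natural to take advantage of the recursive nature of $T$. Let $x$ and $y$ be the children of $z$. The event $S^\pi_z = \{\sigma_z\}$ occurs when either (i) $S^\pi_x = \{\sigma_z\}$ and $S^\pi_y = \{\sigma_z\}$, or (ii) $S^\pi_x = \{\sigma_z\}$ and $S^\pi_y = \{0, 1\}$, or vice versa. By the Markov property of the CFN model, the random variables $S^\pi_x$ and $S^\pi_y$, which are functions only of the states of $\pi$ below $x$ and $y$ respectively, are conditionally independent given $\sigma_z$. Hence, letting $q_u = 1 - p_u$ for $u = x, y$ and taking into account the possibility of a mutation along the edges $(z, x)$ and $(z, y)$, it follows, as first derived in [34, Lemma 7.20], that
$$\begin{aligned} \alpha^\pi_z ={}& (q_x\alpha^\pi_x + p_x\beta^\pi_x)(q_y\alpha^\pi_y + p_y\beta^\pi_y)\\ &+ (q_x\alpha^\pi_x + p_x\beta^\pi_x)(1 - \alpha^\pi_y - \beta^\pi_y) + (1 - \alpha^\pi_x - \beta^\pi_x)(q_y\alpha^\pi_y + p_y\beta^\pi_y), \end{aligned} \tag{2.2}$$
where the first and second lines on the r.h.s. correspond respectively to cases (i) and (ii) above. Similarly,
$$\begin{aligned} \beta^\pi_z ={}& (p_x\alpha^\pi_x + q_x\beta^\pi_x)(p_y\alpha^\pi_y + q_y\beta^\pi_y)\\ &+ (p_x\alpha^\pi_x + q_x\beta^\pi_x)(1 - \alpha^\pi_y - \beta^\pi_y) + (1 - \alpha^\pi_x - \beta^\pi_x)(p_y\alpha^\pi_y + q_y\beta^\pi_y). \end{aligned} \tag{2.3}$$
In the case that $p_u = p$ for all $u$, a fixed point analysis was performed in [34, Theorem 7.22]. It was found that, if $p \ge 1/8$, there is a single fixed point $(1/3, 1/3)$, which corresponds informally to "having no information about the root." […] in the second case was established rigorously in [37]. One step in [37] involved the derivation of a new recurrence in terms of the quantities $\alpha^{\pi_n}_z - \beta^{\pi_n}_z$ and $1 - (\alpha^{\pi_n}_z + \beta^{\pi_n}_z)$, which facilitates the analysis of the limit in the fixed edge weight case.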
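To see (2.1) to (2.3) at work, the sketch below iterates the recurrence for the level cutset $\pi_n$ with a constant substitution probability $p$. It then cross-checks the result against a direct Monte Carlo simulation of the CFN model followed by the Fitch pass. The function names, the choice of a constant $p$ and the number of trials are illustrative assumptions. In the simulation, a tie $S_\rho = \{0,1\}$ is credited $1/2$, the expected score of a uniform guess.

```python
import random


def alpha_beta_step(alpha_x, beta_x, alpha_y, beta_y, p_x, p_y):
    """One step of (2.2)-(2.3): (alpha_z, beta_z) from the two children x, y of z."""
    q_x, q_y = 1 - p_x, 1 - p_y
    # P[S_x = {sigma_z}] and P[S_x = {1 - sigma_z}], allowing a substitution on the edge (z, x)
    a_x, b_x = q_x * alpha_x + p_x * beta_x, p_x * alpha_x + q_x * beta_x
    a_y, b_y = q_y * alpha_y + p_y * beta_y, p_y * alpha_y + q_y * beta_y
    # P[S = {0, 1}] does not depend on the state of the parent
    c_x, c_y = 1 - alpha_x - beta_x, 1 - alpha_y - beta_y
    return a_x * a_y + a_x * c_y + c_x * a_y, b_x * b_y + b_x * c_y + c_x * b_y


def accuracy_level_cutset(p, n):
    """RA for the level cutset pi_n with constant substitution probability p, via (2.1)."""
    alpha, beta = 1.0, 0.0  # on the cutset, S_z = {sigma_z}
    for _ in range(n):
        alpha, beta = alpha_beta_step(alpha, beta, alpha, beta, p, p)
    return alpha + 0.5 * (1 - alpha - beta)


def simulate_fitch(state, depth, p, rng):
    """Simulate CFN states below a vertex with the given state and return its Fitch set."""
    if depth == 0:
        return {state}
    sets = []
    for _ in range(2):
        child = 1 - state if rng.random() < p else state  # substitution with probability p
        sets.append(simulate_fitch(child, depth - 1, p, rng))
    common = sets[0] & sets[1]
    return common if common else sets[0] | sets[1]


def accuracy_monte_carlo(p, n, trials=20_000, seed=0):
    rng = random.Random(seed)
    score = 0.0
    for _ in range(trials):
        root = rng.randint(0, 1)
        s = simulate_fitch(root, n, p, rng)
        score += (root in s) / len(s)  # 1 if S = {root}, 1/2 on a tie, 0 otherwise
    return score / trials


print(accuracy_level_cutset(0.1, 4), accuracy_monte_carlo(0.1, 4))
print(accuracy_level_cutset(0.05, 200), accuracy_level_cutset(0.2, 200))
```

In line with the fixed point analysis of [34], the accuracy stays well above $1/2$ for $p = 0.05$ and collapses to $1/2$ for $p = 0.2$.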
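Going back to binary trees with general weights, as our starting point we further modify the recurrence of [37]. For all $z \in V$, we define
$$d^\pi_z = \alpha^\pi_z - \beta^\pi_z \quad\text{and}\quad u^\pi_z = 3(\alpha^\pi_z + \beta^\pi_z) - 2. \tag{2.4}$$
We show in Proposition 2.2 below that $(d^\pi_z, u^\pi_z)$ satisfies the following recurrence for $z \in V_\pi \setminus \pi$ with children $x, y$:
$$d^\pi_z = \frac{1}{6}\theta_x d^\pi_x\left(4 - u^\pi_y\right) + \frac{1}{6}\theta_y d^\pi_y\left(4 - u^\pi_x\right), \tag{2.5}$$
$$u^\pi_z = \frac{3}{2}\theta_x\theta_y d^\pi_x d^\pi_y - \frac{1}{2}u^\pi_x u^\pi_y, \tag{2.6}$$
as well as the inequalities $0 \le d^\pi_z \le 1$ and $-1/2 \le u^\pi_z \le 1$ for all $z \in V_\pi$ and the boundary conditions $d^\pi_z = u^\pi_z = 1$ for all $z \in \pi$. Our choice of parametrization is motivated in part by the fact that the "no information" fixed point is now at $(0, 0)$ and that $|d^\pi_z|, |u^\pi_z| \le 1$. At a high level, we show that, under the branching rate condition (1.1), the fixed point $(0, 0)$ is "unstable" and that $d^\pi_z$, in particular, stays bounded away from $0$. This in turn implies a lower bound on the reconstruction accuracy since, by (2.1), we have
$$\mathrm{RA}^\pi_{T,\theta} = \frac{1}{2} + \frac{1}{2}d^\pi_\rho.$$
The link between stability and the weighted branching rate in (1.1) can be seen from (2.5). Consider first the simpler special case where all weights are equal to $\theta_*$ and $\pi = \pi_n$. By symmetry, $d^\pi_x$ takes the same value for all $x$ at the same graph distance from $\rho$. Moreover, assuming that we are close to the fixed point $(0, 0)$, that is, $(u^\pi_x, d^\pi_x) \approx (0, 0)$, (2.5) gives $d^\pi_z \approx \frac{4}{3}\theta_* d^\pi_x$ when $z$ is the parent of $x$. Near the fixed point, the d-values are therefore multiplied by $\frac{4}{3}\theta_*$ at each step toward the root, and this factor exceeds 1 precisely when $2\theta_* > 3/2$, the condition found in Example 1.3.
ECP 26 (2021), paper 55.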
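The instability can also be observed numerically. The sketch below iterates (2.5) and (2.6), taken in the form $d_z = \frac16[\theta_x d_x(4-u_y) + \theta_y d_y(4-u_x)]$ and $u_z = \frac32\theta_x\theta_y d_x d_y - \frac12 u_x u_y$, which is what one obtains by substituting (2.4) into (2.2) and (2.3). It uses constant weights $\theta_*$ and the level cutset $\pi_n$, starting from $d = u = 1$ on the cutset. Near $(0,0)$, one step multiplies $d$ by roughly $\frac43\theta_*$, which exceeds 1 exactly when $\theta_* > 3/4$. The function names are hypothetical.

```python
def du_step(d_x, u_x, d_y, u_y, theta_x, theta_y):
    """One step of the (d, u) recurrence from the children x, y to their parent z."""
    d_z = (theta_x * d_x * (4 - u_y) + theta_y * d_y * (4 - u_x)) / 6
    u_z = 1.5 * theta_x * theta_y * d_x * d_y - 0.5 * u_x * u_y
    return d_z, u_z


def root_d_level_cutset(theta, n):
    """d at the root for the level cutset pi_n when every edge weight equals theta."""
    d, u = 1.0, 1.0  # boundary condition on the cutset
    for _ in range(n):
        d, u = du_step(d, u, d, u, theta, theta)
    return d


def linear_gain(theta):
    """Growth factor of d per level toward the root, linearized at the fixed point (0, 0)."""
    return 4 * theta / 3


for theta in (0.6, 0.7, 0.8, 0.9):
    print(theta, linear_gain(theta), root_d_level_cutset(theta, 500))
```

For $\theta_* = 0.8$ the root value settles at a positive level, while for $\theta_* = 0.6$ it decays geometrically toward 0.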
Recurrence. Before proceeding to the proof of our main results, we first establish a basic recurrence which follows from the work of [37]. We give a short proof for completeness.
Proposition 2.2. For all $z \in V_\pi \setminus \pi$, if $x, y$ are the children of $z$, the system (2.5) and (2.6) holds.
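Proof. The second statement is merely a change of variables. We briefly expand on the first equation (the other one being similar). Let $z \in V_\pi \setminus \pi$ with children $x, y$. For $w \in V_\pi$, we define $\Sigma_w = \alpha^\pi_w + \beta^\pi_w$ and $\Delta_w = \alpha^\pi_w - \beta^\pi_w$. By the definitions of $p_x$ and $q_x$, note that
$$q_x\alpha^\pi_x + p_x\beta^\pi_x = \frac{1}{2}\Sigma_x + \frac{1}{2}\theta_x\Delta_x$$
and, similarly, $p_x\alpha^\pi_x + q_x\beta^\pi_x = \frac{1}{2}\Sigma_x - \frac{1}{2}\theta_x\Delta_x$. Hence, by (2.2) and (2.3),
$$\alpha^\pi_z = \frac{1}{4}(\Sigma_x + \theta_x\Delta_x)(\Sigma_y + \theta_y\Delta_y) + \frac{1}{2}(\Sigma_x + \theta_x\Delta_x)(1 - \Sigma_y) + \frac{1}{2}(1 - \Sigma_x)(\Sigma_y + \theta_y\Delta_y),$$
$$\beta^\pi_z = \frac{1}{4}(\Sigma_x - \theta_x\Delta_x)(\Sigma_y - \theta_y\Delta_y) + \frac{1}{2}(\Sigma_x - \theta_x\Delta_x)(1 - \Sigma_y) + \frac{1}{2}(1 - \Sigma_x)(\Sigma_y - \theta_y\Delta_y).$$
Subtracting the above two equations, we get
$$\Delta_z = \theta_x\Delta_x\left(1 - \frac{1}{2}\Sigma_y\right) + \theta_y\Delta_y\left(1 - \frac{1}{2}\Sigma_x\right),$$
which, after plugging in (2.4), that is, $\Delta_w = d^\pi_w$ and $\Sigma_w = (u^\pi_w + 2)/3$, gives (2.5).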
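The change of variables can also be checked numerically. This sketch assumes that (2.5) and (2.6) read $d_z = \frac16[\theta_x d_x(4-u_y)+\theta_y d_y(4-u_x)]$ and $u_z = \frac32\theta_x\theta_y d_x d_y - \frac12 u_x u_y$, which is what the computation above yields after substituting $\Delta = d$ and $\Sigma = (u+2)/3$. It applies (2.2) and (2.3) to random admissible inputs, maps the result through (2.4), and compares it with one step of the $(d,u)$ recurrence. It also checks that the bounds $0 \le d \le 1$ and $-1/2 \le u \le 1$ are preserved. The helper names are hypothetical.

```python
import random


def alpha_beta_step(alpha_x, beta_x, alpha_y, beta_y, theta_x, theta_y):
    """Recurrences (2.2) and (2.3), with p = (1 - theta) / 2 and q = 1 - p."""
    p_x, p_y = (1 - theta_x) / 2, (1 - theta_y) / 2
    a_x, b_x = (1 - p_x) * alpha_x + p_x * beta_x, p_x * alpha_x + (1 - p_x) * beta_x
    a_y, b_y = (1 - p_y) * alpha_y + p_y * beta_y, p_y * alpha_y + (1 - p_y) * beta_y
    c_x, c_y = 1 - alpha_x - beta_x, 1 - alpha_y - beta_y
    return a_x * a_y + a_x * c_y + c_x * a_y, b_x * b_y + b_x * c_y + c_x * b_y


def du_step(d_x, u_x, d_y, u_y, theta_x, theta_y):
    """Recurrences (2.5) and (2.6) in the (d, u) variables."""
    return ((theta_x * d_x * (4 - u_y) + theta_y * d_y * (4 - u_x)) / 6,
            1.5 * theta_x * theta_y * d_x * d_y - 0.5 * u_x * u_y)


def to_du(alpha, beta):
    """Change of variables (2.4)."""
    return alpha - beta, 3 * (alpha + beta) - 2


def random_state(rng):
    """(alpha, beta) with alpha >= beta >= 0 and 1/2 <= alpha + beta <= 1, i.e. 0 <= d <= 1, -1/2 <= u <= 1."""
    s = 0.5 + 0.5 * rng.random()
    diff = s * rng.random()
    return (s + diff) / 2, (s - diff) / 2


def check(trials=10_000, seed=0, tol=1e-12):
    """Largest discrepancy between the two recurrences, and whether the bounds persist."""
    rng = random.Random(seed)
    worst, bounds_ok = 0.0, True
    for _ in range(trials):
        (ax, bx), (ay, by) = random_state(rng), random_state(rng)
        tx, ty = rng.random(), rng.random()
        d1, u1 = to_du(*alpha_beta_step(ax, bx, ay, by, tx, ty))
        d2, u2 = du_step(*to_du(ax, bx), *to_du(ay, by), tx, ty)
        worst = max(worst, abs(d1 - d2), abs(u1 - u2))
        bounds_ok &= -tol <= d2 <= 1 + tol and -0.5 - tol <= u2 <= 1 + tol
    return worst, bounds_ok


print(check())
```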
Because $\alpha^\pi_z$ and $\beta^\pi_z$ are probabilities and, further, $\alpha^\pi_z + \beta^\pi_z \le 1$, we have $d^\pi_z \le 1$ and $u^\pi_z \le 1$ for all $z$. Moreover, this, together with the boundary conditions and (2.5), implies by induction that $d^\pi_z \ge 0$ for all $z$. In turn, this, together with the boundary conditions and (2.6), implies that $u^\pi_z \ge -1/2$ for all $z$.

Deterministic weights
Before proceeding with the proof of Theorem 1.2, we first prove some lemmas.

Controlling d- and u-values
In the first lemma, we express the d-value at the root as a function of the d- and u-values above an arbitrary cutset. Recall that $s(z)$ is the sibling of $z$.

Lemma 3.1 (Controlling the root with a cutset). For any cutset $\pi'$ in $T_\pi$, it holds that
$$d^\pi_\rho = \sum_{x \in \pi'} d^\pi_x \prod_{\rho \neq z \le x} \frac{\theta_z\left(4 - u^\pi_{s(z)}\right)}{6}.$$

Proof. The result follows by recursively applying (2.5) from the root down to $\pi'$. We implicitly use the fact that, by definition, a cutset is minimal.
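Lemma 3.1 can be tested on random instances. This sketch assumes that unrolling (2.5) yields $d^\pi_\rho = \sum_{x\in\pi'} d^\pi_x \prod_{\rho\neq z\le x}\theta_z(4-u^\pi_{s(z)})/6$, with (2.5) taken as $d_z = \frac16[\theta_x d_x(4-u_y)+\theta_y d_y(4-u_x)]$ and (2.6) as $u_z = \frac32\theta_x\theta_y d_x d_y - \frac12 u_x u_y$. It draws a random cutset $\pi$ of depth at most 8 and random weights in $[0.3, 1]$, and computes all d- and u-values on $T_\pi$. It then compares both sides for $\pi' = \pi$ and for an intermediate cutset obtained by truncating $\pi$ at depth 3. The string encoding (root = empty string) and all names are our own illustrative choices.

```python
import random


def random_cutset(rng, max_depth, p_stop=0.35):
    """Random minimal cutset of the binary tree; vertices are 0/1 strings, the root '' is never chosen."""
    cutset, stack = set(), [""]
    while stack:
        v = stack.pop()
        if len(v) == max_depth or (v and rng.random() < p_stop):
            cutset.add(v)
        else:
            stack += [v + "0", v + "1"]
    return cutset


def compute_du(cutset, theta):
    """d- and u-values on T_pi, from d = u = 1 on the cutset, via the assumed form of (2.5)-(2.6)."""
    d, u = {}, {}

    def visit(z):
        if z in cutset:
            d[z] = u[z] = 1.0
            return
        x, y = z + "0", z + "1"
        visit(x)
        visit(y)
        d[z] = (theta[x] * d[x] * (4 - u[y]) + theta[y] * d[y] * (4 - u[x])) / 6
        u[z] = 1.5 * theta[x] * theta[y] * d[x] * d[y] - 0.5 * u[x] * u[y]

    visit("")
    return d, u


def sibling(z):
    return z[:-1] + ("1" if z[-1] == "0" else "0")


def unrolled_root_d(cutset_prime, d, u, theta):
    """Right-hand side of Lemma 3.1 for a cutset pi' of T_pi."""
    total = 0.0
    for x in cutset_prime:
        prod = 1.0
        for k in range(1, len(x) + 1):  # z runs over the path from rho (excluded) down to x
            z = x[:k]
            prod *= theta[z] * (4 - u[sibling(z)]) / 6
        total += d[x] * prod
    return total


def check_lemma_3_1(seed, max_depth=8, cut_depth=3):
    """Largest |lhs - rhs| of Lemma 3.1 over pi' = pi and pi' = pi truncated at depth cut_depth."""
    rng = random.Random(seed)
    cutset = random_cutset(rng, max_depth)
    vertices = sorted({x[:k] for x in cutset for k in range(len(x) + 1)})
    theta = {v: rng.uniform(0.3, 1.0) for v in vertices if v}
    d, u = compute_du(cutset, theta)
    truncated = {x[:cut_depth] for x in cutset}
    return max(abs(d[""] - unrolled_root_d(pc, d, u, theta)) for pc in (cutset, truncated))


print(max(check_lemma_3_1(seed) for seed in range(10)))
```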
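Our second lemma shows that d-values cannot grow too fast down the tree. This fact will be useful in proving the next key lemma. We will need the lower bound $\theta_* = \inf_{z \in V}\theta_z > 0$ on the θ-values. For $v, w \in V_\pi$, we let $\gamma(v, w)$ be the graph distance between $v$ and $w$ in $T_\pi$. Recall that $\theta_* \le 1 < 2$.

Lemma 3.2. Let $\varepsilon > 0$ and let $v \in V_\pi$ be such that $d^\pi_v \le \varepsilon$. Then, for all $w \in V_\pi$ with $v \le w$,
$$d^\pi_w \le \varepsilon\left(\frac{2}{\theta_*}\right)^{\gamma(v, w)}.$$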
Proof. Let $z \in V_\pi$, not on $\pi$, have children $x$ and $y$. (Note that, in the case where $z$ is the parent of a vertex on the cutset $\pi$, $z$ itself cannot be on the cutset by minimality, and therefore both its children are in $V_\pi$.) By Proposition 2.2, we have $u^\pi_x, u^\pi_y \le 1$ and $d^\pi_x, d^\pi_y \ge 0$, which implies that both terms on the r.h.s. of (2.5) are non-negative and that the first one is at least $\frac{1}{2}\theta_x d^\pi_x$. Hence, using $\theta_x \ge \theta_*$, (2.5) gives $d^\pi_z \ge \frac{1}{2}\theta_* d^\pi_x$. In particular, $d^\pi_z \le \varepsilon$ implies that $d^\pi_x \le \varepsilon\,(2/\theta_*)$.
Recursing gives the claim.
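Lemma 3.2 says that, along a downward path of length $\gamma$, the d-value grows by at most a factor $(2/\theta_*)^\gamma$. The sketch below draws a random cutset and random weights uniformly in $[\theta_{\min}, 1]$, an arbitrary choice. It computes the d-values with (2.5) and (2.6) taken in the assumed form $d_z = \frac16[\theta_x d_x(4-u_y)+\theta_y d_y(4-u_x)]$ and $u_z = \frac32\theta_x\theta_y d_x d_y - \frac12 u_x u_y$. It then reports the largest ratio $d^\pi_w / \big(d^\pi_v (2/\theta_{\min})^{\gamma(v,w)}\big)$ over ancestor pairs $v \le w$. Since $\theta_{\min} \le \theta_*$, the lemma predicts a value of at most 1. The helpers are redefined so that the block runs on its own, and their names are hypothetical.

```python
import random


def random_cutset(rng, max_depth, p_stop=0.35):
    """Random minimal cutset of the binary tree; vertices are 0/1 strings, the root '' is never chosen."""
    cutset, stack = set(), [""]
    while stack:
        v = stack.pop()
        if len(v) == max_depth or (v and rng.random() < p_stop):
            cutset.add(v)
        else:
            stack += [v + "0", v + "1"]
    return cutset


def compute_du(cutset, theta):
    """d- and u-values on T_pi, from d = u = 1 on the cutset."""
    d, u = {}, {}

    def visit(z):
        if z in cutset:
            d[z] = u[z] = 1.0
            return
        x, y = z + "0", z + "1"
        visit(x)
        visit(y)
        d[z] = (theta[x] * d[x] * (4 - u[y]) + theta[y] * d[y] * (4 - u[x])) / 6
        u[z] = 1.5 * theta[x] * theta[y] * d[x] * d[y] - 0.5 * u[x] * u[y]

    visit("")
    return d, u


def worst_ratio(seed, theta_min=0.4, max_depth=8):
    """max over v <= w of d_w / (d_v * (2 / theta_min) ** gamma(v, w)); Lemma 3.2 predicts <= 1."""
    rng = random.Random(seed)
    cutset = random_cutset(rng, max_depth)
    vertices = sorted({x[:k] for x in cutset for k in range(len(x) + 1)})
    theta = {v: rng.uniform(theta_min, 1.0) for v in vertices if v}
    d, _ = compute_du(cutset, theta)
    worst = 0.0
    for w in vertices:
        for k in range(len(w) + 1):
            v = w[:k]  # ancestor of w (or w itself), at graph distance len(w) - k
            worst = max(worst, d[w] / (d[v] * (2 / theta_min) ** (len(w) - k)))
    return worst


print(max(worst_ratio(seed) for seed in range(10)))
```

The maximum equals 1, attained at $v = w$, up to rounding.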
Our final lemma controls u-values at the root of a subtree where d-values are uniformly small.
Proof. Let $H$ be the smallest non-negative integer such that […] Define $\varepsilon > 0$ to be the largest positive real such that […] The rest of the proof proceeds in two steps: we derive a simplified recurrence for u-values and then solve it.

1. Simplified recurrence: Assume that $d^\pi_v \le \varepsilon\,(2/\theta_*)$. Let $w$ be a descendant of $v$ with graph distance $\gamma(v, w) \le H$. Then, by Lemma 3.2, $d^\pi_w \le \varepsilon\,(2/\theta_*)^{H+1} < 1$, where we used (3.3). This shows, in particular, that all descendants of $v$ in $V$ within graph distance $H$ are in fact strictly above $\pi$, because d-values are 1 on the cutset $\pi$. Moreover, by the recurrence (2.6) and the inequality (3.3), for any descendant $w_0$ of $v$ with children $w_1, w_2$ in $V_\pi$ that are within graph distance $H$ of $v$, we have […] (3.4)
For $h \ge 0$, let
$$U_h = \sup\left\{|u^\pi_w| : w \in V_\pi,\ v \le w \text{ and } \gamma(v, w) = h\right\},$$
so that $U_0 = |u^\pi_v|$. By the remark above, the set in the previous display is non-empty for all $h = 0, \ldots, H - 1$. Taking a supremum on both sides of (3.4) gives the recurrence […] (3.5)

2. Solution:
We show by induction on $h$ (backwards from $H - 1$) that […] For the base of the induction, $h = H - 1$, we have indeed that […], where we used (3.5). Because, by assumption, $\varphi \le 1/9$, the square bracket above is at most 3. That concludes the induction.

Proof of main theorem
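Proof of Theorem 1.2. Fix $\pi \in \mathcal{C}(T)$ and assume that $\theta_* > 0$ and that (1.1) holds. Then there are $0 < \varphi \le 1/9$ and $0 < \zeta < 1$ such that
$$\sum_{x \in \pi''} \prod_{\rho \neq z \le x} \frac{2(1 - \varphi)}{3}\,\theta_z \ge \zeta$$
for all cutsets $\pi'' \in \mathcal{C}(T)$. For this value of $\varphi$, let $\varepsilon'$ be as in Lemma 3.3 and define $\varepsilon = \varepsilon'\zeta < \varepsilon'$. The proof proceeds by contradiction. Assume that $d^\pi_\rho \le \varepsilon$. Let $\pi'$ be the cutset of those vertices closest to the root where the d-values first cross above $\varepsilon'$, i.e., formally,
$$\pi' = \left\{x \in V_\pi : d^\pi_x > \varepsilon' \text{ and } d^\pi_z \le \varepsilon' \text{ for all } z < x\right\}.$$
Such a cutset (which is necessarily minimal) exists because $d^\pi_v = 1$ for all $v \in \pi$ and $\varepsilon' > \varepsilon$. By Lemma 3.2, for all $z$ on or above $\pi'$, i.e., such that $z \le x$ for some $x \in \pi'$, we have $d^\pi_z \le \varepsilon'\,(2/\theta_*)$ and $d^\pi_{s(z)} \le \varepsilon'\,(2/\theta_*)$, since the immediate parent of $z$ (and of $s(z)$) has d-value $\le \varepsilon'$ by definition of $\pi'$. By Lemma 3.3, we then have $|u^\pi_z| \le 4\varphi$ and $|u^\pi_{s(z)}| \le 4\varphi$. Hence, by Lemma 3.1 applied to $\pi'$,
$$d^\pi_\rho = \sum_{x \in \pi'} d^\pi_x \prod_{\rho \neq z \le x} \frac{\theta_z\left(4 - u^\pi_{s(z)}\right)}{6} > \varepsilon' \sum_{x \in \pi'} \prod_{\rho \neq z \le x} \frac{2(1 - \varphi)}{3}\,\theta_z \ge \varepsilon'\zeta = \varepsilon,$$
which is a contradiction.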
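The first-crossing cutset $\pi'$ used in this proof can be computed by a depth-first search that stops at the first vertex on each root-to-$\pi$ path whose d-value exceeds $\varepsilon'$. The sketch below does this on random instances. It assumes (2.5) and (2.6) in the form $d_z = \frac16[\theta_x d_x(4-u_y)+\theta_y d_y(4-u_x)]$ and $u_z = \frac32\theta_x\theta_y d_x d_y - \frac12 u_x u_y$, and it requires $\varepsilon' < 1$ so that the search always stops at or above $\pi$. It then checks that the output has the defining property and is a minimal cutset. The names, the string encoding and the random instances are illustrative.

```python
import random


def random_cutset(rng, max_depth, p_stop=0.35):
    """Random minimal cutset of the binary tree; vertices are 0/1 strings, the root '' is never chosen."""
    cutset, stack = set(), [""]
    while stack:
        v = stack.pop()
        if len(v) == max_depth or (v and rng.random() < p_stop):
            cutset.add(v)
        else:
            stack += [v + "0", v + "1"]
    return cutset


def compute_du(cutset, theta):
    """d- and u-values on T_pi, from d = u = 1 on the cutset."""
    d, u = {}, {}

    def visit(z):
        if z in cutset:
            d[z] = u[z] = 1.0
            return
        x, y = z + "0", z + "1"
        visit(x)
        visit(y)
        d[z] = (theta[x] * d[x] * (4 - u[y]) + theta[y] * d[y] * (4 - u[x])) / 6
        u[z] = 1.5 * theta[x] * theta[y] * d[x] * d[y] - 0.5 * u[x] * u[y]

    visit("")
    return d, u


def first_crossing_cutset(d, eps_prime):
    """pi' = {x : d_x > eps' and d_z <= eps' for every strict ancestor z of x}, via DFS from the root."""
    if not eps_prime < 1:
        raise ValueError("eps' must be < 1, since d = 1 on the cutset pi")
    result, stack = set(), [""]
    while stack:
        z = stack.pop()
        if d[z] > eps_prime:
            result.add(z)  # first crossing on this path: do not go further down
        else:
            stack += [z + "0", z + "1"]  # z is not on pi, because d = 1 > eps' there
    return result


def check(seed, eps_prime=0.5):
    rng = random.Random(seed)
    cutset = random_cutset(rng, 8)
    vertices = sorted({x[:k] for x in cutset for k in range(len(x) + 1)})
    theta = {v: rng.uniform(0.3, 1.0) for v in vertices if v}
    d, _ = compute_du(cutset, theta)
    pi_prime = first_crossing_cutset(d, eps_prime)
    crossing = all(d[x] > eps_prime and all(d[x[:k]] <= eps_prime for k in range(len(x)))
                   for x in pi_prime)
    # Every vertex of pi has exactly one ancestor (or itself) in pi', so pi' is a minimal cutset.
    minimal = all(sum(x[:k] in pi_prime for k in range(len(x) + 1)) == 1 for x in cutset)
    return crossing and minimal


print(all(check(seed) for seed in range(10)))
```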

I.i.d. weights
In this section, we prove our main result in the i.i.d. weight case. Because there is no lower bound on the weights, Theorem 1.2 cannot be applied directly to this case. In particular, the absence of a lower bound makes controlling the u-values more challenging.
Here we identify a subtree of T where u-values are well-behaved. The existence of such a subtree is established with a coupling to a percolation process, where open edges roughly indicate that weights are uniformly bounded in a properly defined neighborhood.
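Proof of Theorem 1.5. First, we need the following percolation result. To each edge $e = (x, y)$ of $T$, where $x$ is the parent of $y$, we assign an independent random weight $\tilde\theta_y$ drawn from a distribution $\Theta$ over $(0, 1]$. We also pick an independent indicator variable $\tilde J_y$, which is 1 with probability $\tilde q \in [0, 1]$ and 0 otherwise. Let $\tilde T = (\tilde V, \tilde E)$ be the subtree of $T$ whose vertices $x$ satisfy $\prod_{\rho \neq z \le x}\tilde J_z = 1$ and whose edges are those with both endvertices satisfying that condition. We let $N_{\mathrm{ext}}$ be the event of non-extinction, i.e., the event that $\tilde T$ is infinite. By standard branching process arguments [2], the extinction probability $\tilde\varphi$ satisfies $\tilde\varphi = \tilde q^2\tilde\varphi^2 + 2(1 - \tilde q)\tilde q\tilde\varphi + (1 - \tilde q)^2$, i.e., as the smallest root in $[0, 1]$ of this equation,
$$\tilde\varphi = \begin{cases} \left(\dfrac{1 - \tilde q}{\tilde q}\right)^2, & \text{if } \tilde q > 1/2,\\[4pt] 1, & \text{if } \tilde q \le 1/2. \end{cases}$$
[…] (4.1)

Proof. This result can be proved along the lines of Proposition 3.2, Proposition 5.1 and Corollary 5.2 in [30]. We sketch a proof.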
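Since $\tilde q^2\tilde\varphi^2 + 2(1-\tilde q)\tilde q\tilde\varphi + (1-\tilde q)^2 = (\tilde q\tilde\varphi + 1 - \tilde q)^2$ is the generating function of the Binomial$(2, \tilde q)$ offspring law, its fixed points are $1$ and $((1-\tilde q)/\tilde q)^2$. The extinction probability is the smaller of the two in $[0,1]$. The sketch below compares this closed form with the standard fixed-point iteration started at 0, which converges to the smallest fixed point. The function names are illustrative.

```python
def extinction_closed_form(q):
    """Smallest root in [0, 1] of phi = q^2 phi^2 + 2 (1 - q) q phi + (1 - q)^2."""
    return ((1 - q) / q) ** 2 if q > 0.5 else 1.0


def extinction_by_iteration(q, iterations=5_000):
    """Iterate the Binomial(2, q) offspring generating function from phi = 0."""
    phi = 0.0
    for _ in range(iterations):
        phi = q * q * phi * phi + 2 * (1 - q) * q * phi + (1 - q) ** 2
    return phi


for q in (0.3, 0.6, 0.8, 0.95):
    print(q, extinction_closed_form(q), extinction_by_iteration(q))
```

The critical case $\tilde q = 1/2$ is avoided here because the iteration then converges only slowly.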
For any $c > 0$ and any rooted, locally finite tree $(T', \theta')$ with positive edge weights, it follows from the definition of $\kappa(T', \theta')$ that the property $\{T' \text{ is finite or } \kappa(T', \theta') \le c\}$ is inherited, in the sense that every finite weighted tree has this property and that the descendant subtrees of the root of $(T', \theta')$ have the property whenever $(T', \theta')$ itself has it. From an argument similar to Proposition 3.2 in [30] (and using the independence of the weights), $\kappa(\tilde T, \tilde\theta)$ is almost surely constant conditioned on $N_{\mathrm{ext}}$.
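Let $\tilde\pi_n \in \mathcal{C}(\tilde T)$ be the set of vertices of $\tilde T$ at graph distance $n$ from the root. Letting $\mathcal{F}_n = \sigma\{\tilde J_z, \tilde\theta_z : z \in \pi_m, m \le n\}$, it can be checked that the sequence
$$M_n := \sum_{x \in \tilde\pi_n} \prod_{\rho \neq z \le x} (2\tilde q\mu)^{-1}\tilde\theta_z$$
is a non-negative martingale: each $x \in \tilde\pi_n$ has two children in $T$, each of which belongs to $\tilde T$ with probability $\tilde q$, independently of $\mathcal{F}_n$, and carries a weight of mean $\mu$. It therefore converges to a finite limit almost surely as $n \to +\infty$. This implies that $\kappa(\tilde T, \tilde\theta) \le 2\tilde q\mu$ almost surely (whether or not extinction occurs).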
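The martingale property can be illustrated by simulation. The expected contribution of the children of a vertex of $\tilde\pi_n$ is $2\tilde q\mu/(2\tilde q\mu) = 1$ times its own, so $\mathbb{E}[M_n] = M_0 = 1$ for all $n$. This Monte Carlo sketch estimates $\mathbb{E}[M_n]$ under the assumption $\Theta = \mathrm{Uniform}(0,1]$, so that $\mu = 1/2$. The weight law, $\tilde q = 0.9$ and the function names are arbitrary illustrative choices.

```python
import random

MU = 0.5  # mean of the assumed weight law Theta = Uniform(0, 1]


def sample_M(n, q_tilde, rng):
    """One sample of M_n: products of theta~ / (2 q~ mu) along root paths of T~, summed over level n."""
    products = [1.0]  # one entry per vertex of the current level of T~; the root has the empty product
    for _ in range(n):
        nxt = []
        for prod in products:
            for _ in range(2):  # two potential children in the binary tree T
                if rng.random() < q_tilde:  # child kept in T~ (J~ = 1)
                    theta = 1.0 - rng.random()  # a draw in (0, 1]
                    nxt.append(prod * theta / (2 * q_tilde * MU))
        products = nxt
    return sum(products)  # an empty level (extinction) gives M_n = 0


def mean_M(n, q_tilde, trials=10_000, seed=0):
    rng = random.Random(seed)
    return sum(sample_M(n, q_tilde, rng) for _ in range(trials)) / trials


for n in (1, 3, 5):
    print(n, mean_M(n, 0.9))
```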
By the previous two paragraphs, it suffices to show that, conditioned on $N_{\mathrm{ext}}$, $\kappa(\tilde T, \tilde\theta) \ge 2\tilde q\mu$ with positive probability. Suppose we perform the following percolation process on $(\tilde T, \tilde\theta)$ with parameter $p \in [0, 1]$: each edge $e = (x, y) \in \tilde E$ is open independently with probability $p\tilde\theta_y$; let $O_p$ be the event that there is then an infinite open path originating from the root in $\tilde T$. By a union bound, the inequality
$$\mathbb{P}\left[O_p \mid (\tilde T, \tilde\theta)\right] \le \sum_{x \in \pi} \prod_{\rho \neq z \le x} p\,\tilde\theta_z$$
holds for any $\pi \in \mathcal{C}(\tilde T)$. On the other hand, $\mathbb{E}\,\mathbb{P}\left[O_p \mid (\tilde T, \tilde\theta)\right] > 0$ whenever $\tilde q\mu p > 1/2$. Indeed, combining the process generating $(\tilde T, \tilde\theta)$ with the subsequent percolation process on $(\tilde T, \tilde\theta)$ is equivalent to a percolation process on the original binary tree $T$ with parameter $\tilde q\mu p$. The result follows.
As discussed briefly above, we use a coupling argument. In order to describe the coupling, we first need to define some constants (not depending on $\theta$). Recall that $\mu$ is the mean of the edge weight distribution $\Theta$ and that $\delta$ is the desired failure probability. Let $q \in (0, 1)$ be close enough to 1 that […] and $2q\mu > 3/2$. Let then $0 < \varphi \le 1/9$ be such that
$$2q\mu > \frac{3}{2(1 - \varphi)}.$$
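Once $q$ and $\mu$ are fixed, the choice of $\varphi$ is explicit: $2q\mu > 3/(2(1-\varphi))$ is equivalent to $\varphi < 1 - 3/(4q\mu)$. The hypothetical helper below picks $\varphi$ halfway into this range, capped at $1/9$. It does not model the additional condition on $q$ involving $\delta$.

```python
def choose_phi(q, mu, phi_max=1 / 9):
    """A phi in (0, 1/9] with 2 q mu > 3 / (2 (1 - phi)); requires 2 q mu > 3/2."""
    margin = 1 - 3 / (4 * q * mu)  # the condition on phi is equivalent to phi < margin
    if margin <= 0:
        raise ValueError("need 2 * q * mu > 3/2")
    return min(phi_max, margin / 2)


def smallest_q(mu):
    """Infimum of the q with 2 q mu > 3/2, i.e. 3 / (4 mu); it is below 1 only when mu > 3/4."""
    return 3 / (4 * mu)


q, mu = 0.99, 0.9
phi = choose_phi(q, mu)
print(phi, 2 * q * mu, 3 / (2 * (1 - phi)))
```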
The purpose of the coupling is to show that the argument used in Lemma 3.3 to control the u-values can be applied to the vertices in $\tilde T$. This is stated in the next lemma.