Increasing paths in regular trees

We consider a regular $n$-ary tree of height $h$, for which every vertex except the root is labelled with an independent and identically distributed continuous random variable. Taking motivation from a question in evolutionary biology, we consider the number of simple paths from the root to a leaf along vertices with increasing labels. We show that if $\alpha = n/h$ is fixed and $\alpha>1/e$, the probability there exists such a path converges to 1 as $h \to \infty$. This complements a previously known result that the probability converges to 0 if $\alpha \leq 1/e$.


Introduction
Consider a regular n-ary tree of height h, where n = αh. To each vertex except the root attach an independent and identically distributed continuous random variable. We are concerned with whether there is a simple (that is, non-backtracking) path from the root to a leaf whose labels only increase. Nowak and Krug [7] called this accessibility percolation and showed that P(there exists an increasing path) → 0 as n → ∞ if α < 1/e, whereas if α > 1 then there exists some p > 0 depending on α such that P(there exists an increasing path) > p. We give a complete characterisation in terms of α, showing that there is a phase transition at α = 1/e.
Theorem 1. For α > 1/e, P(there exists an increasing path) → 1 as h → ∞.
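The model is easy to explore numerically for small trees. The following Python sketch is ours rather than anything taken from the paper: the function names, the lazy sampling of labels, and the parameters `trials` and `seed` are illustrative assumptions. It samples U[0, 1] labels only along branches that are still increasing, which is legitimate because the labels are independent, and estimates the probability that an increasing path exists.

```python
import random


def count_increasing_paths(n, h, rng, lower=-1.0, depth=0):
    """Count root-to-leaf paths with strictly increasing labels.

    Labels are sampled lazily: a child whose label does not exceed its
    parent's label cannot lie on an increasing path, so its subtree is
    never generated. The root is unlabelled, hence lower=-1.0 initially.
    """
    if depth == h:
        return 1  # reached a leaf along an increasing path
    total = 0
    for _ in range(n):
        label = rng.random()  # i.i.d. U[0, 1) label of this child
        if label > lower:
            total += count_increasing_paths(n, h, rng, label, depth + 1)
    return total


def estimate_existence_probability(n, h, trials=1000, seed=0):
    """Monte Carlo estimate of P(there exists an increasing path)."""
    rng = random.Random(seed)
    hits = sum(count_increasing_paths(n, h, rng) > 0 for _ in range(trials))
    return hits / trials
```

For example, `estimate_existence_probability(1, 2)` should be close to 1/2, since a single path with two labels is increasing exactly when the two labels are in order. The expected number of sampled labels is at most n e^n, so the sketch is only practical for small n.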

Biological motivation
Consider the following simplified model of evolution in a population. Each genetic type, or genotype, in the population has an associated fitness. A particular genotype may give rise to multiple new genotypes through mutations, which either replace the original genotype or disappear from the population. If the rate of selection is stronger than the rate of mutation, only mutations which give rise to a fitter genotype survive. Therefore, the only possible evolutionary paths of genotypes are ones with increasing fitness. In the evolutionary biology literature, these increasing paths are known as selectively accessible [6,8,9].
To analyse the number of such paths, we also require the relationship between genotype and fitness. For this, we use the House of Cards model [4,5], in which the fitnesses of the genotypes are independent and identically distributed with a continuous distribution. Since we only care about whether the fitnesses along a path are in increasing order, as long as the random variables are continuous, the precise distribution is not important.
The space of genotypes together with their fitnesses forms a labelled graph. If we further assume that the population initially consists of one single genotype, and that separate mutations never give rise to the same genotype, then the space of genotypes becomes a rooted tree. A selectively accessible or increasing path is then a simple path from the root to a leaf along vertices with increasing labels. For the House of Cards model, we may assume that the root has the genotype of minimal fitness. This leads us precisely to the accessibility percolation model outlined above.

Other models
Our methods could be extended to consider, for example, Galton-Watson trees instead of n-ary trees. We might also be able to fine-tune our methods to gain information about the finer behaviour near the critical point α = 1/e, but this would be highly technical work and seems unlikely to offer further insight into the model.
Besides trees it is also natural to consider the House of Cards model on the n-dimensional hypercube $\{0, 1\}^n$, for which there has been recent progress [2,3]. A selectively accessible path in this setting is a path of minimal length on increasing labels from (0, ..., 0) to (1, ..., 1). Both papers consider the effect of varying the fitness at the zero vertex on the number of accessible paths. Hegarty and Martinsson obtain the threshold for the phase transition of the existence of increasing paths as n → ∞.
Berestycki, Brunet and Shi show that around this threshold, the number of such paths converges in distribution to the product of two independent exponential variables. As a first step, they obtain results for a particular rooted tree related to the hypercube.
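For small n the number of selectively accessible paths on the hypercube can be counted exactly by dynamic programming. The sketch below is ours and is not taken from [2,3]: it assumes genotypes are encoded as bitmasks 0, ..., 2^n − 1, and `fitness` is a hypothetical list indexed by bitmask. Since a path of minimal length from (0, ..., 0) to (1, ..., 1) switches one coordinate from 0 to 1 at each step, it suffices to propagate path counts along such steps whenever the fitness increases.

```python
import random


def count_accessible_paths(n, fitness):
    """Count minimal-length paths from 0...0 to 1...1 with increasing fitness.

    fitness[v] is the fitness of the genotype whose bitmask is v.
    """
    top = (1 << n) - 1
    paths = [0] * (1 << n)
    paths[0] = 1  # the empty path at the all-zeros genotype
    # Setting a bit increases the integer value, so visiting vertices in
    # increasing integer order processes each vertex after all predecessors.
    for v in range(1 << n):
        if paths[v] == 0:
            continue
        for i in range(n):
            w = v | (1 << i)
            if w != v and fitness[w] > fitness[v]:
                paths[w] += paths[v]
    return paths[top]


def house_of_cards_fitness(n, rng):
    """I.i.d. U[0, 1) fitnesses, one per genotype (House of Cards model)."""
    return [rng.random() for _ in range(1 << n)]
```

If the fitness equals the number of ones in the genotype, every one of the n! minimal paths is accessible, which gives a quick sanity check of the counting.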
Hegarty and Martinsson also consider another model for the relationship between genotype and fitness, known as the Rough Mount Fuji model in the biology literature [1], where a linear drift, depending on the distance to the root, is introduced to the random fitnesses. This model on n-ary trees was also considered in [7].

Notation
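Throughout, we assume without loss of generality that the distribution of the labels is U[0, 1], and use the following crude double bound for Stirling's approximation, valid for all $n \geq 1$:
$$\sqrt{2\pi}\, n^{n+1/2} e^{-n} \leq n! \leq e\, n^{n+1/2} e^{-n}.$$
Let $P$ be the set of simple paths from root to leaf in the tree; then $\#P = n^h$. For a path $u \in P$, write $X(u) = (X(u_1), \dots, X(u_h))$ for the (i.i.d., U[0, 1]) labels on its vertices. For any two paths $u, v \in P$, let
$$a(u, v) = \max\{k : u_j = v_j \text{ for all } j \leq k\},$$
$$N = \#\{u \in P : X(u_1) < \dots < X(u_h)\},$$
and for $\varepsilon \in [0, 1)$,
$$N_\varepsilon = \#\{u \in P : \varepsilon < X(u_1) < \dots < X(u_h)\}.$$
It is clear that $N_\varepsilon \leq N$. We will attempt to show that $\mathbb{P}(N_\varepsilon \geq 1)$ is not too small when $\alpha > 1/e$ and $\alpha(1 - \varepsilon)e > 1$.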
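To see where the condition α(1 − ε)e > 1 comes from, it helps to compute a first moment. The following worked computation is a sketch on our part: it takes $N_\varepsilon$ to be the number of root-to-leaf paths whose labels are increasing and all exceed ε, and it uses the standard bounds $\sqrt{2\pi}\,h^{h+1/2}e^{-h} \le h! \le e\,h^{h+1/2}e^{-h}$, whose constants may differ from those the authors work with.

```latex
% Each of the n^h paths carries h i.i.d. U[0,1] labels. All h labels exceed
% \varepsilon with probability (1-\varepsilon)^h, and given this they are
% in increasing order with probability 1/h!.
\[
\mathbb{E}[N_\varepsilon]
  = n^h \,\frac{(1-\varepsilon)^h}{h!}
  = \frac{(\alpha h (1-\varepsilon))^h}{h!},
\qquad\text{so}\qquad
\frac{(\alpha(1-\varepsilon)e)^h}{e\sqrt{h}}
  \;\le\; \mathbb{E}[N_\varepsilon] \;\le\;
\frac{(\alpha(1-\varepsilon)e)^h}{\sqrt{2\pi h}} .
\]
```

In particular, the expected number of such paths grows exponentially in h when α(1 − ε)e > 1 and tends to 0 when α(1 − ε)e ≤ 1.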

Second moment bound
We break the second moment into a sum over $k$-forks. To this end, for $k = 0, \dots, h$, let [...]. Then [...]. Let $(U_1, \dots, U_h)$ and $(V_1, \dots, V_h)$ each be a sequence of i.i.d. U[0, 1] random variables such that $U_j = V_j$ for all $j \leq k$ and $U_j$ and $V_j$ are independent for $j > k$. Using the fact that a uniform [0, 1] random variable conditioned to have value at least $\varepsilon$ is a uniform [$\varepsilon$, 1] random variable, we have for $k = 2, \dots, h - 1$, [...]. Putting these estimates together and then applying Stirling's approximation, we obtain that for $k = 2, \dots, h - 1$, [...]. Similarly, [...]. Thus if $\alpha(1 - \varepsilon)e > 1$, then for some constant $c$, [...].
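The probability that both paths of a $k$-fork are increasing can be written down exactly. The computation below is our own sketch and not the authors' estimate: the notation $F_k$ for the set of ordered pairs $(u, v)$ of paths sharing exactly their first $k$ vertices is hypothetical, and $N_\varepsilon$ is taken to count paths with increasing labels all exceeding ε.

```latex
% A pair (u,v) in F_k carries 2h-k distinct labels. All of them exceed
% \varepsilon with probability (1-\varepsilon)^{2h-k}; given this, all
% (2h-k)! orderings are equally likely. Both paths are increasing exactly
% when the k shared labels are the k smallest, in order, and the remaining
% 2(h-k) labels interleave two increasing runs of length h-k.
\begin{gather*}
\mathbb{P}\bigl(\varepsilon < U_1 < \dots < U_h,\ \varepsilon < V_1 < \dots < V_h\bigr)
  = (1-\varepsilon)^{2h-k}\,\binom{2(h-k)}{h-k}\,\frac{1}{(2h-k)!},
\\[1ex]
\#F_k = n^{2h-k}\Bigl(1-\tfrac{1}{n}\Bigr)\quad (0 \le k < h), \qquad \#F_h = n^h,
\\[1ex]
\mathbb{E}[N_\varepsilon^2]
  = \sum_{k=0}^{h} \#F_k\,(1-\varepsilon)^{2h-k}\,\binom{2(h-k)}{h-k}\,\frac{1}{(2h-k)!}.
\end{gather*}
```

Writing $\mathbb{E}[N_\varepsilon] = n^h(1-\varepsilon)^h/h!$, the term $k = h$ equals $\mathbb{E}[N_\varepsilon]$ and the term $k = 0$ equals $(1 - 1/n)\,\mathbb{E}[N_\varepsilon]^2$, so the work lies in controlling the intermediate forks.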

Proof of Theorem 1
By the Paley-Zygmund inequality, [...].
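For reference, the general form of the Paley-Zygmund inequality is recalled below. The value of θ used in the proof is not recoverable from the text here, so θ is left as a free parameter; this is a standard statement rather than the authors' exact display.

```latex
% Paley-Zygmund: for a random variable Z >= 0 with 0 < E[Z^2] < infinity
% and any \theta in [0,1],
\[
\mathbb{P}\bigl(Z > \theta\,\mathbb{E}[Z]\bigr)
  \;\ge\; (1-\theta)^2\,\frac{\mathbb{E}[Z]^2}{\mathbb{E}[Z^2]} .
\]
```

Applied with $Z = N_\varepsilon$, a bound of the form $\mathbb{E}[N_\varepsilon^2] \le C\,\mathbb{E}[N_\varepsilon]^2$ with $C$ polynomial in $h$ turns into a polynomially small lower bound on the probability that $N_\varepsilon$ exceeds a fixed fraction of its mean.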
From our bounds on the first and second moments of $N_\varepsilon$, we get for some $\delta > 0$,
$$\mathbb{P}(N_\varepsilon > e^{\delta h}) \geq c' h^{-3} \tag{1}$$
for some constant $c' > 0$.
To complete the proof, we will consider the first four levels of the tree separately from the rest. We require the following form of Hoeffding's inequality. [...] For $j = 1, 2, 3, 4$, let $M_j$ be the set of vertices $v$ at the $j$th level of the tree such that $X(v_i) \in [(i - 1)\varepsilon/4, i\varepsilon/4)$ for each $i = 1, \dots, j$, where $v_i$ denotes the ancestor of $v$ at level $i$. [...] Summing these estimates gives the result.
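The standard form of Hoeffding's inequality for a sum of independent [0, 1]-valued variables, recalled below, already indicates why the sets $M_j$ are large with high probability. This is a sketch under our own reading of the construction (each child of a vertex in $M_{j-1}$ lies in $M_j$ independently with probability ε/4, with $M_0$ the root alone); the constants are illustrative and need not match those of the authors' lemma.

```latex
% Hoeffding: if S is a sum of r independent random variables taking
% values in [0,1], then for every t > 0,
\[
\mathbb{P}\bigl(S \le \mathbb{E}[S] - t\bigr) \;\le\; \exp\!\left(-\frac{2t^2}{r}\right).
\]
% Given \#M_{j-1} = m, the size of M_j is Binomial(nm, \varepsilon/4),
% so taking r = nm and t = nm\varepsilon/8:
\[
\mathbb{P}\Bigl(\#M_j < \tfrac{nm\varepsilon}{8} \,\Big|\, \#M_{j-1} = m\Bigr)
  \;\le\; \exp\!\left(-\frac{2\,(nm\varepsilon/8)^2}{nm}\right)
  = \exp\!\left(-\frac{nm\,\varepsilon^2}{32}\right).
\]
```

Iterating over $j = 1, \dots, 4$ suggests that $\#M_4$ is at least of order $(n\varepsilon/8)^4$ outside an event of exponentially small probability.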
Suppose that $u \in M_4$, and consider the subtree of height $h - 4$ rooted at the vertex $u_4$. In order that $N \leq e^{\delta h}$, it must hold that there are no more than $e^{\delta h}$ paths in this subtree that have labels ordered and greater than $\varepsilon$. But we know from (1), since $n/(h - 4) \geq n/h = \alpha$, that the probability of this event is at most $1 - c' h^{-3}$. Thus, applying also Lemma 4,
$$\mathbb{P}(N \leq \exp(\delta h)) \leq 4\exp(-\varepsilon^5 n/8192) + (1 - c' h^{-3})^{n^4 \varepsilon^4/8^4} \leq \exp(-\eta h)$$
for some $\eta > 0$, which proves Theorem 1.
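The last inequality rests on $1 - x \le e^{-x}$ together with $n = \alpha h$. The short computation below spells this out; it is our own filling-in of the step, with the exponent read as $n^4\varepsilon^4/8^4$.

```latex
\[
\bigl(1 - c' h^{-3}\bigr)^{n^4\varepsilon^4/8^4}
  \;\le\; \exp\!\left(-\frac{c'\, n^4 \varepsilon^4}{8^4\, h^3}\right)
  = \exp\!\left(-\frac{c'\,\alpha^4 \varepsilon^4}{8^4}\, h\right),
\qquad
4\exp\!\left(-\frac{\varepsilon^5 n}{8192}\right)
  = 4\exp\!\left(-\frac{\alpha\,\varepsilon^5}{8192}\, h\right).
\]
```

Both terms decay exponentially in $h$, so their sum is at most $\exp(-\eta h)$ for any $\eta$ smaller than both rates once $h$ is large enough.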