The Wright--Fisher model for class-dependent fitness landscapes

We consider a population evolving under mutation and selection. The genotype of an individual is a word of length $\ell$ over a finite alphabet. Mutations occur during reproduction, independently at each locus; the fitness of an individual depends on the Hamming class of its genotype (the distance to a reference sequence $w^*$). Evolution is driven by the classical Wright--Fisher process. We focus on the proportions of the different classes under the invariant measure of the process. We consider the regime where the length $\ell$ of the genotypes goes to infinity, and both the population size and the inverse of the mutation rate are of order $\ell$. We prove the existence of a critical curve, depending on both the population size and the mutation rate. Below the critical curve, the proportion of any fixed class converges to $0$, whereas above the curve, it converges to a positive quantity, for which we give an explicit formula.


Introduction
Most living populations share three main features: genomes are long, populations are large, and mutations are rare. Nevertheless, when modeling a living population, different relations between these three parameters lead to different conclusions. We focus here on a situation which is most appropriate for living beings of small complexity, such as RNA viruses or replicating macromolecules: we aim to model a population in which both the population size and the inverse mutation rate are of the same order as the length of the genome [10]. The main forces driving the evolution of such a population are, of course, mutation, but also selection and genetic drift. Selection is introduced via a fitness function on the genotypes, which encodes the average number of offspring of an individual carrying a particular genotype. Genetic drift is introduced by considering a finite population of constant size. This modeling situation is known to lead to very particular and interesting phenomena.
Error threshold. There is a critical mutation rate separating two different regimes. Above the critical mutation rate, all genetic information is eventually lost, while below the critical mutation rate, an equilibrium state is reached in which the fittest genotype (the master sequence) is present in a positive proportion.
Quasispecies. The equilibrium that is reached below the error threshold consists of a positive proportion of the fittest genotype, which may be very low, and mutants that are a few mutations away from the master sequence may appear in high proportions. Thus, the genetic heterogeneity of such an equilibrium state is huge, and we might as well not be able to identify the master sequence. Such a population is often referred to as a quasispecies.
Population threshold. A low mutation rate is not enough for a quasispecies to form. Indeed, if the population is too small, it is likely that the master sequences present in the population all mutate at once or within a few generations, thus losing the driving force of the quasispecies. This event becomes more and more unlikely as the population size grows, thus giving rise to a second threshold phenomenon, namely a population threshold.
The first two phenomena were first observed by Eigen, in a mathematical model for prebiotic populations [7]. The concept of quasispecies was later popularized by Eigen and Schuster [8]. The model considered by Eigen takes the population size to be infinite, and models the evolution via a system of differential equations. The system is studied in the long-chain regime, i.e., when the length of the genomes goes to infinity, and the error threshold and quasispecies phenomena are found. In order to observe the population threshold, it is necessary to consider a model where the population is taken to be finite. This phenomenon was first observed in [1] for the Moran model and in [2] for the Wright-Fisher model. A nice account of the error threshold and quasispecies phenomena, the main models where they arise, and their applications can be found in [6]. We refer the reader to [1] for a more detailed exposition of the different attempts to build finite population models that present the error threshold and the quasispecies.
Most of the works that show the above three phenomena deal with the simplest possible fitness landscape, namely the sharp-peak landscape: there is a single fittest genotype, the master sequence, and all the other genotypes share the same fitness. The works [2,4] show how, in the sharp-peak landscape, the Wright-Fisher model presents all three of the above phenomena. Our objective is to extend these results to more general fitness landscapes. We focus in the present paper on the case of class-dependent fitness functions: there is a single fittest genotype, and the fitness of any other genotype is a function of its Hamming distance to the fittest genotype. We present the model in section 2, while the main result is presented in section 3, along with a sketch of the proof. The remaining sections are devoted to the proof of the main result.

The model
Let A be a finite alphabet of cardinality κ ≥ 2, and let ℓ ≥ 1 represent the length of the genome. We consider individuals whose genotypes are elements of A^ℓ. Each genotype u ∈ A^ℓ has a fitness A(u) associated to it, which should be interpreted as the mean number of children of an individual carrying the genotype u. When a reproduction occurs, the newborn child is subject to mutations. We suppose that mutations happen independently at each site of the genotype, with probability q ∈ ]0, 1[. When a particular site mutates, the present letter is replaced with a uniformly chosen letter among the κ − 1 remaining ones. Thus, the probability of mutating from a chain u to another chain v is given by

M(u, v) = ( q/(κ − 1) )^{d(u,v)} (1 − q)^{ℓ − d(u,v)} ,

where d(u, v) represents the Hamming distance between u and v, or equivalently, the number of digits in which the two sequences differ.
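To make the mutation mechanism concrete, here is a minimal numerical sketch (a binary alphabet κ = 2 and a short chain, chosen purely for illustration) checking that M(u, ·) is a probability distribution over A^ℓ:

```python
import itertools

def hamming(u, v):
    # number of digits in which the two sequences differ
    return sum(a != b for a, b in zip(u, v))

def mutation_prob(u, v, q, kappa):
    # M(u, v) = (q/(kappa-1))^d(u,v) * (1-q)^(ell - d(u,v)):
    # each site keeps its letter with probability 1 - q, and moves to any
    # given different letter with probability q/(kappa - 1)
    d = hamming(u, v)
    return (q / (kappa - 1)) ** d * (1 - q) ** (len(u) - d)

# sanity check: the kernel sums to 1 over all target chains
kappa, ell, q = 2, 6, 0.1
u = (0,) * ell
total = sum(mutation_prob(u, v, q, kappa)
            for v in itertools.product(range(kappa), repeat=ell))
```

Summing over all v in class d contributes C(ℓ, d) q^d (1 − q)^{ℓ−d}, so the total is 1 by the binomial theorem.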
The evolution will be guided by the classical Wright-Fisher process. Nevertheless, the analysis of the Wright-Fisher process for an arbitrary fitness function A is far too complicated. We focus here on fitness functions of a particular form, namely the class-dependent fitness functions. We make the following assumptions on A.
Master sequence. We assume the existence of a genotype with maximal fitness w * ∈ A ℓ , which we call the master sequence.
Class-dependence. We assume further that the fitness of a genotype u depends only on the number of point mutations away from the master sequence. All the sequences at Hamming distance k from the master sequence form the Hamming class k, and they all share the same fitness.
Eventually constant. Finally, we assume that there is a Hamming class K ≥ 0 such that the fitness of all the genotypes in the classes beyond K is equal to 1.
Under these assumptions, we can define a function A_H : { 0, …, ℓ } → R_+ such that A(u) = A_H(d(u, w*)) for all u ∈ A^ℓ.
When K = 0, all the genotypes other than the master sequence have fitness 1. This particular case is referred to as the sharp-peak landscape; the Wright-Fisher model on the sharp-peak landscape has been studied in detail in [2,4]. Our aim is to generalize the results therein to class-dependent fitness functions which are eventually constant.

One of the main advantages of working with class-dependent fitness functions is that we can break the space A^ℓ into Hamming classes. This is possible because the mutation matrix M respects the Hamming classes (cf. [2] for a proof): fix 0 ≤ k, l ≤ ℓ and let X ∼ Bin(k, q/(κ − 1)) and Y ∼ Bin(ℓ − k, q) be independent random variables; then, for any u ∈ A^ℓ in the class k,

Σ_{v in class l} M(u, v) = P( k − X + Y = l ) .

We denote the above quantity by M_H(k, l), and we call M_H the lumped mutation matrix and A_H the lumped fitness function. The original mutation matrix M and fitness function A have served their purpose, and we will not refer to them again in the rest of the paper.
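The lumped matrix M_H can be computed directly from the two binomial laws above; a small sketch (with illustrative parameter values) checking that each row of M_H is a probability distribution:

```python
import math

def binom_pmf(n, p, x):
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

def M_H(k, l, ell, q, kappa):
    # P(k - X + Y = l): X ~ Bin(k, q/(kappa-1)) back-mutations among the
    # k incorrect sites, Y ~ Bin(ell-k, q) forward mutations among the
    # ell - k correct sites
    p_back = q / (kappa - 1)
    total = 0.0
    for x in range(k + 1):
        y = l - k + x
        if 0 <= y <= ell - k:
            total += binom_pmf(k, p_back, x) * binom_pmf(ell - k, q, y)
    return total

ell, q, kappa = 20, 0.05, 4
rows_ok = all(abs(sum(M_H(k, l, ell, q, kappa) for l in range(ell + 1)) - 1) < 1e-9
              for k in range(ell + 1))
```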
Notation. In order to ease the notations, we will no longer add the subscript H to the lumped fitness function or the lumped mutation matrix, and we will denote them simply by A and M.
We consider a population of size m ≥ 1 evolving according to the classical Wright-Fisher process. Informally, the transition from the population at time n to the population at time n + 1 is done as follows: m individuals are sampled from the population at time n, with replacement. At each of the m trials, the probability for a given individual to be chosen is

(fitness of the individual) / (sum of all the fitnesses in the population) .
Each of the m chosen individuals reproduces, and the offspring mutate. The ensemble of the m offspring, after mutation, forms the population at time n + 1.

We will only be interested in the proportions of the different Hamming classes, and not in the distribution of the genotypes inside the classes themselves; the only information we actually need about the population at time n is the number of individuals in each of the Hamming classes. Indeed, this information is enough to determine the number of individuals in each class at time n + 1. The process that keeps this information is the occupancy process (O_n)_{n≥0}, and it will be the starting point of our study. It is obtained from the original Wright-Fisher process (X_n)_{n≥0} by using a technique known as lumping; for a formal definition of the original Wright-Fisher process, as well as for a formal derivation of the occupancy process from it, we refer the reader to sections 2 and 4 of [2].

Let P^m_{ℓ+1} be the set of the ordered partitions of the integer m in at most ℓ + 1 parts:

P^m_{ℓ+1} = { o = (o(0), …, o(ℓ)) ∈ N^{ℓ+1} : o(0) + ⋯ + o(ℓ) = m } .

A partition o ∈ P^m_{ℓ+1} is interpreted as an occupancy distribution, which corresponds to a population with o(l) individuals in the Hamming class l, for 0 ≤ l ≤ ℓ. The occupancy process (O_n)_{n≥0} is a Markov chain with values in P^m_{ℓ+1} and transition matrix given by: for o, o′ ∈ P^m_{ℓ+1},

p(o, o′) = m! ∏_{l=0}^{ℓ} F_l(o/m)^{o′(l)} / o′(l)! ,

where the function F : S_ℓ → S_ℓ (S_ℓ being the unit simplex of R^{ℓ+1}) is defined by

F_l(x) = ( Σ_{k=0}^{ℓ} x_k A(k) M(k, l) ) / ( Σ_{k=0}^{ℓ} x_k A(k) ) , 0 ≤ l ≤ ℓ .

In view of the expression of the transition matrix, for all o ∈ P^m_{ℓ+1} and n ≥ 0, given that O_n = o, the random vector O_{n+1} follows a multinomial law with parameters m and F(o/m).
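One step of the occupancy process can be sketched as follows (a sharp-peak-type fitness and small parameter values are our own illustrative choices): the next occupancy is drawn from a multinomial law with parameters m and F(o/m).

```python
import math, random

def binom_pmf(n, p, x):
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

def M(k, l, ell, q, kappa):
    # lumped mutation probability from Hamming class k to class l
    p_back = q / (kappa - 1)
    return sum(binom_pmf(k, p_back, x) * binom_pmf(ell - k, q, l - k + x)
               for x in range(k + 1) if 0 <= l - k + x <= ell - k)

def F(x, A, ell, q, kappa):
    # F_l(x) = sum_k x_k A(k) M(k, l) / sum_k x_k A(k)
    w = [x[k] * A(k) for k in range(ell + 1)]
    tot = sum(w)
    return [sum(w[k] * M(k, l, ell, q, kappa) for k in range(ell + 1)) / tot
            for l in range(ell + 1)]

def wright_fisher_step(o, A, q, kappa, rng):
    # sample m offspring classes from the selection-mutation law F(o/m)
    m, ell = sum(o), len(o) - 1
    probs = F([c / m for c in o], A, ell, q, kappa)
    new_o = [0] * (ell + 1)
    for c in rng.choices(range(ell + 1), weights=probs, k=m):
        new_o[c] += 1
    return new_o

ell, m, q, kappa = 10, 50, 0.02, 2
A = lambda k: 4.0 if k == 0 else 1.0        # sharp-peak example (K = 0)
o0 = [m] + [0] * ell                        # everyone starts in class 0
o1 = wright_fisher_step(o0, A, q, kappa, random.Random(0))
```

Iterating `wright_fisher_step` yields a trajectory of the occupancy process.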
Notation. The expression appearing in the denominator of the function F(x) represents the mean fitness of the population x. Since it will appear recurrently in the subsequent formulas, for any k ≥ K and x ∈ R^{k+1}, we denote

φ(x) = A(0)x_0 + ⋯ + A(k)x_k + 1 − x_0 − ⋯ − x_k ,

the classes beyond K having fitness 1.

A straightforward treatment of the occupancy process is hardly tractable. Luckily, in most living populations, genomes are long, populations large, and mutations rare. We will thus carry out the study of the occupancy process in the following asymptotic regime:

ℓ → ∞ , m → ∞ , q → 0 , ℓq → a ∈ ]0, +∞[ , m/ℓ → α ∈ ]0, +∞[ .

This asymptotic regime has two main consequences for the normalized occupancy process (O_n/m)_{n≥0}:
• Since m → ∞, the multinomial law involved in the transition mechanism of the process concentrates around its mean, which is given by the mapping F, and the trajectories of the process tend to be close to those of the discrete dynamical system given by the iterates of F.
• Since ℓ, 1/q → ∞ and ℓq → a, the mutation matrix M converges to an infinite upper triangular matrix M_∞; the probability of mutating to a lower class converges to 0, and the number of classes jumped forward converges to a Poisson law of parameter a (cf. appendix A).
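This convergence is easy to observe numerically: with q = a/ℓ, the forward transition M(k, k + j) approaches the Poisson weight e^{−a} a^j / j!, while backward transitions vanish. A small sketch with illustrative values:

```python
import math

def binom_pmf(n, p, x):
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

def M(k, l, ell, q, kappa):
    # lumped mutation probability, as in section 2
    p_back = q / (kappa - 1)
    return sum(binom_pmf(k, p_back, x) * binom_pmf(ell - k, q, l - k + x)
               for x in range(k + 1) if 0 <= l - k + x <= ell - k)

a, kappa, k, j = 1.0, 4, 2, 3
poisson = math.exp(-a) * a ** j / math.factorial(j)

ell = 5000
q = a / ell
forward = M(k, k + j, ell, q, kappa)   # close to e^{-a} a^j / j!
backward = M(k, k - 1, ell, q, kappa)  # close to 0
```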
The first k + 1 coordinates of F converge to a mapping G : D_k → D_k given by

G_l(r) = ( Σ_{i=0}^{l} A(i) r_i e^{−a} a^{l−i}/(l−i)! ) / ( A(0)r_0 + ⋯ + A(k)r_k + 1 − r_0 − ⋯ − r_k ) , 0 ≤ l ≤ k .

Thus, asymptotically, the coordinates 0, …, k of the normalized occupancy process can be seen as a random perturbation of the discrete dynamical system given by the iterates of G:

r^0 ∈ D , r^n = G(r^{n−1}) = G^n(r^0) , n ≥ 1 .
In fact, this dynamical system will play a key role in our analysis. The mapping G and the dynamical system associated to it have been studied extensively in [3] and [5]. The main results concerning the fixed points of G are given in [3], while the results concerning the stability of the fixed points and the convergence of the dynamical system are given in [5]. We summarize these results in the upcoming propositions. Consider the set of indices b ∈ { 0, …, K } for which the mapping G admits a nontrivial fixed point supported on the classes b, …, K, and let I_A be the set of these indices, to which the index K + 1 is added too.
Proposition 2.1. The mapping G has exactly as many fixed points in D as there are elements in I_A. For each b ∈ I_A, the associated fixed point ρ_b admits an explicit expression.
Note that, in particular, the fixed point corresponding to the index K + 1 is identically 0, and that it is the only fixed point of G if and only if A(0)e^{−a} ≤ 1.
Let I A = { b 1 , . . . , b N } and note that N = 1 corresponds to 0 being the only fixed point of G. Define, for b ∈ I A , the set D b ⊂ D by We have the following result.
Proposition 2.2. For each b ∈ I_A, the fixed point ρ_b attracts all the trajectories of the dynamical system started in the set D_b. Moreover, the map G is contracting in a small enough neighborhood of ρ_b intersected with D_b.
Consider for example a fitness function with K = 2, and suppose that a is such that 4e^{−a} > 1 > 2e^{−a}. Then, the mapping G = (G_0, G_1, G_2) has three fixed points in the set D = { r ∈ R^3 : r_0, r_1, r_2 ≥ 0 and r_0 + r_1 + r_2 ≤ 1 }. The point 0 is always a fixed point, and in this case its basin of attraction is just { 0 }. We have two other fixed points, ρ_0 and ρ_2. The basin of attraction of ρ_2 is the set { r ∈ D : r_0 = 0 } \ { 0 }, and the basin of attraction of ρ_0 is the set { r ∈ D : r_0 > 0 }. In fact, if A(0)e^{−a} > 1, the fixed point ρ_0 always exists, and its basin of attraction is always the set { r ∈ D : r_0 > 0 }. Moreover, the mapping G is contracting in a small enough neighborhood of ρ_0.
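The fixed-point structure can be explored numerically by iterating the limit map G. In the sketch below, the landscape values A(0) = 5, A(1) = 2, A(2) = 4 and a = 1 are our own illustrative choices (satisfying 4e^{−a} > 1 > 2e^{−a}; the paper's example values are not reproduced here). Starting from r_0 > 0, the iterates settle at a fixed point with a positive master-sequence proportion, while starting on the face r_0 = 0 they settle at a fixed point supported on class 2; at each nontrivial fixed point ρ_b, the mean fitness equals A(b)e^{−a}.

```python
import math

A = [5.0, 2.0, 4.0]   # hypothetical class fitnesses, A(k) = 1 for k >= 3
a = 1.0               # mutation parameter: 4*e^{-1} > 1 > 2*e^{-1}

def mean_fitness(r):
    # classes beyond K = 2 carry the remaining mass 1 - |r|_1 and fitness 1
    return sum(A[i] * r[i] for i in range(3)) + (1 - sum(r))

def G(r):
    # selection by relative fitness, then Poisson(a) forward mutations
    phi = mean_fitness(r)
    return [sum(A[i] * r[i] * math.exp(-a) * a ** (l - i) / math.factorial(l - i)
                for i in range(l + 1)) / phi
            for l in range(3)]

def iterate(r, n=20000):
    for _ in range(n):
        r = G(r)
    return r

rho_0 = iterate([1.0, 0.0, 0.0])   # basin { r : r_0 > 0 }
rho_2 = iterate([0.0, 0.5, 0.5])   # basin { r : r_0 = 0 } \ { 0 }
```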

Main result
Let us denote by µ the invariant probability measure of the process (O_n)_{n≥0}. For any 0 ≤ k ≤ l, we denote by π_k the mapping from R^{l+1} to R^{k+1} that keeps the first k + 1 coordinates, i.e., π_k(x_0, …, x_l) = (x_0, …, x_k).

Moreover, in both cases,
The remaining sections are devoted to the proof of the theorem; let us now give a sketch of this proof. Recall that the occupancy process can be seen as a random perturbation of the discrete-time dynamical system associated to G. When A(0)e^{−a} ≤ 1, the mapping G has 0 as its only fixed point, and the result readily follows. When A(0)e^{−a} > 1, the behavior of the process is much more intricate. First, we need to differentiate between two very different regimes: the neutral and non-neutral phases. The neutral phase consists of the populations where none of the classes 0, …, K are present. The process then explores the set of those populations until it finds an individual in one of the classes 0, …, K. While exploring the set of neutral populations, there is no selection, and the process behaves as if the function A were constant. In order to study this phase we rely on the results obtained in [2] for the sharp-peak landscape, and we show in section 9 that the mean time needed to exit the neutral phase is of order κ^ℓ. The non-neutral phase consists of the populations where at least one of the classes 0, …, K is present. In the set of non-neutral populations, the process (O_n)_{n≥0} tends to behave as the dynamical system associated to G; this fact is rigorously stated thanks to a large deviations principle, which we develop in section 4. Inspired by the theory of Freidlin and Wentzell for random perturbations of dynamical systems [9], we exploit the large deviations principle in order to control several quantities associated with the process (O_n)_{n≥0}:
• We show that the process is very unlikely to stay away from a neighborhood of all of the fixed points for a long time (section 5).
• We show that the process enters the basin of attraction of the main fixed point ρ_0 in a few steps with reasonable probability. This is in fact one of the most technical parts of the proof, since the large deviations principle is of little help here: we need to control the probability for the process to create ηm master sequences out of a single master sequence, for some η > 0 (section 6).
• We estimate the mean time that the process needs to exit the set of non-neutral populations, which turns out to be of order e^{mψ(a)}. The function ψ(a) represents the quasipotential linking the points ρ_0 and 0, or otherwise stated, the "energy" of the most likely path the process follows when going from ρ_0 to 0 (section 7).
• We show that when inside the set of non-neutral populations, the process spends most of its time in a neighborhood of ρ 0 (section 8).
Finally, we put all the above estimates together and use them to prove the main theorem, with the help of the ergodic theorem for Markov chains (sections 10 and 11). The case K = 0 corresponds to the sharp-peak landscape and has been treated in [2,4]. The generalization to the class-dependent case is not straightforward. Indeed, the proofs in [2,4] rely strongly on coupling and monotonicity arguments, which cease to work for arbitrary class-dependent functions. In addition, the behavior of the dynamical system associated to G is richer: in the sharp-peak landscape, the only possible fixed points are ρ_0 and 0, while for more general fitness functions intermediate fixed points appear. The new proofs rely on finding estimates that are uniform with respect to the initial points, and are therefore more robust than the original proofs in [2,4].
Since our aim is to send the length of the sequences ℓ to infinity, the number of coordinates of the occupancy process will grow to infinity with ℓ. In order to deal with this inconvenience, we will truncate the process (O_n)_{n≥0} so that the number of coordinates is fixed. Throughout the rest of the section, we fix an integer k greater than or equal to K, and we define the truncated process

Z_n = (O_n(0), …, O_n(k)) , n ≥ 0 .

The process (Z_n)_{n≥0} takes values in the set

D̄_k = { z ∈ N^{k+1} : z_0 + ⋯ + z_k ≤ m } .

The process (Z_n)_{n≥0} is not Markovian: the coordinates that we leave out in its definition cannot be ignored when computing its transition probabilities. Indeed, for any o ∈ P^m_{ℓ+1}, z ∈ D̄_k and n ≥ 0, we have

However, in the asymptotic regime we consider, the process (Z_n)_{n≥0} behaves as a small random perturbation of the dynamical system associated to the mapping G, and can therefore be seen as being "asymptotically Markovian". In most subsequent sections the process (Z_n)_{n≥0} will be the main object of our study; we develop next a large deviations principle for its transition probabilities.
Notation. In defining (Z_n)_{n≥0} we have fixed a coordinate k ≥ K; since the treatment is the same for all k ≥ K, in the sequel we assume that k = K. We will also denote the sets D_K and D̄_K simply by D and D̄, and the mapping π_K by π.

Large deviations principle
For p, t ∈ D, we define the quantity I_K(p, t) as follows:

I_K(p, t) = t_0 ln(t_0/p_0) + ⋯ + t_K ln(t_K/p_K) + (1 − |t|_1) ln( (1 − |t|_1) / (1 − |p|_1) ) .

We make the convention that 0 ln 0 = 0 ln(0/0) = 0. The function I_K(p, ·) is the rate function governing the large deviations of a multinomial distribution with parameters n and p_0, …, p_K, 1 − |p|_1. We have the following estimate for the multinomial coefficients; the proof is similar to that of lemma 7.1 of [2]. Thanks to the lemma, for o ∈ P^m_{ℓ+1} and z ∈ D̄, the error term Φ(o, z) satisfies, for m large enough, a bound in which C(K) is a constant that depends on K but not on m.

We define a function V_1 : D × D → [0, ∞] by setting, for r, t ∈ D,

For x ∈ S_ℓ and t ∈ D, we have, as ℓ → ∞, q → 0, ℓq → a,

Notation. For a subset A of D, we denote by Ā the set mA ∩ D̄. For r ∈ R^{K+1}, we denote by ⌊r⌋ the vector ⌊r⌋ = (⌊r_0⌋, …, ⌊r_K⌋).

Proposition 4.2. The transition probabilities of the process (Z_n)_{n≥0} satisfy the following large deviations principle:
• For any subset U of D and for any r ∈ D, we have, for n ≥ 0,
• For any subsets U, U′ of D, we have, for n ≥ 0,

Proof. We begin by showing the large deviations upper bound. Let U, U′ be two subsets of D and notice that, for all z ∈ D̄ and n ≥ 0,

Let o ∈ P^m_{ℓ+1} be such that π(o) ∈ Ū. For n ≥ 0, we have

The number of elements in the sum is of polynomial order in m, the exponent depending on K only. Thus, thanks to the above estimates on the transition probabilities of the process (Z_n)_{n≥0}, we have, for m large enough,

where C(K) and C′(K) are constants that depend on K but not on m. Define the mappings F̄, F̲ : D → D by setting, for all r ∈ D and k ∈ { 0, …, K },

Asymptotically, for 0 ≤ k < j ≤ ℓ, we have M(j, k) ≤ M(k + 1, k). Thus, asymptotically, for all x in the unit simplex S_ℓ and all k ∈ { 0, …, K },

Define next the function V̄ :

Moreover, asymptotically, for r, t ∈ D,

Thus,

For each m ≥ 1, let z_m, z′_m ∈ D̄ be two terms realizing the above minimum. Up to the extraction of a subsequence, we can suppose that, when m → ∞,

Optimizing with respect to r and t, we obtain the upper bound of the large deviations principle. We show next the lower bound.
Let r, t ∈ D and notice that, for all z ∈ D̄ and n ≥ 0,

Let o ∈ P^m_{ℓ+1} be such that π(o) = ⌊mr⌋. We have

Moreover, asymptotically, for r, t ∈ D,

Thus, for every o ∈ P^m_{ℓ+1} such that π(o) = ⌊mr⌋,

We take the logarithm and send m, ℓ to ∞ and q to 0. We then obtain

lim inf_{ℓ,m→∞, q→0, ℓq→a}

Moreover, if t lies in the interior of U, then for m large enough, ⌊tm⌋ belongs to Ū. Therefore,

We optimize over t and obtain the large deviations lower bound.
A similar proof shows that the l-step transition probabilities of (Z_n)_{n≥0} also satisfy a large deviations principle. For l ≥ 2, we define a function V_l on D × D as follows:

Corollary 4.3. For l ≥ 1, the l-step transition probabilities of (Z_n)_{n≥0} satisfy the large deviations principle governed by V_l:
• For any subset U of D and for any r ∈ D, we have, for n ≥ 0,
• For any subsets U, U′ of D, we have, for n ≥ 0,

The rate function V_1(r, t) is equal to 0 if and only if t = G(r). Thus, the Markov chain (Z_n/m)_{n≥0} can be seen as a random perturbation of the dynamical system associated to the map G (cf. section 2). The next sections study the consequences of the large deviations principles of proposition 4.2 and corollary 4.3 on the asymptotic behavior of the process (Z_n)_{n≥0}.
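The multinomial rate function I_K can be checked numerically against exact multinomial probabilities: by Stirling's formula, −(1/n) ln P(counts = nt) approaches I_K(p, t). A small sketch with illustrative values:

```python
import math

def I(p, t):
    # multinomial rate function over the K+1 tracked coordinates plus the
    # remainder 1 - |t|_1, with the convention 0 ln 0 = 0 ln(0/0) = 0
    ps = list(p) + [1 - sum(p)]
    ts = list(t) + [1 - sum(t)]
    total = 0.0
    for pi, ti in zip(ps, ts):
        if ti > 0:
            if pi == 0:
                return math.inf
            total += ti * math.log(ti / pi)
    return total

def log_multinomial_prob(n, counts, probs):
    # exact log-probability of a multinomial outcome
    out = math.lgamma(n + 1)
    for c, p in zip(counts, probs):
        out += c * math.log(p) - math.lgamma(c + 1)
    return out

p, t, n = (0.2, 0.3), (0.3, 0.3), 500
counts = (150, 150, 200)            # = n * (t_0, t_1, 1 - |t|_1)
probs = (0.2, 0.3, 0.5)             # = (p_0, p_1, 1 - |p|_1)
rate_from_prob = -log_multinomial_prob(n, counts, probs) / n
```

The gap between `rate_from_prob` and I(p, t) is of order (ln n)/n, as the polynomial prefactors suggest.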
Notation. In the sequel, by "asymptotically" we mean: for ℓ and m large enough, q small enough, and ℓq close enough to a. All subsequent statements and inequalities need not be true for all values of ℓ, m and q, but only asymptotically, even if we do not state so explicitly. For o ∈ P^m_{ℓ+1} or z ∈ D̄, we use the notation

Note that the first expectation will usually be a number, while the second one will usually be a random variable. Thus, an expression of the sort

should be interpreted as

Time spent away from the fixed points
The aim of this section is to show that the process (Z_n)_{n≥0} has a small probability of staying away from a neighborhood of the fixed points for a long time. We begin with a useful lemma. For a set A ⊂ D and ε > 0, we denote by A^ε the set of points of D at distance smaller than ε from A. Let 𝒦, U be subsets of D satisfying:
• The set 𝒦 is compact and U is open (with respect to the relative topology on D).
• There exists ε > 0 such that any trajectory of the dynamical system starting on 𝒦 goes through U, and does so before exiting 𝒦^ε, i.e., for all r ∈ 𝒦 there exists n(r) ∈ N such that G^1(r), …, G^{n(r)−1}(r) ∈ 𝒦^ε and G^{n(r)}(r) ∈ U.

Lemma 5.1. There exist h ∈ N and c > 0 (depending on 𝒦, U) such that, asymptotically, for every point z ∈ 𝒦̄,

Proof. For r ∈ R^{K+1} and η > 0, we denote by B(r, η) the open ball around r of radius η (intersected with D). Recall that for r ∈ D we denote by r^n the n-th iterate of r under the map G. By continuity of the map G, for every r ∈ 𝒦, there exist h(r) ∈ N and 0 < η^r_0, …, η^r_{h(r)} < ε such that, for all 0 ≤ n ≤ h(r) − 1,

The family { B(r, η^r_0) : r ∈ 𝒦 } forms an open cover of the compact 𝒦. Thus, there exist r_1, …, r_M ∈ 𝒦 such that

Let t ∈ 𝒦 and let i ∈ { 1, …, M } be such that t ∈ B(r_i, η^{r_i}_0). We denote the quantity h(r_i) simply by h(i), the open ball B(r^n_i, η^{r_i}_n) by B_n, and we set B̄_n = mB_n ∩ D̄. We have then,

The large deviations principle for the transitions of (Z_n)_{n≥0} yields the following bound,

where the constant c^n_i is strictly positive. Let 0 < η < c^n_i. From the above inequalities, we conclude that

Since h is fixed, and since the number of constants c^n_i is finite, the above probability is bounded by e^{−mc}, for some c > 0 independent of t.
We discussed the behavior of the dynamical system associated to the mapping G in section 2; recall that the set I_A encodes the fixed points of G. Let δ > 0 and define, for b ∈ I_A, the sets

The set D \ U is compact, and for every r ∈ D \ U, the hypotheses of the previous lemma are satisfied. We use the lemma to prove the following corollary.
Corollary 5.2. There exist h ∈ N and c > 0 such that, asymptotically, for every z ∈ D̄ \ Ū and n ∈ N,

Proof. Divide the interval { 0, …, n } into subintervals of length h. Using the previous lemma iteratively, we have, for i ≥ 1,

Iterating this procedure, we get

Taking i + 1 = ⌊n/h⌋ gives the desired result.

Creating enough master sequences
Throughout this whole section we assume that A(0)e −a > 1. The aim of this section is to show that starting from any point of D \ { 0 }, the process (Z n ) n≥0 creates a number of master sequences of order m with a reasonable probability, within a time of order ln m.
Assume first that the process (Z_n)_{n≥0} starts from a neighborhood of one of the fixed points. More precisely, let b ∈ I_A \ { 0 } and assume that the starting point lies in a small neighborhood of ρ_b and is of the form

z = (w, 0, …, 0, z_b, …, z_K) .

Since, for δ small enough, G is contracting in the intersection of a small neighborhood of ρ_b with the set D_b, the process will tend to stay inside such a neighborhood for a long time.
Note that, for some ε > 0 depending on the neighborhood,

A similar inequality holds for points close to z, so that, if the neighborhood is small enough, as long as the process stays inside it, the number of master sequences will tend to increase geometrically. This is the key idea of the proof of the theorem, which will be carried out in a few different steps:
• First, we show that from any starting point, the process jumps to a point of the form of z in a finite number of steps, with probability higher than e^{−εm}, for every ε > 0.
• Then we build a deterministic trajectory that, starting from a point of the form of z, creates γm master sequences in less than C ln m steps, with γ, C > 0.
• Finally we show that the process is likely enough to follow the deterministic trajectory.
Before we begin with this strategy, let us give a few auxiliary results. We define the mappings F̄, F̲ : D → D by setting, for r ∈ D and k ∈ { 0, …, K },

The mappings F̄ and F̲ satisfy

The first inequality is always true, while the second one holds asymptotically.
Proof. Recall that M_∞ represents the limit mutation matrix (cf. appendix A). Let r ∈ D and k ∈ { 0, …, K }; we have

This last quantity converges to 0 asymptotically, uniformly in r ∈ D. The rest of the lemma can be shown in a similar way.
Results similar to propositions 2.1 and 2.2 hold for the mapping F . The proofs are exactly the same, even if the form of the fixed points is different. More precisely, we have the following result.
The last convergence is a direct consequence of lemma 6.2. Let δ > 0 and define, for b ∈ I_A, the sets

Let U(δ) denote the union of the sets U_b(δ), and let W(δ) denote this same union, but with the neighborhood of η_0 left out, i.e.,

Let ε > 0. The mapping G is continuous on the compact set D, and it is therefore uniformly continuous on D. In view of lemma 6.2, δ can be chosen small enough so that, for all b ∈ I_A, asymptotically,

Let z ∈ D̄ \ { 0 }, and note that G(z) ≠ 0. By lemma 5.1, there exist h ∈ N and c > 0 such that, asymptotically,

Suppose now that z ∈ Ū_b(δ) for some b ∈ I_A, fix w ∈ N and set z′ = (w, 0, …, 0, z_b, …, z_K). We show next that, for δ small enough,

Indeed, note that for any o ∈ P^m_{ℓ+1} satisfying π(o) = z, we have the following asymptotic bound on the probability of creating a master sequence,

for some M > 0. This implies the following lower bound on the probability of jumping from z to z′:

We now use lemma 4.1 to obtain the following asymptotic bound:

The first two quantities go to 0 when m goes to infinity, so that the sum of both is eventually larger than −ε/2. Since z/m ∈ U_b(δ), we have

We choose δ small enough so that this last quantity is smaller than ε/4. A similar argument shows that δ can be chosen small enough so that the last term is also bounded below by −ε/4, thus giving the desired bound:

lim_{ℓ,m→∞, q→0, ℓq→a}

We build next a deterministic trajectory (z^n)_{n≥0} such that z^0 = z and, for δ small enough and w large enough, there exist η, C > 0 and N < C ln m satisfying z^N_0 ≥ ηm. We set z^0 = z and, for n ≥ 1,

Denote by φ̄ and φ̲ the maximum and the minimum mean fitness of a population. We define N_b(δ) to be the first time of exit of the dynamical system (z^n)_{n≥0} from Ū_b(δ):

Take δ small enough and w large enough so that, asymptotically,

Then, asymptotically, for any n ≤ N_b(δ),

The sequence (z^n_0)_{0≤n≤N_b(δ)} is increasing, and bounded below by a geometric sequence with ratio ρ.
Let γ > 0; then ρ^n w is larger than γm as soon as n ≥ n(γ) = (ln ρ)^{−1} ln(γm/w).
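The bound n(γ) is a simple arithmetic consequence of the geometric growth, and is of order ln m. A quick check with illustrative values of ρ, w and γ (our own choices, for demonstration only):

```python
import math

rho, w, gamma = 1.2, 10, 0.05   # illustrative growth ratio and constants

def n_gamma(m):
    # smallest integer n with rho**n * w >= gamma * m
    return max(0, math.ceil(math.log(gamma * m / w) / math.log(rho)))

for m in (10 ** 4, 10 ** 6):
    n = n_gamma(m)
    assert rho ** n * w >= gamma * m   # enough steps to reach gamma * m
    assert n <= 6 * math.log(m)        # of order C ln m
```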
We therefore show next that there exists γ > 0 such that N_b(δ) is larger than the quantity n(γ). In order to do so, we show in the next lemma that the coordinates (z^n_k)_{0≤k<b} cannot grow at a faster rate than z^n_0. We prove afterwards that the same holds for the differences |z^n_k − η^b_k|, for b ≤ k ≤ K. Let us choose ε > 0 small enough so that, asymptotically,

and let w be such that φ̄/w < ε. We prove next the following lemma.
Proof. We prove the lemma by induction on k. The case k = 0 is obviously true. Let k ∈ { 1, …, b − 1 } and suppose that the statement of the lemma holds for the coordinates 0, …, k − 1. Then, we have

The sequence (z^n_0)_{0≤n≤N_b(δ)} is increasing, and thus φ̄/z^{n−1}_0 ≤ φ̄/w < ε. By the induction hypothesis,

Iterating this inequality, and noting that z^0_k = 0, we obtain

Yet, asymptotically,

We take c_k to be equal to the right-hand side of this inequality, which completes the proof of the lemma.
Lemma 6.5. There exist constants c_b > 0 and 0 < c_δ < 1 such that, for n ≤ N_b(δ), asymptotically, we have

Proof. For n ≥ 0 and b ≤ k ≤ K, noting that η_b is a fixed point of the mapping F̲, we have

Thus,

We have

Reporting back into the inequality for |z^n_k − mη^b_k|, we get

Summing from k = b to K, and recalling that, asymptotically, F̲ is contracting on the set U_b(δ) ∩ D_b, we deduce the existence of a constant c_δ < 1 such that

Using the previous lemma, we get that, asymptotically,

for some constant C depending on δ only. Iterating this inequality, and noting that for t ≤ n we have z^{n−t}_0 ≤ ρ^{−t} z^n_0, we conclude that

The proof is achieved by taking
As a consequence of these two lemmas, we have

Thus, taking γ(K + 1) < min { δ/c_1, …, δ/2c_b }, we get N_b(δ) ≥ n(γ), as desired. We show next that the process (Z_n)_{n≥0} has a fairly high probability of following the deterministic trajectory that we have just built.
Lemma 6.6. Let z^0 = (w, 0, …, 0, z_b, …, z_K) ∈ Ū_b(δ/2), let (z^n)_{n≥0} be the trajectory built from z^0 by setting z^n = ⌊mF̲(z^{n−1}/m)⌋, and let γ be as above. We have

Proof. We have

For any 0 ≤ n ≤ n(γ), as in the proof of the large deviations principle 4.2,

Let o ∈ P^m_{ℓ+1} be such that π(o) = z^n. We have

Thus,

the constant depending on K but not on m. Next, we bound the quantity involving the rate function I. Recall that

The function F̲ has been defined so that F̲_k(π(x)) ≤ F_k(x) for all x ∈ S_ℓ and 0 ≤ k ≤ K. Therefore, for all o ∈ P^m_{ℓ+1} such that π(o) = z^n,

Thus,

The argument of the logarithm is larger than 1, and for all x ≥ 0 we have ln x ≤ x − 1. Therefore, the above quantity is bounded by

On the one hand, for any h ∈ { 0, …, ℓ }, the sum M(h, 0) + ⋯ + M(h, K) is bounded by a constant c which is strictly smaller than 1. Thus, |π_K(F(x))|_1 is bounded above by this same constant c, uniformly in x ∈ S_ℓ. On the other hand, for 0 ≤ k ≤ K, since π(o) = z^n,

Yet, there exists a positive constant c′ such that, asymptotically, M(h, k) ≤ c′/m for 0 ≤ k < h ≤ ℓ. Therefore, the above quantity is bounded by c′/m. We conclude that

Therefore,

Since n(γ) is of order ln m, this last quantity goes to 0 when m goes to infinity, as desired.
Combining the previous lemmas, we obtain the proof of theorem 6.1. Once the process (Z n ) n≥0 has γm master sequences, it will converge to ρ 0 in a few steps. Indeed, let δ > 0 and define We have the following corollary.
Corollary 6.7. Let δ > 0. There exists a positive constant C such that for Proof. Let γ > 0 be small enough and let C ′ be the constant associated to γ as in theorem 6.1. Then, for any C > C ′ and z ∈ D \ { 0 }, we have, By lemma 5.1, there exist h ∈ N and c > 0 such that, asymptotically, for Thus, taking C such that ⌊C ln m⌋ − ⌊C ′ ln m⌋ > h, and in view of theorem 6.1, we obtain the result of the corollary.

Persistence time
We assume throughout this whole section that A(0)e −a > 1. The aim of this section is to compute the expected hitting time of 0 for the process (Z n ) n≥0 . In order to ease the readability of the upcoming formulas, we denote the quantity V (ρ 0 , 0) simply as V . Let us define We will prove the following result.
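For orientation, the bounds established below (hitting probabilities of order e −m(V +2ε) and time scales of order e m(V +η) ) suggest that the result in question is a logarithmic equivalence of Freidlin–Wentzell type. The following display is a sketch inferred from those bounds, not a verbatim restatement of the theorem:

```latex
% Sketch (inferred): expected persistence time on the scale e^{mV},
% with V = V(\rho_0, 0), in the regime of the paper.
\lim_{\substack{\ell,\, m \to \infty,\ q \to 0 \\ \ell q \to a,\ m/\ell \to \alpha}}
  \frac{1}{m}\, \ln \mathbb{E}_z\!\left[ \tau_0 \right] \;=\; V,
  \qquad z \in D \setminus \{\,0\,\}.
```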
Let ε > 0. We first show that there exists a constant C > 0 such that Let γ > 0, z ∈ D \ { 0 }, and assume first that z 0 > γm. Define the sequence (r n ) n≥0 by setting r 0 = z/m and The mapping V 1 is continuous in its first argument in a neighborhood of ρ 0 ; let us choose δ small enough so that Moreover, for δ small enough there exists h ∈ N such that for all r ∈ D satisfying r 0 ≥ γ, and for all n ≥ h, we have Indeed, by theorem 2.2, δ can be chosen sufficiently small so that the δ-neighborhood of ρ 0 is contracting. By continuity of the map G, for all r ∈ D such that r 0 > γ, there exist δ r > 0 and h(r) ∈ N such that if The set D γ = { r ∈ D : r 0 ≥ γ } is compact and the family { B(r, δ r ) : r ∈ D γ } is an open cover of the set D γ . Thus, there exist r 1 , . . . , r N ∈ D γ such that Set h to be the maximum of h(r 1 ), . . . , h(r N ). Then, for all r ∈ D γ and n ≥ h, we have |G n (r) − ρ 0 | < δ. Let h ′ ≥ 0 and let (t i ) 0≤i≤h ′ be a sequence in D satisfying Consider next the sequence (s i ) 0≤i≤h+h ′ +1 defined by Proceeding as in the proof of the large deviations principle 4.2, we see that Then, Thus, asymptotically, uniformly on z ∈ D γ . Suppose now that z ∉ D γ . By theorem 6.1, there exists C ′ > 0 such that Thus, for every z ∈ D \ { 0 }, Taking C such that ⌊C ln m⌋ ≥ ⌊C ′ ln m⌋ + l, we conclude that for every z ∈ D, P z ( τ 0 ≤ ⌊C ln m⌋ ) ≥ e −m(V +2ε) .
Proceeding as in corollary 5.2, we obtain that, for every h ≥ 1 and z ∈ D \ { 0 }, Thus, We conclude that, for every z ∈ D, We send ε to 0 and we obtain the desired upper bound. We proceed now to the proof of the lower bound. Let δ > 0 and define, for x ∈ D, the δ-neighborhood of x by Let τ δ be the hitting time of the δ-neighborhood of 0, Obviously, τ δ ≤ τ 0 . We will first show that for every z ∈ U δ (ρ 0 ), lim inf ℓ,m→∞, q→0, ℓq→a, m/ℓ→α In order to ease the notation in the sequel, we write V δ for the infimum appearing in the above formula, and P z , E z for the probabilities and expectations associated with the process (Z n ) n≥0 starting from z. Using Markov's inequality, for all T ≥ 0, Thus, we set T ≥ 0 and we bound the probability of the event { τ δ < T }. Let us denote by T 0 the last time before τ δ that the process is in U δ (ρ 0 ), i.e., We will bound the probability of the event { τ δ < T } by studying the trajectory of the process (Z n ) n≥0 between T 0 and τ δ . The idea is the following: either the trajectory (Z n ) T 0 <n<τ δ spends a long time outside a neighborhood of ρ 0 and 0, which is very unlikely (lemmas 5.1, 6.1 and corollary 5.2), or it jumps in a few steps from one fixed point to another until reaching 0, in which case the lower bound of the large deviations principle 4.2 will give us the desired estimate. We prove first a useful lemma.
Lemma 7.2. Let ε > 0. For all T > e 2εm and for all z ∈ D \ { 0 }, we have, asymptotically, Proof. Let ε, γ > 0. As we have shown in the proof of the upper bound, there exists h ∈ N such that for every z ∈ D satisfying z 0 ≥ γm, In view of theorem 6.1, there exists C > 0 such that for every z ∈ D \ { 0 }, Thus, taking l = ⌊C ln m⌋ + h, we have, for every z ∈ D \ { 0 }, Proceeding as in corollary 5.2, for every j ∈ N and for every z ∈ D \ (U δ (ρ 0 ) ∪ U δ (0)), Taking j > e 2εm /l, this probability is smaller than e −εm , as wanted.
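The statement proved in this lemma is plausibly of the following form (a sketch inferred from the proof, in which more than e 2εm /l blocks of length l are chained, each avoiding the two neighborhoods with probability at most e −cm ):

```latex
% Sketch of the statement of Lemma 7.2 (inferred, not verbatim):
% for all T > e^{2\varepsilon m} and all z \in D \setminus \{0\},
P_z\big(\, Z_n \notin \mathcal{U}_\delta(\rho_0) \cup \mathcal{U}_\delta(0)
           \ \text{for all}\ 0 \le n \le T \,\big)
  \;\le\; e^{-\varepsilon m}.
```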
We continue now with the proof of the lower bound. We have, for T ≥ 0, z ∈ U δ (ρ 0 ), and k ≤ T , The first of the terms in the right-hand side of ( †) can be bounded thanks to the large deviations principle 4.3. Indeed, if 0 ≤ t 0 < t * < T and t * − t 0 < k, we have where C(K) is a positive constant that depends on K but not on m. We have then, Yet, thanks to the large deviations principle of corollary 4.3, We deal next with the second term in the right-hand side of ( †). Let us define the set U to be the union of all the δ-neighborhoods of the fixed points ρ b , b ∈ I(A), and the set W to be this same union, but with the neighborhoods of ρ 0 and 0 left out, i.e., We define the random time T * 1 by T * 1 = min { n ≥ T 0 : Z n ∈ W }. We break the second term of the right-hand side of ( †) as follows, The first of the sums in the right-hand side of ( ‡) can be bounded thanks to lemma 7.2. Indeed, if t * − t 0 > e εm , then The second of the sums in the right-hand side of ( ‡) can be bounded thanks to corollary 5.2. Let h and c be as in corollary 5.2. Then, we have, for 0 ≤ t 0 < t * < T and t * − t 0 > k, where C ′ (K) is a positive constant that depends on K but not on m. Thus, In order to bound the last sum in ( ‡), we introduce, for b ∈ I(A), the random time
We decompose the last term in ( ‡) as follows, For a given b ∈ I(A) \ { 0, K + 1 }, we decompose further the above sum, by considering the three following cases:
• If t * 1 − t 0 > k, then, for some positive constant C(K) that depends on K only, the above sum can be bounded by
• If t * 1 − t 0 < k and t * − t 1 < k, then the sum can be bounded thanks to the large deviations principle in corollary 4.3, which gives the following bound:
• If t * 1 − t 0 < k and t * − t 1 > k, then we define the set W −b by and we define the hitting time of W −b after the time T 1 by Then, the sum can be bounded by
The first of the sums is again bounded by In order to bound the second sum, we can break it again into three different cases, and iterate this same procedure until we exhaust the fixed points in the set I(A). We will then get 3|I(A)| summands, each of them being bounded by where M(K) is a natural number depending on K only. We choose k large enough so that We set Then, taking ε small enough so that M(K)ε < δ, and putting the above estimates together, we conclude that, asymptotically, We deduce from here that, for every z ∈ U δ (ρ 0 ), Now let z ∈ D \ { 0 } and note that, from the proof of theorem 6.1, we can deduce that there exists C > 0 such that Therefore, for every T ≥ ⌊C ln m⌋, we have

Thus, for any
We let δ go to zero and we get the desired result.
In fact, the above calculations tell us further that, for every δ > 0, there exists ε > 0 such that, asymptotically,

Concentration near ρ 0
We assume throughout this whole section that A(0)e −a > 1. Our purpose is to study the behavior of the process (Z n ) n≥0 inside the set D, in order to show that it spends most of its time close to the fixed point ρ 0 . Let δ > 0 and denote by U δ the δ-neighborhood of ρ 0 , i.e., As previously, we write U δ = mU δ ∩ D. We also write V (ρ 0 , 0) simply as V . We introduce the following stopping times: set T 0 = 0 and

Set also
Our purpose is to estimate the quantity Noting that the argument in the expectation is bounded by τ 0 , for any i * ∈ N, we can break the above expectation as follows: According to corollary 6.7, for any δ, ε > 0, there exists C = C(δ, ε) > 0 such that, asymptotically, for any z ∈ D \ { 0 }, From this inequality we deduce the following bound.
Corollary 8.1. Let δ, ε > 0. There exists C = C(δ, ε) > 0 such that, asymptotically, for every z ∈ D \ { 0 }, The proof is similar to that of corollary 5.2. Thanks to this bound, asymptotically, for any z ∈ D \ { 0 }, We conclude that, for any i * ∈ N and ε > 0, Let η > 0 and define t η m = e m(V +η) . Then, Let us begin by bounding the first term on the right-hand side of this inequality. We have, for every n ∈ N and z ∈ D \ { 0 }, When proving the upper bound for the persistence time (cf. section 7), we showed the following inequality: for every γ > 0, there exists C > 0 such that Using this inequality with γ = η/2, and setting n = h⌊C ln m⌋, we get Yet, if h = t η m /⌊C ln m⌋ , we have, And since the above expectation goes to 0 when m goes to infinity. We deal now with the remaining term. Let τ denote the exit time of the process (Z n ) n≥0 from the set U 2δ , i.e., τ = inf { n ≥ 0 : Z n ∉ U 2δ }.
We have the following bound on τ .
Lemma 8.2. There exist γ, γ ′ > 0 such that, asymptotically, for all z ∈ U δ , Proof. Define S to be the last time before τ that the process is in U δ , i.e., For any n ≥ 1, Let h ≥ 2 and c > 0 be as in corollary 5.2. For a given value of s, we split the sum over t into two parts: We study next the first sum, over t > s + h. We condition on the state of the process at time s + 1. By the Markov property, Since the set U 2δ \ U δ contains none of the fixed points, and since t > s + h, by corollary 5.2, this last probability is smaller than exp(−mc⌊(t − s − 2)/h⌋). Therefore, We bound next the second sum, over 1 ≤ t ≤ h. Conditioning on the state at time s: Using the large deviation principle of corollary 4.3, since h is fixed, for any Recall that δ has been chosen small enough so that G(U δ ) ⊂ U δ . Thus, the above infimum is strictly positive. We deduce that there exists c ′ > 0 (depending on δ) such that, uniformly over z ′ ∈ U δ ,
Let us go back to the inequality We set i * = 2e m(V +η−γ) . Then, combining the previous lemma with lemma B.1, there exists C > 0 such that which goes to 0 when m goes to infinity. We conclude by choosing η, γ, ε

The neutral phase
The aim of this section is to study the process (O n ) n≥0 when none of the classes 0, . . . , K are present in the population. Nevertheless, instead of using the occupancy process (O n ) n≥0 for our study, we will use a related process, namely the distance process (D n ) n≥0 . The distance process is a Markov chain on { 0, . . . , ℓ } m ; an element d ∈ { 0, . . . , ℓ } m is a vector representing the distances to the master sequence of the m individuals present in the population. The transition matrix p H of the distance process is given by
The distance process and the occupancy process are related by a standard lumping procedure (cf. section 4 of [2]). The distance process has been studied in detail in section 8 of [2]. We state next some of the results therein, and we give a simple argument in order to obtain the remaining estimates that we will need. Let k ≥ 0. We are interested in measuring the hitting time τ * k of the set of populations containing the classes 0, . . . , k. Let us define, with a slight abuse of notation, The hitting time τ * k is then defined by The dynamics of the process D n , started from any point in the set N K , and until the time τ * K , is the same as if the fitness landscape were neutral. Since we are ultimately interested in the hitting time τ * k for k ≥ K, we will assume throughout the rest of this section that the fitness function A is constant and equal to 1.
Neutral hypothesis. Throughout this section we assume that A(k) = 1 for all k ≥ 0. Section 8 of [2] is concerned with estimating the hitting time τ * 0 . The main results therein that are of interest to us are contained in section 8.3; we summarize them next.
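The two hitting-time estimates from [2, Section 8.3] summarized next plausibly take the following form on the exponential scale κ ℓ (a sketch inferred from the use of T = κ ℓ(1−ε) later in this section and from the criticality condition αψ(a) = ln κ, not a verbatim restatement):

```latex
% Sketch of the two neutral-phase estimates (inferred forms):
\lim_{\substack{\ell,\, m \to \infty,\ q \to 0 \\ \ell q \to a,\ m/\ell \to \alpha}}
  \frac{1}{\ell}\, \ln \mathbb{E}_d\!\left[ \tau_0^* \right] \;=\; \ln \kappa,
\qquad
\liminf_{\substack{\ell,\, m \to \infty,\ q \to 0 \\ \ell q \to a,\ m/\ell \to \alpha}}
  P_d\!\left( \kappa^{\ell(1-\varepsilon)} \le \tau_0^*
              \le \kappa^{\ell(1+\varepsilon)} \right) \;=\; 1.
```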
• Concerning the expectation of the hitting time τ * 0 , for any d ∈ N 0 , lim ℓ,m→∞, q→0, ℓq→a, m/ℓ→α
• Concerning the concentration of τ * 0 around its mean, for any ε > 0 and d ∈ N 0 , lim inf ℓ,m→∞, q→0, ℓq→a, m/ℓ→α
Since for any k ≥ 0 the set W * 0 is contained in the set W * k , the hitting time τ * 0 must be larger than the hitting time τ * k . The following lemma is an immediate consequence of these observations. Lemma 9.1. Asymptotically, for any k ≥ 0, ε > 0 and d ∈ N k , Our next purpose is to find a lower bound for τ * k . Consider a population d ∈ D k ; there exists an individual at distance at most k from the master sequence. On one hand, the probability for this individual to be chosen for reproduction is bounded below by min 0≤i≤k A(i)/mA(0) .
On the other hand, the probability for this individual to transform into the master sequence by mutation is bounded below (at least asymptotically) by Combining these two facts, we deduce that, asymptotically, for any d ∈ W * k , for some M > 0 that depends on k but not on m, ℓ, q. This inequality will be the key to bounding the hitting time τ * k . Indeed, since every time we hit the set W * k we have a fairly high probability of hitting the set W * 0 in one step, and since the hitting time τ * 0 is large with high probability, the hitting time τ * k cannot be very small. We formalize this idea in the following lemma: Proof. Let k ≥ 0, ε > 0 and let d ∈ N k . Define δ by −δ = lim inf ℓ,m→∞, q→0, ℓq→a, m/ℓ→α Set T = κ ℓ(1−ε) and let N ∈ N. Asymptotically, The first of the terms is bounded by exp(−δℓ/2), while the second one can be bounded by Yet, by the Markov property, summing the second probability over f , we get Using the Markov property again on the third probability, we conclude that Iterating this inequality, we obtain Thus, taking 0 < γ < ε/4 and letting N = κ ℓγ , we conclude from here that, asymptotically, sup Yet, in view of the result regarding the concentration of τ * 0 around its mean, we must have δ = 0.
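The one-step bound combined at the start of the preceding argument can be spelled out as follows (a sketch; the displayed inequality is an inference from the selection and mutation mechanisms, the worst case being an individual at distance exactly k):

```latex
% Selection: the individual at distance at most k is chosen with
% probability at least \min_{0 \le i \le k} A(i) / (m A(0)).
% Mutation: its offspring corrects the at most k discordant loci
% (probability q/(\kappa-1) each) and leaves the other \ell - k loci
% unchanged (probability 1-q each). Hence, for d \in W_k^*,
P\big(\, D_{n+1} \in W_0^* \;\big|\; D_n = d \,\big)
  \;\ge\; \frac{\min_{0 \le i \le k} A(i)}{m\, A(0)}
  \left( \frac{q}{\kappa - 1} \right)^{\!k} (1 - q)^{\ell - k},
% and (1-q)^{\ell-k} \to e^{-a} in the regime \ell q \to a, which
% plausibly yields a bound of the announced form with M = M(k).
```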

The supercritical case
Define the function a → ψ(a) to be equal to V (ρ 0 , 0) on ]0, ln A(0)[ , and to be equal to 0 elsewhere. We suppose that αψ(a) > ln κ, so that in particular A(0)e −a > 1, and ρ 0 is well defined. Recall that the aim is to show that, for any continuous and bounded function f : Let f : R K+1 −→ R be a continuous, bounded function. By the ergodic theorem for Markov chains, Let ε > 0. We will prove that this last quantity is smaller than ε, for m, ℓ large enough, q small enough, and ℓq, m/ℓ close enough to a, α. We break the state space P m ℓ+1 into two disjoint subsets: the populations containing at least one individual in one of the classes 0, . . . , K, and the populations containing no individuals in any of the classes 0, . . . , K, The process (O n ) n≥0 will jump between these two sets. We define the following sequence of stopping times: we set τ 0 = 0 and Next, set δ > 0, and define the set U δ to be the δ-neighborhood of ρ 0 , i.e., The set D being compact, the function f is uniformly continuous on D. We choose δ small enough so that for every r ∈ U 2δ , and so that the set U δ satisfies G(U δ ) ⊂ U δ (cf. theorem 2.2). As in the previous sections, for a set A ⊂ D we denote by A the set mA ∩ D. For each k ≥ 0 we define the following sequence of stopping times: we set T k,0 = τ * k and We distinguish between three different situations: is outside a neighborhood of ρ 0 . We bound the above sum by breaking it according to these three situations, which gives the following bound, The next step is to bound the above sums. We start with the first one of them. We define, for n ≥ 1, the random variable ι(n) by ι(n) = max { k ≥ 0 : τ k−1 < n }.
We can rewrite the sum with the help of this new random variable as Denote by τ (N K ) the hitting time of N K , i.e., By the last remark in section 7, there exists a number γ > 0 such that max z∈U δ P z ( τ (N K ) ≤ e m(V −ε) ) < e −γm .
Let ι(n) and i n be as in the previous section. Taking the expectation, the above sum can be bounded by Noting that ι * (n) ≤ ι(n), the first term can be shown to converge to 0 as n goes to ∞, as in the previous section. Let us deal with the expectation. We introduce the following stopping times: set T 0 = 0 and T * 1 = inf { n ≥ T 0 : Z n ∈ U δ }, T 1 = inf { n ≥ T * 1 : Z n ∉ U 2δ }, and so on.
Fix k ∈ { 1, . . . , i n }. By the Markov property, Yet, as we have shown in section 8, the last expectation is bounded by e m(V −γ) , for any γ > 0. Therefore, which, choosing ε < γ, converges to 0 when m goes to ∞.

The subcritical case
We suppose that αψ(a) < ln κ. Recall that the aim is to show that, for any continuous and bounded function f : R K+1 −→ R τ k − τ * k + n − τ * ι(n) .
Let i ≥ 1. Since this quantity is obviously bounded by n, we can decompose it, according to whether ι(n) is greater or smaller than i, and bound it as follows. Since for every o ∈ W * , ≤ n P ( κ(n) ≥ i ) + i exp ( m ( ψ(a) + ε ) ) .
by the difference of two independent binomial laws, i.e., if X ∼ Bin(i, q/(κ − 1)) and Y ∼ Bin(ℓ − i, q) are independent random variables, then M H (i, j) = P (i − X + Y = j) .
Fix i and j, and let ℓ go to infinity, q go to 0, and ℓq go to a; the first of the binomial laws converges to a Dirac mass at 0, while the second one converges to a Poisson random variable of parameter a. Thus, In particular, in the limit, there is no back mutation. Furthermore, for ℓ large enough, q small enough, and ℓq close enough to a, ∀ i > j M H (i, j) ≤ M H (j + 1, j) .
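The limiting kernel described above can be written explicitly: since X → 0 in probability and Y converges in distribution to a Poisson(a) random variable, we get (a sketch of the display following "Thus"):

```latex
% Limit of the lumped mutation kernel M_H in the regime
% \ell \to \infty, q \to 0, \ell q \to a:
\lim M_H(i, j) \;=\;
\begin{cases}
  e^{-a}\, \dfrac{a^{\,j-i}}{(j-i)!} & \text{if } j \ge i,\\[6pt]
  0 & \text{if } j < i,
\end{cases}
```

so the limit kernel is upper triangular, which is the precise sense in which there is no back mutation in the limit.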

B Bounds on hitting times
Let E be a finite set and (X n ) n≥0 a recurrent Markov chain on E. For a set A ⊂ E we denote by τ A the hitting time of A, i.e., τ A = inf { n ≥ 0 : X n ∈ A } .
Let A ⊂ B ⊂ E and define the following sequence of stopping times: we set T 0 = 0 and T * 1 = inf { n ≥ 0 : X n ∈ A }, T 1 = inf { n ≥ T * 1 : X n ∉ B }, and so on.
Our objective is to give a bound on the random variable ι(n). Let us assume that there exist N, p > 0 such that max z∈A P ( τ E\B ≤ N | X 0 = z ) < p . Proof. Let us assume that hλ is an integer (otherwise we may replace it by ⌊hλ⌋). From the definition of ι(n), we see that We define the random variables (Y i ) i≥1 by setting In view of the assumption on τ E\B , for every i ≥ 1, We define the following sequence of Bernoulli random variables Thus, if T h−1 < hλN, at least (h − 1)λ of the random variables Y 1 , . . . , Y h−1 must satisfy Y i ≤ N. Whence, P ( T h−1 < hλN ) ≤ P ( ε 1 + · · · + ε h−1 ≥ (h − 1)λ ) .
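The last probability can then be controlled by a standard Chernoff bound. If the ε i were i.i.d. Bernoulli(p) variables with p < λ (here they are only stochastically dominated by such variables, which suffices for an upper bound), the relative-entropy form would give the following sketch:

```latex
% Chernoff bound, relative-entropy form (sketch): for p < \lambda < 1,
P\big( \varepsilon_1 + \cdots + \varepsilon_{h-1} \ge (h-1)\lambda \big)
  \;\le\; \exp\!\big( -(h-1)\, \Lambda(\lambda, p) \big),
\qquad
\Lambda(\lambda, p)
  \;=\; \lambda \ln \frac{\lambda}{p}
  \;+\; (1-\lambda) \ln \frac{1-\lambda}{1-p} \;>\; 0,
% so that P(T_{h-1} < h \lambda N) decays exponentially in h.
```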