Stochastic individual-based models with power law mutation rate on a general finite trait space

We consider a stochastic individual-based model for the evolution of a haploid, asexually reproducing population. The space of possible traits is given by the vertices of a (possibly directed) finite graph $G=(V,E)$. The evolution of the population is driven by births, deaths, competition, and mutations along the edges of $G$. We are interested in the large population limit under a mutation rate $\mu_K$ given by a negative power of the carrying capacity $K$ of the system: $\mu_K=K^{-1/\alpha}$, $\alpha>0$. This results in several mutant traits being present at the same time and competing to invade the resident population. We describe the time evolution of the orders of magnitude of each sub-population on the $\log K$ time scale, as $K$ tends to infinity. Using techniques developed in [Champagnat, Méléard, Tran, 2019], we show that these are piecewise affine continuous functions, whose slopes are given by an algorithm describing the changes in the fitness landscape due to the succession of new resident or emergent types. This work generalises [Kraut, Bovier, 2019] to the stochastic setting, and Theorem 3.2 of [Bovier, Coquille, Smadi, 2018] to any finite mutation graph. We illustrate our theorem by a series of examples describing surprising phenomena arising from the geometry of the graph and/or the rate of mutations.


1. INTRODUCTION
Adaptive dynamics is a biological theory that was developed to study the interplay between ecology and evolution. It involves the three mechanisms of heredity, mutations, and natural selection. It was first introduced in the 1990s by Metz, Geritz, Bolker, Pacala, Dieckmann, Law, and coauthors [29,17,22,4,5,16], who mostly considered a deterministic setting but also heuristically mentioned first stochastic versions. A paradigm of adaptive dynamics is the separation of the slow evolutionary and the fast ecological time scales, which results from reproduction with rare mutations. Invasion, fixation or extinction of a mutant population is determined by its invasion fitness, which describes the exponential growth rate of a single mutant in the current (coexisting) population(s) at equilibrium.

This work was partially supported by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany's Excellence Strategy GZ 2047/1, project-id 390685813 and GZ 2151, project-id 390873048 and through the Priority Programme 1590 "Probabilistic Structures in Evolution". This work was also partially funded by the Chair Modélisation Mathématique et Biodiversité of VEOLIA-Ecole Polytechnique-MNHN-F.X. The authors thank Anton Bovier for stimulating discussions at the beginning of this work and comments, and L. Coquille and C. Smadi also thank him for his invitations and welcome at Bonn University. The authors would also like to thank Sylvain Billiard for pointing out some bibliographical references.
Stochastic individual-based models of adaptive dynamics have been rigorously constructed and first studied in the seminal work of Fournier and Méléard [21], and there is now a growing literature on these models. The population consists of a collection of individuals who reproduce, with or without mutation, or die after random exponential times depending on the current state of the whole population. The population size is controlled by a carrying capacity $K$, which represents the amount of available resources. This class of models was first studied in the original context of separation between evolutionary and ecological time scales, that is, in the joint limit of large populations and rare mutations such that a mutant either dies out or fixates before the next mutation occurs. Mathematically, this amounts to considering a probability of mutation satisfying in particular
$$\mu_K \ll \frac{1}{K\log K} \quad \text{as } K\to\infty. \tag{1.1}$$
We will call this regime the 'rare mutation regime' in the sequel. The description of the succession of mutant invasions, on the mutation time scale $1/(K\mu_K)$, in a monomorphic [7] or polymorphic [9,2] asexual population gives rise respectively to the so-called Trait Substitution Sequence or Polymorphic Evolution Sequence. Extensions of the question to sexual populations were then studied, in both the haploid [38,12] and the diploid [11,32] cases.
It is natural to consider the effect of higher mutation rates, where mutation events are no longer separated, if we want to describe several mutant traits being present microscopically at the same time and competing to invade the resident population. The mutation rate given by
$$\mu_K = K^{-1/\alpha}, \quad \alpha>0, \tag{1.2}$$
was considered in different contexts [19,39,6,10] and will be the concern of the present paper. Notice that another mutation scale has been considered in [2,3] to model the interaction of few mutants in the case without recurrent mutations, namely $\mu_K$ of order $1/(K\log K)$.
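To make the comparison between (1.1) and (1.2) concrete, the sketch below computes, for $\mu_K=K^{-1/\alpha}$, the exponent $e$ such that the mutation time scale satisfies $1/(K\mu_K)=K^{e}$, together with the ratio of this time scale to $\log K$ for a given finite $K$. It only illustrates the arithmetic, and the function names are hypothetical. The exponent $1/\alpha-1$ is positive exactly when $\alpha<1$, in which case $K\mu_K\log K\to0$ and (1.1) holds, while for $\alpha>1$ the number $K\mu_K=K^{1-1/\alpha}$ of new mutants per unit time diverges.

```python
from fractions import Fraction
import math

def mutation_time_scale_exponent(alpha: Fraction) -> Fraction:
    """Exponent e with 1/(K * mu_K) = K**e when mu_K = K**(-1/alpha)."""
    return 1 / alpha - 1

def time_scale_ratio(alpha: float, K: float) -> float:
    """Ratio between the mutation time scale 1/(K mu_K) and log K."""
    mu_K = K ** (-1.0 / alpha)  # power-law mutation probability (1.2)
    return 1.0 / (K * mu_K) / math.log(K)

# alpha in (0, 1): mutations are rarer than invasions (regime (1.1))
print(mutation_time_scale_exponent(Fraction(1, 2)))  # 1
print(time_scale_ratio(0.5, 1e6))                    # large
# alpha > 1: new mutants appear at rate K**(1 - 1/alpha) -> infinity
print(mutation_time_scale_exponent(Fraction(3)))     # -2/3
print(time_scale_ratio(3.0, 1e6))                    # small
```

Exact rationals are used for the exponent so that the sign change at $\alpha=1$ is not blurred by rounding.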
Another approach to adaptive dynamics was introduced by Maynard Smith [28] under the name of adaptive walks. This was further developed by Kauffman and Levin [23] and many others, as mentioned below. Here, a given finite graph represents the possible types of individuals (vertices) together with their possibilities of mutation (edges). A fixed, but possibly random, fitness landscape assigns real numbers to the vertices of the graph. The evolution of the population is modelled as a random walk on the graph that moves towards higher fitness. This can be interpreted as the adaptation of the population to its environment. In contrast to the adaptive dynamics context, this fitness landscape does not depend on the current state of the population. Adaptive walks move along edges towards neighbours of increasing fitness, according to some transition law, until they reach a local or global maximum. In particular, in such models it is not possible for a population to cross a fitness valley. This limitation is partially overcome by a variation of this model, called adaptive flight [31], which consists of a walk jumping between local fitness maxima before eventually reaching a global maximum. The questions of the distribution of maxima [33], the typical length of a walk [34], or the typical accessibility properties of the fitness landscape [25,37,1] have been studied under different assumptions on the graph structure, the fitness law, or the transition law of the walk. Moreover, these models have been compared with actual empirical fitness landscapes in [40]. As Kraut and Bovier showed [24], adaptive walks and flights arise as the limit of individual-based models of adaptive dynamics when the large population limit is taken first, followed by the rare mutation limit. They also conjecture, and this will be proved in the present article, that similar results hold in the stochastic setting under the mutation rate (1.2), as we detail below.
In this paper, we consider an individual-based Markov process that models the evolution of a haploid, asexually reproducing population. The space of possible traits is given by the vertices of a (possibly directed) finite graph G = (V, E). The evolution of the population is driven by births, deaths, and competition rates, which are fixed and depend on the traits, as well as mutations towards nearest neighbors in the graph G. We start with a macroscopic initial condition (that is to say of order K, see Definition 2.1) and we are interested in the stochastic process given by the large population limit under the mutation rate (1.2). We describe the time evolution of the orders of magnitude of each sub-population on the log K time scale, as K tends to infinity. We show that the limiting process is deterministic, given by piecewise affine continuous functions, which are determined by an algorithm describing the changes in the fitness landscape due to the succession of new resident or emergent types.
This work constitutes an extension of the paper by Kraut and Bovier [24] to the stochastic setting. They consider the deterministic system resulting from the large population limit of the individual-based model ($K\to\infty$), and let the mutation probability $\mu$ tend to zero. By rescaling time by $\log(1/\mu)$, they prove that the limiting process is a deterministic adaptive walk that jumps between different equilibria of coexisting traits. A corollary of our results gives the same behaviour, on the $\log K$ time scale, for the stochastic process under the scaling (1.2) for $\alpha$ larger than the diameter of the graph $G$. Kraut and Bovier also study a variation of the model, where they modify the deterministic system such that the subpopulations can only reproduce when their size lies above a certain threshold $\mu^\alpha$. This limits the radius in which a resident population can foster mutants, and mimics the scaling (1.2) that we consider. The resulting limiting processes are adaptive flights (which are not restricted to jumping to nearest neighbours), and thus can cross valleys in the fitness landscape and reach a global fitness maximum. We obtain the same behaviour, on the $\log K$ time scale, for the stochastic process under the scaling (1.2) without any restriction on $\alpha$.
The results of the present paper can also be seen as a generalisation of Theorem 3.2 in [6] by Bovier, Coquille and Smadi to any finite trait space. Indeed, they consider the graph with vertices $V=\{0,\dots,L\}$ embedded in $\mathbb{N}$ and choose parameters such that the induced fitness landscape exhibits a valley: mutant individuals with negative fitness have to be created in order for the population to reach a trait with positive fitness. Several speeds of the mutation rate are considered, and in particular, when $\alpha>L$, the exit time of the valley is computed on the $\log K$ time scale. This becomes a corollary of our results, and we can give an algorithmic description of the rescaled process for more general graphs endowed with a fitness valley, as we discuss in several examples in Section 3.
Our proof relies heavily on couplings of the original process with logistic birth and death processes with non-constant immigration, and on the analysis of the latter, simpler processes on the $\log K$ time scale. This approach was developed by Champagnat, Méléard and Tran in [10]. They consider an individual-based model for the evolution of a discrete population performing horizontal gene transfer and mutations on $V=[0,4]\cap\delta\mathbb{N}$, $\delta>0$. Their goal is to analyze the trade-off between natural selection, which drives the population to higher birth rates, and transfer, which drives the population to lower ones. Under the mutation rate (1.2), they exhibit parameter regimes where different evolutionary outcomes appear, in particular evolutionary suicide and the emergence of a cyclic behavior. As in the present paper, their results characterize the time evolution of the orders of magnitude of each sub-population on the $\log K$ time scale, which are shown to be piecewise affine continuous functions whose slopes are given by an algorithm describing the succession of phases when a given type is dominant or resident. Their proofs provide us with the main ingredients needed for our results. However, the graph structure they choose simplifies the inductions, and we have to generalise their approach to treat more general graphs, in the spirit of the proofs of Kraut and Bovier [24].
Our results are general, and could be applied to gain a better understanding of evolutionary trajectories in complex fitness landscapes. There are now more and more empirical studies of fitness landscapes (see for instance [14] for a comprehensive review of data and tools up to 2014), and the probability and effect of specific mutations in given landscapes are increasingly well understood. For instance, oriented mutation graphs can stem from mutation bias, through codon usage bias or similar molecular phenomena which make some mutations more probable than others [35].
We present a series of specific examples where surprising phenomena arise from the geometry of the graph G and/or the rate of mutations (1.2). Most of them could not happen under a different scaling of mutation rates.
− In Example 1, we describe a scenario where the ancestry of the resident population consists, with high probability, of back mutations towards a previously extinct trait, although the mutations that happen in between are not deleterious. In other words, the final resident individuals, say of trait v, although they can be produced from a wild type directly, come with high probability from a sequence of non-deleterious mutations which went back to the wild type before mutating to v. This phenomenon can also happen in the regime (1.1), that is, for $\alpha\in(0,1)$, on the mutation time scale ($1/(K\mu_K)\gg\log K$), where invading mutants fully replace the resident population before a new mutant arises. We show that it can still occur for higher mutation rates of the form (1.2), on a $\log K$ time scale, when parameters are chosen such that temporary extinction of the original trait is likely. Such mutational reversions have been observed (see [15] for instance).
− If evolution and mutation time scales are separated (i.e. in the regime (1.1)), mutations occur one at a time, and the number of successive resident traits from the wild type to the type gathering k successively beneficial mutations is k. This is not the case if mutations are faster, in which case it is possible to observe either more or fewer successive resident traits. We will show this in Examples 2 and 3.
− In Example 4, we show that adding a new possible mutation path towards a fit trait can increase the time until it appears macroscopically. This is in the spirit of the paradox called price of anarchy in game theory, or more specifically Braess's paradox in the study of traffic network congestion. Motter showed that this paradox may often occur in biological and ecological systems [30]. He studies the removal of part of a metabolic network to ensure its long-term persistence, with applications to cancer, antibiotics and metabolic diseases. Another field of application is food web management, where the selective removal of some species from the network can potentially have the positive outcome of preventing a series of further extinctions [36].
− Another counter-intuitive phenomenon arising from the mutation rate (1.2), presented in Example 5, is the possibility to observe, for a cyclic clockwise oriented mutation graph, successive counter-clockwise resident populations. This means that the macroscopic succession of resident traits is not necessarily representative of the mutation graph. In particular, this may call into question the interpretation in terms of mutation graphs of some experiments in experimental evolution (see [27] for instance).
− In Examples 6 and 7, we show that the mutation rate (1.2) does not restrict the range of the corresponding adaptive flights on the trait space, i.e. the distance that the limiting process can jump, to α.
− We finally study the framework of fitness valley crossings. Combining our results with Theorem 3.3 of [6], we construct Examples 8 and 9, where effective random walks on the trait space appear on the time scale $K^\beta$, for some positive β. Those limiting adaptive flights arise as a result of a "fast" equilibration on the $\log K$ time scale followed by exponential waiting times until fitness valleys get crossed. This makes sense biologically, since there may be traits with positive invasion fitness that can be reached through several consecutive mutation steps [26,13].
The remainder of this paper is organised as follows. In Section 2 we define the model and present our results. In Section 3 we illustrate our results by a series of examples describing surprising phenomena arising from the geometry and/or the rate of mutations. Section 4 is devoted to the proofs. In the Appendix, we present and extend some technical results.
2. CONVERGENCE ON THE log K-TIME SCALE

2.1. Model. We consider an individual-based Markov process that models the evolution of a haploid, asexually reproducing population. The space of possible traits is given by the vertices of a (possibly directed) finite graph $G=(V,E)$.
For all traits $v,w\in V$ and every $K\in\mathbb{N}$, we introduce the following parameters:

Remark 1. We could also allow $\mu_K$ to depend on $v\in V$ as long as $\mu_K(v)=\mu_K h(v)$ for some strictly positive function $h$ that is independent of $K$. However, this would not change the characterisation of the limit, and hence we assume a constant $\mu_K$ to simplify the notation.
Moreover, we assume that, for every $v\in V$, $c_{v,v}>0$. The parameter $K$ scales the competitive pressure and, through this self-competition, fixes the equilibrium size of the population to the order of $K$. $K$ is sometimes called the carrying capacity and can be interpreted as a scaling parameter for the available resources, such as food or space.
As a consequence of our parameter definitions, the process $N^K$ is characterised by its infinitesimal generator, where $\phi:\mathbb{N}^V\to\mathbb{R}$ is measurable and bounded. Such processes have been explicitly constructed in terms of Poisson random measures in [21]. Due to the scaling of the competition $c^K$, the equilibrium population is of order $K$. Since the mutation probability $\mu_K$ tends to zero as $K\to\infty$, the process $N^K/K$ converges (on finite time intervals) to the mutation-free Lotka-Volterra system (2.3) involving all initial coexisting resident traits. We are interested in the long-term evolution of the population and want to study successive invasions by new mutant populations. Since a mutant population that is initially of order $K^\gamma$, $\gamma<1$, needs a time of order $\log K$ to grow exponentially to the order of $K$ (growing at rate $f>0$, it reaches $K$ after a time $(1-\gamma)\log K/f$), we have to rescale time by $\log K$ to obtain a non-trivial limit.
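As a rough illustration of the dynamics described in this subsection, the following sketch simulates an individual-based birth, death, competition and mutation process with the Gillespie (exact stochastic simulation) algorithm. Since the parameter list and the generator are not reproduced above, the parametrisation is an assumption in the spirit of [21]: an individual of trait v gives birth at rate `b[v]`; with probability $\mu_K=K^{-1/\alpha}$ the offspring is a mutant whose trait is chosen uniformly among the out-neighbours of v in G (the uniform kernel used in Examples 8 and 9); and the individual dies at rate `d[v] + sum_w c[(v, w)] * N_w / K`. All names are hypothetical.

```python
import random

def simulate_ibm(b, d, c, edges, n0, K, alpha, t_max, seed=0):
    """Gillespie simulation of a birth-death-competition-mutation process.

    Hypothetical parametrisation: b[v], d[v] are birth and natural death
    rates, c[(v, w)] / K is the competition felt by v from each w-individual,
    and edges[v] lists the out-neighbours of v in the mutation graph G.
    Returns the final time reached and the trait counts at that time.
    """
    rng = random.Random(seed)
    mu_K = K ** (-1.0 / alpha)            # power-law mutation probability (1.2)
    N = dict(n0)                          # must contain every trait, even with count 0
    traits = list(N)
    t = 0.0
    while True:
        birth = {v: b[v] * N[v] for v in traits}
        death = {v: N[v] * (d[v] + sum(c[(v, w)] * N[w] for w in traits) / K)
                 for v in traits}
        total = sum(birth.values()) + sum(death.values())
        if total == 0:                    # the population is extinct
            return t, N
        dt = rng.expovariate(total)       # exponential waiting time to the next event
        if t + dt > t_max:
            return t_max, N
        t += dt
        u = rng.random() * total          # pick an event proportionally to its rate
        for v in traits:
            if u < birth[v]:
                if edges.get(v) and rng.random() < mu_K:
                    N[rng.choice(edges[v])] += 1   # mutant offspring along an edge
                else:
                    N[v] += 1                      # clonal offspring
                break
            u -= birth[v]
            if u < death[v]:
                N[v] -= 1
                break
            u -= death[v]

# Two traits, mutations 0 -> 1 only, K = 100 and alpha = 2
b = {0: 2.0, 1: 2.5}
d = {0: 1.0, 1: 1.0}
c = {(v, w): 1.0 for v in (0, 1) for w in (0, 1)}
t_end, counts = simulate_ibm(b, d, c, {0: [1], 1: []}, {0: 100, 1: 0},
                             K=100, alpha=2.0, t_max=1.0)
```

The per-event cost grows quadratically with the number of traits because of the competition sum; this keeps the sketch short but is not meant for large trait spaces.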
It is convenient to describe the population size of a trait $v\in V$ by its $K$-exponent $\beta^K_v$. Since the population size is restricted to order $K$ by the competition, $\beta^K_v$ ranges between 0 and 1 as $K\to\infty$ (see Corollary A.6 for a rigorous statement).
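The formula defining the $K$-exponent is not reproduced above. A common convention in this line of work, which we assume here, is $\beta^K_v(t)=\log\big(1+N^K_v(t\log K)\big)/\log K$. The hypothetical helpers below implement this convention: an empty subpopulation gets exponent 0, a subpopulation of size of order $K^\gamma$ gets roughly $\gamma$, and a macroscopic one gets roughly 1. Times are divided by $\log K$ to match the rescaling discussed above.

```python
import math

def k_exponent(n: int, K: float) -> float:
    """K-exponent of a subpopulation of size n, assuming log(1 + n) / log K."""
    return math.log1p(n) / math.log(K)

def exponents_on_log_K_scale(path, K):
    """Map (time, counts) pairs to (time / log K, {trait: K-exponent})."""
    log_K = math.log(K)
    return [(t / log_K, {v: k_exponent(n, K) for v, n in counts.items()})
            for t, counts in path]

K = 10_000
print(k_exponent(0, K))      # 0.0: extinct subpopulation
print(k_exponent(99, K))     # about 0.5: a subpopulation of order K**(1/2)
print(k_exponent(9_999, K))  # about 1.0: a macroscopic subpopulation
```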
For the sake of readability, we now introduce the terminology we will use in the sequel.
The set of living traits is the set $\{v\in V:\beta^K_v>0\}$. When $K$ is large enough, the macroscopic traits interact on any finite time interval according to the corresponding mutation-free Lotka-Volterra system (see Chapter 11, Theorem 2.1 in [20] for the proof of this law of large numbers): for $\mathbf{v}\subset V$, the mutation-free Lotka-Volterra system associated to $\mathbf{v}$ is given by (2.3). For a subset $\mathbf{v}\subset V$ of traits, we denote by $\bar n(\mathbf{v})\in\mathbb{R}_+^V$ the unique equilibrium of the Lotka-Volterra system (2.3), when it exists, where, to simplify notation, we extend it by $\bar n_w(\mathbf{v})=0$ for $w\notin\mathbf{v}$. In the case where $\mathbf{v}=\{v\}$, this equilibrium is explicit and follows from classical results on Lotka-Volterra models (see [7] for instance). If $\mathbf{v}$ denotes the set of macroscopic traits, we call the traits $v\in\mathbf{v}$ such that $\bar n_v(\mathbf{v})>0$ resident.
The approximate rate at which a mutant of trait $w$ grows in a population of coexisting resident traits $\mathbf{v}$ is called invasion fitness and is denoted by $f_{w,\mathbf{v}}$. If $f_{w,\mathbf{v}}>0$, the trait $w$ is called fit. If $f_{w,\mathbf{v}}<0$, the trait $w$ is called unfit. The case $f_{w,\mathbf{v}}=0$ will be excluded (see Remark 2).
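The formulas for (2.3), for $\bar n(\mathbf v)$ and for $f_{w,\mathbf v}$ are not reproduced above. The sketch below assumes the standard logistic Lotka-Volterra form $\dot n_v=n_v\big(r_v-\sum_{w\in\mathbf v}c_{v,w}n_w\big)$ with a net growth rate $r_v$ (typically birth minus death rate), so that an interior equilibrium solves a linear system and $f_{w,\mathbf v}=r_w-\sum_{u\in\mathbf v}c_{w,u}\bar n_u(\mathbf v)$. This is our assumption, but it is consistent with Remark 5 below: for a constant kernel $c_{v,w}\equiv c$ and a single resident $v$, it gives $\bar n_v=r_v/c$ and $f_{w,v}=r_w-r_v$. The hypothetical helper only checks positivity of the solution, not the stability or global attractivity required in Theorem 2.2.

```python
import numpy as np

def lv_equilibrium(r, C, support):
    """Candidate interior equilibrium of the Lotka-Volterra system on `support`.

    Solves sum_{w in support} C[v, w] * n_w = r_v for v in support and sets
    n_w = 0 outside the support. Returns None if some coordinate is not
    strictly positive; raises LinAlgError if the restricted matrix is singular.
    Stability is NOT checked.
    """
    r, C = np.asarray(r, dtype=float), np.asarray(C, dtype=float)
    idx = sorted(support)
    n = np.zeros(len(r))
    sol = np.linalg.solve(C[np.ix_(idx, idx)], r[idx])
    if np.any(sol <= 0):
        return None
    n[idx] = sol
    return n

def invasion_fitness(w, r, C, n_bar):
    """Growth rate of a rare w-mutant: r_w - sum_u C[w, u] * n_bar_u."""
    r, C = np.asarray(r, dtype=float), np.asarray(C, dtype=float)
    return r[w] - C[w] @ n_bar

# Constant competition kernel c = 2 and net growth rates r
r = [1.0, 1.5, 0.5]
C = 2.0 * np.ones((3, 3))
n_bar = lv_equilibrium(r, C, {0})        # [0.5, 0, 0], i.e. r_0 / c
print(invasion_fitness(1, r, C, n_bar))  # 0.5 = r_1 - r_0: trait 1 is fit
print(invasion_fitness(2, r, C, n_bar))  # -0.5 = r_2 - r_0: trait 2 is unfit
```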
Mutants can be produced along (directed) edges of the graph. We denote by d(v, w) the graph distance, i.e. the length of the shortest

2.2. Results. Let a finite graph $G=(V,E)$ be given and assume that $\alpha\in\mathbb{R}_+^*\setminus\mathbb{N}$ and $f_{w,\mathbf v}\neq0$ for any $w\in V$ and $\mathbf v\subset V$ (see Remark 2). The following two results concern the convergence of the orders of magnitude of the different subpopulation sizes to a piecewise affine trajectory, whose slopes and times of slope changes can be explicitly expressed in terms of the parameters.
Theorem 2.2. Let a finite graph $G=(V,E)$ and $\alpha\in\mathbb{R}_+^*\setminus\mathbb{N}$ be given and consider the model defined by (2.1). Assume that $f_{w,\mathbf v}\neq0$ for any $w\in V$ and $\mathbf v\subset V$. Let $\mathbf v_0\subset V$ and assume that, for every $w\in V$, … , which is defined as follows:
(i) If the mutation-free Lotka-Volterra system (2.3) associated to $\mathbf v_0$ has a unique positive globally attractive equilibrium, the initial condition of $\beta$ is set to $\beta_w(0):=$ … . Otherwise, the construction is stopped and $T_0$ is set to 0.
(ii) The increasing sequence of invasion times is denoted by $(s_k)_{k\ge0}$, where $s_0:=0$ and, for $k\ge1$, … Here, $\mathbf v_k$ denotes the set of coexisting resident traits of the Lotka-Volterra system that includes $\mathbf v_{k-1}$ and the trait $w\in V\setminus\mathbf v_{k-1}$ that satisfies $\beta_w(s_k)=1$.
(iii) … where, for any $w\in V$, $t_{w,k}$ is the first time in $[s_{k-1},s_k]$ when this trait arises.
(iv) The inductive construction is stopped and $T_0$ is set to $s_k$ if
(a) there is more than one $w\in V\setminus\mathbf v_{k-1}$ such that $\beta_w(s_k)=1$;
(b) the Lotka-Volterra system including $\mathbf v_{k-1}$ and the unique $w\in V\setminus\mathbf v_{k-1}$ such that $\beta_w(s_k)=1$ does not have a unique stable equilibrium;
…

Remark 2. Notice that conditions (a), (c), and (d) of point (iv) are here to exclude very specific and non-generic cases where one coordinate reaches 1 while another reaches 1 or reaches 0 from above, or a new trait arises at the exact same time. They are difficult to handle for technical reasons.
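Moreover, we exclude the cases where $\alpha\in\mathbb{N}$. They would produce mutant populations, at distance $\alpha$ from the resident traits, that can be approximated neither by sub- nor by supercritical branching processes (a trait at distance $\ell$ from the residents receives mutants at a rate of order $K\mu_K^{\ell}=K^{1-\ell/\alpha}$, which is of order 1 when $\ell=\alpha$). The same applies to the case $f_{w,\mathbf v}=0$, where the population can both grow and shrink due to fluctuations.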
Remark 3. The $t_{w,k}$ do not keep track of traits that die out in $[s_{k-1},s_k]$ and then reappear. However, since the fitnesses do not change between invasions, such a trait would have a negative invasion fitness (else it would not die out). Hence, if it reappears, it would not start growing on its own, but would only follow another trait, sustained by incoming mutants. It would therefore not contribute to the maximum over $u\in V$ in (2.5).

Proposition 2.3. Under the same assumptions and with the same notations as in Theorem 2.2, … (ii) The invasion times $s_k$ and the times $t_{w,k}$ when new mutants arise can be calculated as follows. We define the increasing sequence $(\tau_\ell,\ \ell\ge0)$ … Given $\tau_\ell$ and $M_{\ell-1}$, we set $M_\ell:=$ …

Remark 4. We could allow for more general initial conditions of the form … An inductive application of Lemma A.2, similar to the induction proving (4.9), implies that within a time of order … The rest of the results remains unchanged.
Remark 5. The limiting jump process $N(t)$ resembles an adaptive walk or flight, as studied in [34,31,37,33,1]. For a constant competition kernel $c_{v,w}\equiv c$, we consider the fixed fitness landscape $(r_v)_{v\in V}$ given by … . Since in this case $f_{w,v}=r_w-r_v$, the process jumps along edges towards traits of increasing fitness $r$.
The above results are in the vein of Theorem 2.1 and Corollary 2.3 in [10]. There are however many differences between the setting considered in [10] and our setting.
Due to the horizontal transfer between individuals, Champagnat and coauthors obtained trajectories where a "dominant" population, i.e. with the size of highest order, could be non resident, i.e. of order negligible with respect to K. They could also witness extinction on a log K time scale as well as evolutionary suicide. The absence of horizontal transfer in our case prevents such behaviours.
We consider a general finite graph of mutations with possible back mutations, whereas their graph was embedded in Z and did not allow for back mutations. We also allow for the coexistence of several resident traits in the population at equilibrium. The two main difficulties in the proofs compared to [10] are thus to handle the generality of the graph of mutations, and to extend some approximation results to the multidimensional case.

3. SURPRISING PHENOMENA ARISING FROM GEOMETRY AND MUTATION RATE
In this section, we present some non-intuitive behaviours of the population process, which stem from the mutation scale or from the generality of the mutation graph that we allow for. They are direct applications of Theorem 2.2 and Proposition 2.3, and provide explicit computations of exponents (2.5) and time intervals (2.7).
Several examples are built on directed graphs. Although this is not a necessary condition to obtain the desired phenomena, it allows for a simplified study (especially of the decay phases).
We first introduce some notations for the sake of readability.
We write (1) "with high probability" to mean "with a probability converging to 1 as $K\to\infty$".

3.1. Back mutations before adaptation. In the following, we build an example where the ancestry of the resident population comes from back mutations towards an ancestral trait, even if the mutations happening in between are not deleterious.

… and a fitness landscape given by … In this case, Proposition 2.3 implies that, on the $\log K$ time scale, the rescaled macroscopic population jumps along traits 0 − 1 − 2, then to the coexistence of 0 and 2, followed by the invasion and fixation of 3, which is produced with high probability, due to Condition (3.4), by individuals of type 0 which have the sequence 0 − 1 − 2 as ancestry. In other words, the final resident individuals of trait 3, although they can be produced by individuals of trait 0 directly, come from a sequence of mutations which went around the loop 0 − 1 − 2 of G. Conditions (3.1), summarized in Figure 1, imply phase portrait number 8 in the classification of Zeeman [41]. Condition (3.2) ensures that trait 1 becomes resident before 2. Condition (3.3) is not necessary but simplifies the setting. The exponents are drawn in Figure 1.
Note that this scenario can also happen in the rare mutation regime considered in [7] (for example $\alpha\in(0,1)$): the average waiting time until a mutant of type 1 appears is … . Once it has appeared, it survives with positive probability and the succession of invasions and fixations above takes place on the $\log K$ time scale, separated by mutation events on the $K^{-1+1/\alpha}$ time scale. What is new in our case is that such a scenario can still take place for higher mutation rates than the ones considered in [7], and on a $\log K$ time scale.

3.2. Non-intuitive mutational pathways in the high mutation framework.
3.2.1. Longer or shorter path than expected. If evolution and mutation time scales are separated (i.e. in the rare mutation regime), mutations occur one at a time, and the number of successive resident traits from the wild type to the type gathering k successively beneficial mutations is k. This is not the case if mutations are faster, in which case it is possible to observe either more or fewer resident traits, as the following examples show.

… [10,11]}. Let $\alpha>2$, an initial condition given by $(\bar n(00),0,0,0)$ and a fitness landscape given by … $f_{11,00}<f_{01,00}$ (3.5), $f_{10,01}>f_{11,01}$ (3.6) … In this case, in the rare mutation regime, the rescaled macroscopic population jumps along 00 − 01 − 11. In the regime of Theorem 2.2, Proposition 2.3 implies that the rescaled macroscopic population directly jumps from 00 to 11 on the $\log K$ time scale. More precisely, the exponents are drawn in Figure 2. Condition (3.8) ensures that 11 fixates before 01.

… [3, 2b]}. Let $\alpha>3$, an initial condition given by $(\bar n(1),0,0,0)$ and a fitness landscape given by … (3.9), $1<2b$, and $1,2b<3$ … In this case, if the edge set is $E_1$, Proposition 2.3 implies that the rescaled macroscopic population jumps along traits 1 − 2a − 3 in a time $t_1$ on the $\log K$ time scale. But if the edge set is $E_2$, the population jumps along 1 − 2a − 2b − 3 and the time to reach 3 is $t_2>t_1$. More precisely, the exponents are drawn in Figure 3. Condition (3.10) ensures that 2b invades first when the edge set is $E_2$ but not when it is $E_1$; in other words, $\beta_{2b}$ reaches 1 before $\beta_3$ if started at $1-1/\alpha$ but not at $1-2/\alpha$. Condition (3.11) enlarges the time of fixation of 3. Note that the first inequality in Condition (3.10) is not necessary but simplifies the second one. Moreover, observe that equation (2.7) implies $s_2-s_1=1/f_{2b,2a}$ and $s_2-s_1=1/f_{3,2a}$. Note that in the rare mutation regime we can observe this phenomenon on the mutation time scale, but only with probability strictly smaller than 1, since both 2b and 3 are fit with respect to 2a and can both invade with positive probability once they are produced.

Figure 3. Graph G and exponents β(t) for Example 4, with edge set $E_1$ (above) and $E_2$ (below).
In this case, Proposition 2.3 implies that the rescaled macroscopic population jumps along traits 1 − 3 − 2 (in the clockwise sense) although the mutations are directed counterclockwise. More precisely, the exponents are drawn in Figure 4. Moreover, if Conditions (3.12) are fulfilled, the period becomes shorter and shorter, i.e. an acceleration takes place, as depicted in Figure 4.
Note that in the rare mutation regime, with the chosen parameters, there would be no evolution since 2 < 1. Moreover, there are no parameters such that counter-cyclic or accelerating behaviors could arise.
3.3. Arbitrarily large jumps on the log K-time scale. A natural question is whether the "cut-off" α restricts the range of the jumps, on the $\log K$ time scale, to traits which are at a distance less than α. The answer is no, as the following example shows.

Example 6. Let us consider the graph G depicted in Figure 5, where $V=\{0,1,2,3,4\}$ and $E=\{[0,1],[1,2],[2,3],[3,4]\}$. Let $\alpha\in(3,4)$, an initial condition given by $(\bar n(0),0,\dots,0)$ and a fitness landscape given by … In this case, the cut-off lies between traits 3 and 4 (meaning that $K\mu_K^i\to0$ for $i>3$), thus the population of trait 4 vanishes at time 0. However, Proposition 2.3 implies that the rescaled macroscopic population jumps from trait 0 to trait 4 in a time … on the $\log K$ time scale. More precisely, the exponents are drawn in Figure 5. Condition (3.13) ensures that trait 4 fixates before trait 3.
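The cut-off can be made concrete. If the residents at time 0 form the set $\mathbf v_0$, a trait at graph distance $\ell$ from $\mathbf v_0$ receives mutants at a rate of order $K\mu_K^{\ell}=K^{1-\ell/\alpha}$, which suggests initial exponents $\max\big(0,1-d(\mathbf v_0,w)/\alpha\big)$. This is consistent with the values $1-1/\alpha$ and $1-2/\alpha$ appearing in Example 4, but it is our reading rather than a transcription of Theorem 2.2 (i). The sketch below computes these exponents with a multi-source breadth-first search on the directed mutation graph, treating the edges of Example 6 as directed from $i$ to $i+1$ (an assumption), and shows that trait 4 starts at exponent 0 when $\alpha\in(3,4)$. Function names are hypothetical.

```python
from collections import deque

def distances_from(sources, edges):
    """Directed graph distance d(sources, w) via multi-source BFS."""
    dist = {s: 0 for s in sources}
    queue = deque(sources)
    while queue:
        v = queue.popleft()
        for w in edges.get(v, []):
            if w not in dist:             # first visit gives the shortest distance
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

def initial_exponents(sources, edges, vertices, alpha):
    """Assumed initial exponents max(0, 1 - d(sources, w) / alpha)."""
    dist = distances_from(sources, edges)
    return {w: max(0.0, 1.0 - dist[w] / alpha) if w in dist else 0.0
            for w in vertices}

# Example 6 as a directed path 0 -> 1 -> 2 -> 3 -> 4, with alpha in (3, 4)
edges = {0: [1], 1: [2], 2: [3], 3: [4]}
beta0 = initial_exponents({0}, edges, range(5), alpha=3.5)
print(beta0)  # trait 3 has exponent 1 - 3/3.5 > 0, trait 4 has exponent 0.0
```

Breadth-first search is used because all edges have unit length, so the first visit of a vertex already gives its graph distance.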
It is easy to generalize this example to construct jumps to any distance L larger than α, by taking larger and larger fitnesses after the negative fitness region. The condition implying the emergence of trait L is then a little more technical to write, since one has to compute the time for the piecewise affine function $\beta_L(t)$ (with multiple slope breaks) to reach 1 before the other traits. Example 6 constitutes the simplest non-trivial example of this phenomenon. Example 7 is a further case where a more distant trait fixates, and two intermediate times $t_{4,1}$ and $t_{5,1}$ occur (recall the definition in (2.6)).
3.4. Effective random walk across fitness valleys. … We suppose that whenever there are several outgoing edges from a vertex v, the mutation kernel is uniform among the nearest neighboring vertices. Let $\alpha\in(0,1)$, an initial condition given by $(\bar n(0),0,\dots,0)$ and a fitness landscape given by … In this case, according to [6], the time to cross the fitness valley is of order $O(1/(K\mu_K^2))=O(K^{-1+2/\alpha})\gg\log K$, thus the first mutant of type 1a will appear on this time scale, and will invade with positive probability. Then, in a time of order $O(\log K)$, type 1b fixates, and one has to wait again a time of order $O(K^{-1+2/\alpha})$ until the appearance of the next mutant of type 0. Thus, on the time scale $O(K^{-1+2/\alpha})$, the population process converges to a jump process between the two states 0 and 1b with positive jump rates although the fitness $f_{1b,0}$ is negative. More precisely, following [6] we define …

Theorem 3.2. As $K\to\infty$, the following convergence holds …

3 effective sites.
Example 9. Let us consider the graph G depicted in Figure 7. We suppose that whenever there are several outgoing edges from a vertex v, the mutation kernel is uniform among the nearest neighboring vertices. Let $\alpha\in(0,1)$, an initial condition given by $(\bar n(0),0,\dots,0)$ and a fitness landscape given by … Thus, following [6], on the time scale $O(K^{-1+2/\alpha})$, the population process converges to a jump process between the three states $\{0,1b,2b\}$ with positive jump rates. More precisely, we have the following result.

Theorem 3.3. As $K\to\infty$, the following convergence holds … where $X_t$ is a continuous time Markov chain on $\{0,1b,2b\}$ with transition rates: …
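The limiting objects in Examples 8 and 9 are continuous time Markov chains on a few effective sites, run on the time scale $K^{-1+2/\alpha}$. Since the transition rates of Theorems 3.2 and 3.3 are not reproduced above, the sketch below uses a hypothetical generator matrix on the states $\{0,1b,2b\}$ (indexed 0, 1, 2), purely to show how such an effective random walk can be simulated and how its stationary distribution follows from $\pi Q=0$, $\sum_i\pi_i=1$. The rates are placeholders, not values derived from the model.

```python
import numpy as np

def stationary_distribution(Q):
    """Solve pi Q = 0 with sum(pi) = 1 for an irreducible generator Q."""
    Q = np.asarray(Q, dtype=float)
    n = Q.shape[0]
    A = np.vstack([Q.T, np.ones(n)])      # balance equations plus normalisation
    rhs = np.zeros(n + 1)
    rhs[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return pi

def simulate_ctmc(Q, x0, t_max, rng):
    """Jump chain with exponential holding times; returns (time, state) pairs."""
    Q = np.asarray(Q, dtype=float)
    t, x = 0.0, x0
    path = [(t, x)]
    while True:
        rate = -Q[x, x]                   # total jump rate out of state x
        if rate <= 0.0:                   # absorbing state
            return path
        t += rng.exponential(1.0 / rate)  # holding time with mean 1 / rate
        if t > t_max:
            return path
        p = np.clip(Q[x], 0.0, None) / rate   # jump probabilities to other states
        x = int(rng.choice(len(p), p=p))
        path.append((t, x))

# Hypothetical rates between the effective sites 0, 1b, 2b (indices 0, 1, 2)
Q = [[-2.0, 1.0, 1.0],
     [1.0, -1.0, 0.0],
     [1.0, 0.0, -1.0]]
pi = stationary_distribution(Q)           # approximately [1/3, 1/3, 1/3]
path = simulate_ctmc(Q, 0, t_max=10.0, rng=np.random.default_rng(0))
```

Times in `path` are on the effective time scale; multiplying them by $K^{-1+2/\alpha}$ would give original model time under the scaling stated above.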

PROOF OF THEOREM 2.2 AND PROPOSITION 2.3
This section is dedicated to the proofs of our main results. As they are technical and involve many stopping times, we begin with a rough outline of the strategy of the proof.
Throughout the proof, we define several stopping times to divide the times between invasions into sub-steps. Heuristically, they correspond to the following events:
− $\sigma^K_k$, the time when the $k$th invasion has taken place and a new equilibrium is reached;
− $\theta^K_{k,m,C}$, the first time after $\sigma^K_{k-1}$ when either the macroscopic traits stray too far from their equilibrium or at least one of the (formerly) microscopic traits becomes macroscopic (recall Definition 2.1);
− $s^K_k$, the first time after $\sigma^K_{k-1}$ when a microscopic trait becomes almost macroscopic, i.e. reaches an order of $K^{1-\varepsilon_k}$;
− $t^K_{w,k}$, the first time after $\sigma^K_{k-1}$ when trait $w$ has a positive population size ($t^K_{w,k} = \sigma^K_{k-1}$ for all traits that are alive at this time).
As in Proposition 2.3, $(\tau^K_\ell, \ell \geq 0)$ is the collection of both $(s^K_k, k \geq 0)$ and $(t^K_{w,k}, k \geq 0, w \in V)$. Figure 8 visualises the different stopping times for the case of one macroscopic and two microscopic traits.
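The following minimal Python sketch illustrates these definitions. It reads the grid analogues of $s^K_k$, of the times $t^K_{w,k}$, and of the ordered collection of times $\tau$ off exponent paths $\beta^K_w = \log(1+N^K_w)/\log K$ sampled on a time grid on the $\log K$ scale. It is purely illustrative: the sampled paths, the grid and the function names are assumptions, and the actual stopping times are defined on the random processes, not on a grid.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def first_time(times: Sequence[float], values: Sequence[float],
               level: float, strict: bool = False) -> Optional[float]:
    """First grid time at which values >= level (> level if strict)."""
    for t, v in zip(times, values):
        if v > level or (not strict and v == level):
            return t
    return None


def stopping_times(times: Sequence[float], beta: Dict[str, List[float]], eps: float
                   ) -> Tuple[Optional[float], Dict[str, Optional[float]], List[float]]:
    """Grid analogues of s_k, t_{w,k} and the ordered times tau.

    times[0] plays the role of sigma_{k-1}; beta[w][i] is the exponent of
    trait w at times[i].
    """
    # traits that are microscopic at the start of the phase
    micro = [w for w, path in beta.items() if path[0] < 1 - eps]
    hits = [first_time(times, beta[w], 1 - eps) for w in micro]
    s = min((h for h in hits if h is not None), default=None)
    # t_{w,k}: first time with a positive population (times[0] if already alive)
    alive = {w: first_time(times, path, 0.0, strict=True) for w, path in beta.items()}
    taus = sorted({h for h in [s, *alive.values()] if h is not None and h > times[0]})
    return s, alive, taus


grid = [0.0, 0.5, 1.0, 1.5, 2.0]
beta = {"0": [1.0] * 5, "a": [0.2, 0.4, 0.6, 0.8, 1.0], "b": [0.0, 0.0, 0.1, 0.3, 0.5]}
print(stopping_times(grid, beta, eps=0.1))  # (2.0, {'0': 0.0, 'a': 0.0, 'b': 1.0}, [1.0, 2.0])
```

In this toy example trait "b" appears at time 1.0, which gives one $\tau$, and trait "a" reaches $1-\varepsilon_k$ at time 2.0, which gives $s_k$.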
The proof consists of five parts:
(1) In the longest and most involved part of the proof, we study the growth dynamics of the different subpopulations in the time interval $[\tau^K_{\ell-1} \wedge T \wedge \theta^K_{k,m,C}, \tau^K_\ell \wedge T \wedge \theta^K_{k,m,C}]$, making use of several results from [10], which are restated in the Appendix and generalised when needed. Similarly to [24], we prove lower and upper bounds for $\beta^K_w(t)$ via an induction, successively taking into account incoming mutants originating from traits at increasing distance from $w$. We prove that $\beta^K_w(t)$ follows the characterisation of $\beta_w(t)$ in Theorem 2.2 up to an error of order $\varepsilon_k$ for large $K$.
(2) We construct the sets $M^K_\ell$ and calculate the value of $\tau^K_\ell - \tau^K_{\ell-1}$, proving part (ii) of Proposition 2.3.
(3) We prove that $s^K_k$ and $\theta^K_{k,m,C}$ are equal up to an error $\eta_k$ that goes to zero as $\varepsilon_k \to 0$, and conclude that $s^K_k$ converges to $s_k$ when $K \to \infty$.
(4) We prove that the stopping time $\theta^K_{k,m,C}$ is triggered by a (formerly) microscopic trait reaching order $K$, and not by the macroscopic traits deviating from their equilibrium.
(5) Knowing that we have non-vanishing population sizes at $\theta^K_{k,m,C}$, we finally consider the Lotka-Volterra phase involving $v_{k-1}$ and the trait $l^K_k$ that has newly reached order $K$, proving that the initial conditions for the next step, characterised in the definition of $\sigma^K_k$, are satisfied after a time of order 1. This concludes the proof of Theorem 2.2. Since part (i) of Proposition 2.3 is a direct corollary of Theorem 2.2, this concludes the proofs of both results.
Recall the definitions provided in Theorem 2.2 and Proposition 2.3. For a given set $v \subset V$, introduce $\tilde v$, the support of the mutation-free Lotka-Volterra equilibrium associated to $v$, that is to say $w \in \tilde v \Leftrightarrow \bar n_w(v) > 0$. As in [10], the strategy of the proof consists in performing an induction on successive phases $k$, during which the population sizes of the traits in $\tilde v_k$ are close to their equilibrium values and the population sizes of the traits in $V \setminus \tilde v_k$ are small with respect to $K$. More precisely, we will introduce a sequence of stopping times $(\sigma^K_k \log K, k \in \mathbb{N})$ (see the definition in (4.26)) satisfying the following conditions, as soon as $s_k < T$:
(1) $\sigma^K_k \to s_k$ in probability when $K$ goes to infinity;
(2) for any $0 < \varepsilon_k < 1 \wedge \inf_{w \in \tilde v_k} \bar n_w(v_k)$, with high probability
(a) for every $w \in \tilde v_k$,
For $k \geq 1$, the time interval $[\sigma^K_{k-1} \log K, \sigma^K_k \log K]$ will be divided into two parts:
− a first phase $[\sigma^K_{k-1} \log K, \theta^K_{k,m,C} \log K]$ needed for a (formerly) microscopic trait to reach a size of order $K$,
− a 'deterministic phase' $[\theta^K_{k,m,C} \log K, \sigma^K_k \log K]$ needed for the mutation-free Lotka-Volterra system associated to $\tilde v_{k-1} \cup \{l^K_k\}$ to reach a neighbourhood of its equilibrium.
• $\sigma^K_0$: define $\sigma^K_0 := T(\varepsilon_0)/\log K$. We can check that $\sigma^K_0$ is a stopping time converging in probability to $s_0 = 0$. Moreover, we know that the processes $\beta^K_w$, $w \in V$, vary on a time scale of order $\log K$ (see [7,10] for instance). In particular, they do not vary during the time $T(\varepsilon_0)$ in the large $K$ limit. This entails that $\sigma^K_0$ satisfies Assumption 1.
• $\sigma^K_k$, $k \geq 1$: assume that $s_{k-1} < T_0$ and that $\sigma^K_{k-1} \log K$ is a stopping time satisfying Assumption 1. We now construct $\sigma^K_k$.

Definitions and first properties.
Let us introduce a small $\varepsilon_k > 0$ as well as a stopping time $\theta^K_{k,m,C} \log K$ via
(4.1)
The conditions satisfied by $m > 0$ and $C > 0$ will be specified later on: $m$ is typically small (see (4.23)), and the conditions satisfied by $C$ are given in Section 4.4.
We will now study in detail the population dynamics on the time interval $[\sigma^K_{k-1} \log K, (\theta^K_{k,m,C} \wedge T) \log K]$. To this aim, we will couple the subpopulations of individuals with a given trait with branching processes with immigration, and use results on these processes derived in [10] and recalled (and generalised when needed) in the Appendices. The main difficulty of this step comes from the fact that, as we allow for any finite graph of mutations, the immigration rate into a particular subpopulation may vary a lot on this time interval. This is why we introduced in Proposition 2.3 the sequence of times $(\tau_\ell, \ell \in \mathbb{N})$, which corresponds to the times when mutants of a new type arise or a formerly microscopic trait becomes of order $K$.
Notice that although we make extensive use of the techniques and results developed in [10], the authors of that paper considered a specific graph embedded in $\mathbb{Z}$, and their proof structure, in particular their inductions, relies on this graph structure. The present inductions are more involved and closer in spirit to the proof in [24].
To begin with, let us recall the rates of the different events for the population $N^K_w$, with $w \in V$, at time $t$:
• Reproductions without mutation:
• Death:
• Reproductions with mutations towards the trait $w$:
Notice that, for $K$ large enough, as $\sigma^K_{k-1}$ satisfies Assumption 1 and by definition of $\theta^K_{k,m,C}$,
where we have introduced the following notations, for any $w \in V$ and $* \in \{-, +\}$,
Hence the rate of reproduction without mutation, as well as the death rate, do not vary significantly during the time interval $[\sigma^K_{k-1} \log K, (\theta^K_{k,m,C} \wedge T) \log K]$. The difficulty comes from the rate of mutations towards a given trait, which depends on the population sizes of its neighbours in the graph $G$, which themselves depend on the population sizes of their neighbours, and so on.
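The rate displays above are not reproduced in this section. For illustration only, the following Python sketch simulates an individual-based model of this type with a Gillespie algorithm. It assumes the standard form of the rates in this class of models:

- clonal births at rate $b_w(1-\mu_K)N^K_w$;
- deaths at rate $(d_w + \sum_x c_{w,x}N^K_x/K)N^K_w$;
- mutant births from $w$ towards each of its out-neighbours at rate $\mu_K b_w N^K_w$ divided by the number of out-neighbours (a uniform kernel, as in the examples of Section 3);

with $\mu_K = K^{-1/\alpha}$. The function name, its parameters and the numerical values are hypothetical.

```python
import random
from typing import Dict, List


def simulate_ibm(graph: Dict[str, List[str]], b: Dict[str, float], d: Dict[str, float],
                 c: Dict[str, Dict[str, float]], n0: Dict[str, int], K: float,
                 alpha: float, t_max: float, seed: int = 0) -> Dict[str, int]:
    """Gillespie simulation of the (assumed) individual-based model up to t_max."""
    rng = random.Random(seed)
    mu = K ** (-1.0 / alpha)  # mutation probability mu_K = K^(-1/alpha)
    n = dict(n0)
    t = 0.0
    while True:
        events = []  # entries: (rate, trait gaining an individual, trait losing one)
        for w, size in list(n.items()):
            if size == 0:
                continue
            out = graph.get(w, [])
            # clonal births; with no outgoing edge every birth is clonal (assumption)
            events.append((b[w] * (1 - mu if out else 1.0) * size, w, None))
            # natural death plus logistic competition with all traits
            comp = sum(c[w][x] * n[x] for x in n) / K
            events.append(((d[w] + comp) * size, None, w))
            # mutant births, uniform kernel over the out-neighbours of w
            for v in out:
                events.append((mu * b[w] * size / len(out), v, None))
        total = sum(e[0] for e in events)
        if total == 0:
            break  # the whole population is extinct
        t += rng.expovariate(total)
        if t > t_max:
            break
        _, gain, lose = rng.choices(events, weights=[e[0] for e in events])[0]
        if gain is not None:
            n[gain] = n.get(gain, 0) + 1
        if lose is not None:
            n[lose] -= 1
    return n


GRAPH = {"0": ["1"], "1": ["0", "2"], "2": ["1"]}
B = {"0": 2.0, "1": 1.5, "2": 3.0}
D = {w: 1.0 for w in GRAPH}
C = {w: {x: 1.0 for x in GRAPH} for w in GRAPH}
N0 = {"0": 100, "1": 0, "2": 0}
print(simulate_ibm(GRAPH, B, D, C, N0, K=100.0, alpha=2.0, t_max=2.0))
```

Recomputing all rates after each event keeps the sketch short. It is enough for small $K$, but large-$K$ simulations would need a more efficient event selection.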
Let us introduce the times $\tau^K_\ell$ and the sets $M^K_\ell$. They correspond respectively to the times of invasion or appearance of new mutants (these will be the time steps of the algorithm described shortly) and to the sets of living traits in the time interval $[\tau^K_\ell, \tau^K_{\ell+1}]$. More precisely, the sequences $(\tau^K_\ell, \ell \geq 0)$ and $(M^K_\ell, \ell \geq 0)$ are defined as follows:
that is to say, $\tau^K_\ell$ is the minimum between the time when a previously microscopic population becomes (almost) macroscopic and the time of appearance of a new mutant. From the definition of the sequence $(\tau^K_\ell, \ell \geq 0)$, we can now define the sequence of sets of living traits $(M^K_\ell, \ell \geq 0)$ via

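Dynamics of the process on $[\tau^K_{\ell-1} \wedge T \wedge \theta^K_{k,m,C}, \tau^K_\ell \wedge T \wedge \theta^K_{k,m,C}]$.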
We will first prove that there exists a finite and positive constant $C$ such that, with high probability, for every $w \in M^K_{\ell-1}$ and $t \in [\tau^K_{\ell-1} \wedge T \wedge \theta^K_{k,m,C}, \tau^K_\ell \wedge T \wedge \theta^K_{k,m,C}]$,
To obtain the lower bound in (4.9), we show by induction that, for any $n \geq 0$ and with high probability,
Induction lower bound:
• $n = 0$: let $w \in M^K_{\ell-1}$. From (4.5) and (4.6), we see that we can couple $N^K_w$ with a process $Z^K$ with law
Hence, from Corollary A.4, we obtain that with high probability,
Remark 6. Notice that the application of Lemma A.1 (which has been derived in [10]) would require $\beta^K_w(\tau^K_{\ell-1}) > 0$, and that this condition may not be satisfied for one of the $w \in v_{k-1}$ (the trait which becomes macroscopic at time $\tau^K_{\ell-1} \log K$). However, the population of individuals $w$ grows exponentially due to the mutations coming from another trait, and there exists a finite $c$ such that, for small $\delta > 0$, $N^K_w((\tau^K_{\ell-1} + \delta) \log K) \geq K^{c\delta}$. We could thus apply Lemma A.1 at this time, and later on let $\delta$ go to 0 to get the result. This is, in words, the statement of Corollary A.4.
• $n \to n+1$: let $w, u', u \in M^K_{\ell-1}$ be such that $d(u', w) = 1$ and $d(u, u') \leq n$. From now on, we will use the notation $\mathrm{BPI}^K$, which is defined in Section A.2. From (4.5), (4.6), and (4.7), by looking only at the immigration coming from $u'$, we see that we can couple $N^K_w$ with a process $Z^K$ with law
in such a way that
By the induction hypothesis, with high probability,
which implies that we can couple $Z^K$ with a process $Y^K$ with law
in such a way that
Hence, from Corollary A.4, even if we have to work in a time interval $[\tau^K_{\ell-1} + \delta, T]$, for a small positive $\delta$, in the spirit of Remark 6, as $w \in M^K_{\ell-1}$ we obtain that with high probability,
As this is true for any $u'$ such that $d(u', w) = 1$, and as the above bound is a decreasing function of $d(u, u')$, by taking the supremum over such $u'$ we obtain
Thus, with high probability,
which ends the induction for the lower bound.
Let us now proceed to the induction for the upper bound. We again take $t$ in $[\tau^K_{\ell-1} \wedge T \wedge \theta^K_{k,m,C}, \tau^K_\ell \wedge T \wedge \theta^K_{k,m,C}]$, and we will show that for any $n \in \mathbb{N}$ there exists a finite constant $C_{n,\ell}$ such that, with high probability,
Notice that, since $\varepsilon_k$ can be chosen small enough that $1 - (n+1)/\alpha + (n+2)\varepsilon_k < 0$ for all $n > \alpha$, all terms with $d(u, w) > \alpha$ are negative, and this equation is equivalent to the upper bound in (4.9).

Induction upper bound:
Throughout the induction for the upper bound, we will repeatedly use the following fact. The total immigration into a trait is the sum of the mutant fluxes coming from its neighbours, and it is bounded from above by the number of incoming neighbours times the largest incoming flux. More precisely, if $I_w$ is the number of incoming neighbours of $w$,
Since the trait space is finite, for $K$ large enough, we can assume that $\max_{w \in V} \log I_w / \log K \leq \varepsilon_k$.
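Write the incoming mutation fluxes into $w$ as $K^{x_u}$, for the incoming neighbours $u$ of $w$. The exponents $x_u$ are notation introduced only for this illustration. The bound just described is then the elementary chain

```latex
\sum_{u \to w} K^{x_u}
  \;\le\; I_w \max_{u \to w} K^{x_u}
  \;=\; K^{\max_{u \to w} x_u \,+\, \log I_w/\log K}
  \;\le\; K^{\max_{u \to w} x_u \,+\, \varepsilon_k},
```

where the last inequality uses $\log I_w/\log K \le \varepsilon_k$. On the exponent scale, summing finitely many fluxes therefore costs at most an additive $\varepsilon_k$.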
• $n = 0$: from (4.5), (4.6), and (4.7), we see that we can couple $N^K_w$ with a process $Z^K$ with law $\mathrm{BPI}^K\big(b(w,k,+), d(w, \tilde v_{k-1}, k, +), 0, 1 - \tfrac{1}{\alpha} + 2\varepsilon_k, \beta^K_w(\tau^K_{\ell-1})\big)$ in such a way that $N^K_w(t \log K) \leq Z^K((t - \tau^K_{\ell-1}) \log K)$. Hence, from Corollary A.4, even if we have to work in a time interval $[\tau^K_{\ell-1} + \delta, T]$, for a small positive $\delta$, in the spirit of Remark 6, as $w \in M^K_{\ell-1}$ we obtain that with high probability,
• $n \to n+1$: for $w, u' \in M^K_{\ell-1}$ such that $d(u', w) = 1$, by the induction hypothesis there exists a finite constant $C_{n,\ell}$ such that, with high probability,
From (4.5), (4.6), and (4.7), by looking at the maximal immigration coming from a neighbouring $u'$ and adding another $\varepsilon_k$ in the spirit of (4.12), we thus see that we can couple $N^K_w$ with multiple processes $Z^{K,u,u'}$ and $Z^K$ with respective laws
and
in such a way that
Hence, from Corollary A.4, even if we have to work in a time interval $[\tau^K_{\ell-1} + \delta, T]$, for a small positive $\delta$, in the spirit of Remark 6, as $w \in M^K_{\ell-1}$ we obtain that with high probability,
(4.13)
In order to simplify the right-hand side of the previous inequality, we will show that for any $\ell \in \mathbb{N}$ there exists a finite and positive constant $C_\ell$ such that, for any $(u, w) \in V^2$, with high probability,
(4.14)
Combining (4.13) and (4.14) yields that, with high probability,
which ends the induction for the upper bound.
Let us now derive inequality (4.14). It is obtained by an induction on $\ell$. If $\ell = 1$, by (2.4) and the triangle inequality,
As the convergence is in probability, this means that for $K$ large enough there exists a finite $C_{u,w}$ such that, with probability larger than $1 - \varepsilon_k$,
As there are only finitely many traits, $\sup_{u,w \in V} C_{u,w} < \infty$. Moreover, as $\varepsilon_k$ can be chosen as small as we want, and as we want to prove a convergence in probability, we may focus on the event where inequality (4.15) is satisfied. We will do so in what follows without mentioning it again, for the sake of readability. Now assume that (4.14) holds for $\ell - 1 \in \mathbb{N}$, and let us prove that it still holds for $\ell$. From the previous step on the time interval
Now let us take $u \in V$. We also deduce from the previous step that, for $K$ large enough,
In particular, there exists $\tilde u \in V$ such that $d(\tilde u, u) \leq \alpha$ and, for $K$ large enough,
Thus, for $K$ large enough,
This entails (4.14).
As $\tau^K_\ell - \tau^K_{\ell-1} \leq T$, Equation (4.9) tells us that, with high probability and with an error of order $\varepsilon_k$ which is as small as we want, the growth of traits $w \in M^K_{\ell-1}$ follows, for
To avoid repetition, we will write in the sequel to indicate approximations with high probability, with an error of order $\varepsilon_k$.
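The following minimal Python sketch shows the kind of piecewise affine update at stake here. It assumes, in line with the lower and upper bound inductions above but without reproducing the exact display, that between $\tau_{\ell-1}$ and $\tau_\ell$ each living exponent is given by the max-plus expression

$\beta_w(t) \approx \max_u \big[\beta_u(\tau_{\ell-1}) + f_{u,v_{k-1}}(t-\tau_{\ell-1}) - d(u,w)/\alpha\big]$,

where the maximum runs over the living traits $u$ from which $w$ can be reached, and the result is clamped at 0. The distance $d(u,w)$ is taken along (possibly directed) mutation edges and computed by breadth-first search. The function names and the numbers in the example are hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, List


def distances_from(graph: Dict[str, List[str]], source: str) -> Dict[str, int]:
    """Graph distance d(source, w) along directed edges, by breadth-first search."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def exponents_at(t: float, tau_prev: float, beta_prev: Dict[str, float],
                 fitness: Dict[str, float], graph: Dict[str, List[str]],
                 alpha: float, living: Iterable[str]) -> Dict[str, float]:
    """Max-plus update of the exponents of the living traits (sketch)."""
    living = list(living)
    dist = {u: distances_from(graph, u) for u in living}
    new = {}
    for w in living:
        # each living u contributes its own affine growth, minus 1/alpha per mutation step
        candidates = [beta_prev[u] + fitness[u] * (t - tau_prev) - dist[u][w] / alpha
                      for u in living if w in dist[u]]
        new[w] = max(max(candidates), 0.0)
    return new


graph = {"0": ["1"], "1": ["2"], "2": []}
beta0 = {"0": 1.0, "1": 0.5, "2": 0.0}
fit = {"0": 0.0, "1": 0.3, "2": 0.6}  # invasion fitnesses against the resident; resident has 0
print(exponents_at(1.0, 0.0, beta0, fit, graph, alpha=2.0, living=["0", "1", "2"]))
```

In the example, trait "2" is driven by its own growth (rate 0.6) rather than by the immigration from "1". This illustrates how the maximising term, and hence the slope, can switch between sources.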

4.3. Value of $\tau^K_\ell$ and construction of $M^K_\ell$.
Let us assume for the moment (this will be proven in Section 4.4) that the following holds with high probability:
Our aim now is to find the duration $\tau^K_\ell - \tau^K_{\ell-1}$ and to construct the set $M^K_\ell$ knowing the set $M^K_{\ell-1}$. To reach $\tau^K_\ell$, two events are possible: either one living non-resident trait reaches a size of order $K$, or a new mutant appears.
Let us consider the first type of event. In fact, we have to be more precise about the time when a new trait reaches a size of order $K$. This is why we defined $s^K_k$ as the time when one trait reaches a size of order $K^{1-\varepsilon_k}$. Notice that we may choose $\varepsilon_k$ small enough to ensure that this trait is the one whose exponent reaches 1 at time $s_k$ in the deterministic sequence $(s_j, j \in \mathbb{N})$ defined in Theorem 2.2 (if there exist two such traits, condition (iv)(a) is fulfilled and $T_0$ is set to $s_k$). Notice that if $f_{u,v_{k-1}} < 0$, then for any $w \in V$,
is decreasing, and thus will not reach $1 - \varepsilon_k$ if it is smaller than this value at time $\tau^K_{\ell-1}$. Hence, if we denote by $u_0$ the element of $M^K_{\ell-1}$ such that $\beta^K_{u_0}(\tau^K_\ell) = 1 - \varepsilon_k$, we get
Now assume by contradiction that there is $u_1 \neq u_0$ in $M^K_{\ell-1}$ such that:

This implies
$\ldots > 1$, as soon as $\varepsilon_k < 1/\alpha$, which yields a contradiction. This implies that, if there exists …, then with high probability the value of $\tau^K_\ell - \tau^K_{\ell-1}$ satisfies
Let us now consider the second type of event, that is to say, there exist $u_0 \notin M^K_{\ell-1}$ and $u_1 \in M^K_{\ell-1}$ such that $d(u_1, u_0) = 1$ and $\beta^K_{u_1}(\tau^K_\ell) = 1/\alpha$. Notice again that if $f_{u,v_{k-1}} < 0$, the function defined in (4.17) is decreasing, and thus will not reach $1/\alpha$ if it is smaller than this value at time $\tau^K_{\ell-1}$. By definition we have
Denote by $u_2 \in M^K_{\ell-1}$ the trait realising the maximum in the previous equation, that is to say
This equality can be rewritten as
Let us now argue by contradiction to prove that $d(u_2, u_1) + 1 = d(u_2, u_0)$. Let us thus assume that
and take $u_1'$ such that $d(u_2, u_1') + 1 = d(u_2, u_0)$.
Let us first assume (we will prove it later) that $u_1' \in M^K_{\ell-1}$. In this case, using the proof of the lower bound, we obtain that with high probability
As $d(u_1', u_0) = 1$, this means that $u_0$ becomes a living trait before time $\tau^K_\ell$, which contradicts the definition of $\tau^K_\ell$. Let us now assume that $u_1' \notin M^K_{\ell-1}$, and consider a sequence of vertices $v_0 = u_2, v_1, \ldots, v_{d(u_2,u_1')} = u_1'$ such that $d(u_2, v_k) = k$ and $d(v_k, u_1') = d(u_2, u_1') - k$. Let
and with high probability
and thus $v_{k_0+1}$ becomes a living trait before time $\tau^K_\ell$, which again contradicts the definition of $\tau^K_\ell$. We thus obtain a contradiction and deduce that (4.19) is not satisfied. We conclude that
Hence, when $\tau^K_\ell$ corresponds to the arrival of a new mutant,
(4.20)
Combining (4.18) and (4.20), we finally obtain:
To obtain $M^K_\ell$ from $M^K_{\ell-1}$, we remove the traits $w \in M^K_{\ell-1}$ such that $\beta^K_w(\tau^K_\ell) = 0$ (if condition (iv)(c) is not satisfied; otherwise $T_0$ is set to $s_k$). If moreover $\tau_\ell \neq s_k$, we add the traits which are at distance 1 from the $w \in V$ satisfying $w \in \arg\min$
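The case distinction above suggests a simple way to compute the next time step. Assume, as a simplification not taken from the missing displays, that between two consecutive times $\tau$ every exponent is the maximum over living traits $u$ of the affine terms $\beta_u(\tau_{\ell-1}) + f_{u,v_{k-1}}(t-\tau_{\ell-1}) - d(u,w)/\alpha$. Each term with positive slope then reaches a target level at an explicit time. The target is 1 for a living non-resident trait (invasion), and $1/\alpha$ for a living trait with an out-neighbour that is not yet living (appearance of a mutant). The Python sketch below computes this minimum and the corresponding update of the set of living traits. It deliberately ignores ties and the degenerate cases (iv) of Theorem 2.2, as well as the Lotka-Volterra phase after an invasion. All names and numbers are hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple


def distances_from(graph: Dict[str, List[str]], source: str) -> Dict[str, int]:
    """Graph distance d(source, w) along directed edges (breadth-first search)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def next_event(beta: Dict[str, float], fitness: Dict[str, float],
               graph: Dict[str, List[str]], alpha: float, living: Set[str],
               macroscopic: Set[str]) -> Tuple[float, Optional[str], Optional[str]]:
    """Return (delay until the next event, 'invasion' or 'mutant', trait concerned)."""
    dist = {u: distances_from(graph, u) for u in living}
    best: Tuple[float, Optional[str], Optional[str]] = (float("inf"), None, None)
    for w in living:
        targets = []
        if w not in macroscopic:
            targets.append((1.0, "invasion"))
        if any(v not in living for v in graph.get(w, [])):
            targets.append((1.0 / alpha, "mutant"))
        for level, kind in targets:
            for u in living:
                if fitness[u] <= 0 or w not in dist[u]:
                    continue  # non-increasing term, or w not reachable from u
                gap = level + dist[u][w] / alpha - beta[u]
                if gap > 0 and gap / fitness[u] < best[0]:
                    best = (gap / fitness[u], kind, w)
    return best


def update_living(living: Set[str], graph: Dict[str, List[str]],
                  beta_at_tau: Dict[str, float], kind: Optional[str],
                  trait: Optional[str]) -> Set[str]:
    """Drop traits whose exponent hit 0; add out-neighbours after a 'mutant' event."""
    kept = {w for w in living if beta_at_tau[w] > 0}
    if kind == "mutant" and trait is not None:
        kept |= set(graph.get(trait, []))
    return kept


graph = {"0": ["1"], "1": ["2"], "2": ["3"], "3": []}
beta = {"0": 1.0, "1": 0.5, "2": 0.0}
fit = {"0": 0.0, "1": 0.3, "2": 0.8}
delay, kind, trait = next_event(beta, fit, graph, 2.0, {"0", "1", "2"}, {"0"})
print(delay, kind, trait)  # trait "2" reaches 1/alpha = 0.5 first, after 0.625
```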

4.4. Value of $\theta^K_{k,m,C}$ and convergence of $s^K_k$ to $s_k$.
Recall the definition of $\theta^K_{k,m,C}$ in (4.1). We have thus constructed, on the time interval $[\sigma^K_{k-1} \log K, (\theta^K_{k,m,C} \wedge T) \log K]$, the times $(\tau^K_\ell, \ell \in \mathbb{N})$ and the sets $(M^K_\ell, \ell \in \mathbb{N})$ of living traits between times $\tau^K_\ell$ and $\tau^K_{\ell+1}$. We will now study the dynamics of the process on the time interval $[(\sigma^K_{k-1} \wedge T) \log K, (\sigma^K_k \wedge \theta^K_{k,m,C} \wedge T) \log K]$, where $\sigma^K_k$ will be defined later in order to satisfy Assumption 1. Recall that $l^K_k$ is the trait $w \in V$ such that $\beta^K_w(s^K_k) = 1 - \varepsilon_k$, and introduce
We will first prove that
$$\lim_{K \to \infty} \mathbb{P}\left(s^K_k \leq \theta^K_{k,m,C} \leq s^K_k + \eta_k \,\middle|\, s^K_k < T\right) = 1. \qquad (4.21)$$
The first step consists in showing that
By definition of $s^K_k$, we have
Moreover, applying Lemma A.5 to $v_{k-1}$, we obtain that
From the last two inequalities we deduce that, with high probability,
4.6. Construction of $\sigma^K_k$ and Assumption 1.
Let us now introduce the stopping time $\sigma^K_k$ via:
The last step of the proof consists in showing that $\sigma^K_k$ indeed satisfies Assumption 1. First, $\sigma^K_k \log K$ is a stopping time. Second, from (4.23), (4.24), (4.25) and an application of Lemma A.5, there exists $T(\varepsilon_k) < \infty$ such that
Moreover, during a time of order one, the order of the population sizes does not vary by more than a constant times $\varepsilon_k$ (a result similar in spirit to Lemma B.9 in [10]). Adding that $s^K_k$ converges to $s_k$ in probability when $K$ goes to infinity, as well as (4.21), we obtain that Assumption 1 holds. This ends the proof of Theorem 2.2 and Proposition 2.3.

PROCESSES WITH IMMIGRATION
The aim of this section is to collect various couplings of the populations with simpler processes, such as branching processes and logistic processes with immigration, and to state some properties of these simpler processes. These results have been derived in [10] (we need to slightly generalise some of them), and we restate them for the sake of readability. For simplicity, we keep the notations of [10].
A.1. Branching process. In this subsection, we recall Lemma A.1 of [10], which describes the dynamics of a birth and death process on a $\log K$ time scale. For $b, d, \beta \geq 0$, let $\mathrm{BP}^K(b, d, \beta)$ denote the law of a process $(Z^K(t), t \geq 0)$ with initial state $Z^K(0) = K^\beta - 1$, individual birth rate $b$ and individual death rate $d$.
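The following Python sketch illustrates the kind of statement Lemma A.1 of [10] makes. It simulates a process with law $\mathrm{BP}^K(b,d,\beta)$ and compares $\log(1+Z^K(t\log K))/\log K$ with the piecewise affine exponent $(\beta+(b-d)t)\vee 0$. Two assumptions are made: the limit is taken to be this exponent (the lemma itself is not reproduced in this section), and the initial state is rounded down to an integer. The function names and parameter values are hypothetical.

```python
import math
import random


def simulate_bp(K: float, b: float, d: float, beta: float, t: float, seed: int = 0) -> int:
    """Value of Z^K(t log K) for a linear birth and death process with law BP^K(b, d, beta)."""
    rng = random.Random(seed)
    z = math.floor(K ** beta - 1)  # initial state, rounded down (assumption)
    horizon = t * math.log(K)
    s = 0.0
    while z > 0:
        s += rng.expovariate((b + d) * z)  # time to the next birth or death
        if s > horizon:
            break
        z += 1 if rng.random() < b / (b + d) else -1
    return z


def limit_exponent(b: float, d: float, beta: float, t: float) -> float:
    """Deterministic exponent (beta + (b - d) t) v 0 on the log K time scale."""
    return max(beta + (b - d) * t, 0.0)


K = 1e4
z = simulate_bp(K, b=2.0, d=1.0, beta=0.5, t=0.5, seed=1)
print(math.log(1 + z) / math.log(K), limit_exponent(2.0, 1.0, 0.5, 0.5))
```

For moderate $K$ the two numbers differ by a random term of order $1/\log K$, which is why such results are stated in the limit $K \to \infty$.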
A.2. Branching process with immigration. In this subsection, we recall Lemma B.4 and Theorem B.5 of [10], illustrated in Figure B.1 therein, which describe the dynamics of birth and death processes with immigration on a $\log K$ time scale. For $b, d, \beta \geq 0$ and $a, c \in \mathbb{R}$, $\mathrm{BPI}^K(b, d, a, c, \beta)$ denotes the law of a process $(Z^K(t), t \geq 0)$ with initial state $Z^K(0) = K^\beta - 1$, individual birth rate $b$, individual death rate $d$, and immigration rate $K^c e^{as}$ at time $s \geq 0$.
In addition, in the case where $c \neq 0$ or $a \neq 0$, for all compact intervals $I \subset \mathbb{R}_+$ which do not intersect the support of $\bar\beta$, $\lim_{K \to \infty} \mathbb{P}\left(Z^K(t \log K) = 0, \ \forall t \in I\right) = 1$.
We will mostly use a corollary of these two results, which is valid without the assumption $c \leq \beta$, but only on a time interval $[\delta, T]$, for any $\delta > 0$. The idea of the proof has been explained in Remark 6.
A.3. Logistic birth and death process with immigration. We recall that for a subset $v \subset V$ of traits that can coexist at a strictly positive equilibrium in the Lotka-Volterra system (2.3), $\bar n(v) \in \mathbb{R}^v_+$ denotes this equilibrium. The next result concerns the case where all traits in $v$ have an initial population of order $K$ and the immigration of individuals with traits in $v$ is small enough. It states that the equilibrium $\bar n(v) K$ is then reached in a time of order 1, and that the populations of individuals whose traits belong to $v$ stay close to this equilibrium during a time of order larger than $\log K$. This result is a generalisation of Lemma C.1 in [10] to the multidimensional case and to (slightly) varying rates.
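The following minimal Python sketch computes $\bar n(v)$ and invasion fitnesses under two assumptions.

- (2.3) is the usual competitive Lotka-Volterra system $\dot n_w = n_w\big(b_w - d_w - \sum_{x\in v} c_{w,x} n_x\big)$, so that a strictly positive equilibrium solves the linear system $\sum_{x\in v} c_{w,x}\bar n_x = b_w - d_w$ for $w \in v$.
- The invasion fitness of a trait $u$ against $v$ is $f_{u,v} = b_u - d_u - \sum_{x\in v} c_{u,x}\bar n_x(v)$.

The sketch checks positivity of the solution but not its global stability, which Lemma A.5 assumes. All names and numbers are hypothetical.

```python
from typing import Dict, List


def solve(a: List[List[float]], y: List[float]) -> List[float]:
    """Solve a x = y by Gauss-Jordan elimination with partial pivoting."""
    n = len(y)
    m = [row[:] + [y[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular competition matrix")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for k in range(col, n + 1):
                    m[r][k] -= factor * m[col][k]
    return [m[i][n] / m[i][i] for i in range(n)]


def lv_equilibrium(traits: List[str], b: Dict[str, float], d: Dict[str, float],
                   c: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Strictly positive equilibrium of the Lotka-Volterra system on `traits`."""
    a = [[c[w][x] for x in traits] for w in traits]
    n = solve(a, [b[w] - d[w] for w in traits])
    if min(n) <= 0:
        raise ValueError("no strictly positive equilibrium with this support")
    return dict(zip(traits, n))


def invasion_fitness(u: str, eq: Dict[str, float], b: Dict[str, float],
                     d: Dict[str, float], c: Dict[str, Dict[str, float]]) -> float:
    """Growth rate of a rare trait u in the population at equilibrium `eq`."""
    return b[u] - d[u] - sum(c[u][x] * n_x for x, n_x in eq.items())


b = {"0": 3.0, "1": 3.0, "2": 4.0}
d = {"0": 1.0, "1": 1.0, "2": 1.0}
c = {"0": {"0": 1.0, "1": 0.5}, "1": {"0": 0.5, "1": 1.0}, "2": {"0": 1.0, "1": 1.0}}
eq = lv_equilibrium(["0", "1"], b, d, c)  # both coordinates are (up to rounding) 4/3
print(eq, invasion_fitness("2", eq, b, d, c))  # fitness of trait 2 is 3 - 8/3 = 1/3
```

By construction, resident traits have invasion fitness 0 against their own equilibrium, which gives a quick sanity check of the computation.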
We denote by $\mathrm{LBDI}^K(b_v, d_v, c_v, g_v)$ the law of a logistic birth and death process with immigration $Z^K := ((Z^K_w(t), w \in v), t \geq 0)$ where, at time $t$, an individual with trait $w \in v$ has birth rate $b_w(t)$ and death rate $d_w(t) + \sum_{x \in v} c_{w,x}(t) Z^K_x(t)/K$, and individuals with trait $w$ immigrate at rate $g_w(t)$.
Lemma A.5. Let $T > 0$ and $v \subset V$, and assume that the mutation-free Lotka-Volterra system (2.3) associated to $v$ and with rates $(\bar b_v, \bar d_v, \bar c_v) \in (\mathbb{R}^*_+)^v \times (\mathbb{R}^*_+)^v \times (\mathbb{R}^*_+)^{v^2}$ admits a unique positive globally attractive stable equilibrium $(\bar n_w(v))_{w \in v}$. Assume that $Z^K$ follows the law
and $g_w(t) \leq K^{1-\eta}$ for all $t \in [0, T \log K]$ and $w \in v$, for some $\varepsilon, \eta > 0$.
Proof. The case where the functions $b_v, d_v, c_v$ are constant is a direct generalisation of Lemma C.1 in [10]. Its proof follows arguments similar to the ones given in [7,9], or in Proposition 4.2 in [8] to handle the addition of (negligible) immigration, and we do not provide it. Let us explain how we deal with varying rates for point (i). Let us choose $w_0 \in v$, and introduce for $w_1, w_2 \in v$:
Then we can couple a process $Z^K$ with the law $\mathrm{LBDI}^K(b_v, d_v, c_v, g_v)$ with a process $\tilde Z^K$ with the law $\mathrm{LBDI}^K(\tilde b_v, \tilde d_v, \tilde c_v, g_v)$ such that for every $t \geq 0$, $\tilde Z^K_{w_0}(t) \leq Z^K_{w_0}(t)$ and $\tilde Z^K_w(t) \geq Z^K_w(t)$ for every $w \in v \setminus \{w_0\}$. Moreover, the equilibrium of a Lotka-Volterra system is continuous with respect to its coefficients. Hence there is a positive $\tilde C$ such that, for $\varepsilon$ small enough, $\|\tilde n^{(w_0)}(v) - \bar n(v)\| \leq \tilde C \varepsilon$, where $\tilde n^{(w_0)}(v)$ denotes the equilibrium of the Lotka-Volterra system with the coefficients $\tilde b_v, \tilde d_v, \tilde c_v$ we have just introduced. Applying point (i) to the process $\tilde Z^K$, we then obtain upper bounds for the coordinates $w \neq w_0$ and a lower bound for the coordinate $w_0$ of the process $Z^K$. Doing the same, and deriving the reverse bounds, for the other elements of $v$ gives the result for some $C > \tilde C$ that takes into account the fluctuations around the perturbed equilibria.
We end this section with a result stating that, for $K$ large enough, the time needed for the total population size of a logistic birth and death process (with or without mutations) to become, and then remain, at most of order $K$ is of order one.
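Corollary A.6. Let us consider a subset $v \subset V$ of traits and $(b_v, d_v, c_v) \in (\mathbb{R}^*_+)^v \times (\mathbb{R}^*_+)^v \times (\mathbb{R}^*_+)^{v^2}$. Let $Z^K$ follow the law $\mathrm{LBDI}^K(b_v, d_v, c_v, 0)$, and let $\bar Z^K$ denote the total population size of the process $Z^K$. For every $\varepsilon > 0$ there exists $T(\varepsilon) < \infty$ such that, for $t > T(\varepsilon)$,
$$\lim_{K \to \infty} \mathbb{P}\left(\frac{\log(1 + \bar Z^K(t))}{\log K} < 1 + \varepsilon\right) = 1.$$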