Quasi-equilibria and click times for a variant of Muller’s ratchet

Consider a population of $N$ individuals, each of them carrying a type in $\mathbb{N}_0$. The population evolves according to a Moran dynamics with selection and mutation, where an individual of type $k$ has the same selective advantage over all individuals with type $k' > k$, and type $k$ mutates to type $k + 1$ at a constant rate. This model is thus a variation of the classical Muller’s ratchet: there the selective advantage is proportional to $k' - k$. For a regime of selection strength and mutation rates which is between the regimes of weak and strong selection/mutation, we obtain the asymptotic rate of the click times of the ratchet (i.e. the times at which the hitherto minimal (‘best’) type in the population is lost), and reveal the quasi-stationary type frequency profile between clicks. The large population limit of this profile is characterized as the normalized attractor of a “dual” hierarchical multitype logistic system, and also via the distribution of the final minimal displacement in a branching random walk with one-sided steps. An important role in the proofs is played by a graphical representation of the model, both forward and backward in time, and a central tool is the ancestral selection graph decorated by mutations.


Introduction
A well-known model in population genetics named Muller's ratchet (cf. [22,28,9,26,21] and references therein) considers, in its bare bones version, the interplay between selection and stepwise (slightly) deleterious mutation in the absence of recombination, for a population of constant size $N \gg 1$. Briefly put, selection decreases the expected reproductive success of those individuals that have a higher mutational load. Since the mutational load is inherited, selection tends to decrease the overall mutational load in the population. Because of the assumed unidirectional mutation and randomness in the reproduction, the currently lightest load eventually disappears from the population. This is phrased by H.J. Muller in his pioneering paper [22] as follows: "... an irreversible ratchet mechanism exists in the non-recombining species ... that prevents selection, even if intensified, from reducing the mutational loads below the lightest that were in existence when the intensified selection started, whereas, contrariwise, 'drift', and what might be called 'selective noise' must allow occasional slips of the lightest loads in the direction of increased weight." Thanks to recombination, however, sexual organisms are able to avoid such an accumulation of deleterious mutations. This could be one of the explanations of the ubiquity of sexual reproduction among eukaryotes despite its many costs [11].
We will now give a short description of the model and the main questions addressed in this paper. A more detailed presentation of the model, a formulation of the main results and a preview of the proof strategy will be given in Section 2.
Each individual carries, as its current type, a number $\kappa$ of deleterious mutations. The number of mutations along each lineage increases by 1 at rate $m_N$, and as soon as an individual reproduces, its current type is passed on to its 'daughter'. Reproduction happens according to a Moran dynamics with selection, where the fitness difference between two individuals of type $\kappa$ and $\kappa'$ is $\frac{s_N}{N}\Phi(\kappa' - \kappa)$ for a selection parameter $s_N$ and a non-decreasing antisymmetric function $\Phi : \mathbb{Z} \to \mathbb{R}$. (For the classical variant of Muller's ratchet, $\Phi$ is the identity function on $\mathbb{Z}$, which means that the effect of selection is proportional to the difference of mutational loads.) The individual-based dynamics in this model, which we briefly call the $\Phi$-ratchet, arises as an independent superposition of the following three ingredients:

• Moran resampling. For each pair of individuals $(i, j)$, irrespective of their types, $j$ is replaced by a newborn daughter of individual $i$ at rate $\frac{1}{2N}$.
• Selective reproduction. For each pair of individuals $(i, j)$ of types $\kappa$ and $\kappa'$, individual $j$ is replaced by a newborn daughter of individual $i$ at rate $\frac{s_N}{N}\Phi(\kappa' - \kappa)\mathbf{1}_{\{\kappa' > \kappa\}}$.
• Stepwise mutation. For each individual, its type is increased by 1 at rate $m_N$.

This dynamics leads to a sequence of times at which the currently lowest (and thus selectively 'best') type in the population disappears. These times will be referred to as 'click times of the ratchet'. In certain regimes of the parameters $s_N$ and $m_N$, as $N$ becomes large the click times happen rarely and a quasi-stationary type frequency profile builds up between the click times. The following questions thus call for an answer:

A. What is the rate (of the click times) of the ratchet?
B. What is the quasi-stationary type frequency profile?
For the classical variant of Muller's ratchet, a fully rigorous asymptotic analysis of these problems beyond existence results is a notoriously difficult and in large parts still unsolved task, see e.g. [19].
We propose a variant of the fitness function $\Phi$ which leads to a model that turns out to be tractable by modern probabilistic techniques, allowing for quantitative results for the rate of the ratchet and the quasi-stationary type profile. This specific choice, denoted by $\varphi$, is

$$\varphi(\kappa' - \kappa) := \mathbf{1}_{\{\kappa' - \kappa > 0\}} - \mathbf{1}_{\{\kappa' - \kappa < 0\}}. \qquad (1.1)$$

The essential difference to the classical variant of Muller's ratchet thus is that the selective advantage does not depend on the value of the difference of $\kappa'$ and $\kappa$, but only on the sign of this difference. This corresponds to replacing proportional selection by binary tournament selection [4,25], whose effect may be imagined as due to pairwise fights between randomly chosen individuals, where the individual of 'better' type outcompetes that of worse type. From a biological point of view, this provides a clear-cut and reasonable alternative to the classical variant. From a mathematical perspective, the tradeoff between the complexity and the tractability of the model is attractive. The present work analyses a subcritical regime in which the mutation-selection ratio $m_N/s_N$ remains constant and below 1. The near-critical regime, in which $m_N/s_N \uparrow 1$, presents additional technical challenges. We have started to study this regime and some interesting links to the classical variant of Muller's ratchet in [16].
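The three ingredients of the individual-based dynamics, with the tournament fitness function (1.1), can be illustrated by a small event-driven simulation. The following is only a sketch under stated assumptions: the function name `simulate_phi_ratchet`, the arguments `s` and `m` (standing for $s_N$ and $m_N$) and the thinning scheme (proposing ordered pairs at rate $\frac{1}{2N} + \frac{s_N}{N}$ and discarding selective proposals that do not act) are illustrative choices, not taken from the paper.

```python
import random


def simulate_phi_ratchet(N, s, m, t_end, seed=0):
    """Event-driven simulation of the ratchet with tournament selection (1.1).

    Returns the final list of types and the click times in [0, t_end].
    """
    rng = random.Random(seed)
    types = [0] * N          # all individuals start with type 0
    best, n_best = 0, N      # current minimal type and size of its class
    clicks = []

    # Each ordered pair (i, j) is proposed at rate 1/(2N) + s/N. Pairs with
    # i == j are allowed because they leave the configuration unchanged.
    pair_rate = N * N * (0.5 + s) / N
    mut_rate = N * m
    total_rate = pair_rate + mut_rate
    p_neutral = 0.5 / (0.5 + s)

    t = 0.0
    while True:
        t += rng.expovariate(total_rate)
        if t > t_end:
            break
        if rng.random() < mut_rate / total_rate:
            # Stepwise mutation: the type of individual j increases by 1.
            j = rng.randrange(N)
            old, new = types[j], types[j] + 1
        else:
            i, j = rng.randrange(N), rng.randrange(N)
            old, new = types[j], types[i]
            neutral = rng.random() < p_neutral
            # A selective arrow acts only if i has strictly fewer mutations.
            if not neutral and not new < old:
                continue
        if new == old:
            continue
        types[j] = new
        n_best += (new == best) - (old == best)
        if n_best == 0:
            # The best class is lost: this is a click of the ratchet.
            clicks.append(t)
            best = min(types)
            n_best = types.count(best)
    return types, clicks
```

For instance, `simulate_phi_ratchet(N=200, s=0.1, m=0.05, t_end=500.0)` returns the type configuration at time 500 and the click times observed up to then. Since a click may raise the minimal type by more than one, the number of clicks is at most the final minimal type.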
To obtain quantitative results for the click rates and the quasi-stationary type profile, we will throughout the paper consider the case of moderate selection and mutation where $\mu < \alpha$, $f(N) \to \infty$ and $f(N) = o\left(\frac{N}{\ln\ln N}\right)$ as $N \to \infty$. In particular, this implies that $m_N \to 0$, $N m_N \to \infty$, and $m_N$ and $s_N$ are of the same order.

Theorem 2.2 (Asymptotic rate of clicks). Assume that all individuals at time 0 are of type 0, i.e. $\xi^{(N)}$ such that the sequence of time-rescaled click time processes converges in distribution as $N \to \infty$ to a rate 1 Poisson counting process (when restricted to $[0, T]$ for each $T > 0$).

In particular, for the case of nearly strong selection $s_N = 1/l(N)$, where $l(N)$ is any slowly varying function that converges to infinity with $N$, Theorem 2.2 says that the expected time between clicks is only slightly smaller than exponential in $N$. In contrast to this, for nearly weak selection with $(\ln\ln N)/N \ll s_N \ll (\ln N)/N$, Theorem 2.2 says that the timescale $f(N)\theta_N$ of clicks is asymptotically only slightly larger than the evolutionary timescale $N$.

(2.8) Then the empirical type frequency profile defined in (2.5) obeys for all $k \in \mathbb{N}_0$ where $(p_k)_{k\in\mathbb{N}_0}$ is a sequence of probability weights given by the recursion (2.9)

b) The recursion (2.9) is equivalent to the (mutation-selection equilibrium) system with the boundary conditions $p_{-1} = 0$, $p_0 > 0$,

c) Two alternative probabilistic descriptions of $(p_k)_{k\in\mathbb{N}_0}$ given by (2.9) are as follows:
• Consider a Yule tree with splitting rate $\alpha$ whose branches are decorated by a rate $\mu$ Poisson point process. Then $p_k$ is the probability that there is an infinite lineage carrying exactly $k$ points but no infinite lineage with fewer than $k$ points.
• Consider a branching random walk on $\mathbb{N}_0$ starting with one individual at the origin with binary branching at rate $\alpha$ (and no death) and with migration of individuals from $k$ to $k + 1$ at rate $\mu$. Then, as $t \to \infty$, the minimal position of the individuals alive at time $t$ converges in law to a random variable with distribution $(p_k)_{k\in\mathbb{N}_0}$.
d) With $\rho := \mu/\alpha$, the tails of the probability weights $(p_k)$ given by (2.9) are represented by the iterations where (2.12)

e) Let $(p_k)_{k\in\mathbb{N}_0}$ be the probability weights given by (2.9). Then

Remark 2.4. a) Eq. (2.10) characterizes the type frequency profile $(p_k)_{k\in\mathbb{N}_0}$ as the fixed point of a deterministic mutation-selection equilibrium, with the out-flux due to mutation on its right hand side and the in-flux due to selection on its left hand side. The latter can be written as [14]) the solution of (2.10) would be the Poisson weights with parameter $\mu/\alpha$.

b) An essential advantage of the form (1.1) of the fitness function is that it opens the way to a mathematically tractable analysis of the probabilistic system via a dual process within a graphical representation. The latter will be developed in Sections 3 and 4, and the dual process will appear as a hierarchy of competing logistic populations in Sections 5 and 6. As an appetizer and short preview, let us point to the duality between the size of the initially best class, and the process $Z^{(N)}_0$ whose jump rates are given by (6.1). According to (2.1)-(2.3), the process $Y^{(N)}_0$ has the jump rates . This constitutes the ground level ($k = 0$) in a duality between the type frequency process $(N\xi^{(N)}_k)_{k\in\mathbb{N}}$ and a "hierarchy of logistic competitions" $(Z^{(N)}_k)_{k\in\mathbb{N}}$. The latter will be introduced in Section 6, and the duality will appear in the "grand graphical picture" developed in Sections 3-5.
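The branching random walk description in Theorem 2.3 c) lends itself to a quick Monte Carlo illustration. The sketch below simulates the walk up to a finite time `t_end` and records the minimal occupied position. The function names and the finite time horizon are assumptions made for illustration, so the output only approximates the limiting weights $(p_k)$. As a sanity check, the identities $\bar n_0 = 2(\alpha - \mu)$ and $p_k = \bar n_k/(2\alpha)$ stated in the section on the hierarchy of logistic competitions give $p_0 = 1 - \mu/\alpha$.

```python
import random


def minimal_position(alpha, mu, t_end, rng):
    """Minimal occupied position at time t_end of a branching random walk
    with binary branching at rate alpha and steps k -> k+1 at rate mu."""
    counts = [1]   # counts[k] = number of particles at position k
    n = 1          # total number of particles
    t = 0.0
    while True:
        t += rng.expovariate((alpha + mu) * n)
        if t > t_end:
            break
        # Pick a particle uniformly at random and locate its position.
        r = rng.randrange(n)
        k = 0
        while r >= counts[k]:
            r -= counts[k]
            k += 1
        if rng.random() < alpha / (alpha + mu):
            counts[k] += 1          # branching: one more particle at k
            n += 1
        else:
            counts[k] -= 1          # migration from k to k+1
            if k + 1 == len(counts):
                counts.append(0)
            counts[k + 1] += 1
    return next(k for k, c in enumerate(counts) if c > 0)


def empirical_profile(alpha, mu, t_end, runs, seed=0):
    """Empirical distribution of the minimal position over independent runs."""
    rng = random.Random(seed)
    freq = {}
    for _ in range(runs):
        k = minimal_position(alpha, mu, t_end, rng)
        freq[k] = freq.get(k, 0) + 1
    return {k: c / runs for k, c in sorted(freq.items())}
```

With `alpha=1.0` and `mu=0.5`, the estimated weight of position 0 should be close to $1 - \mu/\alpha = 0.5$ once `t_end` is moderately large. The particle number grows like $e^{\alpha t}$, so `t_end` should be kept small.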
We end this section by a description of the proof strategy of Theorems 2.2 and 2.3, with some of the ideas illustrated by Figure 1.
Figure 1: This cartoon reflects the strategy for proving Theorems 2.2 and 2.3. The time arrow for the evolution of the Ancestral Selection Graph points from right to left. The "minimal load ASG" is symbolized by the grey band amidst the total population that is confined by the two solid horizontal lines. ASGs started from the entire population merge on the scale $f(N) \ln N$ into the minimum load ASG. Along the latter, clicks (depicted by thick circles) happen asymptotically at (the much slower) scale $\theta_N f(N)$, and these can be coupled locally on that scale with the clicks forward in time. Most of the variability in the number of mutations among contemporaneous individuals comes from mutations acquired in their recent past. These recent mutations can be studied via duality by means of a Yule process approximation on the scale $f(N)$, leading to the quasi-stationary type frequency profile $(p_k)_{k\in\mathbb{N}_0}$.
Think of some late time $u = u_N$ being fixed, located at the far right of Figure 1, and denote the population living at this time by $P_u =: P$. For $t < u$, the Ancestral Selection Graph (ASG) $A_t = A^P_t$ consists of all the potential ancestors of $P$ that live at time $t$, see Definition 4.2. Each individual in $A_t$ has a certain "mutational distance" (M-distance, see Definition 4.3) from $P$. Those individuals in $A_t$ that have the smallest M-distance from $P$ among their contemporaries make up the minimum load ASG $\bar A_t$, see Definition 4.6.
• Every once in a while, as $t$ decreases (i.e. wanders to the left in Figure 1), the M-distance between $\bar A_t$ and $P$ increases by one. These backward click times (see Definition 8.1 and Proposition 8.2) turn out to be asymptotically Poisson as $N \to \infty$, with the rate appearing in Theorem 2.2. A key step in proving this is the insight that, between backward click times, the size of $\bar A_{u-r}$, $r \ge 0$, is an autonomous Markov process with the "logistic" jump rates (6.1). As $N \to \infty$, the expected time to extinction out of the quasi-equilibrium of this Markov process is given by (2.7) up to logarithmic equivalence, see Lemma 6.4.
• Another key result is Proposition 7.1. This ensures in particular that the "load zero" ASG of the entire population $P_T$ that lives at some time $T < u$ (i.e. that part of the ASG which has M-distance zero from $P_T$) merges quickly with the minimum load ASG $\bar A$. This helps to show that backward and forward click times are asymptotically close on a suitable scale, and to complete the proof of Theorem 2.2 in Section 9.
It remains to explain our strategy of proving Theorem 2.3. For this, let us have a look at the random number (say, $K$) of deleterious mutations which an individual sampled uniformly at random at time $t_N$ has acquired in its recent past, i.e. before the ASG of the sampled individual starts to exhibit coalescences and to interact with the ASGs of other individuals sampled uniformly at random at time $t_N$.
• During this time span the ASG of the sampled individual looks like a Yule tree with splitting rate $s_N$. We are thus facing a question of 'first passage percolation' along a Yule tree that is decorated with a Poisson point process of intensity $m_N$. This question, which is of independent interest (see Remark 11.2), is the subject of Proposition 11.1. Here, the probability weights $(p_k)_{k\in\mathbb{N}_0}$ appear in a natural manner as the distribution of the above described random variable $K$.
• The link to identify them as the quasi-stationary type frequency profile appearing in Theorem 2.3 is the insight that the ASG of the sampled individual, as soon as it has become large, contains, with high probability as N → ∞, an individual of the currently best class.This handshake between the backward and the forward point of view is carried out in Section 12, thus completing the proof of the main part of Theorem 2.3.
Remark 2.5. Another route of proving Theorem 2.3a), which we only sketch briefly here, builds on the time reversibility of the so-called equilibrium Ancestral Selection Graph which was discovered by Pokalyuk and Pfaffelhuber in [27]. This carries over to the time reversibility of the equilibrium ASG decorated with the Poisson process of mutations. As described in Section 6, the (joint) center of attraction of the sizes of the load $k$-ASGs is asymptotically given by $(2N s_N p_k)_{k\in\mathbb{N}_0}$. By time reversibility, this backward-in-time concept turns into a statistic of the M-distances between the equilibrium ASGs at times 0 and $t_N$, showing that the type frequency profile within the equilibrium ASG at time $t_N$ is asymptotically given by $(p_k)_{k\in\mathbb{N}_0}$. This, however, is representative of the entire population at time $t_N$, since the equilibrium ASG at time $t_N$ is a random sample that is "measurable from the future".

Graphical representation of the model
The type frequency process $\xi^{(N)}$ of the $\varphi$-ratchet, which was introduced at the beginning of Section 2, can be constructed (in a similar way as in [13,8,12]) on top of a Moran graph with selection parameter $s_N$, with mutations added by means of an independent Poisson process.

Definition 3.1 (Graphical elements).
For fixed $N \in \mathbb{N}$, we consider three independent Poisson point processes, $C^{(N)}$, $S^{(N)}$ and $M^{(N)}$. The processes $C^{(N)}$ and $S^{(N)}$ are supported by {(i, j Here and below we use the abbreviation $[N] := \{1, 2, \dots, N\}$.

Remark 3.2. When there is no risk of confusion, we will suppress the index $N$ and write $G$, $C$, $S$, $M$. We will speak of the points $(i, t) \in G$, $i \in [N]$, as the individuals living at time $t$. Each point $(i, j, t) \in C \cup S$ can be visualized as an arrow pointing from line $i$ to line $j$ at time $t$. At an $(i, j, t) \in C$, the individual $(i, t)$, irrespective of its type, bears a daughter $(j, t)$ who replaces the individual $(j, t-)$. At an $(i, j, t) \in S$, the same happens, but only provided the individual $(i, t)$ carries fewer mutations than the individual $(j, t-)$. The process $M$ describes the mutations occurring along the lines; each point of $M$ increases the mutational load along the lineage by 1. This is made precise in Definition 3.3, and illustrated in Figure 2.

being given at some time $t_0$. At times $t > t_0$, the three Poisson point processes $C$, $S$ and $M$ act as follows:
• if the point $(i, j, t)$ belongs to $C$, then $\eta(j, t) = \eta(i, t-)$ ($= \eta(i, t)$ a.s.)

In the next section and thereafter we will use the Poisson point processes $(C, S, M)$ also for a transport (of potential ancestral paths and mutational loads) backwards in time. In order to clearly distinguish between forward and backward concepts, we define two filtrations generated by $(C, S, M)$.

and let $M_{\le t}$ be the restriction of $F_t$ and $P_t$ be the σ-algebras generated by $C_{\le t}$, $S_{\le t}$, $M_{\le t}$ and $C_{\ge t}$, $S_{\ge t}$, $M_{\ge t}$, respectively. The forward filtration is $F := (F_t)_{t\ge 0}$ and the backward filtration is $P := (P_t)_{t\in\mathbb{R}}$; note that $P_t$ increases as $t$ decreases. A random time $T$ is called a $P$-stopping time if $\{T \ge t\} \in P_t$ for all $t \in \mathbb{R}$. The σ-algebra $P_T$ consists of all those sets $E \in \sigma\big(\bigcup_{t\in\mathbb{R}} P_t\big)$ for which $E \cap \{T \ge t\} \in P_t$ for all $t \in \mathbb{R}$. All these objects are understood for fixed population size $N$; sometimes we will write $P^{(N)}$ and $F^{(N)}$ instead of $P$ and $F$, to make the dependence on $N$ explicit.

Remark 3.5. With $t_0 := 0$ and $\eta(i, 0) := 0$, $i \in [N]$, and with $\eta(\cdot, t)$, $t \ge 0$, constructed according to Definition 3.3, the process of type frequency evolutions that figures in Theorems 2.2 and 2.3 can now be represented as the $F$-adapted process Indeed this process has the jump rates given in (2.1), (2.2), (2.3). In terms of $\eta$, the best type at time $t$ (defined in (2.4)) has the representation

Potential ancestral paths and their loads
While the graphical representation given in the previous section was a forward in time construction, we now take a backward in time point of view. This is based on the concept of potential ancestral lineages which goes back to the pioneering work of Krone and Neuhauser [18,23]. The key idea is to construct in a first stage an untyped version of the (potential) genealogy backwards in time and decide in a second stage forwards in time which lineages become "real". Specifically, a "selective arrow" $(i, j, t) \in S$ introduces the two potential parents $(i, t-)$ and $(j, t-)$ of the individual $(j, t)$. Thus, a potential ancestral lineage backwards in time should jump from $(j, t)$ to $(i, t-)$ as soon as it encounters the head $j$ of a "neutral arrow" $(i, j, t) \in C$, and should branch into two selective lineages as soon as it encounters the head $j$ of a "selective arrow" $(i, j, t) \in S$.
We will formalize this by the concept of (potential ancestral) paths.

Definition 4.1 (Paths and potential ancestors). Let $(i, t_0), (j, t) \in G$ with $t_0 < t$. A (potential ancestral) path connecting $(i, t_0)$ and $(j, t)$ is a subset of $G$ of the form with $n \in \mathbb{N}$ and the following properties We write $(i, t_0) \prec (j, t)$ if there is a path connecting $(i, t_0)$ and $(j, t)$. In this case we say that $(i, t_0)$ is a potential ancestor of $(j, t)$.
In words, the conditions mean that jumps between different levels $h, h' \in [N]$ may only occur at time points of either neutral or selective arrows, and that none of the time intervals $(t_{g-1}, t_g)$, $g = 1, \dots, n$, may be hit by a neutral arrow whose arrow-head is at $i_g$.
As a consequence of this definition we observe:
• If the point $(i, j, t)$ belongs to $C$, the point $(j, t)$ is disconnected from $(j, t-)$ and connected with $(i, t-)$.
• If the point $(i, j, t)$ belongs to $S$, the point $(j, t)$ is connected both with $(i, t-)$ and $(j, t-)$.

Definition 4.2 (Ancestral selection graph (ASG)). For $t \in \mathbb{R}$ and $J_t \subset [N] \times \{t\}$ we define, suppressing the index $N$, for $r \ge 0$, Thinking of $A^{J_t}$ as a union of paths jointly with the graphical elements from $C$ and $S$ by which it was induced, we call $A^{J_t}$ the ASG back from $J_t$. For a singleton $J_t = \{(j, t)\}$ we write $A^{j,t}_{t-r}$ instead of $A^{\{(j,t)\}}_{t-r}$, and for $J_t = [N] \times \{t\}$ we briefly write $A^{t}_{t-r}$ instead of $A^{[N]\times\{t\}}_{t-r}$, and $A^{t}$ instead of $A^{[N]\times\{t\}}$.

Definition 4.3 (Load and M-distance). (i)
The load of a path is the number of points of M carried by the path.
(ii) The M-distance $d_M((i, t_0), (j, t))$ of two points $(i, t_0), (j, t) \in G$ with $t_0 < t$ is the minimal load of all paths connecting them, with the convention that the minimum over an empty set is infinity. We say that (i,

Remark 4.4. For any three points $(i, t_0), (j, t), (g, u) \in G$ with $t_0 < t < u$ one has $d_M((i, t_0), (g, u)) \le d_M((i, t_0), (j, t)) + d_M((j, t), (g, u))$.
To see this we may assume w.l.o.g. that $(i, t_0) \prec (j, t)$ and $(j, t) \prec (g, u)$ (since otherwise the r.h.s. of the inequality is infinite). Then the concatenation of a path of minimal load connecting $(i, t_0)$ and $(j, t)$ with a path of minimal load connecting $(j, t)$ and $(g, u)$ is a path connecting $(i, t_0)$ and $(g, u)$; hence $d_M$ obeys the claimed triangle inequality.
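The connection rules for neutral and selective arrows, together with the definition of the M-distance, suggest a simple backward recursion for computing the M-distance from every line at an earlier time to a target set at a later time. The following sketch assumes a hypothetical encoding of the graphical elements as tuples `("C", i, j, t)`, `("S", i, j, t)` and `("M", i, t)` with lines numbered from 0 and `i != j` for arrows. This encoding and the function name `m_distances` are illustrative and do not come from the paper.

```python
import math


def m_distances(n_lines, events, targets):
    """Backward pass through the graphical elements.

    events  : iterable of ("C", i, j, t), ("S", i, j, t) or ("M", i, t),
              all with times between the start time and the target time
    targets : set of lines forming the target set at the later time
    Returns d, where d[i] is the minimal load of a path from line i at the
    start time to the target set (math.inf if there is no such path).
    """
    d = [math.inf] * n_lines
    for j in targets:
        d[j] = 0
    # Process the events from the target time backwards.
    for ev in sorted(events, key=lambda e: e[-1], reverse=True):
        if ev[0] == "M":
            _, i, _t = ev
            d[i] += 1                    # one more mutation on line i
        elif ev[0] == "C":
            _, i, j, _t = ev
            # (j, t) is connected with (i, t-) and cut off from (j, t-).
            d[i] = min(d[i], d[j])
            d[j] = math.inf
        elif ev[0] == "S":
            _, i, j, _t = ev
            # (j, t) is connected with both (i, t-) and (j, t-).
            d[i] = min(d[i], d[j])
    return d
```

As a small usage example with three lines and target set `{0}`, the events `[("C", 2, 1, 0.5), ("M", 1, 1.0), ("S", 1, 0, 2.0)]` give `[0, inf, 1]`: line 0 is its own ancestor with load 0, line 1 is cut off by the neutral arrow, and line 2 reaches the target through line 1 while picking up one mutation.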
Figure 3: This figure contains the same graphical elements as Figure 2, but now the paths are followed backwards. Let us think of the left hand side of each of the four panels corresponding to time $t_0$ and the right hand side corresponding to $t > t_0$. At time $t_0$, the M-distance between the set $\{1, 2\} \times \{t\}$ and its potential ancestors at time s is annotated. A comparison with Figure 2 shows differences and similarities between the backward and forward transport.
The graphical construction entails that, for $t_0 < t$, the type configuration $\eta(\cdot, t)$ is a function of $\eta(\cdot, t_0)$ and the graphical elements between times $t_0$ and $t$: types are transported forward in time, and whenever there is a "selective encounter" between two ancestral paths of $(j, t)$, the "better type" is passed on. Specifically, this results in

Remark 4.5 (Flow of type configurations). Let $\eta = \eta^{(N)}$ be as specified in Definition 3.3. Then for any $j \in [N]$, $0 \le t_0 < t$, one has a.s. and is the set of individuals $(i, t - r)$ which are load $k$ potential ancestors of some individual in $J_t$ (cf. Definition 4.3). Taking the union over $r \ge 0$ we define $A^{J_t}(k)$ as the set of all individuals which are load $k$ potential ancestors of some individual in $J_t$.

b) Minimum load potential ancestors. We define the set of minimum load potential ancestors at time $t - r$ of the population $J_t$ as where To ease notation we write $\bar A^{t}_{t-r}$ instead of , and $\bar A^{i,t}_{t-r}$ instead of

c) The definitions in a) and b) extend directly from deterministic $t$ and $J_t$ to a $P$-stopping time $T$ and a $P_T$-measurable random set $J_T \subset [N] \times \{T\}$.

Percolation of loads along the Ancestral Selection Graph
In this section we fix a population size $N \in \mathbb{N}$ which we suppress in the notation. For $r \ge 0$ and $k \in \mathbb{N}_0$ we define H,
• Selective branching:

Due to the symmetry properties of the dynamics (backwards in time) that is induced by the just described transitions, we may focus our attention on the configuration of cardinalities of the sets $A^{J_T}_{T-r}(k)$, and define The following lemma is a consequence of the above described actions of the Poisson point processes $(C, S, M)$ on the sets $A^{J_T}_{T-r}(k)$.

Lemma 5.1. For $T$ and $J_T$ as in Definition 4.6 c), the process (A is Markovian when randomized over $(C, S, M)$. Its state space is the set (5.1) Its jump rates from $z \in Z_N$ are (with $e_k$ as in Section 2 and $s_N$, $m_N$ as in (2.6))
• Selective branching: for any $k \in \mathbb{N}_0$, (5.3)
• Selective competition: for any pair of integers (k, (5.5)

A hierarchy of logistic competitions
Throughout this section we consider, for any given $N \in \mathbb{N}$, a Markovian jump process $Z^{(N)} := (Z^{(N)}(r))_{r\ge 0}$, whose state space is $Z_N$ defined in (5.1) and whose jump rates are given by (5.2) to (5.5). When there is no risk of confusion, we suppress the superscript $N$ and write e.g. $Z_0$ instead of $Z^{(N)}_0$. The following remark stems from the jump rates (5.2) to (5.5).

Remark 6.1. For each $k \in \mathbb{N}_0$ the process $(Z_0, \dots, Z_k)$ is Markovian with jump rates given by (5.2) to (5.5). In particular:
a) The process $Z_0$ is Markovian and jumps with the rates
b) The process $(Z_0, Z_1)$ is Markovian and jumps with the rates

(0) is of order $N/f(N)$, the process $((f(N)/N) Z^{(N)}_{f(N)r})_{r\ge 0}$ is on each time interval $[0, r_0]$ for large $N$ close (uniformly in $r \in [0, r_0]$) to the solution of the dynamical system with $n_{-1} := 0$. Without going into all details here, let us mention that two steps are needed to prove this convergence. First, we consider a modified version of the process $Z^{(N)}$, namely $\tilde Z^{(N)}$, where the rates in (5.3) and (5.4) are replaced by $s_N z_k$ and 0 respectively. Choosing as the mass rescaling parameter the carrying capacity $N/f(N)$, we can directly apply Theorem 11.2.1 in [10] to the process $(\tilde Z^{(N)}(f(N)r))_{r\ge 0}$. Then applying Lemma C.1 in [7] as in the proof of Lemma 6.3, we obtain that the sum of the components of the process $(\tilde Z^{(N)}(f(N)r))_{r\ge 0}$ does not reach a size of order $N$ within a time of order $\ln N$ with a probability close to 1 for large $N$. The modification of the jump rates is thus negligible on a time scale of order 1, and the claimed convergence holds for the process $(Z^{(N)}(f(N)r))_{r\ge 0}$.

b) The system (6.3) has a unique attracting equilibrium $(\bar n_k)_{k\in\mathbb{N}_0}$ which follows the recursion $\bar n_0 := 2(\alpha - \mu)$ and N . We thus obtain that for any $r \ge 0$, lim Summing over $k$ in (6.4) and defining n :

c) Let $(\bar n_k)_{k\in\mathbb{N}_0}$ be defined by the recursion (6.4). In view of (6.4) and (6.5) it is clear that $\left(\frac{\bar n_k}{2\alpha}\right)_{k\in\mathbb{N}_0}$ is a sequence of probability weights which satisfies the recursion (2.9) and thus coincides with the probability weights $(p_k)_{k\in\mathbb{N}_0}$ that are defined in Theorem 2.3a).
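To illustrate the ground level of the hierarchy, one can simulate a one-dimensional logistic birth-death chain and watch it fluctuate around its quasi-equilibrium. The rates used below are an assumption, not a transcription of (6.1): upward jumps at rate $s_N z$ (as in the modified process described above), and downward jumps at rate $m_N z + z(z-1)/(2N)$, where the quadratic term is the usual pairwise coalescence rate of a Moran model with resampling rate $\frac{1}{2N}$ per ordered pair. With these assumed rates the drift vanishes at $z = 2N(s_N - m_N) + 1$, which is consistent with the equilibrium value $\bar n_0 = 2(\alpha - \mu)$ on the mass scale $N/f(N)$. The function name `simulate_z0` is hypothetical.

```python
import random


def simulate_z0(N, s, m, z0, t_end, seed=0):
    """Simulate a logistic birth-death chain with ASSUMED rates
        z -> z + 1  at rate  s * z
        z -> z - 1  at rate  m * z + z * (z - 1) / (2 * N)
    and return the final state and the time average of z over [0, t_end]."""
    rng = random.Random(seed)
    z, t, area = z0, 0.0, 0.0
    while z > 0:
        up = s * z
        down = m * z + z * (z - 1) / (2 * N)
        dt = rng.expovariate(up + down)
        if t + dt > t_end:
            area += z * (t_end - t)   # last stretch up to t_end
            break
        area += z * dt
        t += dt
        z += 1 if rng.random() < up / (up + down) else -1
    return z, area / t_end
```

For example, with `N=1000`, `s=0.2`, `m=0.1` the time average should stay near $2N(s - m) + 1 = 201$ over long time windows, while extinction from this quasi-equilibrium is a rare event.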
The next lemma roughly says that for any $k \in \mathbb{N}_0$ the process $Z^{(N)}_k$ with high probability grows quickly to a size of order $N/f(N)$ and stays there at least for a time of order $f(N) \ln N$, provided only that for some $\ell \le k$ the initial size of $Z^{(N)}_\ell$ is not too small. In view of Lemma 5.1, the quantity $N/f(N)$ thus characterizes the typical size of the ASG on the $f(N) \ln N$-timescale.

Lemma 6.3. Let $(\bar n_k)_{k\in\mathbb{N}_0}$ be given by the recursion (6.4). Let $(Z_0, Z_1, \dots) = (Z^{(N)}_0, Z^{(N)}_1, \dots)$ be a process with jump rates given by (5.2) to (5.5), and let $R > 0$. Then for any $k \in \mathbb{N}_0$ and $\varepsilon > 0$, there exist finite constants $C_k$ and $C_k(\varepsilon)$ such that lim inf where $\delta(\varepsilon) \to 0$ as $\varepsilon \to 0$.
The proof of this lemma will be given in Section 14.
The two lemmas just stated (which will be proved in Section 14) are key for obtaining the renewal structure of the dynamics of the potential ancestors with minimal load. They imply in particular that when the set of potential ancestors with the currently minimal load goes extinct, the number of minimum load potential ancestors that "come next" is large enough for reaching a size of order $N/f(N)$ given by the quasi-stationary distribution $\nu_N$. As we will see in Section 9, this will ensure, using duality, that the succession of several clicks (in the sense of Definition 2.1) within a time frame of order smaller than $f(N)\theta_N$ is not likely.

Quick merging along the Ancestral Selection Graph
The main result of this section, which will be a key ingredient in the proofs of Theorems 2.2 and 2.3a), is an upper estimate for the time it takes for the merging of the sets of load $k$ potential ancestors of two $P_T$-measurable random sets $J^1_T$ and $J^2_T$ of $[N] \times \{T\}$, where $T$ is a $P$-stopping time (recall Definition 3.4). Roughly stated, this result says that this merging happens with high probability as $N \to \infty$ within a time frame of order $f(N) \ln N$, provided only that the sets $J^1_T$ and $J^2_T$ are sufficiently large.
With reference to Definition 4.6, we define the (random) merging time of the two load $k$ ASGs $A^{J^1_T}(k)$ and $A^{J^2_T}(k)$ as

Proposition 7.1. Let $T$ be a $P$-stopping time and let $J^1_T$, $J^2_T$ be $P_T$-measurable random subsets of $[N] \times \{T\}$. Then, for any $k \ge 0$ and $\varepsilon > 0$, there exists a finite constant $C(\varepsilon)$ s.t.

Proof. The strategy of the proof consists in showing by induction that for all $k \ge 0$ the sets merge with high probability within a time of order $f(N) \ln N$. Let us begin with the case $k = 0$. We will write $A^i_{T-r} := A^{J^i_T}_{T-r}(0)$, $r \ge 0$, $i = 1, 2$, and will study the dynamics of the set-valued process and of its cardinality as $r$ increases. For the sake of readability, we define subsets $H_1$, $H_2$ by Five possible types of transitions of $H_1 \triangle H_2$ may result from the elements of the processes $(C, S, M)$:
• $(i, j, T - r) \in S$ with $j \in H_1 \triangle H_2$ and $i \notin H_1 \cup H_2$; then $i$ becomes an element of $H_1 \triangle H_2$. This type of transition adds one element to $H_1 \triangle H_2$ and has a rate
• This type of transition removes one element from $H_1 \triangle H_2$ and has a rate
• $(i, j, T - r) \in C$ with $i$ and $j$ either both belonging to $H_1 \setminus H_2$ or both belonging to This type of transition removes one element from $H_1 \triangle H_2$ and has a rate
• $(i, j, T - r) \in C$ with one of $i$ and $j$ belonging to $H_1 \setminus H_2$ and the other belonging to $H_2 \setminus H_1$; then both $i$ and $j$ are removed from $H_1 \triangle H_2$. This type of transition removes two elements from $H_1 \triangle H_2$ and has a rate
• This type of transition removes one element from $H_1 \triangle H_2$ and has a rate

The sum of $q_2$, $q_3$, $q_4$ and $q_5$ equals From Lemma 5.1 and Lemma 6.3 we know that for any $\varepsilon > 0$ there exists a constant $C(\varepsilon)$ , then for any $R > 0$, with a probability close to 1 for $\varepsilon$ small enough and $N$ large enough, 3) and (7.7), in such a time window we have for $N$ large enough. The process $(\#(A^1_{T-r} \triangle A^2_{T-r}))_{r\ge 0}$ is thus stochastically dominated by a branching process with individual birth rate $\alpha/f(N)$ and death rate The extinction time of such a process, with an initial state smaller than $N$, is smaller than $\frac{2 f(N) \ln N}{\alpha - \mu - 5\varepsilon}$ with a probability converging to 1 when $N$ goes to infinity (see e.g. [5] Lemma A.1). This concludes the proof of the proposition for the case $k = 0$.
Assume now that the sets From Lemma 6.3 we know that there exists $R < \infty$ such that for any $K < \infty$, with a probability close to one the size of this union is close to and remains so during any time frame of order $f(N) \ln N$. We also know that the sizes of $A^{J^1_T}(k)$ and $A^{J^2_T}(k)$ are close to $N \bar n_k/f(N)$ during the same time frame. Let us again use the abbreviations $A^1_{T-r}$ and $A^2_{T-r}$, now for By definition of $T_{k-1}$ we have the equality Another crucial observation is that the upward and downward jump rates of the process are the same as those of the process resulting from (7.3) - (7.6). (In particular, for $T - r \le T_{k-1}$, the mutational events only affect the set A The rest of the proof now follows the same lines as in the case $k = 0$.

Similarly as in (7.1), we define the (random) merging time of the ASGs $A^{J^1_T}$ and Since in the special case $m_N = 0$ the load zero ASG $A^{J_T}_t(0)$ equals the 'untyped' ASG $A^{J}_t$, we immediately obtain the following corollary by putting $\mu = 0$ and $k = 0$ in Proposition 7.1:

EJP 0 (2020), paper 0.

Corollary 7.2. Let $T$, $J^1_T$, $J^2_T$ be as in Proposition 7.1. Then for any $\varepsilon > 0$, there exists a finite constant $C(\varepsilon)$ such that (7.2) also holds for

Click times on the Ancestral Selection Graph
In this section we will define for each N ∈ N a process of click times along the ASG back from some large time u N .The main result of the section will be Proposition 8.2, whose proof will build on results in Sections 5, 6 and 7. Roughly stated, this proposition says that the process of click times on the ASG, back from times that are large on the f (N )θ N -scale, converges on that scale locally around time 0 to a standard Poisson process.This result is key for the proof of Theorem 2.2.Indeed, in Section 9 we will argue that the process of (forward) click times figuring in Theorem 2.2, which are represented as the jump times of the counting process K * N defined in (4.2), is locally on the f (N )θ N -scale with high probability (as N → ∞) close to the process figuring in Proposition 8.2.This latter process, however, can be read off from the ASG decorated with the points of M. See Figure 4 and also Figure 1 for illustrations.Definition 8.1 (Backward click times).For N ∈ N, u ∈ R and ℓ = 0, 1, . . .we define (again partially suppressing N in the notation) the click times on the ASG back from [N ] × {u} as follows We thus get a point process For later reference we will consider a sequence (u N ) of time points with the property Putting T N,u N 0 := 0, we have for n ∈ N the following convergence in distribution as N → ∞: where (W g ) g∈N is a sequence of i.i.d.standard exponential random variables.Consequently, the sequence of processes N N , N ∈ N, defined by A large part of the remainder of this section is devoted to the proof of Proposition 8.2.
With A T t denoting the cardinality of the set A T t of load k potential ancestors (as defined just before Lemma 5.1, see also Definition 4.6), for any N ∈ N and any P (N) -stopping time T we define the P (N) -stopping time S N,T S N,T := sup t ≤ T : In words, among all times at which all the potential ancestral paths of the population that lives at time T carry at least one mutation, the time S N,T is the one which is closest to T. Let us also note that for fixed N the distribution of T − S N,T does not depend on the choice of the P (N) -stopping time T, cf. Lemma 5.1. A key step in the proof of Proposition 8.2 is provided by
Lemma 8.3. For any sequence of P (N) -stopping times T N the sequence converges in law as N → ∞ to an exponential random variable with rate parameter 1.
Proof. The process Z (N) 0 (r) := A T N T N −r (0), r ≥ 0, has the jump rates (6.1) and starts in N. Lemma 6.3 shows that the quasi-equilibrium of Z (N) 0 builds up within a time of order f(N) ln N when started in Z (N) 0 (0) = N. Let now (θ N) be as in Lemma 6.4. Since this (θ N) obeys (2.7) and f(N) satisfies f(N) = o(N / ln ln N), we conclude from (2.7) that ln θ N ≫ ln ln N, which means θ N ≫ ln N, and hence f(N) ln N ≪ f(N)θ N. The build-up time of the quasi-equilibrium is therefore negligible on the f(N)θ N -scale, and the asymptotic exponentiality of T N − S N,T N with the claimed time scaling follows from Lemma 6.4.
Let us now consider a sequence of (deterministic) times u N as in Proposition 8.2 and recall the definition of S N,T in (8.2). For each fixed N ∈ N define recursively The following corollary is now immediate from Lemma 8.3.
Proof of Proposition 8.2. From Definition 8.1 we recall the point process T N,u N of click times on the ASG back from [N] × {u N}. The strategy of the proof will be to compare this process "locally on the f(N)θ N -timescale" to the process S N,u N , which on that scale is approximately Poisson according to Corollary 8.4.
To this purpose we define for each N ∈ N and each time point t > 0 Let Ā T t be the set of minimum load potential ancestors at time t of the population at some (deterministic or random) time T, as specified in Definition 4.6. For abbreviation we put For any fixed C > 0 we abbreviate t N := Cf(N)θ N. We will use the following properties (where always δ(ε) → 0 as ε → 0): (8.4) According to Lemmata 6.4 and 6.5 lim inf
• A similar reasoning yields lim inf Then according to part a) of Corollary 8.4, lim sup
• Finally, according to Proposition 7.1 (on the quick merging of load zero ASGs), lim inf
From these facts we deduce that lim We proceed in a similar way to cover the time frame [0, t N], which contains a random number of points of S N,u N that has a finite expectation. We thus add a sum of errors that converges to 0 as N → ∞, which allows us to conclude the proof.
9 Click rates: Proof of Theorem 2.2
The next lemma relates the click times of the ratchet, defined as the jump times of the process K * N given by (4.2), to the times T N,u N g obtained from the point process T N,u N of backward click times, see Definition 8.1 and Proposition 8.2. As will become clear from the following proof, each time T N,u N g with high probability 'announces' a click time of the ratchet, with the difference between those two times tending to zero in probability as N → ∞ on the f(N)θ N -scale.
Lemma 9.1. For any τ ≥ 0,
Proof. Since K * N (0) = 0 and T N,u N g > 0 a.s. for g > 0, we need only consider the case τ > 0. For abbreviation we put t N := f(N)θ N τ. Let us recall the definition in (4.3) of the set Then one readily observes
One more application of Lemma 10.1, now to the times T N in place of t N , implies Because of the definition of T N , the individual v is a load zero potential ancestor of some v * ∈ J (N ) t N . Consequently, with probability tending to 1 as N → ∞, which ends the proof.

First passage percolation in Poisson-decorated Yule trees
In this section we consider a Yule tree Y with splitting rate α, and regard Y as the union of the (infinitely many) lineages l leading from the root to ∞. More formally, we regard a realisation of Y as the union is the Ulam-Harris index set (meaning that the branch with index ι 1 . . . ι g ∈ U is the ι g -th branch born by the branch with index ι 1 . . . ι g−1), and τ ι is the birth time of the branch with index ι. We think of (ι, h) as a node in the tree y, and refer to h as its height. Given Y, let Π be a Poisson process on Y whose intensity is µ times the length measure on Y. (In Section 12 we will prove that these Poisson-decorated Yule trees indeed appear in the ASG as N → ∞; see Figure 1 for an illustration.) Again we assume µ < α and define the minimal Π-load in Y as L := min{Π(l) : l is a lineage of Y}. (11.1)
In this section we will use the abbreviation
Proposition 11.1. Let the random variable L be defined by (11.1). a) Recall the definition of G in (2.12). Then there exists some constant C ρ > 0 such that b) The probability weights of L, (π k) k∈N0 := (P(L = k)) k∈N0 , satisfy the recursion (2.9) with π 0 = 1 − ρ, and thus are equal to the weights (p k) k∈N0 appearing in Theorem 2.3.
Proof. 1. The Yule tree Y together with its Poisson decoration Π (which enter in the definition of L in (11.1)) define a binary branching supercritical Galton-Watson tree G as follows: when moving away from the root of Y, every first encounter with a point of Π stands for a death in G, while every splitting point of Y stands for a birth in G. Hence G has offspring distribution P(no child) = q = 1 − P(two children).
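As a numerical sanity check of the extinction probability q/(1 − q) of such a tree, the following sketch simulates a binary Galton-Watson tree one individual at a time. It assumes q = µ/(α + µ), the probability that a Poisson point of rate µ is met before a split of rate α, since the abbreviation for q is not reproduced above. The function names, the survival cap and the parameter values are illustrative choices, not taken from the paper.

```python
import random

def gw_goes_extinct(q, rng, cap=200):
    """Binary Galton-Watson tree: each individual has no child w.p. q, else two.

    Individuals are processed one at a time, so the number of unprocessed
    individuals performs a random walk with steps -1 (w.p. q) and +1.
    Reaching `cap` is treated as survival (an approximation).
    """
    alive = 1
    while 0 < alive < cap:
        alive += -1 if rng.random() < q else 1
    return alive == 0

def extinction_frequency(q, runs=10_000, seed=1):
    rng = random.Random(seed)
    return sum(gw_goes_extinct(q, rng) for _ in range(runs)) / runs

alpha, mu = 1.0, 0.5          # illustrative rates with mu < alpha
q = mu / (alpha + mu)         # assumed form of q (competing exponential clocks)
rho = q / (1 - q)             # equals mu / alpha under this assumption
estimate = extinction_frequency(q)
print(f"q = {q:.4f}, q/(1-q) = {rho:.4f}, simulated extinction frequency = {estimate:.4f}")
```

The cap is harmless here because a tree that has reached 200 individuals dies out with probability of order ρ^200.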
The event that G is finite equals the event that there is no lineage l in Y with Π(l) = 0, which in turn equals the event {L > 0}. A first step decomposition shows that the extinction probability of G is q/(1 − q) = ρ (cf. also (11.7) and (11.8) below), hence Exploring the lineages of Y beyond the points of Π that are closest to the root of Y, we encounter a self-similar situation: any such point can be seen as the root of an independent copy of G, and the event {L > 1} equals the event that all of these Galton-Watson trees are finite, which in view of (11.4) has probability where m 1 is the number of leaves of G. We put A first generation decomposition gives g(u) = qu + (1 − q)g(u)². (11.7) Of the two solutions of this equation only the function G given by (2.12) is admissible, since from (11.6) and (11.4) we have that g(1) = ρ < 1. Consequently, we have Combining (11.5) and (11.8) for u := ρ gives (11.2) for ℓ = 1. Proceeding further, {L > 2} means that all of the m 1 many Poisson points are founders of lineages that carry more than one point of Π. This event has probability which is (11.2) for ℓ = 2. For general ℓ ∈ N, formula (11.2) follows by induction.
3. From (2.12) we have For ℓ ∈ N 0 let a ℓ be defined by the right hand side of (11.2). Since G is twice continuously differentiable with G(0) = 0 and G ′(0) = q, we have for some 0 Hence with R ℓ := G ′′(χ ℓ)ξ ℓ we obtain Since G is Lipschitz continuous on [0, 1] with Lipschitz constant ρ < 1 and G(0) = 0, we have a ℓ ≤ ρ^(ℓ+1). From the fact that G ′′ is nonnegative and increasing we deduce that 0 In view of (11.2) this proves that the random variable L has the tail asymptotics (11.3).
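The geometric decay of the tail can be observed numerically. The sketch below takes G to be the root of (11.7) with G(1) = ρ, written in a numerically stable form, and assumes that (11.2) reads P(L > ℓ) = a ℓ with a 0 = G(1) and a ℓ = G(a ℓ−1), which is what the induction in step 2 suggests. Neither (2.12) nor (11.2) is reproduced above, so both are assumptions of this example, as are the function names and the value of q.

```python
import math

def G(u, q):
    """Root g of g = q*u + (1-q)*g**2 with G(1) = q/(1-q), cf. (11.7).

    Written as 2qu / (1 + sqrt(1 - 4q(1-q)u)), which equals
    (1 - sqrt(1 - 4q(1-q)u)) / (2(1-q)) but avoids cancellation for small u.
    """
    return 2.0 * q * u / (1.0 + math.sqrt(1.0 - 4.0 * q * (1.0 - q) * u))

def tail_probabilities(q, n):
    """Return [a_0, ..., a_n] with a_0 = G(1) and a_l = G(a_{l-1})."""
    a = [G(1.0, q)]
    for _ in range(n):
        a.append(G(a[-1], q))
    return a

q = 1.0 / 3.0                 # illustrative value, e.g. alpha = 1, mu = 0.5
rho = q / (1.0 - q)
a = tail_probabilities(q, 25)
ratios = [a_l / q**l for l, a_l in enumerate(a)]
for l in (0, 1, 2, 5, 10, 25):
    print(f"l = {l:2d}   a_l = {a[l]:.6e}   a_l / q^l = {ratios[l]:.8f}")
```

The ratios a ℓ / q^ℓ are nondecreasing, because G(u) ≥ qu by convexity, and they stabilise quickly, consistent with a tail of order q^ℓ.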
4. Let e be the edge between the root of Y and its closest branch point. The random variable M := Π(e) satisfies P(M ≥ ℓ) = q^ℓ, ℓ ∈ N 0. (11.10) The random variable L satisfies the stochastic fixed point equation where L 1 and L 2 have the same distribution as L, and L, L 1 , L 2 , M are independent. Hence From the independence of L 1 , L 2 we have From (11.10) we have Inserting this into (11.12) and observing (11.11) we obtain Observing that (1 − q)(1 + ρ) = 1 we arrive at In view of (11.3) we have P(L < ∞) = 1; hence Thus (11.14) together with (11.4) shows that (π k) satisfies the recursion (2.9).
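The fixed point equation can also be solved numerically for the weights π k. The sketch below assumes that (11.11) has the form L = M + min(L 1 , L 2) in distribution, which is how we read step 4 since the display itself is not reproduced above. Conditioning on M then gives, for t ℓ := P(L ≥ ℓ), the relation t ℓ = q^ℓ + (1 − q) Σ_{m=0}^{ℓ−1} q^m (t ℓ−m)², a quadratic equation in t ℓ whose root below 1 is the relevant one because P(L < ∞) = 1. All names and parameter values are illustrative.

```python
import math

def tails_from_fixed_point(q, n):
    """t[l] = P(L >= l), l = 0..n, assuming L =d M + min(L1, L2), P(M >= l) = q**l."""
    t = [1.0]
    for l in range(1, n + 1):
        # contribution of {M >= l} plus the terms with 1 <= M = m <= l - 1
        c = q**l + (1.0 - q) * sum(q**m * t[l - m] ** 2 for m in range(1, l))
        # the term M = 0 contributes (1-q) * t_l**2, so (1-q) t^2 - t + c = 0;
        # take the smaller root, written in a cancellation-free form
        disc = max(0.0, 1.0 - 4.0 * (1.0 - q) * c)
        t.append(2.0 * c / (1.0 + math.sqrt(disc)))
    return t

q = 1.0 / 3.0                 # illustrative value
rho = q / (1.0 - q)
t = tails_from_fixed_point(q, 40)
pi = [t[k] - t[k + 1] for k in range(40)]   # pi[k] = P(L = k)
print("pi_0 =", pi[0], "  1 - rho =", 1.0 - rho)
print("pi_0 + ... + pi_39 =", sum(pi))
```

The first weight reproduces π 0 = 1 − ρ, and the weights sum to one up to a remainder of order q^40.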
Remark 11.2. The setting of Proposition 11.1 gives an instance of Example 40 in [1]: our stochastic fixed point Equation (11.11) corresponds to Eq. (49) in [1] with a geometrically distributed "toll" random variable η. Thus, the results of Proposition 11.1 apply to a specific case of a situation which, according to [1], "does not seem to have been studied generally". As stated in Theorem 2.3d) and explained in Section 13, this connects to the asymptotic minimum of a branching random walk whose increment distribution is supported on R +. (See [15] and references therein for the asymptotics of minima of random walks with two-sided increment distributions.)
In order to prepare for the connection between Proposition 11.1 and the decorated ASG of a sampled individual, we need some more notation.
Definition 11.3. Let Y be the Yule tree described at the beginning of the section. For a node v ∈ Y, let a(v) be the path from v to the root, and for h > 0, k ∈ N 0 let Y h (k) be the set of nodes v of Y that have height h and obey Π(a(v)) = k. Finally, we define, as an analogue to (11.1), the minimal Π-load in Y up to height h as
The following lemma says that the minimal Π-load of the (infinite) lineages in Y can with high probability be observed already at a height w N which is large but of smaller order than ln N as N → ∞; moreover at this height there are many nodes of the Yule tree whose ancestral paths collect this load.
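To get a feeling for the minimal Π-load up to a finite height, one can simulate the decorated Yule tree directly. The sketch below reads the definition as the minimum of Π(a(v)) over all nodes v of height h, which is an assumption since the display is not reproduced above, and it uses ρ = µ/α. The function name, the seed and the parameters are illustrative. Since the load up to height h is at most L, the frequency of the value 0 should lie slightly above P(L = 0) = 1 − ρ for moderate h.

```python
import random

def minimal_load_at_height(alpha, mu, h, rng):
    """Minimum over nodes at height h of the number of Poisson points on the path to the root."""
    best = None
    stack = [(0.0, 0)]                        # (current height, load collected so far)
    while stack:
        t, load = stack.pop()
        split = t + rng.expovariate(alpha)    # height of the next branch point
        end = min(split, h)
        # count the points of a rate-mu Poisson process on the segment (t, end]
        s = t + rng.expovariate(mu)
        while s <= end:
            load += 1
            s += rng.expovariate(mu)
        if split < h:
            stack.append((split, load))       # the branch itself continues
            stack.append((split, load))       # the newly born branch
        elif best is None or load < best:
            best = load
    return best

alpha, mu, h, runs = 1.0, 0.5, 5.0, 1000      # illustrative values with mu < alpha
rng = random.Random(11)
loads = [minimal_load_at_height(alpha, mu, h, rng) for _ in range(runs)]
freq_zero = sum(1 for x in loads if x == 0) / runs
print(f"frequency of minimal load 0 at height {h}: {freq_zero:.3f}, 1 - rho = {1 - mu / alpha:.3f}")
```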
Proof. As described in the proof of Proposition 11.1, an equivalent representation of a binary branching Galton-Watson tree with mutation at rate µ is a sequence of trees of different types killed at rate µ. Descendants of the root are of type 0. Every death of an individual of type 0 leads to a new binary branching Galton-Watson tree of type 1 and so on. Let us consider the event {L w N = k}. This event implies that all the trees of types l ≤ k − 1 are extinct at time w N. Every such tree is a supercritical tree with birth rate α and death rate µ. Conditioned on extinction, it is thus a subcritical tree with birth rate µ and death rate α, and it has a mean number of leaves )) and a finite mean extinction time. Hence lim Moreover, by definition, any tree of type k still alive at time w N is born before the death of the last alive type k − 1 individual. Let us denote by Y a binary Galton-Watson tree of type k with birth rate α and death rate µ. On the event of survival (see for instance [2] p.112), lim t→∞ (ln Y t)/t = α − µ.
On the event • There is a finite mean number of independent copies of Y and a positive number of them survive after time w N which goes to infinity with N .• These independent copies have a root born between the times 0 and w between heights 0 and t.Denoting by M (t) the minimum of the position of all the walkers alive at time t, we see that M (t) increases to the N 0 ∪ {∞}-valued random variable K := min{Π(l) : l ∈ T }, i.e. the minimum over the numbers of Poisson points carried by the infinite lineages in T .
Part d) This is part of the assertion of Proposition 11.1.
14 Proof of Lemmata 6.3, 6.4 and 6.5
This section is dedicated to the study of the process (Z (N) (r), r ≥ 0). The proof of Lemma 6.3 relies essentially on the fact that a stochastic Lotka-Volterra process with large carrying capacity K resembles a supercritical process when its size is small and, once close to its carrying capacity, stays in a neighbourhood of the latter during any time of order ln K. This last property is stated in Lemma C.1 in [5], and will be instrumental in the following proof.
Notice that for N large enough and n ≤ e(α, µ, ε)N/f (N ), the birth and death rates defined in (6.This proves the lemma for k = 0 with C 0 (ε) = 6/ε and C 0 = 4ε.
Let us now take g ∈ N and assume that (6.6) holds true for k = 0, ..., g − 1. Jointly for all these k = 0, ..., g −1 we can take a time frame (which may be as long as we want on the f (N ) ln N -time scale) on which f (N )Z k /N ∈ [n k − C k ε, nk + C k ε], and Z g ≤ 4αN/f (N ).During this time interval, the birth rate of the Z g -population is larger than .
Likewise, the death rate is larger than and smaller than .
The remaining part of the proof is the same as in the case g = 0, again with an application of [5,Lemma C.1].
jump times of the counting process K * N are called the click times of the ratchet.

Theorem 2.3 (Quasi-stationary type frequency profile). a) Assume that all individuals at time 0 are of type 0. Let (t N) be a deterministic sequence of times such that t N /(f(N) ln N) → ∞ as N → ∞.

Figure 2: Graphical elements and their impact on the transport of types. Time is running from left to right, and in each of the four panels two levels (i = 1, 2) are considered, with the initial type configuration (0, 0). Selective and neutral arrows are drawn with dashed and solid shafts, respectively. Mutations are drawn as circles. Because of the rules described in Definition 3.3, some of the mutations do not have an effect on the outcome of the types at the final time; these mutations are represented as filled black circles.

Definition 3.4 (Forward and backward filtrations). For t ∈ R let C ≤t and S ≤t be the restrictions of C and S to i,j∈[N],i≠j

(4. 2 )
Definition 4.6. a) (Load For a P-stopping time T and a P T -measurable set J T ⊂ [N] × {T} (cf. Definition 3.4), the joint dynamics of the set-valued processes A J T T −r (k) r≥0 , k ∈ N 0 , as specified in Definition 4.6, is driven in a P-adapted manner by the Poisson point processes (C, S, M) introduced in Definition 3.1. With regard to Definitions 4.1 and 4.2 we will now describe the actions of (C, S, M) on the sets A Jt t−r (k) (note the analogy and the differences to Definition 3.3 for the transport of type configurations, which there was forward in time).

• Mutation: for any k ∈ N 0 , z → z + e k+1 − e k with rate m N z k , and for any k ∈ N, z → z + e k − e k−1 with rate m N z k−1 .

Figure 4: Key to the analysis of the click times of the ratchet are the instances at which (seen backward in time) the paths with minimal load are lost. In this figure we observe how the click times forward and backward are different, but strongly related to each other and also close in time. Both panels contain the same graphical elements, with the left panel showing the forward transport and the right panel showing the backward transport of M-distances. In each case, the points with M-distance 0 from the left respectively from the right boundary are shown by thick lines, and clicks are indicated by a large circle.

. 2 )
Remark 6.2. a) An inspection of the rates (5.2) to (5.5) and an application of a dynamical law of large numbers [10, Theorem 11.3.2] show that if Z (N) 0

Proposition 8.2. Let (u N) obey (8.1) and (T N,u N g) 1≤g≤g N be the points contained in the set T

Corollary 8.4. Let S N,u N ℓ be defined by (8.3). a) The sequences ((S N,u N ℓ−1 − S N,u N ℓ)/(f(N)θ N)) ℓ≥1 converge as N → ∞ in the sense of finite dimensional distributions to a sequence of i.i.d. standard exponential random variables. b) Let C be an arbitrary positive constant. The sequence of point processes converges, when restricted to [0, C], in distribution to a standard Poisson point process restricted to [0, C].
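The passage from part a) to part b) rests on the classical fact that the partial sums of i.i.d. standard exponential spacings form a standard Poisson point process. A minimal simulation sketch of this limiting object (not of the rescaled processes themselves) is given below; the function name, the seed and the value of C are illustrative choices. For a standard Poisson process the number of points in [0, C] has mean and variance C.

```python
import random

def count_points(C, rng):
    """Number of partial sums of i.i.d. standard exponentials that fall in [0, C]."""
    total, count = rng.expovariate(1.0), 0
    while total <= C:
        count += 1
        total += rng.expovariate(1.0)
    return count

rng = random.Random(2020)
C, runs = 3.0, 20_000
counts = [count_points(C, rng) for _ in range(runs)]
mean = sum(counts) / runs
var = sum((c - mean) ** 2 for c in counts) / runs
print(f"C = {C}: empirical mean = {mean:.3f}, empirical variance = {var:.3f}")
```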