Haldane's formula in Cannings models: The case of moderately weak selection

We introduce a Cannings model with directional selection via a paintbox construction and establish a strong duality with the line counting process of a new \emph{Cannings ancestral selection graph} in discrete time. This duality also yields a formula for the fixation probability of the beneficial type. Haldane's formula states that for a single selectively advantageous individual in a population of haploid individuals of size $N$ the probability of fixation is asymptotically (as $N\to \infty$) equal to the selective advantage $s_N$ divided by half of the offspring variance. For a class of offspring distributions in the domain of attraction of the Kingman coalescent we prove this asymptotics for sequences $s_N$ obeying $N^{-1} \ll s_N \ll N^{-1/2}$, which is a regime of "moderately weak selection". It turns out that for $s_N \ll N^{-2/3}$ the Cannings ancestral selection graph is so close to the ancestral selection graph of a Moran model that a suitable coupling argument allows one to reduce the problem asymptotically to the fixation probability in the Moran model, which can be computed explicitly.


1 Introduction
In population genetics the standard model for the neutral reproduction of haploid individuals in discrete time and with constant population size $N$ is the classical Wright-Fisher model: the offspring numbers from one generation to the next arise by throwing $N$ times a die with $N$ faces and are thus Multinomial$(N; 1/N, \ldots, 1/N)$-distributed. This is a special instance within the general class of Cannings models, see [4] and [8, Chapter 3.3], where the offspring numbers are assumed to be exchangeable and to sum to $N$.
In the Wright-Fisher model directional selection can be included by appropriately biasing the weights of individuals that have a selectively beneficial type or a wildtype. While for general Cannings models it is not completely clear how to incorporate selection, a biasing of weights can be done in a natural way for the large class of Cannings models that admit a paintbox representation (in the sense that the N -tuple of offspring numbers is mixed multinomial, see Section 2.1).
For this class of models we introduce a graphical representation which extends to the case with directional selection, and leads to a time discrete version of the ancestral selection graph that was developed by Krone and Neuhauser in [16] for the (continuous time) Moran model.
Recently, González Casanova and Spanò constructed in [11] an ancestral selection graph for a special class of Cannings models. While their construction relies on analytic arguments, we provide here a probabilistic construction which works for a wider class of models and also gives a clear interpretation of the role of the geometric distribution of the number of potential parents in this context. This construction will be explained in Section 4. We will prove a sampling duality between the Cannings frequency process and the line counting process of the discrete ASG (alias Cannings ancestral selection process, or CASP), see Theorem 3.1. This also allows one to obtain a concise representation of the fixation probability of the beneficial type in terms of the expected value of the CASP in equilibrium, see Corollary 3.3.
The calculation of fixation probabilities is a prominent task in mathematical population genetics; for a review including a historical overview see [20]. A classical idea going back to Haldane, Fisher and Wright (see [12], [9] and [22]) and known as Haldane's formula is to approximate the probability of fixation of a beneficial allele with small selective advantage $s$ by the survival probability $\pi(s)$ of a supercritical, near-critical Galton-Watson branching process,
$$\pi(s) \sim \frac{s}{\rho^2/2} \quad \text{as } s \to 0, \qquad (1.1)$$
where $\rho^2$ is the offspring variance and $1+s$ is the expected offspring size. In Remark 3.6 b) we will briefly discuss perspectives and frontiers of a derivation of (1.1) in terms of a branching process approximation.

Couplings with Galton-Watson processes were used by González Casanova et al. in [10] to prove that (1.1) indeed gives the asymptotics of the fixation probability for a class of Cannings models (with mixed multi-hypergeometric offspring numbers) that arise in the context of experimental evolution. This was achieved under the assumption that $s_N \sim N^{-b}$ with $0 < b < 1/2$, i.e. for moderately strong selection. There, the question remained open whether (1.1) also captures the asymptotics of the fixation probability for $s_N \sim N^{-b}$ with $1/2 \le b < 1$. For the case $1/2 < b < 1$, Theorem 3.5 gives an affirmative answer for subclasses of Cannings models admitting a paintbox representation, and in particular also for the Wright-Fisher model with selection. In Theorem 3.5 a) we prove Haldane's formula under Condition (1.2) (which involves a parameter $\eta > 0$) for a class of paintboxes that satisfy in particular Möhle's condition (which guarantees that the coalescents of the neutral Cannings model are in the domain of attraction of Kingman's coalescent, see [19]). Under these assumptions we show in Section 5 that the CASP is close to the ASG line counting process of a corresponding Moran model over a long period of time. Indeed, for a Moran model with directional selection, a representation of the fixation probability in terms of the ASG line counting process is valid, and the fixation probability can be calculated explicitly; this we explain in Section 2.4. Relaxing (1.2) to Condition (1.3), in Theorem 3.5 b) we prove Haldane's formula under more restrictive moment conditions on the paintbox. Examples fulfilling these moment conditions include the Wright-Fisher case as well as paintboxes that are of Dirichlet type; for more details see Section 3.2. The main tool of the proof under these conditions is a concentration result on the equilibrium distribution of the CASP, see Section 6. This yields a sufficiently good estimate of the expected value of the CASP in equilibrium to prove Haldane's formula by means of the above-mentioned Corollary 3.3.

2 Cannings models and Moran models with selection

2.1 A paintbox representation for the neutral reproduction
In a neutral Cannings model with population size $N$, the central concept is the exchangeable $N$-tuple $\nu = (\nu_1, \ldots, \nu_N)$ of offspring sizes, with non-negative integer-valued components summing to $N$. A reasonably large class of such random variables $\nu$ admits a paintbox construction, i.e. has a mixed multinomial distribution with parameters $N$ and $W$, where $W = (W_1, W_2, \ldots, W_N)$ is an exchangeable random $N$-tuple of probability weights taking its values in $\Delta_N := \{(x_1, x_2, \ldots, x_N) : x_i \ge 0,\ x_1 + \cdots + x_N = 1\}$. While this is clearly reminiscent of Kingman's paintbox representation of exchangeable partitions of $\mathbb N$, here we are dealing with a finite $N$. As such, obviously, not all exchangeable offspring sizes are mixed multinomial; consider e.g. a uniform permutation of the vector $(2, \ldots, 2, 0, \ldots, 0)$. On the other hand, the exchangeable mixed multinomials cover a wide range of applications; e.g., they can be seen as approximations of the offspring sizes in a model of experimental evolution, where at the end of each reproduction cycle $N$ individuals are sampled without replacement from a union of $N$ families with large i.i.d. sizes; see [10] and [2], where the distribution of the family sizes was assumed to be geometric with expectation $\gamma = 100$. This leads to a mixed multi-hypergeometric offspring distribution, whose analogue for $\gamma = \infty$ would be a mixed multinomial offspring distribution with $\mathcal L(W)$ the Dirichlet$(1, \ldots, 1)$-distribution on $\Delta_N$.

Let us now briefly review the graph of genealogical relationships in a Cannings model. In each generation $g$, the individuals are numbered by $i \in [N] := \{1, \ldots, N\}$ and denoted by $(i, g)$. A parental relation between individuals in generations $g$ and $g-1$ is defined in the following way. Let $W^{(g)}$, $g \in \mathbb Z$, be i.i.d. copies of $W$. Every individual $(j, g)$ is assigned a parent $(V^{(j,g)}, g-1)$ in generation $g-1$ by means of an $[N]$-valued random variable $V^{(j,g)}$ with conditional distribution $\mathbb P(V^{(j,g)} = i \mid W^{(g-1)}) = W_i^{(g-1)}$. For each $g \in \mathbb Z$, the random variables $V^{(j,g)}$, $j = 1, \ldots, N$, are assumed to be independent given $W^{(g-1)}$. Also, for each $g \in \mathbb Z$, due to the exchangeability of $(W_1^{(g-1)}, \ldots, W_N^{(g-1)})$, the random variables $V^{(1,g)}, \ldots, V^{(N,g)}$ are uniformly distributed on $[N]$, and in general are correlated. With this construction, within one generation step we produce an exchangeable $N$-tuple of offspring sizes, i.e. the number of children of each individual $(i, g-1)$, $i \in [N]$. Due to the assumed independence of the random variables $W^{(g)}$, $g \in \mathbb Z$, the offspring sizes as well as the "assignments to parents" $(V^{(1,g)}, \ldots, V^{(N,g)})$ are independent along the generations $g$.
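To make the paintbox construction concrete, here is a minimal simulation sketch of one neutral generation step; the Dirichlet$(1,\ldots,1)$ choice of $\mathcal L(W)$ and the function name are purely illustrative, not part of the model specification.

```python
import numpy as np

def neutral_cannings_offspring(N, rng):
    """One generation of a neutral Cannings model via the paintbox:
    draw exchangeable weights W on the simplex, then multinomial offspring."""
    W = rng.dirichlet(np.ones(N))      # exchangeable paintbox weights (illustrative choice)
    nu = rng.multinomial(N, W)         # offspring numbers nu_1, ..., nu_N, summing to N
    return W, nu

rng = np.random.default_rng(1)
W, nu = neutral_cannings_offspring(10, rng)
assert nu.sum() == 10                  # offspring sizes sum to N
print(nu)
```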
Later in the paper we will deal with a sequence of Cannings models indexed by $N$, which will come along with a sequence $(W^{(N)})$ and i.i.d. copies $W^{(N,g)}$ of $W^{(N)}$. For notational simplicity we will sometimes suppress the superscript $N$ and keep writing $W$ and $W^{(g)}$ instead of $W^{(N)}$ and $W^{(N,g)}$; this will not lead to any ambiguities.

2.2 A paintbox representation incorporating selection
We now build directional selection with strength $s_N \in (0,1)$ into the model. Assume that each individual has one of two types, either the beneficial type or the wildtype. Let the chances to be chosen as a parent be modified by decreasing the weight of each wildtype individual by the factor $1 - s_N$: if individual $(i, g)$ has the wildtype, its weight $W_i^{(g)}$ is reduced to $(1 - s_N) W_i^{(g)}$. Given the type configuration in generation $g-1$, the parental relations are now generated in a two-step manner: first, assign the random weights $W^{(g-1)}$ to the individuals in generation $g-1$, then follow the rule
$$\mathbb P\big(V^{(j,g)} = i \mid \widehat W^{(g-1)}\big) = \widehat W_i^{(g-1)}, \qquad (2.1)$$
where $\widehat W^{(g-1)}$ denotes the tuple of selectively biased weights, renormalized to sum to $1$. Individual $(j, g)$ then inherits the type from its parent. Note that $\widehat W^{(g-1)}$ is measurable with respect to $W^{(g-1)}$ together with the type configuration in generation $g-1$. Because of the assumed exchangeability of the $W_i^{(g-1)}$, $i = 1, \ldots, N$, the distribution of the type configuration in generation $g$ only depends on the number of individuals in generation $g-1$ that carry the beneficial type. Thus, formula (2.1) defines a Markovian dynamics for the type frequencies. We will denote the number of wildtype individuals in generation $g$ by $K_g$, and will call $(K_g)_{g=0,1,\ldots}$ a Cannings frequency process with parameters $N$, $\mathcal L(W)$ and $s_N$. In particular, (2.1) implies that given $\{K_{g-1} = k\}$, $K_g$ is mixed Binomial with parameters $N$ and $P(k, W)$, where (labelling the wildtype individuals of generation $g-1$ by $1, \ldots, k$)
$$P(k, W) = \frac{(1-s_N)(W_1 + \cdots + W_k)}{1 - s_N (W_1 + \cdots + W_k)}. \qquad (2.2)$$
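A minimal sketch of one step of the Cannings frequency process $(K_g)$, following the biased-weight rule (2.1)-(2.2) above; the Dirichlet weights and the function name are again only illustrative choices.

```python
import numpy as np

def frequency_step(k, N, s, rng):
    """One step of the Cannings frequency process with selection.
    k = current number of wildtype individuals (labelled 1..k by exchangeability)."""
    W = rng.dirichlet(np.ones(N))           # paintbox weights of generation g-1
    wild = W[:k].sum()                      # total weight of wildtype parents
    P = (1 - s) * wild / (1 - s * wild)     # prob. that a child picks a wildtype parent, cf. (2.2)
    return rng.binomial(N, P)               # K_g is mixed Binomial(N, P(k, W))

rng = np.random.default_rng(2)
print(frequency_step(k=99, N=100, s=0.05, rng=rng))   # one beneficial mutant
```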

2.3 The Cannings ancestral selection process
Again let $N \in \mathbb N$, $W$ as in Section 2.1, and $s_N \in (0,1)$. The Cannings ancestral selection process (CASP) $A = (A_m)_{m=0,1,\ldots}$ with parameters $N$, $\mathcal L(W)$ and $s_N$ counts the number of potential ancestors in generation $g-m$ of a sample taken in generation $g$. We will give a graphical representation in Section 4; in the present section we define the CASP as an $[N]$-valued Markov chain whose one-step transition is composed of a branching and a coalescence step, as follows. Given $A_m = a$, the branching step takes $a$ into a sum $H = \sum_{\ell=1}^{a} G^{(\ell)}$ of independent Geom$(1-s_N)$-distributed random variables; in other words, the random variable $H$ has a negative binomial distribution with parameters $a$ and $1-s_N$, and thus takes its values in $\{a, a+1, \ldots\}$. (Here and below, we understand a Geom$(p)$-distributed random variable as describing the number of trials (and not only failures) up to and including the first success in a coin tossing with success probability $p$.) The coalescence step arises (in distribution) through a two-stage experiment: first choose a random $W$ according to the prescribed distribution $\mathcal L(W)$; then, given $W$ and the number $H$ from the branching step, place $H$ balls independently into $N$ boxes, where $W_i$ is the probability that the first (second, \ldots, $H$-th) ball is placed into the $i$-th box, $i = 1, \ldots, N$. The random variable $A_{m+1}$ is distributed as the number of nonempty boxes.
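The two half-steps of the CASP transition translate directly into a short simulation sketch (Dirichlet weights chosen only for illustration):

```python
import numpy as np

def casp_step(a, N, s, rng):
    """One CASP transition: geometric branching, then balls-in-boxes coalescence."""
    H = rng.geometric(1 - s, size=a).sum()   # H = G^(1) + ... + G^(a), each Geom(1-s) on {1,2,...}
    W = rng.dirichlet(np.ones(N))            # paintbox weights for the coalescence step
    boxes = rng.choice(N, size=H, p=W)       # place H balls into N boxes according to W
    return len(np.unique(boxes))             # A_{m+1} = number of occupied boxes

rng = np.random.default_rng(3)
a = 5
for _ in range(10):
    a = casp_step(a, N=200, s=0.02, rng=rng)
print(a)
```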
To emphasize the dependence of $A$ on $N$ we will sometimes write $A^{(N)} = (A^{(N)}_m)_{m=0,1,\ldots}$. The analogous line counting process for the Moran model with selection, the Moran ancestral selection process, is recalled in the next subsection.

2.4 Haldane's formula in the Moran model
Consider a Moran model of size $N$ with selection strength $s_N$ and neutral reproduction rate $\gamma/2$. The line counting process of its ancestral selection graph, $(B^{(N)}_r)_{r \ge 0}$, which we call the Moran ancestral selection process (or MASP for short), is a Markov jump process with jumps from $k$ to $k+1$ at rate $k s_N \frac{N-k}{N}$ for $1 \le k \le N-1$ and from $k$ to $k-1$ at rate $\frac{\gamma}{N}\binom{k}{2}$, see [16]. The well-known graphical representation of the Moran model yields a strong duality between the Moran frequency process of the wildtype, $(Y^{(N)}_t)_{t \ge 0}$, and $(B^{(N)}_r)_{r \ge 0}$; in its sampling form this duality is relation (2.3), where $t \ge 0$ and $k, n \in [N]$. Specializing (2.3) to $k = N-1$ and $n = N$ gives (2.4), and taking the limit $t \to \infty$ in (2.4) leads to (2.5), which expresses the fixation probability $\pi^M_N$ of a single beneficial mutant as
$$\pi^M_N = \frac{1}{N}\, \mathbb E\big[B^{(N)}_{\mathrm{eq}}\big],$$
where $B^{(N)}_{\mathrm{eq}}$ denotes a random variable with the stationary distribution of $(B^{(N)}_r)_{r \ge 0}$. As observed in [5], $B^{(N)}_{\mathrm{eq}}$ is a binomially distributed random variable with parameters $N$ and $p_N := \frac{2 s_N}{2 s_N + \gamma}$ that is conditioned not to vanish. In particular,
$$\pi^M_N = \frac{p_N}{1 - (1 - p_N)^N}.$$
For $s_N = \frac{\alpha}{N}$, $\alpha > 0$ (the case of weak selection) this specializes to Kimura's formula [15]; for $N^{-\eta} \ge s_N \ge N^{-1+\eta}$ (the case of moderate selection) we obtain Haldane's formula
$$\pi^M_N \sim \frac{2 s_N}{\gamma};$$
and for fixed $s > 0$ (the case of strong selection) this results in $\pi^M_N \to \frac{2s}{2s+\gamma}$ as $N \to \infty$.
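Since $B^{(N)}_{\mathrm{eq}}$ is Binomial$(N, p_N)$ conditioned to be nonzero, the fixation probability $\pi^M_N = p_N/(1-(1-p_N)^N)$ can be evaluated numerically; the following sketch (with an arbitrary choice of $\gamma$, $N$ and of the three selection strengths) illustrates the three regimes just discussed.

```python
import numpy as np

def moran_fixation(N, s, gamma):
    """pi = E[B_eq]/N with B_eq ~ Bin(N, p) conditioned to be > 0."""
    p = 2 * s / (2 * s + gamma)
    return p / (1 - (1 - p) ** N)

N, gamma = 10_000, 1.0
for s, label in [(1 / N, "weak"), (N ** -0.75, "moderate"), (0.1, "strong")]:
    print(label, moran_fixation(N, s, gamma), "  Haldane 2s/gamma =", 2 * s / gamma)
```

In the moderate regime the output is close to $2 s_N/\gamma$, while in the weak and strong regimes it visibly deviates from it.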
3 Main results
3.1 Duality of Cannings frequency and ancestral selection process
For $N \in \mathbb N$, $W = W^{(N)}$ as in Section 2.1, and $s_N \in (0,1)$, let $(K_g)_{g \ge 0} = (K^{(N)}_g)_{g \ge 0}$ be the Cannings frequency process with parameters $N$, $\mathcal L(W)$ and $s_N$ as defined in Section 2.2, and let $(A_m)_{m \ge 0}$ be the Cannings ancestral selection process with parameters $N$, $\mathcal L(W)$ and $s_N$ as defined in Section 2.3. Theorem 3.1 asserts the sampling duality relation (3.1) between these two processes.

Remark 3.2.
A strong (pathwise) version of the duality relation (3.1) will be provided by formula (4.6) in Section 4, which roughly speaking says that "a sample from generation $g$ is entirely of wildtype if and only if all of its potential ancestors in generation 0 are of wildtype".
Expressed in terms of $K_g$ and $A_g$, the sampling duality relation (3.1) reads as (3.2) (just as in the case of the Moran model, see Section 2.4). In the very same way as we derived (2.5) from (2.3) we obtain (first specializing (3.2) to $k = N-1$ and $n = N$ and then taking the limit $g \to \infty$) the following Corollary 3.3, which represents the fixation probability of the beneficial type as $\frac{1}{N}\,\mathbb E\big[A^{(N)}_{\mathrm{eq}}\big]$, where $A^{(N)}_{\mathrm{eq}}$ has the equilibrium distribution of the CASP. Indeed, with a single beneficial mutant in generation 0, the beneficial type goes to fixation if and only if the beneficial mutant is among the potential ancestors in generation 0 of the population at a late generation $g$. In the limit $g \to \infty$ the number of these potential ancestors is distributed as $A^{(N)}_{\mathrm{eq}}$, and given $A^{(N)}_{\mathrm{eq}}$, the probability that the beneficial mutant is among them is $A^{(N)}_{\mathrm{eq}}/N$.
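The representation of the fixation probability via the CASP equilibrium suggests a direct Monte Carlo check: run the chain from Section 2.3 for a burn-in and average $A_m/N$. This is only an illustrative sketch (Dirichlet weights, arbitrary parameters well outside the asymptotic regime of the theorems), not part of the arguments of the paper.

```python
import numpy as np

def casp_step(a, N, s, rng):
    H = rng.geometric(1 - s, size=a).sum()
    W = rng.dirichlet(np.ones(N))
    return len(np.unique(rng.choice(N, size=H, p=W)))

def fixation_via_casp(N, s, burn=1_000, steps=10_000, seed=0):
    """Estimate E[A_eq]/N, the fixation probability of a single beneficial mutant."""
    rng = np.random.default_rng(seed)
    a, total = 1, 0
    for m in range(burn + steps):
        a = casp_step(a, N, s, rng)
        if m >= burn:
            total += a
    return total / (steps * N)

# For Dirichlet(1,...,1) weights, rho^2 = N^2 E[W_1^2] is approximately 2,
# so Haldane's formula predicts roughly 2*s/rho^2 = s = 0.02.
print(fixation_via_casp(N=1_000, s=0.02))
```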

3.2 Haldane's formula for Cannings models with selection
Let $W = W^{(N)}$ be as in Section 2.1, and assume that for some $\rho^2 \ge 1$ the moment conditions (3.4) and (3.5) hold, with the $O(\cdot)$-terms referring to $N \to \infty$. (The requirement $\rho^2 \ge 1$ is natural because, by Jensen's inequality, $\mathbb E[W_1^2] \ge (\mathbb E[W_1])^2 = N^{-2}$.) We also consider Condition (3.6); note that (3.4) and (3.5) imply (3.6). For Cannings processes admitting a paintbox representation, (3.6) is equivalent to Möhle's condition, see [19], which, in turn, is equivalent to the neutral Cannings coalescent being in the domain of attraction of a Kingman coalescent as $N \to \infty$. We also consider the following condition, which (see Remark 3.6 a)) is not implied by (3.4) and (3.5): there exists a sequence $(h_N)$ obeying the growth condition (3.7) such that for all sufficiently large $N$ and all $n \le 2 h_N$ the moment bound (3.8) holds for some $K \ge 1$ not depending on $N$ and $n$.
Theorem 3.5 asserts Haldane's formula $\pi_N \sim \frac{2 s_N}{\rho^2}$ (3.9), provided one of the following additional requirements a) or b) is satisfied: a) Condition (1.2) holds; b) Conditions (1.3) and (3.8) hold. The proof of part a) will be given in Section 5 and that of part b) in Section 6. Here we give brief sketches of the proofs. Our method of proof of Theorem 3.5 employs Corollary 3.3, for part a) by a coupling of the Cannings ancestral selection process (CASP) $(A_m)$ and the corresponding ancestral selection process of the Moran model (MASP) $(B_r)$, and for part b) via a concentration analysis of the equilibrium distribution of $(A_m)$. In both parts it is essential that $s_N$ is not too large; this is guaranteed by Conditions (1.2) and (1.3), respectively. As described in Section 2.3, the size of $s_N$ governs the size of the upward jumps of $(A_m)$, and it turns out that the random variable $A^{(N)}_{\mathrm{eq}}$ depends on $s_N$ in a stochastically monotone way.
In particular, our analysis of $A^{(N)}_{\mathrm{eq}}$, which makes Corollary 3.3 applicable for the derivation of (3.10), relies on the assumption $s_N \ll N^{-1/2}$.

Part a): Coupling of CASP and MASP. We know from Section 2.4 that the asymptotics (3.9) holds for the fixation probabilities $\pi^M_N$ (starting from a single beneficial mutant) in a sequence of Moran($N$)-models with neutral reproduction rate $\rho^2/2$ (or equivalently with pair coalescence rate $\rho^2/N$) and selection strength $s_N$. We show in Section 5 that, thanks to Conditions (3.4), (3.5) and (1.2), the CASP and the MASP can be coupled closely enough to conclude that $\pi_N = \pi^M_N (1 + o(1))$, which proves Theorem 3.5 a).
The relevance of Condition (1.2) for our proof of part a) of Theorem 3.5 can heuristically be seen as follows: an inspection of the jump probabilities described in Section 2.3 shows that, in a regime of negligible multiple collisions, the one-step transition probabilities of the CASP in a state $a$ agree with those of the MASP up to error terms that are controlled by Condition (1.2).

Part b): Concentration analysis of the CASP equilibrium distribution. To show that the expectation of the CASP in equilibrium is $\frac{2}{\rho^2} N s_N (1 + o(1))$, we show in Lemmata 6.4 and 6.5 that the CASP needs an at most polynomially long time, i.e. a time of order $N^c$ for some $c > 0$, to enter a central region (i.e. an interval of moderately large size around the center of attraction) of the CASP from outside, but (as proved in Proposition 6.3) does not leave this central region up to any polynomially long time with sufficiently high probability. For this purpose we couple $A$ with a random walk that makes jumps only of limited size. To show that large jumps (upwards or downwards) are negligible, we make use of the assumption $s_N \ll N^{-1/2}$. The probability that the CASP in a state not too far from the "center" (that is, in a state of the order of $N s_N$) makes an upward jump of size at least $h_N$ (with $1 \ll h_N \ll N s_N$) can essentially be estimated by the probability that at least $h_N$ individuals each give rise to at least two branches in the branching step (described in Section 2.3). The probability that in the branching step an individual gives rise to at least two branches is $\approx s_N$. Thus the probability that individuals $1, \ldots, h_N$ each generate at least two branches in the branching step is $\approx s_N^{h_N}$. Consequently, the probability of an upward jump of size at least $h_N$ can be estimated from above accordingly. To arrive at a similar upper bound for the probability of large jumps downwards, Condition (3.8) is applied. Higher moments of the weight $W_1$ have to be controlled in this case, since for a downward jump several individuals have to choose the same parent(s); for more details see Lemma 6.2.

b) (Moderately strong selection and Galton-Watson approximations) A regime for which it is possible to derive (3.9) by means of a Galton-Watson approximation is that of moderately strong selection, $N^{-1/2} \ll s_N \ll 1$. A proof of this assertion (under somewhat more restrictive moment conditions on the weights $W$ than those in Theorem 3.5 b)) is the subject of the paper [3]; see also the discussion in the paragraph following (1.1) in the Introduction. Together with the approach of the present paper this does not yet cover the case $s_N \sim N^{-1/2}$; we conjecture that Haldane's formula is valid also for this particular exponent.
Here is a quick argument which explains the relevance of the exponent $1/2$ as a border for the applicability of a Galton-Watson approximation. The beneficial type is with high probability saved from extinction once the number of individuals of the beneficial type exceeds (a quantity of the order of) $s_N^{-1}$. Hence, for a proof via approximations with Galton-Watson processes one wants couplings of the CASP with GW-processes to hold until this number of beneficial individuals is reached. However, a Galton-Watson approximation works only until there is an appreciable amount of "collisions" between the offspring of the beneficial individuals in a branching step, since collisions destroy independence. By well-known "birthday problem" considerations, such an amount of collisions happens as soon as there are (of the order of) $N^{1/2}$ beneficial individuals in the population; a back-of-the-envelope version of this computation is displayed at the end of this remark. Consequently, for the GW-approach we require $s_N^{-1} \ll N^{1/2}$, i.e. $s_N \gg N^{-1/2}$.

c) (Possible generalisations) The introduced duality method for Cannings models with selection may well prove beneficial also in more general settings. The construction of the Cannings ancestral selection graph given in Section 4 can, for example, also be carried out in a many-island situation, with migration between islands in discrete or continuous time. This should then lead to generalizations of Theorems 3.1 and 3.5. Under Assumption (1.2), and with an appropriate scaling of the migration probabilities, one might expect that the Cannings ancestral selection graph is again close to the (now structured) Moran ancestral selection graph.
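To make the birthday-problem heuristic from part b) quantitative (taking the pair coalescence probability per generation to be of order $\rho^2/N$, as above), the expected number of collisions among $k$ beneficial lines in one generation is roughly
$$\binom{k}{2}\,\frac{\rho^2}{N} \;\asymp\; \frac{k^2}{N},$$
which stays bounded precisely as long as $k = O(N^{1/2})$.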
We conclude this section with the following lemma, which provides a class of examples of weights that fulfil Conditions (3.4), (3.5) and (3.8), and whose proof will be given at the end of Section 6.
Lemma 3.7. Consider the following choices of weight distributions $\mathcal L(W^{(N)})$:
a) $W^{(N)}$ is Dirichlet$(\alpha_N, \ldots, \alpha_N)$-distributed, with the sequence $(\alpha_N)$ such that (3.4) holds;
b) $W^{(N)} = \Big(\frac{Y_1}{Y_1 + \cdots + Y_N}, \ldots, \frac{Y_N}{Y_1 + \cdots + Y_N}\Big)$, \quad (3.10)
where $Y_1, \ldots, Y_N$ are independent copies of a non-negative random variable $Y$ for which there exists a $c > 0$ such that i) $\mathbb E\big[e^{cY}\big] < \infty$ and ii) $\mathbb E\big[Y^{-c}\big] < \infty$.
Then, in both cases a) and b), the requirements of Theorem 3.5 b) are satisfied.
In [3], Haldane's formula was proved in the case of moderately strong selection for weights as in Lemma 3.7 b), with the condition $\mathbb E[Y^{-c}] < \infty$ relaxed to $\mathbb P(Y > 0) = 1$. There, weights obeying (3.10) were termed of Dirichlet type. Obviously, the Wright-Fisher model is a special case, with $Y \equiv 1$.

4 The Cannings ancestral selection graph. Proof of Theorem 3.1
We now define the Cannings ancestral selection graph, i.e. the graph of potential ancestors in a Cannings model with directional selection as announced in Section 2.3. The final harvest of this section will be the proof of Theorem 3.1.
While the branching-coalescing structure of the Moran ancestral selection graph and the sampling duality stated in Section 2.4 serve as a conceptual guideline, the ingredients of the graphical construction turn out to be quite different from the Moran case, not least because of the discrete generation scheme. We first describe how, given $W^{(g-1)}$ and the configuration of types of the individuals $(i, g-1)$ in generation $g-1$, the parent (as well as the type) of an individual $(j, g)$ is constructed from a sequence of i.i.d. uniform picks from the unit square. After this we describe how, given $W^{(g-1)}$ (and without prior knowledge of the type configuration in generation $g-1$), the just mentioned i.i.d. uniform picks from the unit square lead to the potential parents of an individual $(j, g)$; the latter form a random subset of $[N]$.

To this purpose, as illustrated in Figure 1, think of the two axes of the unit square as being partitioned into two respectively $N$ subintervals. The two subintervals that partition the horizontal unit interval are $[0, 1-s_N]$ and $(1-s_N, 1]$. The $N$ subintervals of the vertical unit interval have lengths $W_1^{(g-1)}, \ldots, W_N^{(g-1)}$; the $i$-th of them spans a horizontal stripe of the unit square which we regard as belonging to individual $(i, g-1)$. Writing $C^{(g-1)}$ and $B^{(g-1)}$ for the sets of wildtype and beneficial individuals in generation $g-1$, let $\Gamma^{(g-1)}$ be the union of the full stripes belonging to the individuals in $B^{(g-1)}$ and of the parts with horizontal coordinate in $[0, 1-s_N]$ of the stripes belonging to the individuals in $C^{(g-1)}$. \quad (4.1)

Definition 4.1. For fixed $j \in [N]$ and $g \in \mathbb Z$, let $U^{(j,g,1)}, U^{(j,g,2)}, \ldots$ be a sequence of independent uniform picks from $[0,1] \times [0,1]$ and put $\gamma(j, g) := \min\{\ell : U^{(j,g,\ell)} \in \Gamma^{(g-1)}\}$. The parent of $(j,g)$ is the individual $(i, g-1)$ in whose stripe the pick $U^{(j,g,\gamma(j,g))}$ falls.
Decreeing that individual $(j, g)$ inherits the type of its parent, we obtain that a.s.
$$\{(j,g) \text{ is of wildtype}\} = \{U^{(j,g,\gamma(j,g))} \text{ falls into the stripe of an individual in } C^{(g-1)}\}. \qquad (4.2)$$
We thus get the transport of $C^{(g-1)}$ to the next generation $g$ by putting $C^{(g)} := \{j \in [N] : (j,g) \text{ is of wildtype}\}$.

Remark 4.2. The process $(|C^{(g)}|)_{g=0,1,\ldots}$ is a Cannings frequency process with parameters $N$, $\mathcal L(W)$ and $s_N$, as defined in Section 2.2. Indeed, given $C^{(g-1)}$ and $W^{(g-1)}$, the random variables $U^{(j,g,\gamma(j,g))}$, $j = 1, \ldots, N$, are independent and uniformly distributed on $\Gamma^{(g-1)}$; hence (4.2) and the exchangeability of the components of $W^{(g-1)}$ imply that given $\{|C^{(g-1)}| = k\}$ (and with an arbitrary allocation of these $k$ elements in the set $[N]$), the random variable $|C^{(g)}|$ has a mixed Binomial distribution with parameters $N$ and $P(k, W)$ as specified by (2.2).
Let us now turn to a situation in which the type configuration of the previous generation is not given, i.e. in which the sets $B^{(g-1)}$ and $C^{(g-1)}$, and hence also the set $\Gamma^{(g-1)}$, are not known a priori.
Figure 1: Illustration of a case in which $C^{(g-1)} = \{1, \ldots, k\}$, $B^{(g-1)} = \{k+1, \ldots, N\}$, $\gamma(j, g) = 2$ and $G(j, g) = 4$. Since in this example $\gamma(j,g)$ is strictly smaller than $G(j,g)$, the individual $(j, g)$ must be of beneficial type.

Definition 4.3. i) For fixed $j \in [N]$ and $g \in \mathbb Z$, let $U^{(j,g,1)}, U^{(j,g,2)}, \ldots$ be as in Definition 4.1 and define
$$G(j, g) := \min\{\ell : U^{(j,g,\ell)} \in [0, 1-s_N] \times [0,1]\}. \qquad (4.3)$$
The potential parents of $(j,g)$ are the individuals $(i, g-1)$ to whose stripes the picks $U^{(j,g,1)}, \ldots, U^{(j,g,G(j,g))}$ belong.
ii) An individual in generation $g-2$ is a potential ancestor of $(j,g)$ if it is a potential parent of a potential parent of $(j, g)$. By iteration this extends to the definition of the set $A^{(j,g)}_m$ of potential ancestors of $(j, g)$ in generation $g-m$, $m \ge 1$, with $A^{(j,g)}_0 := \{(j,g)\}$.

The a.s. equality of events asserted in the following lemma is both crucial and elementary.
Lemma 4.4. For $j$, $g$, $U^{(j,g,1)}, U^{(j,g,2)}, \ldots$, $\gamma(j, g)$ as in Definition 4.1 and $G(j, g)$ as in (4.3), one has almost surely
$$\{\text{the parent of } (j,g) \text{ is of wildtype}\} = \{\text{all potential parents of } (j,g) \text{ are of wildtype}\}. \qquad (4.4)$$
Proof. To see that the l.h.s. almost surely implies the r.h.s., consider the first pick that falls into the area $\Gamma^{(g-1)}$ and assume that it lands in a horizontal stripe belonging to a wildtype individual in generation $g-1$. Then this must also be the first one of the picks that lands in $[0, 1-s_N] \times [0,1]$, i.e. $\gamma(j,g) = G(j,g)$; moreover, the earlier picks, not being in $\Gamma^{(g-1)}$, necessarily lie in stripes of wildtype individuals as well, so all potential parents of $(j,g)$ are of wildtype. Conversely, if all the picks up to and including the $G(j,g)$-th one have landed in horizontal stripes belonging to wildtype individuals in generation $g-1$, then also the first pick that falls into the area $\Gamma^{(g-1)}$ must land in a horizontal stripe belonging to a wildtype individual in generation $g-1$.
Combining (4.2) and (4.4) with Definition 4.3 we see that for all $g \in \mathbb N$ and all $J \subset [N]$, almost surely,
$$\{\text{all individuals } (j,g),\ j \in J, \text{ are of wildtype}\} = \{\text{all potential parents of the individuals } (j,g),\ j \in J, \text{ are of wildtype}\}. \qquad (4.5)$$
Iterating (4.5) we arrive at, almost surely,
$$\{\text{all individuals } (j,g),\ j \in J, \text{ are of wildtype}\} = \{\text{all potential ancestors in generation } 0 \text{ of the individuals } (j,g),\ j \in J, \text{ are of wildtype}\}. \qquad (4.6)$$
Remark 4.5. It is obvious that the random variables $G(j, g)$ defined in (4.3) are independent of the $W^{(g')}$, $g' \in \mathbb Z$, and have the property
$$G(j,g) \sim \mathrm{Geom}(1-s_N). \qquad (4.7)$$
This leads directly to the following observation on the number of potential ancestors: the number of potential ancestors in generation $g-m$ of a sample of $n$ individuals from generation $g$ evolves (in $m$) as a Cannings ancestral selection process as defined in Section 2.3, started in $n$.
The duality asserted in Theorem 3.1 now follows from a chain of three equalities, where the first equality follows from Remark 4.2, the second one from (4.6) and the third one from Remark 4.5.
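A small sketch of the graphical construction just described (one individual, one generation step; Dirichlet weights and the function name are illustrative only): uniform picks from the unit square are drawn until the first one with horizontal coordinate at most $1-s_N$, the stripes of all picks up to that one give the potential parents, and their number is Geom$(1-s_N)$-distributed as in (4.7).

```python
import numpy as np

def potential_parents(N, s, W, rng):
    """Potential parents of one individual: stripe indices of all uniform picks
    up to and including the first one with horizontal coordinate <= 1-s."""
    cum = np.cumsum(W)
    parents = []
    while True:
        x, y = rng.random(), rng.random()
        parents.append(min(int(np.searchsorted(cum, y)), N - 1))  # stripe containing y
        if x <= 1 - s:                     # pick accepted regardless of types
            return parents                 # len(parents) has the Geom(1-s) law of (4.7)

rng = np.random.default_rng(4)
N, s = 50, 0.2
W = rng.dirichlet(np.ones(N))
print(potential_parents(N, s, W, rng))
```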
Another consequence of (4.6) together with Remark 4.5 is the following moment duality, which is interesting in its own right, not least because this was the route through which [11] discovered the "discrete ancestral selection graph" in the "quasi Wright-Fisher case", i.e. for $\mathbb P(W_1 = \cdots = W_N) \to 1$.

Corollary 4.6. Let $(K_g)$ and $(A_m)$ be as in Section 3.1, let $k, n \in [N]$ and assume that the number of wildtype individuals in generation 0 is $k$. Then the probability that a sample of $n$ individuals, drawn with replacement in generation $g \ge 1$, consists of wildtype individuals only is $\mathbb E\big[\big(\tfrac kN\big)^{A_g} \,\big|\, A_0 = n\big]$.

5 Coupling of the Cannings and Moran ancestral selection processes. Proof of Theorem 3.5a
In this section we provide a few lemmata preparing the proof of part a) of Theorem 3.5, and conclude with the proof of that part. In particular, in Lemma 5.9 we give a coupling of the Cannings ancestral selection process (for short CASP) $(A_m)_{m \ge 0}$ defined in Section 2.3 and the Moran ancestral selection process (for short MASP) $(B_r)_{r \ge 0}$ whose jump rates we recalled in Section 2.4. Assume throughout that the $\Delta_N$-valued random weights $W^{(N)}$ fulfil the Assumptions (3.4) and (3.5) required in the first part of Theorem 3.5. Let $(s_N)_{N \ge 0}$ be a sequence in $(0,1)$ obeying (1.2). Frequently, we will switch to the notation $b = b_N$ defined by $s_N = N^{-b_N}$. \quad (5.1)
For fixed $N$ and $j \in [N]$ let $G^{(j)}$ be independent and Geom$(1-s_N)$-distributed; these will play the role of the random variables $G(j, g)$ defined in (4.3), see also (4.7). (Here and whenever there is no danger of confusion, we will suppress the superscripts $N$ and $g$.) Lemma 5.1 below collects the one-step transition probabilities (5.2)-(5.5) of the CASP, in which the first term on the r.h.s. is the pair coalescence probability of the neutral Cannings coalescent with the paintbox $W$.
Proof of Lemma 5.1. Recall that each transition of the CASP consists of a branching and a coalescence step. To arrive at the transition probabilities (5.2) -(5.5) we first estimate the probabilities that k individuals give rise to a total of k, k + 1 or more than k + 1 branches and then analyse the probabilities that a single individual is chosen multiple times as a parent.
Since each individual has a Geom$(1-s_N)$-distributed number of branches, the probability that $k$ individuals give rise to a total of $k$ branches in the branching step is
$$(1-s_N)^k \qquad (5.6)$$
and the probability that the individuals give rise to $k+1$ branches is
$$k\, s_N (1-s_N)^k. \qquad (5.7)$$
Adding the probabilities in (5.6) and (5.7) yields a lower bound on the probability that at most $k+1$ branches are generated. Let us now calculate the probabilities of collisions in a coalescence step, that is, the probability that an individual is chosen as a potential parent more than once. For two branches the pair coalescence probability $c_N$ is given by
$$c_N = \mathbb E\Big[\sum_{i=1}^N W_i^2\Big] = N\, \mathbb E[W_1^2], \qquad (5.8)$$
and in the same manner we obtain the probability of a triple collision as
$$\mathbb E\Big[\sum_{i=1}^N W_i^3\Big] = N\, \mathbb E[W_1^3]. \qquad (5.9)$$
Using (5.8) and (5.9) we control the probability of the event $E$ that there are two or more collisions, with $k$ individuals before the coalescence step. There are two possibilities for this event to occur: either there is at least a triple collision, or there are at least two pair collisions. This yields (5.10). In order to estimate the probability of having exactly one collision we use the second moment method for the random variable $X = \sum_{i=1}^k \sum_{j > i} X_{i,j}$, where $X_{i,j} = \mathbf 1_{\{i \text{ and } j \text{ collide}\}}$. The first and second moment of $X$ can be computed from (5.8) and (5.9); this together with (5.11) yields (5.12), where the first inequality follows by applying the Cauchy-Schwarz inequality to $X$ and $\mathbf 1_{\{X > 0\}}$.
Together with (5.10) we obtain a bound for the random variable $X$ which counts the number of collisions (for $k$ individuals before the coalescence step). Let $H := \sum_{j=1}^{A_m} G^{(j)}$. Then the above calculations allow us to obtain (5.2); the remaining transition probabilities (5.3)-(5.5) are derived analogously.
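The closed forms (5.6) and (5.7) used at the start of this proof are easy to sanity-check by simulation; the parameters below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(5)
k, s, trials = 20, 0.01, 200_000
tot = rng.geometric(1 - s, size=(trials, k)).sum(axis=1)    # total branches of k individuals
print((tot == k).mean(),     (1 - s) ** k)                  # empirical vs (5.6)
print((tot == k + 1).mean(), k * s * (1 - s) ** k)          # empirical vs (5.7)
```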
Lemma 5.4 claims that the CASP comes down from $N$ to (the still large state) $N^{1-b+\varepsilon}$ within a time interval of length $o(N^b)$ with a probability that converges quickly to 1 as $N \to \infty$. The proof relies on the following Lemma 5.3. Therein, we show that with respect to coalescence the ancestral selection process of the Wright-Fisher model is extremal among the CASPs, in the sense that for this process the number $C_k$ of distinct occupied boxes after the coalescing half-step is stochastically the largest.

Lemma 5.3. Let $(w_1, \ldots, w_N) \in \Delta_N$ and let $Z_1, Z_2, \ldots$ be independent $[N]$-valued random variables with $\mathbb P(Z_\ell = i) = w_i$. Then for each $k \in \mathbb N$ the random variable $C_k := |\{Z_1, \ldots, Z_k\}|$ is stochastically largest for $w_1 = \cdots = w_N = \frac 1N$.
Proof. We interpret the random variables $C_k$ in terms of the "coupon collector's problem". For $\ell \in [N]$ let $T_\ell := \min\{k : C_k \ge \ell\}$. Then we have the obvious identity $\{T_\ell \le k\} = \{C_k \ge \ell\}$, and [1, Theorem 2] states that $\mathbb P(T_\ell \le k)$ is largest for $w_1 = \cdots = w_N = \frac 1N$.
The quantities b, Am, τ appearing in the next lemma all depend on N ; we will suppress this dependence in the notation.
where for (5.13) we use the probability generating function of the negative binomial distribution and for (5.14) we use an estimate for the remainder of the corresponding Taylor expansion. Let $0 < \varepsilon' < \varepsilon$. From (5.15) it follows that, for any $m \ge c_1 N^{b-\varepsilon'} \ln N$ with some appropriate constant $c_1 > 0$, we have the estimate $\mathbb E[A_m \mid A_0 = N] \le N^{1-b+\varepsilon'}$. By Markov's inequality we obtain $\mathbb P(A_m \ge N^{1-b+\varepsilon} \mid A_0 = N) \le N^{\varepsilon'-\varepsilon}$ as $N \to \infty$. If $(A_m)_{m \ge 0}$ did not reach $N^{1-b+\varepsilon}$ after $c_1 N^{b-\varepsilon'} \ln N$ steps, we can start the process in $N$ again, wait another $c_1 N^{b-\varepsilon'} \ln N$ steps, and check whether the process did reach the level $N^{1-b+\varepsilon}$. Using this argument $N^{\delta_1}$ times yields, for any $0 < \delta_1 < \varepsilon'$, the upper bound $(N^{\varepsilon'-\varepsilon})^{N^{\delta_1}}$ for the probability to stay above $N^{1-b+\varepsilon}$ for all generations $m \le c_1 N^{b-\varepsilon'+\delta_1} \ln N$. Since $\varepsilon > \varepsilon'$ and $N^{b-\varepsilon'+\delta_1} < N^b$, we have $(N^{\varepsilon'-\varepsilon})^{N^{\delta_1}} = O(\exp(-N^\delta))$ for some appropriate $\delta > 0$, from which the assertion follows.
From Lemma 5.4 we obtain the following corollary.

Corollary 5.5. Let $(A_m)_{m \ge 0}$ be a CASP. Then for any $m_0 \ge 0$ there exists a $C > 0$ such that for all $N \ge 1$ and all $j \ge N^{1-b+\varepsilon}$ the two estimates of the corollary hold.

Proof. For simplicity assume that $m_0 = 0$; the same proof works for any $m_0 \in \mathbb N$, with $\delta$ as in Lemma 5.4. By Lemma 5.1 we can compare the jump probabilities and obtain that there exists some $x_0 \le N^{1-b+\varepsilon/2}$ such that above $x_0$ the upward drift is smaller than the downward drift. This yields that the process stopped in $x_0$ is a supermartingale. Consequently, since $x_0 < N^{1-b+\varepsilon}$, we have for any $m \in \mathbb N$, by the strong Markov property, a corresponding bound on the expectation; hence by Markov's inequality we obtain the first part. The second part follows by a similar observation.

The following three lemmata provide some properties of the Moran process and of the coupling of a Moran process to a Moran process in stationarity. For the remainder of this section we will fix three constants $\delta_1 \in (0,1)$, $\delta_2 > 0$ and $0 < \delta_3 < \delta_2/2$. \quad (5.16)
The role of $\delta_1$ will be to specify a region $\big[\frac{2s_N}{2s_N+\rho^2}N(1-\delta_1),\ \frac{2s_N}{2s_N+\rho^2}N(1+\delta_1)\big]$ around the MASP's center of attraction. The constant $\delta_2$ will appear in factors $N^{\delta_2}$ that stretch some time intervals, and the constant $\delta_3$ will be an exponent in small probabilities $O(\exp(-N^{\delta_3}))$.
Proof. We proceed in a similar manner as [21] and separate the proof into two cases, i) and ii). For case i) the proof relies on a stochastic domination of the MASP by a birth-death process, while for case ii) we construct a pure birth process that is stochastically dominated by the MASP.

We start by proving case i). Assume the most extremal starting point $B_0 = N$. We couple the process $(B_r)_{r \ge 0}$ with a birth-death process $(\bar B_r)_{r \ge 0}$ which stochastically dominates $(B_r)_{r \ge 0}$ until $(B_r)_{r \ge 0}$ crosses the level $\frac{2s_N}{2s_N+\rho^2}N(1+\delta_1)$. Here $(\bar B_r)_{r \ge 0}$ is defined as a Markov process with state space $\mathbb N_0$ and suitable birth rates $\beta_k$ and death rates $\alpha_k$. Note that $\beta_k \ge s_N k (N-k)/N$ and $\alpha_k \le \binom{k}{2}\frac{\rho^2}{N}$ for any $k \ge \frac{2s_N}{2s_N+\rho^2}N(1+\delta_1)$. Hence, we can couple $(B_r)_{r \ge 0}$ and $(\bar B_r)_{r \ge 0}$ such that $B_r \le \bar B_r$ a.s. as long as $B_r \ge \frac{2s_N}{2s_N+\rho^2}N(1+\delta_1)$. In particular, this domination applies when we set $\bar\tau_0 := \inf\{r \ge 0 : \bar B_r = 0\}$ and consider $k \ge \frac{2s_N}{2s_N+\rho^2}N(1+\delta_1)$. For the birth-death process $\bar B_r$ we can estimate $\bar\tau_0$ by a classical first step analysis. Observing that the difference $\alpha - \beta$ of the individual death and birth rates is positive, the assertion follows for any $\delta_3 < \delta_2$. This proves part i).
For case ii), observe that $\beta_k \le s_N k (N-k)/N$ and $\alpha_k \ge \binom{k}{2}\frac{\rho^2}{N}$ below the level $\frac{2}{\rho^2} s_N N (1-\delta_1)$, so that the corresponding birth-death process $(\underline B_r)_{r \ge 0}$ is stochastically dominated by the MASP there. The extinction probability $\xi_0$ of $(\underline B_r)_{r \ge 0}$ is the smallest solution of the associated quadratic equation, that is, $\xi_0 = \frac{\alpha}{\beta} < 1$. Let $(B^I_r)_{r \ge 0}$ be the pure birth process consisting of the immortal lines of $(\underline B_r)_{r \ge 0}$, i.e. each line branches at rate $(1-\xi_0)\beta$. Let $\tau = \inf\{r \ge 0 : B_r \ge \frac{2}{\rho^2} s_N N (1-\delta_1)\}$ be the time when $(B_r)_{r \ge 0}$ reaches the level $\frac{2}{\rho^2} s_N N (1-\delta_1)$, and define $\tau^I$ and $\underline\tau$ in the same way for the processes $(B^I_r)_{r \ge 0}$ and $(\underline B_r)_{r \ge 0}$, respectively, in place of $(B_r)_{r \ge 0}$; then $\tau^I \ge \underline\tau \ge \tau$ a.s. In order to prove ii) it remains to show $\mathbb P(\tau^I \ge N^{b+\delta_2}) = O(\exp(-N^{\delta_3}))$ for $\delta_3 > 0$. This we can estimate for $\delta_2 > 0$ by separating the time interval of length $N^{b+\delta_2}$ into $N^{\delta_2/2}$ time intervals of length $N^{b+\delta_2/2}$ and realizing that if $(B^I_r)_{r \ge 0}$ did not reach the level $\frac{2}{\rho^2} s_N N (1-\delta_1)$ in a time interval of length $N^{b+\delta_2/2}$, then in the worst case $(B^I_r)_{r \ge 0}$ is $1$ at the start of each time interval.
Lemma 5.7 (MASP's leaving time of the central region). Let $(B_r)_{r \ge 0}$ be a MASP started in $x \in \big[\frac{2s_N}{2s_N+\rho^2}N(1-\delta_1), \frac{2s_N}{2s_N+\rho^2}N(1+\delta_1)\big]$ and assume in addition to (5.16) that $0 < \delta_1 < \frac12$ and $0 < \delta_2 < \frac{\eta}{3}$. Let $S$ be the first time at which $(B_r)_{r \ge 0}$ leaves the enlarged central region; then (5.20) holds. To prove (5.20) we couple $(B_r)_{r \ge 0}$ with a symmetric (discrete time) random walk $(S_n)_{n \ge 0}$, and thus ignore the drift towards $\frac{2s_N}{2s_N+\rho^2}N$. An application of Theorem 5.1 iii) of [13] yields that $(B_r)_{r \ge 0}$ makes at most $N^{1-b+2\delta_2}$ many jumps in a time interval of length $N^{b+\delta_2}$ with probability $1 - O(\exp(-N^{1-b+2\delta_2}))$; see also the estimate (5.24) in Lemma 5.9 below, where we analyse the jumps and jump times of the MASP in more detail. Hence the claim follows for some appropriate $c > 0$ independent of $N$. To obtain equation (5.21) and inequality (5.22) we use the reflection principle and Hoeffding's inequality. This finishes the proof.

Proof of Lemma 5.8. We follow a similar strategy as the one used in the proof of Lemma 2.10 in [21]. Let $(B^{eq}_r)_{r \ge 0}$ be a MASP started in the stationary distribution. Assume that in the graphical representation at time 0 either the lines of $B_0$ are contained in $B^{eq}_0$ or vice versa. Then $B_r \le B^{eq}_r$ for all $r \ge 0$, or vice versa $B^{eq}_r \le B_r$. The claimed bound of order $O(\exp(-N^{\delta_3}))$ follows once we show that at time $N^{b+\delta_2}$ both processes are equal with sufficiently high probability. The tuple $(B^{eq}_r, B_r)_{r \ge 0}$, and the tuple $(B_r, B^{eq}_r)_{r \ge 0}$ respectively, is a Markov jump process on $\{(k, \ell) : 1 \le k \le \ell \le N\}$ whose transition rates we use below.

We begin with Case i). Consider the process $(Z_r)_{r \ge 0}$ defined as $Z_r := B_r - B^{eq}_r$ and condition on the two events that the process $B^{eq}_0$ is started in a state in $\big[\frac{2s_N}{2s_N+\rho^2}N(1-\delta_1), \frac{2s_N}{2s_N+\rho^2}N(1+\delta_1)\big]$ and stays in $\big[\frac{2s_N}{2s_N+\rho^2}N(1-2\delta_1), \frac{2s_N}{2s_N+\rho^2}N(1+2\delta_1)\big]$ for some $0 < \delta_1 < \frac12$. The probability of each event can be estimated by $1 - O(\exp(-N^{\delta_2}))$, the former event by Hoeffding's inequality and the latter with Lemma 5.7. The process $(Z_r)_{r \ge 0}$ jumps from $z$ to $z+1$ at most at rate $s_N z$, and under the above conditioning $(Z_r)_{r \ge 0}$ jumps from $z$ to $z-1$ at least at rate $\rho^2 \frac{2s_N}{2s_N+\rho^2}(1-2\delta_1) z$: if $(Z_r, B_r, B^{eq}_r) = (z, \ell, k)$, jumps to $(z-1, \ell-1, k)$ occur at rate $\frac{\rho^2}{N}\big(\binom{z}{2} + zk\big)$ and jumps to $(z-1, \ell, k+1)$ at rate $k s_N \frac{\ell-k}{N}$. Therefore, the process $(Z_r)_{r \ge 0}$ jumps from $z$ to $z-1$ at rate $r_{z,z-1} = \frac{\rho^2}{N}\big(\binom{z}{2} + zk\big) + k s_N \frac{z}{N}$. Due to the conditioning and the assumption that $\ell \ge k \ge \frac{2s_N}{2s_N+\rho^2}N(1-2\delta_1)$ we can bound this rate from below as claimed. Hence, we can couple $(Z_r)_{r \ge 0}$ with a birth-death process $(Z'_r)_{r \ge 0}$ with individual birth rate $s_N =: \beta'$ and individual death rate $\rho^2 \frac{2s_N}{2s_N+\rho^2}(1-2\delta_1) =: \alpha'$, such that $Z_r \le Z'_r$ a.s. Let $\xi := \inf\{r \ge 0 : Z_r = 0\}$ and $\xi' := \inf\{r \ge 0 : Z'_r = 0\}$. Obviously it holds that $\mathbb P(\xi \ge r) \le \mathbb P(\xi' \ge r)$ for all $r \ge 0$. As in the proof of Lemma 5.6 we estimate the extinction time of $(Z'_r)_{r \ge 0}$. Since $Z'_0 \le N$, the probability that all lines go extinct before time $N^{b+\delta_2}$ can be estimated accordingly, which proves Lemma 5.8 in Case i).
In Case ii) we first wait until $(B_r)_{r \ge 0}$ reaches the level $\frac{2}{\rho^2} s_N N (1-\delta_1)$, which happens within a time interval of length $O(N^{b+\delta_2})$ with probability $1 - O(\exp(-N^{\delta_3}))$ due to Lemma 5.6, and we assume that $B^{eq}_0$ is started in a state of at least $\frac{2}{\rho^2} s_N N (1-\delta_1)$, which happens with probability $1 - O(\exp(-\delta_1^2 N))$ due to Hoeffding's inequality. Then, due to Lemma 5.7, both processes remain bounded from below by $\frac{2}{\rho^2} s_N N (1-2\delta_1)$. When $(B_r)_{r \ge 0}$ has reached at least the level $\frac{2}{\rho^2} s_N N (1-\delta_1)$, consider $Z_r = B_r - B^{eq}_r$. Then the same arguments as in Case i) show the claim.
As mentioned in the sketch of proof of Theorem 3.5 in Section 3, we aim to couple the CASP with the MASP. We have seen in the calculations before that in the regime where the number of potential ancestors is at most of order $N^{1-b+\varepsilon}$, for $\varepsilon$ sufficiently small, the transition probabilities of these two processes are essentially the same over a time interval of length of order $O(N^{b+\varepsilon})$. In particular, in a time interval of length $O(N^{b+\varepsilon})$ we can exclude jumps of size 2 or bigger in the CASP with probability $1 - O(N^{-\delta})$.

Proof. Let $A_0 = B_0 = k_0 \le N^{1-b+\varepsilon}$. We will show that the CASP and the MASP can be coupled such that the jump times of the CASP and the MASP occur consecutively with probability $1 - O(N^{-\delta})$. Since the transition probabilities of the CASP and the MASP are essentially the same, we can then also couple the jump directions with high probability. To show that the jump times occur consecutively we first establish a claim which holds with probability $1 - O(\exp(-N^{\delta_3}))$. Denote by $r_{k,k+1}$ and $r_{k,k-1}$ the jump rates of the MASP from $k$ to $k+1$ and from $k$ to $k-1$, respectively, with $\gamma = \rho^2$. Define $r_k = r_{k,k+1} + r_{k,k-1}$, the total jump rate, and let $r^*$ be the maximal jump rate. We aim for the coupling to hold for an interval of length $N^{b+\varepsilon}$. The jump times of $(B_r)_{r \ge 0}$ are exponentially distributed with a parameter bounded from above by $r^*$. To estimate the number of jumps falling into an interval of length $N^{b+\varepsilon}$ we use Theorem 5.1 iii) in [13]. Let $(X_i)_{i \ge 1}$ be a family of independent Exp$(r^*)$-distributed random variables. For $c = 1 - b + 4\varepsilon$, Theorem 5.1 iii) yields that the number of jumps is bounded by $N^{1-b+4\varepsilon}$ with high probability. Let $T^A_i$ be the $i$-th jump time of the CASP with the convention $T^A_{-1} = 0$, and let $T^B_i$ be the $i$-th jump time of the MASP, again with the convention that $T^B_{-1} = 0$. The transition probabilities of the CASP differ from those of the MASP by the error terms $e_{k,N}$ and $f_{k,N}$ from (5.3) and (5.4). Note that $e_{k,N}, f_{k,N} \ge 0$ because the CASP can make jumps of size 2 or larger. Set $d_{k,N} = e_{k,N} + f_{k,N}$. We show that we can couple the times $T^A_i$ and $T^B_i$ such that $T^B_{i+1} < T^A_i$ for $i = 1, \ldots, N^{1-b+4\varepsilon}$ with probability $1 - O(N^{-\delta})$; from this, the assertion (5.23) of the lemma follows by coupling the jump directions. We couple the jump times $T^A_i$ and $T^B_i$ such that (5.25) holds for all $i \in \{1, \ldots, N^{1-b+3\varepsilon}\}$, from which the claim follows. We explicitly construct the coupling for $i = 1$; the same works for any $i \in \{1, \ldots, N^{1-b+4\varepsilon}\}$. To show (5.25) observe that, if $A_0 = k = B_0$, we can couple $T^A_1$ and $T^B_1$ almost surely. The coupling then holds due to a) on the relevant event. We can upper bound the probability in (5.26) if we assume $T^B_2 - T^B_1 \sim \mathrm{Exp}(r_{k+1})$; thus we obtain the required bound for $E_2 \sim \mathrm{Exp}(r_{k+1})$, which proves (5.25). Together with Claim 1 this proves the assertion of the lemma.
We are now able to complete the proof of Theorem 3.5 a). With the above couplings and a suitable $\delta > 0$, this yields (5.28). We analyse the two expectations in (5.28) separately; the first one will give us the desired Haldane formula, whereas the second is an error term of order $o(s_N)$. By Lemma 5.8 we get that, with $B^{(N)}_{\mathrm{eq}} \stackrel{d}{=} \mathrm{Bin}\big(N, \frac{2s_N}{2s_N+\rho^2}\big)$ conditioned to be strictly positive, relation (5.29) holds up to an error of order $o(s_N)$.
It remains to bound the second expectation on the r.h.s. of (5.28), with the worst case being $A_{N^{b+\varepsilon}} = N$. Then using the second part of Corollary 5.5 gives the required bound, since $\varepsilon > 0$ can be chosen small enough such that $\delta > \varepsilon$. This finishes the proof of Theorem 3.5 a).
This leads to Corollary 5.10, a distributional limit for $A^{(N)}_{\mathrm{eq}}$ in which $Z$ denotes a standard normal random variable.
Remark 5.11. Using the technique of [17] it is not difficult to show that A (N ) converges in distribution as N → ∞ uniformly on compact time intervals to the solution of a dynamical system whose stable fixed point is 1. One might then also ask about the asymptotic fluctuations of the process A (N ) . Although available results in the literature (like [18,Theorem 8.2] or [7, Theorem 11.3.2]) do not directly cover our situation (because e.g. of boundedness assumptions required there), the coupling between A (N ) and B (N ) analysed above is a promising tool to obtain weak convergence of properly rescaled ancestral processes A (N ) to an Ornstein-Uhlenbeck process, which in view of Corollary 5.10 should include also time infinity. Let us mention in this context [5], which contains a fluctuation result (including time infinity) for the Moran frequency process under strong selection and two-way mutation.
6 A concentration result for the equilibrium distribution of the CASP. Proof of Theorem 3.5b

In this section we prove the concentration of the CASP equilibrium distribution around its center; the proof of Theorem 3.5 b is then immediate from Corollary 3.3. Let us describe here the strategy of our proof. We will show that the distribution of $A^{(N)}_{\mathrm{eq}}$ is sufficiently concentrated around the "center" $\frac{2}{\rho^2} s_N N$ as $N \to \infty$. Throughout, we will fix a sequence $(h_N)$ obeying (3.7) such that (3.8) is satisfied. As in the previous section we will switch to $b_N$ defined by (5.1). The Assumption (1.3), which is now the standing one, thus translates into a corresponding condition on $b_N$. Frequently we will suppress the subscript $N$ in $b_N$, thus denoting the sequence $s_N N$ simply by $N^{1-b}$. We will show in the subsequent lemmata that the CASP $A^{(N)}$ needs only a relatively short time to enter a small box around $\frac{2}{\rho^2} s_N N$, compared to the time it spends in this box. The former assertion is provided by Lemmata 6.4 and 6.5. The behaviour of $A^{(N)}$ near the center is controlled by Proposition 6.3. This is prepared by Lemmata 6.1 and 6.2, which bound the probability of jumps of absolute size larger than $h_N$ near the center. The estimates achieved in the lemmata allow us to bound the process $A := A^{(N)}$ from above and below by processes $A^u$ and $A^\ell$ on an event of high probability. The process $A^u$ moves only in the box $I^u = [n^{(\gamma)}, n^{(\alpha)}]$ which is close to the center (i.e. $n^{(\gamma)} - \frac{2}{\rho^2}N^{1-b}$ and $n^{(\alpha)} - \frac{2}{\rho^2}N^{1-b}$ are small compared to $N^{1-b}$). All (upward or downward) jumps of $A$ of size $2, \ldots, h_N$ are replaced in $A^u$ by an upward jump of size $h_N$; furthermore $A^u$ is reset to its starting value $n^{(\beta)}$ near the lower boundary of the box $I^u$, see also Figures 2 and 3 for illustrations. The precise definitions of $A^u$ and $A^\ell$ are given in the proof of Proposition 6.3.
The following lemma controls the probability of large upward jumps of A near the center, using the construction of the branching step of the CASP described in Section 2.3.
Figure 2: A sketch of the transition dynamics of the process $A^u$, which bounds the CASP stochastically from above with high probability. Mainly $A^u$ makes jumps only of size $\pm1$; occasionally it jumps upwards by $h_N$, and whenever it reaches $\tilde n^{(\gamma)}$, it is reset to its starting point $n^{(\beta)}$. The precise definitions of the quantities $n^{(\beta)}$ and $\tilde n^{(\gamma)}$ as well as of the process $A^u$ are given in the proof of Proposition 6.3.

Lemma 6.1 (Probability for large jumps upwards). Let $k = \kappa N^{1-b}$ for some $\kappa > 0$; then the probability that the CASP makes an upward jump of size at least $h_N$ out of the state $k$ decays as estimated in the proof below.

Proof. We want to estimate $\mathbb P\big(\sum_{i=1}^k G^{(i)} \ge k + h_N\big)$ for independent Geom$(p)$-distributed random variables $G^{(i)}$, $i \ge 1$, with $p = 1 - s_N$. With $k' = k + h_N$, $S_{k'}$ a Bin$(k', p)$-distributed random variable, and $a = k/k'$, we can estimate this from above by $\mathbb P(S_{k'} \le k) = \mathbb P(S_{k'} \le a k')$, since the probability that at least $k'$ trials are necessary for $k$ successes can be estimated from above by the probability to have at most $k$ successes in $k'$ trials. Using the Chernoff bound for binomials we can estimate
$$\mathbb P(S_{k'} \le a k') \le \exp(-k' I(a)), \qquad (6.2)$$
with rate function $I(a) = a \ln\frac{a}{p} + (1-a)\ln\frac{1-a}{1-p}$. Inserting our parameters, the dominating term is $\frac{1}{\kappa} N^{b-1} h_N \ln(N^{2b-1})$, and plugging this back into (6.2) one obtains the assertion.

Next, we set out to bound the probability of downward jumps of size at least $h_N$ near the center. In view of the construction of the coalescence step described in Section 2.3 this is settled by the following lemma.
Proof. We will suppress the superscript $(N)$ and write $k := k^{(N)}$, $W := W^{(N)}$. For $h := h_N$ let $p_h$ be the probability of the event that no more than $k - h$ boxes are occupied. This is equal to the probability of the event that at least $h$ collisions occur, where we think of the balls with numbers $1, \ldots, k$ being subsequently sorted into the boxes and say that the ball with number $\nu$ produces a collision if it lands in an already occupied box. In the following we record the occupation numbers of (only) those boxes that receive more than one ball. These are of the form $\beta = (\beta_1, \ldots, \beta_\ell) \in \{2, \ldots, h+1\}^\ell$ with $\ell \in \{1, \ldots, h\}$ and $\beta_1 + \cdots + \beta_\ell - \ell = h$. For a given $\beta$ of this form and $\ell$ given boxes, with $|\beta| := \beta_1 + \cdots + \beta_\ell$, assume that $\beta_1$ balls are sorted into the first box, $\beta_2$ balls into the second box, etc., and the remaining $k - |\beta|$ balls are sorted into arbitrary boxes (so that, as required, the number of occupied boxes is at most $\ell + k - |\beta| = k - h$). Given the weights $W_1, \ldots, W_N$, the probability to sort the first $\beta_1$ balls into box 1, the following $\beta_2$ balls into box 2, \ldots, and finally $\beta_\ell$ balls into box $\ell$ is $W_1^{\beta_1}\cdots W_\ell^{\beta_\ell}$. There are $\frac{N!}{(N-\ell)!}$ many possibilities to choose $\ell$ different boxes out of $N$. Furthermore, there are $\binom{k}{\beta_1, \ldots, \beta_\ell, k-|\beta|}$ many possibilities to choose $|\beta|$ many balls out of $k$ balls and sort these balls into $\ell$ boxes such that $\beta_i$ balls are sorted into box $i$. Hence, due to the exchangeability of the weights $W_1, \ldots, W_N$ we obtain the bound (6.3). To obtain an upper bound on the r.h.s. of (6.3) we estimate the moments $\mathbb E\big[W_1^{\beta_1}\cdots W_\ell^{\beta_\ell}\big]$. Since $(W_1, W_2, \ldots, W_N)$ are negatively associated [14], we can use property 2 in [14] of negatively associated random variables, which yields $\mathbb E\big[W_1^{\beta_1}\cdots W_\ell^{\beta_\ell}\big] \le \mathbb E\big[W_1^{\beta_1}\big]\cdots\mathbb E\big[W_\ell^{\beta_\ell}\big]$. Applying Jensen's inequality we can, for $\beta_1 \ge \beta_2$, estimate this product of moments by moments of higher order; iterating the above argument, we obtain a bound in terms of moments of $W_1$ of order at most $|\beta| \le 2h_N$. With regard to (6.3) and (6.4) we will now analyse the quantities appearing there. For brevity we write $\ell(\beta) =: \ell$. Since $|\beta| = h + \ell$ and $1 \le \ell \le h$, we obtain from (3.8), for $N$ sufficiently large and all $\beta \in B$, the estimate (6.5). For the rightmost term in (6.3) we have the estimate (6.6), and the number of occupation vectors $\beta$ appearing in the sum in (6.3) (i.e. the cardinality of $B$) can be estimated from above by $(h_N + 1)^{h_N}$. Hence we obtain the assertion from (6.3), (6.5) and (6.6).

Building on the previous two lemmata, the next result shows that the CASP does not leave the central region up to any polynomially long time, i.e. within a time frame of order $N^c$ for any $c > 0$, with high probability.

Proposition 6.3 (CASP stays near the center for a long time).
Consider $\alpha, \beta$ with $0 < \alpha < \beta < \frac{2b-1}{3}$, let $I$ be the corresponding central interval, and define $\tau_I := \inf\{m \ge 0 : A_m \notin I\}$. Then for all $\theta > 0$ and all $\varepsilon > 0$ the bound (6.7) holds.

Figure 3: An example of a realisation of the processes $A$ and $A^u$, displaying that $A^u$ dominates $A$ as long as it is below the level $n^{(\alpha)}$. Note that $A^u$ is reset to $n^{(\beta)}$ whenever it hits the level $\tilde n^{(\gamma)}$; see also Figure 2.
Proof. To show the above claim we bound $A$ stochastically from above and from below by simpler processes, which in certain boxes close to the center of attraction of $A$ follow essentially a time-changed random walk dynamics with constant drift. In the first part of the proof we will construct a time-changed Markov chain $A^u$ that dominates $A = (A_m)_{m \ge 0}$ from above for a sufficiently long time. This construction will rely on Lemma 6.1. In this first part we will give all details; in the second part of the proof we will indicate how an analogous construction can be carried out "from below", then making use of Lemma 6.2.

1. Let $\gamma \in (\beta, \frac{2b-1}{3})$ and define, in accordance with Figure 2, levels $n^{(\gamma)} < n^{(\beta)} < n^{(\alpha)}$ close to the center $\frac{2}{\rho^2}N^{1-b}$. We consider the box $I^u = [n^{(\gamma)}, n^{(\alpha)}]$ and take $n^{(\beta)}$ as the starting point of both $A$ and $A^u$. The process $A^u$ will be such that $A^u$ makes only jumps of size $-1$, $1$, $h_N$, and it is reset to its starting point $n^{(\beta)}$ as soon as it hits the level $\tilde n^{(\gamma)} := n^{(\gamma)} + h_N$.
Consider the following Markov chain $\bar A^u$. We decree that within the box $(\tilde n^{(\gamma)}, n^{(\alpha)}]$ the process $\bar A^u$ makes only jumps of size $-1$, $+1$ and $+h_N$. Here, the probabilities for jumps $-1$, $+1$ of $\bar A^u$ from an arbitrary state in $(\tilde n^{(\gamma)}, n^{(\alpha)}]$ are set equal to the probabilities for jumps $-1$, $+1$ of $A$ from the state $n^{(\gamma)}$, and the probability for a jump $+h_N$ of $\bar A^u$ from an arbitrary state in $(\tilde n^{(\gamma)}, n^{(\alpha)}]$ is set equal to the probability of a jump of $A$ from the state $n^{(\gamma)}$ that has an absolute size larger than 1. More formally, we express the jump probabilities in terms of
$$c^{(\gamma)}_N := \mathbb P\big(|A_{m+1} - A_m| > 0 \,\big|\, A_m = n^{(\gamma)}\big),$$
which is of the order $s_N n^{(\gamma)}$; this results in a small downward drift of $\bar A^u$ in $I^u$. The process $A^u$ is defined as follows. Denote by $\tau_i$ the time of the $i$-th non-trivial jump (that is, a jump of size $\neq 0$) of $A$ for $i \ge 1$, and let $\tau_0 = 0$. For $\tau_{i-1} \le m \le \tau_i - 1$, with $i \ge 1$, the process $A^u$ is defined as the correspondingly time-changed version of $\bar A^u$, reset to $n^{(\beta)}$ upon hitting $\tilde n^{(\gamma)}$. We can now couple $A$ and $A^u$ such that on the event $E_N \cap F_N$ and for all $m \le T_N$ we have $A_m \le A^u_m$. In order to show that the probability of the event $\{A$ reaches $n^{(\alpha)}$ before time $T_N\}$ is bounded by the r.h.s. of (6.7), it thus suffices to show (6.8). From Lemmata 6.1 and 6.2 it is obvious that (6.9) holds. We claim that there exists a $\delta > 0$ such that (6.10) holds. Write $p_{\mathrm{hit}}$ for the probability that $A^u$, when started from $n^{(\beta)}$, hits (or crosses) $n^{(\alpha)}$ before it hits $\tilde n^{(\gamma)}$. The jump size of the process $A^u$ is at least $-1$ in each generation. Therefore, at least $N^{1-b-\beta}$ generations are necessary to reach the level $\tilde n^{(\gamma)}$ when starting from level $n^{(\beta)}$. Thus, with the $\theta$ given in the proposition, within $N^\theta$ generations the process $A^u$ makes at most $N^\theta / N^{1-b-\beta}$ excursions from $n^{(\beta)}$ towards $\tilde n^{(\gamma)}$. The probability that $A^u$ crosses $n^{(\alpha)}$ within a single excursion from $n^{(\beta)}$ that reaches $\tilde n^{(\gamma)}$ is obviously bounded from above by the probability that the random walk $\hat A_m := n^{(\beta)} + \sum_{i=1}^m X_i$, $m = 0, 1, \ldots$, crosses the level $n^{(\alpha)}$, where the $X_i$ are i.i.d. and distributed as the jump sizes of the process $\bar A^u$ in a state $x > \tilde n^{(\gamma)}$; thus, in contrast to the process $\bar A^u$, the random walk $\hat A$ is not reset to $n^{(\beta)}$ at an attempt to cross $\tilde n^{(\gamma)}$. For $\lambda_N = N^{-2\gamma}$ one has $\mathbb E[e^{\lambda_N X_1}] \le 1$ for $N$ large enough, since $\gamma < \frac{2b-1}{3}$. Therefore the process $Y_m = \exp(\lambda_N \hat A_m)$ is a supermartingale. Define $\hat\tau = \inf\{m \ge 0 : \hat A_m \ge n^{(\alpha)}\}$; then by the Martingale Stopping Theorem we obtain the bound $p_{\mathrm{hit}} \le \exp(\lambda_N n^{(\beta)}) \exp(-\lambda_N n^{(\alpha)})$. Thus, we can estimate the probability that $A^u$ crosses $n^{(\alpha)}$ within $N^\theta$ generations, which proves (6.10). Clearly, (6.10) and (6.9) imply (6.8), which completes the first part of the proof.
2. It remains to prove also the "lower part" of (6.7), i.e. to control the time it takes $A$ to leave $I$ in the downward direction. We argue similarly by defining a process $A^\ell$ which bounds $A$ from below in a box $I^\ell$ chosen correspondingly to the box $I^u$. This process is again a Markov chain that makes jumps of size $+1$ and $-1$ and (rarely) jumps of size $-h_N$, and whose drift coincides with that of $A$ at the upper boundary of the box $I^\ell$. Due to Lemma 6.2, downward jumps of size at least $h_N$ occur with exponentially small probability; hence these jumps can be ignored in the time frame of interest, and $A^\ell$ is stochastically dominated by $A$ with sufficiently high probability.
We now show that the time to reach the center from state $N$ does not grow faster than polynomially in $N$, i.e. is of the order $N^c$ for some $c > 0$ with high probability.

Lemma 6.4 (Coming down from $N$).
Let $\tau_B := \inf\{m \ge 0 : A_m \in B\}$, and let $\varepsilon' < \varepsilon < \frac{2b-1}{3}$. Then the bound stated in the lemma holds.

Proof. The proof will be divided into three steps.
1. The coalescence probabilities of the Wright-Fisher model are the smallest in our class of Cannings models with selection, see Lemma 5.3. Therefore, the stopping time $\tau_B$ for the CASP is stochastically dominated from above by the corresponding stopping time in a Wright-Fisher model with selection. Consequently, we assume in the following that $W = (1/N, \ldots, 1/N)$ and thus $\rho^2 = 1$.
2. We analyse the drift of $(A_m)_{m \ge 0}$ in each point $y$ of $\bar B = [2N^{1-b} + 2N^{1-b-\varepsilon}, N]$. We will show in part 3 of the proof that the drift estimate (6.11) holds for all $y \in \bar B$. The estimate (6.11) on the drift of $A$ in $\bar B$ yields a corresponding bound on $\mathbb E[A_{m_0} \mid A_0 = N]$. Hence, after time $N^{2b+\varepsilon}$ the process started in $N$ is, with a probability of at least $1 - N^{-\varepsilon'}(1+o(1))$, in $B$. If the process $A$ did not enter $B$ until time $N^{2b+\varepsilon}$, in the worst case the process is still in the state $N$. Therefore, recalling that $\delta \le \varepsilon - \varepsilon'$, the probability that the process is still above $B$ after time $N^{2b+2\varepsilon}$ can be estimated from above accordingly.
3. It remains to show (6.11). Recalling the "balls in boxes" description of the one-step transition probability of the CASP from Section 2.3, let $\mathbf 1_{C_i}$ be the indicator of the event $C_i$ that the $i$-th box is occupied by at least one ball. We can rewrite the conditional expectation of $A_1$ accordingly. The expectation in (6.14) is the generating function of a negative binomial distribution with parameters $y$ and $1 - s_N$, evaluated at $1 - \frac1N$, which allows us to continue (6.14) as (6.15); inserting this in (6.14) and using (6.16) yields that the r.h.s. of (6.15) can be estimated from below by a polynomial $h(y)$. In order to show that $h(y)$ is bounded away from 0 as claimed in (6.11), we factorise $h$ as $h(y) = y\,\tilde h(y)$ and check that $\tilde h$ is positive at the lower boundary and that the derivative $\tilde h'$ is positive on the interval $\bar B$. It is straightforward to check for $y_0 = 2N^{1-b} + 2N^{1-b-\varepsilon}$ that $\tilde h(y_0) > 0$. Thus it suffices to show that $\tilde h'$ is strictly positive on $\bar B$; this is implied by an inequality which is fulfilled for all $y \in \bar B$. Hence the drift $\mathbb E[A_1 \mid A_0 = y] - y$ is negative for all states $y \in \bar B$, with minimal absolute value bounded from below as required, which proves (6.11).
Similarly to Lemma 6.4, we now show (Lemma 6.5) that the time to reach the box $B$ from below is also at most polynomial with high probability. The drift estimate (6.17) below yields a lower bound on $\mathbb E[A_{m_0}]$ as long as $m_0$ is at most of polynomial order in $N$, since due to Proposition 6.3, after entering the box $B$ the process $A_m$ does not leave the box up to any polynomially long time with probability $1 - O((N^{1-2b+\varepsilon})^{h_N})$. In particular, for $m_0 = c N^{b+\varepsilon}\ln N$ and some constant $c > 0$, we have $\mathbb E[A_{m_0}] \ge \frac{2}{\rho^2}N^{1-b} - N^{1-b-\varepsilon}$. Now applying Markov's inequality yields (6.18). In the worst case, at time $m_0$ the process $A$ is still in state 1. Iterating the argument in (6.18) yields that the probability that after time $N^{b+\varepsilon} m_0$ the process is still below $\frac{2}{\rho^2}N^{1-b} - N^{1-b-\varepsilon}$ is of order $O(\exp(-N^\delta))$, as claimed in the lemma.
2. It remains to show (6.17). For $y \in B$ we write down the corresponding representation of the drift. We analyse the first summand in the expectation separately, since the denominator and the numerator both contain the random variable $W_1$; in the first inequality we use that $y \in B$ is bounded from above. This gives the desired estimate (6.17) for the drift and completes the proof of the lemma.
Completion of the proof of Theorem 3.5b. 1. For proving (6.1) we will make use of Corollary 3.3, and to this purpose derive asymptotic upper and lower bounds on the expectation of $A_{\mathrm{eq}}$ via stochastic comparison from above and below. Consider a time-stationary version $A^{\mathrm{stat}} = (A^{\mathrm{stat}}_m)_{m \in \mathbb Z}$ and a CASP $A = (A_m)_{m \ge 0}$ that is started in $N$. We can couple both processes such that a.s. $A_m \ge A^{\mathrm{stat}}_m$ for all $m \ge 0$; this implies $\mathbb E[A_{\mathrm{eq}}] \le \mathbb E[A_{m_0}]$ for every $m_0$. Fix $0 < \alpha < \beta < \frac{2b-1}{3}$, and consider the box $B_\alpha = \big[\frac{2}{\rho^2}N^{1-b} \pm N^{1-b-\alpha}\big]$ as well as the (smaller) box $B_\beta = \big[\frac{2}{\rho^2}N^{1-b} \pm N^{1-b-\beta}\big]$. Define $\tau_{B_\beta} := \inf\{m \ge 0 : A_m \in B_\beta\}$, the first hitting time of $B_\beta$, and $\tilde\tau_{B_\alpha} := \inf\{m \ge \tau_{B_\beta} : A_m \notin B_\alpha\}$, the first leaving time of $B_\alpha$ thereafter. Choosing the time horizon $m_0 = N^{2b+2\varepsilon}$ with $0 < \varepsilon < 1-b$, we obtain a decomposition of $\mathbb E[A_{m_0}]$ into three summands. By Lemma 6.4 the second summand on the right hand side is of order $O(N \exp(-N^\delta))$ and by Proposition 6.3 the third summand is of order $O(N (N^{1-2b+\alpha})^{h_N})$. Concerning the first summand we obtain $\mathbb P(\tilde\tau_{B_\alpha} > m_0 \ge \tau_{B_\beta}) = 1 - O(N (N^{1-2b+\alpha})^{h_N})$. Observing that $A_{m_0} = \frac{2}{\rho^2}N^{1-b}(1 + o(1))$ whenever $A_{m_0} \in B_\alpha$, we thus conclude $\mathbb E[A_{m_0}] = \frac{2}{\rho^2}N^{1-b}(1 + o(1))$.
This yields the desired upper bound on $\mathbb E\big[A^{(N)}_{\mathrm{eq}}\big]$. The same argument applies for the lower bound, where we use a CASP started in the state 1 and apply Lemma 6.5 instead of Lemma 6.4.

Proof of Lemma 3.7. a) For the symmetric Dirichlet weights the moments $\mathbb E[W_1^n]$ are explicit, and from these it is easy to see that (3.5) and (3.8) are satisfied. In particular, the second moment is $\frac{\alpha_N + 1}{N(N\alpha_N + 1)}$, which by assumption obeys (3.4).
b) For the Dirichlet-type weights, assume that $W^{(N)}$ is of the form (3.10) with $Y$ obeying Conditions (i) and (ii) in Lemma 3.7 b). It is straightforward to see that (3.4) and (3.5) are satisfied with $\rho^2 := \mathbb E[Y^2]/(\mathbb E[Y])^2$. We now set out to show (3.8). To this end we first observe that
$$\mathbb E[W_1^n] = \mathbb E\Big[\Big(\frac{Y_1}{Y_1 + \cdots + Y_N}\Big)^n\Big] \le \mathbb E[Y_1^n]\ \mathbb E\big[(Y_2 + \cdots + Y_N)^{-n}\big].$$
To bound the first factor $\mathbb E[Y_1^n]$ from above, we note, using Condition (i) of Lemma 3.7 on the exponential moments of $Y_1$ and the assumption $n \le 2h_N$ in (3.8), that $\mathbb E[Y_1^n] \le \frac{n!}{c^n}\,\mathbb E[e^{cY_1}]$. For the second factor we observe that $\big(\frac{Y_2 + \cdots + Y_N}{N-1}\big)^{-n}$, by Jensen's inequality, is bounded from above by $\exp\big(-\frac{n}{N-1}\sum_{i=2}^N \log Y_i\big)$. By a Taylor expansion, and using Condition (ii), which implies that $\mathbb E\big[\exp\big(-\frac{n}{N-1}\log Y_1\big)\big]$ is finite for $N$ big enough, we obtain the desired bound (3.8).