Asymptotic behaviour of sampling and transition probabilities in coalescent models under selection and parent dependent mutations

The results in this paper provide new information on asymptotic properties of classical models: the neutral Kingman coalescent under a general finite-alleles, parent-dependent mutation mechanism, and its generalisation, the ancestral selection graph. Several relevant quantities related to these fundamental models are not explicitly known when mutations are parent dependent. Examples include the probability that a sample taken from a population has a certain type configuration, and the transition probabilities of their block counting jump chains. In this paper, asymptotic results are derived for these quantities, as the sample size goes to infinity. It is shown that the sampling probabilities decay polynomially in the sample size with multiplying constant depending on the stationary density of the Wright-Fisher diffusion and that the transition probabilities converge to the limit of frequencies of types in the sample.

through mutation and coalescence events, but also through branching events. A branching of a lineage, into a true and a virtual lineage, corresponds to a selection event: two individuals are chosen as potential parents, the most viable of them becomes the true parent.
Coalescent models, i.e. the Kingman coalescent and its numerous generalisations, have been extensively used for inference on genetic data sets, in combination with Monte Carlo methods used to approximate implicit likelihood functions in the non-PIM case, see e.g. [24] for an overview. The asymptotic behaviour of coalescent models in relation to samples of large size has recently gained attention due to the large size of modern study samples. While the size of samples continues to increase, due to advancements in DNA sequencing technology, inference methods based on coalescent models struggle to provide reliable results, since the coalescent does not scale well in terms of sample size [14]. Moreover, as discussed in [1], there is a distortion of some of the properties of the coalescent, which is a suitable approximation of certain models, e.g. Wright-Fisher model, provided the sample size is sufficiently smaller than the effective population size. In some cases this leads to inaccurate conclusions, e.g. concerning prediction of rare variants. Therefore, studying large-sample-size properties of coalescent models is not only an interesting theoretical problem, but also provides tools for addressing the above mentioned shortcomings, which are directly related to practical applications. More precisely, because of the large size of modern samples, inference methods based on coalescent models are widely used, even when the underlying assumption of sample size being sufficiently smaller than the effective sample size is violated, as discussed in [1]. Thus a theoretical large-sample-size efficiency analysis of algorithms based on coalescent models, such as the ones in [22,23,7,2,12,13,3,16,17], would be useful. The study of large-sample-size asymptotic properties of the coalescent, to which this paper and [11] aim to contribute, are relevant for such analysis. Furthermore, large-sample-size asymptotic results provide tools for the analysis of differences between the coalescent approximation and the original model. In fact, a direct theoretical comparison of the properties of these models is challenging, in [1] a numerical comparison is provided, whereas comparing the corresponding simpler limiting objects may provide interesting insights.
This paper provides an analysis of the asymptotic behaviour of the sampling probabilities under the ASG, and of the transition probabilities of its block counting jump chain. Note that the Kingman coalescent can be seen as a special case of ASG by setting the selection parameters equal to zero. A finite-alleles model for mutation is assumed, with d possible types.
A sample is represented by a vector n ∈ N d \ {0}, where n i represents the number of individuals in the sample carrying allele i, i = 1, . . . , d. The likelihood function, or sampling probability, p(n), corresponds to the probability that a sample taken from a population at equilibrium has a configuration of types given by the vector n. A recursion formula is known for p, see [18] for the ASG, [6] and the references therein for the particular case of the Kingman coalescent. However, an explicit formula is unknown in the general case of parent dependent mutations. When the sample size, n 1 = n 1 +· · ·+n d , is large, it is computationally too expensive to compute p using the recursion formula. In this paper a large sample of the form ny (n) , with n ∈ N \ {0} and y (n) ∈ 1 n N d \ {0}, is considered and the asymptotic behaviour of the sampling probabilities p(ny (n) ) is studied, as n tends to infinity and y (n) tends to some y ∈ R d >0 . In particular, it is proved that the sampling probabilities decay polynomially, that is, p(ny (n) ) ∼p y y 1 y where ∼ denotes asymptotic equality in the sense that a n ∼ b n if lim n→∞ a n /b n = 1, y 1 = |y 1 | + · · · + |y d |, andp is the stationary density of the Wright-Fisher diffusion that is dual to the ASG, see Section 2 for more details onp.
The intuition for the polynomial decay comes from the neutral scenario, when mutations are parent independent, in fact, in this case the sampling probabilities are explicitly known, see e.g. [6], and it is straightforward to show that their asymptotic decay is polynomial of degree d − 1, see Subsection 3.1 for an explicit calculation. In particular, the degree of the polynomial does not depend on the mutation parameters, although the multiplicative constant does, which hints that the same behaviour applies in general, even when mutations are parent dependent. To address the general case the following strategy is adopted. The classical representation formula of the sampling probabilities as expectations with respect to the stationary distribution of the Wright-Fisher diffusion, which is recalled in Section 2, is used in Subsection 3.2 to interpret the sampling probabilities as expectations with respect to a sequence of Dirichlet distributions. A local limit theorem for the sequence of Dirichlet distributions is derived in the Appendix. Finally, in Section 4 the general result is proved.
Establishing the asymptotic decay of p(ny (n) ), enables the study of the asymptotic behaviour of the transition probabilities of the block counting jump chain of the 'typed' ancestral selection graph, to which Section 5 is dedicated. The adjective 'typed' is used to put emphasis on the fact that, in this paper, the lineages of the ASG are always associated to a type, as in e.g. [8,10], whereas often, e.g. in [18], the ASG is first constructed backwards in time with no regard for types, which are superimposed afterwards on the graph. This two-steps construction does not allow to construct the genealogy of a sample with given types, which is of interest when performing inference based on a sample. The block counting process of the 'typed' ASG is the process that counts how many lineages of each type are present as time evolves from when the sample is taken until the most recent common ancestor is reached. In this paper the jump chain of this block counting process is considered, and the asymptotic behaviour of its transition probabilities is studied in Section 5, where the chain is properly scaled.

Framework
In Section 1, p(n) is defined as the probability of sampling n from a population at equilibrium, when the underlying model is the ASG. In this section, the ASG, its block counting jump chain and the related Wright-Fisher diffusion are introduced; furthermore, the properties that are relevant for the results in this paper are recalled.
To set the notation and recall the definition of the model, let θ be the mutation rate, P = (P ij ) d i,j=1 be the mutation probability matrix and γ i , i = 1, . . . , d, be negative selection parameters. Note that the selection parameters can also be chosen to be equal to zero, in that case the ASG becomes the classical Kingman coalescent.
The ASG describes the evolution of lineages over time. When m lineages, i.e. edges, are present, either one of them is lost by coalescence with another, at rate m 2 , or one is added, by branching of an existing lineage, at rate m γ 2 , where γ = max{γ i − γ j : i, j = 1, . . . , d}. Furthermore, one of the m lineages undergoes a mutation event at rate m θ 2 . The evolution continues, for an almost surely finite time, until only one lineage is left, representing the most common ancestor of the initial m lineages. To assign types to each lineage, a type is assigned to the ancestor and accordingly to all the descending lineages by following the graph structure, which includes mutation points, and using the mutation matrix, P , and the selection parameters, γ 1 , . . . , γ d , to determine the type of a lineage after mutation and selection (branching) events. For a complete and rigorous definition of the ASG, see [18,19].
In this paper it is enough to consider only the jump chain of the block counting process of the ASG, instead of the entire graph structure, and it is crucial to assign types to lineages while the 'typed' chain evolves backwards in time, as opposed to the a-posteriori type assignment described above. This enables the construction of the genealogy of a 'typed' sample, at the cost of an implicit backward transition distribution. The block counting process of the 'typed' ASG is described in detail in [8]. Its jump , which is the focus of Section 5, is the time homogeneous Markov chain with the following transition probabilities [8], for n ∈ N d \ {0}, where e j is the j th d−dimensional unit vector and π[j|n] is the probability of sampling an individual of type j after sampling n 1 individuals with types given by n. Note that a step +e j , resp. −e j , corresponds to the coalescence, resp. branching, of a lineage of type j, and e j − e i corresponds to the mutation of a lineage from type i to type j. The probability π can be written equivalently in terms of the sampling probabilities, p, as Similarly to the sampling probabilities, the probability π is unknown explicitly, unless mutations are parent independent. The transition probabilities, which are the focus of Section 5, are related to the recursion formula for p(n) mentioned in the introduction. In some cases, as in this paper, instead of using the recursion formula, it is convenient to work with a well-known representation for p(n) in terms of the Wright-Fisher diffusion, which is described in the following. While the ASG models the ancestral history of a sample taken from the population, the allele frequencies are modelled by the Wright- that is the solution to the following stochastic differential equation, [9]. When the mutation probability matrix P is irreducible, the stationary distribution of the Wright-Fisher diffusion X is unique and has a smooth density,p, with respect to the Lebesgue measure, to which Remark 2.1 is dedicated. In this paper we assume that P is irreducible.
The sampling probability, p(n), can be expressed as the expectation of a multinomial draw from the stationary distribution of the Wright-Fisher diffusion. LetX be distributed according to the stationary distribution of the Wright-Fisher diffusion, then This formula is a consequence of the duality relationship between the ASG and the diffusion, in fact, when such relationship is proved, it is also shown that the right hand side of (2.4) solves the recursion formula that defines the sampling probability, see for example [18,10]. Furthermore, since the sample is exchangeable, the formula can also be explained by de Finetti's representation theorem.
Remark 2.1 (Stationary density of Wright-Fisher diffusion). Equation (2.4) only holds if the Wright-Fisher diffusion admits a stationary distribution. The existence of a unique stationary distribution is related to the structure of the mutation mechanism, more precisely, to the existence of an invariant measure for the mutation probability matrix P . When mutations are parent independent, P ij = Q j > 0, i, j = 1, . . . , d, the stationary distribution, not only exists, but also has an explicitly known density: in the neutral case a Dirichlet density with parameters θQ = (θQ 1 , . . . , θQ d ), see e.g. [25,6,8], and when selection is included, a weighted Dirichlet density, see e.g. [8,10]. Unfortunately, the PIM case is the only case where the stationary distribution is explicitly known. Furthermore, the Wright-Fisher diffusion X has a degenerate elliptic generator, since the diffusion matrix has zero entries when one of the components of X is equal to zero. Thus the classical theory for the study of stationarity and for the study of solutions to the Fokker-Planck equation does not apply and an ad-hoc analysis, which is provided in [21], is needed. In [21,Thm 3.1] it is shown that the stationary distribution of X exists uniquely, assuming that P is irreducible, which implies the invariant measure of P exists uniquely.
Furthermore, in [21,Thm 3.2] it is also shown that, under the same assumption, the stationary distribution is absolutely continuous with respect to the Lebesgue measure and its probability density function,p, defined on ∆ = {x ∈ [0, 1] d−1 : In the light of the previous remark, in this paper it is assumed that P is irreducible, in fact, the existence of a smooth stationary density, even of an unknown form, proves to be sufficient for studying the asymptotic behaviour of the sampling probabilities through (2.4).

Sampling probabilities
The aim of this section is to provide a representation of the sampling probabilities that is convenient for the study of their asymptotic behaviour. This is a key step, in fact, once the representation is identified, the outline of the proof of the asymptotic result becomes intuitively clear.

Parent independent mutations under neutrality
Before focusing on the general case, the PIM case without selection is analysed in order to provide better insight. Assume P ij = Q j , γ j = 0, i, j = 1, . . . , d and let Q = (Q 1 , . . . , Q d ). As explained in Remark 2.1, in this case the stationary density is Dirichlet with parameters θQ, thus, computing the expectation in (2.4), yields the following explicit expression for the sampling probabilities where B is the multidimensional Beta function and Γ is the Gamma function, the expression above can be found in e.g. [6]. Applying Stirling's formula to the Gamma functions yields, as y (n) → y, p(ny (n) ) ∼ 1 B(θQ) ny (n) 1 + 1 Note that the sampling probabilities decay polynomially with degree d − 1. While the multiplicative constant depends on the mutation parameters, the degree, d−1, does not.
For this reason one may expect the same degree of decay when mutations are parent dependent, which is indeed correct, and not affected by selection, as shown in Section 4. Note also that the limiting behaviour in the last display corresponds to what was anticipated in (1.1), since the stationary density in this case is a Dirichlet density.

Interpreting the sampling probabilities
Despite the ease of the intuition, the proof of an asymptotic result for p(ny (n) ) in the general case is more involved, because of the lack of an explicit form for the stationary density of the Wight-Fisher diffusion. To study the asymptotic behaviour of the normalised probabilities n d−1 p(ny (n) ), it is tempting to interchange the limit and integration in (2.4) and consider The difficulty with this approach is to justify interchanging the limit and integration.
Moreover, as n → ∞, y (n) → y, one can show, using Stirling's approximation, that the integrand approaches a Dirac delta function at y. Consequently, the limit is not well defined and care must be taken to rigorously prove the asymptotic behaviour. To this end, an alternative representation of the sampling probabilities is preferred, as outlined below.

Asymptotic behaviour of the sampling probabilities
The study of the asymptotic behaviour of the Dirichlet random vectors D (n) appearing in (3.3) is reported in the Appendix and is summarized in the following proposition.
Then the following central limit theorem holds Furthermore, a local limit theorem for the corresponding probability density functions, φ n and φ, holds lim In order to prove an asymptotic result for the sampling probabilities, the asymptotic behaviour of the expectations (3.3) is studied in the next theorem, using Proposition 4.1, the main difficulty being that the stationary density is unknown and possibly unbounded near the boundary.

Theorem 4.2.
Let k be defined as in (3.1). Letp be the stationary density of the Wright-Fisher diffusion (2.3), assuming the mutation probability matrix, P , is irreducible. Let y (n) ∈ 1 n N d \ {0} such that y (n) → y ∈ R d >0 , as n → ∞. Then Proof. As explained in Remark 2.1, by [21], the stationary densityp is smooth on ∆ o , and thus bounded on any compact set contained in ∆ o . It could, however, explode on the boundary. In order to deal with this problem, the domain is divided in two parts.
For ε > 0, define ∆ ε = {x ∈ ∆ o : x i ≥ ε, i = 1, . . . , d}. Since y y 1 ∈ ∆ o , it follows that y y 1 ∈ ∆ ε , for all 0 < ε ≤ ε y , with ε y = 1 To show convergence for the first term in the RHS of (4.1), a change of variables yields where φ n and φ are the density functions defined in Proposition 4.1 with α (n) = ny (n) +1. By Proposition 4.1 and continuity ofp on ∆ o , the integrand above converges pointwise to φ(u)p y y 1 . Furthermore, sincep is bounded in ∆ ε by some c ε , the sequence is dominated by the sequence φ n (u)c ε , the integral of which is equal to c ε , for all n. Therefore, by the general dominated convergence theorem [20], It remains to show that the second term in the RHS of (4.1) converges to zero. First note that, if x ∈ ∆ \ ∆ ε , then x j < ε for some j, and, since i . Using the inequality above, the fact that the integral ofp is bounded by 1, and Stirling's formula yield where y min = min i=1,...,d y i > 0. The expression in the last display converges to zero by yi y 1 y i y min , and thus the second integral in (4.1) converges to zero, as n → ∞, completing the proof.
By applying Theorem 4.2, it is straightforward to show that the asymptotic decay of the sampling probabilities is indeed polynomial as expected. Proof. Since p(ny (n) ) = n y (n) 1 ny (n) k(ny (n) ), rewrite n d−1 p(ny (n) ) = n d−1 n y (n) 1 ny (n) B ny (n) + 1 k ny (n) B ny (n) + 1 .
Then, note that . This completes the proof.

Asymptotic behaviour of the transition probabilities
The goal of this section is to study the asymptotic behaviour of p(ny (n) − v | ny (n) ), the transition probabilities (2.1) of the block counting jump chain H of the ASG, as n → ∞, y (n) → y. By letting H (n) , n ∈ N, be independent copies of H, and Y (n) = 1 n H (n) ⊂ 1 n N d \ {0}, the asymptotic behaviour of the transition probabilities can be interpreted as the limit of the transition probabilities of Y (n) , ρ (n) (v | y (n) ) = p(ny (n) − v | ny (n) ).
As for the sampling probabilities, when mutations are parent dependent, an explicit expression for the backward transition probabilities is not available, in fact expression (2.1) is written in terms of the unknown probability π.
In the PIM neutral case, P ij = Q j , γ j = 0, i, j = 1, . . . , d, the sampling probabilities can be explicitly written as π[i|n] = ni+θQi n 1 +θ , see e.g. [22]. In this case thus π[i|ny (n) ] converges to yi y 1 . It turns out that in the general case the limit of the probabilities π is the same. To prove this, π is written in terms of the function k, defined in (3.1), by using (2.2) and (3.2), The results of the previous section, combined with the expression above, make it straightforward to study the asymptotic behaviour of π[i|ny (n) ], and, consequently, of the transition probabilities.
Proposition 5.1. Let π be defined as in (2.2). Assume the mutation probability matrix P is irreducible. Let y (n) ∈ 1 n N d \ {0}, such that y (n) → y ∈ R d >0 , as n → ∞. Then, for i = 1, . . . , d, Finally, knowing the asymptotic behaviour of π directly solves the problem of analysing the asymptotic behaviour of the transition probabilities. , for all i, j = 1, . . . , d.
Using these limits, together with basic limit calculations, in (2.1), gives the result.

Appendix: central and local limit theorems for a sequence of Dirichlet random vectors
This section is devoted to the study of the asymptotic behaviour of the sequence of Dirichlet distributed random vectors presented in Proposition 4.1. While this section can be considered as a standard, yet not obvious, exercise in probability theory, it is essential to derive central and local limit theorems for the specific sequence of random vectors in this paper.
Let D (n) = (D (n) 1 , . . . , D (n) d ) ∈ S be a Dirichlet distributed random vector with concentration parameters α (n) ∈ R d >0 such that lim n→∞ α (n) n = α ∈ R d >0 . (5.1) A central limit theorem for the sequence D (n) is obtained using the well-known, see e.g. [5], relationship between Dirichlet and Gamma distributions, where G (n) = (G (n) 1 , . . . , G (n) d ) is a vector of independent Gamma random variables with shape parameters α (n) i , i = 1, . . . , d and rate parameter β ∈ R >0 . Note that β is irrelevant in the transformation from G (n) to D (n) . Furthermore, since the Gamma distribution is infinitely divisible, it is possible to write each of the Gamma random variables as a sum of independent Gamma random variables, and thus for each variable G