SAMPLING FORMULAE FOR SYMMETRIC SELECTION


Introduction and the main result
Random partitions of integers arise in various contexts of mathematics (see e.g. §2.1 of [1] for several combinatorial examples) and the natural sciences, such as population genetics (e.g. [3]). Among various partition distributions (i.e., probability distributions on a set of integer partitions), we are concerned in this paper with partition structures, which were introduced by Kingman [9] in connection with the sampling theory in population genetics. In this theory, we observe a partition $a = (a_1, \ldots, a_n)$ generated by a random sample of $n$ genes from a population, i.e., $a_i$ is the number of alleles in the sample which appeared exactly $i$ times. Thus the $a_i$ are nonnegative integers such that $\sum_{i=1}^{n} i a_i = n$. In Kingman's papers [9], [10] on partition structures, Ewens distributions [3], which describe laws of random partitions $A_n$ in the stationary infinitely-many-neutral-alleles model, play a central role. These distributions form a one-parameter family $\{P_\theta : \theta > 0\}$ of partition structures, each of which admits an explicit expression

$$P_\theta(A_n = a) = \frac{n!}{\theta(\theta+1)\cdots(\theta+n-1)} \prod_{i=1}^{n} \left(\frac{\theta}{i}\right)^{a_i} \frac{1}{a_i!}, \tag{1}$$

often referred to as the Ewens sampling formula [4].

It is well known (see e.g. [11]) that (1) can be described in terms of a random discrete distribution derived from the points of a Poisson point process, say $Z_1 > Z_2 > \cdots$, on $(0, \infty)$ with intensity $\theta\,dz/(z e^{z})$. More precisely, the points $\{Y_j\}$ of the normalized process given by $Y_j = Z_j/T$ with $T = \sum_j Z_j$ are interpreted as (random) ranked frequencies of alleles in the stationary infinitely-many-neutral-alleles model with mutation rate $\theta$. Consider the infinite-dimensional simplex

$$\nabla = \Big\{ \{y_j\} = (y_1, y_2, \ldots) : y_1 \ge y_2 \ge \cdots \ge 0,\ \sum_j y_j = 1 \Big\},$$

which is equipped with the topology of coordinate-wise convergence. The above $\{Y_j\}$ is nothing but the ranked jumps of a (standard) Dirichlet process with parameter $\theta$, and its law $\nu_\theta$ on $\nabla$ is called the Poisson-Dirichlet distribution with parameter $\theta$. The conditional probability that $A_n = a$ given an arbitrary sequence of allele frequencies $\{y_j\} \in \nabla$ is evaluated in general as

$$\Phi_a(\{y_j\}) := \frac{n!}{\prod_{i=1}^{n} (i!)^{a_i}} \sum_{m \in \langle a \rangle} y_1^{m_1} y_2^{m_2} \cdots,$$

where $\langle a \rangle$ stands for the totality of sequences $m = (m_1, m_2, \ldots)$ of nonnegative integers such that $n = m_1 + m_2 + \cdots$ and this equality (when ignoring the vanishing terms) defines the integer partition $a$, i.e.,

$$\#\{\alpha : m_\alpha = i\} = a_i \quad (i = 1, \ldots, n),$$

where $\#$ stands for the cardinality. The left side of (1) is given by

$$P_\theta(A_n = a) = E^{\nu_\theta}\big[\Phi_a(\{Y_j\})\big].$$

In this paper, we discuss a more general class of partition structures of the form

$$P_{\theta,s,q}(A_n = a) = \frac{E^{\nu_\theta}\big[e^{sF_q(\{Y_j\})} \Phi_a(\{Y_j\})\big]}{E^{\nu_\theta}\big[e^{sF_q(\{Y_j\})}\big]}, \tag{5}$$

where $s$ is an arbitrary real number, $q \ge 1$ and $F_q(\{y_j\}) := \sum_j y_j^{q}$, which is a bounded measurable function on $\nabla$. Obviously, (5) corresponds to a random element of $\nabla$ whose law $\nu_{\theta,s,q}$ is determined by the relation

$$\frac{d\nu_{\theta,s,q}}{d\nu_\theta}(\{y_j\}) = \frac{e^{sF_q(\{y_j\})}}{E^{\nu_\theta}\big[e^{sF_q(\{Y_j\})}\big]}, \quad \{y_j\} \in \nabla.$$
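As a numerical illustration of the Ewens sampling formula referred to above, the following sketch evaluates the standard closed form $n!/\{\theta(\theta+1)\cdots(\theta+n-1)\} \prod_i (\theta/i)^{a_i}/a_i!$ and sums it over all partitions of a small $n$. The function names, the sample of gene labels and the value $\theta = 1.5$ are hypothetical choices made for illustration only.

```python
from collections import Counter
from math import factorial, prod


def allelic_partition(sample):
    """Return a = (a_1, ..., a_n): a_i counts the alleles seen exactly i times."""
    n = len(sample)
    multiplicities = Counter(Counter(sample).values())
    return tuple(multiplicities.get(i, 0) for i in range(1, n + 1))


def rising_factorial(theta, n):
    """theta * (theta + 1) * ... * (theta + n - 1); equals 1 for n = 0."""
    return prod(theta + j for j in range(n))


def ewens_probability(a, theta):
    """Ewens sampling formula for a partition a of n = sum(i * a_i)."""
    n = sum(i * a_i for i, a_i in enumerate(a, start=1))
    weight = prod((theta / i) ** a_i / factorial(a_i)
                  for i, a_i in enumerate(a, start=1))
    return factorial(n) / rising_factorial(theta, n) * weight


def partitions(n):
    """Yield every partition of n in the multiplicity form (a_1, ..., a_n)."""
    def parts(remaining, largest):
        if remaining == 0:
            yield []
            return
        for p in range(min(remaining, largest), 0, -1):
            for rest in parts(remaining - p, p):
                yield [p] + rest
    for block_sizes in parts(n, n):
        counts = Counter(block_sizes)
        yield tuple(counts.get(i, 0) for i in range(1, n + 1))


sample = ["A", "B", "A", "C", "A", "B"]   # hypothetical sample of n = 6 genes
a = allelic_partition(sample)              # alleles seen 3, 2 and 1 times
total = sum(ewens_probability(b, 1.5) for b in partitions(6))  # should be 1
```

Since the formula defines a probability distribution on the partitions of $n$, `total` should equal 1 up to rounding, which gives a quick sanity check on any implementation.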
In the special case where $q = 2$, $F_2(\cdot)$ is known as the population homozygosity, and the distribution $\nu_{\theta,s,2}$ arises in the population genetics model incorporating symmetric selection and mutation. In this context, $s > 0$ means that homozygotes are selectively advantageous relative to heterozygotes (underdominant selection), while $s < 0$ implies the opposite situation (overdominant selection). Watterson [19], [20] obtained a number of asymptotic results concerning $P_{\theta,s,2}$ for small values of $s$, proposing a powerful statistic for the test of neutrality. Further, for the symmetric overdominance model, Grote and Speed [5] recently derived certain approximate sampling formulae, accompanied by numerical discussions of interest.
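The effect of the selection parameter can be seen in the smallest nontrivial case. For $n = 2$ and the partition in which both genes carry the same allele, the conditional probability given the frequencies is $\sum_j y_j^2 = F_2$, so the sampling probability under selection with $q = 2$ is $E^{\nu_\theta}[e^{sF_2} F_2] / E^{\nu_\theta}[e^{sF_2}]$. The Monte Carlo sketch below estimates this ratio. It is not the method of this paper: it assumes the standard stick-breaking construction with Beta$(1, \theta)$ fractions, which yields the Poisson-Dirichlet frequencies in size-biased rather than ranked order (harmless here because $F_2$ is symmetric), and it truncates the stick at a small tolerance. All names and parameter values are hypothetical.

```python
import math
import random


def homozygosity_sample(theta, rng, q=2.0, tol=1e-12):
    """Draw F_q = sum_j y_j^q for frequencies built by stick-breaking.

    Assumption: Beta(1, theta) stick-breaking fractions; the stick is
    truncated once the unbroken remainder drops below `tol`.
    """
    remaining, total = 1.0, 0.0
    while remaining > tol:
        piece = remaining * rng.betavariate(1.0, theta)
        total += piece ** q
        remaining -= piece
    return total


def tilted_homozygosity(s, draws):
    """Self-normalized estimate of E[e^{s F} F] / E[e^{s F}] from draws of F."""
    weights = [math.exp(s * f) for f in draws]
    return sum(w * f for w, f in zip(weights, draws)) / sum(weights)


rng = random.Random(0)                      # fixed seed for reproducibility
theta = 1.0                                 # hypothetical mutation parameter
draws = [homozygosity_sample(theta, rng) for _ in range(20000)]

neutral = tilted_homozygosity(0.0, draws)   # s = 0: close to 1 / (1 + theta)
favoured = tilted_homozygosity(5.0, draws)  # s > 0: homozygous samples more likely
```

At $s = 0$ the estimate should be close to the neutral value $1/(1+\theta)$, and reweighting by $e^{sF_2}$ with $s > 0$ can only increase it, in line with the interpretation that homozygotes are favoured.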
The main purpose of this paper is to give an explicit expression of $P_{\theta,s,q}(A_n = a)$ for arbitrarily fixed values of $\theta$, $s$ and $q$. Such an exact formula, if available, is expected not only to reveal the mathematical structure behind the quantity studied but also to be informative for applications. Note that the denominator on the right-hand side of (5) coincides formally with the numerator with $a = 0$ ('the partition of 0'; $a_1 = a_2 = \cdots = 0$) if we define $\Phi_0 \equiv 1$. In addition to this convention, it is useful to define for any $n = 1, 2, \ldots$

[...]

and for any $n \in \{0, 1, 2, \ldots\} =: \mathbb{Z}_+$ and integer partition $a$ of $n$

[...]

which is a variant of the multinomial coefficient. (In particular, $M_0(0) = 1$.) The main result of this paper is the following.
Theorem 1 Let $\theta > 0$, $s \in \mathbb{R}$ and $q \ge 1$ be arbitrary. Let $a$ be a partition of $n$ [...] (7)

where $I_l(a) = I_l(a; \theta, s, q)$ is given by

$$[\ldots]\ \big(y_\alpha^{n_\alpha} e^{s y_\alpha^{q}}\big) \prod_{\alpha=k+1}^{k+l} \big(e^{s y_\alpha^{q}} - 1\big)\ [\ldots] \tag{8}$$

except in the case $n = 0 = l$, for which $I_0(0) = 1$. Also, it holds that [...] (9)

We shall give some remarks on immediate implications of Theorem 1 for special values of the parameters.
(ii) In the case of $n = 0$ or $a = 0$, (7) yields the following formula for the denominator on the right side of (5): [...]
(iii) Since $F_1 \equiv 1$, the left side of (7) with $q = 1$ is equal to $e^{s} P_\theta(A_n = a)$. Direct verification that the right side of (7) coincides with this value is rather involved and will be given later.
Unfortunately, our formula (7) itself does not seem useful for likelihood-based statistical inference because the right side is not of product form. In general, the condition for a partition structure to be of such a form is quite restrictive, as shown in Theorem 42 of [14]. On the other hand, (9) exhibits a rapid convergence of the series in (7) and hence suggests its applicability to some numerical issues.
In the next section, we give a proof of Theorem 1 after providing some lemmas regarding technicalities. The main tools are calculus involving Poisson processes and certain distinguished properties of the gamma process. The former calculus for a class of partition structures can be found in [13] (which was revised as [15]) and [6] (a condensed version of which is [7]). The latter ingredient, the use of which we call the 'Γ-trick', is now standard. (See e.g. [8], [12], [17], [18].) However, the crucial idea here is that it is exploited in an 'unusual' way: for each $s = -\sigma < 0$, it is shown that the expectation $E\big[e^{-\sigma F_q(\{Z_j\})} \Phi_a(\{Z_j\})\big]$, with $F_q$ and $\Phi_a$ being naturally extended, can be expressed in terms of $E^{\nu_\theta}\big[e^{-uF_q(\{Y_j\})} \Phi_a(\{Y_j\})\big]$, $u \in (0, \infty)$, and after a procedure of inversion we arrive at (7). Also, at the end, the verification mentioned in Remark (iii) will be given.
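The property of the gamma process behind the 'Γ-trick', namely that the total mass $T$ is independent of the normalized points, has a familiar finite-dimensional analogue: for independent gamma variables with a common scale, the sum is independent of the normalized vector. The simulation below illustrates this analogue only. It is a sketch under stated assumptions, not part of the proof; the number of pieces `m`, the parameter value and the helper names are hypothetical.

```python
import random
import statistics


def gamma_pieces(theta, m, rng):
    """m independent Gamma(theta / m, 1) variables; their sum is Gamma(theta, 1)."""
    return [rng.gammavariate(theta / m, 1.0) for _ in range(m)]


def correlation(xs, ys):
    """Pearson correlation, written out to avoid version-specific helpers."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(1)       # fixed seed for reproducibility
theta, m = 2.0, 8            # hypothetical parameter and number of pieces
totals, homozygosities = [], []
for _ in range(20000):
    z = gamma_pieces(theta, m, rng)
    t = sum(z)
    totals.append(t)
    homozygosities.append(sum((z_j / t) ** 2 for z_j in z))  # F_2 of z / t

mean_total = statistics.fmean(totals)       # close to theta
rho = correlation(totals, homozygosities)   # close to 0 under independence
```

A vanishing sample correlation between $T$ and $F_2$ of the normalized vector is of course only a necessary consequence of independence, but it shows why expectations of products of a function of $T$ and a function of the normalized points factorize.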

Calculus for Poisson and gamma processes
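Throughout this section let $a = (a_1, \ldots, a_n)$ be an integer partition of $n$ and set $k = a_1 + \cdots + a_n$. Suppose that $k$ positive integers $n_1, \ldots, n_k$ satisfy

$$\#\{\alpha : n_\alpha = i\} = a_i \quad (i = 1, \ldots, n).$$

Consider the obvious extension of the functions $\Phi_a$ and $F_q$, which were defined originally on $\nabla$, to functions of any sequence $\{z_j\}$ of positive numbers, i.e.,

$$\Phi_a(\{z_j\}) = \frac{n!}{\prod_{i=1}^{n} (i!)^{a_i}} \sum_{m \in \langle a \rangle} z_1^{m_1} z_2^{m_2} \cdots, \qquad F_q(\{z_j\}) = \sum_j z_j^{q}.$$

Note that these functions are symmetric in $z_1, z_2, \ldots$. By a suitable change of the order of summation, the following expression of $\Phi_a$ is derived:

$$\Phi_a(\{z_j\}) = \frac{n!}{\prod_{i=1}^{n} (i!)^{a_i}\, a_i!} \sum_{j_1, \ldots, j_k:\,\text{distinct}} z_{j_1}^{n_1} \cdots z_{j_k}^{n_k},$$

where the sum extends over $k$-tuples $(j_1, \ldots, j_k)$ such that $j_1, \ldots, j_k$ are mutually distinct.

Our first task is the calculation of the expectation of $\Phi_a(\{Z_j\})$ for a class of Poisson point processes on $(0, \infty)$. Let $\Lambda(dz)$ be a continuous Borel measure on $(0, \infty)$ such that $\Lambda((0, \infty)) = \infty$ and [...] (14)

Assume that a realization $Z_1 > Z_2 > \cdots > 0$ of the Poisson point process with mean measure $\Lambda$ is given. That is, the random discrete measure $\xi := \sum_j \delta_{Z_j}$ has Laplace transform

$$E^{\Lambda}\big[e^{-\langle \xi, f \rangle}\big] = \exp\left(-\int_0^\infty \big(1 - e^{-f(z)}\big)\,\Lambda(dz)\right),$$

where $f$ is an arbitrary non-negative Borel function on $(0, \infty)$ and $\langle \xi, f \rangle = \sum_j f(Z_j)$. In the above and in what follows, $E^{\Lambda}$ denotes the expectation, in order to indicate the process we are working with.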
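The change in the order of summation for $\Phi_a$ can be checked by brute force on a finite sequence. The sketch below assumes the two forms $\frac{n!}{\prod_i (i!)^{a_i}} \sum_{m \in \langle a \rangle} \prod_j z_j^{m_j}$ and $\frac{n!}{\prod_i (i!)^{a_i} a_i!} \sum_{\text{distinct}} z_{j_1}^{n_1} \cdots z_{j_k}^{n_k}$, where each $m$ corresponds to $\prod_i a_i!$ tuples of distinct indices. The function names and the frequency vector are hypothetical.

```python
from itertools import permutations
from math import factorial, prod


def block_sizes(a):
    """Expand a = (a_1, ..., a_n) into the block sizes n_1, ..., n_k."""
    return [i for i, a_i in enumerate(a, start=1) for _ in range(a_i)]


def phi_by_count_vectors(a, z):
    """Sum over count vectors m whose nonzero entries realize the partition a."""
    sizes = block_sizes(a)
    if len(sizes) > len(z):
        return 0.0                       # more blocks than available indices
    n = sum(sizes)
    padded = sizes + [0] * (len(z) - len(sizes))
    count_vectors = set(permutations(padded))   # each distinct m exactly once
    total = sum(prod(z_j ** m_j for z_j, m_j in zip(z, m))
                for m in count_vectors)
    coeff = factorial(n) / prod(factorial(i) ** a_i
                                for i, a_i in enumerate(a, start=1))
    return coeff * total


def phi_by_distinct_indices(a, z):
    """Same quantity via k-tuples (j_1, ..., j_k) of mutually distinct indices."""
    sizes = block_sizes(a)
    if len(sizes) > len(z):
        return 0.0
    n = sum(sizes)
    total = sum(prod(z[j] ** n_alpha for j, n_alpha in zip(js, sizes))
                for js in permutations(range(len(z)), len(sizes)))
    coeff = factorial(n) / prod(factorial(i) ** a_i * factorial(a_i)
                                for i, a_i in enumerate(a, start=1))
    return coeff * total


z = [0.5, 0.3, 0.2]      # hypothetical finite frequency vector summing to 1
a = (2, 1, 0, 0)         # the partition 4 = 2 + 1 + 1
lhs = phi_by_count_vectors(a, z)
rhs = phi_by_distinct_indices(a, z)
```

Both functions should return the same value, and summing either over all partitions of $n$ gives 1 whenever the frequencies sum to 1, since the values are then the probabilities of the possible sample partitions.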
Lemma 2 Let $\Lambda$ be as above. We suppose additionally that [...] (15)

Then [...]

Proof. Observe that, at least formally, [...] $\sum_{j_1, \ldots, j_k:\,\text{distinct}}$ [...] Here almost sure convergence of $\langle \xi, \log(1 + t_1 z^{n_1} + \cdots + t_k z^{n_k}) \rangle$ for any $t_1, \ldots, t_k \ge 0$ follows from (15) by virtue of Campbell's theorem (see e.g. [11], §3.2), and therefore the above equalities hold a.s. Moreover, this theorem also justifies the following calculations. [...]
Next, we show that the exponential factor in the expectation in (7) can be handled by changing the measure of the Poisson point process. For any nonnegative Borel function $f$ on $(0, \infty)$, set $\Lambda_f(dz) = e^{-f(z)} \Lambda(dz)$.
The following lemma can be found in [6] (Proposition 1) and the proof requires only (14).
Lemma 3 Let $f$ and $\Lambda_f$ be as above. Then for all nonnegative Borel measurable functions [...]

So far, we have prepared the auxiliary results regarding Poisson point processes on the half line. We now specify the process as in the previous section by fixing $\theta > 0$ arbitrarily and setting

$$\Lambda(dz) = \frac{\theta\,dz}{z e^{z}}. \tag{18}$$

The associated process $\{Z_j\}$ has the distinguished property that the total sum $T := \sum_j Z_j$ and the normalized process $\{Y_j := Z_j/T\}$ are mutually independent. (See e.g. [8], [12], [17], [18].) For simplicity, we call the use of this property the 'Γ-trick'. Recall also that the distribution of $T$ on $(0, \infty)$ is given by

$$\frac{1}{\Gamma(\theta)}\, t^{\theta-1} e^{-t}\,dt.$$

Let $q \ge 1$ be arbitrary. Put for each $\sigma \in \mathbb{R}$ [...]

which we are going to evaluate. Here is an implicit version of (7).
Proposition 4 It holds that for any $\tau > 0$ [...] (20)

Proof. Let $f_q(z) = z^{q}$ and $\Lambda$ be as in (18). Given $\sigma > 0$, consider [...] Since the symmetric function $\Phi_a$ can be regarded also as a measurable function of $\xi$, Lemmas 2 and 3 imply that [...] Noting that $F_q(\{Z_j\}) = T^{q} F_q(\{Y_j\})$ and $\Phi_a(\{Z_j\}) = T^{n} \Phi_a(\{Y_j\})$, we can also calculate $J(\sigma)$ by the Γ-trick as follows. [...]
Hence, in view of (1), (9) is implied by (8) together with Dirichlet's formula: [...]

An immediate consequence of (9) is that the right side of (7) defines a real analytic function of $s$. On the other hand, since $F_q$ is bounded on $\nabla$, the left side of (7) is also real analytic in $s$. So it is sufficient to prove (7) for $0 > s =: -\sigma$ only. Define [...]

where $I_l$ ($l = 0, 1, \ldots$) are given by (8). By virtue of Proposition 4 and the uniqueness of the Laplace transform, we only have to verify the equality (20) with $J(\cdot)$ replaced by $I(\cdot)$.
For each $l = 1, 2, \ldots$, let [...] and define a $\sigma$-finite measure $m_l$ on $C_l$ by [...] which is invariant under arbitrary multiplications of components. For any $u > 0$, set $\Delta_l(u) = \{(z_1, \ldots, z_l) : [\ldots]\}$. With the above notation, we have by Fubini's theorem [...] Since this last (one-dimensional) integral is $\exp$ [...], we obtain for each $l = 0, 1, \ldots$ [...]
This implies that it is possible to integrate (23) with $\sigma$ replaced by $u^{q}$ term by term, and therefore [...] Comparing this with (20), we complete the proof of Theorem 1.
At the end of this section, we give a direct proof of the fact claimed in Remark (iii), namely the following.

Proposition 5 For any $\theta > 0$ and $s \in \mathbb{R}$, let $I_l(a; \theta, s, 1)$ be given by the right side of (8) with $q = 1$. Then [...] (25)

Proof. For notational simplicity, put $y_{l+1} = 1 - (y_1 + \cdots + y_l)$ for $(y_1, \ldots, y_l) \in \Delta_l$. First, we assume that $s > 0$. This assumption makes it possible to exchange the sums appearing in the subsequent calculations. The expansions [...] and [...] reduce (8) with $q = 1$ to [...] where the sum $\sum^{*}$ is taken over $k$-tuples $(m_1, \ldots, m_k)$ of nonnegative integers and $l$-tuples $(p_1, \ldots, p_l)$ of positive integers such that [...], and therefore [...] In view of (1), this proves (25) for all $s > 0$. All the calculations above remain true for $s < 0$ because all the series that appear are absolutely convergent. The proof of Proposition 5 is now complete.