APPLIED MATHEMATICS WORKING PAPER SERIES

Distributional properties of means of random probability measures

The present paper provides a review of the results concerning distributional properties of means of random probability measures. Our interest in this topic has originated from inferential problems in Bayesian Nonparametrics. Nonetheless, it is worth noting that these random quantities play an important role in seemingly unrelated areas of research. In fact, there is a wealth of contributions both in the statistics and in the probability literature that we try to summarize in a unified framework. Particular attention is devoted to means of the Dirichlet process given the relevance of the Dirichlet process in Bayesian Nonparametrics. We then present a number of recent contributions concerning means of more general random probability measures and highlight connections with the moment problem, combinatorics, special functions, excursions of stochastic processes and statistical physics.


Introduction
Random probability measures are a fundamental tool in Bayesian Nonparametrics since their probability distributions identify priors for Bayesian inference. Indeed, let X be a complete and separable metric space endowed with the Borel σ-algebra X and denote by (P_X, P_X) the space of probability distributions defined on (X, X) equipped with the corresponding Borel σ-field. Suppose (X_n)_{n≥1} is a sequence of exchangeable random variables defined on some probability space (Ω, F, P) and taking values in (X, X). According to de Finetti's representation theorem, the X_n's are conditionally independent and identically distributed, given some random probability measure P̃ whose probability distribution acts as a prior. The random quantities of interest here are means of the form

∫_X f(x) P̃(dx),   (1.1)

possibly composed with a map ϕ, where ϕ is a continuous and increasing real-valued function. The problem, then, reduces to studying the random quantity ∫_R x P̃(dx), given that ∫_R |x| P̃(dx) < ∞ almost surely. In particular, in [9] the authors introduce a series of tools and techniques that, later in [10], turned out to be fundamental for the determination of the probability distribution of ∫_X f(x) P̃(dx) when P̃ is a Dirichlet process. An appealing aspect of this topic is that distributional results concerning the random quantity in (1.1) are also of interest in research areas not related to Bayesian nonparametric inference. This fact was effectively emphasized in [15] and discussed in several later papers to be mentioned in the following sections.
There has recently been a growing interest in the analysis of distributional properties of random probability measures, as witnessed by the variety of publications that have appeared in the statistics and probability literature. Since such an interest originates from different lines of research, we believe it is useful to provide, within a unified framework, an up-to-date account of the results achieved so far. The present survey aims at detailing the origin of the various contributions and at pointing out the connections among them. In pursuing this goal, we split the paper into two parts: the first one deals with means of the Dirichlet process, whereas the second focuses on means of more general random probability measures. In both cases we provide, whenever known in the literature, the exact evaluation of the corresponding probability distribution and of other distributional characteristics such as, e.g., the moments. The last section provides some concluding remarks and some possible future lines of research. Finally, the Appendix concisely summarizes a few preliminary notions that play an important role throughout the paper.

Means of Dirichlet processes
Before stating the main results, and technical issues, related to the determination of the probability distribution of random Dirichlet means, we briefly recall the notion of a Dirichlet process D̃_α with parameter measure α. There are several different constructions of the Dirichlet process: each one has the merit of highlighting a peculiar aspect of this important nonparametric prior. Here below we present four possible definitions of the Dirichlet process, which will be used in the sequel. Similar constructions can also be used to define more general classes of nonparametric priors, and the Dirichlet process is typically the only member shared by all the resulting families of random probability measures. Besides the obvious historical reasons, this fact somehow justifies the common view of the Dirichlet process as a cornerstone of Bayesian Nonparametrics.
Following Ferguson's original definition in [26], one can construct the Dirichlet process D̃_α in terms of a consistent family of finite-dimensional Dirichlet distributions. Recall that the Dirichlet distribution D_α with α = (α_1, ..., α_k) is the probability distribution on the (k−1)-dimensional simplex ∆_{k−1} := {(x_1, ..., x_{k−1}) : x_i ≥ 0 for i = 1, ..., k−1, |x| ≤ 1}, where |x| := x_1 + ··· + x_{k−1}. Let α be a finite measure on X. A random probability measure is termed a Dirichlet process D̃_α if for any k ≥ 1 and any measurable partition A_1, ..., A_k of X one has (D̃_α(A_1), ..., D̃_α(A_k)) ∼ D_α with α = (α(A_1), ..., α(A_k)). The parameter measure can obviously be decomposed as α(·) = θ P_0(·), where θ > 0 is the total mass of α and P_0 = α/θ is a probability distribution on (X, X). An alternative definition of the Dirichlet process, pointed out in [26], makes use of completely random measures (CRMs). See the Appendix for a concise account and some noteworthy examples of CRMs. Indeed, if γ̃ is a gamma CRM on X with finite parameter measure α, then it can be shown that the Dirichlet process coincides, in distribution, with the normalized gamma process

D̃_α(·) = γ̃(·)/γ̃(X).   (2.1)

As we shall see, this definition suggests interesting generalizations of the Dirichlet process and is useful for investigating the probability distribution of the corresponding random means. See Section 3.2.
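The normalized-gamma construction can be checked by simulation in the finite-dimensional picture: independent gamma masses over a partition, once normalized, follow the corresponding Dirichlet distribution. A minimal sketch (all parameter values are illustrative choices of ours):

```python
import numpy as np

rng = np.random.default_rng(0)

# Finite-dimensional sketch of the normalized-gamma construction (2.1):
# independent Gamma(alpha(A_i), 1) masses over a partition A_1,...,A_k,
# once normalized, have the Dirichlet distribution with parameters
# (alpha(A_1), ..., alpha(A_k)).
alpha = np.array([0.5, 1.0, 1.5])   # alpha(A_1), ..., alpha(A_k); illustrative
theta = alpha.sum()                 # total mass of the parameter measure

gammas = rng.gamma(shape=alpha, size=(100_000, alpha.size))
weights = gammas / gammas.sum(axis=1, keepdims=True)

# Dirichlet first moments: E[w_i] = alpha_i / theta
print(weights.mean(axis=0))         # close to alpha / theta = [1/6, 1/3, 1/2]
```

The same device, with the partition refined, underlies the simulation of linear functionals of D̃_α used below.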
Another construction which has recently become very popular in Bayesian Nonparametrics is based on a stick-breaking procedure. This construction was originally proposed in [78] and it can be described as follows. Let (V_i)_{i≥1} be a sequence of independent and identically distributed (i.i.d.) random variables, with V_i ∼ beta(1, θ), and define random probability weights (p_j)_{j≥1} as

p_1 = V_1,   p_j = V_j ∏_{i=1}^{j−1} (1 − V_i)  for j ≥ 2.

If (Y_i)_{i≥1} is a sequence of i.i.d. random variables whose common probability distribution is P_0, then

Σ_{j≥1} p_j δ_{Y_j}  =_d  D̃_α   (2.3)

with α = θ P_0. This construction can be generalized to define interesting classes of random probability measures, among which it is worth mentioning the two-parameter Poisson-Dirichlet process as a remarkable example. See Section 3.1.
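The stick-breaking construction translates directly into a sampling scheme for random Dirichlet means. A sketch, truncated at a fixed number of atoms with the leftover mass renormalized (function names and parameter values are ours):

```python
import numpy as np

rng = np.random.default_rng(1)

def stick_breaking_mean(theta, p0_sampler, n_atoms=2000, rng=rng):
    """One draw of the random mean of a (truncated) Dirichlet process,
    built from the stick-breaking weights p_j = V_j * prod_{i<j}(1 - V_i)."""
    v = rng.beta(1.0, theta, size=n_atoms)
    w = v * np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    w /= w.sum()                      # renormalize the truncation remainder
    y = p0_sampler(n_atoms, rng)      # i.i.d. atoms from P_0
    return np.dot(w, y)

# Example: P_0 uniform on {0, 1} and theta = 1, in which case the mean
# is known to follow the arcsine law Beta(1/2, 1/2).
draws = np.array([
    stick_breaking_mean(1.0, lambda n, r: r.integers(0, 2, n).astype(float))
    for _ in range(5000)
])
print(draws.mean(), draws.var())      # close to 1/2 and 1/8
```

The truncation error is controlled by the leftover stick mass, which decays geometrically in the number of atoms; Section 2.5 below discusses principled truncation rules.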
Finally, we mention a definition which stems from applications to survival analysis. In this setting one has X = R_+, since the objects of investigation are survival times. Suppose, for simplicity, that P_0 is a non-atomic measure and let S_0(x) = P_0((x, ∞)) for any x. If μ̃ is a CRM with Lévy intensity

ν(ds, dx) = e^{−s θ S_0(x)} (1 − e^{−s})^{−1} α(dx) ds,

and P̃ is a random probability measure on R_+ characterized by the cumulative distribution function F̃(t) = 1 − e^{−μ̃((0,t])}, then P̃(·) =_d D̃_α(·). This provides a representation of the Dirichlet process as a neutral to the right process. See [17] and [27].
Each of the definitions we have briefly described is useful for determining distributional results about a Dirichlet random mean ∫_X f(x) D̃_α(dx), where f is a real-valued measurable function defined on X. Indeed, the most convenient construction of the underlying Dirichlet process to resort to is suggested by the specific technique one applies for investigating the properties of ∫_X f(x) D̃_α(dx). Before entering the main details, it should be recalled that an important preliminary step requires the specification of conditions ensuring finiteness of ∫_X f(x) D̃_α(dx). In other words, one has to determine those measurable functions f for which the integral ∫_X |f(x)| D̃_α(dx) is almost surely finite. As shown in [25], a necessary and sufficient condition is given by

∫_X log(1 + |f(x)|) α(dx) < ∞.

An alternative proof of this fact can be found in [12]. Let now α_f = α ∘ f^{−1} denote the image measure of α through f and note that, since ∫_X f(x) D̃_α(dx) and ∫_R x D̃_{α_f}(dx) share the same probability distribution, with no loss of generality we can confine ourselves to considering distributional properties related to M(D̃_α) := ∫_R x D̃_α(dx). Moreover, denote by M the set of non-null, non-degenerate and finite measures on R and define the following class of measures, indexed by the total mass parameter θ ∈ (0, ∞):

A_θ := { α ∈ M : α(R) = θ, ∫_R log(1 + |x|) α(dx) < ∞ }.   (2.5)

Note that, if not otherwise stated, it will be tacitly assumed that the Dirichlet base measure α is an element of A_θ for some θ > 0.

Cifarelli-Regazzini identity and related results
A first remarkable result about random Dirichlet means is an identity which was obtained by Cifarelli and Regazzini [9,10]. This provides a representation of the generalized Stieltjes transform of order θ of the law of the mean M (D α ) in terms of the Laplace functional of a gamma process. It should be recalled that the order θ > 0 coincides with the total mass of the parameter measure α, i.e. θ = α(R). The interest in (2.7) is motivated by the fact that one can resort to inversion formulae of generalized Stieltjes transforms and deduce a representation of the probability distribution of M (D α ). See Appendix B.1 for a concise account on generalized Stieltjes transforms. In the following Re z and Im z stand for the real and imaginary parts of z ∈ C, respectively. The important identity is then as follows.
Theorem 2.1. (Cifarelli-Regazzini identity) For any z ∈ C such that Im(z) ≠ 0, one has

∫_R (z − x)^{−θ} Q(dx; α) = exp{ −∫_R log(z − x) α(dx) }.   (2.8)

It is worth noting that the identity (2.8) holds true when the order of the generalized Stieltjes transform coincides with the total mass θ of the parameter measure of the underlying Dirichlet process. An interesting case of (2.8) is obtained when θ = 1, since it reduces to

∫_R (z − x)^{−1} Q(dx; α) = exp{ −∫_R log(z − x) P_0(dx) }.   (2.9)

Indeed, in the terminology of [49], one says that the probability distribution Q(·; α) of M(D̃_α) is the Markov transform of P_0, and the moment problem for Q(·; α) is determined if and only if it is determined for P_0. See Corollary 3.2.5 in [49]. As one can see from [49] and [15], the interest in (2.8) and (2.9) also arises in other seemingly unrelated areas of research such as, for example, the growth of Young diagrams or the exponential representation of functions with negative imaginary part. The proof of (2.8) can be achieved by means of different techniques. In [10] the result is obtained by resorting to analytic arguments whose preliminary application can also be found in [9]. Indeed, the authors consider a truncated functional U(τ, T) of the Dirichlet mean. Letting Q_{τ,T}(·; α) denote the probability distribution of U(τ, T), they obtain a series expansion, labelled (2.10), for the corresponding Stieltjes transform. Differentiating the series in (2.10) term by term with respect to τ, it can be shown that the function s^{−θ} S_θ(1/s; τ, T) satisfies a partial differential equation whose solution has the exponential form appearing on the right-hand side of (2.8). This concise description of the main ideas hides the technical difficulties one has to overcome in order to get to the final result. The interested reader should check [9] and [10] to appreciate their complete line of reasoning.
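A minimal Monte Carlo sketch of the identity in a fully explicit case (the parameter values below are our illustrative choices): take θ = 1 and α = (1/2)δ_0 + (1/2)δ_1, so that M(D̃_α) follows the arcsine law Beta(1/2, 1/2) and the right-hand side of (2.8) reduces to z^{−1/2}(z − 1)^{−1/2}.

```python
import numpy as np

rng = np.random.default_rng(2)

# Check of the Cifarelli-Regazzini identity (2.8) with theta = 1 and
# alpha = (1/2) delta_0 + (1/2) delta_1: the law of M(D_alpha) is then
# the arcsine law Beta(1/2, 1/2), so the Stieltjes transform of order 1
# (the left-hand side) can be simulated directly.
z = 2.0 + 1.0j
m = rng.beta(0.5, 0.5, size=200_000)              # draws of M(D_alpha)
lhs = np.mean(1.0 / (z - m))                      # E[(z - M)^{-1}]
rhs = np.exp(-0.5 * (np.log(z) + np.log(z - 1)))  # exp{-∫ log(z-x) alpha(dx)}
print(abs(lhs - rhs))                             # small Monte Carlo error
```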
An approach based on combinatorial arguments has been pursued in [84] and in [15], under the additional assumption that supp(α) = [0, 1]. The result in [84] is more general since it allows one to obtain similar identities for the two-parameter Poisson-Dirichlet process to be examined in Section 3.1. Indeed, in [87] one finds a relation between the moments of Q(·; α) and those of α, in which a quantity H_n is expressed in terms of Bell polynomials; in that expression one has f_k = θ^k and g_j = (j − 1)! r_{j,α} for any k ≥ 1 and j ≥ 1. If we define f(t) = Σ_{n≥1} f_n t^n/n! = exp{θt} − 1 and g(t) = Σ_{n≥1} g_n t^n/n! = −∫ log(1 − tx) α(dx), a result from the theory of symmetric functions yields the desired identity. A similar argument is used in [15], where a series expansion of z^{−θ} S_θ[z^{−1}; M(D̃_α)] is combined with the so-called exponential formula (see, e.g., Corollary 5.1.9 in [81]). In [28] a simple procedure for deriving S_θ[z; M(D̃_α)] is proposed. In particular, starting from the case of α with bounded support, a Volterra equation of the first kind for the Laplace transform m of M(D̃_α) is introduced: it then follows that the Laplace transform of x^{θ−1} m(x), which corresponds to S_θ[z; M(D̃_α)], satisfies a certain first-order ordinary differential equation, whose solution is explicitly given and coincides with (2.8). The result is then also extended to the case of α having support bounded from below.
An even simpler proof has been devised in [85] and it relies on the definition of the Dirichlet process in (2.1) as a normalized gamma process. Indeed, using the independence between γ̃(R) and D̃_α, one can evaluate the generalized Stieltjes transform directly, the key step resting on the fact that γ̃(R) has the gamma density x ↦ x^{θ−1} e^{−x}/Γ(θ) on (0, ∞). See Theorem 2 in [85].
We finally mention that (2.8) can be shown by means of a discretization procedure which consists in evaluating the Stieltjes transform of a Dirichlet random mean ∫ x D̃_{α_m}(dx), with α_m = Σ_{i=1}^{k_m} α(A_{m,i}) δ_{x_{m,i}} being a suitable discretization of α such that α_m converges weakly to α as m → ∞. This approach has been developed in [71] and later applied in [72]. A similar discretization procedure is also at the basis of the proof given in [36].
It should be remarked that in (2.8) the order of the generalized Stieltjes transform must coincide with the total mass of the parameter measure α. If interest lies in evaluating the probability distribution of M(D̃_α), it would be desirable, regardless of the value of θ, to work with the Stieltjes transform of order 1. This is due to the fact that inversion formulae for Stieltjes transforms of order different from 1 are much harder to implement. Extensions of (2.8) where the order of the generalized Stieltjes transform can differ from the total mass θ = α(R) are given in [59] and [41]. In [59], the authors make use of the connection between Lauricella multiple hypergeometric functions and the Dirichlet distribution. Before recalling this connection, set ∆_n = {(x_1, ..., x_n) : x_i ≥ 0 for i = 1, ..., n, x_1 + ··· + x_n ≤ 1}. One then has a power series representation, labelled (2.13), in terms of F_D, the fourth Lauricella hypergeometric function. See (2.3.5) in [23]. Another useful integral representation, labelled (2.14), holds whenever θ > 0 and θ − c > 0; in the case where θ = c the identity in (2.14) reduces to (2.15). Letting α = Σ_{i=1}^{n+1} b_i δ_{x_i} be a measure with point masses at x_1, ..., x_n and x_{n+1} = 0, from (2.13) and (2.15) one can easily recover a simplified form of the identity (2.8) from integral representations of F_D. Let us now consider a measure α such that α(R) = θ ∈ (0, ∞) and with supp(α) contained in a bounded interval [0, t), for some t > 0. Using the above mentioned connection between the law of M(D̃_α) and F_D, in [71] an identity (2.16) has been proved for any θ ≥ c and z ∈ C such that Im(z) ≠ 0 or Re(z) ≤ 0, the order c being not greater than the total mass θ of α, under the assumption that the support of α is included in a bounded interval of R_+. Accordingly, if θ > 1, one can fix c = 1 and invert (2.16) to obtain a representation for the density function of the mean M(D̃_α).
Finally, in [59] the authors obtain an extension to the case of α being an arbitrary member of A_θ as defined in (2.5) and c being any positive number. In particular, using arguments similar to those exploited in [71], it has been shown that an analogous identity holds if θ ∈ (0, c), the corresponding integral being evaluated along the path, in the complex plane, starting at w = 0, encircling w = 1 in a counterclockwise sense and ending at w = 0. Hence, the new identities (2.17) and (2.18) express the Stieltjes transform of order c as a mixture of Laplace functional transforms of the gamma process, the mixing measure being a beta probability distribution on (0, 1) or on the complex plane according to whether c is less than or greater than the total mass θ of the parameter measure α. Given that they have been directly deduced from integral representations of F_D, the authors in [59] have termed them Lauricella identities. It should be recalled that this connection between random Dirichlet means and multiple hypergeometric functions has also been exploited in [50] to deduce a multivariate version of (2.8) involving a vector of means (∫ f_1 dD̃_α, ..., ∫ f_d dD̃_α), with f_1, ..., f_d being measurable functions such that ∫ log[1 + |f_i|] dα < ∞ for any i = 1, ..., d.
Finally, the extension provided in [41] is achieved by introducing a new process taking values in the space of measures on (X, X). To this end, let P_γ̃ denote the probability distribution of a gamma process γ̃ with parameter measure α such that α(X) = θ. Next define, for any d ∈ (0, θ), another probability measure P_{γ̃,d} on the space of measures on (X, X) such that P_{γ̃,d} ≪ P_γ̃. The random measure μ̃ whose distribution coincides with P_{γ̃,d} is named a beta-gamma process in [41]. For any c > 0 an identity, labelled (2.19), is shown to hold true. When θ − c > 0, (2.19) reduces to the Lauricella identity (2.17). On the other hand, when θ − c < 0, (2.19) provides an interesting alternative expression for S_θ which can be compared to (2.18).

Probability distribution of random Dirichlet means
An expression for the Stieltjes transform of M(D̃_α), as discussed in the previous section, represents a very useful tool for evaluating the probability distribution of M(D̃_α). Indeed, one can resort to an inversion formula for S_θ and recover Q(·; α). Before dealing with the subject, it must be recalled that Q(·; α) is absolutely continuous with respect to the Lebesgue measure on R and we will denote by q(·; α) the corresponding density function. See Lemma 2 in [10]. Moreover, denote by A the distribution function corresponding to the base measure α, i.e.

A(x) := α((−∞, x]),   (2.20)

where obviously lim_{x→∞} A(x) = θ. As outlined in the Appendix, if appropriate conditions are satisfied one can invert S_θ for any θ > 0. Nonetheless, as already mentioned, the inversion formula is more easily handled when α(R) = θ = 1.
Theorem 2.2. Suppose α ∈ A_1. Then

q(y; α) = π^{−1} sin(π A(y)) exp{ −∫_R log |y − x| α(dx) }   (2.21)

for any y point of continuity of A.
The representation in (2.21) follows from Theorem 1(ii) in [10] under the assumption that, given inf supp(α) =: τ ∈ R, one has A(τ) = 0. The same representation has been achieved in Proposition 9(iii) of [72] by removing this condition on α. The proof is based on a different approach and makes use of a simple and useful idea given in [33]. Indeed, from (2.1) one has that, for any y ∈ R,

P[M(D̃_α) ≤ y] = P[ ∫_R (x − y) γ̃(dx) ≤ 0 ],   (2.22)

where we recall that γ̃ stands for the gamma process. Note that such an equality will be extensively used later in Section 3.2 for studying means of more general random probability measures. Given this, an application of the inversion formula for characteristic functions in [31] yields an expression for the cumulative distribution function in (2.22) and, consequently, for the corresponding density function q(·; α). Unfortunately, the evaluation of q(·; α) becomes more difficult if θ ≠ 1. In [10] one can find an expression for q(·; α) of M(D̃_α) when α(R) = θ > 1, the possible jumps of A, defined as in (2.20), are smaller than one and, additionally, the support of α is bounded from below by some τ ∈ R with A(τ) = 0. This assumption is dropped in [72] and [59] to prove the following

Theorem 2.3. Suppose α ∈ A_θ with θ > 1 and that A defined in (2.20) has jumps of size smaller than one. Then

q(y; α) = (θ − 1) π^{−1} ∫_{−∞}^{y} (y − x)^{θ−2} sin(π A(x)) exp{ −∫_R log |x − u| α(du) } dx   (2.23)

for any point y of continuity of A.
Explicit expressions for q( · ; α) for values of θ > 1 with A having discontinuities with jumps of size greater than or equal to one and for values of θ ∈ (0, 1) can be found in items (i) and (iv) of Proposition 9 of [72] and in Section 6 of [59].
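As a quick numerical sanity check of the θ = 1 density representation (2.21), consider α = (1/2)δ_0 + (1/2)δ_1: on (0, 1) one has A(y) = 1/2 and ∫ log|y − x| α(dx) = (1/2) log y + (1/2) log(1 − y), and the formula should return the arcsine (Beta(1/2, 1/2)) density. A sketch (function names are ours):

```python
import math

def q_from_formula(y):
    """Density representation (2.21) specialized to alpha = (1/2)(delta_0 + delta_1)."""
    A = 0.5                                               # A(y) on (0, 1)
    log_part = 0.5 * math.log(y) + 0.5 * math.log(1.0 - y)
    return math.sin(math.pi * A) * math.exp(-log_part) / math.pi

def q_arcsine(y):
    """Beta(1/2, 1/2) density, the known law of the mean in this example."""
    return 1.0 / (math.pi * math.sqrt(y * (1.0 - y)))

for y in (0.1, 0.25, 0.5, 0.9):
    print(q_from_formula(y), q_arcsine(y))                # the two columns agree
```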
As far as the evaluation of the cumulative distribution function F(·) of M(D̃_α) is concerned, from Proposition 3 in [72] one has the following result.

Theorem 2.4. Let α ∈ A_θ. Then, for any y in R, F(y) admits the integral representation given in (2.24).

Here below we describe a few examples mainly based on an application of Theorem 2.2 and Theorem 2.3.
If α = θ_1 δ_0 + θ_0 δ_1, then M(D̃_α) = D̃_α({1}) and it can be deduced that q(·; α) is the beta density with parameters (θ_0, θ_1). On the other hand, this same result can also be achieved by resorting to (2.21) and to (2.23) when θ = 1 or θ > 1, respectively. The case θ < 1 can be dealt with by means of the expression provided in Proposition 9(iv) of [72]. Next, suppose θ = 1 and let α(dx) = [πσ(1 + ((x − µ)/σ)²)]^{−1} dx be the Cauchy density with parameters (µ, σ). It can then be seen that A(y) = 1/2 + π^{−1} arctan((y − µ)/σ) for any y ∈ R. From (2.21) one has q(y; α) = σ π^{−1} [σ² + (y − µ)²]^{−1}. Hence, in this case Q(·; α) = α(·) and such a result was first proved in [89]. Furthermore, it can be shown that Q(·; α) = α(·) if and only if α is Cauchy. See [10] and [59].
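The Cauchy invariance property lends itself to a direct simulation check: Dirichlet means drawn from a standard Cauchy base measure should again look standard Cauchy. A Monte Carlo sketch via truncated stick-breaking (names and truncation level are our illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(3)

# If alpha is the standard Cauchy probability measure (theta = 1), the random
# Dirichlet mean is again standard Cauchy; we compare empirical quartiles of
# simulated means with the Cauchy quartiles -1, 0, 1.
def dirichlet_mean_draw(n_atoms=3000):
    v = rng.beta(1.0, 1.0, size=n_atoms)          # stick-breaking, theta = 1
    w = v * np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    w /= w.sum()                                  # renormalize truncation remainder
    y = rng.standard_cauchy(n_atoms)              # atoms from P_0 = Cauchy(0, 1)
    return np.dot(w, y)

draws = np.array([dirichlet_mean_draw() for _ in range(4000)])
q25, q50, q75 = np.quantile(draws, [0.25, 0.5, 0.75])
print(q25, q50, q75)                              # close to -1, 0, 1
```

Quantiles, rather than moments, are the natural summaries here since the Cauchy law has no finite mean.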

Determination of the parameter measure α
An interesting question concerns the connection between q(·; α) and the parameter α. Indeed, one might wonder whether there is a one-to-one correspondence between the set of possible probability distributions of random Dirichlet means and the space of parameter measures α given by A_θ in (2.5). Hence, denote by M_θ the family of all possible density functions of M(D̃_α) as α varies in A_θ. The answer is positive: the identity (2.8) sets up a bijection between the set A_θ and M_θ, as shown, e.g., in [59] and in [36]. Given this important result, one might try to provide an answer to the following question: is it possible to determine the parameter measure α ∈ A_θ yielding a specific distribution in M_θ? This corresponds to reversing the issue we have been addressing up to now. An early answer can be found in [11]. The authors provided a general solution to the problem, valid for parameter measures whose support is included in (0, ∞). Unfortunately this contribution has gone unnoticed in the literature since it appeared in an unpublished technical report. If α ∈ A_θ, from the identity (2.8) it is possible to express α((a, b]) in terms of q(·; α) by means of an inversion formula, labelled (2.25), where a and b are continuity points of A. Below we provide three examples for which it is possible to determine α explicitly starting from a pre-specified law for the mean. See [11] for details and for a discussion of the first two examples we describe below.
Example 2.5. Suppose that q(x; α) = B(dx; a, b) with a and b positive and a + b = θ, where, as before, B(dx; a, b) denotes a beta probability distribution.
In this case an application of (2.25) yields that the parameter measure α has support {0, 1} and coincides with α(dx) = a δ_0(dx) + b δ_1(dx).
From the previous expression one easily deduces the density of α. Hence, the parameter measure α, corresponding to a random Dirichlet mean M(D̃_α) having a uniform distribution on (0, 1), admits the density in (2.26) (w.r.t. the Lebesgue measure on R). As noted in [75, 76], the cumulative distribution function A corresponding to the density function in (2.26) also appears as the limiting curve of the sequence of fractional parts of the roots of the derivative of a suitable sequence of polynomials p_n. Hence, the limiting curve obtained from the fractional parts of the roots of the derivative of p_n coincides with the Markov transform of the uniform distribution on (0, 1).
The previous three examples show that a uniform distribution for the mean may correspond to different parameter measures according to the specific total mass θ being chosen. This is not in contradiction with the fact that there is a bijection between M_θ and A_θ, since we let θ vary. Hence, if θ = 1 the parameter measure has a density of the form described in (2.26), if θ = 1/2 then α = P_0/2, where P_0 is a beta distribution with parameters (1/2, 1/2), and if θ = 2, then α is a discrete measure with point masses equal to 1 at 0 and 1.
The problem of determining α for fixed Q ∈ M_θ has also been considered in combinatorics, where random Dirichlet means arise in connection with continual Young diagrams, which represent a generalization of Young diagrams for partitions. See [49] and [75, 76] for some detailed and exhaustive accounts. In particular, in [75] one finds a result according to which, given a density q in M_θ with θ = 1 and support in (a, b), the corresponding α can be explicitly recovered. Hence, combining the representations obtained in [11] and [75] with θ = 1, one obtains a new identity which does not seem immediate to us. Although not concerning random Dirichlet means, we also point out an allied contribution. In [34] a constructive method, involving the sequential generation of an array of barycenters, is developed in order to define random probability measures with a prescribed distribution of their means.

More distributional properties of random Dirichlet means
In the previous Section we have provided a description of the probability distribution of M(D̃_α) in cases where the density q(·; α) admits a simple expression. In general, even when available in closed form, the density q(·; α) might turn out to be difficult to handle and not helpful for determining characteristics of the probability distribution of M(D̃_α) such as its moments or its shape. Nonetheless, one can still devise methods to analyze distributional properties of M(D̃_α) even if its density function looks cumbersome.
As for the determination of moments, some early contributions can be found in [87, 88]. In these papers the evaluation of E[(M(D̃_α))^n] is based on combinatorial arguments and follows from a relation involving a measurable and symmetric function g : X^n → R, evaluated at a vector x whose generic entry x_{i,j} is repeated i times for any j = 1, ..., m_i; the set Z(n, k) is defined in (2.12). If one chooses g(x) = ∏_{i=1}^{n} ( ∏_{j=1}^{m_i} x_{i,j} )^i, the n-th moment of the random Dirichlet mean is obtained. This can be summarized in the moment formula (2.30), where B_{n,k} is, for any k ≤ n, the partial exponential Bell polynomial.
The result just stated has been independently obtained in a few recent papers. For example, in [15] the expression in (2.30) is deduced as a corollary of the Ewens sampling formula, which describes the partition of the first n elements of an infinite sequence of exchangeable random variables directed by a Dirichlet process. On the other hand, in [71] the connection between random Dirichlet means and multiple hypergeometric functions, already described in Section 2.2, is exploited. Indeed, under the assumption that the support of α is a bounded subset of R_+, a power series representation for the moment generating function is derived, from which the n-th moment is identified. The expectation appearing in this derivation is evaluated by means of the Faà di Bruno formula and this explains the appearance of the partial exponential Bell polynomial B_{n,k} in (2.30). A useful account of the Faà di Bruno formula, together with applications to various examples related to the material presented in this paper, is provided by [13].
Recursive formulae for the moments of random Dirichlet means can be found in [7] and [36]. They have been obtained without conditions on the support of α and under the assumption that ∫ |x|^n D̃_α(dx) is finite. In [7] the starting point is the identity (2.8), whereas the proof in [36] makes use of the fact that the Dirichlet process is the unique solution of the stochastic equation

P̃ =_d w δ_{X_1} + (1 − w) P̃,   (2.31)

where P̃ is a random probability measure on R, the random variable X_1 has probability distribution α/θ, w is a beta distributed random variable with parameters (1, θ), X_1 and w are independent and the random probability measure P̃ on the right-hand side is independent of (w, X_1). See [78]. From this one deduces a recursion for the moments of M(D̃_α); see [36] for details. Another interesting distributional property that has been investigated is the connection between the symmetry of q(·; α) and the symmetry of the corresponding α ∈ A_θ. If the parameter measure α ∈ A_θ is symmetric about c ∈ R, then M(D̃_α) has a symmetric distribution about c. Such a result has been independently achieved in [87] and [33]. In [87] the proof is easily deduced from the representation of the Dirichlet process provided in (2.3). In [33] a proof is given under the assumption that the symmetry center is c = 0. The argument relies on the relation between M(D̃_α) and a linear functional of the corresponding gamma process recalled in (2.22): the evaluation of the characteristic function of the linear functional of γ̃ allows one to establish the symmetry. The authors also hint at (2.33) as an alternative proof of the result. See also [36], where similar arguments are used. In [72], the authors show that F(c + y) = 1 − F(c − y) for any y ∈ R and this obviously implies symmetry of q(·; α). Finally, in [59] symmetry is proved by showing that the characteristic function of M(D̃_α) is real.
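The stochastic equation above suggests a simple numerical moment computation: taking n-th powers and expectations in M(D̃_α) =_d w X_1 + (1 − w) M(D̃_α), with w ∼ beta(1, θ) independent of X_1 ∼ P_0 and of the mean on the right-hand side, and isolating the unknown moment, gives a recursion. The sketch below implements this derivation (it is not claimed to be the exact recursion printed in [7] or [36], and the function name is ours):

```python
from math import comb, gamma

def dirichlet_mean_moments(theta, p0_moments, n_max):
    """Moments E[M^n], n = 0..n_max, of a random Dirichlet mean, from
    p0_moments[k-1] = E[X_1^k] under the base distribution P_0 = alpha/theta.
    Uses E[M^n] = sum_k C(n,k) E[w^k (1-w)^{n-k}] E[X_1^k] E[M^{n-k}],
    solved for the k = 0 term via 1 - E[(1-w)^n] = n / (theta + n)."""
    m = [1.0]
    for n in range(1, n_max + 1):
        s = 0.0
        for k in range(1, n + 1):
            # E[w^k (1-w)^{n-k}] = theta * k! * Gamma(theta+n-k) / Gamma(theta+n+1)
            ew = theta * gamma(k + 1) * gamma(theta + n - k) / gamma(theta + n + 1)
            s += comb(n, k) * ew * p0_moments[k - 1] * m[n - k]
        m.append((theta + n) / n * s)
    return m

# Check against alpha = (1/2) delta_0 + (1/2) delta_1, i.e. M ~ Beta(1/2, 1/2),
# whose first moments are 1/2, 3/8, 5/16.
print(dirichlet_mean_moments(1.0, [0.5, 0.5, 0.5], 3))
```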
Recently, in [42] it has been proved that, for any θ > 0 and σ ∈ (0, 1), a distributional identity, labelled (2.34), holds true relating means of Dirichlet processes with different total masses. The identity (2.34) is useful when the total mass of the Dirichlet process is less than one, which is the most difficult case to handle. Indeed, if one fixes θ = 1, one has that the random mean of a Dirichlet process with parameter σ ∈ (0, 1), rescaled with an independent beta random variable, has the same distribution as the mean of a Dirichlet process whose baseline measure has total mass 1. And the latter can be determined via the results described in Section 2.2. Finally, distributional connections between random Dirichlet means and generalized gamma convolutions have been thoroughly investigated in [47].

Numerical results
In the previous Sections we have dealt with exact results about random Dirichlet means. However, as already highlighted at the beginning of Section 2.4, the form of the density q(·; α) can be quite complicated to handle for practical purposes. Hence, it is also of interest to determine suitable approximations of the law of M(D̃_α). There have been various proposals in the literature, which we concisely recall. An interesting approach we would like to account for is based on a remarkable result obtained in [25]. Inspired by the distributional identity (2.31), they define the recursive equation

P̃_n = w_n δ_{X_n} + (1 − w_n) P̃_{n−1},   (2.35)

where P̃_0 is an arbitrary probability measure on (X, X), the random elements P̃_{n−1}, w_n and X_n are mutually independent and {(w_n, X_n)}_{n≥1} forms a sequence of i.i.d. random vectors with the same distribution as (w, X_1) in (2.31). It is apparent that (2.35) defines a Markov chain taking values in the space of probability measures on (X, X). Theorem 1 in [25] states that there is a unique invariant measure for the sequence of random probability measures {P̃_n}_{n≥0} defined by (2.35) and it coincides with the law of D̃_α. This finding leads to studying properties of the Markov chain {M_n}_{n≥0} with M_n = ∫ x P̃_n(dx): it can be shown that, for α ∈ A_θ, {M_n}_{n≥0} is a Harris ergodic Markov chain whose unique invariant distribution coincides with Q(·; α). See Theorem 1 in [21] for a multidimensional extension of these results. Consequently, one may devise a Markov chain Monte Carlo (MCMC) algorithm to draw a sample whose probability distribution is approximately Q(·; α). This goal has been pursued in [29], where it is shown that the chain {M_n}_{n≥0} is geometrically ergodic if ∫ |x| α(dx) < ∞. Such an integrability condition has been subsequently relaxed in [48]. If it is further assumed that supp(α) is bounded, one has the stronger property of uniform ergodicity. See Theorem 4 in [29].
Interestingly, the authors are also able to determine upper bounds on the rate of convergence thus obtaining an evaluation of the total variation distance between Q( · ; α) and the law of the approximating mean M n . As a subsequent development along these lines, [30] formulate an algorithm which allows for perfect sampling from Q( · ; α). Recently, a generalization of the Markov chain in (2.35) has been introduced and investigated in [24].
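The mean chain induced by the recursion above, M_n = w_n X_n + (1 − w_n) M_{n−1} with w_n ∼ beta(1, θ) and X_n ∼ P_0 i.i.d., is straightforward to simulate. A sketch (function names, burn-in and parameter values are our illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(4)

def mean_chain(theta, p0_sampler, n_steps, burn_in, rng=rng):
    """Path of the mean chain M_n = w_n X_n + (1 - w_n) M_{n-1}; its unique
    invariant distribution is the law of the random Dirichlet mean."""
    m, out = 0.0, []
    w = rng.beta(1.0, theta, size=n_steps)
    x = p0_sampler(n_steps, rng)
    for i in range(n_steps):
        m = w[i] * x[i] + (1.0 - w[i]) * m
        if i >= burn_in:
            out.append(m)
    return np.array(out)

# Example: theta = 1 and P_0 uniform on {0, 1}, so the invariant law is
# the arcsine law Beta(1/2, 1/2) with mean 1/2 and variance 1/8.
sample = mean_chain(1.0, lambda n, r: r.integers(0, 2, n).astype(float),
                    200_000, 1_000)
print(sample.mean(), sample.var())   # close to 1/2 and 1/8
```

The path is autocorrelated, so effective sample sizes are smaller than the nominal chain length; the geometric ergodicity results quoted above are what justify such plug-in estimates.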
Another simulation-based method has been proposed in [62]. The main idea relies on truncating the series representation (2.3) of D̃_α at some random point N in such a way that the distance between D̃_α and its truncated version is (almost surely) below a certain threshold ε. Define the random variable N = inf{m ≥ 1 : ∏_{i=1}^{m} (1 − V_i) < ε} and let P̃_ε denote the corresponding truncated random probability measure, with Y_0 a random variable with probability distribution α/θ and independent from the sequence of stick-breaking weights {V_n}_{n≥1} and from the random variables {Y_n}_{n≥1}. In [62] P̃_ε is termed an ε-Dirichlet random probability measure. The authors show that if d_p denotes the Prokhorov distance between probability distributions, then d_p(P̃_ε, D̃_α) ≤ ε, almost surely. Moreover, one can check that N has a shifted Poisson distribution with parameter −θ log ε, so that it is easy to sample from P̃_ε. In some cases it is possible to show that the closeness, with respect to the Prokhorov distance, between P̃_ε and D̃_α induces closeness between the probability distributions of ∫ x P̃_ε(dx) and of M(D̃_α).
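The truncation rule behind the ε-Dirichlet approximation is easy to simulate and to check: since −log(1 − V_i) ∼ Exp(θ), the stopping index N counts Poisson arrivals of rate θ up to level log(1/ε) plus one, which is the shifted Poisson law mentioned above. A sketch (function names and parameter values are ours):

```python
import numpy as np

rng = np.random.default_rng(5)

def truncation_point(theta, eps, rng=rng):
    """First m >= 1 such that the leftover stick mass prod_{i<=m}(1 - V_i)
    falls below eps, with V_i ~ beta(1, theta)."""
    log_rem, n = 0.0, 0
    while log_rem > np.log(eps):
        log_rem += np.log(1.0 - rng.beta(1.0, theta))
        n += 1
    return n

theta, eps = 2.0, 1e-3
ns = np.array([truncation_point(theta, eps) for _ in range(20_000)])
print(ns.mean())   # close to 1 + theta * log(1/eps), i.e. about 14.8 here
```

In particular, the expected number of retained atoms grows only logarithmically in 1/ε, which is what makes the ε-Dirichlet approximation computationally attractive.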
As far as numerical methods for approximating Q( · ; α) are concerned, a first application can be found in [83] and is based on a numerical inversion of the Laplace functional transform of the gamma process γ. Another possibility consists in approximating the cumulative distribution function F( · ). Since the expression in (2.24) can be evaluated, at least numerically, when the parameter measure has finite support, one might approximate F( · ) with the cumulative distribution function of ∫ x D α_n (dx), in symbols F n ( · ), where α n is a finite measure supported by {x 1,n , . . . , x k_n,n }. Given a prescribed error of approximation ε > 0, in [72] one can find a method for determining constants L < U and points x 1,n , . . . , x k_n,n in (L, U] such that sup_{y∈(L,U]} |F(y) − F n (y)| < ε. When the support of α is bounded, U and L coincide with its upper and lower bound, respectively. One can then evaluate a cumulative distribution function which approximates, in the uniform metric, the actual cumulative distribution function F( · ) on a sufficiently large interval of the real line.

Means of more general processes
The Dirichlet process has been the starting point for the development of Bayesian nonparametric methods and it still is one of the most widely used nonparametric priors. Nonetheless some of the properties featured by the Dirichlet process may represent drawbacks in various inferential contexts. This has stimulated a strong effort towards the definition of classes of nonparametric priors which extend the Dirichlet process. Accordingly, one might try to generalize results known for the Dirichlet case to random means of these more general random probability measures. In the present Section we provide an overview of these contributions which incidentally have interesting connections with other areas of research such as the study of the excursions of stochastic processes, the investigation of fugacity measures in physics and the analysis of option pricing in mathematical finance. Section 3.1 illustrates a few recent distributional results involving linear functionals of the two-parameter Poisson-Dirichlet process. Means of normalized random measures with independent increments are the focus of Section 3.2, whereas in Section 3.3 we deal with means of neutral to the right processes.

Two-parameter Poisson-Dirichlet process
An important extension of the Dirichlet process has been proposed by Pitman [65] as a two-parameter exchangeable random partition on the set of integers N. It is termed two-parameter Poisson-Dirichlet process since it is seen as a generalization of the one-parameter Poisson-Dirichlet process introduced by Kingman in [52], which identifies the probability distribution of the random Dirichlet probabilities in the representation (2.3) ranked in decreasing order.
For our purposes, it is convenient to define the two-parameter Poisson-Dirichlet process by means of a rescaling of the probability distribution of a σ-stable CRM μ σ , with σ ∈ (0, 1) and base measure P 0 being a probability measure on (X, X ). This means that μ σ is a random element taking values in the space (M X , M X ) of boundedly finite measures on (X, X ) such that, for any measurable function f : X → R satisfying ∫ |f | dμ σ < ∞ (almost surely), one has E[exp{− ∫ f dμ σ }] = exp{− ∫ f^σ dα}. If P (σ,0) denotes the probability distribution of μ σ , introduce P (σ,θ) on M X such that P (σ,θ) is absolutely continuous with respect to P (σ,0) , with Radon–Nikodym derivative proportional to [m(X)]^{−θ} , where θ > −σ. The random measure with probability distribution P (σ,θ) is denoted by μ (σ,θ) , and the random probability measure P (σ,θ) = μ (σ,θ) /μ (σ,θ) (X) is termed two-parameter Poisson-Dirichlet process; we will also use the notation PD(σ, θ) when referring to it. Note that E[P (σ,θ) (B)] = P 0 (B) for any B in X . Like the Dirichlet process, one can also provide a stick-breaking construction of P (σ,θ) . Indeed, if {W n } n≥1 is a sequence of independent random variables with W n ∼ beta(1 − σ, θ + nσ), and if one defines p 1 = W 1 and p n = W n ∏_{j=1}^{n−1} (1 − W j ) for n ≥ 2, then P (σ,θ) = ∑_{n≥1} p n δ_{Y n } , where {Y n } n≥1 is a sequence of i.i.d. random variables with probability distribution P 0 . This construction has become very popular in Bayesian Nonparametrics practice due to the fact that it allows a simple scheme for simulating P (σ,θ) . The Dirichlet process can be seen as a two-parameter Poisson-Dirichlet process: if α has total mass θ > 0, then D α =d P (0,θ) . In order to evaluate the probability distribution of the random mean M (P (σ,θ) ) := ∫ x P (σ,θ) (dx), we consider probability measures P 0 such that (3.4) ∫ |x|^σ dP 0 < ∞, since the integrability of |x|^σ with respect to P 0 is a necessary and sufficient condition for ∫ |x| P (σ,θ) (dx) < ∞ with probability one. See Proposition 1 in [73]. Moreover, by virtue of Theorem 2.1 in [44], the probability distribution of M (P (σ,θ) ) is absolutely continuous with respect to the Lebesgue measure.
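The stick-breaking construction with W n ∼ beta(1 − σ, θ + nσ) can be sketched directly; the truncation level below is an arbitrary choice for illustration, not part of the construction.

```python
import numpy as np

def pd_stick_breaking(sigma, theta, n_atoms, seed=None):
    """First n_atoms stick-breaking weights of a PD(sigma, theta) process:
    W_n ~ Beta(1 - sigma, theta + n*sigma), p_1 = W_1, p_n = W_n * prod_{j<n}(1 - W_j)."""
    rng = np.random.default_rng(seed)
    residual, p = 1.0, np.empty(n_atoms)
    for n in range(1, n_atoms + 1):
        w = rng.beta(1.0 - sigma, theta + n * sigma)
        p[n - 1] = residual * w
        residual *= 1.0 - w
    return p

# A truncated draw of M(P_(sigma,theta)) with P0 = N(0,1): sum of p_n * Y_n.
rng = np.random.default_rng(0)
p = pd_stick_breaking(0.5, 1.0, 5000, seed=0)
m_trunc = np.sum(p * rng.standard_normal(p.size))
```

Note that, unlike the Dirichlet case, the residual mass decays polynomially rather than geometrically, so many more atoms are needed for an accurate truncation when σ is close to 1.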
We denote the corresponding density function by q (σ,θ) . An important characterization of q (σ,θ) has been first provided in [50], where it is shown that, for any z ∈ C such that |arg(z)| < π, the generalized Stieltjes transform of order θ satisfies E[(z + M (P (σ,θ) ))^{−θ} ] = [ ∫ (z + x)^σ P 0 (dx) ]^{−θ/σ} , for any σ ∈ (0, 1) and θ > −σ. Alternative proofs of the identity in (3.5) can be found in [84] and in [85]. The generalized Stieltjes transform in (3.5) can be taken as the starting point for the determination of q (σ,θ) . Inspired by the work of [10], one can try to invert S θ and evaluate the probability distribution of M (P (σ,θ) ). This goal has been achieved in [44]. Before describing the representations one can obtain for the density q (σ,θ) and the corresponding cumulative distribution function, we wish to emphasize that the analysis of M (P (σ,θ) ) is of interest also beyond Bayesian Nonparametrics. For example, it plays an important role in the analysis of the excursions of skew Bessel bridges. To understand this connection, one can follow [69] and let Y = {Y t : t ≥ 0} denote a real-valued process such that: (i) the zero set Z of Y is the range of a σ-stable subordinator and (ii) given |Y |, the signs of the excursions of Y away from zero are chosen independently of each other to be positive with probability p and negative with probability 1 − p. Examples of this kind of process are: Brownian motion (σ = p = 1/2); skew Brownian motion (σ = 1/2 and 0 < p < 1); the symmetrized Bessel process of dimension 2 − 2σ; the skew Bessel process of dimension 2 − 2σ. Then, for any random time T which is a measurable function of |Y |, the fraction of time spent positive by Y up to T has the same distribution as A 1 , with A 1 representing the time spent positive by Y between t = 0 and t = 1. Interestingly, from (4.b') in [1] one obtains, with C ∈ X being such that P 0 (C) = p, a remarkable distributional identity between A 1 and P (σ,θ) (C). The determination of the probability distribution of P (σ,θ) (C) thus coincides with the problem of determining the probability distribution of A 1 . P. Lévy [55] has shown that the density function of A 1 is uniform in the case where p = σ = 1/2.
The case where θ = 0 is also interesting, since the probability distribution of P (σ,0) (C) coincides with the generalized arcsine law determined in [54]. Another case in which the probability distribution of P (σ,θ) (C) is known corresponds to σ = 1/2, θ > 0 and p ∈ (0, 1) (see [5]).
In [44] one can find a general result, which holds for a large class of parameter measures P 0 , including the one with point masses at 0 and 1 considered above.
Theorem 3.1. Let P 0 be a probability measure on R + such that ∫_{R +} x^σ P 0 (dx) < ∞, and let ∆ (σ,θ) ( · ; λ) be defined for any λ > 0, σ ∈ (0, 1) and θ > 0. Then the density q (σ,θ) admits an explicit integral representation in terms of ∆ (σ,θ) for any y in the convex hull of the support of P 0 .
If θ > 1, it is possible to integrate by parts in (3.7), thus obtaining a corresponding representation for the cumulative distribution function. For the special case where θ = 1, one uses the Perron-Stieltjes inversion formula, which yields q (σ,1) ( · ) = ∆ (σ,1) ( · ; 1). Here below we illustrate a few examples where the above result is applied.
The two examples we have been considering assume that P 0 is a Bernoulli probability distribution. One can obviously consider more general probability distributions, given the kind of result provided by Theorem 3.1. For additional explicit formulae, see Section 6 in [44].
Up to now we have mentioned an approach to the determination of q (σ,θ) which makes use of an inversion formula for its Stieltjes transform. One can, however, rely on other tools to recover q (σ,θ) . For example, from (3.3) one notes that M (P (σ,θ) ) admits a decomposition whose weights p * j are determined via a stick-breaking procedure involving a sequence of independent random variables {W n } n≥1 , with W n being beta-distributed with parameters (1 − σ, θ + σ + nσ). Moreover, W ∼ beta(1 − σ, θ + σ) and Y, Y * 1 , Y * 2 , . . . are i.i.d. random variables with common probability distribution P 0 on R + such that ∫ x dP 0 < ∞. Accordingly, one has the distributional identity M (P (σ,θ) ) =d W Y + (1 − W ) M (P (σ,θ+σ) ).
We finally remark that there is a nice distributional connection between M (P (σ,θ) ) and the random Dirichlet means examined in Section 2.2. Indeed, if P 0 is a probability distribution on R + such that ∫ x^σ dP 0 < ∞ and P (σ,0) stands for the probability distribution of M (P (σ,0) ), then M (P (σ,θ) ) =d M (D α ) where α = θ P (σ,0) . See Theorem 2.1 in [44]. Hence, one can also try to resort to expressions for the probability distributions described in Section 2.2 to obtain a representation for the density function q (σ,θ) .

NRMI
Another interesting family of random probability measures, which includes the Dirichlet process as a special case, is represented by the class of normalized random measures with independent increments (NRMI). These were introduced in [73] by drawing inspiration from Ferguson's [26] construction of the Dirichlet process as a normalized gamma process and from Kingman's [52] definition of a normalized stable process. Such random probability measures have proved to be a useful tool for inferential purposes: see, for instance, [45] for a comprehensive Bayesian analysis and [58] for an application to mixture modeling.
A NRMI on a Polish space X can be defined as (3.11) P( · ) = μ( · )/μ(X), where μ is a CRM on X such that 0 < μ(X) < ∞ almost surely.
Since a CRM is identified by the corresponding Poisson intensity ν, which satisfies (A.3), it is natural to express both the finiteness and the positiveness conditions on μ(X) in terms of ν as defined in (A.5). Finiteness of μ(X) corresponds to requiring its Laplace exponent ∫_{X×R +} (1 − e^{−λv}) ρ x (dv) α(dx) to be finite for any λ ≥ 0; if ν is homogeneous, the previous condition is equivalent to asking for θ := α(X) < ∞. Positiveness is ensured by requiring the existence of a set A such that α(A) > 0 and ∫_A ρ x (R + ) α(dx) = ∞; if ν is homogeneous, this reduces to ρ(R + ) = ∞, which is equivalent to requiring that the CRM μ has infinitely many jumps on any bounded set. Such a property is often called infinite activity of μ. See [73] for details. In the following, NRMIs are termed homogeneous or non-homogeneous according to whether they are based on homogeneous or non-homogeneous CRMs. As recalled in (2.1), the Dirichlet process can be seen as a gamma NRMI. Moreover, the two extreme cases of the two-parameter Poisson-Dirichlet process identified by the pairs (0, θ) and (σ, 0), with θ > 0 and σ ∈ (0, 1), are NRMIs: they coincide with the Dirichlet process and the stable NRMI, respectively, where the latter corresponds to (3.11) with μ being a stable CRM. On the other hand, the P (σ,θ) process with other choices of the pair (σ, θ) does not belong to the class of NRMIs, although the two classes are closely connected. See [67].
When studying the distributional properties of a linear functional of P, i.e. ∫_X f (x) P(dx) with f : X → R, one first has to determine conditions which ensure its finiteness. According to Proposition 1 in [73], for any NRMI P one has that ∫_X |f (x)| P(dx) < ∞ almost surely if and only if the Laplace exponent condition (3.12) holds, namely ∫_{X×R +} (1 − e^{−λ v |f (x)|}) ρ x (dv) α(dx) < ∞ for every λ > 0. Analogously to the Dirichlet and two-parameter Poisson-Dirichlet cases, one may confine the analysis, without loss of generality, to studying a simple mean M (P) := ∫ x P(dx). Indeed, letting P be a NRMI with Poisson intensity ρ x (dv) α(dx), one has ∫_X f (x) P(dx) =d ∫ x P̃(dx), where P̃ is a NRMI with Poisson intensity ρ x (dv) α f (dx).
We start by briefly discussing two interesting distributional properties, already presented for random Dirichlet means, which hold true also for homogeneous NRMIs. The first property is related to the Cauchy distribution: as shown in Example 2.4, if P 0 = α/θ is Cauchy then M (D α ) has the same distribution as P 0 and, indeed, the same applies to homogeneous NRMI means M (P). The second property is the symmetry of the distribution of the mean discussed at the end of Section 2.4 for Dirichlet means: if M (P) is a homogeneous NRMI mean, then its distribution is symmetric about c ∈ R whenever P 0 is symmetric about c ∈ R. These facts, noted in [46], can be shown by simply mimicking known proofs for the Dirichlet case. Indeed, such properties hold for an even larger class of models, namely species sampling models (see [66]), which include, besides homogeneous NRMIs, also the two-parameter Poisson-Dirichlet process as a noteworthy example. We now turn our attention to the problem of the determination of the distribution of a random NRMI mean. In view of the following treatment, it is useful to stick with the general formulation in terms of ∫_X f (x) P(dx) rather than of M (P). We will also assume throughout that (3.12) is satisfied. A key step consists in noting that the randomization idea first employed in [33] for studying Dirichlet means, and recalled in (2.22), clearly applies to NRMIs as well, since it only exploits the representation of P as a normalized random measure, in analogy with the representation of the Dirichlet process as a normalized gamma process. By virtue of (3.13), the problem of studying a linear functional of a NRMI is reduced to the problem of studying a linear functional of a CRM. Importantly, the characteristic functions of linear functionals of CRMs are known and have an elegant Lévy-Khintchine type representation in terms of the underlying Poisson intensity measure. Therefore, by using a suitable inversion formula for characteristic functions, one obtains a representation for the probability distribution of ∫_X f (x) P(dx).
In particular, by resorting to Gurland's inversion formula (see Appendix B for details), one obtains the representation (3.14) of the cumulative distribution function, where Im z stands for the imaginary part of z ∈ C. This line of reasoning, together with the proof of the absolute integrability at the origin of the integrand in (3.14), led [73] to establish the following result.
Theorem 3.2. Let P be a NRMI. Then, for every y ∈ R, the cumulative distribution function of ∫_X f (x) P(dx) admits an explicit representation in terms of the underlying Poisson intensity. We now present three examples, which show how the expression can be simplified for particular choices of NRMIs.
Example 3.4. The first NRMI we consider is the normalized stable process [52], which is obtained by normalizing a stable CRM, i.e. a CRM having Poisson intensity of the form ρ(dv) α(dx) = [σ v^{−σ−1}/Γ(1 − σ)] dv α(dx), with σ ∈ (0, 1) and α a finite measure. A mean of such a random probability measure is finite if and only if ∫ |f (x)|^σ α(dx) < ∞, which coincides with the condition (3.4) required for a general P (σ,θ) mean to be finite. As shown in [73], application of Theorem 3.2 leads to the expression (3.15), where sgn stands for the sign function. Such a distribution turns out to be important for the study of the zero-range process introduced in [80]: the zero-range process is a popular model in physics used for describing interacting particles which jump between the sites of a lattice with a rate depending on the occupancy of the departure site. See [22] for a recent review. An interesting case corresponds to the one-dimensional zero-range process acting in a symmetric medium, where the inverses of the particle rates are in the domain of attraction of a strictly σ-stable distribution. Under these assumptions, in [70] it is shown that the associated sequence of fugacity processes weakly converges to a process whose marginal distribution, at index point z ∈ (0, 1), is described by (3.15) with f (x) = 1 (−∞,z] (x) and with (y − v)/(u − v) replacing y, for u > v. It should also be mentioned that Theorem 1 in [70] provides an expression for the moments of P((−∞, z]) when P is a normalized σ-stable process. In a similar fashion as for the Dirichlet case examined in Section 2.4, one has an expression involving a sum over the space of partitions of {1, . . . , n} if the n-th moment is to be computed. See also [38] for a general structural expression for moments of means of species sampling models.
Since the σ-stable NRMI is a PD(σ, 0) process, the considerations in Section 3.1 establish interesting connections between formula (3.15) and occupation time phenomena for Brownian motion and more general Bessel processes. In particular, if f (x) = 1 C (x) for some set C in X such that α(C)[α(X)]^{−1} = p ∈ (0, 1), then the probability distribution of P(C) coincides with the generalized arcsine law originally determined by [54]. It also represents the probability distribution of the time spent positive by a skew Bessel process of dimension 2 − 2σ (see, e.g., [68,69]). It then follows immediately from (3.15) that the cumulative distribution function of Lamperti's generalized arcsine laws can be represented as in (3.16), for any y ∈ (0, 1). See Example 4.1 in [44] for a different representation of (3.16). By further assuming σ = 1/2, and after some algebra, (3.16) can be further simplified. It is then evident that, if also p = 1/2, the resulting expression coincides with 2π^{−1} arcsin( √y ), which is Lévy's famous arcsine formula for Brownian motion [55].
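The arcsine case is easy to check by Monte Carlo: the fraction of time a Brownian path spends positive on [0, 1] has cumulative distribution function (2/π) arcsin(√y). The sketch below approximates Brownian motion by a simple random walk; the step and path counts are arbitrary choices.

```python
import numpy as np

# Monte Carlo check of Levy's arcsine law for the occupation time of the
# positive half-line by (approximate) Brownian motion on [0, 1].
rng = np.random.default_rng(42)
n_steps, n_paths = 1000, 4000
steps = rng.choice([-1.0, 1.0], size=(n_paths, n_steps))
paths = np.cumsum(steps, axis=1)
frac_positive = (paths > 0).mean(axis=1)

y = 0.2
empirical = (frac_positive <= y).mean()
arcsine = (2.0 / np.pi) * np.arcsin(np.sqrt(y))
print(empirical, arcsine)  # the two values should be close
```

The same simulation, with skewed excursion signs, would approximate the generalized arcsine laws of [54] discussed above.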
Example 3.5. An important family of NRMIs, considered in [40,58,67], is obtained by setting μ to be a generalized gamma CRM, which is characterized by a Poisson intensity of the type ρ(dv) = [σ/Γ(1 − σ)] v^{−1−σ} e^{−τ v} dv, with σ ∈ (0, 1), τ > 0 and α a boundedly finite measure on X (see Example A.2 in Appendix A for details). Given that ρ(R + ) = ∞ and that the Laplace exponent is finite for any λ > 0, the normalization procedure leads to a well-defined random probability measure if and only if α is a finite measure. The Dirichlet process is recovered by letting σ → 0, whereas the normalized σ-stable process [52] arises if τ = 0. Moreover, the resulting NRMI is a normalized inverse Gaussian process [56] if σ = 1/2.
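A quick numerical check of the generalized gamma Laplace exponent is possible. The closed form ψ(λ) = (λ + τ)^σ − τ^σ used below is a standard computation for this intensity, stated here as an assumption since it is not displayed in the text.

```python
import numpy as np
from math import gamma

# Midpoint-rule check that, for rho(dv) = [sigma/Gamma(1-sigma)] v^(-1-sigma) e^(-tau v) dv,
# psi(lam) = int (1 - e^(-lam v)) rho(dv) = (lam + tau)^sigma - tau^sigma (assumed closed form).
sigma, tau, lam = 0.5, 1.5, 2.0
h = 1e-4
u = np.arange(h / 2, 8.0, h)   # substitute v = u^2 to tame the singularity at the origin
v = u ** 2
integrand = ((1 - np.exp(-lam * v)) * (sigma / gamma(1 - sigma))
             * v ** (-1 - sigma) * np.exp(-tau * v) * 2 * u)
numeric = np.sum(integrand) * h
closed = (lam + tau) ** sigma - tau ** sigma
print(round(numeric, 4), round(closed, 4))  # the two values agree
```

The substitution v = u² removes the v^{−1−σ} singularity at zero, so a plain midpoint rule suffices.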
For a generalized gamma NRMI mean, the finiteness condition (3.12) reduces to ∫_X (τ + |f (x)|)^σ α(dx) < ∞. Note that the Feigin and Tweedie condition (2.4) for the Dirichlet case is recovered by letting σ → 0. One can then use Theorem 3.2 for determining the distribution of such a mean, as done in [46]. Recall that α( · ) = θ P 0 ( · ) and set β = θ τ^σ > 0. Then, its cumulative distribution function is given by the expression (3.17).

Example 3.6. Another interesting NRMI is the normalized extended gamma CRM, which is obtained by normalizing a CRM with intensity ν(dv, dx) = e^{−β(x)v} v^{−1} dv α(dx), with β a positive real-valued function and α a boundedly finite measure on X (see Example A.3 for details). Being an infinite activity process, the NRMI is well defined for any combination of β and α satisfying ∫_X log[1 + (β(x))^{−1}] α(dx) < ∞. In contrast to the previous two examples, the extended gamma NRMI is non-homogeneous: its random probability masses depend on the locations at which they are concentrated. As for means of extended gamma NRMIs, the necessary and sufficient condition for finiteness (3.12) becomes ∫_X log[1 + |f (x)| (β(x))^{−1}] α(dx) < ∞. One obtains the Feigin and Tweedie condition (2.4) for the Dirichlet case by setting β equal to a constant. As far as the cumulative distribution function of an extended gamma NRMI mean is concerned, note that a change of variable allows one to rewrite the relevant linear functional of the extended gamma CRM μ as a linear functional of a gamma CRM γ; this follows from arguments similar to those in Proposition 3 in [63]. This means that the desired distribution is obtained by inverting the characteristic function of a linear functional of a gamma process, which is exactly the same situation as for the Dirichlet process. Hence, Theorem 3.2 leads to an expression similar to the one given in Theorem 2.4. Another example of NRMI for which explicit expressions can be obtained is represented by a generalization of the Dirichlet process obtained by normalizing superposed gamma processes with different scale parameters. See [73] for details.
As for the computation of moments of means of NRMIs, it is possible to obtain a generalization of the formula displayed in (2.30). In order to do so, we introduce the quantity Π (n) k (n 1 , . . . , n k ; f ), defined on the set of positive integer vectors {n = (n 1 , . . . , n k ) : n i ≥ 1, ∑_{i=1}^k n i = n}. Note that when f ≡ 1, then Π (n) k (n 1 , . . . , n k ; f ) = Π (n) k (n 1 , . . . , n k ) describes the so-called exchangeable partition probability function induced by a NRMI, derived in [45]. Hence, one has a moment formula (3.20) expressed through these quantities. Note that when the NRMI P is generated by normalizing a homogeneous CRM, i.e. ρ x = ρ for any x in X, then the formula reduces to (3.21). Since homogeneous NRMIs are a subclass of species sampling models [66], (3.21) is an instance in which the structural representation for the moments of species sampling means, derived in [38], can be made explicit. The result above is in the spirit of Lo's treatment of mixtures of the Dirichlet process in [60]. See also [40].
The following examples show how (3.20) can be easily applied, even if the concrete evaluation of the n-th moment is a challenging task since one has to deal with sums over spaces of partitions.
Example 3.7. Since the Dirichlet process is a special case of homogeneous NRMI, one can recover (2.30) from (3.21). Suppose α = θ P 0 and note that in this case ρ x (ds) = ρ(ds) = e^{−s} s^{−1} ds. Hence the relevant integrals reduce to the factorials (n i − 1)!, which yields the representation in terms of partial exponential Bell polynomials recalled in Theorem 2.5.
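In the Dirichlet case the moments of the mean can also be computed by a simple recursion. The sketch below derives it from the distributional identity M =d w X + (1 − w) M with w ∼ beta(1, θ), which is assumed here to be the form of (2.31); it is checked against the known Beta law of M (D α ) when P 0 is Bernoulli.

```python
import math

def dirichlet_mean_moments(theta, p0_moments, n_max):
    """Moments mu_n = E[M(D_alpha)^n] obtained by expanding the identity
    M =d w*X + (1-w)*M, w ~ Beta(1, theta) independent of X ~ P0 and M
    (assumed form of (2.31)). p0_moments[k] must equal E[X^k] for k = 0..n_max."""
    mu = [1.0]
    for n in range(1, n_max + 1):
        def ew(k):
            # E[w^k (1-w)^(n-k)] for w ~ Beta(1, theta)
            return theta * math.gamma(1 + k) * math.gamma(theta + n - k) / math.gamma(theta + n + 1)
        s = sum(math.comb(n, k) * ew(k) * p0_moments[k] * mu[n - k] for k in range(1, n + 1))
        mu.append(s / (1.0 - ew(0)))  # solve for mu_n, which also appears in the k = 0 term
    return mu

# Check: for P0 = Bernoulli(p), M(D_alpha) ~ Beta(theta*p, theta*(1-p)).
theta, p = 3.0, 0.4
mu = dirichlet_mean_moments(theta, [1.0] + [p] * 4, 4)
a, b = theta * p, theta * (1 - p)
beta_m2 = a * (a + 1) / ((a + b) * (a + b + 1))
print(mu[1], mu[2], beta_m2)  # mu[2] and beta_m2 are both approximately 0.22
```

The same binomial expansion underlies the partition sums appearing in (3.20) and (3.21), where the role of the factorials (n i − 1)! is apparent.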
Example 3.8. Suppose P is the stable NRMI already considered in Example 3.4. Without loss of generality, let α be a probability measure on X. In this case the moment formula simplifies accordingly. If one then sets f = 1 (−∞,x] and takes α to be a probability measure on (0, 1), one obtains the formula described in Theorem 1 of [70].
Having determined the probability distribution of ∫ f dP, a Bayesian is interested in the evaluation of the posterior distribution, given a sample of n observations. In the Dirichlet case, the determination of the prior distribution of M (D α ) is enough to solve the issue because of the conjugacy of D α : the posterior distribution of M (D α ), given X 1 , . . . , X n , coincides with the distribution of M (D α* ), with α* = α + ∑_{i=1}^n δ_{X i } . If the prior is not conjugate, further efforts are needed for determining the posterior distribution of a linear functional. This is the case for NRMIs different from the Dirichlet process: indeed, as shown in [43], they are not conjugate. Nonetheless, for NRMIs a full description of the posterior distribution can be provided. Here, we briefly summarize the results achieved in [46,73].
Assume (X n ) n≥1 to be a sequence of exchangeable observations, defined on (Ω, F , P) and with values in X, such that, given a NRMI P, the X i 's are i.i.d. with distribution P, i.e. P[X 1 ∈ B 1 , . . . , X n ∈ B n | P] = ∏_{i=1}^n P(B i ) for any B i ∈ X , i = 1, . . . , n, and any n ≥ 1. Since a NRMI selects a.s. discrete distributions, ties in the observations appear with positive probability. Set X = (X 1 , . . . , X n ); one can always represent X as (X*, π), where X* = (X* 1 , . . . , X* n(π) ) denotes the distinct observations within the sample and π = {C 1 , . . . , C n(π) } stands for the corresponding partition of the integers {1, . . . , n} recording which observations within the sample are equal, that is C j = {i : X i = X* j }. The number of elements in the j-th set of the partition is denoted by n j , for j = 1, . . . , n(π), so that ∑_{j=1}^{n(π)} n j = n. For the remainder of the section we deal with NRMIs derived from CRMs with a non-atomic base measure α in (A.5).
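The representation of a sample as (X*, π, (n j )) is purely combinatorial and can be sketched as follows.

```python
def sample_partition(x):
    """Represent a sample x = (x_1,...,x_n) as (distinct values X*, partition pi,
    multiplicities n_j): C_j collects the 1-based indices i with x_i = X*_j,
    with blocks listed in order of appearance."""
    distinct, blocks = [], []
    for i, xi in enumerate(x, start=1):
        if xi in distinct:
            blocks[distinct.index(xi)].append(i)
        else:
            distinct.append(xi)
            blocks.append([i])
    return distinct, blocks, [len(b) for b in blocks]

x_star, pi, n_j = sample_partition([2.7, 0.5, 2.7, 2.7, 0.5])
print(x_star, pi, n_j)  # [2.7, 0.5] [[1, 3, 4], [2, 5]] [3, 2]
```

The multiplicities (n 1 , . . . , n n(π) ) are exactly the arguments entering the exchangeable partition probability function and the posterior jump densities below.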
Before dealing with functionals of NRMIs, it is useful to first recall their posterior characterization given in [45]. For any pair of random elements Z and W defined on (Ω, F , P), we use the symbol Z^{(W)} to denote a random element on (Ω, F , P) whose distribution coincides with a regular conditional distribution of Z, given W . Moreover, denote the Laplace exponent of a CRM by ψ(f ) := ∫_{X×R +} [1 − e^{−v f (x)}] ρ x (dv) α(dx), for any measurable function f : X → R for which the integrability condition (3.12) holds true. Introduce now a latent variable, denoted by U n , whose conditional distribution, given X, admits a density function (3.22) with respect to the Lebesgue measure on R, expressed in terms of ψ(u 1), where 1(x) = 1 for any x ∈ X, and of the quantities τ_{n i }(u|X* i ) in (3.23), for i = 1, . . . , n(π). Indeed, the posterior distribution, given X, of the CRM μ defining a NRMI is a mixture with respect to the distribution of the latent variable U n . Specifically, it is given by the sum of a CRM μ^{(U n )} without fixed points of discontinuity and of jumps located at the distinct observations X* 1 , . . . , X* n(π) ; the J i^{(U n ,X* i )} 's are the corresponding jumps, which are mutually independent and independent from μ^{(U n )} , and whose density is given by (3.25). See [45, Theorem 1] for details. Finally, denote by ψ^{(u)} and J r^{(u,X)} the Laplace exponent of the CRM defined by (3.24) and the jumps whose density is given by (3.25), respectively, with U n = u. We are now in a position to provide a description of the exact posterior distribution of a linear functional of a general NRMI. First of all, it is to be noted that such a distribution is always absolutely continuous with respect to the Lebesgue measure on R.
Theorem 3.4. Let P be a NRMI. Then the posterior density function of ∫_X f (x) P(dx), given X, admits an explicit representation in which τ_{n j }(u|X* j ) is as in (3.23), for j = 1, . . . , n(π). Moreover, the posterior cumulative distribution function of ∫_X f (x) P(dx), given X, can be expressed as a mixture with respect to f^X_{U n } , the density of the latent variable U n given in (3.22). The proof of this result heavily relies on the useful and powerful techniques and tools developed in [74]. As for the determination of the posterior density, the starting point is given by its exact representation in the case of NRMIs with α in (A.5) having finite support [73], which, loosely speaking, handles observations as derivatives of the characteristic function. In order to achieve the representation presented above, one needs to exploit the non-atomicity of α in connection with a martingale convergence argument. The cumulative distribution function, instead, arises by combining Theorem 3.2 with the posterior representation of NRMIs.
The reader is referred to [46,57,73] for expressions of the posterior distribution of functionals based on particular NRMIs. Here, we just point out that, by applying the previous results, in [73] a particularly simple formula for the posterior density of Dirichlet means has been obtained, provided there exists an x such that α({x}) ≥ 1.

We close this section by considering linear functionals of hierarchical mixture models, which represent the most common use of Bayesian nonparametric procedures nowadays. Letting Y be a Polish space equipped with the Borel σ-algebra Y , one defines a random density (absolutely continuous with respect to some σ-finite measure λ on Y) driven by a random discrete distribution P, i.e. (3.26) p̃(y) = ∫_X k(y | x) P(dx), where k( · | x) is a density function on Y indexed by a parameter x with values in X. A typical choice for k is the density function of a normal distribution: in such a case P controls the means (and possibly also the variances) of the random mixture density. This approach is due to Lo [60], who defined a random density as in (3.26) with P being the Dirichlet process: this model is now commonly referred to as the mixture of Dirichlet process (MDP). One can obviously replace the Dirichlet process with any NRMI: interesting behaviours, especially in terms of the induced clustering mechanism, appear for particular choices of NRMIs. See, e.g., [56][57][58]. The study of mean functionals of such models has been addressed in [46,63]. The first thing to note is that, as far as the prior distribution is concerned, the problem of studying a linear functional of a NRMI mixture can be reduced to studying a (different) linear functional of a simple NRMI: for a NRMI mixture density p̃, with P the corresponding NRMI, the identity (3.27) expresses the mean of p̃ as a linear functional of P. From (3.27) it follows that the prior distribution of the mean of a mixture can be evaluated by applying Theorem 3.2.
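A prior draw from a mixture model of the type (3.26) can be sketched by truncated stick-breaking. The unit-variance normal kernel, the base measure and the truncation level below are illustrative choices of ours, with P taken to be the Dirichlet process (the MDP case).

```python
import numpy as np

def normal_pdf(y, loc, scale=1.0):
    return np.exp(-0.5 * ((y - loc) / scale) ** 2) / (scale * np.sqrt(2 * np.pi))

def draw_mdp_density(theta, sample_p0, n_atoms=500, seed=None):
    """One truncated draw of the MDP random density p(y) = sum_n p_n * k(y | Y_n),
    with k a N(Y_n, 1) kernel and (p_n, Y_n) from the Dirichlet stick-breaking (2.3)."""
    rng = np.random.default_rng(seed)
    v = rng.beta(1.0, theta, size=n_atoms)
    p = v * np.cumprod(np.concatenate([[1.0], 1.0 - v[:-1]]))
    y_atoms = np.array([sample_p0(rng) for _ in range(n_atoms)])
    return lambda y: np.sum(p[:, None] * normal_pdf(np.asarray(y)[None, :], y_atoms[:, None]), axis=0)

dens = draw_mdp_density(2.0, lambda r: r.normal(0.0, 3.0), seed=7)
grid = np.linspace(-10, 10, 2001)
mass = np.sum(dens(grid)) * (grid[1] - grid[0])
print(mass)  # nearly 1: the truncated random density almost integrates to one
```

The mean of such a draw, ∫ y p̃(y) dy, reduces to ∑ p n Y n for this kernel, illustrating the reduction to a linear functional of P mentioned above.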
As far as the posterior distribution of NRMI mixtures is concerned, general expressions for their density and cumulative distribution function can be found in [46]. For the popular MDP case, which is also not conjugate, an expression for the posterior density function has been first obtained in [63].
Finally, an alternative use of discrete nonparametric priors, such as NRMI, for inference with continuous data is represented by histogram smoothing. Such a problem can be handled by exploiting the so-called "filtered-variate" random probability measures as defined by [16], which essentially coincide with suitable means of random probability measures.

Means of neutral to the right priors
A first remarkable generalization of the Dirichlet process to have appeared in the literature is due to Doksum [17]. A random distribution function F = {F (t) : t ≥ 0} on R + is said to be neutral to the right (NTR) if, for any choice of points 0 < t 1 < t 2 < · · · < t n < ∞ and for any n ≥ 1, the random variables F (t 1 ), [F (t 2 ) − F (t 1 )]/[1 − F (t 1 )], . . . , [F (t n ) − F (t n−1 )]/[1 − F (t n−1 )] are independent. An important characterization is given in Theorem 3.1 of [17], where it is shown that F is NTR if and only if there exists a CRM μ on R + with lim_{t→∞} μ((0, t]) = ∞ almost surely such that F coincides in distribution with {1 − e^{−μ((0,t])} : t ≥ 0}. Example 3.9. As recalled in Section 2, the Dirichlet process can also be seen as a neutral to the right process. Indeed, let μ be a CRM whose Laplace transform is determined by a finite measure α, for any λ > 0. The random probability measure whose distribution function is defined by F (t) = 1 − e^{−μ((0,t])} then coincides with D α . See [27].
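The exponential transformation of a CRM into a random distribution function is easy to visualise by simulation. The sketch below takes μ to be a gamma subordinator discretised on a grid, a convenient CRM choice of ours for illustration, not the log-beta CRM yielding specific NTR priors.

```python
import numpy as np

# A minimal sketch of the NTR construction F(t) = 1 - exp(-mu((0, t])), with mu
# a gamma subordinator approximated by independent gamma increments on a grid.
rng = np.random.default_rng(3)
dt, t_max, rate = 0.01, 20.0, 0.5
grid = np.arange(dt, t_max + dt, dt)
increments = rng.gamma(rate * dt, 1.0, size=grid.size)  # independent, nonnegative
mu_path = np.cumsum(increments)                         # mu((0, t]) along the grid
F = 1.0 - np.exp(-mu_path)                              # a random distribution function
print(F[0] >= 0, bool(np.all(np.diff(F) >= 0)), F[-1])
```

Since μ has independent nonnegative increments, F is nondecreasing and the normalized increments of F over disjoint intervals are independent, as in Doksum's definition.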
The random quantity appearing on the right-hand side of (3.31) is known as an exponential functional and it has attracted great interest in probability and finance. Indeed, it appears in risk theory, in models for the pricing of Asian options and in the determination of the law of a perpetuity. In mathematical physics it represents a key quantity for the analysis of one-dimensional diffusions in a random Lévy environment. See [2] for a comprehensive review and relevant references. The analysis of the distributional properties of ∫_0^∞ e^{−μ((0,t])} dt is a demanding task. There are still many open problems in this area that certainly deserve a considerable amount of work. Here we confine ourselves to recalling some known results concerning the existence of means of neutral to the right processes, together with some interesting distributional characterizations. Moreover, we restrict attention to the case where μ does not have fixed points of discontinuity.
As far as the first issue is concerned, according to Theorem 1 of [2] the condition P[lim_{t→∞} μ((0, t]) = ∞] = 1 is equivalent to the almost sure finiteness of ∫_0^∞ e^{−μ((0,t])} dt when μ is a CRM with intensity measure ν(ds, dx) = dx ρ(ds). We are not aware of a necessary and sufficient condition for the finiteness of ∫_0^∞ e^{−μ((0,t])} dt for a general CRM μ. In [20], for the case of a homogeneous μ, a sufficient condition for its finiteness is given in terms of the Lévy intensity. For a non-homogeneous CRM, finiteness of the mean can be guaranteed by assuming the finiteness of its first moment.
Example 3.10. A popular neutral to the right process is the beta-Stacy process introduced in [86], which, from a Bayesian point of view, has the appealing feature of being conjugate with respect to both exact and right-censored observations. The CRM corresponding to such a process via the exponential transformation is the log-beta CRM, whose Poisson intensity measure is of the form ν(ds, dx) = [e^{−s β(x)}/(1 − e^{−s})] ds α(dx), where β is a positive function and α is a measure concentrated on R + which is absolutely continuous with respect to the Lebesgue measure and satisfies ∫_0^∞ (β(x))^{−1} α(dx) = +∞. The assumption of absolute continuity of α is equivalent to assuming no fixed points of discontinuity. Clearly, if α is a finite measure and β(x) = α((x, +∞)), we recover the process (3.30), which corresponds to the Dirichlet process.
For the mean of a beta-Stacy process, the condition for the existence of the moment of order n reduces to the finiteness of an iterated integral, with respect to dt n · · · dt 1 , involving the ratios α(ds)/[β(s) + n − j].
Given the conjugacy property of the beta-Stacy process, one can also obtain moments of the posterior mean. In fact, let Y 1 , . . . , Y N denote exchangeable observations and let S 1 , . . . , S r be the r ≤ N exact observations among the Y j 's, the others being right-censored. For simplicity, assume that the observations are all distinct. The posterior distribution of the beta-Stacy process is still a beta-Stacy process. In particular, the corresponding CRM is a log-beta process with fixed points of discontinuity, whose Laplace transform is available in closed form, with the proviso that any product over an empty index set {i : S i ≤ t j } = ∅ is set equal to 1.
Unlike the means of random probability measures discussed so far, the problem of determining the distribution of a mean of a neutral to the right process has been solved only in the Dirichlet case; it is to be noted, however, that in order to achieve those results representations of the Dirichlet process different from the one recalled in Example 3.9 were used. The only available characterization has been given in [6] and can be summarized in the following theorem.

Concluding remarks
In this paper we have tried to give a broad account of distributional properties of means of random probability measures and to relate contributions from different areas. There are still several interesting open issues about means, among which we mention: the study of the relation between the distribution of the random mean and the expected value of the corresponding random probability measure for processes different from the Dirichlet process, such as, e.g., the normalized stable process; the determination of the posterior distribution of means of processes different from NRMIs and, most notably, of the two parameter Poisson-Dirichlet process; the derivation of closed form expressions for the density of means of neutral to the right processes, at least in some specific cases; the derivation of analogues of the Cifarelli-Regazzini identity for processes different from the two parameter Poisson-Dirichlet process; further investigation of the interplay between means and excursion theory, combinatorics and special functions. The obvious and important next step consists in studying non-linear functionals of random probability measures. As for quadratic functionals, various preliminary results are already available for the variance functional of a Dirichlet process, V(D̃_α) = ∫ x² D̃_α(dx) − [∫ x D̃_α(dx)]²: the moments and Laplace transform of V(D̃_α) have been obtained in [7], expressions for its cumulative distribution function and density function are given in [72], an alternative representation of its Laplace transform has been derived in [59], whereas in [19] a stochastic equation for V(D̃_α) is derived. It is to be mentioned that, in the above papers, the variance functional is treated as a linear combination of the coordinates of the vector of means (∫ x² P̃(dx), ∫ x P̃(dx)): hence, the determination of distributional properties of genuinely quadratic functionals such as, e.g., the mean difference ∫∫ |x − y| P̃(dx) P̃(dy), is still an open problem.
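As a numerical companion to the variance functional V(D̃_α), one can sample Dirichlet process trajectories via truncated stick-breaking (Sethuraman's representation). The base measure, total mass and truncation level below are illustrative choices, and the benchmark E[V(D̃_α)] = a σ²/(a + 1), with a the total mass and σ² the variance of the normalized base measure, follows from the standard first two moments of Dirichlet means; this is only a sketch, not part of the results surveyed above.

```python
import numpy as np

rng = np.random.default_rng(1)

def variance_functional(a=2.0, n_atoms=400, rng=rng):
    # One draw of V = int x^2 dD - (int x dD)^2 via stick-breaking,
    # truncated at n_atoms sticks; normalized base measure N(0,1), total mass a.
    v = rng.beta(1.0, a, size=n_atoms)
    w = v * np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))  # stick weights
    w /= w.sum()                      # renormalize the truncated weights
    x = rng.normal(size=n_atoms)      # iid atoms from the base measure
    m1 = np.sum(w * x)
    return np.sum(w * x ** 2) - m1 ** 2

draws = np.array([variance_functional() for _ in range(4000)])
# Benchmark: E[V] = a * sigma^2 / (a + 1) = 2/3 for a = 2, sigma^2 = 1.
print(draws.mean())
```

With a = 2 the leftover stick mass after 400 atoms has expectation (a/(a+1))^{400}, so truncation bias is negligible compared to Monte Carlo error.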
Results for non-linear functionals such as the quantiles are provided, for the Dirichlet case, in [35,37]. Finally, representations for square integrable functionals of the Dirichlet process in terms of an infinite orthogonal sum of multiple integrals of increasing order are derived in [64].

A Completely random measures
In this Appendix we provide a concise account of completely random measures, a concept introduced by Kingman [51] which allows one to unify, in an elegant way, the classes of random probability measures dealt with in the previous sections: indeed, all of them can be derived as suitable transformations of completely random measures.
Let (X, X ) be a Polish space equipped with the corresponding Borel σ-field and recall that a measure µ on X is said to be boundedly finite if µ(A) < +∞ for every bounded measurable set A. Denote by (M_X, M_X) the space of boundedly finite measures endowed with the corresponding Borel σ-algebra; see [14] for an exhaustive account. Let now μ̃ be a measurable mapping from (Ω, F, P) into (M_X, M_X) such that, for any A_1, …, A_n in X with A_i ∩ A_j = ∅ for i ≠ j, the random variables μ̃(A_1), …, μ̃(A_n) are mutually independent. Then μ̃ is termed a completely random measure (CRM).
The reader is referred to [53] for a detailed treatment of the subject. Here we confine ourselves to highlighting some aspects of CRMs which are essential for the study of mean functionals. A CRM on X can always be represented as the sum of two components: a proper CRM μ̃_c = Σ_{i≥1} J_i δ_{Y_i}, where both the positive jumps J_i and the X-valued locations Y_i are random, and a measure with random masses at fixed locations in X. Accordingly,

μ̃ = μ̃_c + Σ_{i=1}^M V_i δ_{z_i},

where the fixed jump points z_1, …, z_M, with M ∈ {1, 2, …, +∞}, are in X and the (non-negative) random jumps V_1, …, V_M are mutually independent and independent from μ̃_c. Finally, μ̃_c is characterized by the Laplace functional

E[e^{−∫ f dμ̃_c}] = exp{ −∫_{R_+ × X} [1 − e^{−s f(x)}] ν(ds, dx) },

where f : X → R is a measurable function such that ∫ |f| dμ̃_c < ∞ (almost surely) and ν, the Lévy intensity of μ̃_c, is a measure on R_+ × X which can be disintegrated as

ν(ds, dx) = ρ_x(ds) α(dx),

where α is a measure on (X, X ) and ρ a transition kernel on X × B(R_+), i.e. x ↦ ρ_x(A) is X-measurable for any A in B(R_+) and ρ_x is a measure on (R_+, B(R_+)) for any x in X. If ρ_x = ρ for any x, then the distribution of the jumps of μ̃_c is independent of their location and both ν and μ̃_c are termed homogeneous. Otherwise, ν and μ̃_c are termed non-homogeneous.
Another important property of CRMs is their almost sure discreteness [53], which means that their realizations are discrete measures with probability 1. This fact essentially entails discreteness of random probability measures obtained as transformations of CRMs. It is worth noting that almost sure discreteness of the Dirichlet process was first shown in [3].
Finally, note that, if μ̃ is defined on X = R, one can also consider the càdlàg random distribution function induced by μ̃, namely {μ̃((−∞, x]) : x ∈ R}. Such a random function defines an increasing additive process, that is, a process whose increments are non-negative, independent and possibly non-stationary. See [77] for an exhaustive account. We close this section by introducing three noteworthy examples of CRMs.
Example A.1. (The gamma CRM). A homogeneous CRM γ̃ whose Lévy intensity is given by

(A.6) ν(ds, dx) = (e^{−s}/s) ds α(dx)

is a gamma process with parameter measure α on X. It is characterized by its Laplace functional, which is given by

E[e^{−∫ f dγ̃}] = exp{ −∫_X log(1 + f(x)) α(dx) }

for any measurable function f : X → R such that ∫ log(1 + |f|) dα < ∞. Now set f = λ 1_B, with λ > 0 and B ∈ X such that α(B) < ∞, where 1_B denotes the indicator function of the set B. In this case one obtains E[e^{−λ γ̃(B)}] = (1 + λ)^{−α(B)}, from which it is apparent that γ̃(B) has a gamma distribution with scale and shape parameters equal to 1 and α(B), respectively.
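The marginal identity in Example A.1 is easy to check by Monte Carlo: γ̃(B) is a Gamma(α(B), 1) variable, so its empirical Laplace transform should match (1 + λ)^{−α(B)}. The values of α(B) and λ below are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

# Gamma CRM marginal: gamma(B) ~ Gamma(shape=alpha(B), scale=1), hence
# E[exp(-lam * gamma(B))] = (1 + lam)^(-alpha(B)).
alpha_B, lam = 1.5, 2.0                     # illustrative values
g = rng.gamma(shape=alpha_B, scale=1.0, size=200_000)
mc = np.mean(np.exp(-lam * g))              # empirical Laplace transform
exact = (1.0 + lam) ** (-alpha_B)           # closed form from (A.6)
print(mc, exact)
```

The two printed values agree up to Monte Carlo error (of order 10^{-3} with 200,000 draws).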
Example A.3. (The extended gamma CRM). Another interesting CRM, introduced in [18], is the extended gamma CRM (also known as the weighted gamma CRM according to the terminology of [61]). It differs from the two previous examples in that it is a non-homogeneous CRM: in other words, the distribution of the jumps depends on their location. Letting β be a positive real-valued function and α a boundedly finite measure on X, an extended gamma CRM is characterized by the Lévy intensity

ν(ds, dx) = (e^{−β(x) s}/s) ds α(dx).

The corresponding Laplace functional is given by

E[e^{−∫ f dμ̃}] = exp{ −∫_X log(1 + f(x)/β(x)) α(dx) }

for any measurable function f : X → R such that ∫ log(1 + |f| β^{−1}) dα < +∞. Infinitesimally speaking, μ̃(dx) is gamma distributed with scale parameter β(x) and shape parameter α(dx).
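A grid-based sketch of such a non-homogeneous CRM: approximating an extended gamma CRM on [0, 1] by independent gamma increments whose rate varies with the location, the empirical Laplace transform of μ̃([0,1]) can be compared with exp{−∫ log(1 + λ/β(x)) α(dx)}. The choices α(dx) = c dx and β(x) = 1 + x are purely illustrative, and the exact value is evaluated by the same midpoint rule used for the discretization.

```python
import numpy as np

rng = np.random.default_rng(3)

# Extended gamma CRM on [0,1], discretized: the increment over a cell around x
# is Gamma(shape = alpha(cell), rate = beta(x)); numpy's `scale` is 1/rate.
c, lam, n_cells, n_sims = 2.0, 1.0, 200, 20_000
x = (np.arange(n_cells) + 0.5) / n_cells     # cell midpoints
delta = 1.0 / n_cells                        # cell width, so alpha(cell) = c*delta
beta = 1.0 + x                               # illustrative rate function
incr = rng.gamma(shape=c * delta, scale=1.0 / beta, size=(n_sims, n_cells))
mu_total = incr.sum(axis=1)                  # one realization of mu([0,1]) per row
mc = np.mean(np.exp(-lam * mu_total))        # empirical Laplace transform
# Laplace functional evaluated with the same midpoint discretization:
exact = np.exp(-c * delta * np.sum(np.log(1.0 + lam / beta)))
print(mc, exact)
```

Because the Laplace transform of a sum of independent gamma increments factorizes, the discretized model matches the midpoint-rule value of the Laplace functional exactly; the only discrepancy left is Monte Carlo error.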

B Transforms and inversion formulae
Most of the results presented in Sections 2.2, 3.1 and 3.2 depend on the inversion of integral transforms. Here we provide a few details on their definition and the inversion formulae one resorts to in order to obtain the results we have been describing.

B.1 Generalized Stieltjes transform
Let X be a non-negative random variable whose probability distribution is absolutely continuous with respect to the Lebesgue measure with density function g.
The generalized Stieltjes transform of order p > 0 of X, or of the corresponding density g, is defined by

S_p[z; X] = ∫_0^∞ (z + x)^{−p} g(x) dx

for any z ∈ C such that |arg(z)| < π. A special case occurs when p = 1, in which case the Stieltjes transform is, at least formally, an iterated Laplace transform,

S_1[z; X] = ∫_0^∞ e^{−zy} ∫_0^∞ e^{−yx} g(x) dx dy.

Moreover, the inversion formula which allows one to recover g from S_1[ · ; X] is simple, since it yields

g(x) = (1/π) lim_{ε↓0} Im S_1[−x − iε; X]

at almost every x > 0. Stieltjes transforms of order p = 1 are also very useful for proving infinite divisibility of probability distributions; see, e.g., [39]. The inversion formula becomes more complicated when p ≠ 1. Different versions of it can be found in the literature; see, e.g., [82] and [79]. Here we refer to the result contained in the latter contribution. Under suitable conditions, for example that S_p[ · ; X] is holomorphic on {z ∈ C : |arg(z)| < π} and |z^β S_p[z; X]| is bounded at infinity for some β > 0, g can be recovered from S_p[ · ; X] through a contour integral, where Γ is a contour in the complex plane starting and ending at the point w = −1 and enclosing the origin in a counterclockwise sense. If p < 1, then one can integrate the expression above by parts, thus obtaining a representation (B.2) for g(x) whose leading factor is (p − 1)/(2πi) x^{p−1}.
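The order-1 inversion can be tried numerically. Writing Im S_1[−x − iε; X] = ∫_0^∞ g(t) ε/((t − x)² + ε²) dt, which is a Poisson-kernel smoothing of g, the Perron–Stieltjes formula g(x) = (1/π) lim_{ε↓0} Im S_1[−x − iε; X] is checked below on the unit exponential density; the choice of target density is purely illustrative.

```python
import numpy as np
from scipy.integrate import quad

def g(t):
    # Illustrative target density: unit exponential on (0, infinity).
    return np.exp(-t)

def im_S1(x, eps):
    # Im S_1[-x - i*eps; X] = int_0^inf g(t) * eps / ((t-x)^2 + eps^2) dt;
    # split the range at the peak t = x so quad resolves the narrow kernel.
    f = lambda t: g(t) * eps / ((t - x) ** 2 + eps ** 2)
    left, _ = quad(f, 0.0, x, limit=200)
    right, _ = quad(f, x, x + 10.0, limit=200)   # tail beyond x+10 is negligible
    return left + right

x = 1.0
for eps in (0.1, 0.01, 0.001):
    print(eps, im_S1(x, eps) / np.pi)   # approaches g(1) = e^{-1} as eps -> 0
```

The printed sequence converges to e^{−1} ≈ 0.3679, the value of the density at x = 1, as the smoothing parameter ε decreases.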

B.2 Characteristic functions
In order to determine prior and posterior distributions of means of NRMIs we have applied an inversion formula for characteristic functions provided by Gurland [31]. The formula is useful when one is interested in determining the probability distribution of ratios of random variables, and this is the case for means of NRMIs. If F is a distribution function and φ the corresponding characteristic function, then, for every x,

(F(x+) + F(x−))/2 = 1/2 − (1/π) lim_{T→∞} ∫_0^T t^{−1} Im[e^{−ixt} φ(t)] dt.
If one sets x = 0, the expression in (3.14) is easily obtained.
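A minimal numerical sketch of this kind of inversion, using the standard normal as an illustrative target since its characteristic function e^{−t²/2} is available in closed form; the truncation level T and the small lower cutoff near t = 0 are pragmatic numerical choices.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def cdf_from_cf(x, phi, T=50.0):
    # Inversion at a continuity point of F, truncating the integral at T:
    # F(x) = 1/2 - (1/pi) * int_0^T Im[exp(-i*x*t) * phi(t)] / t dt.
    # The integrand is bounded near t = 0 since Im[exp(-i*x*t)*phi(t)] = O(t).
    integrand = lambda t: np.imag(np.exp(-1j * x * t) * phi(t)) / t
    val, _ = quad(integrand, 1e-10, T, limit=200)
    return 0.5 - val / np.pi

phi_normal = lambda t: np.exp(-t ** 2 / 2)   # characteristic function of N(0,1)
print(cdf_from_cf(1.0, phi_normal), norm.cdf(1.0))  # both close to 0.8413
print(cdf_from_cf(0.0, phi_normal))                 # 0.5, by symmetry
```

For the normal target the characteristic function decays fast, so T = 50 already makes the truncation error negligible; heavier-tailed φ would require a larger T or a tail correction.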