Functional central limit theorems for stick-breaking priors

We obtain the empirical strong law of large numbers, the empirical Glivenko-Cantelli theorem, the central limit theorem, and the functional central limit theorem for various nonparametric Bayesian priors, including the Dirichlet process with general stick-breaking weights, the Poisson-Dirichlet process, the normalized inverse Gaussian process, the normalized generalized gamma process, and the generalized Dirichlet process. For the Dirichlet process with general stick-breaking weights, we introduce two general conditions under which the central limit theorem and functional central limit theorem hold. Except in the case of the generalized Dirichlet process, the finite dimensional distributions of these processes are either hard to obtain or complicated to use even when they are available, so we use the method of moments to obtain the convergence results. For the generalized Dirichlet process we use its finite dimensional marginal distributions to obtain the asymptotics, although the computations are highly technical.


Introduction
Ever since the work of Ferguson (1973), the Dirichlet process has been a critical tool in Bayesian nonparametric statistics and has found applications in various areas, including machine learning, the biological sciences, and the social sciences. One of the important features of the Dirichlet process is that when the prior is a Dirichlet process, its posterior is also a Dirichlet process (see e.g. Ferguson (1973)). This makes the complex computations in Bayesian nonparametric analysis tractable and has made the Dirichlet process a backbone of Bayesian nonparametric statistics.
To widen the applicability of Bayesian nonparametric statistics, researchers have tried to extend the Dirichlet process. One of these efforts is the introduction of the stick-breaking process. The first breakthrough along this path is due to Sethuraman (1994), who shows that the Dirichlet process admits the stick-breaking representation (see (2.1)-(2.2) in the next section), where the stick-breaking weights are independent and identically distributed (iid) random variables following the beta distribution Beta(1, a) (throughout this paper the notation Beta(α, β) denotes the beta distribution whose density is g(x; α, β) = [Γ(α+β)/(Γ(α)Γ(β))] x^{α−1} (1 − x)^{β−1}, 0 < x < 1). Within this stick-breaking representation, we can extend the class of Dirichlet processes to many other priors by assuming that the stick-breaking weights are iid with other distributions, satisfy some other kind of dependence, or follow some specific (joint) distributions. Among various such extensions, let us mention the following works, which we shall deal with in this paper. Perman et al. (1992) obtain a general formula for size-biased sampling from a Poisson point process, where the size of a point is defined by an arbitrary strictly positive function. From this formula, they identify the stick-breaking representation of the two-parameter Poisson-Dirichlet process, whose stick-breaking weights satisfy v_i ind∼ Beta(1 − b, a + ib), where 0 < b < 1, a > −b and i = 1, 2, · · · . Favaro et al. (2012) introduce the normalized inverse Gaussian process through its stick-breaking representation by identifying the explicit finite dimensional joint density functions of its stick-breaking weights. Favaro et al.
(2016) present the stick-breaking representation of a more general class of random measures called homogeneous normalized random measures with independent increments (NRMIs), which include the normalized generalized gamma process and the generalized Dirichlet process, two widely used priors in Bayesian nonparametric statistics.
The strong law of large numbers, the central limit theorem, and the functional central limit theorem have always been central topics in statistics and probability theory. Without exception, the asymptotic behaviours of the Dirichlet process and other Bayesian nonparametric priors play an important role in Bayesian nonparametric analysis, for example in the construction of asymptotic Bayesian confidence intervals, regression analysis, and functional estimation. Compared to the vast literature on these issues in parametric statistics, the achievements in Bayesian nonparametrics are quite limited. Let us mention the following pioneering works that precede this paper. Sethuraman and Tiwari (1982) discuss the weak convergence of the Dirichlet measure P when its parameter measure (i.e. the measure aH in this paper) approaches a non-zero measure or a zero measure, respectively. Lo (1983) studies the central limit theorem of the posterior distribution of the Dirichlet process, which is analogous to our central limit theorem for the Dirichlet process. Based on this result, Lo (1987) obtains asymptotic confidence bounds and establishes the asymptotic validity of the Bayesian bootstrap method. The above mentioned results of Lo are extended to mixtures of Dirichlet processes by Brunner and Lo (1996). James (2008) establishes the consistency (the posterior distribution converges weakly to the true distribution) and the functional central limit theorem for the posterior distribution of the two-parameter Poisson-Dirichlet process (with fixed a and as the sample size goes to infinity). Kim and Lee (2004) show that the Bernstein-von Mises theorem holds in survival models for the Dirichlet process, the Beta process and the Gamma process.
Dawson and Feng (2006) establish the large deviation principle for the Poisson-Dirichlet distribution and give the explicit rate functions when the parameter a (which represents the mutation rate in the context of population genetics) approaches infinity. Labadi and Zarepour (2013) present the functional central limit theorem for the normalized inverse Gaussian process on D(R) when its parameter a is large by using its finite dimensional joint density. Labadi and Abdelrazeq (2016) obtain the functional central limit theorem for the Dirichlet process on D(R) when the parameter a is large by using the finite dimensional joint density of Dirichlet process.
From the above mentioned works we see that there are only very limited results on the asymptotics of stick-breaking processes. Concerning the asymptotics as a → ∞, the central limit theorem and functional central limit theorem have been established only for two processes: the Dirichlet process and the normalized inverse Gaussian process. The reason for this limitation is that the most commonly used technique appeals to the explicit forms of the finite dimensional densities of the process itself. This method is effective only when the finite dimensional distributions have explicit and tractable forms. It cannot be applied to other processes whose finite dimensional marginal densities are unavailable, or are too complex to analyse even when they are available.
The aim of this paper is to introduce the method of moments into this study and to provide a systematic treatment of the asymptotics as a → ∞ for various stick-breaking processes depending on a parameter a > 0. We are mainly concerned with three types of asymptotics (strong law of large numbers, central limit theorem, and functional central limit theorem) for a number of processes: the Dirichlet process with general stick-breaking weights, the classical Dirichlet process DP(a, H) (see e.g. Ferguson (1973), Ghosal and van der Vaart (2017), Regazzini et al. (2002)), the two-parameter Poisson-Dirichlet process PDP(a, b, H) (also known as the Pitman-Yor process, Pitman and Yor (1997)), the normalized inverse Gaussian process N-IGP(a, H) (see Lijoi et al. (2005b)), the normalized generalized gamma process NGGP(σ, a, H) (see Brix (1999)), and the generalized Dirichlet process GDP(a, r, H) (see Lijoi et al. (2005a)). We will not be concerned with the large sample problem in this work.
For the generalized Dirichlet process, since the finite dimensional marginal distributions of the process itself are available, we shall use them to obtain the asymptotics directly, although the computations are very technical. Let us point out that this process also admits a stick-breaking representation. However, it seems to us that using the method of moments here is more complex than using the finite dimensional marginal distributions of the process itself.
Let us highlight the following features of the paper concerning the well-known Bayesian nonparametric priors.
(i) (For the Dirichlet process.) Both the finite dimensional distributions of the stick-breaking weights and those of the process itself are explicit and easy to handle. Prior to this work, the central limit theorem and the functional central limit theorem were established for this process by using the finite dimensional distributions of the process itself. For the Dirichlet process the stick-breaking weights {v_i} are iid and follow the beta distribution Beta(1, a). We introduce the concept of the Dirichlet process with general stick-breaking weights, where we still require the stick-breaking weights {v_i} to be iid but allow the distribution g_a they follow to be arbitrary. In this case there is no way to obtain the explicit form of the joint distributions of the process itself. We use the method of moments to establish the central limit theorem and the functional central limit theorem for this process with very general distribution g_a. For example, g_a can be a beta distribution Beta(ρ_a, a), where ρ_a is a function of a such that ρ_a/a → 0 as a → ∞, which may have potential applications to the posterior Dirichlet process. In this case the joint distributions of the process itself are unavailable except when ρ_a = 1, i.e. in the case of the Dirichlet process. (ii) (For the normalized inverse Gaussian process and the generalized Dirichlet process.) Both the finite dimensional distributions of the stick-breaking weights and those of the process itself are explicit. Prior to this work, the central limit theorem and the functional central limit theorem were established only for the normalized inverse Gaussian process, by using the finite dimensional distributions of the process itself. We shall also use the finite dimensional distributions of the process itself to obtain the central limit theorem and the functional central limit theorem for the generalized Dirichlet process.
We shall use the method of moments to re-derive the central limit theorem and the functional central limit theorem for the normalized inverse Gaussian process, providing an alternative tool for this process. (iii) (For the Poisson-Dirichlet process and the normalized generalized gamma process.) The finite dimensional distributions of the stick-breaking weights are available, but those of the process itself are not. We use the method of moments to obtain the central limit theorem and the functional central limit theorem for these processes.
Now we explain the organization of this paper. In Section 2 we recall some commonly used stick-breaking processes, including the classical Dirichlet process DP(a, H), the two-parameter Poisson-Dirichlet process PDP(a, b, H), the normalized inverse Gaussian process N-IGP(a, H), the normalized generalized gamma process NGGP(σ, a, H), and the generalized Dirichlet process GDP(a, r, H). We also introduce the Dirichlet process with general iid stick-breaking weights. Interested readers are referred to Hu and Zhang (2020) and references therein for a recent survey of some of these processes and their applications. In this section we also recall the concept of weak convergence with respect to the Skorohod topology on D(R^d). Although the method of moments can be used to obtain the strong law of large numbers, the central limit theorem, and the functional central limit theorem, the computations are still very sophisticated, and the computations of the moments differ from process to process. For this reason, we present our moment results for the various stick-breaking processes DPG(g, H), PDP(a, b, H), N-IGP(a, H), NGGP(σ, a, H), and GDP(a, r, H) separately in Section 3. We point out that we use the finite dimensional distribution method to obtain the moment results for GDP(a, r, H), while we compute the moments directly through the stick-breaking weights for the other processes. In Section 4, we state the strong law of large numbers, the central limit theorem, and the functional central limit theorem. The Dirichlet processes with general stick-breaking weights are new, and we allow the stick-breaking weights to be very general iid random variables taking values in (0, 1). With different choices of the stick-breaking weights we can obtain various known stick-breaking processes. Because of this generality of the stick-breaking weights, we state one theorem covering the central limit theorem and functional central limit theorem for this type of process.
We state a similar theorem for all other processes (the Poisson-Dirichlet process PDP(a, b, H), the normalized inverse Gaussian process N-IGP(a, H), the normalized generalized gamma process NGGP(σ, a, H), and the generalized Dirichlet process GDP(a, r, H)). The details of the proofs will be provided in a supplementary file.
Finally, let us emphasize that all the processes we deal with in this paper are actually "random probability measures". However, we follow the convention in the literature and continue to call them "processes".

Definitions
Let (Ω, F, P) be a complete probability space and let (X, X) be a measurable Polish space, namely, X is a separable complete metric space and X is the Borel σ-algebra of X. Let H be a nonatomic probability measure on (X, X) (i.e. H({x}) = 0 for any x ∈ X). A random measure is a mapping P from Ω × X to R_+ (we denote this random measure by P = (P(ω, A), ω ∈ Ω, A ∈ X)) such that (i) when ω ∈ Ω is fixed, P(ω, ·) is a measure on (X, X); (ii) when A ∈ X is fixed, P(·, A) is a random variable on (Ω, F, P). Now we give the definition of the stick-breaking process (more appropriately, a stick-breaking random probability measure).

Definition 2.1. A random measure P = (P(ω, A), ω ∈ Ω, A ∈ X) is said to be a stick-breaking process with base measure H if it has the following representation:

P(ω, A) = Σ_{i=1}^∞ w_i(ω) δ_{θ_i(ω)}(A), A ∈ X, (2.1)
w_1 = v_1, w_i = v_i ∏_{j=1}^{i−1} (1 − v_j), i ≥ 2, (2.2)

where θ_i, i = 1, 2, · · · are iid random variables defined on (Ω, F, P) (throughout the paper all random variables are defined on the probability space (Ω, F, P)) with values in (X, X) such that for each i the law of θ_i is H; δ_{θ_i} denotes the Dirac measure on (X, X), meaning δ_{θ_i}(A) = 1 if θ_i ∈ A and δ_{θ_i}(A) = 0 if θ_i ∉ A, for any A ∈ X; and v_i, i = 1, 2, · · · are random variables with values in [0, 1], independent of {θ_i}, which are called the stick-breaking weights.
Remark 2.2. To make sure that P is well-defined (namely, that the weights in (2.1) satisfy Σ_{i=1}^∞ w_i = 1 almost surely), one needs to impose the condition Σ_{i=1}^∞ E[log(1 − v_i)] = −∞ (see Ishwaran and James (2001)).
Since we assume that {θ_i} are iid with distribution H, if H is given and fixed, then the stick-breaking process P depends only on the choice of {v_i}. The first milestone work on the stick-breaking process is Sethuraman (1994), where it was shown that the Dirichlet process admits the stick-breaking representation (2.1)-(2.2) with stick-breaking weights v_i iid∼ Beta(1, a). We can use this characterization as the definition of the Dirichlet process.

Definition 2.3. Let a > 0 and let H be a nonatomic measure on (X, X). A random probability measure P is called the Dirichlet process with parameter (a, H), denoted by P ∼ DP(a, H), if it has the representation (2.1)-(2.2) with v_i iid∼ Beta(1, a).

Remark 2.4. Throughout the entire paper, we shall assume that a is a positive real number and H is a nonatomic measure on (X, X) unless otherwise specified.
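As an illustration of Definition 2.3, the representation (2.1)-(2.2) can be simulated by truncating the infinite sum. The sketch below is not from the paper; the truncation level and the parameter value are arbitrary illustrative choices:

```python
import random

def stick_breaking_weights(a, n_atoms, rng=random):
    """Truncated weights w_i = v_i * prod_{j<i}(1 - v_j) with v_i ~ Beta(1, a)."""
    weights, stick = [], 1.0
    for _ in range(n_atoms):
        v = rng.betavariate(1.0, a)  # v_i ~ Beta(1, a)
        weights.append(stick * v)    # w_i: piece broken off the remaining stick
        stick *= 1.0 - v             # length of the remaining stick
    return weights

random.seed(0)
w = stick_breaking_weights(a=5.0, n_atoms=500)
print(sum(w))  # close to 1: the truncated weights nearly exhaust the stick
```

A draw of P itself then attaches each weight w_i to an atom θ_i sampled independently from H, as in (2.1).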
With this definition we can state the original definition of the Dirichlet process as a proposition. To state it, we need to recall the concept of the Dirichlet distribution. Throughout this paper we use the following notation for the standard simplex in R^n:

S_n = {(x_1, · · · , x_n) : x_i ≥ 0, i = 1, · · · , n, Σ_{i=1}^n x_i = 1}. (2.3)

When there is no ambiguity we also write S = S_n. A random vector (X_1, · · · , X_n) ∈ S is said to follow the Dirichlet distribution with parameters (α_1, · · · , α_n) ∈ [0, ∞)^n, denoted by (X_1, · · · , X_n) ∼ Dir(α_1, · · · , α_n), if the joint pdf of (X_1, · · · , X_n) is given by

f(x_1, · · · , x_n) = [Γ(α_1 + · · · + α_n)/(Γ(α_1) · · · Γ(α_n))] ∏_{i=1}^n x_i^{α_i−1} ½_S(x_1, · · · , x_n),

where Γ is the gamma function and ½_S is the indicator function of the simplex S. With this notion of the Dirichlet distribution we can state the following proposition.

Proposition 2.5. A random probability measure P is the Dirichlet process DP(a, H) if and only if for any measurable partition (A_1, · · · , A_n) of X, the random vector (P(A_1), · · · , P(A_n)) follows the Dirichlet distribution Dir(aH(A_1), · · · , aH(A_n)).
Proof. We refer to Sethuraman (1994) or Hu and Zhang (2020) for the proof of the equivalence between Definition 2.3 and Proposition 2.5.
In the literature Proposition 2.5 is usually taken as the definition of the Dirichlet process. The reason we use the stick-breaking representation as the definition is consistency: most of the processes studied in this paper can be defined through the stick-breaking representation (2.1)-(2.2).
In the following definitions we shall always assume that P is a random probability measure admitting the representation (2.1)-(2.2). If we drop the specific distribution of the v_i in Definition 2.3, our limiting results below still hold. For this reason, and for potential applications in practice, we introduce the concept of the Dirichlet process with general stick-breaking weights.
Definition 2.6. P is called the Dirichlet process with general stick-breaking weights, denoted by P ∼ DPG(g, H), if the stick-breaking weights {v_1, v_2, · · · } in (2.1)-(2.2) are iid and follow a general probability distribution with density g(x), 0 < x < 1.

Now we recall some other well-known processes studied in the literature, which are the subjects of this paper.

Definition 2.7 (Pitman and Yor (1997)). Let b ∈ (0, 1) and let −b < a < ∞. P is called the two-parameter Poisson-Dirichlet process, or the Pitman-Yor process, denoted by PDP(a, b, H), if the stick-breaking weights v_1, v_2, · · · are independent with v_i ∼ Beta(1 − b, a + ib), i = 1, 2, · · · . (2.4)

Definition 2.8 (Favaro et al. (2012)). P is called the normalized inverse Gaussian process with parameters a > 0 and H, denoted by P ∼ N-IGP(a, H), if the joint distributions of the stick-breaking weights {v_1, v_2, · · · } are given recursively through conditional probability densities involving the modified Bessel function of the third kind K_µ (see e.g. Gradshteyn and Ryzhik (2014)).
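The Pitman-Yor weights in Definition 2.7 can be simulated in the same truncated fashion as for the Dirichlet process; note that the v_i are independent but not identically distributed. The parameter values and truncation level below are illustrative choices, not from the paper:

```python
import random

def pitman_yor_weights(a, b, n_atoms, rng=random):
    """Truncated stick-breaking weights for PDP(a, b, H): v_i ~ Beta(1 - b, a + i*b)."""
    weights, stick = [], 1.0
    for i in range(1, n_atoms + 1):
        v = rng.betavariate(1.0 - b, a + i * b)  # weight distribution (2.4)
        weights.append(stick * v)
        stick *= 1.0 - v
    return weights

random.seed(1)
w = pitman_yor_weights(a=2.0, b=0.5, n_atoms=2000)
print(sum(w))  # approaches 1 as the truncation level grows (more slowly than for the DP)
```

The slower exhaustion of the stick reflects the heavier tail of the Pitman-Yor weight sequence compared to the Dirichlet case b = 0.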
Similar to what we did for the Dirichlet process, we present the original definition of the normalized inverse Gaussian process as a proposition.

Proposition 2.9. A random probability measure P is the normalized inverse Gaussian process with parameter (a, H) if for any measurable partition (A_1, · · · , A_n) of X, the random vector (P(A_1), · · · , P(A_n)) follows the normalized inverse Gaussian distribution with parameters (aH(A_1), · · · , aH(A_n)), whose density is supported on the simplex S defined by (2.3).
Proof. We refer to Favaro et al. (2012) for the proof of the equivalence between Definition 2.8 and Proposition 2.9.
It is trivial to verify that the Dirichlet process is a special case of the generalized Dirichlet process with parameter r = 1. Although the expression (2.9) looks very sophisticated, its mean, variance, and predictive distribution have been computed (see e.g. Lijoi et al. (2005b)). This process also admits a stick-breaking representation (e.g. Favaro et al. (2016)). However, that stick-breaking representation is more complicated to use for our study of the limit theorems. So we use these sophisticated finite dimensional distributions rather than the even more sophisticated stick-breaking representation, which we omit.
As we are presenting the functional central limit theorem for P, we need to recall the definition of the Brownian bridge with parameter H (see e.g. Kim and Bickel (2003) for more details). Let B°_H = (B°_H(ω, A), ω ∈ Ω, A ∈ X) be a stochastic process (random measure) indexed by A ∈ X. It is called the Brownian bridge with parameter H if the following two conditions are satisfied.
(i) B°_H is Gaussian. Namely, for any A_1, · · · , A_n ∈ X, the random variables B°_H(A_1), · · · , B°_H(A_n) are jointly centered Gaussian on the probability space (Ω, F, P).
(ii) For any A_1, A_2 ∈ X, the covariance of B°_H(A_1) and B°_H(A_2) is given by

E[B°_H(A_1) B°_H(A_2)] = H(A_1 ∩ A_2) − H(A_1)H(A_2). (2.10)

To state the functional central limit theorem we also need the space D(R^d) introduced in Section 3 of Bickel and Wichura (1971). The elements (functions) of D(R^d) are characterized by the following continuity properties. For 1 ≤ p ≤ d, let R_p be one of the relations < or ≥, and for t ∈ R^d let Q_{R_1,···,R_d}(t) denote the corresponding quadrant {s ∈ R^d : s_p R_p t_p, p = 1, · · · , d}. Then x ∈ D(R^d) if and only if (see e.g. Straf (1972)) for each t ∈ R^d the following two conditions hold: (i) x_Q = lim_{s→t, s∈Q} x(s) exists for each of the 2^d quadrants Q = Q_{R_1,···,R_d}(t) (namely, for all combinations R_1 = "<" or "≥", · · · , R_d = "<" or "≥"), and (ii) x(t) = x_{Q_{≥,···,≥}}. In other words, D(R^d) is the space of functions that are "continuous from above with limits from below", in analogy with the space of càdlàg (a French abbreviation for "right continuous with left limits") functions of one variable. Denote the Skorohod distance between x, y ∈ D(R^d) by d(x, y). Having introduced the metric space D(R^d), we can now explain the concept of weak convergence of a random measure on this space with respect to its Skorohod topology (the topology on D(R^d) induced by the Skorohod distance d(x, y)). Let Q_a : Ω × B(R^d) → [0, 1] be a family of random probability measures depending on a parameter a > 0 and let B be a random element of D(R^d).

Definition 2.13. We say that Q_a converges to B weakly on D(R^d) with respect to the Skorohod topology, denoted Q_a weakly→ B in D(R^d), if for any bounded continuous (with respect to the Skorohod topology) functional f : D(R^d) → R,

E[f(Q_a)] → E[f(B)] as a → ∞. (2.11)

Notations
Assume that the random probability measure P is a stick-breaking process having the stick-breaking representation (2.1)-(2.2). For any A ∈ X, since the {θ_i} are iid with law H, independent of the weights, and Σ_{i=1}^∞ w_i = 1, we have

E[P(A)] = H(A). (2.12)

We can also obtain its variance as follows:

Var(P(A)) = H(A)(1 − H(A)) E[Σ_{i=1}^∞ w_i^2]. (2.13)
Based on the expectation and variance of P, we introduce the following quantities, which are investigated in the main theorems:

D_a(A) := (P(A) − H(A)) / sqrt(Var(P(A))) = (P(A) − H(A)) / sqrt(H(A)(1 − H(A)) E[Σ_{i=1}^∞ w_i^2]), (2.14)

where the last identity follows from (2.12)-(2.13). Up to a constant we may consider the following quantity for notational simplicity:

Q_{H,a}(A) := (P(A) − H(A)) / sqrt(E[Σ_{i=1}^∞ w_i^2]). (2.15)
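For the classical Dirichlet process these moments are explicit: it is known that E[P(A)] = H(A) and Var(P(A)) = H(A)(1 − H(A))/(a + 1). A Monte Carlo sketch of these two facts through the truncated stick-breaking representation (all numerical parameters below are illustrative choices, not from the paper):

```python
import random

def dp_mass(a, h_A, n_atoms, rng):
    """One draw of P(A) for P ~ DP(a, H) with H(A) = h_A, via truncated stick-breaking."""
    mass, stick = 0.0, 1.0
    for _ in range(n_atoms):
        v = rng.betavariate(1.0, a)
        if rng.random() < h_A:   # atom theta_i ~ H falls in A with probability H(A)
            mass += stick * v
        stick *= 1.0 - v
    return mass

rng = random.Random(42)
a, h_A = 10.0, 0.3
draws = [dp_mass(a, h_A, 300, rng) for _ in range(4000)]
mean = sum(draws) / len(draws)
var = sum((x - mean) ** 2 for x in draws) / len(draws)
print(mean)  # ≈ H(A) = 0.3
print(var)   # ≈ H(A)(1 - H(A)) / (a + 1) ≈ 0.019
```

The empirical mean and variance match the closed-form expressions up to Monte Carlo error.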

Moment results
We use the method of moments to show the announced asymptotics. This requires good estimates of the moments of the process P, which in turn require good bounds for the moments of {w_i}_{i=1}^∞. Thus, in this section we present the asymptotic behaviours of the joint moments of the w_i's for the various processes introduced in the previous section. These results play a key role in the proofs of our main theorems, and they are also of independent interest.
In the following proposition, and throughout the paper, we use the notation p_{m:n} := Σ_{i=m}^{n} p_i for m ≤ n.

Empirical strong law of large numbers
The empirical strong law of large numbers and the empirical Glivenko-Cantelli theorem undoubtedly play important roles in statistics. In this subsection we establish both for the various processes introduced in Section 2. Before we state our theorem, we need an additional condition on the stick-breaking weights v_i in the case of the Dirichlet process with general stick-breaking weights.
Assumption 4.1. The iid stick-breaking weights satisfy

E[v_1^p] ≤ C_p a^{−k_p} (4.1)

for any p ∈ N, where {k_p} is a positive sequence satisfying j k_i ≥ i k_j for i ≥ j, and {C_p} is a sequence of finite constants independent of a.
Theorem 4.2. Let P be one of the Dirichlet process DP(a, H), the Dirichlet process with general stick-breaking weights DPG(g_a, H) satisfying Assumption 4.1, the two-parameter Poisson-Dirichlet process PDP(a, b, H), the normalized inverse Gaussian process N-IGP(a, H), the normalized generalized gamma process NGGP(σ, a, H), and the generalized Dirichlet process GDP(a, r, H). Assume that a = n^τ for some arbitrarily fixed τ > 0. Then, as n → ∞, P(A) → H(A) almost surely (4.2) for any measurable A ∈ X.
Once we have the empirical strong law of large numbers for P, we can deduce the empirical Glivenko-Cantelli theorem for P (see e.g. Theorem 20.6 in Billingsley (1995) for a general discussion).
Theorem 4.3. Let (X, X) = (R, B(R)). Let P be one of the Dirichlet process DP(a, H), the Dirichlet process with general stick-breaking weights DPG(g_a, H) satisfying Assumption 4.1, the two-parameter Poisson-Dirichlet process PDP(a, b, H), the normalized inverse Gaussian process N-IGP(a, H), the normalized generalized gamma process NGGP(σ, a, H), and the generalized Dirichlet process GDP(a, r, H). Assume that a = n^τ for some arbitrarily fixed τ > 0. Then, as n → ∞, sup_{t∈R} |P((−∞, t]) − H((−∞, t])| → 0 almost surely.
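The Glivenko-Cantelli behaviour can be illustrated numerically for H = Uniform(0, 1): the sup-distance between one truncated DP(a, H) draw and H shrinks as a grows. The truncation levels, grid, and parameter values below are illustrative choices, not from the paper:

```python
import random

def dp_sup_deviation(a, n_atoms, rng, grid=200):
    """sup_t |P((-inf, t]) - H(t)| for one truncated DP(a, H) draw, H = Uniform(0, 1)."""
    atoms, weights, stick = [], [], 1.0
    for _ in range(n_atoms):
        v = rng.betavariate(1.0, a)
        atoms.append(rng.random())   # theta_i ~ H = Uniform(0, 1)
        weights.append(stick * v)
        stick *= 1.0 - v
    sup = 0.0
    for k in range(grid + 1):
        t = k / grid
        F_t = sum(wt for x, wt in zip(atoms, weights) if x <= t)
        sup = max(sup, abs(F_t - t))  # H((-inf, t]) = t on [0, 1]
    return sup

rng = random.Random(7)
dev_small_a = dp_sup_deviation(a=5.0, n_atoms=400, rng=rng)
dev_large_a = dp_sup_deviation(a=500.0, n_atoms=4000, rng=rng)
print(dev_small_a, dev_large_a)  # the deviation is typically much smaller for large a
```

For small a the DP draw is dominated by a few large atoms, so its cdf is far from H; for large a the weights are small and the draw tracks H closely.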

Central limit theorems and functional central limit theorems
In this subsection, we state the central limit theorems corresponding to the strong law of large numbers of the form (4.2).
We shall state the central limit theorems and functional central limit theorems for various processes as the following three theorems. The first one is for the Dirichlet processes with general iid stick-breaking weights defined by Definition 2.6. We will assume mild convergence conditions on the stick-breaking weights.
Theorem 4.4. Let P be a stick-breaking process with general stick-breaking weights v_1, v_2, · · · whose distributions depend on a parameter a > 0 (we omit the explicit dependence of the v_i on a). Let D_a and Q_{H,a} be defined by (2.14) and (2.15), respectively. Assume that the stick-breaking weights v_1, v_2, · · · are iid and satisfy the following two conditions.
Then we have the following results.
Remark 4.5. For the central limit theorem we use D_a because each component converges to a standard Gaussian. For the functional central limit theorem we use Q_{H,a} since it converges to a Brownian bridge with parameter H. We could presumably use D_a (or Q_{H,a}) in both (4.6) and (4.7) with appropriate scaling.
The conditions (i) and (ii) in Theorem 4.4 are implied by many other conditions. One of them is given below.
Proof. It is obvious that {k_p} is an increasing sequence, and thus condition (i) of Theorem 4.4 (i.e. (4.3)) holds.
For any nonnegative integer m, let N be a collection of integers j such that Σ_{j∈N} j = m. Condition (ii) in Theorem 4.4 is equivalent to the statement (4.8). Thus, to prove (4.4) it is sufficient to show m k_2 / 2 ≤ Σ_{j∈N} k_j. This is a simple consequence of j k_i ≥ i k_j for i ≥ j. In fact, taking the smaller index to be 2, we have 2 k_j ≥ j k_2 for all j ≥ 2, and thus Σ_{j∈N} 2 k_j ≥ Σ_{j∈N} j k_2 = m k_2, which implies m k_2 / 2 ≤ Σ_{j∈N} k_j. Hence we have (4.8).
The conditions (i) and (ii) in Theorem 4.4 are satisfied by many interesting processes including the Dirichlet process. We give three examples to illustrate the applicability of our above theorem.
Corollary 4.7. Let P ∼ DP(a, H) be defined as in Definition 2.3, namely v_i iid∼ Beta(1, a). Let D_a and Q_{H,a} be defined by (2.14) and (2.15), respectively. We have the following results.
Proof. It is sufficient to verify condition (4.1) in Assumption 4.1. Since v_i iid∼ Beta(1, a), we have for any positive integer p,

E[v_i^p] = p! / ((a + 1)(a + 2) · · · (a + p)) ≤ p! a^{−p}.

Hence, k_p = p and C_p = p!. Obviously, j k_i ≥ i k_j always holds for i ≥ j.
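The moment identity used in this proof can be cross-checked numerically against the beta-function form E[v^p] = B(1 + p, a)/B(1, a) = Γ(1 + p)Γ(1 + a)/Γ(1 + p + a) (the value a = 50 is an arbitrary illustration):

```python
import math

def beta_1a_moment(a, p):
    """Exact p-th moment of Beta(1, a): p! / ((a+1)(a+2)...(a+p))."""
    denom = 1.0
    for i in range(1, p + 1):
        denom *= a + i
    return math.factorial(p) / denom

a = 50.0
for p in (1, 2, 3, 4):
    exact = beta_1a_moment(a, p)
    # same moment via the beta-function identity Gamma(1+p)Gamma(1+a)/Gamma(1+p+a)
    via_gamma = math.exp(math.lgamma(1 + p) + math.lgamma(1 + a) - math.lgamma(1 + p + a))
    print(p, exact, via_gamma, a ** p * exact)  # a^p E[v^p] <= p!, so k_p = p, C_p = p!
```

The last printed column stays below p!, illustrating the bound E[v^p] ≤ C_p a^{−k_p} of Assumption 4.1.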
Remark 4.8. Since the posterior of the Dirichlet process is still a Dirichlet process, the above result can be applied to the posterior process in Bayesian nonparametric models when the prior is a Dirichlet process, in the following situations: (i) large sample size and finite parameter a; (ii) large parameter a and finite sample size; (iii) parameter a and sample size both large.
The assumption of the Beta(1, a) distribution in Corollary 4.7 can be replaced by a general Beta(ρ_a, a), where ρ_a/a → 0. In fact, in this case we have E[v_i^p] = ∏_{ℓ=0}^{p−1} (ρ_a + ℓ)/(ρ_a + a + ℓ). It is easy to verify that the conditions (4.3)-(4.4) in Theorem 4.4 are satisfied.
Thus we have the following corollary. Let D_a and Q_{H,a} be defined by (2.14) and (2.15), respectively. We have the following results.
(i) Let A_1, A_2, · · · , A_n be any disjoint measurable subsets of X. Then, as a → ∞, where (X_1, X_2, · · · , X_n) ∼ N(0, Σ) and Σ = (σ_{ij})_{1≤i,j≤n} is given by (4.13) with respect to the Skorohod topology. The next corollary concerns the asymptotic behaviour of the prior P when the corresponding stick-breaking weights v_i follow a linear combination of beta distributions, whose precise meaning is given below.

Then the random variables v_i defined by (4.15) are said to follow linear combinations of the beta distributions.
Corollary 4.12. Let P be the stick-breaking process defined in Definition 2.1, where each weight v_i is a linear combination of beta distributions as defined by (4.15). Let D_a and Q_{H,a} be defined by (2.14) and (2.15), respectively. Then the following statements hold.
Proof. By the independence of {u_{i,ℓ}}_{ℓ=1}^{s}, we can compute the p-th moment of v_i as follows, where r = min(r_1, · · · , r_s). Taking k_p = pr and C_p = t^p p! in Assumption 4.1, we see that the condition j k_i ≥ i k_j for i ≥ j is always verified.
Remark 4.13. Let us return to Corollary 4.7. This is a typical case, and we take a close look at the density f_a(x) = a(1 − x)^{a−1}, 0 ≤ x ≤ 1, of the beta distribution Beta(1, a). For any bounded continuous function g : R → R, it is easy to verify that ∫_0^1 g(x) f_a(x) dx → g(0) as a → ∞. In other words, f_a converges to the Dirac delta function δ(x), i.e. the random variable v_i converges in distribution to 0. This observation hints that when the distribution f_a of the v_i's converges to the Dirac delta function δ(x), that is, when the stick-breaking weights v_i converge in distribution to 0 as a → ∞, we should have the convergence of the random process Q_{H,a}. But we still need to impose some additional technical conditions. We give a further illustration in the following corollary.
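Note that for v ∼ Beta(1, a) the mass of the density f_a(x) = a(1 − x)^{a−1} piles up near x = 0 as a → ∞, so E[g(v)] = ∫ g(x) f_a(x) dx tends to g(0). A quick Monte Carlo check (the test function g = cos and the sample sizes are arbitrary illustrative choices):

```python
import math
import random

rng = random.Random(3)
g = math.cos  # bounded continuous test function; g(0) = 1

ests = {}
for a in (1.0, 10.0, 100.0, 1000.0):
    # Monte Carlo estimate of E[g(v)] = \int_0^1 g(x) f_a(x) dx with v ~ Beta(1, a)
    ests[a] = sum(g(rng.betavariate(1.0, a)) for _ in range(20000)) / 20000
    print(a, ests[a])  # increases towards g(0) = 1 as the mass concentrates near 0
```

For a = 1 the weight v is uniform on (0, 1), so the estimate is near ∫_0^1 cos(x) dx = sin(1); as a grows the estimate approaches cos(0) = 1.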
Corollary 4.14. Let the stick-breaking process P be defined as in Definition 2.1, where each v_i follows the following distribution: Proof. Before we proceed to the proof, let us note the obvious fact that f_b converges to the Dirac delta distribution as b → 0.
For any n > 0, we see that lim_{b→0} b^n = 0. A trivial calculation shows that for any positive integer p, an application of Assumption 4.1 with k_p = p (taking a = 1/b) yields the desired statement.
When the stick-breaking weights are iid, Theorem 4.4 covers a very general situation, and the conditions (4.3)-(4.4) are minimal and easy to verify. When the stick-breaking weights are not iid, however, the situation becomes much more sophisticated, as in other statistical settings. We shall consider some well-known processes introduced earlier in Section 2. For these processes the explicit forms of the joint finite dimensional distributions of the stick-breaking weights, although complicated, are available. We can state results similar to Theorem 4.4 in one theorem for all these processes.
Theorem 4.15. Let P be one of the Poisson-Dirichlet process PDP(a, b, H), the normalized inverse Gaussian process N-IGP(a, H), the normalized generalized gamma process NGGP(σ, a, H), and the generalized Dirichlet process GDP(a, r, H). Then, we have the following results.
Once the central limit theorem for P is obtained, it is straightforward to use the delta method to prove a similar theorem for nonlinear functionals of the process. Using Theorem 3.9.4 in van der Vaart and Wellner (1996), we can state the following theorem.
One application of the above theorem is the limiting distribution of the empirical quantile process of P .
where H^{−1}(s) = inf{t : H(t) ≥ s}. The limiting process G is a zero-mean Gaussian process with covariance function, for s, t ∈ R,

Concluding remarks
The method of moments used in this paper can be applied to the study of asymptotics for some Bayesian nonparametric posterior processes in the following situations: (i) when the parameter a is finite and the sample size is large; (ii) when the parameter a is large and the sample size is finite; (iii) when the parameter a and the sample size are both large. Interesting examples are the posterior processes of the normalized random measures with independent increments, which include the normalized generalized inverse Gaussian process studied in James et al. (2009). We may also apply our method of moments to study the posterior distributions of the homogeneous NRMIs studied by Favaro et al. (2016), which include the generalized Dirichlet process and the normalized generalized gamma process.

Supplemental Appendix
In this section, we present the proofs of the propositions and theorems that appear in this paper.

Proof of Proposition 3.1
Proof. Using the binomial expansion and the fact that v_i ∈ [0, 1], we obtain the stated bounds, where the last equalities follow from the assumption that the corresponding limits equal 0 for all n ∈ Z_+. This proves (3.1)-(3.2). Now we use (3.1)-(3.2) to show (3.3). Denote the quantity on the left-hand side of (3.3) by I. By the construction of the stick-breaking sequence {w_i}_{i=1}^∞, we may rewrite I accordingly. Since 1 ≤ i_1 < i_2 < · · · < i_k < ∞, we can rearrange I by putting the v's with the same index together. Denoting the general factor in the resulting expression, for m ∈ {1, 2, · · · , k}, we can write I as a product of such factors. Since {v_1, v_2, · · · } are identically distributed, (3.1)-(3.2) give, for m = 1, · · · , k, the corresponding estimates. Substituting these estimates into (6.4), we obtain (3.3). If we take p_j = 2 for all j ∈ {1, · · · , k}, then p_{1:k} = 2k, and the identity (3.4) is a straightforward consequence of (3.3).
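As a Monte Carlo sanity check of the simplest instance of these moment identities (the case k = 1, p = 2, for the iid Beta(1, a) weights of the Dirichlet process, where independence gives E[∑_i w_i²] = E[v²]/(1 − E[(1 − v)²]) = 1/(1 + a)), one may run the following Python sketch; the parameter values and truncation level are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

# For iid stick-breaking weights built from v_i iid ~ v, independence gives
#   E[w_i^2] = E[v^2] * E[(1 - v)^2]^(i-1),
# so summing the geometric series: E[sum_i w_i^2] = E[v^2] / (1 - E[(1 - v)^2]).
# For v ~ Beta(1, a) (the Dirichlet process) this equals 1 / (1 + a).
a, n_sticks, n_rep = 3.0, 300, 5000
v = rng.beta(1.0, a, size=(n_rep, n_sticks))
w = v * np.concatenate([np.ones((n_rep, 1)),
                        np.cumprod(1.0 - v, axis=1)[:, :-1]], axis=1)
mc = np.mean(np.sum(w ** 2, axis=1))
print(abs(mc - 1.0 / (1.0 + a)) < 0.01)
```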

Proof of Proposition 3.2
Since v_1, v_2, · · · are no longer identically distributed, the results established in the previous proof cannot be applied, and some new computations are needed. We shall still use the general method of moments. To this end, we first need to recall some results about hypergeometric functions; we refer to Aomoto et al.
This function is defined for |x| < 1 and may be extended to x = 1 and/or x = −1 by continuation.
We need the following classical result of Gauss: when Re(c − a − b) > 0 (the real part of c − a − b), the hypergeometric function can be extended to x = 1 and its value at this point is given by ₂F₁(a, b; c; 1) = Γ(c)Γ(c − a − b)/(Γ(c − a)Γ(c − b)). We introduce a variant of the hypergeometric function that will be needed in the following calculations.
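Gauss's summation formula can be verified numerically by comparing partial sums of the ₂F₁ series at x = 1 with the Gamma-function expression; the following self-contained Python sketch (with illustrative parameter values) does this.

```python
from math import gamma

def hyp2f1_at_1(a, b, c, n_terms=100000):
    """Partial sum of the 2F1 series at x = 1 (converges when c - a - b > 0)."""
    term, total = 1.0, 1.0
    for n in range(n_terms):
        term *= (a + n) * (b + n) / ((c + n) * (n + 1.0))
        total += term
    return total

# Gauss: 2F1(a, b; c; 1) = Gamma(c) Gamma(c-a-b) / (Gamma(c-a) Gamma(c-b)).
a, b, c = 0.5, 1.2, 3.0                 # illustrative values, c - a - b = 1.3 > 0
lhs = hyp2f1_at_1(a, b, c)
rhs = gamma(c) * gamma(c - a - b) / (gamma(c - a) * gamma(c - b))
print(abs(lhs - rhs) < 1e-4)
```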
Definition 6.2. For any b < 2, n ∈ N_+, a > 0, m > 0, c > 0, define the increasing coefficient hypergeometric function ₂Q₁((a, b), c, m, n; x) by the series for |x| < 1; we may extend the definition to x = 1 and/or x = −1 by continuation. In the above product we use the convention that ∏_{ℓ=1}^{0} c_ℓ = 1.
The next proposition gives a Gauss-type result for the increasing coefficient hypergeometric function.
Proposition 6.3. Let b < 2, n ∈ N_+, a > 0, m > 0, c > 0. Then, the increasing coefficient hypergeometric function can be extended to x = 1 and its value at this point is given by ₂Q₁((a, b), c, m, n; 1) = ∏_{ℓ=1}^{n−1} (a + bℓ) · (a + m)/(m − nb).
Now we are in a position to prove Proposition 3.2.

Proof of Proposition 3.3
Proof. By the stick-breaking representation of N-IGP(a, H) and formula (3.471.9) in Gradshteyn and Ryzhik (2014), we find that the joint distribution of {v_i}_{i=1}^{n} can be written as an integral in t whose integrand contains the factors (1 − v_i).
We repeatedly apply the above procedure to integrate out v_{n−2}, v_{n−3}, · · · , v_1, each time using (6.12). After these computations we obtain (6.13). Again using Fubini's theorem and the fact that v ∈ (0, 1), we take the sum to obtain (6.14). Then, the result (3.7) is obtained by first applying (6.11) to the integral with respect to v, and then applying formula (3.471.9) in Gradshteyn and Ryzhik (2014) to the integral with respect to t. Here, we also use the approximation that, for fixed ν and large a, K_ν(a) = √(π/2) a^{−1/2} e^{−a} (1 + O(1/a)). When p = 2, we want to show that the leading coefficient in (3.7) is 1. This requires more delicate computations. First, notice that the power of v_{n−4} in the relevant integrand is −3/2, and to compute the integral with respect to v_{n−4} we can use the integral identity (6.15). After this integration with respect to v_{n−4}, we obtain an expression for v_{n−5} of the same form; we then integrate v_{n−5}, and so on, continuing until v_1 is integrated. Hence, we compute the integrals for v_{n−4}, then for v_{n−5}, · · · , and finally for v_1 recursively to obtain the stated expression. Approximating the modified Bessel function K_{−5/2} of the third kind by formula (8.451.6) in Gradshteyn and Ryzhik (2014), we then evaluate the resulting integral by making suitable variable substitutions.
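The Bessel approximation K_ν(a) ≈ √(π/(2a)) e^{−a} for large a can be checked numerically via the integral representation K_ν(a) = ∫_0^∞ e^{−a cosh t} cosh(νt) dt; the Python sketch below (with illustrative values of ν and a) confirms the leading-order behavior, which is independent of ν.

```python
from math import cosh, exp, pi, sqrt

def bessel_k(nu, a, n=3000, T=3.0):
    """K_nu(a) via the integral representation
    K_nu(a) = int_0^inf e^{-a cosh t} cosh(nu t) dt, by the trapezoid rule
    on [0, T]; for a = 100 the integrand is negligible beyond T = 3."""
    h = T / n
    total = 0.5 * (exp(-a) + exp(-a * cosh(T)) * cosh(nu * T))
    for i in range(1, n):
        t = i * h
        total += exp(-a * cosh(t)) * cosh(nu * t)
    return total * h

a = 100.0                                 # "large a" regime
leading = sqrt(pi / (2.0 * a)) * exp(-a)  # sqrt(pi/2) * a^{-1/2} * e^{-a}
for nu in (0.5, 2.5):
    print(abs(bessel_k(nu, a) / leading - 1.0) < 0.05)
```

For ν = 1/2 the leading term is in fact exact, while for other ν the relative error is of order 1/a, consistent with the (1 + O(1/a)) correction used in the proof.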
To prove the results (3.8) and (3.10), we introduce the relevant notation and compute as follows. Using the explicit form of the joint density of v_1, · · · , v_{i_k}, notice that the integrals over v_{i_{k−1}+1}, v_{i_{k−1}+2}, · · · , v_{i_k}, with i_k summed from i_{k−1} + 1 to ∞, have the same form as (6.9). Thus, by the computation (6.13) and a calculation similar to that of (6.14), we can perform the analogous computations for i_{k−1}, i_{k−2}, · · · , i_1, in this order, to obtain (3.8).
When p_1 = · · · = p_k = 2, computations similar to those in the proof of (3.9) can be carried out to obtain (3.10).

Proof of Proposition 3.4
Proof. By a Gamma-function identity and the binomial expansion ∑_{j=0}^{n} (−1)^j C(n, j) x^j = (1 − x)^n, we can rewrite the joint density of the stick-breaking weights v_1, · · · , v_n as an integral. We make the substitution t = (∏_{i=1}^{n} (1 − v_i))^σ s/a in this integral. Then, for large a, namely when t is small, we have the stated approximation, where, here and throughout this paper, we use µ ≍ ν to mean lim µ/ν = 1. The integral in (6.26) is then approximated by (6.27). Thus, for large a, the joint density of v_1, · · · , v_n has the asymptotics (6.28), involving the factors (1 − v_i)^σ. Now the equalities (3.11) and (3.12) follow from the same arguments as in the proof of Proposition 3.3 and from the identity (6.29), which holds for any q ∈ R, where the last equality follows from the substitution s = a((1 − x)^{−σ} − 1) and the stated asymptotics. To obtain the exact asymptotics in the case σ = 1/m and p = p_1 = · · · = p_k = 2, we first prove (3.13) using the same argument as in the proof of (3.9). The only differences are as follows. First, we integrate v_{n−m}, v_{n−m−1}, · · · , v_1 recursively, in this order, using (6.29). After these integrations, it remains to integrate the variables v_{n−m+1}, · · · , v_n. This multiple integral is evaluated simultaneously by means of the substitution v_{n−i} = 1 − ((a + y_0 + · · · + y_{i−1})/(a + y_0 + · · · + y_i))^{1/σ}, i = 0, · · · , m − 1.
The identity (3.14) follows from (3.13) by the same argument as that in the proof of (3.10).
6.5. Proof of Proposition 3.5
Proof. When P ∼ GDP(a, r, H), the weak convergence of D_a will be proved in part (i) of Theorem 4.15. Combining this with the fact that GDP(a, r, H) also admits the general stick-breaking representation (2.1)-(2.2), we obtain the desired results by the same argument as in the proof of (4.5) in Theorem 4.4.
6.6. Proof of Theorem 4.2
Proof. We have the expansion (6.30), where the first sum is taken over all combinations of nonnegative integers {p_1, · · · , p_k} such that k ∈ {1, · · · , m} and ∑_{i=1}^{k} p_i = m, and the c(p_1, · · · , p_k) are the corresponding constants. By the discussion of Case 1 in the proof of Theorem 4.4, it is easy to see that p_i ≥ 2 for all i and thus k ≤ m/2.
First, assume P is one of DP(a, H), PDP(a, b, H), N-IGP(a, H), NGGP(σ, a, H), GDP(a, r, H). We choose m = ⌈4/τ⌉, where ⌈x⌉ is the smallest integer greater than or equal to x. Then, from Propositions 3.2-3.5, we have the required moment bounds. If P ∼ DPG(g_a, H), then we can choose m such that, for all 1 ≤ k ≤ m/2, the corresponding condition holds; then, from Proposition 3.1, we obtain the analogous bound. Since the corresponding series over n converges for any of the processes presented in the theorem, (4.2) follows by the Borel-Cantelli lemma.

Proof of Theorem 4.4
Before proceeding to the proof of Theorem 4.4, we need a preparatory result about the joint moments of the multivariate normal distribution. To state it, we introduce the following notation. Let n be a positive integer and let p = (p_ij, 1 ≤ i < j ≤ n) be a multi-index. Denote |p| = ∑_{1≤i<j≤n} p_ij (6.32) and denote the quantity (6.33), which equals ∑_{i<n} p_in when m = n. The following proposition concerns the joint moments of Gaussian random variables. Similar or more general results may be found in the literature under the terminology of "Feynman diagrams" (e.g. Hu (2017, Theorem 5.7) and references therein), but we could not find the exact result we need, so we give the following proposition.
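For a concrete special case of such joint-moment ("Feynman diagram") formulas, recall the Isserlis/Wick identity for zero-mean Gaussians, E[X_i X_j X_k X_l] = C_ij C_kl + C_ik C_jl + C_il C_jk; the Python sketch below checks it by Monte Carlo with an illustrative covariance matrix.

```python
import numpy as np

def wick_fourth_moment(C, i, j, k, l):
    """Isserlis/Wick: E[X_i X_j X_k X_l] = C_ij C_kl + C_ik C_jl + C_il C_jk
    for a zero-mean Gaussian vector with covariance matrix C."""
    return C[i, j] * C[k, l] + C[i, k] * C[j, l] + C[i, l] * C[j, k]

rng = np.random.default_rng(1)
C = np.array([[1.0, 0.4, 0.2],
              [0.4, 1.0, 0.3],
              [0.2, 0.3, 1.0]])          # illustrative covariance matrix
X = rng.multivariate_normal(np.zeros(3), C, size=400000)
mc = np.mean(X[:, 0] * X[:, 1] * X[:, 1] * X[:, 2])   # E[X_0 X_1^2 X_2]
exact = wick_fourth_moment(C, 0, 1, 1, 2)             # = 0.44 here
print(abs(mc - exact) < 0.02)
```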
For any A ∈ X, the variance of P(A) is given by (2.13). Using Proposition 3.1, we have
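In the special case of the Dirichlet process DP(a, H), the variance is Var(P(A)) = H(A)(1 − H(A))/(a + 1) (Ferguson, 1973); the following Python sketch, with illustrative parameter values and truncation level, verifies this by simulating Sethuraman's stick-breaking representation.

```python
import numpy as np

rng = np.random.default_rng(2)

def dp_P_of_A(a, p_A, n_sticks, rng):
    """One draw of P(A) under DP(a, H) with H(A) = p_A, via Sethuraman's
    stick-breaking: v_i iid Beta(1, a); each atom lands in A independently
    with probability p_A."""
    v = rng.beta(1.0, a, size=n_sticks)
    w = v * np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))
    return np.sum(w * (rng.random(n_sticks) < p_A))

a, p_A = 4.0, 0.3
draws = np.array([dp_P_of_A(a, p_A, 400, rng) for _ in range(10000)])
exact_var = p_A * (1.0 - p_A) / (a + 1.0)   # Ferguson (1973): = 0.042 here
print(abs(draws.var() - exact_var) < 0.005)
```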

Case 2
∑_{i=1}^{n} r_i is odd, or ∑_{i=1}^{n} r_i is even but (1/2)∑_{i=1}^{n} r_i > q. We substitute (6.42) into (6.41) and consider the expectations of the v_1's. By Proposition 3.1, after excluding the terms discussed in Case 1, the remaining terms corresponding to this case have the asymptotics (6.43). From the assumption (4.4), it follows that when ∑_{i=1}^{n} r_i is even and q < (1/2)∑_{i=1}^{n} r_i, the expectation of the terms in the sum of I(q; s_1, · · · , s_n, s_{1,1}, · · · , s_{q,n}) converges to 0 as a → ∞.
Similarly, when ∑_{i=1}^{n} r_i is odd, since q is an integer and s_j ≥ 2 for all j, we always have q < (1/2)∑_{i=1}^{n} r_i. Therefore, the expectation of the corresponding terms in the sum of I(q; s_1, · · · , s_n, s_{1,1}, · · · , s_{q,n}) with ∑_{i=1}^{n} r_i odd always converges to 0 as a → ∞.
Case 3
∑_{i=1}^{n} r_i is even and q = (1/2)∑_{i=1}^{n} r_i. The only terms that may not converge to zero are those not covered in Case 1 and Case 2. This means that the only terms with nontrivial limits are those for which ∑_{i=1}^{n} r_i is even and q = (1/2)∑_{i=1}^{n} r_i.
For each ℓ ∈ {1, · · · , (1/2)∑_{i=1}^{n} r_i}, let p_ij count the mixed terms: for each pair i, j with 1 ≤ i < j ≤ n, p_ij is the number of (ij)-mixed terms in the product in I(q; s_1, · · · , s_n, s_{1,1}, · · · , s_{q,n}). Notice that, to obtain an (ij)-mixed term, we need to multiply a term of the form w_{e_ℓ}(δ_{θ_{e_ℓ}}(A_i) − H(A_i)) (we call this a power 1 term of A_i; there are r_i power 1 terms of A_i) by a term of the form w_{e_ℓ}(δ_{θ_{e_ℓ}}(A_j) − H(A_j)) (a power 1 term of A_j; there are r_j power 1 terms of A_j). Consequently, I(q; s_1, · · · , s_n, s_{1,1}, · · · , s_{q,n}) can be rewritten by Proposition 3.1 as (6.45) for any positive integer k and any (t_1, · · · , t_n) ∈ R^d. By the method of moments (see e.g. Billingsley (1995, Theorem 30.2)), the moments converge, and part (i) of Theorem 4.4 then follows from the Cramér-Wold theorem (e.g. Billingsley (1995, Theorem 29.4)). Now, we prove part (ii) of the theorem by proving the weak convergence of the finite dimensional distributions and verifying a tightness condition. The finite dimensional weak convergence of Q_{H,a} follows directly from part (i): for any finitely many measurable sets A_1, · · · , A_n in X^d, we have the stated convergence. By Theorem 2 of Bickel and Wichura (1971), to show (4.7) we only need to check the tightness condition, i.e., inequality (2) of Bickel and Wichura (1971), with γ_1 = γ_2 = 2, β_1 = β_2 = 1 and µ = 2H. Obviously, µ is finite and nonatomic.
The last inequality is due to the fact that H(·) ∈ (0, 1) and thus H(·)² ≤ H(·). Therefore, the tightness condition on D(R^d) is verified.

Proof of Theorem 4.15
Proof. Once Propositions 3.2-3.4 are available, the proofs of part (i) of this theorem for the various processes, except the generalized Dirichlet process, follow from an argument similar to that in the proof of part (i) of Theorem 4.4, so we omit the details. When P ∼ GDP(a, r, H), we need the following result about the variance of P from Lijoi et al. (2005a), where I_{a,r} is given by I_{a,r} = a(r!)^a ∑_{k=1}^{r} ∫_0^∞ x/((k + x)² ∏_{j=1}^{r} (j + x)^a) dx = a(r!)^a Γ(ra)/(r^{ra} Γ(ra + 2)) ∑_{k=1}^{r} F_D^{(r−1)}(ra, a*_k; ra + 2; (1/r) J_{r−1}). (6.47) Here a*_k = (a, · · · , a + 2, · · · , a)^T is an (r − 1)-dimensional vector whose k-th element is a + 2 and whose other elements are all equal to a; J_{r−1} = (1, · · · , r − 1)^T; and F_D^{(r−1)} denotes the Lauricella hypergeometric function of type D. Denote D_a(·) = √a (P(·) − H(·)). (6.50) We shall prove the result for n = 3; the general n case can be handled in a similar way.