Large sample asymptotic analysis for normalized random measures with independent increments

Normalized random measures with independent increments represent a large class of Bayesian nonparametric priors and are widely used in the Bayesian nonparametric framework. In this paper, we provide a posterior consistency analysis for normalized random measures with independent increments (NRMIs) through the corresponding Lévy intensities used to characterize the completely random measures in the construction of NRMIs. Assumptions are introduced on the Lévy intensities to analyze the posterior consistency of NRMIs and are verified with multiple interesting examples. A focus of the paper is the Bernstein-von Mises theorem for the normalized generalized gamma process (NGGP) when the true distribution of the sample is discrete or continuous. When the Bernstein-von Mises theorem is applied to construct credible sets, in addition to the usual form there will be an additional bias term on the left endpoint, closely related to the number of atoms of the true distribution when it is discrete. We also discuss the effect of the estimators of the model parameters of the NGGP on the Bernstein-von Mises convergences. Finally, to further explain the necessity of the bias correction in constructing credible sets, we illustrate numerically how the bias correction affects the coverage of the true value by the credible sets when the true distribution is discrete.


Introduction
Bayesian nonparametrics has undergone major development due to its various applications in many areas, such as biology, economics and machine learning. As a rich class of Bayesian nonparametric priors, normalized random measures with independent increments (NRMIs), introduced by (Regazzini et al., 2003), include the famous Dirichlet process (Ferguson, 1973), the σ-stable NRMIs (Kingman, 1975), the normalized inverse Gaussian process (Lijoi et al., 2005b), the normalized generalized gamma process (Lijoi et al., 2003, 2007), and the generalized Dirichlet process (Lijoi et al., 2005a). We refer to (Müller and Quintana, 2004; Lijoi et al., 2010; Zhang and Hu, 2021) for reviews of these processes with their properties and applications.
In Bayesian nonparametric statistics, samples are drawn from a random probability measure that is equipped with a prior distribution. To be more precise, let (Ω, F, P) be a probability space, let X be a complete, separable metric space whose Borel σ-algebra is denoted by X, and let (M_X, M_X) be the space of all probability measures on X. A sample X = (X_1, ..., X_n) that takes values in X^n is drawn iid from a random probability measure P conditional on P, where P follows a prior distribution Q on (M_X, M_X). That is to say,

X_i | P ~ P iid, i = 1, ..., n,    P ~ Q.    (1.1)

Two natural questions in the literature are raised as follows.
(i) A frequentist analysis of the Bayesian consistency (Freedman and Diaconis, 1983): by assuming the "true" distribution of X is P 0 , we are interested in whether the posterior law, that is the conditional law of P |X, denoted by Q n , converges to δ P 0 , the Dirac measure with point mass at the "true" distribution, as n → ∞.
(ii) What is the limiting distribution of the centered and rescaled P | X? In particular, is there a Bernstein-von Mises type theorem and central limit theorem for P? If so, what is the limit process of √n(P|X − E[P|X])?
The above two questions are very important in statistics, as the posterior consistency guarantees that the model behaves well when the sample size is large, and the limiting distribution of the posterior process is the key to constructing Bayesian credible sets and conducting hypothesis tests. Many inspiring works corresponding to the above questions have been done. Regarding question (i), (James, 2008) obtains the posterior consistency analysis of the two-parameter Poisson-Dirichlet process, which is not an NRMI but is closely related to NRMIs (Pitman and Yor, 1997; Perman et al., 1992; Ghosal and Van der Vaart, 2017). The posterior consistency of the species sampling priors (Pitman, 1996; Aldous et al., 1985) and the Gibbs-type priors (Gnedin and Pitman, 2006) is discussed in (Ho Jang et al., 2010) and (De Blasi et al., 2013). It is worth pointing out that there are overlaps among the species sampling priors, the Gibbs-type priors and the homogeneous NRMIs. In contrast, non-homogeneous NRMIs are entirely different from the species sampling priors and the Gibbs-type priors. As for question (ii), Bernstein-von Mises results have been established for the Dirichlet process (Lo, 1983, 1986; Ray and van der Vaart, 2021; Hu and Zhang, 2022) and for the two-parameter Poisson-Dirichlet process (James, 2008; Franssen and van der Vaart, 2022). Along the same line, we would like to answer the two addressed questions when P is an NRMI.
Since NRMIs are constructed by normalizing completely random measures (Kingman, 1967, 1993) associated with their Lévy intensities (see e.g., Section 2), it is natural to study their properties through the corresponding Lévy intensities. In this work, we discuss the posterior consistency of non-homogeneous NRMIs (including the homogeneous case as a particular case) and provide a simple condition to guarantee their posterior consistency. As a result, when P_0 is continuous, the posterior consistency does not hold for NRMIs in general, and when P_0 is discrete, the posterior consistency holds as long as our proposed condition is satisfied.
Furthermore, we obtain the Bernstein-von Mises theorem for the normalized generalized gamma process (NGGP), a flexible class of Bayesian nonparametric priors that includes the Dirichlet process, the normalized inverse-Gaussian process and the σ-stable process. Through the posterior consistency analysis, the NGGP is posterior consistent when the true distribution P_0 is discrete, or when P_0 is continuous and the parameter σ of the NGGP goes to 0. The case σ → 0 reduces the NGGP to the Dirichlet process. Thus, we shall emphasize the case when the true distribution P_0 is discrete. However, there will be a bias term on the left-hand side of the Bernstein-von Mises theorem for the NGGP when P_0 is discrete, and it turns out that this bias term may not go to 0 as n → ∞. Thus, in order to construct "correct" Bayesian credible sets that cover the true parameter value, we suggest a bias correction to mitigate the bias term. A comparison of credible intervals with and without the bias correction is given in the numerical illustration. In applications, the model parameters of the NGGP are chosen by data-driven estimators, and we show that using Bayesian or maximum likelihood estimators of the model parameters of the NGGP does not affect the convergences in the Bernstein-von Mises results.
The outline of this paper is as follows. In Section 2, we recall the construction of NRMIs and their posterior distributions. In Section 3, we discuss the posterior consistency of NRMIs and introduce a simple assumption on the corresponding Lévy intensities that guarantees it. Examples for several well-known Bayesian nonparametric priors are given to verify the applicability of the introduced assumption. In Section 4, we derive the Bernstein-von Mises theorem for the NGGP and provide an analysis of the bias correction with a numerical illustration. Finally, in Section 5, we provide a discussion of our results and some ideas for future study. In order to ease the flow of ideas, we delay the proofs to the supplementary materials (Section 6).
2 Normalized random measures with independent increments

Constructions of NRMIs
We start by recalling the notion of completely random measures (see e.g., (Kingman, 1967, 1993) and references therein for more details), which play an important role in the construction of NRMIs.
Definition 1. Let μ be a measurable mapping defined on (Ω, F, P) that takes values in the space of measures on (X, X). We call μ a completely random measure (CRM) if, for any pairwise disjoint sets B_1, ..., B_k ∈ X, the random variables μ(B_1), ..., μ(B_k) are mutually independent. Completely random measures play an important role in the construction of Bayesian nonparametric priors, and we refer to (Regazzini et al., 2003; Lijoi et al., 2010) for more detailed discussion.
One way to construct NRMIs is through a Poisson random measure, explained as follows. Denote S = R_+ × X and denote its Borel σ-algebra by S. A Poisson random measure Ñ on S with σ-finite intensity measure ν(ds, dx) is a random measure from Ω × S to R_+ such that Ñ(A) follows a Poisson distribution with mean ν(A) for any A ∈ S, and for any pairwise disjoint sets A_1, ..., A_k ∈ S the random variables Ñ(A_1), ..., Ñ(A_k) are mutually independent.
Let (B_X, B_X) be the space of finite measures on (X, X) endowed with the topology of weak convergence, and let μ be the random measure defined on (Ω, F, P) that takes values in (B_X, B_X) given by

μ(B) = ∫_{R_+ × B} s Ñ(ds, dx), B ∈ X. (2.1)

It is trivial to verify that μ is a completely random measure. It is also well known that μ is almost surely discrete and that, for any B ∈ X, μ(B) is uniquely characterized by its Laplace transform

E[e^{−λμ(B)}] = exp( − ∫_{R_+ × B} (1 − e^{−λs}) ν(ds, dx) ). (2.2)

The measure ν is called the Lévy intensity of μ, and we denote the Laplace exponent by

ψ_B(λ) = ∫_{R_+ × B} (1 − e^{−λs}) ν(ds, dx). (2.3)

From the Laplace transform in (2.2), we see that the completely random measure μ is characterized completely by its Lévy intensity ν, which usually takes one of the following forms in the literature: (a) ν(ds, dx) = ρ(ds)α(dx), where ρ : B(R_+) → R_+ is some measure on R_+ and α is a non-atomic measure on (X, X) with α(X) = a < ∞; the corresponding μ is called a homogeneous completely random measure. (b) ν(ds, dx) = ρ(ds|x)α(dx), where ρ(·|x) is a measure on R_+ for every x ∈ X; the corresponding μ is called a non-homogeneous completely random measure.
It is obvious that case (a) is a special case of case (b). Usually, we assume that α is a finite measure, so we may write α(dx) = aH(dx) for some probability measure H and some constant a = α(X) ∈ (0, ∞).
To construct NRMIs, the completely random measure is normalized, and thus one needs the total mass μ(X) to be finite and positive almost surely. This happens under the condition that ρ(R_+) = ∞ in the homogeneous case, and that ρ(R_+|x) = ∞ for all x ∈ X in the non-homogeneous case (Regazzini et al., 2002). Under the above conditions, an NRMI P on (X, X) is the random probability measure defined by

P(·) = μ(·) / μ(X). (2.4)

P is discrete due to the discreteness of μ. For notational simplicity, we let T = μ(X) and let f_T(t) be the density of T throughout this paper.
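To make the construction concrete, the jumps of a homogeneous CRM can be simulated by the Ferguson-Klass inverse-Lévy method: if Γ_1 < Γ_2 < ... are the arrival times of a unit-rate Poisson process and N(v) = ν([v, ∞) × X) is the Lévy tail mass, the solutions of N(J_i) = Γ_i are the jumps in decreasing order, and normalizing them yields an NRMI draw. The sketch below is our own illustration, not part of the paper: it uses the gamma intensity ρ(ds) = a s^{-1} e^{-s} ds, whose tail is N(v) = a E_1(v) with E_1 the exponential integral, so the normalized measure is a Dirichlet process draw.

```python
import numpy as np
from scipy.special import exp1
from scipy.optimize import brentq

rng = np.random.default_rng(0)

def sample_gamma_nrmi(a=5.0, n_jumps=200):
    """Ferguson-Klass sketch: invert the Levy tail N(v) = a*E1(v) of the
    gamma CRM at Poisson arrival times to get the jumps in decreasing order,
    then normalize to obtain an NRMI (here, a Dirichlet process) draw."""
    gammas = np.cumsum(rng.exponential(size=n_jumps))   # Poisson arrivals
    # solve a*E1(exp(w)) = gamma in log-space for numerical stability
    jumps = np.exp([brentq(lambda w, g=g: a * exp1(np.exp(w)) - g, -700.0, 5.0)
                    for g in gammas])
    atoms = rng.standard_normal(n_jumps)                # iid locations from H
    return atoms, jumps / jumps.sum()

atoms, weights = sample_gamma_nrmi()
```

Truncating at n_jumps atoms discards only the smallest jumps, since the inverse-Lévy construction produces them in decreasing order.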

Posterior of NRMIs
We now recall the posterior analysis of NRMIs (James et al., 2009), which is a key topic in Bayesian nonparametric analysis. Let P be an NRMI on X. A sample of size n from P as in (1.1) is an exchangeable sequence of random variables X = (X_i)_{i=1}^n defined on (Ω, F, P) and taking values in X^n such that, given P, the X_i are drawn iid from the distribution P, i.e.,

P[X_1 ∈ dx_1, ..., X_n ∈ dx_n | P] = ∏_{i=1}^n P(dx_i). (2.5)

Let {Y_j}_{j=1}^{n(π)} be the distinct observations of the sample X and let n(π) be the number of unique values of X; the sample thus induces a partition π of {1, ..., n}, and the number of elements of the jth set of the partition is n_j. For any positive integer k and Y ∈ X, define

τ_k(u|Y) = ∫_{R_+} s^k e^{−us} ρ(ds|Y). (2.6)

With these notations, the posterior distribution of P conditional on the observations X_1, ..., X_n is given by the following theorem.
Theorem 2 (James et al. (2009)). Let P be an NRMI with intensity ν(ds, dx) = ρ(ds|x)α(dx). The posterior distribution of P, given a latent random variable U_n, is an NRMI that coincides in distribution with the random measure

(1/κ_n) ( μ^{(U_n)} + Σ_{j=1}^{n(π)} J_j δ_{Y_j} ),

where (i) the random variable U_n has the density given in (2.10) below; (ii) μ^{(U_n)} is the conditional completely random measure of μ with the Lévy intensity ν^{(U_n)} = e^{−U_n s} ρ(ds|x)α(dx); (iii) the J_j, j ∈ {1, ..., n(π)}, are random variables depending on U_n and Y_j and having density

f_{J_j}(s) ∝ s^{n_j} e^{−U_n s} ρ(ds|Y_j); (2.9)

(iv) the random elements μ^{(U_n)} and J_j, j ∈ {1, ..., n(π)}, are independent; (v) T^{(U_n)} = μ^{(U_n)}(X) and κ_n = T^{(U_n)} + Σ_{j=1}^{n(π)} J_j; (vi) the conditional density of U_n given X is

f_{U_n|X}(u|X) ∝ u^{n−1} e^{−ψ_X(u)} ∏_{j=1}^{n(π)} τ_{n_j}(u|Y_j). (2.10)

The above theorem shows that, given the latent variable U_n, the posterior of P is a weighted sum of another NRMI μ^{(U_n)}/T^{(U_n)} and the normalization of the Dirac measures δ_{Y_j} of the distinct observations Y_j, weighted by the corresponding jumps J_j. This gives a rather complete description of the posterior distribution of NRMIs. More details of the posterior analysis of μ and P can be found in (James et al., 2009).

Posterior consistency analysis for the NRMIs
In this section, we discuss the posterior consistency of NRMIs, as raised in question (i) of the introduction. Let Q_n denote the probability law of the posterior random probability measure P | X, and assume that the "true" distribution of the sample is P_0. The posterior distribution is said to be weakly consistent if Q_n concentrates on weak neighbourhoods of P_0 almost surely. More precisely, for any weak neighbourhood O ⊂ M_X of P_0 with arbitrary radius ε > 0,

Q_n(O) → 1, a.s.-P_0^∞, as n → ∞.

The limiting probability measure P_0^∞ = lim_{n→∞} P_0^n is the infinite product measure on X^∞, namely P_0^∞ = P_0 × P_0 × ···, which makes the random variables X_1, X_2, ... independent with common true distribution P_0.
Before presenting the main result, we give the following lemma, which provides the moments of the posterior of P and plays an important role in the proof of the main theorem. Recalling ψ_A in (2.3), we denote by V^{(k)}_{α(A)}(y) the quantity defined in (3.1), for any A ∈ X.
Lemma 3. Let X = (X_i)_{i=1}^n be a random sample from a normalized random measure with independent increments P. The moments and mixed moments of the posterior of P given X are as follows (we use the notation of Theorem 2).
(i) For any A ∈ X and m ∈ N, the posterior m-th moment of P(A) is given below. (ii) For any family of pairwise disjoint subsets {A_1, ..., A_q} of X and any integers {m_1, ..., m_q}, we have the mixed-moment formula below, where λ_i = {j : Y_j ∈ A_i} is the set of indices of the Y_j's that are in A_i, and #(λ_i) is the number of elements in λ_i.
The above lemma provides the posterior moments of NRMIs. Such results reduce to the prior moments of NRMIs by letting the sample size n = 0. The proof of Lemma 3 is inspired by an idea in (James et al., 2006), and the details are given in the supplementary materials (Section 6). To apply the above lemma, one needs to deal with the term V^{(k)}_{α(A)}(y) defined by (3.1). We give the following recursion formula for this quantity, where ξ_i(y) = ∫_A τ_i(y, x)α(dx).
To answer question (i) mentioned in the introduction, we shall study the weak consistency for more general NRMIs.To do so, we need the following assumption.
Assumption 4. Let τ_k(u, x) be defined by (2.6) and let ρ(s|x) be such that u τ_{k+1}(u,x)/τ_k(u,x) is nondecreasing in u and bounded from above by k − C_k(x), uniformly for all k ∈ Z_+ and x ∈ X, where {C_k(x)} is a sequence of functions from X to [0, 1). Namely, there is an increasing positive function φ(u) with lim_{u→∞} φ(u) = 1 such that

φ(u)(k − C_k(x)) ≤ u τ_{k+1}(u,x)/τ_k(u,x) ≤ k − C_k(x), for all u > 0, k ∈ Z_+, x ∈ X.

Theorem 5. Let P be an NRMI with Lévy intensity ν(ds, dx) = ρ(s|x)ds α(dx), where ρ(s|x) satisfies Assumption 4. Then: 1. If P_0 is continuous, then the posterior of P converges weakly to a point mass at C̄_1 H(·) + (1 − C̄_1) P_0(·), a.s.-P_0^∞, where C̄_1 is the limiting value associated with C_1(x) in Assumption 4 (for the NGGP, C̄_1 = σ).
2. If P_0 is discrete with lim_{n→∞} n(π)/n = 0, then P is weakly consistent, i.e., the posterior of P converges weakly to a point mass at P_0(·), a.s.-P_0^∞.

Although Assumption 4 looks complicated, it is quite easy to check once ρ(s|x) is given. For instance, the intensities ρ(s|x) of almost all popular NRMIs are of gamma type, and we check Assumption 4 for these NRMIs in Examples 10, 11 and 12 to show how the assumption works for these processes. This broadens the applicability of Theorem 5.
As a comparison between Theorem 5 and the results in (Ho Jang et al., 2010) for the species sampling priors and (De Blasi et al., 2013) for the Gibbs-type priors, Theorem 5 covers the consistency of non-homogeneous NRMIs, a more general class of Bayesian nonparametric priors than both the species sampling priors and the Gibbs-type priors. On the other hand, the conditions in (Ho Jang et al., 2010; De Blasi et al., 2013) are not trivial to verify for homogeneous NRMIs, even though the predictive distribution of homogeneous NRMIs is available (Pitman, 2003; James et al., 2009).
In Theorem 5, we require lim_{n→∞} n(π)/n = 0 as a condition to guarantee the posterior consistency when P_0 is discrete. This condition holds almost surely by the following proposition.

Proposition 6. If P_0 is discrete, then lim_{n→∞} n(π)/n = 0, a.s.-P_0^∞.
Proof. Note that P_0 is the true distribution of X, i.e., X_i ~iid P_0. Recall that n(π) is the number of distinct observations of X.
Let P_n = (1/n) Σ_{i=1}^n δ_{X_i} be the empirical probability measure.
If P_0 is discrete, we denote the collection of atoms of P_0 by {x_j}_{j≥1}. Then n(π)/n → 0 almost surely, where we use the Borel-Cantelli lemma when taking the limit. One can also formulate an equivalent version of Assumption 4.
Remark 8. Theorem 5 can be extended to more general NRMIs. For example, (James, 2002) introduced the h-biased random measures μ given by μ(dx) = ∫_Y g(s) Ñ(ds, dx), where g : Y → R_+ is an integrable function on a complete and separable metric space Y.
One interesting quantity is n(π), the number of distinct observations of the sample {X_i}_{i=1}^n. In Bayesian nonparametric mixture models, n(π) is the number of clusters in the sample observations and is thus studied in a number of works concerning clustering. Among the literature, the distribution of n(π) is obtained in (Korwar and Hollander, 1973) for the Dirichlet process, in (Antoniak, 1974) for the mixture of Dirichlet processes, and in (Pitman, 2003) for the two-parameter Poisson-Dirichlet process. For general NRMIs we have, by a result of (James et al., 2009):

Proposition 9. For any positive integer n, the distribution of n(π) = k, for k = 1, ..., n, is given by a sum over all vectors of positive integers (n_1, ..., n_k) such that Σ_{j=1}^k n_j = n.

As we mentioned above, Assumption 4 is in fact quite easy to verify. We provide the following examples to illustrate the applicability of Theorem 5.
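For intuition about the growth of n(π) (our own illustration, not part of the paper), the Dirichlet process admits a simple check: under the Chinese restaurant process, observation i + 1 starts a new cluster with probability a/(a + i), so E[n(π)] = Σ_{i=0}^{n−1} a/(a + i) ≈ a log n, and n(π)/n → 0 as required by Theorem 5.

```python
import numpy as np

rng = np.random.default_rng(1)

def crp_num_clusters(n, a):
    """Draw n(pi) for a Dirichlet process with total mass a via the Chinese
    restaurant process: observation i+1 is new with probability a/(a+i)."""
    return sum(rng.random() < a / (a + i) for i in range(n))

a, n = 5.0, 2000
sims = [crp_num_clusters(n, a) for _ in range(200)]
exact_mean = sum(a / (a + i) for i in range(n))   # exact E[n(pi)]
```

The Monte Carlo average of the simulated n(π) matches the exact mean, which is already a small fraction of n at n = 2000.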
Example 10. The normalized generalized gamma process NGGP(a, σ, θ, H) is the NRMI whose homogeneous Lévy intensity is

ν(ds, dx) = (σ/Γ(1−σ)) s^{−1−σ} e^{−θs} ds aH(dx), σ ∈ (0, 1), θ ≥ 0.

It is easy to check that for any nonnegative integer k,

τ_k(u) = (σ/Γ(1−σ)) Γ(k−σ) (u+θ)^{−(k−σ)}, so that u τ_{k+1}(u)/τ_k(u) = u(k−σ)/(u+θ),

which is nondecreasing in u and bounded from above by k − σ. Thus, Assumption 4 is verified, and Theorem 5 implies the normalized generalized gamma process is posterior consistent when σ → 0 (i.e., the Dirichlet process), or when P_0 is discrete.
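This closed form can be double-checked numerically. The sketch below (ours, not from the paper) compares a quadrature evaluation of u τ_{k+1}(u)/τ_k(u) with u(k − σ)/(u + θ) — the constant σ/Γ(1−σ) cancels in the ratio — and confirms the monotonicity and the bound k − σ required by Assumption 4.

```python
import numpy as np
from scipy.integrate import quad

sigma, theta, k = 0.5, 1.0, 3

def tau(j, u):
    """tau_j(u) for the NGGP intensity, up to the constant sigma/Gamma(1-sigma)
    (which cancels in the ratio): integral of s^j * s^{-1-sigma} * e^{-(theta+u)s}."""
    val, _ = quad(lambda s: s**(j - 1 - sigma) * np.exp(-(theta + u) * s), 0, np.inf)
    return val

us = np.linspace(0.5, 50.0, 25)
ratios = np.array([u * tau(k + 1, u) / tau(k, u) for u in us])
closed = us * (k - sigma) / (us + theta)   # u (k - sigma) / (u + theta)
```

The quadrature values agree with the closed form, increase in u, and stay below k − σ = 2.5.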
Example 11. The generalized Dirichlet process GDP(a, γ, H) (Lijoi et al., 2005a) is an NRMI with the homogeneous Lévy intensity

ν(ds, dx) = (Σ_{i=1}^γ e^{−is}/s) ds aH(dx),

where γ is a positive integer. The corresponding Laplace transform of μ(A) is

E[e^{−λμ(A)}] = ( γ! / (λ+1)_γ )^{α(A)},

where (c)_k = c(c+1)···(c+k−1) is the ascending factorial of c for any positive integer k. When γ = 1, the generalized Dirichlet process reduces to the Dirichlet process.
It is trivial to obtain, for any nonnegative integer k,

τ_k(u) = Γ(k) Σ_{i=1}^γ (u+i)^{−k},

so that u τ_{k+1}(u)/τ_k(u) is increasing in u with upper bound k. Theorem 5 can then be used to conclude that the generalized Dirichlet process is posterior consistent.
Example 12. As a non-homogeneous example, we consider the extended gamma NRMI whose non-homogeneous Lévy intensity is given by ν(ds, dx) = e^{−β(x)s}/s ds α(dx), (3.7) where β(x) : X → R_+ is an integrable function (with respect to α(dx)). Such an NRMI is constructed by normalizing the extended gamma process on R introduced by (Dykstra and Laud, 1981). More generally, (Lo, 1982) studied the extended gamma process, called the weighted gamma process, on abstract spaces. By a trivial computation, for any nonnegative integer k,

τ_k(u, x) = Γ(k)(u + β(x))^{−k}, so that u τ_{k+1}(u,x)/τ_k(u,x) = uk/(u+β(x)) ≤ k,

and Assumption 4 is satisfied. Theorem 5 implies that the extended gamma NRMI is posterior consistent when β(x) is integrable with respect to α(dx).
Our theorem can also be applied to more general NRMIs that have not been investigated in previous works. For example, we may naturally consider the generalized extended gamma NRMI with Lévy intensity

ν(ds, dx) = (Σ_{i=1}^r e^{−β_i(x)s}/s) ds α(dx),

where r ∈ Z_+ and the β_i(x) : X → R_+ are integrable functions (with respect to α(dx)).
An argument similar to those of Examples 11 and 12 implies that the generalized extended gamma NRMI is posterior consistent when β_i(x) is integrable (with respect to α(dx)) for all i ∈ {1, ..., r}.
Relying on the results in this section, we have answered question (i) addressed in the introduction. The posterior consistency of NRMIs when P_0 is continuous does not hold in general: the posterior distribution of NRMIs is inconsistent unless C̄_1 = 0 or H = P_0 (or H = P_n). However, it is rare to choose H to be the "true" distribution P_0, and it is not possible to let H = P_n before a sample is observed. Thus, the assumption C̄_1 = 0 should be made to guarantee the posterior consistency of NRMIs when P_0 is continuous. Moreover, whenever ρ(ds|x) is of gamma type, C̄_1 = 0 reduces the corresponding P to the Dirichlet process or the generalized Dirichlet process.

Bernstein-von Mises theorem for the normalized generalized gamma process
The Bernstein-von Mises theorem links Bayesian inference with frequentist inference. Similarly to the Bernstein-von Mises theorem in the Bayesian parametric framework (Vaart, 1998), one can derive a Bernstein-von Mises theorem in the Bayesian nonparametric framework.
There have been some works in the literature. One example is the Bernstein-von Mises theorem for the empirical process (van der Vaart and Wellner, 1996; Vaart, 1998). With the fact that the maximum likelihood estimator of P_0 in the Bayesian nonparametric sense is the empirical measure P_n, one can conclude that the limit law of √n(P_n − P_0) is Gaussian. Based on a similar idea, we consider the limit law of the posterior distribution of √n(P − P_n) given an iid sample X from P_0. To explain the Bernstein-von Mises theorem in the Bayesian nonparametric case, we temporarily let P ∈ M_X be any random probability measure and define the functional

P f = ∫_X f(x) P(dx),

where f : X → R is any measurable function.
Let F be a collection of such functions f; the Bernstein-von Mises theorem in the Bayesian nonparametric case considers the distribution of the process {√n(P − P_n)f : f ∈ F}. It is worth pointing out that there have been many works on the weak convergence of stochastic processes indexed by elements of a Banach space of functions; we refer the reader to (van der Vaart and Wellner, 1996; Vaart, 1998) for further reading. When the function collection F is finite, the weak convergence is the usual multivariate one. Otherwise, it is convenient to take F to be P_0-Donsker, that is, a class F for which √n(P_n − P_0) converges weakly in l^∞(F) to the P_0-Brownian bridge B^o_{P_0}. A notable result is that a finite set F is P_0-Donsker if and only if P_0 f^2 < ∞ for every f ∈ F. For infinite P_0-Donsker classes, one can find details and examples in (van der Vaart and Wellner, 1996).
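As a quick sanity check of the finite-F case (our own illustration, not part of the paper): with P_0 = Uniform(0, 1) and the single function f(x) = x, the empirical process √n(P_n − P_0)f should be approximately centered Gaussian with variance P_0 f² − (P_0 f)² = 1/12.

```python
import numpy as np

rng = np.random.default_rng(4)

# Monte Carlo check of the finite-F Donsker statement for P_0 = Uniform(0,1)
# and f(x) = x: sqrt(n) * (P_n - P_0) f is asymptotically N(0, 1/12).
n, reps = 2000, 1000
samples = rng.random((reps, n))
stat = np.sqrt(n) * (samples.mean(axis=1) - 0.5)   # sqrt(n) (P_n f - P_0 f)
```

The sample mean of `stat` is near 0 and its variance is near 1/12 ≈ 0.0833, as the central limit theorem predicts.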
In order to define the weak convergence of √n(P − P_n) conditional on X to B^o_{P_0}, we use conditional weak convergence in the bounded Lipschitz metric (van der Vaart and Wellner, 1996):

sup_{h ∈ BL_1} | E[h(√n(P − P_n)) | X] − E[h(B^o_{P_0})] | → 0 (4.1)

as n → ∞. The expectation in (4.1) is taken over the random probability measure P, and thus the left side of (4.1) is a function of X. The convergence in (4.1) refers to the iid sample X from P_0 and can be in probability or almost surely. The supremum is taken over the set BL_1 of all functions h : l^∞(F) → R bounded by 1 with Lipschitz constant at most 1. Under the convergence criteria explained above, we present the Bernstein-von Mises theorem when P ∼ NGGP(a, σ, θ, H). For simplicity of notation, let P̄_n = (1/n(π)) Σ_{j=1}^{n(π)} δ_{Y_j} be the empirical measure of the distinct observations.

Theorem 13. Let X be a sample as defined in (1.1) with P ∼ NGGP(a, σ, θ, H). Let F be a finite collection of functions such that P_0 f^2 < ∞ and Hf^2 < ∞ for any f ∈ F. We have the following convergences almost surely under P_0^∞.
(i) If P_0 is discrete, the convergence (4.2) holds. (ii) If P_0 is continuous, the convergence (4.4) holds. Here B^o_{P_0} and B^o_H are independent Brownian bridges, independent of the standard normal random variable Z. Moreover, if F is any P_0-Donsker class of functions, then the convergences hold in probability in l^∞(F). In this case, the convergences are also P_0^∞-almost sure under the additional condition P_0‖f − P_0 f‖_F^2 < ∞. We refer to Theorems 2.11.1 and 2.11.9 in (van der Vaart and Wellner, 1996) for more details of the discussion of F such that the convergence holds in l^∞(F).
When P_0 is continuous, there is a "bias" term σ(H − P_n) in the convergence in (4.4). The term vanishes only when σ = 0, under which P becomes the Dirichlet process, or when H = P_n (H = P_0), which is unrealistic. Moreover, σ equals the C̄_1 in Theorem 5. Thus, one should not expect to use the NGGP for continuous P_0.
On the other hand, it is interesting to see that there is a "bias" term σ n(π)/n (H − P̄_n) on the left-hand side of the convergence in (4.2) when P_0 is discrete, which makes the limiting process B^o_{P_0}. We cannot drop this "bias" term directly, although lim_{n→∞} n(π)/n = 0 a.s. The term can be dropped as long as lim_{n→∞} n(π)/√n = 0, that is, when the masses of the atoms {x_j} of P_0 decay fast enough that n(π) grows slowly as n → ∞. For a formal condition on P_0 that makes lim_{n→∞} n(π)/√n = 0, we have the following corollary.

Corollary 14. Under the conditions in Theorem 13, when P_0 is discrete, we have the following results.
(i) If P_0({x_j}) ≤ C/j^α for some positive constant C and α > 2, and F is a class of uniformly bounded functions, then the convergence in (4.2) holds without the bias term. (ii) If the function h(t) := #{x : P_0({x}) ≥ 1/t} is regularly varying at ∞ of exponent η with η < 1/2, and F is a class of uniformly bounded functions, then the same conclusion holds. (iii) If F is a class of functions f such that f({x_j}) ≲ j^β for some β > 0 and P_0({x_j}) ≤ C/j^α for some positive constant C and α > 2 + 2β, then the same conclusion holds. The proof of the above corollary follows directly from Corollary 2 in (Franssen and van der Vaart, 2022). We recall that if h is regularly varying at ∞ with exponent η ∈ (0, 1), then for any t > 0 we have lim_{n→∞} h(nt)/h(n) = t^η. Moreover, for such a regularly varying function h, we have n(π)/h(n) → Γ(1 − η) a.s., and h(n) is n^η up to a slowly varying factor. We refer to the appendix in (Haan and Ferreira, 2006) and to (Bingham et al., 1987) for more details on regularly varying functions.
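The regular-variation condition in part (ii) is easy to probe by simulation (our own sketch, not from the paper): for P_0({j}) ∝ j^{−2.5} on Z_+, h(t) is regularly varying with exponent η = 1/2.5 = 0.4 < 1/2, so n(π)/√n should decay, roughly like n^{−0.1}.

```python
import numpy as np

rng = np.random.default_rng(2)

def mean_ratio(n, reps=20, alpha=2.5):
    """Average of n(pi)/sqrt(n) over reps samples of size n drawn from the
    zeta law P_0({j}) proportional to j^(-alpha) (numpy's zipf sampler)."""
    return float(np.mean([len(np.unique(rng.zipf(alpha, size=n))) / np.sqrt(n)
                          for _ in range(reps)]))

r_small, r_large = mean_ratio(10**3), mean_ratio(10**5)
```

Here η = 0.4 falls under case (ii) and the ratio indeed shrinks as n grows; an exponent above 1/2 would make it grow instead.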
As an application of the Bernstein-von Mises results in Theorem 13, we may construct Bayesian credible sets for P f as n → ∞. The choice of f determines the parameter for which the credible sets are constructed; for example, if f(x) = x, the credible interval is for the mean. Since posterior consistency does not hold when P_0 is continuous, the credible sets for P f are not correct in that case, and thus we only give the credible sets for P f when P_0 is discrete.
Corollary 15. If P_0 is discrete, then under the conditions in Theorem 13, the probability that P_0 f belongs to

( L_{n,α} + σ n(π)/n (Hf − P̄_n f), L_{n,β} )

converges to β − α. Here L_{n,α} is the α-quantile of the posterior distribution of P f | X and β > α.
One direct interpretation of the above corollary is that one may want n(π)/√n → 0 in probability to make the "bias" term vanish, so that the confidence interval for P_0 f takes the regular form (L_{n,α}, L_{n,β}). This is true under case (i) of Corollary 14, or when f(x) = x with α > 4. Otherwise, the correction σ n(π)/n (Hf − P̄_n f) is necessary as a bias correction to the credible interval. We provide a numerical illustration corresponding to this scenario in Section 4.1.
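The correction itself is simple arithmetic on the posterior quantiles. The sketch below uses entirely hypothetical inputs and names of our own; the convention of shifting the left endpoint follows Corollary 15 as stated above.

```python
import numpy as np

def corrected_interval(post_draws, sigma, n_pi, n, Hf, Pbar_f, a=0.025, b=0.975):
    """Bias-corrected credible interval sketch: shift the lower posterior
    quantile of P f by sigma * n(pi)/n * (Hf - Pbar_f), per Corollary 15."""
    lo, hi = np.quantile(post_draws, [a, b])
    return lo + sigma * (n_pi / n) * (Hf - Pbar_f), hi

# hypothetical posterior draws of P f and hand-picked inputs, for illustration
rng = np.random.default_rng(5)
draws = rng.normal(loc=0.30, scale=0.02, size=5000)
lo, hi = corrected_interval(draws, sigma=0.5, n_pi=40, n=1000, Hf=0.5, Pbar_f=0.3)
```

With n(π)/n small, the shift is a minor adjustment of the left endpoint; it matters precisely when n(π)/√n does not vanish.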
However, P_0 is of course unknown in real applications, and we shall consider Theorem 13 without information from P_0. In this case, one needs to pay special attention to the parameter σ: it is easy to see from both Theorem 5 and Theorem 13 that if σ → 0, then P is posterior consistent and the Bernstein-von Mises results hold without the bias terms for any P_0, but this corresponds to the case that P becomes the Dirichlet process. Thus, one should at least expect the parameter σ to be small. Usually, the model parameters are chosen by the empirical Bayes method, estimating them by maximum likelihood conditional on the observations X. A well-known conclusion (Pitman, 2003, 2006) in the Bayesian nonparametric framework is that an observation X from an NRMI induces a random partition structure on {1, ..., n}, as introduced in Section 2.2. The random partition structure is characterized by the exchangeable partition probability function (EPPF) (Pitman, 2003), which also plays the role of the likelihood function of σ, as explained in e.g., (Favaro and Naulet, 2021; Ghosal and Van der Vaart, 2017; Franssen and van der Vaart, 2022); the EPPF of the NGGP is given explicitly in (Favaro and Naulet, 2021). From Theorem 1 in (Favaro and Naulet, 2021), the maximum likelihood estimator σ̂_n exists uniquely. Furthermore, Theorem 2 in (Favaro and Naulet, 2021) implies that σ̂_n → σ_0 in probability at rate log(n) n^{−σ_0/2} when P_0 is discrete with atoms x such that h(t) = #{x : P_0({x}) ≥ 1/t} is a regularly varying function of exponent σ_0 ∈ [0, 1).
Theorem 16. Under the assumptions in Theorem 13, we have the following results.
(i) If σ̂_n is an estimator based on X that converges to σ_0 in probability, then the convergences in Theorem 13 hold in probability after replacing σ by σ̂_n on the left-hand side and by σ_0 in the limiting processes. In particular, this is true for the maximum likelihood estimator σ̂_n if P_0 is discrete with atoms x satisfying the condition that h(t) = #{x : P_0({x}) ≥ 1/t} is a regularly varying function of exponent σ_0 ∈ [0, 1).
(ii) If σ ∼ L_σ, where L_σ is a probability law on [0, 1] that serves as the prior distribution of σ, then the Bayesian model becomes hierarchical. The convergences in Theorem 13 hold after replacing σ by the posterior random variable σ | X on the left-hand side and by σ_0 in the limiting processes.
The proof of the above theorem follows the same constructions as the proof in Section 4.2 of (Franssen and van der Vaart, 2022). For the posterior consistency of σ, we refer to the details with proofs in Section 4.3 of (Franssen and van der Vaart, 2022). The maximum likelihood estimator is not very interesting, as σ̂_n → σ_0 with σ_0 = 1 when P_0 is continuous and σ_0 = 0 when P_0 is discrete (Favaro and Naulet, 2021).
Besides the parameter σ, the parameters a and θ do not appear in the asymptotic results of Theorem 5 and Theorem 13; thus estimators of a and θ, based on prior distributions or maximum likelihood, do not affect the convergences as long as a ≪ √n and θ ≪ n^σ. The cases when â_n and θ̂_n diverge as n → ∞ are unusual and beyond the scope of this work, and can be considered in future works.

Numerical illustration
We present the credible intervals for P_0 f when P_0 is discrete with different growth rates of the number of atoms. To be more precise, let P_0 f = P_0([2, ∞)) for P_0 = P_1, P_2, P_3, P_4, where P_1, P_2, P_3, P_4 are probability distributions on Z_+ described as follows.
Obviously, n(π) ≤ 5 for P_1. From the result (see e.g., Example 4) in (Karlin, 1967), the regularly varying functions h(t) corresponding to P_2, P_3, P_4 have exponents smaller than, equal to, and greater than 1/2, respectively; and as n → ∞, the numbers of distinct values n(π) for P_2, P_3, P_4 grow at the corresponding rates by Theorem 1 in (Karlin, 1967). Thus, the "bias" terms for P_1, P_2, P_3, P_4 go to 0, 0, some constant, and ∞, respectively. For the NGGP, we let P ∼ NGGP(1, σ = 0.5, 1, H), where H is the standard normal distribution. We simulate P through its stick-breaking representation with the generating algorithm in (Favaro et al., 2016). To make the simulation of P = Σ_{i=1}^∞ w_i δ_{X_i} accurate, we truncate the infinite sum at some N such that the tail weight Σ_{i=N}^∞ w_i < 1/√n, where n is the sample size. We simulate 10000 replications of the sample X from each of P_1, P_2, P_3, P_4 with sample sizes n = 10, 100, 1000, 10000, 100000. For each sample from P_1, we construct one 95% credible interval for P_1([2, ∞)) with the "bias" correction as in Corollary 15 and compute the proportion of the 10000 replications whose intervals cover the true value P_1([2, ∞)); we also compute the same proportion without the "bias" correction. The results for P_1, P_2, P_3, P_4 are given in Tables 1 and 2.
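The truncation rule described above can be sketched as follows (our own code; for simplicity we use Dirichlet-process sticks v_i ~ Beta(1, a), the σ → 0 case, rather than the NGGP algorithm of Favaro et al. (2016)):

```python
import numpy as np

rng = np.random.default_rng(3)

def truncated_stick_breaking(a, n, max_atoms=10**6):
    """Keep stick-breaking weights w_i = v_i * prod_{j<i} (1 - v_j) until the
    remaining tail mass drops below 1/sqrt(n), the paper's truncation level."""
    weights, tail = [], 1.0
    while tail >= 1.0 / np.sqrt(n) and len(weights) < max_atoms:
        v = rng.beta(1.0, a)
        weights.append(tail * v)
        tail *= 1.0 - v
    return np.array(weights), tail

w, tail = truncated_stick_breaking(a=1.0, n=10000)
```

The kept weights plus the discarded tail always sum to one, so the truncation error of any functional P f with bounded f is at most the tail mass, i.e. below 1/√n.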
Since the "bias" terms for P_1 and P_2 vanish as n → ∞, the proportions of coverage of the true value are large both with and without the "bias" correction. In contrast, the 95% credible intervals for P_3 f and P_4 f do not perform well without the "bias" correction.

Discussion
[Figure: the marginal densities for P_1([2, ∞)) with sample sizes n = 10, 100, 1000, 10000, 100000, shown in order from top left to bottom right.]

To the best of our knowledge, the Lévy intensities of the well-studied NRMIs to date are given in the form of the gamma density s^{−σ−1} e^{−βs}. It turns out that with the shape parameter σ = 0, posterior consistency is always guaranteed for any "true" distribution P_0. Otherwise, posterior consistency holds only for discrete P_0 but not for continuous P_0. Such a phenomenon naturally makes sense, due to the discreteness of NRMIs (completely random measures (Kingman, 1975)). As explained in the Bayesian literature, if P_0 is diffuse and the prior guess for the sample distribution satisfies α ≠ P_0, then the prior guess will always contribute to the posterior, no matter how large the sample size is. In this sense, Bayesian nonparametric models never behave "better" than empirical models asymptotically. However, this does not mean that NRMIs are not useful. On the one hand, we can never know the "true" distribution of a given sample of any size n, and the sample size n will never be ∞, so a prior guess of the random probability measure based on experience can make the model suitable. On the other hand, NRMIs behave very well for data from discrete distributions. Furthermore, mixture and hierarchical Bayesian nonparametric models based on NRMIs have shown great success in applications and in their consistency behaviour (Lijoi et al., 2005). Moreover, the class of NRMIs is much larger than one might expect, so further study is needed to develop more flexible subclasses of NRMIs, or more general NRMI-like classes, that satisfy the consistency property. The results in this work provide a guideline for choosing a proper intensity ρ(s|x); for example, the generalized Dirichlet process and the generalized extended gamma NRMI are good choices in Bayesian nonparametric applications, and both show some flexibility. Besides, we may let σ → 0 by assigning randomness to σ, or one may construct α to depend on ρ(ds|x) so as to deduce C1.
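For the σ = 0 case (the Dirichlet process), the vanishing influence of the prior guess can be made concrete through the well-known conjugacy of the Dirichlet process: the posterior mean of P(A) is a convex combination of the prior guess H(A) and the empirical frequency, with prior weight a/(a + n). A minimal numerical sketch (the choice P_0 = Exp(1), the set A = [2, ∞), and all parameter values are illustrative assumptions, not taken from the paper's experiments):

```python
import numpy as np

def dp_posterior_mean_weight(a, n):
    """Weight of the prior guess H in the Dirichlet-process posterior mean
    E[P(A)|X] = (a/(a+n)) H(A) + (n/(a+n)) P_n(A)."""
    return a / (a + n)

def dp_posterior_mean(a, H_A, data, A_indicator):
    """Posterior mean of P(A) under a DP(a, H) prior given the data."""
    n = len(data)
    Pn_A = np.mean([A_indicator(x) for x in data])  # empirical frequency of A
    w = dp_posterior_mean_weight(a, n)
    return w * H_A + (1 - w) * Pn_A

rng = np.random.default_rng(0)
a, H_A = 5.0, 0.3                         # total mass and prior guess H(A); illustrative
data = rng.exponential(1.0, size=10000)   # "true" distribution P_0 = Exp(1)
A = lambda x: x >= 2.0                    # the set A = [2, inf)
est = dp_posterior_mean(a, H_A, data, A)
# As n grows, est approaches P_0(A) = exp(-2) regardless of H(A).
```

The prior weight a/(a + n) vanishes as n grows, so the posterior mean tracks the empirical frequency; this is the sense in which the σ = 0 case is consistent for any P_0, discrete or diffuse.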
Due to the complexity of the posterior of NRMIs, it is not easy to present a Bernstein-von Mises type result giving the limiting process of the posterior of general NRMIs. The result for the normalized generalized gamma process, along with the works in (Lo, 1983, 1986; Ray and van der Vaart, 2021; Hu and Zhang, 2022; James, 2008; Franssen and van der Vaart, 2022), sheds some light on discovering the Bernstein-von Mises theorem for general NRMIs.

Supplementary Materials
In this section, we prove Lemma 3, Theorem 5 and Theorem 13.

Proof of Lemma 3
Let I = E[P(A)^m | X]. Then, by Theorem 2, I can be computed as follows.
For any family of pairwise disjoint sets {A_1, ..., A_q} in X and any positive integers {m_1, ..., m_q}, we denote A_{q+1} = (∪_{i=1}^q A_i)^c, m_{q+1} = 0, and m = Σ_{i=1}^q m_i. For any sample, let {Y_j}_{j=1}^{n(π)} be the distinct values of {X_i}_{i=1}^n. Let λ_i = {j : Y_j ∈ A_i} be the set of indices of the Y_j's that lie in A_i, and denote by #(λ_i) the number of elements of λ_i. We can then compute the following moments easily.
A computation similar to that for I yields part (ii) of the theorem. The proof of Lemma 3 is then complete.

Proof of Theorem 5
We need the following lemma to prove Theorem 5.
Lemma 17. Under Assumption 4, we have, for any y ∈ X and k ∈ Z_+, Proof. Let g_n(u) be a constant multiple of the density f_{U_n|X}(u|X) given by (2.10). Namely, The derivative of g_n(u) is computed as follows. By Assumption 4, u τ_2(u, y)/τ_1(u, y) ≤ 1. This means h_n(u) ≥ 0, and hence h_n(u) is nondecreasing in u. Similarly, it follows from Assumption 4 that u τ_{n_j+1}(u, Y_j)/τ_{n_j}(u, Y_j) is also nondecreasing in u for all n_j. Thus g_n is bounded and attains its maximum at some point u²_{n,n(π)}. Note that g_n is also a continuous function and is therefore bounded on bounded intervals. We claim that u²_{n,n(π)} → ∞ as n → ∞. Indeed, by Assumption 4, u τ_{k+1}(u, y)/τ_k(u, y) admits a lower bound, for all k ∈ Z_+ and y ∈ X, in terms of some function φ(u) ∈ (0, 1) which is nondecreasing in u with lim_{u→∞} φ(u) = 1. Assuming the contrary then leads to a contradiction. Denote τ̃_k(u, y) = u τ_{k+1}(u, y)/τ_k(u, y).
Let u_{n,n(π)} be the positive square root of u²_{n,n(π)}; then u_{n,n(π)} → ∞ as n → ∞. We then have the following inequalities. (6.7) The last limit in (6.7) is due to the following form, the fact that the limit as n → ∞ equals 1, and lim_{n→∞} τ̃_k(u_{n,n(π)}, y) = lim_{u→∞} τ̃_k(u, y) = k − C_k(y) by Assumption 4. This completes the proof of the lemma. Now we are ready to give the proof of Theorem 5. To emphasise the finiteness of α, we write α = aH, where a = α(X) is finite and H is a probability measure.
We follow an idea similar to that in (Freedman and Diaconis, 1983) and define a class of semi-norms on M_X such that convergence under these norms implies weak convergence. Let A = {A_i}_{i=1}^∞ be a measurable partition of X. The semi-norm between two probability measures P_1 and P_2 in M_X with respect to the partition A is defined by (6.8). To show that the posterior distribution of an NRMI concentrates around its posterior mean, we have the following lemma.
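For intuition about such partition-based semi-norms, the following sketch evaluates a weighted distance over a finite partition. The specific weighting 2^{−i} is a hypothetical choice for illustration; the paper's definition (6.8) appears in the omitted display and may differ:

```python
import numpy as np

def partition_seminorm(p1, p2, weights=None):
    """Semi-norm between two probability vectors over a measurable
    partition A = {A_1, ..., A_q}: a weighted sum of the coordinate
    differences |P1(A_i) - P2(A_i)|.  The geometric weights 2^{-i}
    are an illustrative assumption, not the paper's exact (6.8)."""
    p1, p2 = np.asarray(p1, dtype=float), np.asarray(p2, dtype=float)
    if weights is None:
        weights = 2.0 ** -np.arange(1, len(p1) + 1)
    return float(np.sum(weights * np.abs(p1 - p2)))

# Two probability measures on a 4-set partition.
P1 = [0.4, 0.3, 0.2, 0.1]
P2 = [0.25, 0.25, 0.25, 0.25]
d = partition_seminorm(P1, P2)  # 0.5*0.15 + 0.25*0.05 + 0.125*0.05 + 0.0625*0.15
```

Because the weights are summable, convergence of every coordinate P_n(A_i) → P(A_i) drives the whole semi-norm to zero, which is the mechanism behind "convergence under such norms implies weak convergence."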
Lemma 18. For any given measurable partition A, a.s.-P_0^∞ as n → ∞. Proof. To prove this claim, we evaluate the first and second posterior moments of P for any A ∈ X. For the first moment we have For the second moment we have Then we can write where the terms J_1, J_2, J_3, J_4 are defined as follows. and We first consider the terms J_2, J_3, J_4, and then J_1. Before dealing with them, we need some preliminary preparations. By the identity E[P(X)|X] = 1 we have By Lemma 17, we have the approximation as n is large. On the other hand, let u_{n,n(π)} be the maximal point of g_n(u) as in Lemma 17.
Under Assumption 4, we know that u τ_1(u, x) is nondecreasing in u for all x. We have which goes to 0 as n → ∞ by the monotone convergence theorem, since τ_1(u, x) decreases to 0 in u for all x. Combining the above computation with the approximation (6.15), we have Step 1: Evaluation of J_2. Notice first that for any A_i and Y_j, by Assumption 4, we have On the other hand, u f_{U_n|X}(u) ∫_X τ_1(u, x) H(dx) du. (6.20) By the inequalities (6.19) and (6.20) and the approximations (6.15) and (6.18), we see that, as n becomes large, Thus, for large n, we have This, combined with (6.15), yields Step 2: Evaluation of J_3. For J_3, notice that under Assumption 4, is nondecreasing in u and is bounded by (n_j + 1 − C_{n_j+1}(Y_j))(n_j − C_{n_j}(Y_j)). Using an approach similar to that in Lemma 17, we have, as n is large, Combining this with Lemma 17, we have, as n becomes large, (6.23) which has order at most O(1/n).
Step 3: Evaluation of J_4. For J_4, under Assumption 4, we have Using an argument similar to that in Lemma 17 leads to Thus which has order at most O(1/n).
Step 4: Evaluation of J_1. Finally, we deal with the term J_1. Notice that E[P(X)²|X] = 1. Using the computations we obtained for J_2, J_3, J_4, we have This implies (6.24). Combining the approximations (6.15) and (6.24), we have We now treat the last two summation terms above. First, we have It is also easy to see that Thus J_1 has order O(1/n). Summarizing the above four steps for evaluating J_1, J_2, J_3, J_4, we have Now we can complete the proof of Theorem 5.
(i) The random variable U_n has density (iv) The random elements P_{U_n} and J_j, j ∈ {1, ..., n(π)}, are independent.
(v) T^{(U_n)} = μ^{(U_n)}(X) and κ_n = T^{(U_n)}/(T^{(U_n)} + Σ_{j=1}^{n(π)} J_j). Proof. The lemma is an immediate consequence of Theorem 2 and the NGGP intensity given in Example 10, except for (iii), where we have a more specific form for D_n. To verify (iii), we let D_{n,j} := By Proposition G.2 in (Ghosal and Van der Vaart, 2017), we have which is independent of U_n, and thus independent of κ_n and P_{U_n}. To understand the independence, one can use the relationship between the Dirichlet distribution and the gamma distribution from Proposition G.2 in (Ghosal and Van der Vaart, 2017).
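The gamma–Dirichlet relationship invoked above (Proposition G.2 in Ghosal and Van der Vaart, 2017) states that if G_1, ..., G_q are independent Gamma random variables with a common rate, the normalized vector (G_1/S, ..., G_q/S) is Dirichlet distributed and independent of the total S = Σ_i G_i. This independence, which underlies the independence of D_n from U_n, can be checked numerically (shape parameters and sample size below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(42)
shapes = np.array([2.0, 3.0, 1.5])  # illustrative Gamma shape parameters
n = 200_000

# Independent Gamma draws with a common rate; columns are G_1, G_2, G_3.
G = rng.gamma(shape=shapes, size=(n, len(shapes)))
S = G.sum(axis=1)             # the total mass
D = G / S[:, None]            # normalized (Dirichlet-distributed) vector

# Independence of D and S: the sample correlation between S and each D_i
# should be numerically zero, even though each D_i is far from constant.
corrs = [np.corrcoef(S, D[:, i])[0, 1] for i in range(len(shapes))]
```

The vanishing correlations are only a necessary condition for independence, but they illustrate the exact distributional fact used in the proof: normalization removes all information about the total mass.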
The convergences (4.2) and (4.3) in Theorem 13 are equivalent, as are the convergences (4.4) and (4.5). These equivalences can be shown by the following lemma. To make the results more general, we assume in the following proofs that {σ_n} is a sequence such that lim_{n→∞} σ_n = σ ∈ [0, 1). It is worth pointing out that we always assume σ_n < 1 and σ < 1 so that all quantities in this work are well defined; more precisely, this assumption guarantees that ∫_0^∞ s^{n_j − σ_n − 1} e^{−(u+θ)s} ds < ∞ and ∫_0^∞ s^{n_j − σ − 1} e^{−(u+θ)s} ds < ∞ for any integer n_j ≥ 1. Lemma 20. For any P_0, we have Proof. Since the convergence of σ_n to σ does not affect the proof, and σ_n is well defined as discussed above, we fix σ_n and write σ for notational simplicity throughout the proof.
By applying the NGGP intensity (3.5) to Lemma 3, we have To evaluate lim_{n→∞} E[P|X], we need to find the limits of (6.33) and (6.34). We first find the limit of (6.33) and then that of (6.34). For (6.33), by the density of U_n, we have By arguments similar to those in Lemma 17, we use the Laplace method to find the limits of the numerator and denominator of (6.35). Let Thus, As n → ∞, by arguments similar to those in Lemma 17, g_1(u) and g_2(u) attain their maxima at points u_{1,n}, u_{2,n} that both tend to infinity. Thus, (6.36) Recall from Proposition 6 that when P_0 is discrete, lim_{n→∞} n(π)/n = 0 almost surely. The limit in (6.36) becomes where the exponential part in the last equation converges to 0 by the fact that where we emphasise that 1/σ > 1 when dealing with the convergence of the exponential part. Using the same arguments as for the limit of (6.33), we can find the limit of (6.34); we omit the details of the computation and obtain the following results. When where the last equation is due to lim_{n→∞} n(π)/n = 0 and the Borel–Cantelli lemma. That is, the result in (6.30) follows by combining the limits of (6.33) and (6.34) when P_0 is discrete.
Thus, combining the limits of (6.33) and (6.34), we have This completes the proof of the result in (6.31).
With Lemma 20, it suffices for the proof of Theorem 13 to show only the convergences (4.2) and (4.4). The following lemma plays an important role in the proof of Theorem 13. Here we recall that an envelope function of F is a measurable function f_e : X → R such that |f| ≤ f_e for any f ∈ F.
Lemma 21. Let F be a finite set of H-square-integrable functions. Assume that n(π) → ∞ as n → ∞, which includes both the case when P_0 is continuous, so that n(π) = n, and the case when P_0 is discrete but n(π) diverges at a slower rate than n. Then, a.s., (6.37) holds in R^F. The convergence holds a.s. in ℓ^∞(F) with an envelope function f_e such that H(f_e²) < ∞, and thus the central limit theorem holds for P_{U_n}|X in ℓ^∞(F).
Proof. The proof relies on the stick-breaking representation of P_{U_n} in (Favaro et al., 2016) and the functional central limit theorem for the NGGP in (Hu and Zhang, 2022). As in the proof of the last lemma, we write σ instead of σ_n for ease of reading. By Section 4.2 in (Favaro et al., 2016), P_{U_n} admits a stick-breaking representation with dependent stick-breaking weights {v_i}_{i=1}^∞, whose joint distribution is given in (Hu and Zhang, 2022) as (6.38), where β_n = a(u + θ)^σ/σ. We follow the same idea as in the proofs of Proposition 3.4 and Theorem 4.4 in (Hu and Zhang, 2022). To obtain an analogue of Proposition 3.4 in (Hu and Zhang, 2022), we consider the asymptotic behaviour of the following quantity as n → ∞.
(1 − v_i)^σ dt f_{U_n}(u) du, (6.39) where p is any positive integer. To evaluate (6.39) as n → ∞, we need a further analysis of the integral with respect to u, which is the only term that depends on n. Consider the following integral for any b > 0 and any positive integer k. This implies (1 − v_i)^σ dt f_{U_n}(u) du, (6.41) in which we choose M = M_n → ∞ as n → ∞. In this case, when n → ∞, β_n → ∞ as well, and we may safely use the results of Proposition 3.4 in (Hu and Zhang, 2022) to obtain that, when n → ∞ (and thus n(π) → ∞), where the last equation can be computed by the same argument as in (6.35) and the computations thereafter. The result in (6.37) follows immediately by applying Theorem 4.4 in (Hu and Zhang, 2022).
From the above lemma and its proof, it is interesting to see that as n → ∞ we have P_{n(π)} =_d P_{U_n}, where P_{n(π)} ∼ NGGP(n(π), σ, θ, H) for any n(π) → ∞. Thus we may replace P_{U_n} by P_{n(π)} in the proof of Theorem 13; the benefit of this replacement is that P_{n(π)} is independent of κ_n as n → ∞.
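The stick-breaking weights in (6.38) are dependent in general; in the Dirichlet-process limit σ → 0 they reduce to the classical Sethuraman construction with independent sticks v_i ~ Beta(1, a). A minimal sampler for that special case (the truncation level and total mass a are illustrative; the dependent NGGP weights of (Favaro et al., 2016) are not reproduced here):

```python
import numpy as np

def stick_breaking_weights(a, k, rng):
    """First k stick-breaking weights of a DP(a, H): the sigma -> 0
    special case, with independent sticks v_i ~ Beta(1, a).
    Weight i is w_i = v_i * prod_{j<i} (1 - v_j)."""
    v = rng.beta(1.0, a, size=k)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))
    return v * remaining

rng = np.random.default_rng(1)
w = stick_breaking_weights(a=2.0, k=500, rng=rng)
total = w.sum()   # close to 1 for a deep enough truncation
```

Pairing these weights with i.i.d. atoms from H yields a draw of the random probability measure itself, which is the representation the proof of Lemma 21 exploits (with dependent sticks) for the NGGP.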
The next lemma provides the convergence of κ n .
Proof. We compute the moments of κ_n = T^{(U_n)}/(T^{(U_n)} + Σ_{j=1}^{n(π)} J_j) by the same method used in the proof of Lemma 3. For clarity, we present the details for E[κ_n] as follows.

Table 1: Proportion of coverage of the true value by the 95% credible interval without the "bias" correction.

Table 2: Proportion of coverage of the true value by the 95% credible interval with the "bias" correction.
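The coverage experiments summarized in Tables 1 and 2 can be mimicked in the Dirichlet-process special case, where the marginal posterior of P(A) is an explicit Beta distribution (Ferguson, 1973). The sketch below checks the frequentist coverage of the equal-tailed 95% credible interval; all parameter values are illustrative assumptions, and the NGGP bias correction studied in the paper is not implemented here:

```python
import numpy as np

def dp_credible_interval(a, H_A, n_A, n, rng, level=0.95, draws=20_000):
    """Equal-tailed credible interval for P(A) under a DP(a, H) prior:
    P(A) | X ~ Beta(a*H(A) + n_A, a*(1 - H(A)) + n - n_A).
    Quantiles are obtained by Monte Carlo to keep the sketch
    dependency-free."""
    a1 = a * H_A + n_A
    a2 = a * (1 - H_A) + n - n_A
    samp = rng.beta(a1, a2, size=draws)
    lo = (1 - level) / 2
    return np.quantile(samp, [lo, 1 - lo])

rng = np.random.default_rng(7)
p_true, n, reps = 0.3, 500, 1000     # discrete P_0 with P_0(A) = 0.3
hits = 0
for _ in range(reps):
    n_A = rng.binomial(n, p_true)    # count of observations falling in A
    lo_, hi_ = dp_credible_interval(a=1.0, H_A=0.5, n_A=n_A, n=n, rng=rng)
    hits += (lo_ <= p_true <= hi_)
coverage = hits / reps               # should sit near the nominal 0.95
```

In this conjugate special case no correction is needed and the empirical coverage stays near the nominal level; the paper's point is that for the NGGP with discrete P_0 the left endpoint must be shifted by a bias term to restore this behaviour.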