On Dependent Dirichlet Processes for General Polish Spaces

We study Dirichlet process-based models for sets of predictor-dependent probability distributions, where the domain and predictor space are general Polish spaces. We generalize the definition of dependent Dirichlet processes, originally constructed on Euclidean spaces, to more general Polish spaces. We provide sufficient conditions under which dependent Dirichlet processes have appealing properties regarding continuity (weak and strong), association structure, and support (under different topologies). We also provide sufficient conditions under which mixture models induced by dependent Dirichlet processes have appealing properties regarding strong continuity, association structure, support, and weak consistency under i.i.d. sampling of both responses and predictors. The results can be easily extended to more general dependent stick-breaking processes.


Introduction
This paper focuses on the properties of Bayesian nonparametric (BNP) prior distributions for sets of predictor-dependent probability measures, $F = \{F_x : x \in \mathcal{X}\}$, where the $F_x$'s are probability measures defined on a common measurable Polish space $(\mathcal{Y}, \mathcal{B}(\mathcal{Y}))$, with $\mathcal{B}(\mathcal{Y})$ the Borel σ-field of $\mathcal{Y}$, indexed by a vector of exogenous predictors $x \in \mathcal{X}$, where $\mathcal{X}$ is also a Polish space. To date, most BNP priors that account for the dependence of the set of probability measures $F$ on predictors are generalizations of the Dirichlet process (DP) [Ferguson, 1973, 1974] and Dirichlet process mixture (DPM) models [Lo, 1984]. Let $\mathcal{D}(\mathcal{Y})$ be the space of all probability measures, with density w.r.t. Lebesgue measure, defined on $(\mathcal{Y}, \mathcal{B}(\mathcal{Y}))$. A DPM model is a stochastic process, $F$, defined on an appropriate probability space $(\Omega, \mathcal{F}, \mathbb{P})$, such that for almost every $\omega \in \Omega$, the density function of $F$ is given by
$$f(y \mid G(\omega)) = \int_{\Theta} \psi(y, \theta)\, G(\omega)(d\theta), \qquad y \in \mathcal{Y}, \tag{1}$$
where $\psi(\cdot, \theta)$ is a continuous density function on $(\mathcal{Y}, \mathcal{B}(\mathcal{Y}))$ for every $\theta \in \Theta$, and $G$ is a DP whose sample paths are probability measures defined on $(\Theta, \mathcal{B}(\Theta))$, with $\mathcal{B}(\Theta)$ being the corresponding Borel σ-field. If $G$ is a DP with parameters $(M, G_0)$, where $M \in \mathbb{R}_0^+$ and $G_0$ is a probability measure on $(\Theta, \mathcal{B}(\Theta))$, written as $G \mid M, G_0 \sim DP(M G_0)$, then the trajectories of the process can be a.s. represented by the following stick-breaking representation [Sethuraman, 1994]:
$$G(B) = \sum_{i=1}^{\infty} \pi_i\, \delta_{\theta_i}(B), \qquad B \in \mathcal{B}(\Theta),$$
where $\delta_{\theta}(\cdot)$ is the Dirac measure at $\theta$, $\pi_i = V_i \prod_{j<i} (1 - V_j)$, the $V_i$ are i.i.d. Beta$(1, M)$, and the $\theta_i$ are i.i.d. $G_0$, independently of the $V_i$. Discussion of properties and applications of the DP can be found, for instance, in Müller et al. [2015].
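The stick-breaking representation can be illustrated with a short simulation. The following is only a sketch under stated assumptions: the infinite sum is truncated at a finite number of atoms (the last stick is set to one so that the weights sum to one), the base measure $G_0$ is taken to be a standard normal purely for illustration, and the function names are hypothetical.

```python
import random

def stick_breaking_weights(sticks):
    """Map stick proportions V_1, V_2, ... to weights pi_i = V_i * prod_{j<i} (1 - V_j)."""
    weights, remaining = [], 1.0
    for v in sticks:
        weights.append(v * remaining)
        remaining *= 1.0 - v
    return weights

def sample_truncated_dp(mass, base_sampler, n_atoms, rng):
    """Draw (weights, atoms) from a DP(mass * G0) truncated to n_atoms atoms.

    Assumption: mass > 0, and truncation replaces the infinite sum.
    """
    sticks = [rng.betavariate(1.0, mass) for _ in range(n_atoms - 1)]
    sticks.append(1.0)  # truncation: the last stick takes all the remaining mass
    atoms = [base_sampler(rng) for _ in range(n_atoms)]
    return stick_breaking_weights(sticks), atoms

rng = random.Random(0)
# Illustrative choice: G0 = N(0, 1) and M = 2.
weights, atoms = sample_truncated_dp(2.0, lambda r: r.gauss(0.0, 1.0), 50, rng)
```

The pair `(weights, atoms)` encodes one approximate sample path $\sum_i \pi_i \delta_{\theta_i}$; larger values of `mass` spread the total mass over more atoms.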
Most of the BNP extensions incorporate dependence on predictors via the mixing distribution in (1), by replacing $G$ with $G_x$, so that the prior specification problem reduces to modeling the collection of predictor-dependent mixing probability measures $\{G_x : x \in \mathcal{X}\}$ [Quintana et al., 2022]. Some of the earliest developments on predictor-dependent DP models appeared in Cifarelli and Regazzini [1978], who defined dependence across related random measures by introducing a regression for the baseline measure of marginally DP random measures. A more flexible construction, called the dependent Dirichlet process (DDP), was proposed by MacEachern [1999]. The key idea behind the DDP is to create a set of marginally DP random measures and to introduce dependence by modifying the stick-breaking representation of each element in the set. Specifically, MacEachern [1999] generalized the stick-breaking representation by assuming $G_x(B) = \sum_{i=1}^{\infty} \pi_i(x)\, \delta_{\theta_i(x)}(B)$, $B \in \mathcal{B}(\Theta)$, where the point masses $\theta_i(x)$, $i \in \mathbb{N}$, are independent stochastic processes with index set $\mathcal{X}$, and the weights take the form $\pi_i(x) = V_i(x) \prod_{j<i} [1 - V_j(x)]$, with $V_i(x)$, $i \in \mathbb{N}$, being independent stochastic processes with index set $\mathcal{X}$ and Beta$(1, M)$ marginal distribution. We refer the reader to Barrientos et al. [2012] for a formal definition of the DDP.
Other extensions of the DP for dealing with related probability distributions include the DPM mixture of normals model for the joint distribution of the response and predictors [Müller et al., 1996], the hierarchical mixture of DPM [Müller et al., 2004], the predictor-dependent weighted mixture of DP [Dunson et al., 2007], the kernel stick-breaking process [Dunson and Park, 2008], the probit stick-breaking processes [Chung and Dunson, 2009, Rodriguez and Dunson, 2011], the cluster-X model [Müller and Quintana, 2010], the PPMx model [Müller et al., 2011], and the general class of stick-breaking processes [Barrientos et al., 2012], among many others. Dependent neutral to the right processes and correlated two-parameter Poisson-Dirichlet processes have been proposed by Epifani and Lijoi [2010] and Leisen and Lijoi [2011], respectively, by considering suitable Lévy copulas. The general class of dependent normalized completely random measures has been discussed, for instance, by Lijoi et al. [2014]. Based on a different formulation of the conditional density estimation problem, Tokdar et al. [2010] and Jara and Hanson [2011] proposed alternatives to convolutions of dependent stick-breaking approaches.
All of the dependent BNP approaches described previously have focused on responses and parameters defined on Euclidean spaces, and are not appropriate for spaces in which the Euclidean geometry is not valid. A relevant example of this situation arises in statistical shape analysis, where one of the main spaces of interest is Kendall's shape space [Kendall, 1977], which can be viewed as the quotient of a Riemannian manifold. Kendall's space is a natural underlying space for applications in different areas, including morphometry [Claude, 2008], meteorology [Mardia and Jupp, 2000], archeology [Dryden and Mardia, 2016] and genetics [Billera et al., 2001]. In these contexts, employing standard statistical procedures that ignore the geometrical properties of the underlying spaces can lead to incorrect inferences, which explains the increasing interest in the development of statistical models for more general Polish spaces.
To date, the development of statistical procedures for non-Euclidean spaces has focused on the problem of mean estimation [see, e.g., Bhattacharya and Patrangenaru, 2002, 2003, 2005], density estimation [see, e.g., Pelletier, 2005, Bhattacharya and Dunson, 2010, 2012a], and the regression problem for Euclidean responses based on non-Euclidean predictors [see, e.g., Pelletier, 2006, Bhattacharya and Dunson, 2012b]. Bhattacharya and Patrangenaru [2002, 2003, 2005] studied the problem of nonparametric estimation of a location parameter on a Riemannian manifold, by means of
Dependent Dirichlet processes (DDP) are a class of $\mathcal{P}(\Theta)$-valued stochastic processes on $\mathcal{X}$ defined on $(\Omega, \mathcal{F}, \mathbb{P})$. If $\{G_x : x \in \mathcal{X}\}$ is a DDP, then
$$G_x = \sum_{i \in \mathbb{N}} \pi_{i,x}\, \delta_{\theta_{i,x}} \quad \text{a.s.}$$
for a sequence $\{\{\pi_{i,x} : x \in \mathcal{X}\}\}_{i \in \mathbb{N}}$ of processes such that $\pi_{i,x} \geq 0$ for every $i \in \mathbb{N}$ and $x \in \mathcal{X}$ and $\sum_{i \in \mathbb{N}} \pi_{i,x} \equiv 1$ a.s., and a sequence $\{\{\theta_{i,x} : x \in \mathcal{X}\}\}_{i \in \mathbb{N}}$ of i.i.d. $\Theta$-valued processes. The distinctive feature of the DDP is that the processes $\{\{\pi_{i,x} : x \in \mathcal{X}\}\}_{i \in \mathbb{N}}$ are defined in terms of a stick-breaking process. Let $\{\{V_{i,x} : x \in \mathcal{X}\}\}_{i \in \mathbb{N}}$ be a sequence of i.i.d. processes with $V_{i,x} \sim \text{BETA}(1, \alpha_x)$ for some $\alpha_x \in \mathbb{R}^+$ and any $x \in \mathcal{X}$. Then, the stick-breaking process associated to $\{\{V_{i,x} : x \in \mathcal{X}\}\}_{i \in \mathbb{N}}$ is
$$\pi_{i,x} = V_{i,x} \prod_{j<i} (1 - V_{j,x}), \qquad i \in \mathbb{N},\ x \in \mathcal{X}. \tag{2}$$
We now generalize the definition in Barrientos et al. [2012] to Polish spaces.
Definition 3.1. Let $G_{\mathcal{X}}$ be a $\mathcal{P}(\Theta)$-valued stochastic process on $\mathcal{X}$ and defined on $(\Omega, \mathcal{F}, \mathbb{P})$ given by $\{G_x : x \in \mathcal{X}\}$. Suppose the following conditions hold:
1. There exists a sequence $\{\{V_{i,x} : x \in \mathcal{X}\}\}_{i \in \mathbb{N}}$ of separable i.i.d. processes, with a law characterized by a finite-dimensional parameter $\Psi_V$, and with marginal distribution $\text{BETA}(1, \alpha_x)$ for some $\alpha_x \geq 0$ for any $x \in \mathcal{X}$.
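2. There exists a sequence $\{\{\theta_{i,x} : x \in \mathcal{X}\}\}_{i \in \mathbb{N}}$ of i.i.d. processes, with a law characterized by a finite-dimensional parameter $\Psi_\Theta$, and with marginal distribution $G^0_x \in \mathcal{P}(\Theta)$ for any $x \in \mathcal{X}$.
3. There exists a null set $N \subset \Omega$ such that for every $x \in \mathcal{X}$, $B \in \mathcal{B}(\Theta)$ and $\omega \in \Omega \setminus N$,
$$\sum_{i \in \mathbb{N}} \pi_{i,x}(\omega)\, \delta_{\theta_{i,x}(\omega)}(B) = G_x(\omega)(B), \tag{3}$$
where the sequence $\{\{\pi_{i,x} : x \in \mathcal{X}\}\}_{i \in \mathbb{N}}$ is given by the stick-breaking process in (2).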
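Then, the process $G_{\mathcal{X}}$ is called a dependent Dirichlet process (DDP) with parameters $(\Psi_V, \Psi_\Theta)$ and denoted as $G_{\mathcal{X}} \sim \text{DDP}(\Psi_V, \Psi_\Theta)$. If $G_{\mathcal{X}}$ is a DDP we write $\alpha_{\mathcal{X}} := \{\alpha_x : x \in \mathcal{X}\}$ and $G^0_{\mathcal{X}} := \{G^0_x : x \in \mathcal{X}\}$.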
It is of interest to determine when a DDP can be constructed from a prescribed $\alpha_{\mathcal{X}}$ and $G^0_{\mathcal{X}}$. First, we can construct a sequence of processes $\{\{\theta_{i,x} : x \in \mathcal{X}\}\}_{i \in \mathbb{N}}$ satisfying Condition 2 in Definition 3.1 from a prescribed $G^0_{\mathcal{X}}$ by first constructing a process $\{\theta_x : x \in \mathcal{X}\}$ with marginals $\{G^0_x : x \in \mathcal{X}\}$ using an extension of Kolmogorov's consistency theorem to Polish spaces [see Section III.3 in Neveu, 1965]. Note that in this case the separability of the resulting process is not required.
Second, we can construct a sequence of processes $\{\{V_{i,x} : x \in \mathcal{X}\}\}_{i \in \mathbb{N}}$ satisfying Condition 1 in Definition 3.1 from a prescribed $\alpha_{\mathcal{X}}$ using Kolmogorov's consistency theorem and a consistent family of copula functions as in Barrientos et al. [2012]. However, Definition 3.1 additionally requires this process to be separable. This ensures that the set of outcomes for which the left-hand side in (3) is not a probability measure is a measurable set. In fact, if for every $x \in \mathcal{X}$ we define the event then the left-hand side in (3) fails to be a probability measure if However, the above set may not be measurable. The separability of the processes ensures there exists a countable set Therefore, whence it suffices to ensure $\mathbb{P}(N_x) = 0$ for every $x \in \mathcal{X}_V$.
Since any stochastic process on a separable space with a.s. continuous sample paths is separable, an alternative, standard way to build a separable process on $\mathcal{X}$ with $\text{BETA}(1, \alpha_x)$ marginal distributions is to transform a real-valued process over $\mathcal{X}$ with a.s. continuous sample paths using the quantile function of the beta distribution [MacEachern, 1999, 2000]. Specifically, let $\{Z_x : x \in \mathcal{X}\}$ be a real-valued stochastic process with a.s. continuous sample paths, and with continuous marginal cumulative distribution function $F_{Z,x}$ at each $x \in \mathcal{X}$. Let $F_{B,x}$ denote the cumulative distribution function of a $\text{BETA}(1, \alpha_x)$ distribution. Then, the process $\{V_x : x \in \mathcal{X}\}$ defined as
$$V_x = F_{B,x}^{-1}\big(F_{Z,x}(Z_x)\big), \qquad x \in \mathcal{X},$$
is separable, has a.s. continuous sample paths, and has marginal distribution $\text{BETA}(1, \alpha_x)$. The choice of the base process $\{Z_x : x \in \mathcal{X}\}$ depends on the structure of $\mathcal{X}$. When it is a Gaussian process, there are known conditions under which it admits a modification with a.s. continuous sample paths. For example, Theorem 2.3.1 in Khoshnevisan [2002] and Preston [1972] provide sufficient conditions for the existence of such a modification when $\mathcal{X}$ is compact. Another possibility when $\mathcal{X}$ is a manifold is to use diffusion processes [Hsu, 2002].
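The quantile-transform construction can be sketched numerically. The example below is a toy illustration, not the construction used in the paper: it assumes $\mathcal{X} = \mathbb{R}$, uses the base process $Z_x = A\cos x + B\sin x$ with $A, B$ i.i.d. standard normal (a Gaussian process with continuous paths and $N(0,1)$ marginals), and relies on the closed-form BETA$(1, \alpha)$ quantile $1 - (1-u)^{1/\alpha}$, which requires $\alpha > 0$. All function names are hypothetical.

```python
import math
import random

def normal_cdf(z):
    """CDF of the standard normal distribution (F_{Z,x} for the toy base process)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def beta1_quantile(u, alpha):
    """Quantile function of BETA(1, alpha), whose CDF is 1 - (1 - v)**alpha."""
    return 1.0 - (1.0 - u) ** (1.0 / alpha)

def sample_v_path(xs, alpha_of_x, rng):
    """One sample path of V_x = F_B^{-1}(F_Z(Z_x)) evaluated on the grid xs."""
    a, b = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    path = []
    for x in xs:
        z = a * math.cos(x) + b * math.sin(x)  # N(0, 1) marginal for every x
        path.append(beta1_quantile(normal_cdf(z), alpha_of_x(x)))
    return path

rng = random.Random(0)
xs = [k / 10.0 for k in range(50)]
# Illustrative precision function alpha_x = 1 + x^2 (an assumption).
v_path = sample_v_path(xs, lambda x: 1.0 + x * x, rng)
```

Because the map $z \mapsto F_{B,x}^{-1}(F_{Z,x}(z))$ is continuous whenever $\alpha_x$ varies continuously, the continuity of the base path carries over to `v_path`.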
Although the DDP is flexible, it is of interest to define parsimonious variants of the DDP for which either the support points or the weights are independent of $x$. These parsimonious versions should be understood not only as simplifications of the DDP, but also as useful models with comparative advantages over the DDP that make them suitable in specific settings. The first parsimonious version removes the dependence of the weights on $x$.
Definition 3.2. Let $G_{\mathcal{X}}$ be a $\mathcal{P}(\Theta)$-valued stochastic process on $\mathcal{X}$ and defined on $(\Omega, \mathcal{F}, \mathbb{P})$ given by $\{G_x : x \in \mathcal{X}\}$. Suppose the following conditions hold:
1. There exists a sequence $\{V_i\}_{i \in \mathbb{N}}$ of i.i.d. random variables, with a common law $\text{BETA}(1, \alpha)$ for some $\alpha \geq 0$.
2. There exists a sequence $\{\{\theta_{i,x} : x \in \mathcal{X}\}\}_{i \in \mathbb{N}}$ of i.i.d. processes, with a law characterized by a finite-dimensional parameter $\Psi_\Theta$, and marginal distribution $G^0_x \in \mathcal{P}(\Theta)$ for any $x \in \mathcal{X}$.
3. There exists a null set $N \subset \Omega$ such that for every $x \in \mathcal{X}$, $B \in \mathcal{B}(\Theta)$ and $\omega \in \Omega \setminus N$,
$$\sum_{i \in \mathbb{N}} \pi_i(\omega)\, \delta_{\theta_{i,x}(\omega)}(B) = G_x(\omega)(B),$$
where the sequence $\{\pi_i\}_{i \in \mathbb{N}}$ of random variables is defined as
$$\pi_i = V_i \prod_{j<i} (1 - V_j), \qquad i \in \mathbb{N}.$$
Then, the process $G_{\mathcal{X}}$ is called a single-weights dependent Dirichlet process (wDDP) with parameters $(\alpha, \Psi_\Theta)$ and denoted as $G_{\mathcal{X}} \sim \text{wDDP}(\alpha, \Psi_\Theta)$. If $G_{\mathcal{X}}$ is a wDDP we write $G^0_{\mathcal{X}} := \{G^0_x : x \in \mathcal{X}\}$.
One of the advantages of the single-weights DDP is that it avoids any difficulty that may arise in the construction of a separable process $\{V_{i,x} : x \in \mathcal{X}\}$ with marginals $\text{BETA}(1, \alpha_x)$ for $x \in \mathcal{X}$. However, it implicitly assumes we are still able to construct the process $\{\theta_x : x \in \mathcal{X}\}$ given the marginals $\{G^0_x : x \in \mathcal{X}\}$. Hence, this variant is desirable when the structure of $\mathcal{X}$ is complex relative to that of $\Theta$.
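The second parsimonious variant of the DDP removes the dependence of the support points on $x$.
Definition 3.3. Let $G_{\mathcal{X}}$ be a $\mathcal{P}(\Theta)$-valued stochastic process on $\mathcal{X}$ and defined on $(\Omega, \mathcal{F}, \mathbb{P})$ given by $\{G_x : x \in \mathcal{X}\}$. Suppose the following conditions hold:
1. There exists a sequence $\{\{V_{i,x} : x \in \mathcal{X}\}\}_{i \in \mathbb{N}}$ of separable i.i.d. processes, with a law characterized by a finite-dimensional parameter $\Psi_V$, and with marginal distribution $\text{BETA}(1, \alpha_x)$ for some $\alpha_x \geq 0$ for any $x \in \mathcal{X}$.
2. There exists a sequence $\{\theta_i\}_{i \in \mathbb{N}}$ of i.i.d. $\Theta$-valued random variables with common law $G^0 \in \mathcal{P}(\Theta)$.
3. There exists a null set $N \subset \Omega$ such that for every $x \in \mathcal{X}$, $B \in \mathcal{B}(\Theta)$ and $\omega \in \Omega \setminus N$,
$$\sum_{i \in \mathbb{N}} \pi_{i,x}(\omega)\, \delta_{\theta_i(\omega)}(B) = G_x(\omega)(B),$$
where the sequence $\{\{\pi_{i,x} : x \in \mathcal{X}\}\}_{i \in \mathbb{N}}$ is given by (2).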
Then, the process $G_{\mathcal{X}}$ is called a single-atoms dependent Dirichlet process (θDDP) with parameters $(\Psi_V, G^0)$ and denoted as $G_{\mathcal{X}} \sim \theta\text{DDP}(\Psi_V, G^0)$.
A single-atoms DDP can be easier to construct in situations where the structure of $\Theta$ is complex. As a matter of fact, the construction of a stochastic process $\{\theta_x : x \in \mathcal{X}\}$ can be difficult for general spaces $\mathcal{X}$ and $\Theta$, particularly when it needs to satisfy additional properties, such as a.s. continuity of its sample paths. Some specific constructions are available in particular cases of interest. For example, if $\Theta$ is Kendall's planar shape space [see, e.g., Kendall, 1984], diffusion processes have been proposed under two different approaches: (i) directly on the landmarks, on the space of configurations, which is referred to as Euclidean diffusion of shape [see, e.g., Kendall, 1977, 1988, 1990, Le, 1991], and (ii) directly on $\Theta$, via infinitesimal generators [see, e.g., Le, 1994, Kendall, 1998, Ball et al., 2008, Golalizadeh, 2010] and the solution of partial differential equations [see, e.g., Hsu, 2002].
Neither the DDP nor its variants require the continuity of the sample paths of the processes $\{\{V_{i,x} : x \in \mathcal{X}\}\}_{i \in \mathbb{N}}$ or $\{\{\theta_{i,x} : x \in \mathcal{X}\}\}_{i \in \mathbb{N}}$. However, this additional condition endows the DDP and its variants with some desirable properties that are the focus of this work.
Definition 3.4. Let $G_{\mathcal{X}}$ be a DDP or any of its variants.

Properties of dependent Dirichlet processes

Continuity
The continuity of the sample paths of a process is an important property that also plays a critical role in statistical applications. On the one hand, it allows us to determine suitable topologies for the spaces containing the sample paths. On the other hand, it ensures that the process will be able to borrow strength across sparse data sources regarding the predictors.
In fact, continuity eliminates the need for replicates of the responses at every value of the predictors to obtain adequate estimates of the predictor-dependent probability distributions [see, e.g., Barrientos et al., 2017, Wehrhahn et al., 2022].
For the DDP and its variants, the sample paths are functions from $\mathcal{X}$ into $\mathcal{P}(\Theta)$ and their continuity depends on the topologies on these spaces. Although we always assume $\mathcal{X}$ is endowed with its metric topology, there are several standard choices for the topology on $\mathcal{P}(\Theta)$. Although we mostly focus on the weak topology, we will also study the effect of considering the strong (or weak-*) and uniform (or norm, or total variation) topologies on $\mathcal{P}(\Theta)$.
For the weak topology on $\mathcal{P}(\Theta)$ we denote by $C_W(\mathcal{X}, \mathcal{P}(\Theta))$ the space of weakly continuous functions from $\mathcal{X}$ into $\mathcal{P}(\Theta)$. These are the functions $P : \mathcal{X} \to \mathcal{P}(\Theta)$ such that for any $f \in C_b(\Theta)$ the function
$$x \mapsto \int_{\Theta} f\, dP_x$$
is continuous on $\mathcal{X}$. The following theorem shows that when the underlying stochastic processes have a.s. continuous sample paths, the DDP and its variants have a.s. weakly continuous sample paths. We defer its proof to Appendix A.1.1.
Theorem 4.1. Let $G_{\mathcal{X}}$ be a $\mathcal{P}(\Theta)$-valued process. Suppose that $G_{\mathcal{X}}$ is a continuous parameter DDP, a continuous parameter wDDP or a continuous parameter θDDP. Then for a.e. $\omega \in \Omega$,
$$G^{\omega}_{\mathcal{X}} \in C_W(\mathcal{X}, \mathcal{P}(\Theta)).$$
Consequently, to construct a DDP or any of its variants with a.s. weakly continuous sample paths it suffices to construct suitable continuous processes on $\mathcal{X}$. As discussed earlier, a process $\{V_x : x \in \mathcal{X}\}$ with the desired properties can be constructed from a real-valued base process $\{Z_x : x \in \mathcal{X}\}$ on $\mathcal{X}$ with a.s. continuous sample paths which can be, for instance, a suitable Gaussian process.
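To our knowledge, there is no similar, standard way to construct a process $\{\theta_x : x \in \mathcal{X}\}$ with the desired properties for general $\mathcal{X}$ and $\Theta$. When $\mathcal{X} = \mathbb{R}^d$ there are well-known sufficient conditions that ensure that there exists a modification of a $\Theta$-valued process $\{\theta_x : x \in \mathcal{X}\}$ with a.s. continuous sample paths [Kallenberg, 1997, Theorem 2.23]: this modification exists if there exist exponents $\alpha, \gamma > 0$ and a constant $C > 0$ such that
$$\mathbb{E}\big[d_\Theta(\theta_x, \theta_{x'})^{\alpha}\big] \leq C\, \|x - x'\|^{d + \gamma} \qquad \text{for all } x, x' \in \mathbb{R}^d,$$
where $d_\Theta$ is a complete metric on $\Theta$. This result can be applied to a Polish space $\mathcal{X}$ that is homeomorphic to $\mathbb{R}^d$ for some $d$.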
When $\Theta$ is a Riemannian manifold, processes with a.s. continuous sample paths can be defined through diffusion processes [see, e.g., Hsu, 2002]. When $\Theta$ is also a quotient space $\Theta = A/G$ for a locally compact space $A$ and a group $G$, then a $\Theta$-valued process with a.s. continuous sample paths can be constructed as follows. Let $Q : A \to \Theta$ be the canonical quotient map and let $A : \mathcal{X} \times \Omega \to A$ be a process with a.s. continuous sample paths. Since the canonical quotient map is continuous, the process $\{Q(A(x)) : x \in \mathcal{X}\}$ has a.s. continuous sample paths.
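As a toy instance of the quotient construction, consider the circle viewed as $\mathbb{R} / 2\pi\mathbb{Z}$. The sketch below assumes $A = \mathbb{R}$, $G = 2\pi\mathbb{Z}$ acting by translation, and a deliberately simple lifted process $A(x) = a_0 + s\,x$ with random $a_0$ and $s$; these choices and all names are hypothetical and serve only to show that composing a continuous path with the quotient map yields a continuous path on the quotient.

```python
import math
import random

def quotient_map(a):
    """Canonical map Q from A = R onto the circle R / (2*pi*Z), embedded in R^2."""
    return (math.cos(a), math.sin(a))

def lifted_path(xs, rng):
    """A real-valued process with continuous paths: A(x) = a0 + slope * x."""
    a0 = rng.uniform(0.0, 2.0 * math.pi)
    slope = rng.gauss(0.0, 3.0)
    return [a0 + slope * x for x in xs]

def chord(p, q):
    """Euclidean (chordal) distance between two points of the circle."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

rng = random.Random(0)
xs = [k / 100.0 for k in range(200)]
lift = lifted_path(xs, rng)
theta_path = [quotient_map(a) for a in lift]  # circle-valued path Q(A(x))
```

The representative `a % (2 * math.pi)` may jump when the lifted path crosses a multiple of $2\pi$, but the point `quotient_map(a)` on the circle does not: its chordal displacement is bounded by the displacement of the lifted path.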
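As mentioned earlier, the variants of the DDP should not be thought of only as simplifications of the DDP but also as processes with distinct properties. Endowing $\mathcal{P}(\Theta)$ with the strong topology already allows us to distinguish the properties of these processes. Let $C_S(\mathcal{X}, \mathcal{P}(\Theta))$ be the space of strongly continuous functions from $\mathcal{X}$ into $\mathcal{P}(\Theta)$. These are the functions $P : \mathcal{X} \to \mathcal{P}(\Theta)$ such that for any $f \in L^\infty(\Theta)$ the function
$$x \mapsto \int_{\Theta} f\, dP_x$$
is continuous on $\mathcal{X}$. The following theorem shows that, under the same hypotheses as Theorem 4.1, the θDDP has a.s. strongly continuous sample paths. Although this is really a corollary of Theorem 4.4, we present this statement independently for clarity.
Theorem 4.2. Let $G_{\mathcal{X}}$ be a $\mathcal{P}(\Theta)$-valued process. Suppose $G_{\mathcal{X}}$ is a continuous parameter θDDP with $G_{\mathcal{X}} \sim \theta\text{DDP}(\Psi_V, G^0)$. Then for a.e. $\omega \in \Omega$,
$$G^{\omega}_{\mathcal{X}} \in C_S(\mathcal{X}, \mathcal{P}(\Theta)).$$
A natural question is whether the DDP or the wDDP can have a.s. strongly continuous sample paths under similar assumptions. To our knowledge, this cannot be the case unless substantially stronger conditions are imposed on these processes. We defer the proof of the following theorem to Appendix A.1.2.
Theorem 4.3. Let $G_{\mathcal{X}}$ be a $\mathcal{P}(\Theta)$-valued process. Suppose that $G_{\mathcal{X}}$ is a continuous parameter DDP or a continuous parameter wDDP. Let $x_0 \in \mathcal{X}$. If for a.e. $\omega \in \Omega$ we have
$$\lim_{x \to x_0} \int_{\Theta} f\, dG^{\omega}_x = \int_{\Theta} f\, dG^{\omega}_{x_0} \qquad \text{for every } f \in L^\infty(\Theta),$$
then for a.e. $\omega \in \Omega$ there exists an open neighborhood $U_\omega \subset \mathcal{X}$ of $x_0$ and at least one $i_\omega \in \mathbb{N}$ such that $\theta^{\omega}_{i_\omega}$ is constant on $U_\omega$.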
Since for the DDP and wDDP the sequence of processes $\{\{\theta_{i,x} : x \in \mathcal{X}\}\}_{i \in \mathbb{N}}$ is independent and identically distributed, the above implies that when the process has a.s. strongly continuous paths at $x_0$ the process $\{\theta_{1,x} : x \in \mathcal{X}\}$ must have a.s. constant sample paths near $x_0$. Although this suggests that the main issue is the behavior of the atoms themselves, the proof shows that the main issue is the independence between the processes $\{\{V_{i,x} : x \in \mathcal{X}\}\}_{i \in \mathbb{N}}$ and $\{\{\theta_{i,x} : x \in \mathcal{X}\}\}_{i \in \mathbb{N}}$. We conjecture that for the DDP and wDDP to have a.s. strongly continuous paths, it is necessary to introduce dependence between these processes.
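The obstruction to strong continuity created by moving atoms can be seen on a deterministic toy path. The sketch below is an illustration under assumptions, not part of the proofs: it takes $\Theta = \mathbb{R}$, three atoms, the test function $f = 1_{(-\infty, 0]} \in L^\infty(\Theta)$, and hypothetical function names. With fixed weights and atoms $\theta_i(x) = \theta_i + x$ (wDDP-like), $x \mapsto G_x((-\infty, 0])$ jumps when an atom crosses $0$; with fixed atoms and continuously varying sticks (θDDP-like), it does not.

```python
import math

def stick_breaking_weights(sticks):
    """Weights pi_i = V_i * prod_{j<i} (1 - V_j) from stick proportions."""
    weights, remaining = [], 1.0
    for v in sticks:
        weights.append(v * remaining)
        remaining *= 1.0 - v
    return weights

def mass_of_halfline(weights, atoms, c):
    """G((-inf, c]) for the discrete measure G = sum_i weights[i] * delta_{atoms[i]}."""
    return sum(w for w, t in zip(weights, atoms) if t <= c)

BASE_ATOMS = [0.0, 1.0, -2.0]

def moving_atoms_measure(x):
    """wDDP-like toy path: fixed weights, atoms theta_i(x) = BASE_ATOMS[i] + x."""
    return stick_breaking_weights([0.5, 0.5, 1.0]), [t + x for t in BASE_ATOMS]

def moving_weights_measure(x):
    """thetaDDP-like toy path: fixed atoms, sticks varying continuously with x."""
    sticks = [0.5 + 0.25 * math.sin(x), 0.5 + 0.25 * math.cos(x), 1.0]
    return stick_breaking_weights(sticks), BASE_ATOMS

def jump_at_zero(measure, h=1e-9):
    """|G_h(B) - G_{-h}(B)| for B = (-inf, 0], a proxy for a discontinuity at x = 0."""
    below = mass_of_halfline(*measure(-h), 0.0)
    above = mass_of_halfline(*measure(+h), 0.0)
    return abs(above - below)

jump_moving_atoms = jump_at_zero(moving_atoms_measure)
jump_moving_weights = jump_at_zero(moving_weights_measure)
```

In this toy path the jump for moving atoms equals the weight of the atom crossing the boundary of $B$, however small `h` is, whereas the jump for moving weights shrinks with `h`. Both paths are nevertheless weakly continuous, since integrals of bounded continuous functions vary continuously with the atoms.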
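Since the DDP and wDDP do not have a.s. strongly continuous paths, it is clear they will not have a.s. continuous paths with respect to stronger topologies on $\mathcal{P}(\Theta)$. However, for the θDDP we can strengthen the topology on $\mathcal{P}(\Theta)$ while preserving this property. Consider the uniform topology on $\mathcal{P}(\Theta)$ and denote as $C_U(\mathcal{X}, \mathcal{P}(\Theta))$ the set of uniformly continuous functions from $\mathcal{X}$ into $\mathcal{P}(\Theta)$. The total variation norm for any signed finite measure $Q$ on $(\Theta, \mathcal{B}(\Theta))$ is defined as
$$\|Q\|_{TV} = \sup\Big\{ \Big| \int_{\Theta} f\, dQ \Big| : f \in L^\infty(\Theta),\ \|f\|_\infty \leq 1 \Big\}.$$
Then the elements of $C_U(\mathcal{X}, \mathcal{P}(\Theta))$ are the functions $P : \mathcal{X} \to \mathcal{P}(\Theta)$ such that for any $x_0 \in \mathcal{X}$
$$\lim_{x \to x_0} \|P_x - P_{x_0}\|_{TV} = 0.$$
By choosing indicator functions, it is clear the above is equivalent to
$$\lim_{x \to x_0} \sup_{B \in \mathcal{B}(\Theta)} |P_x(B) - P_{x_0}(B)| = 0,$$
which is an expression that is typically more interpretable in statistical applications. For the uniform topology we can show that, under the same assumptions as Theorem 4.2, the θDDP has a.s. uniformly (or norm) continuous sample paths. This is also known as continuity in total variation. Its proof is deferred to Appendix A.1.3.
Theorem 4.4. Let $G_{\mathcal{X}}$ be a continuous parameter θDDP. Then, for a.e. $\omega \in \Omega$,
$$G^{\omega}_{\mathcal{X}} \in C_U(\mathcal{X}, \mathcal{P}(\Theta)).$$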

Support
The sample paths of the DDP and its variants are elements of suitable spaces of functions from $\mathcal{X}$ into $\mathcal{P}(\Theta)$. It is of interest to characterize the size, in a suitable sense, of the set containing the sample paths. This leads us to the concept of support. In applications in statistics, a large support is an important and basic property that any BNP model should possess. In fact, it is a minimum requirement, and almost a "necessary" property, for a BNP model to be considered "nonparametric." This property is also important because it typically is a necessary condition for the consistency of the posterior distribution. In such settings, the full support of the prior implies that the prior probability model is flexible enough to generate sample paths sufficiently close to any element of the parameter space.
Given a topology $T$ on $\mathcal{P}(\Theta)^{\mathcal{X}}$, the support of a process is the smallest closed set, in the sense of set inclusion, such that the probability it contains a sample path is equal to one. We say it has full support, or that the support is full, if it is equal to $\mathcal{P}(\Theta)^{\mathcal{X}}$. When the support is not full, its complement is a non-empty open set. In particular, it contains a point with a neighborhood that is disjoint from the support, for which the probability of containing a sample path is zero. Consequently, to prove that a process has full support with respect to $T$ it suffices to show that the probability that any element of a neighborhood basis contains a sample path is positive.
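We characterize the support of the DDP and its variants for common choices of $T$, starting from the weakest. We consider first the (weak) product topology, or pointwise topology [Kelley, 1975], on $\mathcal{P}(\Theta)^{\mathcal{X}}$. For reasons that shall be clear soon, we call it the product-weak topology. In this topology, a neighborhood basis at $P^0 \in \mathcal{P}(\Theta)^{\mathcal{X}}$ is given by sets of the form
$$\Big\{ P \in \mathcal{P}(\Theta)^{\mathcal{X}} : \Big| \int_{\Theta} f_{i,j}\, dP_{x_i} - \int_{\Theta} f_{i,j}\, dP^0_{x_i} \Big| < \varepsilon_{i,j},\ i, j = 1, \ldots, n \Big\}$$
for $\varepsilon_{1,1}, \ldots, \varepsilon_{n,n} > 0$, $x_1, \ldots, x_n \in \mathcal{X}$ and $f_{1,1}, \ldots, f_{n,n} \in C_b(\Theta)$. The following theorem shows the DDP and its variants have full support with respect to this topology. We defer its proof to Appendix A.2.1.
Theorem 4.5. Let $G_{\mathcal{X}}$ be a $\mathcal{P}(\Theta)$-valued process on $\mathcal{X}$. Suppose that one of the following assertions holds.
1. $G_{\mathcal{X}} \sim \text{DDP}(\Psi_V, \Psi_\Theta)$, for any $x_1, \ldots, x_n \in \mathcal{X}$ the law of the random vector $(\theta_{1,x_1}, \ldots, \theta_{1,x_n})$ has full support on $\Theta^n$, and the law of the random vector $(V_{1,x_1}, \ldots, V_{1,x_n})$ has full support on $[0, 1]^n$.
2. $G_{\mathcal{X}} \sim \text{wDDP}(\alpha, \Psi_\Theta)$, for any $x_1, \ldots, x_n \in \mathcal{X}$ the law of the random vector $(\theta_{1,x_1}, \ldots, \theta_{1,x_n})$ has full support on $\Theta^n$, and the law of the random variable $V_1$ has full support on $[0, 1]$.
3. $G_{\mathcal{X}} \sim \theta\text{DDP}(\Psi_V, G^0)$, $G^0$ has full support on $\Theta$, and for any $x_1, \ldots, x_n \in \mathcal{X}$ the law of the random vector $(V_{1,x_1}, \ldots, V_{1,x_n})$ has full support on $[0, 1]^n$.
Then, for every $P^0 \in \mathcal{P}(\Theta)^{\mathcal{X}}$, every neighborhood of $P^0$ of the above form contains a sample path of $G_{\mathcal{X}}$ with positive probability. In consequence, the process has full support on $\mathcal{P}(\Theta)^{\mathcal{X}}$ endowed with the product-weak topology.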
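The product-weak topology is often too coarse in statistical applications. The topology we consider next is the compact-open topology on $\mathcal{P}(\Theta)^{\mathcal{X}}$ [Kelley, 1975]. In this topology, a neighborhood basis at $P^0 \in \mathcal{P}(\Theta)^{\mathcal{X}}$ is given by sets of the form
$$\Big\{ P \in \mathcal{P}(\Theta)^{\mathcal{X}} : \sup_{x \in K_i} \Big| \int_{\Theta} f_i\, dP_x - \int_{\Theta} f_i\, dP^0_x \Big| < \varepsilon_i,\ i = 1, \ldots, n \Big\} \tag{6}$$
for $\varepsilon_1, \ldots, \varepsilon_n > 0$, $K_1, \ldots, K_n \subset \mathcal{X}$ compact and $f_1, \ldots, f_n \in C_b(\Theta)$. As this topology is stronger, it is unlikely the DDP and its variants will still have full support on $\mathcal{P}(\Theta)^{\mathcal{X}}$. For this reason, we determine whether the support contains a subset of $\mathcal{P}(\Theta)^{\mathcal{X}}$ of functions of interest. For the weak topology on $\mathcal{P}(\Theta)$, we consider the weakly continuous functions from $\mathcal{X}$ into $\mathcal{P}(\Theta)$.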
If the support of a process does not contain $C_W(\mathcal{X}, \mathcal{P}(\Theta))$ then there is at least one $P^0 \in C_W(\mathcal{X}, \mathcal{P}(\Theta))$ in the complement of the support. Since this set is open, it contains at least one set of the form (6). The following result shows that, under mild conditions, the support of both the DDP and θDDP contains $C_W(\mathcal{X}, \mathcal{P}(\Theta))$. We defer its proof to Appendix A.2.2.
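Theorem 4.6. Let $G_{\mathcal{X}}$ be a DDP or a θDDP. Suppose the following conditions hold:
1. The processes $\{\{V_{i,x} : x \in \mathcal{X}\}\}_{i \in \mathbb{N}}$ have a.s. continuous sample paths.
2. For any $\varepsilon > 0$, continuous function $h : \mathcal{X} \to [0, 1]$ and $K \subset \mathcal{X}$ compact we have
$$\mathbb{P}\Big( \sup_{x \in K} |V_{1,x} - h(x)| < \varepsilon \Big) > 0. \tag{7}$$
Then, for any $P^0 \in C_W(\mathcal{X}, \mathcal{P}(\Theta))$, every neighborhood of $P^0$ of the form (6) contains a sample path of $G_{\mathcal{X}}$ with positive probability. In consequence, the support of the process in $\mathcal{P}(\Theta)^{\mathcal{X}}$ endowed with the compact-weak topology contains $C_W(\mathcal{X}, \mathcal{P}(\Theta))$.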
To construct a process $\{V_x : x \in \mathcal{X}\}$ satisfying (7) we use the same construction outlined in Section 3. Let $\{Z_x : x \in \mathcal{X}\}$ be a Gaussian process with mean function $\mu$ and covariance kernel $\sigma$ and a.s. continuous sample paths. We define at any given $x \in \mathcal{X}$ the functions Observe that $V_x$ has a.s. continuous sample paths and $\text{BETA}(1, \alpha_x)$ marginal distributions. Let $K \subset \mathcal{X}$ be compact, and let $h : K \to (0, 1)$ be continuous. Then where Therefore, it suffices to choose a process $Z_x$ for which the event for any continuous function $h : K \to \mathbb{R}$. Note the reproducing kernel Hilbert space (RKHS) associated to the covariance kernel spans the space of all smooth functions if $\tau > 0$ is allowed to vary freely [Choudhuri et al., 2007].
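It is natural to characterize the support of the DDP or its variants in stronger topologies. One such topology arises when we endow $\mathcal{P}(\Theta)$ with the strong topology. In this topology, a neighborhood basis at $P^0 \in \mathcal{P}(\Theta)$ is given by sets of the form
$$\Big\{ P \in \mathcal{P}(\Theta) : \Big| \int_{\Theta} f_i\, dP - \int_{\Theta} f_i\, dP^0 \Big| < \varepsilon_i,\ i = 1, \ldots, n \Big\}$$
for $\varepsilon_1, \ldots, \varepsilon_n > 0$ and $f_1, \ldots, f_n \in L^\infty(\Theta)$. Hence, we consider the (weak) product topology on $\mathcal{P}(\Theta)^{\mathcal{X}}$ when $\mathcal{P}(\Theta)$ is endowed with the strong topology. We call this the product-strong topology. In this topology, a neighborhood basis at $P^0 \in \mathcal{P}(\Theta)^{\mathcal{X}}$ is given by sets of the form
$$\Big\{ P \in \mathcal{P}(\Theta)^{\mathcal{X}} : \Big| \int_{\Theta} f_{i,j}\, dP_{x_i} - \int_{\Theta} f_{i,j}\, dP^0_{x_i} \Big| < \varepsilon_{i,j},\ i, j = 1, \ldots, n \Big\}$$
for $\varepsilon_{i,j} > 0$, $x_1, \ldots, x_n \in \mathcal{X}$ and $f_{i,j} \in L^\infty(\Theta)$. By choosing simple functions, it becomes clear that the sets open, and $B_1, \ldots, B_n \in \mathcal{B}(\Theta)$ also form a neighborhood basis at $P^0$.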
Theorem 4.2 suggests that neither the DDP nor its variants will have full support on the product-strong topology. However, we are still able to characterize some key features of their support. We first introduce the following technical definition.
Definition 4.1. Let $G_{\mathcal{X}}$ be a $\mathcal{P}(\Theta)$-valued random process on $\mathcal{X}$.
Therefore, we can associate to a DDP or to any of its variants a specific set of functions from $\mathcal{X}$ into $\mathcal{P}(\Theta)$. The following theorem shows that, in fact, the support of the DDP and its variants contains this set. We defer the proof of the theorem to Appendix A.2.3.
Theorem 4.7. Let $G_{\mathcal{X}}$ be a $\mathcal{P}(\Theta)$-valued process on $\mathcal{X}$. The following assertions are true.
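1. If $G_{\mathcal{X}} \sim \text{DDP}(\Psi_V, \Psi_\Theta)$ and for any $x_1, \ldots, x_n \in \mathcal{X}$ the law of the random vector $(\theta_{1,x_1}, \ldots, \theta_{1,x_n})$ has full support on $\Theta^n$ and the law of the random vector $(V_{1,x_1}, \ldots, V_{1,x_n})$ has full support on $[0, 1]^n$, then the support of $G_{\mathcal{X}}$ in the product-strong topology contains $\mathcal{P}(\Theta)^{\mathcal{X}} \mid G_{\mathcal{X}}$.
2. If $G_{\mathcal{X}} \sim \text{wDDP}(\alpha, \Psi_\Theta)$ and for any $x_1, \ldots, x_n \in \mathcal{X}$ the law of the random vector $(\theta_{1,x_1}, \ldots, \theta_{1,x_n})$ has full support on $\Theta^n$ and the law of the random variable $V_1$ has full support on $[0, 1]$, then the support of $G_{\mathcal{X}}$ in the product-strong topology contains $\mathcal{P}(\Theta)^{\mathcal{X}} \mid G_{\mathcal{X}}$.
3. If $G_{\mathcal{X}} \sim \theta\text{DDP}(\Psi_V, G^0)$, $G^0$ has full support on $\Theta$, and for any $x_1, \ldots, x_n \in \mathcal{X}$ the law of the random vector $(V_{1,x_1}, \ldots, V_{1,x_n})$ has full support on $[0, 1]^n$, then the support of $G_{\mathcal{X}}$ in the product-strong topology contains $\mathcal{P}(\Theta)^{\mathcal{X}} \mid G_{\mathcal{X}}$.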
When $\mathcal{P}(\Theta)$ is endowed with the strong topology, we can consider the associated compact-open topology on $\mathcal{P}(\Theta)^{\mathcal{X}}$. In this topology, the elements of a neighborhood basis at any $P^0 \in \mathcal{P}(\Theta)^{\mathcal{X}}$ have the form (6) with $f_1, \ldots, f_n \in L^\infty(\Theta)$.
In this case, the functions of interest are strongly continuous functions from $\mathcal{X}$ into $\mathcal{P}(\Theta)$. In contrast to Theorem 4.6, we cannot show that the support of the DDP or its variants contains this set. However, we can show the support contains the intersection between $C_S(\mathcal{X}, \mathcal{P}(\Theta))$ and the surrogate functions associated to a DDP or θDDP. We defer the proof of the following result to Appendix A.2.4.
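Theorem 4.8. Let $G_{\mathcal{X}}$ be a DDP or a θDDP. Suppose the following conditions hold:
1. The processes $\{\{V_{i,x} : x \in \mathcal{X}\}\}_{i \in \mathbb{N}}$ have a.s. continuous sample paths.
2. For any $\varepsilon > 0$, continuous function $h : \mathcal{X} \to [0, 1]$ and $K \subset \mathcal{X}$ compact we have
$$\mathbb{P}\Big( \sup_{x \in K} |V_{1,x} - h(x)| < \varepsilon \Big) > 0.$$
x G 0 for every $x \in \mathcal{X}$ and that for any $A \in \mathcal{B}(\Theta)$ and $K \subset \mathcal{X}$ compact we have
Then $C_S(\mathcal{X}, \mathcal{P}(\Theta)) \cap \mathcal{P}(\Theta)^{\mathcal{X}} \mid G_{\mathcal{X}}$ is in the support of $G_{\mathcal{X}}$ with respect to the compact-strong topology.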

Association structure
In statistical applications, it is of interest to study the behavior of the process $\{G_x(B) : x \in \mathcal{X}\}$ for some fixed $B \in \mathcal{B}(\Theta)$. For a θDDP, the hypotheses of Theorem 4.2 ensure the process $\{G_x(B) : x \in \mathcal{X}\}$ has a.s. continuous sample paths. As a consequence, for any $d \in \mathbb{N}$ and $f$ : s. continuous sample paths. Furthermore, this holds for its expectation Some functions of this form that are of statistical interest are the measures of association. For instance, the Pearson correlation coefficient is given by It is clear that it is continuous whenever the denominator is non-zero. Continuity implies On the other hand, if lim then it follows that lim
Since the DDP and wDDP may not have a.s. strongly continuous paths, the above argument does not hold and, with positive probability, the process $\{G_x(B) : x \in \mathcal{X}\}$ may have discontinuous sample paths. In this case, a measure of association can act as a surrogate to study the regularity of this process, on average, at any point. The following theorem states that, under mild conditions, any function of the form (8) is continuous. Its proof is given in Appendix A.3.1.
Theorem 4.9. Let $G_{\mathcal{X}}$ be a $\mathcal{P}(\Theta)$-valued process on $\mathcal{X}$ and let $B \in \mathcal{B}(\Theta)$. Suppose that one of the following assertions holds.
1. G X is a continuous sample paths DDP with G X ∼ DDP(Ψ V , Ψ Θ ).
Furthermore, suppose that for any d ∈ N the function is continuous.

Mixtures induced by dependent Dirichlet processes
From now on, we let $\mathcal{Y}$ be a Polish space, and we let $\nu_{\mathcal{Y}}$ be a base measure on $(\mathcal{Y}, \mathcal{B}(\mathcal{Y}))$. To allow for flexible statistical models, we also consider a Polish space $\Gamma$ representing the parameters of the induced mixture. We always assume $\Gamma$ is endowed with the Borel σ-algebra $\mathcal{B}(\Gamma)$. The dependent Dirichlet processes mixture models that we study are constructed from a fixed measurable function $\psi : \mathcal{Y} \times \Gamma \times \Theta \to \mathbb{R}_+$. The mixture induced by $P \in \mathcal{P}(\Theta)^{\mathcal{X}}$ is the map $M^P : \Gamma \times \mathcal{X} \to \mathcal{P}(\mathcal{Y})$ formally defined as
$$M^P_{\gamma,x}(A) = \int_A \int_{\Theta} \psi(y, \gamma, \theta)\, dP_x(\theta)\, d\nu_{\mathcal{Y}}(y), \qquad A \in \mathcal{B}(\mathcal{Y}).$$
In particular, a dependent Dirichlet processes mixture model is a map By construction, the measure $M^P_{\gamma,x}$ is absolutely continuous with respect to $\nu_{\mathcal{Y}}$ for any $(\gamma, x) \in \Gamma \times \mathcal{X}$. For this reason, we distinguish the set $\mathcal{D}(\mathcal{Y}) \subset \mathcal{P}(\mathcal{Y})$ of probability measures on $\mathcal{Y}$ that admit a density with respect to $\nu_{\mathcal{Y}}$. We often use the identification In particular, for the dependent Dirichlet processes mixture models we study we have an explicit expression for their density. For this reason, for $P \in \mathcal{P}(\Theta)^{\mathcal{X}}$ we define the function
$$\rho^P(y, \gamma, x) := \int_{\Theta} \psi(y, \gamma, \theta)\, dP_x(\theta)$$
representing the density of $Q^P_{\gamma,x}$ with respect to $\nu_{\mathcal{Y}}$. We sometimes write $\rho^P_{\gamma,x}(y) := \rho^P(y, \gamma, x)$. Hence, the dependent Dirichlet processes mixture model $M$ induces a map Depending on the choice of $\nu_{\mathcal{Y}}$ and $\psi$, the dependent Dirichlet processes mixture model may have regularizing properties and the density $\rho^P$ may be, for instance, continuous. The following lemma shows that, under mild regularity and decay assumptions on $\psi$, we can characterize points of continuity of $\rho^P$ when $P : \mathcal{X} \to \mathcal{P}(\Theta)$ is weakly continuous. The proof of the following result is deferred to Appendix A.4.1.
Lemma 5.1. Let $P : \mathcal{X} \to \mathcal{P}(\Theta)$ be weakly continuous, and suppose that $\psi$ is continuous.
The hypotheses imply that near $y_0$ and $\gamma_0$ the function $\psi$ tends to zero "at infinity" in $\theta$. To gain insight into the consequences of these assumptions, we consider the following example. Let $\mathcal{Y} = [0, 1]$ be endowed with the standard topology, and let $\nu_{\mathcal{Y}}$ be the Lebesgue measure restricted to $[0, 1]$. Let be endowed with the standard subspace topology, and let $\Gamma = \emptyset$. If we consider the function associated to a family of $\text{BETA}(\alpha, \beta)$ probability distributions on $[0, 1]$, then the induced mixture model would not satisfy the properties of the lemma. In fact, if $y_0 = 1/2$ we can choose $\alpha = \beta = t$ to see that, from Stirling's approximation, for $t \gg 1$. Hence, $\psi$ does not decay over $\Theta$ near $y_0$. This can be mitigated by restricting the values of both $\alpha$ and $\beta$ to a compact set. A middle ground can be achieved if, for example, one parameter is constrained to a compact set, whereas the other becomes a parameter of the induced mixture. For example, In this case, the resulting induced mixture model satisfies the desired properties. Finally, note that failure to satisfy this condition is not always due to a lack of compactness. For instance, we could consider the model In this case, not only is $\psi$ discontinuous, but we also have for any choice of $\alpha, \beta$.
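The lack of decay in the BETA$(\alpha, \beta)$ example can be checked numerically. The sketch below evaluates the BETA$(t, t)$ density at $y_0 = 1/2$ for growing $t$ and compares it with $2\sqrt{t/\pi}$; this comparison value is our own Stirling-type approximation, obtained from the Legendre duplication formula, and is stated here as an assumption rather than as the expression used in the text. The function name is hypothetical.

```python
import math

def beta_density(y, a, b):
    """Density of BETA(a, b) at y in (0, 1), computed on the log scale for stability."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1.0) * math.log(y) + (b - 1.0) * math.log(1.0 - y))

# Compare the exact density at y0 = 1/2 with the assumed approximation 2 * sqrt(t / pi).
for t in (1.0, 10.0, 100.0, 1000.0):
    exact = beta_density(0.5, t, t)
    approx = 2.0 * math.sqrt(t / math.pi)
    print(t, exact, approx, exact / approx)
```

The printed ratio approaches one as `t` grows, so the kernel value at $y_0 = 1/2$ grows like $\sqrt{t}$ instead of vanishing as $(\alpha, \beta) = (t, t)$ leaves every compact set.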
Due to the continuity properties of a DDP and its variants, the conclusions of Lemma 5.1 follow from milder hypotheses. In fact, in this case the same conclusion follows by only imposing boundedness. We defer the proof of this result to Appendix A.4.2.
Lemma 5.2. Let G X be a P(Θ)-valued process on X and let ψ : Y × Γ × Θ → R + . Suppose one of the following conditions holds:

6 Properties of dependent Dirichlet processes mixture models

Continuity
Mixture models have a regularizing effect. Under the same assumptions of Lemma 5.2, the dependent Dirichlet processes mixture model M maps weakly continuous functions from X into P(Θ) into uniformly continuous functions. We defer the proof of this result to Appendix A.5.1.
Theorem 6.1. Suppose that ν Y is locally finite and that ψ is continuous. Then, for every P ∈ C W (X , P(Θ)) the induced mixture M (P ) is uniformly continuous, i.e., lim for any γ 0 ∈ Γ and x 0 ∈ X .
As a consequence of this result, the induced mixture of a continuous parameter DDP or its variants has uniformly continuous sample paths.
Corollary 6.1. Let G X be a P(Θ)-valued process on X and let ψ : Y × Γ × Θ → R + . Suppose G X is a continuous parameter DDP, a continuous parameter wDDP or a continuous parameter θDDP. Then for a.e. ω ∈ Ω the map M (G ω ) is uniformly continuous, i.e., lim for any γ 0 ∈ Γ and x 0 ∈ X .

Support
As in the case of a DDP or any of its variants, it is of interest to determine the effect that an induced mixture has on the support. Since, for the induced mixture models that we study, the probability measures on Y admit a density with respect to ν Y , we may interpret the sample paths of the induced mixture as elements of D(Y) Γ×X . This allows us to consider other topologies defined in terms of the density of the induced mixture model.
On D(Y) we consider the topology induced by the Hellinger distance and by the Kullback-Leibler (KL) divergence for p 1 , p 2 ∈ D(Y).
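To make the two discrepancies concrete, the sketch below evaluates them for two simple densities on Y = [0, 1] with Lebesgue base measure. It is illustrative only and rests on assumptions: the Hellinger distance is taken as the square root of ∫ (√p1 − √p2)^2 dν Y (some authors include a factor 1/2), the KL divergence as ∫ p1 log(p1/p2) dν Y , and the integrals are approximated by a midpoint rule. All function names are hypothetical.

```python
import math
from typing import Callable

Density = Callable[[float], float]


def integrate(f: Callable[[float], float], n: int = 20000) -> float:
    """Midpoint rule on [0, 1] with n cells."""
    h = 1.0 / n
    return h * sum(f((k + 0.5) * h) for k in range(n))


def hellinger(p1: Density, p2: Density) -> float:
    """Hellinger distance, assuming the normalization without the factor 1/2."""
    return math.sqrt(integrate(lambda y: (math.sqrt(p1(y)) - math.sqrt(p2(y))) ** 2))


def kl(p1: Density, p2: Density) -> float:
    """KL divergence of p2 from p1, i.e. the integral of p1 * log(p1 / p2)."""
    def integrand(y: float) -> float:
        a = p1(y)
        return 0.0 if a == 0.0 else a * math.log(a / p2(y))
    return integrate(integrand)


def uniform(y: float) -> float:
    return 1.0


def linear(y: float) -> float:
    return 2.0 * y


print(hellinger(uniform, linear))                      # symmetric in its arguments
print(kl(uniform, linear), kl(linear, uniform))        # not symmetric
```

The two KL values differ, which is why the choice of argument matters when KL neighborhoods are defined, whereas the Hellinger distance is a genuine metric.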

The Hellinger distance
We define the product-Hellinger topology on D(Y) Γ×X as follows. In this topology, a neighborhood basis at P 0 ∈ D(Y) Γ×X is given by sets of the form {P ∈ D(Y) Γ×X : H(P γ i ,x i , P 0 γ i ,x i ) < ε i , i = 1, . . ., n}, where H denotes the Hellinger distance, for some ε 1 , . . ., ε n > 0, γ 1 , . . ., γ n ∈ Γ and x 1 , . . ., x n ∈ X . The following result shows that any neighborhood of the image of P ∈ P(Θ) X under the induced mixture in the product-Hellinger topology contains, with positive probability, the image of a sample path of a DDP or its variants under the same induced mixture. We defer the proof of the following result to Appendix A.6.1.
Theorem 6.2. Suppose that ν Y is locally finite, that ψ is continuous and satisfies (9) for any (y, γ) ∈ Y × Γ, and that the hypotheses of Theorem 5 in Part I hold. Then, for any P 0 ∈ P(Θ) X the event
A stronger topology induced by the Hellinger distance is what we call the compact-Hellinger topology on D(Y) Γ×X . In this topology, a neighborhood basis at P 0 ∈ D(Y) Γ×X is given by sets of the form for some ε > 0, K Γ ⊂ Γ compact, and K X ⊂ X compact. Note these neighborhoods include sets of the form for ε 1 , . . ., ε n > 0 and γ 1 , . . ., γ n ∈ Γ. The following result shows that any neighborhood of the image of P ∈ P(Θ) X under the induced mixture in the compact-Hellinger topology also contains, with positive probability, the image of a sample path of the DDP or its variants under the same induced mixture. We defer the proof of the following result to Appendix A.6.1.
Theorem 6.3.Suppose that ν Y is locally finite, that ψ is continuous and satisfies (9) for any (y, γ) ∈ Y × Γ, and that the hypotheses of Theorem 7 in Part I hold.Then, for any P 0 ∈ P(Θ) X the event ω ∈ Ω : sup has positive probability for any ε > 0, and compact K Γ ⊂ Γ and K X ⊂ X .

The L ∞ distance
We define the product-L ∞ topology on D(Y) Γ×X as follows. In this topology, a neighborhood basis at P 0 ∈ D(Y) Γ×X is given by sets of the form {P ∈ D(Y) Γ×X : ‖P γ i ,x i − P 0 γ i ,x i ‖ ∞ < ε i , i = 1, . . ., n} for some ε 1 , . . ., ε n > 0, γ 1 , . . ., γ n ∈ Γ and x 1 , . . ., x n ∈ X . Similarly to the Hellinger distance, any neighborhood of the image of P ∈ P(Θ) X under the induced mixture in the product-L ∞ topology contains, with positive probability, the image of a sample path of a DDP or its variants under the same induced mixture. However, we require the additional hypothesis of compactness of Y. We defer the proof of the following result to Appendix A.6.2.
Theorem 6.4. Suppose that Y is compact, that ψ is continuous and satisfies (9) for any (y, γ) ∈ Y × Γ, and that the hypotheses of Theorem 5 in Part I hold. Then, for any P 0 ∈ P(Θ) X the event
The stronger compact-L ∞ topology on D(Y) Γ×X can be defined similarly as for the Hellinger distance. In this topology, a neighborhood basis at P 0 ∈ D(Y) Γ×X is given by sets of the form for some ε > 0, K Γ ⊂ Γ compact, and K X ⊂ X compact. Note these neighborhoods include sets of the form for ε 1 , . . ., ε n > 0 and γ 1 , . . ., γ n ∈ Γ. The following result shows that any neighborhood of the image of P ∈ P(Θ) X under the induced mixture in the compact-L ∞ topology also contains, with positive probability, the image of a sample path of a DDP or its variants under the same induced mixture. In this case, we also assume Y is compact. We defer the proof of the following result to Appendix A.6.2.
Theorem 6.5. Suppose that Y is compact, that ψ is continuous and satisfies (9) for any (y, γ) ∈ Y × Γ, and that the hypotheses of Theorem 7 in Part I hold. Then, for any P 0 ∈ P(Θ) X the event ω ∈ Ω : sup has positive probability for any K Γ ⊂ Γ compact, K X ⊂ X compact, and ε > 0.

The Kullback-Leibler divergence
The KL-divergence defines a premetric on D(Y) that induces a locally convex topology on D(Y). This topology depends on which argument is used to define the neighborhood. Due to its connection to the consistency of Bayesian procedures, we consider the neighborhood basis at P 0 ∈ D(Y) given by sets of the form for ε > 0. The product-KL topology on D(Y) Γ×X is defined as follows. A neighborhood basis for P 0 ∈ D(Y) Γ×X in the product-KL topology is given by sets of the form {P ∈ D(Y) Γ×X : KL(P 0 γ i ,x i ‖ P γ i ,x i ) < ε i , i = 1, . . ., n} for some ε 1 , . . ., ε n > 0, γ 1 , . . ., γ n ∈ Γ and x 1 , . . ., x n ∈ X . We obtain a result similar to that obtained for the Hellinger distance, under the additional assumption that Y is compact. We defer the proof of the following result to Appendix A.6.3.
Theorem 6.6. Suppose that Y is compact, and that ψ is continuous, strictly positive, and satisfies (9) for any (y, γ) ∈ Y × Γ. Furthermore, suppose the hypotheses of Theorem 5 in Part I hold. Then, for any P 0 ∈ P(Θ) X the event {ω ∈ Ω : KL(p 0 γ i ,x i ‖ q G ω γ i ,x i ) < ε i , i = 1, . . ., n} has positive probability for any ε 1 , . . ., ε n > 0, γ 1 , . . ., γ n ∈ Γ and x 1 , . . ., x n ∈ X .
The stronger compact-KL topology on D(Y) Γ×X can be defined similarly as for the Hellinger and L ∞ distances. In this topology, a neighborhood basis at P 0 ∈ D(Y) Γ×X is given by for some ε > 0, K Γ ⊂ Γ compact, and K X ⊂ X compact. Note these neighborhoods include sets of the form for ε 1 , . . ., ε n > 0 and γ 1 , . . ., γ n ∈ Γ. The following result shows that any neighborhood of the image of P ∈ P(Θ) X under the induced mixture in the compact-KL topology also contains, with positive probability, the image of a sample path of a DDP or its variants under the same induced mixture. We defer the proof of the following result to Appendix A.6.3.
Theorem 6.7. Suppose that Y is compact, and that ψ is continuous, strictly positive, and satisfies (9) for any (y, γ) ∈ Y × Γ. Furthermore, suppose the hypotheses of Theorem 7 in Part I hold. Then, for any P 0 ∈ P(Θ) X the event has positive probability for any ε > 0.

Association structure
As a consequence of Theorem 6.1, when ψ is continuous and the DDP or any of its variants has a.s. weakly continuous sample paths, the induced mixture has a.s. strongly continuous sample paths. Therefore, the process has a.s. continuous sample paths. This also holds for its expectation In some applications, it is useful to consider the parameter γ as random. Let (γ 1 , . . ., γ d ) be a random vector defined on (Ω, F, P). Then the process has a.s. continuous sample paths. By the same arguments as before, where P Γ is the probability law of (γ 1 , . . ., γ d ), is continuous.

Posterior consistency
An important property of dependent Dirichlet processes mixture models is their posterior consistency. To study the asymptotic behavior of dependent Dirichlet processes mixture models, we consider a random sample of size n given by pairs (y 1 , x 1 ), . . ., (y n , x n ). As is common in regression settings, we assume that x 1 , . . ., x n contain only exogenous covariates. The exogeneity assumption allows us to focus on the problem of conditional density estimation, regardless of the mechanism generating the predictors, that is, whether they are randomly generated or fixed by design [see, e.g., Barndorff-Nielsen, 1973, 1978, Florens et al., 1990]. Let P 0 be the true probability measure generating the predictors, admitting a density p 0 with respect to a measure ν X . By the exogeneity assumption, the true probability model for the response variable and predictors takes the form m 0 (y, x) = p 0 (x)q 0 (y | x). In this case, p 0 and {q 0 x : x ∈ X } are in free variation, where q 0 x (y) := q 0 (y | x) denotes a conditional density defined on Y for every x ∈ X . Let m ω (y, x) := p 0 (x)g ω x (y) be the random joint distribution for the response and predictors arising when {g x : x ∈ X } is a mixture model induced by a θDDP. Since the KL divergence between m 0 and the implied joint distribution m ω can be bounded as when x contains only continuous predictors, it follows that, under the assumptions of Theorem 6.7 when X is compact, for every ε > 0 Thus, by Schwartz's theorem [Schwartz, 1965], it follows that the posterior distribution associated with the random joint distribution induced by any of the proposed models is weakly consistent, that is, the posterior measure of any weak neighborhood of any joint distribution of the form m 0 (y, x) = p 0 (x)q 0 (y | x) converges to 1 as the sample size goes to infinity. This result is summarized in the following theorem.
Theorem 6.8. Suppose that the assumptions of Theorem 6.7 hold. Then, the posterior distribution associated with the random joint distribution m ω (y, x) = p 0 (x)f (y | x, G ω x ) induced by the process G X , where p 0 is the density generating the predictors, is weakly consistent, under independent sampling, at any joint distribution of the form m 0 (y, x) = p 0 (x)f 0 (y | x).
Although Theorem 6.8 assumes that x contains only continuous predictors, a similar result can be obtained when x contains only predictors with finite support (e.g., categorical, ordinal and discrete predictors) or a mix of continuous predictors and predictors with finite support. The theorem can also be extended to more general mixture models induced by the DDP.
Theorem 6.9. Suppose that the assumptions in Theorem 6.7 hold and that X is compact. Then, the posterior distribution associated with the random joint distribution where p 0 is the density generating the predictors and is weakly consistent, under independent sampling, at any joint distribution of the form where P 0 ∈ C S (X , ν Y ) and γ 0 ∈ Γ.
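The role of the shared predictor density p 0 in the KL bound can be illustrated on a toy discrete example. The sketch below is an assumption-laden illustration, not the paper's argument: it uses finite predictor and response spaces, made-up probability tables, and the standard chain-rule identity KL(m 0 ‖ m) = Σ_x p 0 (x) KL(q 0 x ‖ g x ), which we assume is the kind of bound the text refers to.

```python
import math
from typing import List


def kl_discrete(p: List[float], q: List[float]) -> float:
    """KL divergence between two probability vectors on the same finite set."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0.0)


# Hypothetical predictor law and conditional laws (true q0 and model g).
p0 = [0.2, 0.5, 0.3]
q0 = [[0.5, 0.5], [0.7, 0.3], [0.1, 0.9]]
g = [[0.4, 0.6], [0.6, 0.4], [0.2, 0.8]]

# Joint laws m0(y, x) = p0(x) q0(y | x) and m(y, x) = p0(x) g(y | x).
joint_true = [p0[x] * q0[x][y] for x in range(3) for y in range(2)]
joint_model = [p0[x] * g[x][y] for x in range(3) for y in range(2)]

kl_joint = kl_discrete(joint_true, joint_model)
kl_conditional = [kl_discrete(q0[x], g[x]) for x in range(3)]
kl_averaged = sum(p0[x] * kl_conditional[x] for x in range(3))

print(kl_joint, kl_averaged, max(kl_conditional))
```

Because p 0 cancels inside the logarithm, the joint divergence is an average of the conditional ones and is therefore controlled by their supremum over x, which is what a compact-KL neighborhood bounds.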
We have defined a DDP for general Polish spaces and introduced some parsimonious variants that may be desirable in specific applications. Furthermore, we provided sufficient conditions for different versions of DDP defined on general Polish spaces to have appealing prior theoretical properties regarding the continuity of their sample paths under different topologies, and the continuity of their autocovariance function and other, more general measures of association.
These properties are of practical importance because they ensure that different versions of the model can combine and borrow strength across sparse data sources regarding the predictors and, therefore, avoid the need for replicates of the responses at every value of the predictors to obtain adequate estimates of the predictor-dependent probability distributions.
Furthermore, we studied induced mixture models arising from a DDP or any of its variants.We provided sufficient conditions that ensure the dependent Dirichlet processes mixture model has a continuous density.

A Proof of the main results
A.1 The continuity of the DDP and its variants

A.1.1 Proof of Theorem 4.1
By hypothesis, there is a set Ω 0 ⊂ Ω of full measure such that for any ω ∈ Ω 0 we have: (i) π ω i ≥ 0; (ii) Σ_{i∈N} π ω i ≡ 1; and (iii) π ω i and θ ω i are continuous on X in the case of a DDP, θ ω i is continuous on X in the case of a wDDP, and π ω i is continuous on X in the case of a θDDP.
First, we prove the statement in the case of a DDP. Fix ω ∈ Ω 0 and x 0 ∈ X . Let f ∈ C b (Θ) and suppose, without loss of generality, that ‖f‖ C ≤ 1. Fix ε > 0 and let N ε ∈ N be such that Define the set U ε as Since π ω i and f ∘ θ ω i are continuous on X , we conclude U ε is an open neighborhood of x 0 . Observe that for any x ∈ U ε we have Let x ∈ U ε and consider the decomposition The first sum on the right-hand side can be bounded as whereas the second sum can be bounded as Consequently, Remark that in the case of a wDDP or the case of a θDDP the previous arguments can be adapted with minor modifications to prove the claim. We omit the details for brevity.

A.1.2 Proof of Theorem 4.3
By hypothesis, there is a set Ω 0 ⊂ Ω of full measure such that for any ω ∈ Ω 0 we have: (i) π ω i ≥ 0; (ii) Σ_{i∈N} π ω i ≡ 1; and (iii) π ω i and θ ω i are continuous on X in the case of a DDP, θ ω i is continuous on X in the case of a wDDP, and π ω i is continuous on X in the case of a θDDP.
First, we prove the statement in the case of a DDP. Fix ω ∈ Ω 0 and let ε > 0. Let N ε be such that x (θ) is continuous at x 0 by hypothesis. Hence, the set It is clear that at least one θ ω i,x belongs to B. Note B is a finite set. Hence, by continuity, such θ ω i,x must be constant on U ε . This proves the claim in the case of a DDP. For the wDDP the previous arguments can be adapted with minor modifications to prove the claim. We omit the details for brevity.

A.1.3 Proof of Theorem 4.4
In the case of a θDDP, for every ω ∈ Ω we have sup Let Ω 0 ⊂ Ω be a set of full measure such that ω ∈ Ω 0 implies V ω i is continuous for every i ∈ N. Fix ω ∈ Ω 0 .Then π ω i is continuous for every i ∈ N. We follow a similar argument as in the proof of Theorem 4.1 in Appendix A.1.1.Let ε > 0 and let N ε ∈ N be such that Define the open neighborhood U ε of x 0 as For any x ∈ U ε we have Consequently, for any x ∈ U ε we have from where the theorem follows.

A.2 The support of the DDP and its variants
We first prove the following auxiliary result.Lemma A.1.Let P : X → P(Θ) and f 1 , . . ., f n ∈ L ∞ (Θ).Define Let K ⊂ X be compact.If F 1 , . . ., F n : K → R are continuous, then for any ε > 0 there exists P : K → P(Θ) X such that: 1.For any x ∈ K we have 3. The collection { Px : x ∈ K} is tight.
Proof of Lemma A.1. Let ε > 0. We begin by constructing a suitable partition of unity in K. Since F 1 , . . ., F n are uniformly continuous on K there exists δ > 0 such that From the open cover {B(x, r)} x∈K we can extract the finite subcover {B(x k , r)} N K k=1 . We construct a continuous partition of unity on K subordinate to this cover as follows. Define the continuous functions for k ∈ {1, . . ., N K }. Note that x ∉ B(x k , r) implies φ k (x) = 0. It can be verified that for any x ∈ K. Therefore, we define the continuous functions which satisfy and P x := These functions satisfy To prove (a) note that F i satisfies for any x ∈ K. To prove (b) note that for any B ∈ B(Θ) we have which is continuous by construction. To prove it is Lipschitz, note that Finally, to prove (c) note that Θ is Polish. Hence, the collection P x 1 , . . ., P x N K is tight. For any ε > 0 there exists However, since ϕ i ≥ 0 we have proving the claim.
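The partition-of-unity step in this proof can be sketched numerically. The construction below is an assumption on our part, since the defining display is not reproduced above: it takes K = [0, 1], uses the unnormalized bumps max(0, r − d(x, x k )) and normalizes them to sum to one. The names `bump`, `partition_of_unity`, and `interpolate` are hypothetical.

```python
import math
from typing import Callable, List


def bump(x: float, center: float, radius: float) -> float:
    """Continuous bump that is positive exactly on the open ball B(center, radius)."""
    return max(0.0, radius - abs(x - center))


def partition_of_unity(x: float, centers: List[float], radius: float) -> List[float]:
    """Normalized bumps; the balls must cover x so that the total is positive."""
    raw = [bump(x, c, radius) for c in centers]
    total = sum(raw)
    return [v / total for v in raw]


def interpolate(f: Callable[[float], float], x: float,
                centers: List[float], radius: float) -> float:
    """Convex combination of the values f(x_k) with weights from the partition."""
    weights = partition_of_unity(x, centers, radius)
    return sum(w * f(c) for w, c in zip(weights, centers))


centers = [0.05 * k for k in range(21)]  # finite subcover of K = [0, 1]
radius = 0.05


def target(x: float) -> float:
    return math.sin(3.0 * x)  # Lipschitz with constant 3


worst = max(abs(interpolate(target, j / 1000.0, centers, radius) - target(j / 1000.0))
            for j in range(1001))
print(worst)
```

Only centers within distance r of x receive positive weight, so the interpolation error is bounded by the oscillation of the target over a ball of radius r, mirroring how uniform continuity of F 1 , . . ., F n is used in the proof.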

A.2.1 Proof of Theorem 4.5
By possibly modifying the sequence ε 1,1 , . . ., ε n,n we can assume, without loss of generality, that ‖f i,j ‖ C ≤ 1. Let Ω 0 ⊂ Ω be a set of full measure such that for any ω ∈ Ω 0 we have: (i) π ω i ≥ 0; (ii) Σ_{i∈N} π ω i ≡ 1; (iii) π ω i and θ ω i are continuous on X in the case of a DDP, θ ω i are continuous on X in the case of a wDDP, or π ω i are continuous on X in the case of a θDDP. Let ε > 0 be such that ε < ε 0 := min(ε 1,1 , . . ., ε n,n ) and define the event By inspection, the event is measurable. It suffices to show that this event has positive probability. The first step of the proof is the same regardless of the variant.
Since Θ is Polish, the collection P 0 x 1 , . . ., P 0 x n is tight. Let K Θ ⊂ Θ be a compact set such that P 0 x j (Θ \ K Θ ) < ε for j ∈ {1, . . ., n}. Since f 1,1 , . . ., f n,n are uniformly continuous on K Θ , there exists δ > 0 such that Let r < δ/2. Then {B(θ, r)} θ∈K Θ is an open cover of K Θ and we can extract a finite subcover {B(θ k , r)} N Θ k=1 . By possibly removing elements, we may assume no ball is covered by the union of the remaining ones. This subcover induces the partition of K Θ . Remark that no A k is empty and that A k ⊂ B(θ k , r). Therefore, no f i,j varies by more than 2ε over any A k . Note that and Therefore, Since f 1,1 , . . ., f n,n are continuous, define the open subsets and the event which has positive measure by hypothesis. Note that for any ω ∈ Ω π we have The second step of the proof changes slightly in the case of a DDP, a wDDP, and the θDDP. We present it first in the case of a DDP. For ω ∈ Ω π we have Consider the event which has positive measure by hypothesis as each U k is open. Since the events Ω π and Ω θ are independent, the event Ω π ∩ Ω θ has positive measure. Hence, for any ω ∈ Ω π ∩ Ω θ we have for any i, j ∈ [n]. This proves the theorem in the case of a DDP.
To prove the theorem in the case of a θDDP, the key inequality is Note that it suffices to consider the event which has positive measure as the U k are open.The rest of the argument is essentially the same as in the case of a DDP.
The proof in the case of a wDDP follows the same first step as in the other cases. However, the remainder of the proof is different and slightly more involved. Let M ∈ N be such that there are integers m j,k ∈ [M ] such that Note that for this choice, for any j ∈ Define the event which has positive measure by hypothesis. Note that for ω ∈ Ω π we have Hence, for every j we have where I j,k ⊂ [M ] is an arbitrary subset such that |I j,k | = m j,k . Therefore, for every j ∈ [n] we let {I j,k } N Θ k=1 be a collection of disjoint subsets of [M ] for which |I j,k | = m j,k and we let I 0 j be the complement of their union. Note that Consequently, in the case of a wDDP we obtain the inequality Hence, consider the event which has positive measure by hypothesis. By independence, Ω π ∩ Ω θ has positive measure, and for ω ∈ Ω π ∩ Ω θ we have Therefore, proving the theorem in the case of a wDDP.

A.2.2 Proof of Theorem 4.6
By possibly modifying the sequence ε 1 , . . ., ε n we can assume, without loss of generality, that ‖f i ‖ C ≤ 1. Let Ω 0 ⊂ Ω be a set of full measure such that for any ω ∈ Ω 0 we have: (i) π ω i ≥ 0; (ii) Σ_{i∈N} π ω i ≡ 1; (iii) π ω i and θ ω i are continuous on X in the case of a DDP, or π ω i are continuous on X in the case of a θDDP. For simplicity, we write for ω ∈ Ω By Theorem 4.1, F ω i is continuous on X for almost every ω ∈ Ω. By possibly removing from Ω 0 a null set, we may assume F ω i is continuous for every ω ∈ Ω 0 and every i ∈ [n]. Furthermore, we write which, by hypothesis, is a continuous function on X . Since Θ is separable, a compact K ⊂ Θ is also separable. By hypothesis F i is continuous and by construction F ω is continuous for every ω ∈ Ω 0 . Hence, the event is measurable, and the theorem follows if we show that this event has positive measure. Let ε > 0 with ε < ε 0 . Since K is compact, by Lemma A.1 there is a function P : Hence, on K. Finally, since f 1 , . . ., f n are continuous, we can use the same construction as that in the proof of Theorem 4.5 in Appendix A.2.1 to obtain a partition {A k } N Θ k=1 of K Θ as in (10) where each A k is measurable, non-empty, and every f i varies at most by 2ε over any A k . Then Since x → P x (A k ) is continuous by Lemma A.1, with values on [0, 1], and 2, and has positive measure by hypothesis. The proof now proceeds exactly as the proof of Theorem 4.5 in Appendix A.2.1 for the DDP and θDDP. We omit the details for brevity.

A.2.3 Proof of Theorem 4.7
Let Ω 0 ⊂ Ω be a set of full measure such that for any ω ∈ Ω 0 we have: (i) π ω i ≥ 0; (ii) Σ_{i∈N} π ω i ≡ 1; (iii) π ω i and θ ω i are continuous on X for the DDP, θ ω i are continuous on X in the case of a wDDP, or π ω i is continuous on X in the case of a θDDP.
We prove in detail the statement in the case of a DDP.If P 0 ∈ P(Θ) X | G X then we consider the event ω ∈ Ω : which is measurable by inspection.A standard argument shows that, by possibly reducing ε 1,1 , . . ., ε n,n , we can assume f i,j are simple functions, and that there exists a partition where I A is the indicator function of the set A and |c i,j,k | ≤ 1 for any i, j ∈ {1, . . ., n} and k ∈ {1, . . ., N f }.
Consider the event which has positive measure by hypothesis. Remark that, in this case, for any j ∈ [n] we have where we used the fact that k=1 is a partition. Hence, Now we use the fact that and so such terms do not contribute to the sum. We can define the event which has positive measure. By independence Ω π ∩ Ω θ has positive measure. Hence, for ω ∈ Ω π ∩ Ω θ we have Therefore, proving the theorem in the case of a DDP.
To prove the theorem in the case of a θDDP, the argument is the same up to the inequality In this case, we define the event Since P 0 ∈ P(Θ) X | G X this event has positive measure.The proof then follows the same steps as those in the case of a DDP.
Finally, in the case of a wDDP we follow a similar argument as that in the proof of Theorem 4.5 in Appendix A.2.1. By choosing a suitable M such that there are integers m j,k ∈ {1, . . ., M } such that we can define the event which has positive measure by hypothesis, and for every j ∈ {1, . . ., n} we can define a collection {I j,k } N f k=1 of disjoint subsets of {1, . . ., M } for which |I j,k | = m j,k . We let I 0 j be the complement of their union. Note that for ω ∈ Ω π we have Hence, we obtain the inequality By hypothesis, this has positive measure and, by independence, so does Ω π ∩ Ω θ . The proof then follows the same arguments as those in the proof of Theorem 4.5 in Appendix A.2.1 in the case of a wDDP. We omit the details for brevity.

A.2.4 Proof of Theorem 4.8
The proof is similar to the proof of Theorem 4.6 in Appendix A.2.2 with minor modifications. Let Ω 0 ⊂ Ω be a set of full measure such that for any ω ∈ Ω 0 we have: (i) π ω i ≥ 0; (ii) Σ_{i∈N} π ω i ≡ 1; (iii) π ω i and θ ω i are continuous on X in the case of a DDP, or π ω i is continuous on X in the case of a θDDP. Furthermore, by possibly reducing Ω 0 by a null set, we may assume We prove the theorem in detail in the case of a DDP. Suppose C S (X , P(Θ)) ∩ P(Θ) X | G X is non-empty, as otherwise there is nothing to prove. Our goal is to show that for has positive probability. As argued in the proof of Theorem 4.6 in Appendix A.2.2, we can assume there exists a partition {A k } N f k=1 of Θ of measurable sets such that By hypothesis, if G 0 (A k ) = 0 then P 0 x (A k ) = 0 for any x ∈ K. Hence, without loss of generality, we can assume G 0 (A k ) > 0. Note that in this case, we have Consider the event Since P 0 ∈ C S (X , P(Θ)) we see that x → P 0 x (A k ) is continuous, whence Ω π has positive measure by hypothesis. Remark that, in this case, for any j ∈ [n] we have where we used the fact that k=1 is a partition. Hence, Since P 0 ∈ P(Θ) X | DDP , the event . ., N f }} has positive measure. The proof then proceeds exactly as the proof of Theorem 4.6 in Appendix A.2.2. We omit the proof in the case of a θDDP for brevity.

A.3 Association structure of the DDP and its variants

A.3.1 Proof of Theorem 4.9
We prove the theorem in the case of a DDP as the arguments are the same in the case of a wDDP. To prove the theorem we proceed as follows. We first show that for any d ∈ N is continuous. Then, by leveraging the Stone-Weierstrass theorem [Reed and Simon, 1980], we can approximate any continuous f uniformly over the hypercube by a polynomial. Since the expectation of this polynomial is continuous by our first claim, the theorem follows.
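The uniform polynomial approximation invoked in this strategy can be made concrete in one dimension with Bernstein polynomials, which give a constructive instance of the Stone-Weierstrass theorem on [0, 1]. This is only an illustrative sketch for d = 1 and is not the construction used in the proof; the names `bernstein`, `sup_error`, and `kink` are hypothetical.

```python
import math
from typing import Callable


def bernstein(f: Callable[[float], float], n: int, x: float) -> float:
    """Value at x of the degree-n Bernstein polynomial of f on [0, 1]."""
    return sum(math.comb(n, k) * f(k / n) * x ** k * (1.0 - x) ** (n - k)
               for k in range(n + 1))


def sup_error(f: Callable[[float], float], n: int, grid_size: int = 201) -> float:
    """Largest approximation error over an evenly spaced grid on [0, 1]."""
    points = [j / (grid_size - 1) for j in range(grid_size)]
    return max(abs(bernstein(f, n, x) - f(x)) for x in points)


def kink(x: float) -> float:
    """A continuous but non-smooth test function."""
    return abs(x - 0.5)


for degree in (25, 100, 400):
    print(degree, sup_error(kink, degree))
```

The grid error shrinks as the degree grows, so continuity of the expectation of each polynomial transfers to the expectation of f through a uniform limit, as in the argument above.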
Define the sequence of functions h n : Since the summands are a.e. non-negative, h n ≤ h n+1 . Furthermore, h n ∈ [0, 1] for any n ∈ N, and for a.e. ω ∈ Ω and x 1 , . . ., x d ∈ X .
We first control the expectation for fixed x 1 , . . ., x d ∈ X .Write h x n (ω) = h n (x 1 , . . ., x d , ω) for simplicity, and define the function g n : X d → R as where in the last equality we used the hypothesis of independence.Since the sequence {h x n } is monotone non-decreasing, by the monotone convergence theorem we have Hence, the sequence {g n } converges pointwise to g ∞ over X d .To conclude g ∞ is continuous, we will show that {g n } is a sequence of continuous functions, to then use a uniform approximation argument to show the continuity of g ∞ near any (x 1 , . . ., x d ) ∈ X d .
To show g n is continuous, note that for a.e.ω the functions x → π ω i,x are continuous for any i ∈ N. Therefore, for any i 1 , . . ., i d ∈ N the function is continuous by hypothesis.
We now show continuity of g ∞ near any (x 1 , . . ., x d ) ∈ X d using a uniform approximation.Fix (x 1 , . . ., x d ) ∈ X d and let ε > 0. Remark that for n > m we have the bound Hence, we can control the differences on the left-hand side by controlling the expectations on the right-hand side.By similar arguments as those used in the proof of Theorem 4.1 in Appendix A.1.1, for any δ < ε/3 we can find N δ ∈ N such that for every k ∈ {1, . . ., d}.We define the open neighborhood Therefore, for n > m > N δ we have In particular, this implies sup Then, for any (x 1 , . . ., x d ) ∈ W ε we have Now, let f : [0, 1] d → R be a continuous function.Since [0, 1] d is compact, f is bounded, and the function is well-defined.We now show it is continuous.Let (x 1 , . . ., x d ) ∈ X d and fix ε > 0. Since [0, 1] d is compact, f can be approximated uniformly by a polynomial p : is a continuous function by our previous result.Furthermore, if we define the open neighborhood proving the claim.
The second term can be controlled using the weak continuity of P .In fact, it suffices to consider the open set For the first term, consider first For the second, note that proving the claim.
A.4.2 Proof of Lemma 5.2
Consider the sequence of functions {q n } n∈N for q n : Y × Γ × X × Ω → R + given by Let Ω 0 ⊂ Ω be a set of full measure such that: (i) π ω i ≥ 0; (ii) Σ_{i∈N} π ω i ≡ 1; and (iii) both π ω i and θ ω i are continuous on X for any i ∈ N. If we restrict the functions to Y × Γ × X × Ω 0 then q n ≥ 0 and {q n } is a monotone non-decreasing sequence. Hence, q ω ∞,x (y) := lim n→∞ q ω n,x (y) is well-defined for a.e. ω. By the monotone convergence theorem, ∫ q ω ∞,x (y) dν Y (y) ≡ 1.
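The monotone truncation argument can be visualized with a small simulation at a fixed predictor value. The sketch below makes several assumptions that are not in the text: weights come from a Beta(1, M) stick-breaking scheme, atoms are standard normal draws, and ψ is a Gaussian kernel with a fixed scale. All names are hypothetical.

```python
import math
import random
from typing import List


def stick_breaking_weights(n: int, mass: float, rng: random.Random) -> List[float]:
    """First n stick-breaking weights with Beta(1, mass) stick proportions."""
    weights, remaining = [], 1.0
    for _ in range(n):
        v = rng.betavariate(1.0, mass)
        weights.append(v * remaining)
        remaining *= 1.0 - v
    return weights


def gaussian_kernel(y: float, theta: float, scale: float = 0.5) -> float:
    """Assumed kernel psi(y, theta): normal density with mean theta."""
    z = (y - theta) / scale
    return math.exp(-0.5 * z * z) / (scale * math.sqrt(2.0 * math.pi))


def truncated_density(y: float, weights: List[float], atoms: List[float], n: int) -> float:
    """Partial sum q_n(y) over the first n weight/atom pairs."""
    return sum(w * gaussian_kernel(y, t) for w, t in zip(weights[:n], atoms[:n]))


rng = random.Random(0)
weights = stick_breaking_weights(200, mass=1.0, rng=rng)
atoms = [rng.gauss(0.0, 1.0) for _ in range(200)]

# q_n(y) is non-decreasing in n, and its total mass equals the sum of the
# first n weights because each kernel integrates to one.
q_values = [truncated_density(0.3, weights, atoms, n) for n in range(1, 201)]
print(q_values[0], q_values[-1], sum(weights))
```

Since every summand is non-negative, the partial sums increase pointwise, and the accumulated weight approaches one, which is the mechanism behind the monotone convergence step in the proof.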
To prove continuity for a.e.ω, fix ω ∈ Ω 0 .Let y 0 ∈ Y, x 0 ∈ X and γ 0 ∈ Γ.By hypothesis, there exists an open neighborhood U y 0 of y 0 and U γ 0 of γ 0 such that This set is open by continuity of θ ω i .Then, for any (y, γ, x) Therefore, proving the lemma.
A.5 The continuity of mixtures induced by dependent Dirichlet processes

A.5.1 Proof of Theorem 6.1
Fix P ∈ C W (X , P(Θ)). To simplify notation, we drop the superscript on M P and ρ P . Let ε > 0. We will prove that there exists a neighborhood of (γ 0 , x 0 ) such that The strategy is to find suitable compact subsets of Y and Θ where the measures M and P x 0 are concentrated near (γ 0 , x 0 ). Then, we leverage the continuity of ψ and the weak continuity of P .
First, since Y is Polish there exists K Y ⊂ Y compact such that Second, since Θ is Polish, there exists K Θ ⊂ Θ compact such that We now construct a suitable cover for Similarly, let r y,θ > 0 be such that 2r y,θ < δ y,θ . Then, for every y ∈ K Y the collection is an open cover of the compact set {y} × {γ 0 } × K Θ . Hence, there exists a finite subcover {B(y, r y,θ ℓ ) × B(γ 0 , r y,θ ℓ ) × B(θ ℓ , r y,θ ℓ )} N y ℓ=1 .
Define the open neighborhood of K Furthermore, for any (y', γ', θ') ∈ W y there exists, by construction, θ ∈ K Θ such that Hence, for some ℓ ∈ {1, . . ., N y } we have Furthermore, for the same choice of θ we also have d Y (y', y) < δ y,θ and d Γ (γ', γ 0 ) < r y,θ . Hence, we deduce that . By possibly removing elements, we may assume no ball is covered by the union of the remaining ones. Note that {B(y k , r y k )} N Y k=1 is a cover for K Y with the same property. We partition K Y into the sets Additionally, we let Finally, consider the open set Then, there exists a continuous function h : Θ → [0, 1] such that h| K Θ ≡ 1 and h| U c Θ ≡ 0. From K Θ and U Θ we obtain the decomposition ψ R (y, γ, θ) := (1 − h(θ))ψ(y, γ, θ) and ψ 0 (y, γ, θ) := h(θ)ψ(y, γ, θ).
We define ρ R , ρ 0 , M R and M 0 similarly.By construction ψ 0 is supported on U Θ and can be approximated by a continuous function near (γ 0 , x 0 ).Consider the decomposition It suffices to choose as an open neighborhood for x 0 .Hence, we obtain the bound We now show the remaining terms do not concentrate too much mass.By construction, Furthermore, by Fubini's theorem, where we used the fact that 1 − h is supported on Θ \ K Θ .It follows that Finally, from the decomposition This proves the theorem.

A.6 The support of mixtures induced by dependent Dirichlet processes
To characterize the support of induced mixtures in different topologies, we will repeatedly use the following lemma. It provides control on the behavior of induced mixtures uniformly over weakly continuous P : X → P(Θ).
Lemma A.2. Suppose ψ is continuous and that for every y 0 ∈ Y, γ 0 ∈ Γ and ε > 0 there exists The following assertions are true.
We now consider the case for an arbitrary compact set K Y×Γ . From the open cover {B((y, γ), r (y,γ) /2)} (y,γ)∈K Y×Γ we can extract a finite subcover {B((y k , γ k ), r (y k ,γ k ) /2)} N k=1 . Remark the radius of the cover is half of that obtained in the previous step. Define for any P : X → P(Θ) and x ∈ X . Since the supremum over X can be at most ε' with ε' < ε, this proves the lemma.
Proof of 2. The hypotheses allow us to conclude from Theorem 6.1 that for any weakly continuous P : X → P(Θ) the map Q P : Γ × X → P(Y) is strongly continuous. Hence, for every (γ, x) ∈ K Γ×X let K γ,x ⊂ Y be a compact set such that Then, let U γ,x ⊂ Γ × X be an open neighborhood of (γ, x) such that ∀ (γ', x') ∈ U γ,x : Hence, {U γ,x } (γ,x)∈K Γ ×K X is an open cover of K Γ×X from which we can extract a finite subcover {U γ k ,x k } N k=1 . Let Then, for every (γ, x) ∈ U K Γ×X there exists (γ k , x k ) such that Hence, we can choose the compact set proving the claim.
As a consequence of the lemma, if the hypotheses of Theorem 4.1 hold, then for every ω on a set of full measure we have The lemma allows us to use essentially the same argument to characterize the support both in the product and compact-open topologies as follows. The statements in the case of the compact-open topology involve the supremum over a compact set K Γ × K X ⊂ Γ × X and P 0 : X → P(Θ) weakly continuous. The statements in the case of the product topology can be reduced to this as follows. First, in the case of the product topology we need to consider a finite set {(γ i , x i ) : i ∈ [n]}. We will see this is equivalent to bounding a supremum over the compact set or by considering first the compact sets and defining K Γ×X := K Γ × K X . Second, in the case of the product topology we make no assumptions about the continuity of P 0 : X → P(Θ). However, since only its values on the finite set {x i : i ∈ [n]} are relevant, we can leverage Lemma 1 in Part I to replace P 0 by its weakly continuous interpolant P 0 . Once we have performed this reduction, Lemma A.2 will allow us to prove the desired results using analogous arguments for the product and compact-open topologies.

A.6.1 Proof of Theorems 6.2 and 6.3
In the case of the product-Hellinger topology, we define the compact sets K Γ := {γ i : i ∈ [n]} and K X := {x i : i ∈ [n]}. Furthermore, we let K Γ×X = K Γ × K X and we let ε 0 < min{ε 1 , . . ., ε n }. As indicated before, over the finite set K X we can assume without loss of generality that P 0 is weakly continuous. For the compact-Hellinger topology, we define K Γ×X = K Γ × K X and let ε 0 < ε.
Both Theorems 6.2 and 6.3 follow if we show the above event has positive probability.
To prove the theorems, it remains to bound the integral in the right-hand side.
The hypotheses of Theorem 6.2 allow us to apply Theorem 5 in Part I to show the event ω ∈ Ω : has positive probability.This proves Theorem 6.2.
The hypotheses of Theorem 6.3 allow us to apply Theorem 7 in Part I to show the event ω ∈ Ω : sup has positive probability. This proves Theorem 6.3.

A.6.2 Proof of Theorems 6.4 and 6.5
In the case of the product-L ∞ topology we can define the compact set K Γ×X ⊂ K Γ × K X as in Appendix A.6.1 and let ε 0 < min{ε 1 , . . ., ε n }. As indicated before, over this finite set we can assume without loss of generality that P 0 is weakly continuous. For the compact-L ∞ topology, we define K Γ×X = K Γ × K X and let ε 0 < ε.
This reduction allows us to consider the event ω ∈ Ω : sup Both Theorems 6.4 and 6.5 follow if we show the above event has positive probability. Let ε ∈ [0, 1) be such that ε < ε 0 .
Let K Y×Γ = Y × K Γ which, by hypothesis, is compact. From Lemma A.2 there exists δ > 0 such that for any (y', γ'), (y, γ) ∈ K Y×Γ we have has positive probability. This proves Theorem 6.5.
This case is of practical interest in statistical applications. In addition, we provided sufficient conditions under which dependent Dirichlet processes mixture models have full or large support, considering different topologies, and studied the behavior of the posterior distribution under i.i.d. joint sampling of responses and predictors. The study of stronger consistency results and concentration rates is the subject of ongoing research.
Finally, the results provided in this article can be easily extended to more general dependent stick-breaking processes.

Científico y Tecnológico (FONDECYT) grant No 1211643 and through grant NCN17_059 from Millennium Science Initiative Program, Millennium Nucleus Center for the Discovery of Structures in Complex Data (MIDAS).