Colombian Women’s Life Patterns: A Multivariate Density Regression Approach

. Women in Colombia face diﬃculties related to the patriarchal traits of their societies and well-known conﬂict aﬄicting the country since 1948. In this critical context, our aim is to study the relationship between baseline socio-demographic factors and variables associated to fertility, partnership patterns, and work activity. To best exploit the explanatory structure, we propose a Bayesian multivariate density regression model, which can accommodate mixed responses with censored, constrained, and binary traits. The ﬂexible nature of the models allows for nonlinear regression functions and non-standard features in the errors, such as asymmetry or multi-modality. The model has interpretable covariate-dependent weights constructed through normalization, allowing for combinations of categorical and continuous covariates. Computational diﬃculties for inference are overcome through an adaptive truncation algorithm combining adaptive Me-tropolis-Hastings and sequential Monte Carlo to create a sequence of automatically truncated posterior mixtures. For our study on Colombian women’s life patterns, a variety of quantities are visualised and described, and in particular, our ﬁndings highlight the detrimental impact of family violence on women’s choices and behaviors.


Introduction
Colombian women face difficulties that are quite typical in Latin American countries, particularly related to the patriarchal traits of their society. Nonetheless, the welfare of Colombian women is possibly more critical due to the conflict between state military forces, paramilitaries, and guerrilla groups that has afflicted the country since 1948. As underlined by Gimenez Duarte et al., dramatic subnational inequalities exist in every indicator, especially within low-income, low-education, and rural populations. Reinforcing constraints, such as "limited and gender-unequal economic opportunities, exclusion from quality endowments among marginalized populations, and social norms and gender roles that relegate unpaid care work to women and tolerate violence against them (emotional, physical and sexual), affect young women's choices and actions with respect to life plans and fertility decisions"(Gimenez Duarte et al., 2015, p. 5). In particular, despite significant progress since 2000, teenage pregnancy rates in Colombia are still very high. The majority of teenage pregnancies remain unplanned, signaling a lack of opportunity and agency for young girls. Different studies discuss the detrimental effects of teenage pregnancy (see e.g., Gimenez Duarte et al., 2015;Azevedo et al., 2012) and its socio-demographic drivers, such as poverty, low levels of education, and living in rural areas. In such a critical context, we are interested in studying women's life events, focusing on the interplay between sexual initiation (debut), fertility, partnership, and participation in the labor market. Thus, rather than focusing on a specific life event, as in previous relevant studies (e.g. Gimenez Duarte et al., 2015;Azevedo et al., 2012;Restrepo Martínez et al., 2017), we adopt a broader perspective, considering a collection of events describing transition to adulthood and their relation with a set of structural baseline characteristics of the women's environment and family. Besides some of the well known relevant factors, such as cohort, region, and area (urban or rural) of residence, we also study whether a violent family context contributes to shape transition to adulthood and possibly impairs women's agency.
To this purpose, we analyze data arising from the survey conducted in Colombia in 2010 as a part of the Demographic and Health Survey Program (DHS, https: //www.dhsprogram.com). The data are cross-sectional, thus, no follow-up information on the life events of interest is recorded. Specifically, information is available on the age when the considered focal events (sexual debut, marriage or cohabitation, motherhood) were experienced for the first time, whereas work information concerns only the employment status of the woman (working or not) at the moment of the interview. Thus, we jointly analyze response variables with different levels of measurements (times at event and binary variables). Additionally, the focal events may not have been experienced (right-censoring) and are subject to constraints, e.g. motherhood can only occur after sexual debut. Furthermore, the available set of baseline explanatory variables is limited, so that heterogeneity may be present which would not be properly captured by a parametric model. This encourages the use of a flexible model to best exploit the explanatory structure without imposing possibly penalizing constraints. Additionally, such a model can encompass competing sociological theories which may be relevant in different subpopulations.
The data present various features that challenge existing parametric and semiparametric models (e.g. Korsgaard et al., 2003;Jara et al., 2010;Hanson and Johnson, 2004;Kottas and Gelfand, 2001). First, some women postpone the events to relatively late in life, which induces right-skewed distributions. Also, the joint relationships between the age-at-event variables show different patterns, with gaps of various lengths between events. Moreover, these behaviors change depending on the covariates. Modeling such dependence structure is an ambitious task, requiring a model that allows for i) nonlinear response curves, ii) non-normal distributions whose features may change with the covariates, iii) multivariate response and covariates of mixed nature, and iv) censoring and constraints of the responses. To the best of our knowledge, a model that can simultaneously deal with these issues does not exist.
We propose a Bayesian multivariate density regression model that extends the univariate model of Antoniano-Villalobos et al. (2014) to the case of multiple mixed-type responses with censoring and constraints. This approach is promising for our data, due to its ability to capture their peculiar features. Our infinite mixture model has interpretable covariate-dependent weights constructed through normalization, allowing for combinations of categorical and numerical covariates. In addition, the multivariate approach permits us to study the joint relationship between the response variables, for example by considering one response conditioned on the others. With data on over 10,000 women and a multivariate response and covariate, the Markov chain Monte Carlo (MCMC) algorithm originally proposed for the univariate model becomes unsuitable. We therefore propose an algorithm for posterior inference that extends the adaptive truncation scheme of Griffin (2016).
The paper is structured as follows. Section 2 describes the data. The model and posterior simulation algorithm are presented in Sections 3 and 4, respectively. The results for the data on Colombian women are analyzed in Section 5. Section 6 summarizes and concludes. In addition, the Supplementary Material (SM) includes derivations and details for predictive inference, as well as additional results for both the simulated data example and the case study.

The data
The DHS Program collects and disseminates data on random samples of households selected from random clusters from a national sampling frame 1 . The 2010 survey in Colombia was conducted by the Profamilia association, and we refer to the final report for a detailed description of its features (Ojeda et al., 2011). Since all women of childbearing potential (i.e. aged 13-49) in the same household were interviewed, we randomly select at most one case from each household to avoid unwanted dependencies.
To describe the characteristics of fertility and partnership patterns, we consider the discrete variables recording the ages at Sexual Debut (Z 1 ), at Union (first marriage or cohabitation, Z 2 ), and at First Child (Z 3 ). Work Status is recorded as a binary variable (Z 4 ) indicating whether the respondent worked in the 12 months before the interview. We exclude women who gave inconsistent information, namely, those who report the birth of the first child as preceding the first sexual intercourse, and those who report union with a partner but for whom sexual intercourse never occurred. We also filter out women who experienced sexual violence or were forced to have sex in exchange for money, since their union and childbearing choices may be related to the experienced violence. By the same reasoning, we remove women who were forced to use contraceptive methods. Thus, we attempt to focus as much as possible on life choices and plans rather than on events imposed by circumstances, even if the latter may be unknown and unmeasured, so that the observed events may not necessarily reflect choices.
1 Note that the data arise from a complex survey design, and thus have associated weights with a complicated structure. Moreover, additional post-stratification is carried out to adjust the weights for various factors, such as the total number of women interviewed in each municipality, non-response, etc. (see Ojeda et al., 2011, for full details on the survey design and weighting scheme for the Colombian survey). A discussion on the suitability of accounting for such sampling weights is offered in the SM. We focus on the relationship between the responses and some baseline socio-demographic factors. First, we consider the woman's Age (in years) at the moment of interview. We focus on women aged 15 or more, as most younger women had not yet experienced any event at the time of the survey. Next, we include the type of Area (urban or rural) where the respondent lives, as well as her Region. Following the partition used in the DHS dataset (as well as in the report by Ojeda et al., 2011) we consider the five regions Atlantica, Oriental, Central, Pacifica, and Territorios Nacionales 2 ; the capital Bogota, located in the Oriental region, is treated as a sixth region because of its peculiar features in terms of social and economic development. A map of Colombia and the considered regions is reported in the SM. Since information is only available on the current region of residence and on the age when respondents moved there, we limit attention to women who were raised in the current region at least from the age of 6, to properly account for regional effects. Moreover, to assess the respondent's wellbeing in her original family, we consider whether in her childhood she was disciplined using Physical Punishment (spanking, hitting, pushing, throwing water), and whether she was exposed to Parental Domestic Violence and ever witnessed her father beating her mother. All respondents missing at least one explanatory or response variable are excluded from the dataset.
Even if the DHS dataset is very rich, including other covariates is not straightforward. Most of the variables refer to the moment of interview, and thus cannot be considered as antecedents of the focal events. For example, although it would be interesting to include information regarding education and wealth, only the highest level of education attained and the wellness of the respondent's family at the moment of interview are available. Another relevant aspect that could be taken into account concerns women's ethnicity. However, most (about 80%) of the interviewed women do not recognize themselves as part of an ethnic minority. Furthermore, those who do belong to a heterogeneous variety of ethnic groups, none of which are sufficiently represented in the sample. We therefore exclude ethnic minorities from our study.
Our final dataset consists of n = 10740 women. Table 1 reports a summary of the number of censored cases for the first three response variables within age groups. Figures 1 and 2 offer some insights about the distinctive features of the densities of the ages at events and of their relationships (smoothed regressions) with the Age at interview across different covariates values. Even if the displayed results only relate to Figure 1: Kernel density estimates of (non-censored) ages at events conditioned to Age (at interview, in groups) and area of residence (Urban or Rural). non-censored cases, they emphasize the difficulties implied by modeling the dependence structures. To achieve such an ambitious task, in the next section we propose a model that allows for non-linear response curves (see e.g. Figure 2), non-normal distributions whose features may change with the covariates (see e.g. Figure 1), multivariate response and covariates of mixed nature, and censoring of the responses 3 .

Bayesian nonparametric density regression
We develop a Bayesian nonparametric mixture model that can capture the relationship between n conditionally independent d-dimensional response vectors, Z i , and a vector x * i of predictors. To simplify notation, whenever possible we drop the sub-index i indicating individual observations. The predictors x * = (x 1 , . . . , x p , x * p+1 , . . . , x * q * ) may be of mixed nature. Without loss of generality, we assume that the first p are numerical while the rest are categorical. As is common in regression models, we expand the categorical predictors with binary dummy variables and let x = (x 1 , . . . , x p , x p+1 , . . . , x q ), where q = p + q * k=p+1 (R k − 1) and R k denotes the number of categories of x * k . The observed responses are also of mixed nature. For example, in our application, we consider two types of responses: three positive integer-valued variables with possible censoring and constraints, representing the ages at events, and one binary variable indicating work status. In this case, we refer to the density of the mixed response Z = (Z 1 , . . . , Z d ) with respect to the appropriate measure, e.g. Lebsegue or counting measure, for each response type. To frame our model within existing literature, we review some related contributions. Bayesian nonparametric mixture models (Lo, 1984) are useful tools for density estimation, due to their attractive balance between flexibility and smoothness and ability to recover a wide range of densities (Ghosh and Ramamoorthi, 2003, Chapter 5). Further developments for conditional density estimation, also known as density regression, can be found in the pioneering works of Müller et al. (1996) and MacEachern (1999). Extensions of the former for categorical responses can be found in Shahbaba and Neal (2009). We focus on the latter work of MacEachern (1999), which extends the Bayesian nonparametric mixture model by allowing the mixing measure to depend on the covariates. This yields flexible density regression, where the entire density and not only the mean is regressed on the covariates. Several approaches exist in literature to specify the covariate-dependent mixing measure, but it is not clear how to choose between them. Examples include single-p dependent Dirichlet processes (MacEachern, 2000;De Iorio et al., 2004), with covariate-dependent component parameters but single weights. Alternatively, numerous proposals have been introduced for covariate-dependent weights (Griffin and Steel, 2006;Dunson and Park, 2008;Rodriguez and Dunson, 2011, to name a few). In terms of the support and consistency properties of the model, the distinc-tion is irrelevant (see e.g. Barrientos et al., 2012Barrientos et al., , 2017Pati et al., 2013). However, the analysis of Wade et al. (2014b) suggests that prediction issues may arise at moderate sample sizes for single weights mixtures with linear predictors, when the true relation between response and covariates is non-linear. Given the complex nature of the application that we are considering and the lack of theory justifying transformations that would simplify the shape of the relation between the variables, we have decided to consider a linear predictor approach. Therefore, in this work, we build on the construction of the covariate-dependent weights developed by Antoniano-Villalobos et al. (2014), which allows for combinations of continuous and discrete covariates and favors interpretability.
We require extending the model to multivariate responses of mixed type with possible censoring and constraints. An appealing approach for this relies on a latent Gaussian representation, which provides a simple construction for dependence of the multivariate mixed-type data through the full covariance matrix of the latent Gaussian variables. Moreover, Bayesian inference can be carried out through Gibbs sampling and data augmentation techniques. A Bayesian parametric model based on this idea was proposed by Korsgaard et al. (2003) for multivariate data combining Gaussian, right-censored Gaussian, ordinal, and binary traits. To increase model flexibility, Bayesian nonparametric versions were proposed by De Yoreo et al. (2017) for mixed ordinal and nominal data, by De Yoreo and Kottas (2018a) for multivariate ordinal regression (see De Yoreo and Kottas, 2018b, for a dynamic extension) and by Papageorgiou et al. (2015) for mixed-type spatial data in an epidemiological context. Due to the increased flexibility of nonparametric mixtures, the cut-offs used to define the discrete data from the latent Gaussian variables can be fixed and not estimated or inferred. Moreover, Canale and Dunson (2011) show that Bayesian nonparametric mixtures for discrete data (specifically counts) based on latent Gaussian variables can approximate and consistently estimate a wider range of distributions than mixtures based on discrete distributions, e.g. Poisson or multinomial (see also Kottas et al. (2005); Bao and Hanson (2015) in the context of multivariate ordinal data and Jara et al. (2007) for binary response regression). Another relevant extension is the Bayesian semiparametric model of Jara et al. (2010) for multivariate doubly-censored data indicating time to event, based on a log transformation linking the observed responses to the latent Gaussian variables. When modeling time-to-event data, the log transformation is more appropriate than others, notably truncation. This allows recovery of the underlying structure with fewer and more interpretable components with possibly heavy right tails. Recently, Norets and Pelenis (2020) demonstrated that optimal adaptive estimation of mixed discretecontinuous distributions can be achieved via the latent Gaussian mixture approach.
We combine some of these ideas to build a model which can deal with the challenges presented by the data. We adopt the latent Gaussian approach, associating to each response variable Z a latent real-valued Y . Specifically, an observed value z of the response Z is linked to the realization y = (y 1 , . . . , y d ) of the latent Y = (Y 1 , . . . , Y d ), through a function h whose characteristics depend on the nature of the observable. Examples of transformations for different response types include: where · denotes the floor function, and 1 B (y) denotes the indicator function taking the value one when y ∈ B. Note that the last case considers an ordinal response with A categories and fixed cutoffs of α ,1 < . . . < α ,A −1 . In these examples, the functions h do not depend on x or y for = , but they may, for example when accounting for censored or constrained responses, as for the case study described in Section 5.
The basic building block for our model is the multivariate multiple linear regression: where β is a (q+1)×d matrix of regression parameters and Σ is a d×d covariance matrix. Slightly abusing notation, x = (1, x 1 , . . . , x q ) denotes the vector of observed covariate values extended by a unitary entry. As previously discussed, this parametric model is not flexible enough to capture the complex dependence structures contained in the data. We therefore extend the nonparametric density regression framework introduced by Antoniano- Villalobos et al. (2014) . (1) This model results from considering a mixture where θ = (β, Σ) and a nonparametric prior is assigned to the set of covariate-dependent mixing measures P x , which places mass one on the set of discrete probability measures: Here, δ θ denotes the Dirac-delta function with unit mass at θ. For computational purposes and to ensure convergence of the normalizing constant in w j (x), it is convenient to adopt a stick-breaking representation for the weights, setting w 1 = v 1 and . The parameters of the local linear regression components, θ j , and of the covariate-dependent weights, ψ j , are assumed to be independent and identically distributed according to a base measure P 0 and independent of the weights. Together with the functions h linking the latent variables with the responses, this defines the likelihood structure for the observed data.
In this model, the regression parameters β j and Σ j capture the local linear relation between the latent response and covariates, with normal errors; whereas the ψ j determine, through g, how the influence of each local component to the overall model changes across the covariate space. This deals with situations when the stochastic relation between y and x is too complicated to be captured by a single parametric model. It can also be used when the population is assumed to be constituted by an unknown number of (covariate-dependent) groups such that, within each group, a linear regression model provides a good description of the data. While identifiability issues may prevent the individuation of such groups, this intuition can help in understanding the elements of the model.
Note that the Bayesian nonparametric model for the joint density of y and x introduced by Müller et al. (1996) for density regression, taking the form results in a conditional density coinciding with equation (1). However, an important difference is that in the joint mixture model, posterior inference for the parameters (w j , θ j , ψ j ) is based on the joint likelihood in (2); whereas, for our model, it is based directly on the conditional likelihood of interest. Furthermore, we emphasize that the converse is not true: our conditional density model in (1) does not imply the joint density model in (2). This can be easily seen by constructing a joint density model as the product of (1) and any, say parametric, marginal density model for x. This is a valid construction, which nonetheless recovers the joint model in (2) only when the marginal has the form: This is an important concept, as it highlights that the form chosen for g does not imply a modeling of the distribution for covariates, which may indeed be fixed. The choice and shape of this kernel, however, defines how the conditional distribution changes as x varies (given the parameters ψ). Thus, it determines the amount of information borrowed when making inference at unobserved points in the space of covariates. By choosing model (1) we maintain the same natural and interpretable structure for the weights of the joint mixture model, but exploit all the information available in the data to learn about the relation between x and y, thus improving the quality of the estimation for the conditional distribution (see Wade et al., 2014a), which is the main focus of our application. Clearly, a practitioner interested also in capturing the structure of the random covariates would require a different approach.
The covariate-dependent weight w j (x) represents the probability that an observation with a covariate value x is allocated to the j-th regression component. Such probability can be decomposed into the unconditional probability w j that the parametric model j fits an individual observation, and the likelihood g(x|ψ j ) that an individual allocated to the j-th component is characterized by a covariate value x. The g(·|ψ) can be defined to accommodate different types of covariates. We adopt a factorizable structure: with ψ k = (µ k , τ k ) for k = 1, . . . , p, and ψ k = ρ k for k = p + 1, . . . , q. The use of distribution kernels guarantees convergence, for all x, of the denominator in equation (1). For the unconditional probability w j , different choices of the stick-breaking parameters (ζ j,1 , ζ j,2 ) result in different nonparametric priors (see Ishwaran and James, 2001). For instance, if (ζ j,1 , ζ j,2 ) = (1, ζ), the prior on the weights w j corresponds to that obtained from a Dirichlet process prior. The base measure is chosen as P 0 (β, Σ, ψ) = P 0 (β|Σ)P 0 (Σ)P 0 (µ|τ )P 0 (τ )P 0 (ρ). We use the conjugate matrix-variate Normal-Inverse Wishart for the regression parameters: Notice that the Inverse Wishart assigns prior mass to full covariance matrices. Other prior specifications can be used to allow for other types of covariance structures, e.g. a product of Inverse Gammas for diagonal covariance matrices or a G-Wishart for sparse precision matrices. As for β, we are assuming a structured dependence, allowing for efficient computations through Kronecker products and a reduced number of hyperparameters compared to a full Gaussian distribution. Alternatively, a multivariate Gaussian distribution could be used, assuming independence between columns. To complete the specification of the base measure, we set: . In the next section, we describe an adaptive truncation algorithm allowing posterior inference for our model. The algorithm is general and only requires specific adjustments depending on the h functions linking the observed responses with their latent counterparts.

Adaptive truncation algorithm
To scale appropriately with the sample size and data dimensions, we devise an algorithm for posterior inference based on a finite truncation of the mixture, where the number of components is allowed to increase adaptively to obtain a good approximation of the infinite-dimensional posterior. The truncated latent model with J components is: popular truncated stick breaking method (Ishwaran and James, 2001) where v J = 1. However, re-normalized stick-breaking may provide a better finite-dimensional approximation by evenly distributing the remaining mass across components, as opposed to assigning all remaining mass to the last component in truncated stick-breaking.
The proposed algorithm is based on the adaptive truncation scheme developed by Griffin (2016), extended for density regression and mixed type responses. It consists of two main steps, namely an MCMC step for a fixed truncation level, J 0 , followed by a sequential Monte Carlo (SMC) step used to increase the number of components of the mixture. The first step produces M posterior draws (w m 1:J0 , θ m 1:J0 , ψ m 1:J0 , y m 1:n ) M m=1 , which are then used as particles in the SMC step. We provide a concise summary below, with software and full details provided through the authors' GitHub repository 4 and accompanying documentation.
MCMC for fixed truncation. Since the truncation level J 0 is fixed throughout this step, we omit it from the notation, writing w = w 1:J0 , θ = θ 1:J0 , and ψ = ψ 1:J0 . Similarly, the observed response is denoted by z = (z 1 , . . . , z n ), with z i = (z i,1 , . . . , z i,d ), and analogously for the covariates x and the latent y. The approximate posterior given the sample (x, z) of size n, using the truncated likelihood (3), takes the form: where P J0 (w, ψ, θ) indicates the restriction of the prior (as detailed in Section 3) to the parameters in the truncated space. Dependence w j (x) = w j (x|ψ j ) of the weights on the parameters has been made explicit. Moreover, the functions h i, = h (y i , x i ) linking the latent variables to the observed responses are tailored to the specific application in Section 5.
Due to lack of conjugacy, we resort to a generic Metropolis-within-Gibbs scheme to perform posterior sampling, that updates blocks of parameters adaptively. The adaptive random walk algorithm used here, based on Algorithm 6 in Griffin and Stephens (2013), adapts the proposal covariance matrix to achieve both a specified average acceptance rate (a 0 = 0.234) and a proposal covariance matrix equal to 2.4 2 /p times the posterior covariance matrix, p being the dimension of the parameter block of interest. These criteria have been shown to be optimal in many settings (Roberts et al., 1997;Roberts and Rosenthal, 2001). In more detail, suppose that we want to sample a block of parameters φ of dimension p from a distribution with probability density function Q. In order to utilise the adaptive random walk algorithm, we first consider a transformation t(φ) that has full support on R p . At each iteration m, we propose a new φ * such that: where ξ m−1 is the adaptive covariance matrix. We accept φ m = φ * with probability equal to the minimum between 1 and the ratio: with |J t (φ)| denoting the determinant of the Jacobian of the transformation.
Transformations of β j , µ j , τ j , ρ j , and v j are straightforward through identity, log, and logit functions. Instead, transformations of Σ j and y i are more involved. For each Σ j , we consider a vectorization of a decomposition of the matrix, Σ j = L j D j L j , where L j is a lower triangular matrix with unit entries on the diagonal and D j is a diagonal matrix with positive entries, and we take the log of the diagonal entries. In this case, the proposed Σ * j can be obtained from the proposed t * in equation (5) by inverting this transformation. In addition, the determinant of the Jacobian, which is required in the acceptance ratio in (6), depends only on the diagonal elements D j, , of the matrix D j , specifically, . For each latent vector y i = (y i,1 , . . . , y i,d ), the terms h (y i , x i ) = z i, define constrained regions for the latent y i , such that y i, ∈ (l i, , u i, ), which are provided for the case study in Section 5. We assume that an appropriate ordering of the responses leads to bounds (l i, , u i, ) that may in general depend on y i, for < . This allows us to define a sequential logistic transformation t(y i, ; y i,1: −1 ) for = 1, . . . , d, based on the bounds (l i, , u i, ). From the proposed t * in equation (5), the inverse transformation can be applied to obtain the proposed y * i , sequentially for = 1, . . . , d, where the bounds may also be updated sequentially if they depend on y * 1:( −1) , e.g. for age at first child in our application. This ordering also guarantees that the Jacobian matrix is lower triangular, so its determinant is simply the product of the diagonal elements, |J t (y SMC for adaptive truncation. The second stage involves the selection of the truncation level J by sequentially increasing it from the initial level J 0 . The addition of a new component improves the quality of the approximation to the infinite-dimensional model but increases the computational burden, due to the considerable number of parameters added. Therefore, devising an algorithm that can select the level of truncation parsimoniously is crucial. To achieve this, possible approaches are presented in Norets (2020) and Griffin (2016). We focus on the latter, which adaptively increases the number of mixture components via SMC.
The MCMC draws from the previous step are used as the M initial particles in the SMC. At each iteration of the SMC, a new component is added to the mixture, by sampling the additional set of parameters (w m J+1 , ψ m J+1 , θ m J+1 ) from a suitable importance distribution. We sample from the prior Beta(v m J+1 )P 0 (ψ m J+1 , θ m J+1 ), independently for m = 1, . . . , M , making use of the recursive stick-breaking relation .
When the effective sample size (ESS) is lower than a threshold, indicating poor mixing, the particle values are resampled according to such weights (Del Moral et al., 2006). Here, we resort to systematic resampling (Kitagawa, 1996) and perform a rejuvenating step (Gilks and Berzuini, 2001), where the particles are replaced with new values sampled through m * iterations of the adaptive MCMC with J 0 = J + 1. The SMC provides weighted samples from the sequence of truncated posteriors P n J , converging to the infinite posterior P n . To decide when a sufficiently accurate approximation has been obtained, we follow Griffin (2016) and stop at the truncation level J * , such that the discrepancy D(P n J , P n J+1 ) = |ESS J − ESS J+1 | is less than a specified δ > 0, for a fixed number I of consecutive increments, J = J * −I +1, . . . , J * . We use the suggested values of δ = 0.01M , I = 4, and m * = 3. As an alternative to the ESS, we also consider a discrepancy based on the conditional effective sample size (CESS), which was proposed by Zhou et al. (2016), in the context of model comparison via SMC.
Simulation study. To assess the performance of the proposed procedure, we applied our model to a simulated dataset with a known structure, mimicking the most relevant features of our motivating data. Specifically, we considered q * = 3 covariates; the first, x 1 , is continuous and observed at a discrete scale (resembling the age at interview in our case study); the others, x * 2 and x * 3 , are categorical with three and two levels, respectively. We generate two positive integer-valued responses, Z 1 and Z 2 , and one binary response, Z 3 , related in different extents to the covariates. Z 1 is a discretized noisy observation of a nonlinear function of x 1 . Similarly, Z 2 is a discretized noisy observation of a nonlinear function of x 1 and the realized z 1 . In both cases, the response curves are the same for x * 2 = 2, 3 and differ for other categorical combinations, while the errors are not normal but right skewed, additionally depending on x 1 and x * 3 for the second response. Censoring is defined before discretization, when the responses are greater than the first covariate. Finally, Z 3 was simulated from a linear probit model depending only on x 1 . Complete details of the data-generating distributions are provided in the SM, together with the specification of the prior parameters.
We performed a robustness analysis on the simulated dataset, comparing several initializations, namely by setting J 0 = 2, 3, 5, 10, 15, 20, 30. We found that initializing the algorithm with a conservative number of components for moderate sample sizes provides a good compromise between computational time, mixing, and accuracy. For large sample sizes however, we suggest to initialize with a generous number of components because of the computational burden implied by resampling. Specifically, for modest sample sizes, we can save the parametric mixture likelihoods, unnormalized weights, and normalizing constants for each observation and for every particle, with a memory complexity of O(nJM ). For large sample sizes, this becomes too costly and these terms need to be recomputed at each block update of the MCMC rejuvenation step of the SMC. Thus, it is convenient to minimize the need of resampling for large sample sizes, by not being too conservative in setting J 0 for the problem at hand. We also considered two different discrepancy measures defining the stopping rule of the SMC, i.e. ESS and CESS, and results confirmed robustness to such choice.
Finally, focusing on the model flexibility and its ability to recover the correct structure present in the data, we performed additional simulations, exploring longer chains, increased sample size, and omitted censoring. The results, reported in the SM, are satisfactory for all scenarios, indicating (as expected) improvements when a longer chain is run or the censoring is removed. Overall, our model was able to recover the underlying structure present in the data. The true conditional behavior was well recovered in areas where data is available. However, as can be expected, the model struggled when predicting at values far from the observed data. We highlight that interpretation of the conditional dependence structure in the latent scale as well as the latent covariance matrices of the mixture components and its relation to the dependence structure on the observed scale is an open and interesting direction of research, which would expand the work of García-Zattera et al. (2007) in the parametric setting. We also observed improvements in comparison with a parametric version of our model.

Application: life patterns of Colombian women
We employ our model to study the joint distribution of the ages at Sexual Debut (Z 1 ), Union (Z 2 ), and First Child (Z 3 ) as well as Work Status (Z 4 ) at the moment of the interview, conditional on the considered covariates. These are Age at interview (X 1 ), Region (X * 2 ) and Area (X * 3 ) of residence, having (P) or not (P) been disciplined using Physical Punishment (X * 4 ) during childhood, and having (B) or not (B) been exposed to Parental Domestic Violence (X * 5 ), referring to whether the respondent witnessed her father beating her mother.
To specify our model, we define the link functions: with c (y, x) = 1 (0,x1+1) (exp(y )), for = 1, 2, and c 3 (y, x) = 1 (0,x1+1) (exp(y 1 ) + exp(y 3 )). In this case, exp(y 1 ) and exp(y 2 ) can be interpreted as the latent continuous ages at sexual debut and union, respectively. The constraint that age at first child must be greater than age at sexual debut is strictly enforced through the transformation, and we can interpret exp(y 3 ) as the latent continuous time between sexual debut and first child and exp(y 1 ) + exp(y 3 ) as the latent continuous age at first child. The bounds required in the adaptive MCMC are obtained from inverting z = h (y, x); concretely, (l , u ) = (log(x 1 + 1), ∞) for censored z = 0 (log(z ), log(z + 1)) for uncensored z = 0 , when = 1, 2, The prior parameters are specified as follows. For the linear coefficients and covariance matrix of each component, they are set empirically based on a multivariate linear regression fit. Specifically, we set y i, = (l i, + u i, )/2. For = 3, when the lower bound is −∞, i.e. age at sexual debut is equal to age at first child, we set y i,3 = u i,3 − 1. For censored observations, we sample y i, from a truncated normal distribution with mean and covariance computed from the uncensored observations. For the binary response, y 4,i = −1 for z 4,i = 0 and y 4,i = 1 for z 4,i = 1. A multivariate linear regression fit for this auxiliary response gives estimates β of the linear coefficients and Σ of the covariance matrix. We then define Together, U and Σ j reflect the variability of β j across components, and we set the matrix U such that min(diag( Σ)) U = 20(X X) −1 . The factor 20 for this g-prior was selected to ensure reasonable ages (i.e. mostly lower than 100) in prior simulations and to avoid very extreme age values that can result when the constant is too big (making the prior too vague). Indeed, we explored more uninformative and vague prior choices but found that this could lead to quite large and unreasonable imputed ages for censored data. We further set ν = b + 3. Other specified hyperparameters include µ 0,1 =x 1 ; u 1 = 1/2; α 1 = 2; γ 1 = u 1 (range(x 1:n,1 )/4) 2 ; k = (1, 1) for k = 2, . . . , q; and the parameters of the stick-breaking prior are ζ j,1 = 1 and ζ j,2 = 1.
We initialize the MCMC algorithm with a number of components, J 0 = 35, large enough to avoid a small ESS and subsequent resampling. The MCMC is run for 20,000 iterations after discarding the first 30,000 as burn-in, and one in every 10 iterations is saved to produce 2,000 particles. For the SMC, we choose the ESS-based stopping rule, due to the robustness observed for the simulated data sets.

Posterior predictive checks
The assessment of the goodness of fit of the proposed model is of crucial importance when the data are of complex nature, such as the data analysed in this work. Therefore, before presenting the main results, we report posterior predictive checks in order to validate our model. Using the weighted particles produced by the algorithm in Section 4, we first replicate the data and use the replications to check the predictive power of our model. Figures 3 and 4 report such predictive checks for the regions Bogota and Territorios Nacionales. In particular, for the discretized ages at events, the figures compare the Kaplan-Meier curves for the data and each replicated data set. For the Work response, the histogram of the proportion of working women across the replicated datasets is displayed, along with the proportion in the observed data. In general, the estimates observed in the original data match the ones computed for the replicates, within all the considered categories, indicating good fit of the model to the data.
In addition, we compute the posterior predictive p-values (Gelman et al., 1996): where T (z rep ) and T (z) represent the selected discrepancy computed respectively for replicated and observed data, and ξ = (w 1:J * , ψ 1:J * , θ 1:J * ) indicates the model parameters with posterior π(ξ|z). The integral above can be computed based on the weighted particles, by simulating a replicated dataset for each particle. In our application, the discrepancy measures used, for = 1, 2, 3, are: Note that for the censored data, we only observe that the event is not experienced by the given age, and thus the discrepancy T cens is defined as for a binary variable. On the other hand, for non-censored events, the discrepancy T noncens can be based on the estimated age at event. The posterior predictive p-values for the full sample and for subsets defined based on selected combinations of categorical covariates are reported in Table 2. Results are encouraging and support the goodness of fit observed in Figures 3  and 4, but highlight possible issues in interpreting censoring results.

Results
In the SM, we describe various posterior and predictive quantities that can be computed from the weighted particles to describe the relationship between the observed responses  Table 2: Posterior predictive p-values for selected discrepancies computed for the full sample and subsets corresponding to violent (P, B) and non-violent (P,B) families. We report the results obtained for the regions Bogota and Territorios Nacionales. and the covariates. For the sake of conciseness, here we display only a selection of predictive quantities, which offer some insights about the situation of Colombian women. Specifically, we compare women who were raised in violent family environments (P,B) with those who were not (P,B). Figure 5 displays the predictive medians of the (undiscretized) ages at events and the posterior probability of working as functions of Age. More detailed information arises from the analysis of the predictive densities, some of which are reported in Figure 6. Notice that due to the clear asymmetry in the densities, the predictive median allows a better representation of the center, as opposed to the mean. We highlight that to improve visualization we focus on predictive quantities (medians, probability of success, densities, etc.), but the Bayesian approach permits to also study uncertainty and report credible intervals for these quantities (e.g. Figure 9).
The data presents heavy censoring for younger cohorts (summarized in Table 1). This information is included by imputing, at each iteration of the algorithm, ages at events which must be higher than Age. Indeed, above the dashed lines of Figure 6, the density estimates are based on these imputed ages and borrowing of information at other covariate levels. Therefore, while we can reliably estimate the mass above the dashed line given Age, caution should be used when interpreting the shape of the right tail in this region, as this is not identifiable from the observed data. Moreover, when such mass exceeds 0.5, the predictive median is affected by the imputed values and is therefore less reliable. This corresponds to median values of age at event which are higher than Age, represented as dotted lines in the figures. Further, censored data also arise from women who will never experience an event. This is the prevailing cause of censoring for the older cohorts, contributing to higher medians and heavier right tails. Accommodating censored cases is clearly useful; however, results arising from heavily censored data should be interpreted with caution.
Starting with Figure 5, observe that the shapes of the median curves change across combinations of the categorical covariates, which justifies the employment of a flexible model that does not impose a single functional form. A clear difference is evident between urban and rural areas, the latter presenting lower ages at events, controlling for other covariates. This is expected since rural areas are generally characterized by lower levels of education and wealth indicators, both identified in the literature as factors  related to anticipation of sexual activity and family formation. Comparing cohorts, we observe that younger women tend to anticipate sexual debut, a phenomenon largely recognized as a consequence of the better knowledge and the more diffuse use of contraceptive methods. Instead, the curves for the ages at union and at first child appear flatter, particularly for urban women with non-violent family environments and are even increasing for women from violent families. At first, this may seem counter-intuitive, because one would expect the younger generations to postpone family formation, particularly in urban areas, due to an expected prolonged education. However, an incorrect use of contraceptive methods, particularly among very young or less educated women, and the violent conditions linked to the armed conflict may result in early pregnancies (Ali et al., 2003;Núñez and Flórez, 2001;Daniels, 2015). Indeed, an increase in teenage childbearing in Colombia has been observed since 1990, mainly among women from disadvantaged backgrounds (Batyra, 2016;Soto, 2007, 2013).
Focusing on the predictive densities for the least and the most developed regions, Territorios Nacionales and the capital city Bogota (Figure 6), further justifies the use of a density regression model. In fact, the observed flat median curves correspond to rather different distributional behaviors of ages at union and child, across covariate values. Moving from the least to the most developed context (top to bottom in the figure) entails an increase of the median curves, dispersion, and probabilities of not having experienced the events by a given Age. An increased dispersion, with pronounced rightskewness, is more evident for older cohorts in urban environments. This is in line with the greater heterogeneity in urban contexts as well as with the wider range of opportunities offered, for example in terms of education. Such heterogeneity becomes more pronounced among the older cohorts who have had time to profit from such opportunities. The flexibility gained in urban contexts is offset in violent environments, thus resulting in more concentrated distributions. While our definition of a violent environ-   The joint modeling approach permits us to study also the conditional relation between responses. For example, Figure 7 shows the conditional predictive medians of the time from sexual debut to union given the age at sexual debut for women with Age = 20, 30, 40 (dotted lines indicate predicted ages at event higher than Age; the corresponding conditional densities are reported in the SM). Interesting differences can be observed across regions, likely related to their socio-demographic characteristics (Ojeda et al., 2011). We observe that women in Atlantica and Territorios Nacionales (and to a lesser extent Oriental) compared with Pacifica and Bogota tend to experience sexual debut and union closer in time, suggesting that sexual debut is possibly delayed until union. Such tendency is more pronounced, compared to the other regions, for rural women raised in violent families. Similar results are observed for the time from sexual  debut to child (details in the SM).
Finally, the probability of working ( Figure 5, bottom row) is, as expected, higher in urban areas. Moreover, women who grew up in violent environments show a higher propensity to work, more pronounced among younger women. These same women, as previously observed, show a tendency to anticipate events. A possible explanation is that young women who leave the parental house to escape violence may start cohabitation and decide to drop out of school, entering the labor market to contribute to family income. This apparently contradicts studies (see e.g. Gimenez Duarte et al., 2015) pointing to the difficulties of young women, especially those with children, to participate in the labor market. However, this paradox is solved when analyzing the estimated predictive probabilities of working as functions of Age, conditional on having the first child at ages 15, 20 and 25 (Figure 8, top to bottom). Indeed, the probability of working at each Age increases with the age at first child. In particular, we observe a much lower probability of working for young mothers, that persists even when considering their labor market participation later in life. This suggests a scaring effect of teenage motherhood. This effect is further supported by Figure 9, which depicts the 95% point- Figure 9: Predictive (unconditional) probability of working as function of Age with 95% pointwise credible intervals (top row) for women who grew up in violent (P, B) and nonviolent (P,B) families. Subsequent rows depict the probability conditional on different ages at first child. Results are reported for urban and rural areas of the least developed region (Territorios Nacionales) and for the capital (Bogota). Left of the dashed line indicates when the age at first child is higher than Age.
wise credible intervals for the probability of working as functions of Age, comparing the unconditional probability (top row) and the probability conditional on different ages at first child (subsequent rows). Consistent with previous results, we focus on the least developed region (Territorios Nacionales) and on the capital (Bogota).

Concluding remarks
In this work, we proposed a novel Bayesian nonparametric model for density regression, allowing for mixed responses with censored, constrained, and binary traits, that can flexibly change with combinations of the categorical and numerical covariates. We developed a general algorithm for posterior inference, that effectively scales to large datasets by adaptively determining the necessary truncation level to approximate the infinite-dimensional posterior. We customized the model and algorithm to a specific case study, but they can be applied in other contexts through minor modifications, by appropriate definition of the link functions. Note that our model accommodates for non-informative (or random) censoring. Interesting extensions concern other types of censoring. From a technical point of view, our results highlight the advantage of a flexible model, accounting for a different shape, location, and dispersion of the response distribution across the covariate levels, as well as for censoring. Additionally, a variety of classic graphic tools and quantities of interest, such as survival curves and hazard functions, can be derived. Importantly, the joint analysis of the responses allows for a rich variety of conditional analyses, which can be conducted focusing on different aspects, a very useful feature when studying complex phenomena.
For our case study, the findings suggest interesting considerations regarding life patterns of Colombian women. In the first instance, we found a confirmation of the differences between rural and urban areas, which evidence the need of interventions towards a more balanced development of the country. Furthermore, our results signal that the regions with a higher risk of early transition to adulthood are those with the worse development and wellness indicators, thus corroborating studies on the risks related to disadvantageous conditions. One of the most interesting results is the rather clear evidence of the impact of family violence on women's choices and behaviors. An anticipation of the considered events is observed for women who were physically punished during childhood and witnessed parental domestic violence, two factors we used as proxies for a violent family environment. The relation between child abuse and neglect and the child's future family choices has been discussed in the literature. Nonetheless, to our knowledge, this is the first attempt to study the possible relation between parental family violence and the events marking the transition to adulthood. Our findings confirm that a violent family environment can be regarded as a key risk factor that may nullify the positive influence of developed areas.
Overall, our case study may contribute to the planning of targeted interventions. Even if recent governments have shown an increased attention to the conditions of women and children, a formal statistical approach to systematically identify and quantify critical situations is crucial to support such a process. For example, teenage pregnancy is recognized as a priority issue in Colombia by the Government (Gimenez Duarte et al., 2015;Daniels, 2015), due to its hindering personal development and agency (Azevedo et al., 2012); our results confirm its scaring effect and quantify the risk of teenage pregnancy, identifying some of the most vulnerable groups. We conclude with the hope that the present work may stimulate further reflection, research and survey on the topic, and possibly lead to additional investigations exploiting the availability of DHS surveys on other developing countries and the flexibility and wide applicability of our model.