Semiparametric Multivariate and Multiple Change-Point Modeling

. We develop a general Bayesian semiparametric change-point model in which separate groups of structural parameters (for example, location and dispersion parameters) can each follow a separate multiple change-point process, driven by time-dependent transition matrices among the latent regimes. The distribution of the observations within regimes is unknown and given by a Dirichlet process mixture prior. The properties of the proposed model are studied theoretically through the analysis of inter-arrival times and of the number of change-points in a given time interval. The prior-posterior analysis by Markov chain Monte Carlo techniques is developed on a forward-backward algorithm for sampling the various regime indicators. Analysis with simulated data under various scenarios and an application to short-term interest rates are used to show the generality and usefulness of the proposed model.


Introduction
Multiple change-point models allow for changes of model distributions at multiple, unknown, time points. These models have been intensively studied in statistics and econometrics over the last several years. The earliest Bayesian change-point models are explored by Chernoff and Zacks (1964), who assume a constant probability of regime change, and by Smith (1975), who investigate the single change-point model under the assumption of exchangeable intra-regime observations and inter-regime independence. The Bayesian approaches usually model the change-point as a stochastic process and conduct inference through Markov chain Monte Carlo (MCMC) algorithms (Carlin et al. 1992; Albert and Chib 1993 among many others) or through the reversible jump MCMC (Green 1995;Green and Mira 2001). Other recent papers on Bayesian changepoint problems include Giordani and Kohn (2008); Pesaran et al. (2006); Maheu and Gordon (2008); Geweke and Jiang (2011).
Our aim is to extend the literature by proposing a new model in which groups of structural parameters (for example location and dispersion parameters) are subject to change-points at different times. Both the number and locations of the change-points are unknown, and are estimated separately for each group of parameters. Our model takes advantage of the widely used formulation of Chib (1998), where the change-point modeling is in terms of a latent discrete state variable that follows a unidirectional Markov process and indicates the regime from which a particular observation has been drawn. In applying this formulation to the proposed model, our multivariate changepoint process is described by p latent state variables, one for each group of structural parameter, where each state variable evolves in the manner of the Chib (1998) model. To achieve some further flexibility in the evolution of these state variables we assume that the transition matrix (one for each state variable) depends on time. Thus, the change-point probability depends not only on the regime between two adjacent change points, but also, realistically, on time. In addition, to robustify the distribution of the outcomes we suppose that the prior distribution of each group of structural parameters is unknown and model each prior distribution by a separate Dirichlet process (DP) prior. The Dirichlet concentration parameters are state-dependent and inferred from the data, so that the extent to which estimates deviate from the base parametric measures can differ among groups of structural parameters. We refer to this model as a Bayesian semiparametric multivariate multiple change-point model.
All the model features above have practical relevance in our motivating financial application where we provide a semiparametric Bayesian analysis of a model of shortterm risk-less interest rates in which a) the conditional mean and variance are subject to separate change-points b) the probability of regime changes is time-dependent and c) the distribution of the data is heavy-tailed and skewed. We show that our Bayesian semiparametric multivariate multiple change-point model produces improved inferential results and better predictions.
Surveys of related frequentist semiparametric and nonparametric change-point modeling are given by Brodsky and Darkhovsky (1993) and Chen and Gupta (2011). Within the Bayesian semiparametric literature, Muliere and Scarsini (1985) extend Smith (1975) to two independent DPs on the conditional distributions of the observations and for two different regimes, separated by an independent random change-point. Such independence assumptions were subsequently relaxed by Mira and Petrone (1996) through a mixture of products of DPs that introduces dependence among the two regimes. Hartigan (1992, 1993); Quintana and Iglesias (2003); Loschi et al. (2003); Loschi and Cruz (2005) and Martinez and Mena (2014) propose a different approach for changepoint models, based on random partition distributions, whilst Park et al. (2012) present a Bayesian Poisson change-point regression model with a nonparametric step function as baseline rate.
Closer to our approach, Ko et al. (2015) and Maheu and Yang (2015) use the hierarchical DP to model the rows of the transition matrix of the regimes. DPs on both structural parameters and latent transition matrix are implemented in change-point models by Dufays (2016). This last framework is extended to multivariate change-points in Bauwens et al. (2015), following the lines of the parametric model of Eo (2012). We differ from the other approaches, since we model the prior distributions of the structural parameters as DPs, letting the state indicator of the regimes follow Chib (1998). This allows for greater parsimony, while still permitting recurrence of regimes and an unknown number of regime changes. With no DPs on transition rows, we further detach from the framework of the infinite hidden Markov model of Teh et al. (2006): the hierarchical DP structure on the transition rows potentially allows for an infinite number of change-points, a property that in finite sample sizes shows its usefulness in the possibility of not specifying a priori the number of regime changes. The price we pay is that we need to specify a maximum number of regimes. Furthermore, with the exception of Bauwens et al. (2015), in the mentioned works all structural parameters follow the same change-point process, whilst we distinguish different types of regimes for different structural parameters. Finally, we contribute to the extension of existing approaches through the introduction of more realistic heterogeneous time-dependent latent transition matrices, a property that can be important as we show in simulation experiments. By being anchored to the Chib (1998) model, the proposed semiparametric multivariate multiple change-point model still keeps the attractive feature of being completely tractable MCMC techniques.
The rest of the paper is organized as follows. We introduce the model and study its properties in Section 2, with focus on interarrival times and number of change-points, respectively in Section 2.2 and 2.3. Posterior sampling of change-points and structural parameters are discussed in Sections 3.1 and 3.2. The algorithm is applied in Section 4 to simulated data: in Section 4.1, we omit, one at the time, the different components of the proposed model, to informally gauge the relevance of each component; in Section 4.2 we stress the robustness of our method in the face of heavy-tailed data and skewness; finally we implement and compare our method on autocorrelated data in Section 4.3 and in terms of prediction performance in Section 4.4. We present the empirical application to short-term riskless rates in Section 5 and finally provide summary comments in Section 6, highlighting avenues for further work. Proofs of propositions and corollary are available in Supplementary Material (Peluso et al., 2018). All the codes, available upon request, have been written in the R programming language and run on a PC with core i7-7500U CPU @ 2.70GHz.
The Chib (1998) model is a conceptually and computationally useful reparameterization of this change-point model. The model is defined in terms of a state variable that follows a uni-directional Markov process. Specifically, let s t denote a discrete-time discrete-state latent state variable that takes values {1, 2, . . . , m + 1} such that s t = j means that the structural parameter at time t belongs to the j-th regime. Next, suppose that this state variable is Markovian, and that it can either stay in the current state or move to the next higher state with transitions governed by the probabilities Thus, transitions of this state variable from one state to the next higher state isolate the change-points (τ 1 , . . . , τ m ). Now suppose that the structural parameters are grouped as ξ t = (ξ 1t , . . . , ξ pt ), where each group of parameters ξ it (i ≤ p) changes at idiosyncratic latent time points τ i,1 , . . . , τ i,mi . Then, ξ it = θ ij for all t ∈ [τ i,j−1 , τ i,j ). Following the preceding Chib (1998) model, we introduce p discrete-state, discrete time state variables {s i t }, one for each group of structural parameter, such that s i t = j now indicates that the i-th structural parameter at time t belongs to the j-th regime, and let each state variable progress as above. To achieve further flexibility in the unidirectional evolution of these state variables we assume that the transition matrix (one for each state variable) depends on time, so that, for i = 1, . . . , p, Then 1 − w i j,t denotes the probability that the i-th structural parameter moves from the j-th regime at time t to the (j + 1)-th regime at time t + 1. For brevity, we denote the change-point process driven by the heterogeneous transition matrices formed by w i j,t as We complete the model with a prior F i on θ ij and on w i j,t . Instead of assuming that each F i is parametric, we assume that F i , i = 1, . . . , p, are unknown. We model these unknown distributions by separate DP priors. More specifically, we suppose that F i ∼ DP (M i F i0 ), where M i is the concentration parameter and F i0 is the base distribution function. Thus, under this formulation, the distribution of the observations within each regime is a DP mixture (Lo 1984).

Interarrival Times
We now derive some theoretical properties of our model. Here we focus on the implied prior probabilities of interarrival times and the distribution of the number of changepoints. As a benchmark, we derive the equivalent results assuming that the transition matrix is time-homogeneous. Note that a priori (τ 11 , . . . , τ 1m1 ), . . . , (τ p1 , . . . , τ pmp ) are independent, so that the analysis in the present subsection can ignore the multivariate nature of change-points: we can focus on the generic multiple change-points sequence (τ i1 , . . . , τ imi ), and suppress from all relevant quantities the notational dependence on i.
At time t = 1, no change-point can occur and s t = 1. Define this first time point, immune to change by construction, as k 0 , so that τ 1 > k 0 . In the time-homogeneous case, the transition matrix among the latent regimes is P = (w ij ), where w ij is the probability (independent of time). Suppose that the single free element in each row of P is given a Beta prior distribution, w ii ∼ Beta(α, β) and w i,i+1 = 1 − w ii for i = 1, . . . , m, w m+1,m+1 = 1 and all other elements of the matrix are null. It is important to realize that the bi-diagonal nature of the transition matrix is not restrictive in any way, because subsequent states may refer to distributions of previous regimes. Then, in the timehomogeneous setting, p(τ 1 = k 0 +k 1 |P ) = (1−w 11 )w k1−1 11 , so that τ 1 −k 0 ∼ Geo(1−w 11 ), conditionally on the transition matrix. Unconditionally, Similarly, still in the same setting but with multiple change-points τ 1 , . . . , τ r , for any r ≤ m and conditionally on P , the r-th interarrival time is Geo(1−w rr ), whilst, once the transition matrix is marginalized out, the joint distribution of the interarrival times and the distribution of the r-th interarrival time given the previous ones are, respectively, where we have defined τ 0 = k 0 . We stress that, even if the number of change-points is bounded above by m, the interarrival times can assume any integer value lower or equal to m: a realized r-th interarrival time, r ≤ m, inducing a change-point beyond the sample size, means that only r − 1 change-points actually occurred in the sample.
To ease the analysis of the proposed model, denote

Proposition 1. The probability mass function of the r-th interarrival time in Model
(2), conditionally to previous interarrival times and to transition probabilities, is In words, if the transition matrix is time-homogeneous, then the interarrival times are independent of each other, since the realization of the generic j-th interarrival time does not depend on the previous interarrivals. On the other hand, in the proposed model the time-dependent transition matrices create a dependence among interarrival times, more precisely a Markov dependence between an interarrival time and the previous one. Identical distributions of the transition matrices can be obtained as a special case if we choose w j,t ∼ Beta(α, β) for all j and t. If we further assume that w j,t = w j,t for all t and t , homogeneity is restored and it is possible to show that the j-th interarrival time (2), conditionally to previous interarrival times, is

Number of Change-Points
To study the number of change-points, define the stochastic process {N t } as N t := {# of change-points in {1,2,. . . ,t}}.
To simplify the notation in the present subsection, we impose m = T , a maximum number of change-points fixed equal to the number of observations. This is a choice that avoids any restriction to the number of regimes. Similar results can be obtained for m < T , with truncated versions of the formulas below. If the transition probability from one latent state to the next one is not dependent on the current latent regime, i.e.
, and the beta prior on w implies a beta-binomial distribution for the number of change-points: with an expected number of change-points in {1, . . . , t} of E(N t ) = (t−1)β/(α+β). Chib (1998) generalizes from beta-binomial distributed number of change-points to regimedependent transition probabilities, collected in the transition matrix P . Therefore, in the time-homogeneous case, Unconditionally, the previous expression can be written as Remark 1. Fixing α t = α, β t = β for all t, the number of change-points N t simplifies to a binomial random variable with parameters t − 1 and β/(α + β). Prior information are reflected in the choice of α t and β t : β t /(α t +β t ) can be considered as the prior probability of observing a change-point at time t, conditionally to the absence of change-points before t. This implies an expected number of change-points in Larger deviations from this average probability are accepted as reasonable for lower values of α t and β t .

Posterior Sampling of Change-Points
We use the upper-case for defining sequence of random variables up to the subscript, for instance Y t := (y 1 , . . . , y t ). Recall that w i j,t is the prior probability for the i-th structural parameter to move from the j-th regime at time t to the (j + 1)-th regime at time t + 1. Starting from p = 1, that is with all regimes parameters moving according to common change-points, S T = (s 1 , s 2 , . . . , s T ) is sampled following Chib (1996), exploiting the decomposition of p(S T |Y T , θ) in where θ := (θ 1 , . . . , θ m+1 ) and S t := (s t , s t+1 , . . . , s T ). If s t+1 = k + 1, s t is restricted to be equal to k or k + 1, then the term p(s t |Y T , S t+1 , θ) in (5) is .
, with all the mass concentrated on s 1 = 1. Repeated updating and forecasting forward in time allow the computation of the generic p(s t = k|Y T , S t+1 , θ) in (5), starting from the last term p(s T = k|Y T , θ). Then, proceeding backwards, all the terms in (5) can be obtained. Finally, we recover τ from the sampled S T .
It is relevant to stress that m 1 (m 2 ) does not represent a number of change-points in the location (variance) parameter fixed in advance, but it is only the maximum number of change-points that can occur: s 1,T ≤ m 1 + 1 and s 2,T ≤ m 2 + 1, and when the inequalities are strict, the number of change-points effectively sampled is less than m 1 and m 2 . Therefore, the methodology proposed does not restrict to a fixed number of change-points. From the posterior samples of S 1,T and S 2,T we can recover the estimated number of change-points: in each Gibbs iteration the final values of the processes indicating the regimes, s 1,T and s 2,T , are respectively the extracted number of changepoints for the first and second structural parameter, from which we can derive joint and marginal posterior distributions.

Posterior Sampling of Structural Parameters
Assume for the moment that the change-points are univariate, that is all regimespecific parameters move according to common change-points (p = 1). For sampling θ, when p(y t |Y t−1 , θ) and F 0 (θ) are conjugate, Algorithm 2 in Neal (2000) can be slightly modified to be applied in our context. Consider θ * as the distinct values in θ and c = {c 1 , . . . , c m+1 } as the clustering configuration: c i = j tells that θ i belongs to the j-th cluster, that is, θ i = θ * j . Sampling of θ is conducted through sampling of θ * and c. In particular, defining where m −i,c = |{j ∈ {1, . . . , m + 1} : θ j = θ * c , j = i}| is the number of elements in θ −i equal to θ * c and (i) = {t ∈ {1, . . . , T } : ξ t = θ i } is the set of times belonging to the i-th regime. Once c is given, θ * is sampled element wise with probability proportional to the prior F 0 , updated with all the data belonging to those regimes which share the same structural parameters: Note that when p(y t |Y t−1 , θ) and F 0 (θ) are not conjugate, sampling of θ can be performed under similar considerations, with Algorithm 8 of Neal (2000) replacing the Algorithm 2 mentioned above. Furthermore, the extension to separate change-points for different parameters (p > 1) can be handled. For instance, if ξ t = (ξ 1t , ξ 2t ) = (μ, σ 2 ), then μ|Y T , S 1,T , S 2,T , σ 2 and σ 2 |Y T , S 1,T , S 2,T , μ each follows the scheme above.
Finally, we highlight that, when the sampled latent processes S 1,T and S 2,T , indicator of the regimes, show a number of change-points that is less than the maximum allowed, all the remaining structural parameters (corresponding to the regimes not sampled) are drawn from their priors, following the jump method of Carlin and Chib (1995).
Our full model, denoted FullMod, is summarized, for t = 1, . . . , T and for i = 1, 2 as where N (·) and IG(·) are, respectively, the Gaussian and Inverse Gamma distributions. The hyperparameters are fixed to μ 0 = 0, λ = 1, α σ = 1, β σ = 1, w i j,t ∼ Beta(1, 1) for all i, t and j < m i and w i mi,t = 1 for all i, t. Note that all priors are centered on values distant from the true ones, since we actually want to test if the proposed model is able to correct for prior information not necessarily coherent with the data. The concentration parameters M 1 and M 2 have a Gamma(0.05, 0.0001) prior, centered on a highly noninformative value, and are sampled a posteriori following Escobar and West (1995). Finally, we always start the algorithm with no change-points for the structural parameters.
We run the algorithm for posterior sampling proposed in Section 3 for a total number of iterations N = 1000, of which the first N/2 are discarded as burn-in, over K = 40 simulated datasets. In the top right and in the bottom left plots of Figure 1 we report the posterior probabilities that s 1t , the latent regime for the conditional mean, is equal to k = 1, . . . , m 1 +1 and that s 2t , the latent regime for the conditional standard deviation, is equal to k = 1, . . . , m 2 + 1, for t = 1, . . . , T , averaged over the K simulated datasets and for fixed values of m 1 = 3, m 2 = 3. It is clear that the proposed multivariate algorithm perfectly identifies all the change-points, properly distinguishing between changes in mean from changes in variance. Furthermore, we correctly identify the unknown number of regimes: the posterior marginal distributions for the number of mean regimes and for the number of variance regimes are correctly centered on the true values.
To highlight the importance of the single features of the proposed multivariate change-point model (multivariate change-point process, random structural parameters distributions, heterogeneous transition matrices), we compare, for equal hyperprior values and starting points, FullMod with alternative models having specific features turned off: a) model UniMod, with a single change-point process, so that conditional variances and means have common univariate change-points, b) model NoDpMod, with structural parameters θ drawn from a fixed, and not from a random DP, distribution, c) model No-HetMod with transition matrices among latent states homogeneous over time, d) model CauchyMod, where structural parameters θ are drawn from a fixed distribution, with, in particular, the variance following a priori the half-Cauchy distribution (see Polson and Scott 2012), characterized by tails fatter than those in the Inverse Gamma distribution.
In more details, the univariate model UniMod can be written in hierarchical form as whilst model NoDpMod can be represented as FullMod, with the exception that θ ij |F i and F i in FullMod are replaced by Model NoHetMod replaces the last row in FullMod with where the change-points are implied by latent processes and transition matrices sampled as in Chib (1998). Finally, model CauchyMod retraces model NoDpMod, but with a fat-tails half-Cauchy prior for the standard deviation structural parameters, for j = 1, . . . , m 2 , proportional to 1/(1 + θ 2j ).
For all models under comparison, we run the Gibbs samplers for N iterations (whose N/2 discarded as burn-in), over the same K simulated datasets. For the algorithm with univariate change-points (model UniMod ), the results on the regime posterior probabilities are reported in the bottom right plot of Figure 1. The univariate algorithm is not able to identify all the change-points, missing one change-point in the mean and two change-points in the variance. Every mean change-point is closely followed by a variance change-point, therefore a unique sampling step for both types of change-points makes the inferential problem more complicated, since the structural parameters of the regime between every two close change-points can rely on few data. For the Gibbs sampler of model NoDpMod, top left and right plots in Figure 2 show that the removal of the DP random distribution for the structural parameters causes some estimation problem in the variance change-points, whilst change-points in the mean are correctly identified. On average the model identifies the correct variance change-points, but with much lower posterior probabilities. The weakness of model NoDpMod is solved by the variance prior with fatter tails in the parametric model CauchyMod in Figure 3. Finally, the sampling algorithm for NoHetMod in bottom left and right plots of Figure 2 does not correctly identify the conditional mean change-points, whilst it is able to find the variance change-points, but with low posterior probabilities.
The performance of all models on the estimation of the structural parameters and of mean and variance change-point identification is summarized in Table 1, for various scenarios on the allowed maximum number of change-points. We let (m 1 , m 2 ), ranging in {(3, 3), (10, 3), (3, 10), (10, 10)}, to test the robustness of the results to different specifications of m 1 and m 2 . Root Mean Square Errors are reported, separated for mean and variance structural and change-point parameters. For instance, at the row denoted with θ 1 , is reported the quantity 1 1j − θ 1j ) 2 , in which θ 1j is the true mean structural parameter of the jth regime, andθ (k) 1j is the corresponding estimate in the kth dataset, and similarly for the other parameters. There is a clear distinction among the competing models as far as estimation of the regime means is concerned: models FulMod and CauchyMod perform well, with no clear ranking between the two: FulMod seems to better infer the mean structural parameters and the mean change-points, whilst Cauchy-Mod has a better performance on the variance parameters, and there is no majority of scenarios in which one model prevails on the other. Models NoDpMod, NoHetMod and UniMod are more distant from the true parameter values. In the sequel, we focus on the comparison between FullMod and CauchyMod, omitting for brevity the results related to the other models, since those are the two best performing models.

Mispecified Heavy Tails and Skewness
We test the robustness of the simulation results to model mispecifications that do not account for heavy tails and skewness in the data generating mechanism. We first generate To test the robustness of the proposed model to mispecified skewed data, we generate the sample from a skew-normal distribution (Azzalini, 2013), with location and scale fixed to the structural parameters of the previous section, and with asymmetry parameter equal to 4, which is an average case among the simulation settings presented in Chapter 2 of Azzalini (2013). The performance of FullMod and CauchyMod are reported, for the structural parameters and for change-point locations, in Table 2 (columns Full-S and Cauchy-S ), from which we see that FullMod performs better: the heavy-tailed half-Cauchy prior is able to react to the mispecification induced by data with heavy tails, but not to skewed data.

Autocorrelated Scenarios
We now introduce autocorrelation in the generated data and into the models FullMod and CauchyMod. Therefore the hierarchical structure of FullMod can be written as y t |ξ t ∼ N (y t |ξ 1t + ξ 2t y t−1 , ξ 3t ),  Table 1: Estimation performances in the independent Gaussian simulation study. Root Mean Square Errors are reported, for the structural parameters θ 1 and θ 2 , and for the mean and variance change-point locations τ 1 and τ 2 , with true values m 1 = 3, m 2 = 3, averaged over 40 simulated datasets.

RMSE FullMod CauchyMod NoHetMod NoDpMod UniMod
where, for a sample size T = 240, we have fixed one change-point for θ 1 at t = 120, and two change-points for θ 2 at times 80 and 160, in order to have a simulation setting where the two structural parameters have a different number of change-points. The location structural parameter changes from (0, 0) to (2, 0.3) , whilst the variance structural parameter changes from 3 to 0.125 and then to 3.5. Hyperprior parameters are chosen to have diffuse prior distributions for θ 1 and θ 2 : μ γ = (0, 0) , V γ = 10I 2 , where I 2 is the 2-by-2 identity matrix, α σ = β σ = 0.1.
The scenarios considered are averaged over K = 40 artificial datasets, and maximum number of change-points, m 1 and m 2 , span in {(1, 2), (10, 2), (1, 10), (10, 10)}. A Gibbs   sampler is implemented for N = 2000 iterations, whose the first N/2 are discarded, for FullMod and CauchyMod. The RMSEs in the last two columns of Table 2 (Full-C and Cauchy-C ) show that the estimation of the structural parameters and of the change-points is significantly better in FullMod.

Prediction Ability
We compare FullMod and CauchyMod in terms of their prediction capabilities, using the log predictive density: we estimate the log predictive density of each observation at time t * conditionally to the whole past, for t * in [T/2 + 1, T ], and use the average log predictive density over this range, as a measure of predictive ability. The log predictive density at time t * is estimated as where N is the number of posterior samples, and ξ * 1t,i , ξ * 2t,i and ξ * 3t,i are the extractions of the parameters at the i-th iteration of a Gibbs sampler that uses data only up to t * excluded.
A model with better prediction capability presents a higher average log predictive density. The results over the K simulated datasets with autocorrelated data, shown in Figure 4, strenghten the conclusions of the previous subsection since, in all scenarios (m 1 , m 2 ) ∈ {(1, 2), (10, 2), (1, 10), (10, 10)}, FullMod performs better than CauchyMod in terms of predictive capability: its average log predictive density is higher than in CauchyMod, with an advantage which is more significant when m 2 is mispecified.
The fitting properties can be analyzed through a comparison of the marginal likelihoods under the two models. From Chapter 9 of Müller et al. (2015), the only calculation of the marginal likelihood that is explicitly designed for DP mixture models is the one by Basu and Chib (2003). Application of the Basu and Chib (2003) approach to our multiple DP set-up, however, requires a non-trivial extension of their approach for finding the likelihood ordinate, an extension which is beyond the scope of this paper.
The Gibbs sampler iterates between sampling from the full conditionals of S 1T and S 2T (latent regimes for the mean parameters θ 1 and for the variance parameter θ 2 ) and from the full conditionals of θ 1 and θ 2 . Given θ 1 and θ 2 , the processes S 1T and S 2T are sampled as in Section 3.1, fixing a maximum of m 1 = 10 and m 2 = 10 change-points. The clustering configuration and the distinct values are sampled following Section 3.2, for θ 1 and θ 2 separately and conditionally on each other. In sampling θ 1 and θ 2 , the integrals involved in (6) can be computed in closed form.
The methodology is applied to secondary market 3-month T-bills, publicly available from the Board of Governors of the Federal Reserve System. The data are weekly, Friday    Escobar and West (1995).
Posterior estimates are reported in Table 3, with 95% credible intervals in parentheses. The algorithm estimates 5 change-points for the location vector parameter, and 2 change-points for the variance. We report in Figure 5 the marginal posterior probability mass functions of τ 11 and τ 21 , the first change-point for both structural parameters.
regime changes through the posterior distribution of the latent processes S 1T and S 2T . From Table 3 the change-points show narrow credible intervals particularly for σ 2 , for which the jumps are more visible. More volatile is the estimate of τ 15 , coherently with the top right plot in Figure 6, which shows no visible change in the conditional mean around τ 15 . Regime changes for (λ, β) are mainly driven by changes in λ, whilst credible intervals of βs are highly overlapped. This is also suggested from the behaviour of the estimated conditional mean, which evidences clear vertical jumps also in periods with no big price variations. The dynamics of the estimated variance in Figure 6 (top right) shows a feature that is not apparent from Table 3: whilst the change-points τ 21 and τ 22 could also be dirtily guessed by looking at the raw data, more latent is the asymmetric behaviour of the conditional variance around the two change-points, with the abrupt increase at τ 21 , followed by the steep decrease towards τ 22 . Finally, we compare the predictive performances of FullMod and CauchyMod in Figure 7, by comparing their predictive densities, estimated as in Section 4.3. It is clear that FullMod shows higher predictive density and therefore better prediction capability.

Conclusions and Directions of Investigation
We have introduced a new Bayesian semiparametric change-point model with various attractive features, to robustly analyze time series with unknown locations and number of multivariate change-points. A Dirichlet Process (DP) prior on each structural parameter (or group of structural parameters) provides robustness to mispecification in the conditional distributions of the structural parameters. Furthermore, the discrete random distribution assigns positive mass to previous realizations of structural parameters, in agreement with recurrent regimes. In simulation we evidence a relevant improvement in the inferential results, relative to the model with no DP. Through a forward filtering backward sampling algorithm on the latent regimes, we introduce different change-point processes for different (or different groups of) structural parameters, and we are able to conduct the change-point analysis without specifying a priori the number of change-points. We highlight the estimation advantage of a model that treats separately the change-points of different structural parameters, relative to a model with common change-points. Finally we introduce, to our knowledge for the first time in the literature, more realistic time-dependent transition matrices, and again demonstrate, in simulation, how turning off this feature has significant impact on change-points estimation. After deriving some distributional results on interarrival times between regime changes and on the number of change-points, we present an efficient MCMC algorithm for posterior inference, implemented on simulated data and on short-term interest rates. The results show that the proposed semiparametric model and an alternative parametric model with heavy-tailed prior on the variance perform better than alternative less realistic models in terms of estimation of the number and locations of the change-points and for the estimation of the structural parameters, when the data are iid Gaussian or heavy-tailed. When the generated data are skewed or autocorrelated, the proposed semiparametric model performs better than the benchmarks.
A limit of the proposed model is the assumed prior independence between the DPs. Still, a posteriori a dependence between the DPs is implied, since each DP is, a posteriori, centered on a base measure that depends on the structural parameters extracted from the other DPs. It would be interesting to extend the current setting to dependent DPs and test the gain in terms of inferential performance that would derive, for instance, through the mixture process of DPs (Cifarelli and Regazzini, 1978;Peluso et al., 2017), hierarchical DP (Beal et al., 2002;Teh et al., 2005) or the bivariate DP (Walker and Muliere, 2003). Also, the DP induces a number of clusters that grows at an exponential rate, and having sizes with exponential tail behaviour. This clustering structure can represent a limit in some applications, for instance in all those cases with cluster sizes decaying in power-law. An extension of the proposed model to more general Bayesian nonparametric priors (Lijoi and Prünster, 2010) can introduce a more flexible clustering of the structural parameters.