Bayesian Multiple Changepoint Detection for Stochastic Models in Continuous Time

A multiple changepoint model in continuous time is formulated as a continuous-time hidden Markov model defined on a countably infinite state space. The new formulation of the multiple changepoint model allows the model complexity, i.e. the number of changepoints, to grow unboundedly as new data arrive. Inference on the number of changepoints and their locations is based on a collapsed Gibbs sampler. We suggest a new continuous-time version of the forward-filtering backward-sampling (FFBS) algorithm for simulating the full trajectory of the latent Markov chain, i.e. the changepoints. The FFBS algorithm is based on a randomized time-discretization of the latent Markov chain through uniformization schemes, combined with a discrete-time version of the FFBS algorithm. It is shown that, desirably, both the computational cost and the memory cost of the FFBS algorithm are only quadratic in the number of changepoints. The new formulation of the multiple changepoint model also allows varying scales of run lengths between changepoints to be characterized. We demonstrate the methods through simulations and a real data example on earthquakes.


Introduction
Changepoint models arise in many practical applications, e.g. signal processing, system control, bioinformatics and finance, among many others. Typically, chronologically ordered data streams are collected over a period during which a (random) number of structural breaks happen at unknown locations, so that observations within the same segment, or equivalently, observations between two consecutive changepoints, are regarded as homogeneous, while observations across different segments are heterogeneous. The interest lies in making inference for the number and the locations of changepoints and other model parameters. Among the many contributions to this literature, popular recent methods include pruned exact linear time (PELT; Killick et al. 2012), wild binary segmentation (WBS; Fryzlewicz 2014) and the simultaneous multiscale change point estimator (SMUCE; Frick et al. 2014).
In the Bayesian context of retrospective changepoint detection, Monte Carlo strategies are often applied for posterior inference: see e.g. Stephens (1994), Green (1995), Chib (1998), Lavielle and Lebarbier (2001), Fearnhead (2006) and Ruggieri and Lawrence (2014), among many others. Online inference for multiple changepoint problems by use of particle filtering is discussed by Chopin (2007), Fearnhead and Liu (2007), Yildirim et al. (2013) and others. Ko et al. (2015) propose a Bayesian multiple changepoint model based on the Dirichlet process hidden Markov model, in which the number of changepoints need not be specified. Martinez and Mena (2014) propose another Bayesian nonparametric approach to changepoint detection based on random partition distributions (Barry and Hartigan 1992), applying a split-merge Markov chain Monte Carlo (MCMC) algorithm for the posterior inference. Most of these studies are suitable only for detecting changepoints in discrete time series or discretely observed random processes. Although some continuous-time random processes may be well approximated by their discrete-time counterparts, it is necessary to develop multiple changepoint models in continuous time in their own right. In this way, the approximation error introduced by using a discrete-time counterpart, which is often difficult to quantify, can be avoided.
We consider Bayesian multiple changepoint problems for some continuous-time random processes. Multiple changepoint problems in discrete time are often treated in the framework of hierarchical Bayesian models, e.g. latent Markov models, because efficient posterior simulation strategies such as forward-backward recursions (Chib 1998, Scott 2002, Fearnhead and Liu 2007) and Viterbi dynamics can be utilized for the inference of changepoints. Similarly, we formulate multiple changepoint problems of random processes as continuous-time hidden Markov models. Recently, some multiple changepoint models for continuous-time stochastic models, such as (marked) Poisson processes and finite Markov chains, have been formulated as continuous-time hidden Markov models (Rao and Teh 2013, Lu 2019, Lu 2020). However, these models are only suitable for multiple changepoint problems with a fixed number of changepoints. Although the number of changepoints may be determined by use of some model selection criteria after fitting a series of multiple changepoint models with diverse numbers of changepoints, the definition of model complexity in different criteria can be sharply different. In some cases, different numbers of changepoints may be identified by different model selection criteria, and the uncertainty in the number of changepoints is difficult to quantify. A new multiple changepoint model for stochastic models in continuous time is formulated, in which the changepoint locations are associated with a left-to-right latent Markov chain defined on a countably infinite state space. In a full Bayesian approach, the joint posterior of the number of changepoints and their locations is simulated via efficient Markov chain Monte Carlo strategies. We suggest a continuous-time version of the forward-filtering backward-sampling (FFBS) algorithm within the Gibbs sampling scheme, which is an extension of the discrete-time forward-filtering backward-sampling algorithm of Fearnhead and Liu (2007).
It is based on a randomized time-discretization of the latent Markov process on a countably infinite state space through uniformization schemes. Consequently, no time-discretization error needs to be accounted for. For multiple changepoint problems in discrete time, it is well known that the computational cost of the FFBS algorithm is proportional to the number of observations and the memory cost increases quadratically with the number of observations, which is prohibitive when the data set is large and the changepoints are only sparsely distributed. Often, particle filters or pruning techniques have been applied to deal with these problems: see Fearnhead and Liu (2007). Nevertheless, with the uniformization scheme, it is possible to deal with the computational and storage costs of a Bayesian multiple changepoint problem in continuous time directly at the scale of the number of changepoints. It is shown that both the computational cost and the memory cost of the FFBS algorithm are only quadratic in the number of uniformization times generated from the uniformization schemes, which can be scaled down to the number of changepoints. This is desirable in comparison with its discrete-time counterpart. Furthermore, by integrating out the model parameters and targeting the marginal posteriors of the number of changepoints and their locations, the efficiency of the posterior simulation scheme is improved.
In Section 2, a multiple changepoint model for a stochastic process is formulated as a continuous-time hidden Markov model. We then introduce a continuous-time FFBS algorithm for the simulation of the trajectories of the latent Markov chain defined on a countably infinite state space. The posteriors of the number of changepoints and their locations are approximated via a Gibbs sampling scheme. In some cases, it might be desirable to obtain the MAP estimates of changepoints. In Section 3, we discuss the continuous-time version of the Viterbi algorithm, particularly for multiple changepoint problems of Markov jump processes and (marked) Poisson processes. We demonstrate the methods through 5 numerical examples. The first two numerical examples are simulation studies for the multiple changepoint problem of a Markov jump process. The last three numerical examples include simulation studies for Poisson processes and a real data example of New Zealand deep earthquakes.

Model Formulation and Bayesian Multiple Changepoint Detection

Model Formulation
Let Y_θ(t) be a continuous-time random process on a time interval [0, T]. Assume it is subject to abrupt changes at unknown locations 0 = τ_0 < τ_1 < · · · < τ_m < τ_{m+1} = T, such that the observed random process Y_θ(t) is partitioned into m + 1 segments by the m changepoints τ = (τ_1, · · · , τ_m). The number of changepoints m is often unknown. The main interest lies in making inference for the number of changepoints m, their locations τ and the model parameters θ = (θ_1, · · · , θ_{m+1}) in each segment of Y_θ(t).
In a Bayesian context, the priors for the number of changepoints m, their locations τ and θ need to be specified. For computational reasons illuminated later, we implicitly define priors for the number of changepoints and their locations by defining priors for the run lengths, i.e. the segment lengths, between consecutive changepoints. The priors for the model parameters θ in Y_θ(t) will be given in Section 4 after Y_θ(t) is defined. We assume a priori that the run length of the i-th segment is exponentially distributed with rate q_i. This type of prior for the changepoints can be defined through a latent Markov chain X(t) on a countably infinite state space, with a left-to-right transition rate matrix Q parameterized exactly in terms of a multiple changepoint model. The transition rate matrix of the left-to-right Markov chain X(t) is defined by

Q[i, i] = −q_i,  Q[i, i + 1] = q_i,  Q[i, j] = 0 otherwise,  i = 1, 2, · · ·

When X(t) is in the i-th state, Y_θ(t) is parameterized by θ_i. This model formulation is a particular type of continuous-time hidden Markov model (CTHMM) and also a type of product partition model (Barry and Hartigan, 1992).
In the following, we define the brackets [j, i] after a matrix A as the (j, i)-th entry of A. The sample path of a random process Z(t) on a time interval [a, b] or [a, b) is denoted by Z[a, b] or Z[a, b) respectively. Generically, p(.) denotes a probability density (mass) function. For convenience, Y θ (t) is abbreviated to Y(t). Throughout the discussion, we denote a vector (z 1 , · · · , z n ) by z 1:n .
With m changepoints τ on [0, T], Y(t) is partitioned into m + 1 segments. The i-th segment of Y(t) is associated with the model parameter θ_i. Given τ and θ, we assume conditional independence of Y(t) across segments, so that the likelihood is written as

p(Y[0, T] | m, τ, θ) = ∏_{i=1}^{m+1} p(Y[τ_{i−1}, τ_i) | θ_i),

with τ_{m+1} = T. After defining priors for θ, the marginal likelihood of Y(t) on an interval [s, t) is given by

L(s, t) = ∫ p(Y[s, t) | θ) p(θ) dθ.

We assume the marginal likelihood of Y(t) within an interval can be exactly evaluated, which is achieved either by choosing a conjugate prior p(θ) for θ or by carrying out a numerical integration. With exponential run lengths, the prior for m changepoints located at τ_1, · · · , τ_m is given by

p(m, τ) = (∏_{i=1}^{m} q_i e^{−q_i (τ_i − τ_{i−1})}) e^{−q_{m+1}(T − τ_m)}.

Thus, the marginal posterior of the number of changepoints and their locations is written as

p(m, τ | Y[0, T]) ∝ p(m, τ) ∏_{i=1}^{m+1} L(τ_{i−1}, τ_i),

which is known up to a normalizing constant.
Generally, reversible jump MCMC (RJMCMC) methods can be applied to simulate approximately from the marginal posterior of the number of changepoints and their positions. However, these methods can be computationally inefficient, leading to problems such as slow convergence and poor mixing. Instead, we may consider a time-discretization of Y(t) by assuming that the locations of changepoints lie only at the endpoints of a set of consecutive time intervals, partitioned evenly at 0 < t_1 < t_2 < · · · < t_n < T. Then, efficient posterior simulation strategies such as forward-backward recursions (Chib 1998, Scott 2002, Fearnhead and Liu 2007) can be applied. Nevertheless, the approximation error introduced by time-discretization techniques is often difficult to quantify. We suggest a randomized time-discretization by use of the uniformization scheme for a continuous-time Markov jump process on a countably infinite state space, so that quantifying the time-discretization error is bypassed.

Uniformization Scheme
Let Z(t) be a time-inhomogeneous Markov chain on a countably infinite state space with transition rate matrices Q_t. Assume that, for some constant λ > 0,

sup_t sup_i |Q_t[i, i]| ≤ λ.

Let P_{s,t}[i, j] denote the transition probability of observing Z(t) in state j at time t given that it was in state i at time s. Define a transition probability matrix P_t by

P_t = I + Q_t / λ.   (3)

Then, for all s ≤ t and all i, j,

P_{s,t}[i, j] = Σ_{k=0}^{∞} e^{−λ(t−s)} ((λ(t−s))^k / k!) ∫ (P_{t_1} P_{t_2} · · · P_{t_k})[i, j] dU(t_1, t_2, · · · , t_k),   (4)

where U(t_1, t_2, · · · , t_k) is the k-dimensional uniform distribution of the order statistics t_1 < t_2 < · · · < t_k on [s, t] and P_{t_1} P_{t_2} · · · P_{t_k} is the matrix product of the transition matrices in (3): see Van Dijk et al. (2018) and Jensen (1953).
In this case, we say that the Markov chain is subordinated to the Poisson process. The above equation has a natural interpretation as Poisson randomization. During [s, t], jumps occur according to a Poisson process with rate λ. Given that k jumps have occurred, the jump times are uniformly distributed on [s, t]. At these Poisson epochs, the state transitions happen in terms of a discrete-time Markov chain with the transition probability matrix defined by (3). It is noted that the uniformization scheme is also viable for a Markov jump process on a countable infinite state space, which can be used in multiple changepoint models with a potentially unbounded number of changepoints.
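The Poisson randomization just described can be sketched in a few lines: candidate epochs are drawn from a Poisson process with rate λ, and at each epoch the left-to-right chain jumps i → i + 1 with probability q_i/λ, the off-diagonal entry of P = I + Q/λ. The function below is a minimal illustrative sketch, not the paper's implementation; the function name and the treatment of the last listed state as absorbing are our own choices.

```python
import random

def simulate_changepoints_uniformization(q, lam, T, rng=None):
    """Simulate changepoint locations of a left-to-right chain X(t) on [0, T]
    via uniformization.  q[i] is the rate of leaving state i (the changepoint
    recurrence rates); the last state is treated as absorbing; lam >= max(q)
    is the uniformization rate."""
    rng = rng or random.Random(0)
    assert lam >= max(q)
    # Step 1: candidate epochs from a homogeneous Poisson process with rate lam.
    t, times = 0.0, []
    while True:
        t += rng.expovariate(lam)
        if t > T:
            break
        times.append(t)
    # Step 2: at each epoch, jump i -> i+1 with probability q[i]/lam
    # (the off-diagonal entry of P = I + Q/lam); otherwise stay (self-loop).
    state, changepoints = 0, []
    for u in times:
        if state < len(q) - 1 and rng.random() < q[state] / lam:
            state += 1
            changepoints.append(u)   # a left-to-right jump is a changepoint
    return changepoints
```

Self-loops at epochs where the chain stays are exactly what makes the subordinated discrete-time chain consistent with the continuous-time dynamics.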

Gibbs Samplers with a Continuous-Time FFBS Algorithm
Assume {t_1, t_2, · · · , t_n} is a sequence of uniformization times of X(t) generated from the uniformization scheme, such that X(t_1), · · · , X(t_n) is a discrete-time Markov chain with the transition probability matrix P = I + Q/λ. Given t_{1:n}, the locations of changepoints of Y(t) are chosen from this set of discrete times, which serves as a sequence of granularized observations. Let s_i denote X(t_i) for i = 1, . . . , n and let g(i, k) denote the probability mass function of the run length of a segment beginning from t_i and ending at t_k, k > i.
The associated cumulative distribution function of the run length for this segment is written as G(i, k) = Σ_{j=i+1}^{k} g(i, j). Let C_t be a state variable taking values in {0, 1, 2, · · · , t − 1}, defined to be the location of the most recent changepoint before time t; if there is no changepoint before t, C_t = 0. {C_t} is a discrete-time Markov chain on {0, 1, 2, · · · , t − 1}, and in Fearnhead and Liu (2007) its transition probabilities are determined by the run-length distribution g. The forward filtering algorithm gives the filtering probabilities P(C_t | Y[0, t]). When the filtering probabilities are stored, the backward sampling step generates samples from the posteriors of the number of changepoints and their locations. The last changepoint C_n is generated first according to the filtering probabilities P(C_n | Y[0, T]). Given the latest changepoint C_n = t, the next changepoint is generated recursively and backwards according to the backward kernel in (9). The recursion terminates when C_t = 0. This is another version of the continuous-time forward-filtering backward-sampling (FFBS) algorithm: see Rao and Teh (2013) for another version of the FFBS algorithm for Markov-modulated Poisson processes with a finite number of states and continuous-time Bayesian networks.
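The exact recursions over the grid of uniformization times can be sketched as follows, in the spirit of Fearnhead (2006) and Fearnhead and Liu (2007). This simplified sketch is a mirrored variant (backward recursion, then forward simulation) that targets the same posterior as forward filtering with backward sampling; it is not the paper's algorithm. Run lengths are taken geometric with a single hazard h, and seg_loglik is an assumed user-supplied log marginal likelihood of a segment.

```python
import math
import random

def _logsumexp(xs):
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def sample_changepoints(seg_loglik, n, h, rng=None):
    """Draw one exact posterior sample of the changepoint set on a grid of n
    points.  seg_loglik(s, t) is an assumed user-supplied log marginal
    likelihood of points s..t forming one segment; run lengths are geometric
    with hazard h."""
    rng = rng or random.Random(0)
    logh, log1mh = math.log(h), math.log1p(-h)

    def seg_terms(t):
        # joint log weight of "segment t..s" for every admissible end s
        out = []
        for s in range(t, n + 1):
            lp = seg_loglik(t, s) + (s - t) * log1mh
            lp += logh + Q[s + 1] if s < n else 0.0
            out.append((s, lp))
        return out

    # Backward pass: Q[t] = log P(y_{t:n} | a segment starts at t).
    Q = [0.0] * (n + 2)
    for t in range(n, 0, -1):
        Q[t] = _logsumexp([lp for _, lp in seg_terms(t)])
    # Forward pass: sample successive segment ends from exact conditionals.
    cps, t = [], 1
    while t <= n:
        u, acc, end = rng.random(), 0.0, n
        for s, lp in seg_terms(t):
            acc += math.exp(lp - Q[t])
            if u < acc:
                end = s
                break
        if end < n:
            cps.append(end)   # changepoint at grid index `end`
        t = end + 1
    return cps
```

Both passes visit O(n^2) (start, end) pairs, matching the quadratic cost discussed below.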
The computational cost of this version of the FFBS algorithm is quadratic in n, the actual number of simulated uniformization times, and the memory cost is also only quadratic in n. By computing directly on the scale of changepoints, the memory cost is significantly lowered; it is otherwise prohibitive, as in the usual case in which the filtering probabilities are computed and stored on an event-by-event scale, particularly for a very long sequence of observations in which the changepoints are only sparsely distributed. The proposed method is therefore potentially suitable for the detection of sparse changepoints in a very long sequence of observations. Gibbs sampling is completed by sampling the model parameters conditioned on the full path of the latent Markov chain. Let Γ(a, b) be the prior distribution of q_i, so that the joint prior of Q is given by p(Q) = ∏_i Γ(q_i; a, b). The full conditional of q_i is distributed according to Γ(a + 1, b + (τ_i − τ_{i−1})). A two-block Gibbs sampler is given in Algorithm 1.
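The conjugate update for the changepoint recurrence rates is simple to implement. A minimal sketch, assuming the shape-rate parameterization of the Gamma distribution (the function name is ours):

```python
import random

def sample_q_full_conditional(a, b, tau_prev, tau_cur, rng=None):
    """One Gibbs update for a changepoint recurrence rate q_i: with a
    Γ(a, b) prior (shape-rate), the full conditional given the segment
    (τ_{i-1}, τ_i] is Γ(a + 1, b + (τ_i − τ_{i-1}))."""
    rng = rng or random.Random(0)
    # random.gammavariate takes (shape, scale), so the rate is inverted.
    return rng.gammavariate(a + 1.0, 1.0 / (b + (tau_cur - tau_prev)))
```

The "+1" in the shape reflects the single exponential run-length observation contributed by the segment.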
One strength of this model formulation is that it allows a varying scale of segment lengths to be characterized in the transition rate (changepoint recurrence rate) matrix Q. When the rate parameters in Q are highly variable, the Poisson rate λ utilized in the uniformization scheme is dominated by the largest q_i in Q, potentially leading to an excessive number of Poisson events generated by the uniformization procedure, which significantly inflates the memory and computational costs and slows the algorithm. Instead, we consider a modification of the previous uniformization scheme using a nonhomogeneous Poisson process whose rate λ(t) is adaptively tuned with respect to the hazard rate (changepoint recurrence rate) q(t). We choose a piecewise constant intensity function λ(t) to implement the randomized time-discretization for the latent Markov chain piecewise (Van Dijk 1992). To simulate the subordinated Poisson process with a piecewise constant intensity function λ(t), standard thinning methods (Lewis and Shedler 1979) can be employed. Algorithm 1 is then replaced by Algorithm 2.
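Thinning for a piecewise-constant λ(t) can be sketched as follows; this is a minimal version of Lewis-Shedler thinning with our own function name and interface, not the paper's implementation.

```python
import bisect
import random

def thinning_piecewise(breaks, rates, T, rng=None):
    """Simulate event times of a nonhomogeneous Poisson process with a
    piecewise-constant intensity on [0, T] by Lewis-Shedler thinning.
    breaks[i] is the left endpoint of piece i (breaks[0] == 0.0) and
    rates[i] is the intensity on [breaks[i], breaks[i+1])."""
    rng = rng or random.Random(0)
    lam_max = max(rates)               # dominating homogeneous rate
    t, events = 0.0, []
    while True:
        t += rng.expovariate(lam_max)  # candidate from the dominating process
        if t > T:
            return events
        lam_t = rates[bisect.bisect_right(breaks, t) - 1]
        if rng.random() < lam_t / lam_max:
            events.append(t)           # accept with probability λ(t)/λ_max
```

When the pieces of λ(t) track q(t), far fewer candidates are wasted than with a single global rate dominated by the largest q_i.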
In many cases, the posterior of the model parameter θ in each segment is also of interest. We may sample from p(θ|Y[0, T ]) by a collapsed Gibbs sampler, in which the model parameter θ is integrated out from the joint posterior and the block Gibbs sampler (Algorithm 2) is applied to the remaining components until convergence. Then, the model parameter θ in each segment is sampled according to the full conditionals, see Liu (1994).
In simulation studies, we observed that the simulated trajectories of the latent Markov chain, i.e. the locations of changepoints, typically show some "knots". These "knots" happen when the filtering probabilities of a changepoint location are not concentrated at a single uniformization time but instead spread across several consecutive uniformization times; in this case, the backward sampling step will sometimes treat some or all of them as individual changepoints. Such "knots" inflate the memory and computational costs if carried into the next iteration. A direct approach to pruning these "knots" is to slightly modify the backward sampling step by resampling the most recent changepoint before t several times according to (9) and choosing the one most distant from t as the most recent changepoint before t. Our numerical studies show that this is a very effective and cheap approach to pruning the extra changepoints.
In step 2 of Algorithm 3, the changepoints themselves are treated as uniformization times in the next iteration, so that likely changepoint locations can be kept in following iterations.
The current model formulation is more flexible than multiple changepoint models with the number of changepoints fixed or the maximum number of changepoints bounded. It is also noted that, when all the q_i's are given in advance, an exact Monte Carlo sampling algorithm for the changepoints could be run; however, this is unrealistic in practice. As the q_i's in Q may differ, varying scales of run lengths between changepoints can be incorporated into the model, which is desirable in practice to avoid model bias. This approach is applicable to multiple changepoint models for any random process whose marginal likelihood can be exactly evaluated, which is evidently the case when conjugate priors are introduced, for example for the exponential family of random processes, or when the marginal likelihood can be numerically computed.
We call the Poisson rate in the uniformization scheme a resolution parameter. A high intensity rate leads to the generation of a large number of uniformization times, so that the positions of changepoints can be located on a finer time scale; however, both the memory and computational costs then inflate significantly, making the algorithm less efficient. Conversely, choosing a relatively small Poisson rate reduces both costs on average, but lowers the accuracy of the estimated changepoint locations: see the numerical examples in Section 4. The resolution parameter therefore needs to be carefully tuned to trade off cost against estimation accuracy.
However, even using Algorithms 2-3, the uniformization scheme may still occasionally produce too many Poisson events (uniformization times), inflating the computational and memory costs. In this case, we may set an upper bound on the number of Poisson events. The truncation error can be quantified as

‖P_{s,t} − P̂^L_{s,t}‖ ≤ 1 − Σ_{k=0}^{L} e^{−λ(t−s)} (λ(t−s))^k / k!,   (10)

where ‖·‖ denotes the standard supremum norm and P̂^L_{s,t} is the truncated version of (4) at level k = L: see Van Dijk et al. (2018).
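In practice one may choose the truncation level L so that the Poisson tail mass bounding the truncation error falls below a tolerance ε. A sketch (the function name is ours):

```python
import math

def truncation_level(lam, dt, eps):
    """Smallest L with Poisson(lam * dt) tail mass 1 - CDF(L) below eps,
    bounding the error of truncating the uniformization series at level L."""
    mu = lam * dt
    term = math.exp(-mu)   # P(K = 0)
    cdf, L = term, 0
    while 1.0 - cdf > eps:
        L += 1
        term *= mu / L     # P(K = L) from P(K = L - 1)
        cdf += term
    return L
```

Because each matrix term in the series has supremum norm at most one, the Poisson tail mass directly bounds the truncation error.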

Continuous-Time Viterbi Algorithm
In some cases, it is desirable to obtain maximum a posteriori (MAP) estimates of the set of changepoints. The MAP estimate of the set of changepoints, i.e. the optimal path of the latent Markov chain X(t), is selected from a potentially infinite number of trajectories, which is inaccessible in general. Nevertheless, when Y(t) is a Markov jump process or a (marked) Poisson process, as illustrated in the next section, the most probable sample path can be retrieved exactly by a continuous-time version of the Viterbi algorithm. Let u_1 < u_2 < · · · < u_n be the set of event times, i.e. the arrival times of a (marked) Poisson process or the transition times of a Markov jump process respectively. For these Markov renewal processes, the Viterbi recursion shows that, for u_k < u ≤ u_{k+1} and j = i or j = i + 1, the quantity log p(X(u) = j, Y(u_k, u] | X(u_k) = i) needs to be maximized over the path space from u_k to u. When Y(t) is a Markov jump process or a (marked) Poisson process, log p(X(u) = j, Y(u_k, u] | X(u_k) = i) can be explicitly maximized: see Bebbington (2007), Lu (2019) and Lu (2020). It turns out that the most probable path of the latent Markov chain X(t) has state transitions exactly at the event times, i.e. the arrival times of a (marked) Poisson process Y(u) or the state transition epochs of the observed Markov jump process Y(u) respectively. As a result, the optimal set of changepoint locations is chosen exactly from these event times: see Lu (2019) and Lu (2020) for nearly the same argument. The search for the set of most probable changepoints can therefore be implemented by a discrete-time version of the Viterbi algorithm (Viterbi, 1967).
Let f_k(j) be the probability density of the most probable path of X(t) up to time u_k that ends in state j, and let φ_k(j) be the optimal state at time u_{k−1} for the path reaching state j at the next time step u_k.
The memory cost in this case is quadratic in n, which is prohibitive for a long sequence of event times. Instead, we may first obtain the MAP estimate of the number of changepoints from the marginal posterior; conditioned on the number of changepoints, the MAP estimates of the locations of changepoints can then be obtained from Algorithm 4.

Algorithm 4: Continuous-Time Viterbi Algorithm.
1. Initialize f_1(j) = π_1 p(X(u_1) = j, Y(0, u_1] | X(0) = 1) and set φ_1(j) = 0 for j = 1.
2. For k = 2, · · · , n and j = i or i + 1, recursively compute f_k(j) and φ_k(j).
3. Let x_n = arg max_j f_n(j) and backtrack the state sequence via x_{k−1} = φ_k(x_k).

Algorithm 4 is only suitable for the special cases described above. In general, we may obtain approximate MAP estimates of changepoints via the uniformization scheme as follows. First, we obtain the MAP estimate of the number of changepoints directly from the marginal posterior. Conditioned on the number of changepoints, we perform a randomized time-discretization for both X(t) and Y(t) through the uniformization scheme. Conditioned on a particular time-discretization of X(t) and Y(t), we carry out the discrete-time Viterbi algorithm to obtain a "MAP" estimate of the locations of changepoints. This randomized time-discretization of the latent Markov chain X(t) by the uniformization scheme, followed by the discrete-time Viterbi algorithm, can be performed many times; among the potentially many different segmentations, we may choose the one giving the highest posterior density as the "MAP" estimate. We call this a "randomized Viterbi algorithm".
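For a fixed number of changepoints m, the discrete-time Viterbi pass reduces to dynamic programming over the event times with states j = 1, · · · , m + 1 and moves j → j or j → j + 1. The sketch below assumes a user-supplied step_logp(k, i, j) giving the log density of the k-th inter-event step; it illustrates the recursion and backtracking, and is not the paper's implementation.

```python
def viterbi_left_to_right(step_logp, n, m):
    """Discrete-time Viterbi over n event times for a left-to-right chain
    with at most m changepoints (states 1..m+1).  step_logp(k, i, j) is an
    assumed user-supplied log density of step k with the latent state moving
    from i to j, where j = i (no changepoint) or j = i + 1."""
    NEG = float("-inf")
    f = [[NEG] * (m + 2) for _ in range(n + 1)]   # f[k][j]: best log score
    phi = [[0] * (m + 2) for _ in range(n + 1)]   # phi[k][j]: best predecessor
    f[1][1] = step_logp(1, 1, 1)
    if m >= 1:
        f[1][2] = step_logp(1, 1, 2)
    for k in range(2, n + 1):
        for j in range(1, m + 2):
            stay = f[k - 1][j] + step_logp(k, j, j)
            jump = f[k - 1][j - 1] + step_logp(k, j - 1, j) if j > 1 else NEG
            f[k][j], phi[k][j] = (stay, j) if stay >= jump else (jump, j - 1)
    # Backtrack the most probable state sequence.
    path = [0] * (n + 1)
    path[n] = max(range(1, m + 2), key=lambda j: f[n][j])
    for k in range(n, 1, -1):
        path[k - 1] = phi[k][path[k]]
    return path[1:]
```

Changepoints are read off wherever the returned path increments by one.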

Multiple Changepoint Models for Markov Jump Processes
Markov chains are widely used in modelling weather, bioinformatics, speech recognition and modern credit management: see Polansky (2007), Arnesen et al. (2016) and Xing et al. (2012). Let Y(t) be an r-state continuous-time finite Markov chain observed on [0, T], specified by the transition rate matrix D_{X(t)}. Y(t) is subject to abrupt changes at unknown locations corresponding to a random number of changepoints. We consider multiple changepoint models of Markov jump processes with the changepoint locations associated with a latent Markov chain X(t). When X(t) is in state k, Y(t) is parameterized by D_k = (d_ij(k))_{r×r}. The likelihood of the trajectory of Y(t) within one segment, with transition rate matrix D = (d_ij), is

p(Y | D) = ∏_{i≠j} d_ij^{N_ij} e^{−d_ij T_i},

where the sufficient statistics N_ij and T_i are the total number of transitions of Y(t) from i to j and the overall sojourn time of Y(t) in state i respectively. By choosing priors for the entries of D from the Gamma distribution Γ(α, β), which is conjugate, the marginal likelihood of Y(t) can be explicitly written as

∏_{i≠j} β^α Γ(α + N_ij) / (Γ(α) (β + T_i)^{α + N_ij}).

We demonstrate two numerical examples for Markov jump processes. In the first numerical example, a 3-state Markov jump process is observed, which is partitioned by several changepoints at unknown locations. The trajectory of the observed Markov jump process and the exact locations of the changepoints are displayed in the top of Figure 1. The parameters of the simulated Markov jump process are specified in Table 1. In this numerical example, each segment consists of 300 state transitions. We run the Gibbs sampler in Algorithm 3 for a total of 6000 iterations, with the last 2800 iterations collected as posterior draws. The posterior of the number of changepoints is indicated in the left top of Figure 2. In the figure, the dominant mode of the posterior of the number of changepoints is 3, which is consistent with the true model. The MAP estimate of the changepoint locations given by the continuous-time Viterbi algorithm is listed in Table 1.
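The sufficient statistics and the closed-form marginal likelihood for one segment are straightforward to compute from its trajectory. A minimal sketch (the function name is ours; shape-rate Gamma parameterization assumed):

```python
import math

def mjp_log_marginal(jumps, states, T, r, alpha, beta):
    """Sufficient statistics and log marginal likelihood of one segment of an
    r-state Markov jump process under independent Γ(α, β) (shape-rate) priors
    on the transition rates.  states[k] is the state entered at time jumps[k],
    with jumps[0] == 0.0 the segment start and T the segment end."""
    N = [[0] * r for _ in range(r)]   # transition counts N_ij
    Tsoj = [0.0] * r                  # sojourn times T_i
    for k in range(len(jumps)):
        end = jumps[k + 1] if k + 1 < len(jumps) else T
        Tsoj[states[k]] += end - jumps[k]
        if k + 1 < len(jumps):
            N[states[k]][states[k + 1]] += 1
    # log of  prod_{i != j}  beta^alpha Gamma(alpha + N_ij)
    #                        / ( Gamma(alpha) (beta + T_i)^{alpha + N_ij} )
    ll = 0.0
    for i in range(r):
        for j in range(r):
            if i != j:
                ll += (alpha * math.log(beta) + math.lgamma(alpha + N[i][j])
                       - math.lgamma(alpha)
                       - (alpha + N[i][j]) * math.log(beta + Tsoj[i]))
    return N, Tsoj, ll
```

Each candidate segment in the FFBS step needs only these counts and sojourn times, which is what keeps the collapsed sampler cheap.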
In the uniformization scheme, the Poisson intensity rate is k times the changepoint recurrence rate. In this numerical example, we also demonstrate the scaling effects of k on the efficiency and accuracy of the estimates. The right of Figure 2 indicates the central processing unit (CPU) time and the effective sample size per iteration with 3 different scalings k = 6, 3 and 11. From the figure, the computational cost per iteration is quadratic in the scaling coefficient k, whereas the effective sample size is only roughly linear in k. In all cases, we observed a bias of the estimates of the changepoint locations by posterior means (see Figure 1), which is a price paid for pruning nearly repeated changepoints. In Figure 1, the blue dashed lines mark the exact locations of changepoints and the solid red lines mark the posterior means of the changepoint locations. It is observed that all the posterior means of the changepoints are located to the left of the exact changepoint locations. The degree of bias is clearly dependent on the scaling, and is significant when k is small: see Figure 1. In that case, the uncertainties of the changepoint locations are poorly estimated due to low resolution. With increasing k, the 95% HPD intervals of the changepoint locations narrow. Increasing k beyond 11 is unnecessary, as it does not improve the accuracy of the credible intervals much. The posterior means of the transition rate matrices of the observed Markov jump process are listed in Table 1; the estimates are close to the true values. Some of the posteriors of the transition intensity rates and their 95% highest posterior density (HPD) intervals are shown in Figure 3. From the figure, it is apparent that the estimation is reasonably accurate. In this numerical example, to alleviate the memory and computational costs, the number of Poisson events generated from the uniformization scheme is upper-bounded by 200.
The truncation error can be evaluated by (10). In practice, this upper bound is rarely reached and the truncation error is negligible.
The second numerical example is again a multiple changepoint model for Markov jump processes. The observed two-state Markov jump process is partitioned into 5 segments by 4 changepoints. In this case, each segment consists of only 70 state transitions, and the transition intensity rate matrices of the observed Markov chain in adjacent segments are close to each other. The true parameters of the observed Markov jump process are listed in Table 2. This numerical example is designed to make the detection of changepoints more difficult. For this numerical example, the Gibbs sampler in Algorithm 3 was run for 1000 burn-in iterations followed by 5000 iterations, which are collected as posterior samples. In the uniformization scheme, the Poisson intensity rate is 6 times the changepoint recurrence rate. We set an upper bound of 300 for the number of uniformization times. In all the simulations, the number of uniformization times rarely reaches this bound, and the truncation error is negligible. The left top of Figure 4 indicates the trajectory of the observed Markov jump process and the exact locations of the changepoints; the posterior of the number of changepoints is indicated in the right top of Figure 4.

In the third numerical example, we consider multiple changepoint models for a Poisson process Y(t) with the changepoint locations associated with a latent Markov chain X(t), so that when X(t) is in the i-th state, the intensity rate of Y(t) is parameterized by λ_i. The stochastic intensity rate of the doubly stochastic point process Y(t) is specified by λ_{X(t)}. The likelihood of a Poisson process with intensity rate λ is given by λ^N e^{−λT}, where N is the total number of Poisson arrivals and T is the overall observation time.
After taking a conjugate prior Γ(α, β) and integrating out λ from the posterior, the marginal likelihood of Y(t) is exactly

β^α Γ(α + N) / (Γ(α) (β + T)^{α + N}).

A total of 10,000 Poisson events are partitioned into 5 segments by 4 changepoints, with 2,000 Poisson events in each segment. The Poisson intensity rates in the 5 segments are 4, 1, 3, 5 and 2. For such a relatively long sequence of observations, it would be very expensive to detect the sparsely distributed changepoints on an event-by-event scale. One may discretize the Poisson process into a sequence of discrete Poisson counts by splitting the time interval evenly and applying extant methods to detect changepoints in a discrete-valued time series framework. To control the discretization error, one may choose a relatively small bin width; however, with an increasing number of bins, the computational and memory costs inflate quickly: see Fearnhead (2006). To lower these costs, one may decrease the number of bins by choosing a large bin width, but then the discretization error inflates and the locations of changepoints are estimated at low resolution. The advantage of the randomized time-discretization via uniformization is that there is no time-discretization error and the changepoints can be detected directly on the scale of the number of changepoints.
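The Poisson marginal likelihood above is essentially a one-liner, which is what makes the collapsed sampler cheap for long event sequences. A sketch with the shape-rate Γ(α, β) parameterization (the function name is ours):

```python
import math

def poisson_log_marginal(N, T, alpha, beta):
    """Log marginal likelihood of N Poisson arrivals over an interval of
    length T under a conjugate Γ(α, β) (shape-rate) prior on the intensity:
    log [ β^α Γ(α + N) / ( Γ(α) (β + T)^{α + N} ) ]."""
    return (alpha * math.log(beta) + math.lgamma(alpha + N)
            - math.lgamma(alpha) - (alpha + N) * math.log(beta + T))
```

Only the count N and the interval length T enter the formula, so candidate segments can be scored from cumulative counts alone.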
For this numerical example, the Gibbs sampler in Algorithm 3 was run for 500 burn-in iterations followed by 500 iterations, which are collected as posterior samples. In the uniformization scheme, the Poisson intensity rate is 7 times the changepoint recurrence rate. We set an upper bound of 300 for the number of uniformization times; this bound is rarely reached. With this uniformization scheme, the FFBS algorithm efficiently simulates changepoints from a moderate number of uniformization times, often fewer than 100. The results are indicated in the left top of Figure 5.

Time-Varying Patterns of Deep Earthquakes via Multiple Changepoint Models
We apply the method to model the occurrence patterns of New Zealand deep earthquakes. The data set studied in this analysis is the New Zealand earthquake catalogue, freely available from GNS Science of New Zealand via Geonet (www.geonet.org.nz). We choose those events from the New Zealand catalogue within a polygonal region. From the top of Figure 7, it appears that the yearly counts of New Zealand deep earthquakes vary from time to time. Whether the time-varying behavior of the earthquake occurrence rate is associated with structural breaks or is just random fluctuation is of importance for seismic hazard evaluation and seismic risk forecasts. The centralized and normalized cumulative sum of the inter-event times Δt_i, given by

B_j = (Σ_{i=1}^{j} Δt_i) / (Σ_{i=1}^{n} Δt_i) − j/n,

is indicated in the bottom of Figure 7. This statistic behaves as a Brownian bridge on [0, 1] if the observations are from a time-homogeneous Poisson process. Otherwise, if the statistic deviates significantly from the line segment y = 0, 0 ≤ x ≤ 1, the observations should not be regarded as a homogeneous Poisson process: see Galeano (2007) and Yang and Kuo (2001) for the detection of changepoints of Poisson processes. From the bottom of Figure 7, it appears that a single-changepoint or a two-changepoint model might be suitable to characterize the time-varying behavior of the earthquake occurrence rate.
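The Brownian-bridge diagnostic is easy to compute from the inter-event times. A sketch, reading the centralized and normalized cumulative sum as B_j = (Σ_{i≤j} Δt_i)/(Σ_{i=1}^n Δt_i) − j/n (the function name is ours):

```python
def cusum_bridge(interevent):
    """Centralized and normalized cumulative sums of the inter-event times:
    B_j = (sum_{i<=j} dt_i) / (sum_{i=1}^n dt_i) - j/n.  Under a homogeneous
    Poisson process this sequence behaves like a Brownian bridge on [0, 1]."""
    n, total = len(interevent), sum(interevent)
    out, acc = [], 0.0
    for j, dt in enumerate(interevent, start=1):
        acc += dt
        out.append(acc / total - j / n)
    return out
```

Large excursions of the returned sequence away from zero indicate departures from a homogeneous Poisson process, i.e. candidate changepoints.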
In this numerical example, the Gibbs sampling scheme iterates 8000 times, with the last 2,000 samples treated as posterior samples. We set an upper bound of 200 on the number of Poisson events produced by the uniformization scheme; the truncation error is negligible. The left top of Figure 8 shows the magnitudes of the events against their occurrence times, and the posterior of the number of changepoints is shown in the right top of Figure 8. The credible interval for the changepoint at 1987 looks somewhat over-diffuse, which might be attributed to the Poisson rate varying with some sort of trend. The MAP estimates of the changepoint locations are 1957.5 and 1991.2. In this numerical example, we also run the RJMCMC algorithm (Green 1995) for a significantly longer time to obtain the posterior of the number of changepoints of deep earthquakes (Figure 9), which is very diffuse in comparison with the right top of Figure 8. Generally, these methods suffer from poor mixing and highly correlated posterior samples. Figure 9: the histogram of changepoint numbers produced from 40,000 posterior samples collected after a significantly longer burn-in time via the RJMCMC method. The prior distribution of the number of changepoints is uniform between 0 and k_max = 65. The prior distribution for the intensity rate parameter λ is Γ(2, 2).