Sequential change point test in the presence of outliers: the density power divergence based approach

In this study, we consider the problem of monitoring parameter changes, particularly in the presence of outliers. To propose a sequential procedure that is robust against outliers, we use the density power divergence to derive the detector and stopping time that make up our procedure. We first investigate the asymptotic properties of the sequential procedure for i.i.d. sequences, and then extend the proposed procedure to stationary time series models, where we provide a set of sufficient conditions under which the proposed procedure has an asymptotically controlled size and consistency in power. As an application, our procedure is applied to GARCH models. We demonstrate the validity and robustness of the proposed procedure through a simulation study. Finally, two real data analyses are provided to illustrate the usefulness of the proposed sequential procedure.


Introduction
We often face various events that can cause structural changes in underlying dynamics. In the field of finance, for example, global financial crises or changes in monetary policy can be such events. These changes are usually represented through structural breaks or parameter changes in a fitted model. The statistical analysis for testing or detecting such changes is referred to as change point analysis. Due to its importance in statistical inference and actual practice, a vast number of papers have been devoted to this area. For historical background and a general review, see Csörgö and Horváth (1997), Aue and Horváth (2013), Horváth and Rice (2014), and the references therein.
Most of the literature has dealt with the change point problem in the retrospective setting, that is, testing for structural or parameter changes in an observed (or historical) data set. However, since Chu et al. (1996) developed a sequential test procedure that monitors for parameter changes using newly arriving data, this type of test procedure has also attracted special attention from many authors. See, for example, Leisch et al. (2000), Horváth et al. (2004), Berkes et al. (2004), Aue and Kokoszka (2006), Aue et al. (2014), Bardet and Kengne (2014), and Kirch and Weber (2018).
In this study, we are interested in a sequential test for parameter change, particularly in the presence of outliers. Indeed, our study starts from the empirical observation that parameter changes and deviating observations can coexist and that, in this case, existing parameter change tests are likely to draw erroneous conclusions. Strictly speaking, when atypical observations are included in a data set suspected of having parameter changes, whether the testing results are due to genuine changes cannot be readily determined. Such concerns have been addressed, for example, in the recent studies by Song and Kang (2021) and Song (2020), where it was reported that naive parameter change tests can be severely distorted by outliers.
The problem still applies to sequential test procedures. However, it should be emphasized that the impact of outliers in the sequential setting may be more complex than in the retrospective setting, because the influence of outliers can vary depending on their location. Here, we note that outliers can be included in the historical data, they can occur in the arriving data, and both cases are also possible. In particular, the situation may be more difficult when there is no outlier in the historical data but outliers occur during the monitoring period. In such cases, the analysis is likely to be performed through a non-robust method, and it is thus quite possible that outliers occurring in the arriving data mislead the sequential analysis. In this regard, there is a clear need to develop a robust sequential test procedure, but little effort has been made so far.
In the literature, robust parameter change tests have been dealt with by several authors. Tsay (1988) investigated a procedure for detecting outliers, level shifts, and variance changes in a univariate time series, and Lee and Na (2005) and Kang and Song (2015) introduced estimator-based CUSUM tests using robust estimators. Recently, Fearnhead and Rigaill (2019) proposed a robust penalized cost function for detecting changes in the location parameter. Song and Kang (2021) introduced robust tests based on the divergence introduced below, and Song (2020) proposed a trimmed residual based robust test. These works, however, were conducted in the retrospective setting.
The aim of this study is to propose a robust sequential test procedure for parameter change. For this, we use the density power (DP) divergence to construct a detector that gives an alarm signal indicating a parameter change in the monitoring period. Since Basu et al. (1998) introduced the DP divergence (DPD), the divergence has been successfully used in developing robust estimators. The main property of the DP divergence is that it provides a smooth bridge between the Kullback-Leibler (KL) divergence and the L2 distance. Hence, the estimator induced from the DP divergence, the so-called minimum DPD estimator (MDPDE), enjoys both efficiency and robustness. For more details, see Basu et al. (1998) and Fujisawa and Eguchi (2008). DPD based tests have also been introduced by several authors. Basu et al. (2013, 2016) used the objective function of the MDPDE to propose a Wald-type test, and Song and Kang (2021) introduced a DPD based score type test for parameter change, which has recently been extended to integer-valued models and dynamic factor models; see Kang and Song (2020), Kim and Lee (2020), and Kim et al. (2021). Like the MDPDE, these DPD based tests were also found to inherit the robustness and efficiency of the DP divergence, thereby motivating us to consider a DPD based sequential test.
This paper is organized as follows. In Section 2, we develop a DPD-based sequential test for parameter change in i.i.d. sequences and investigate its asymptotic behaviours. In Section 3, we extend our method to general time series models and provide an application to GARCH models. We examine our test procedure numerically through Monte Carlo simulations in Section 4. Section 5 illustrates two real data applications and Section 6 concludes the paper. The technical proofs are provided in Appendix.

DP divergence based sequential change point test
We first review the MDPDE introduced by Basu et al. (1998) and provide some conditions for the strong consistency of the estimator. Then, we propose a detector and a stopping time for the sequential test procedure based on the DP divergence.
For two density functions f and g, Basu et al. (1998) defined the DP divergence d_α(f, g) as follows:

d_α(f, g) = ∫ { f^{1+α}(x) − (1 + 1/α) f^α(x) g(x) + (1/α) g^{1+α}(x) } dx  for α > 0,
d_0(f, g) = ∫ g(x) log ( g(x) / f(x) ) dx.

Here, note that the divergence becomes the KL divergence and the L2 distance when α = 0 and α = 1, respectively. Since d_α(f, g) converges to d_0(f, g) as α → 0, the DP divergence with 0 < α < 1 provides a smooth bridge between the KL divergence and the L2 distance. Let X_1, · · · , X_n be a random sample from an unknown density g and consider a family of parametric densities {f_θ | θ ∈ Θ}. Then, the MDPDE with respect to the parametric family {f_θ} is defined as the minimizer of the empirical version of the divergence d_α(f_θ, g). That is,

θ̂_{α,n} = argmin_{θ∈Θ} H_{α,n}(θ),   (1)

where

H_{α,n}(θ) = (1/n) Σ_{t=1}^{n} l_α(X_t; θ),
l_α(X_t; θ) = ∫ f_θ^{1+α}(x) dx − (1 + 1/α) f_θ^α(X_t)  for α > 0,  and  l_0(X_t; θ) = − log f_θ(X_t).

As was well demonstrated in Basu et al. (1998), the efficiency of the estimator gets closer to that of the MLE as α decreases to 0, and its robustness strengthens as α increases. That is, the tuning parameter α controls the trade-off between efficiency and robustness. In particular, the MDPDE with α close to 0 was found to tend to enjoy both high efficiency and robustness. Another virtue of the MDPD estimation procedure is that it can be conveniently applied to other parametric models, resulting in robust and efficient estimators in various models. See, for example, Ghosh and Basu (2013), Dierckx et al. (2013), and Song (2017). Throughout the paper, a major focus is on the change point problem in the parametric framework. Hence, we assume that g belongs to {f_θ}, that is, g = f_{θ_0} for some θ_0 ∈ Θ. To establish the limiting behaviors of the stopping time below, we introduce some assumptions. In particular, the following three assumptions are made to ensure the strong consistency of θ̂_{α,n}.
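To fix ideas, the estimation step can be sketched in a few lines. The snippet below is our illustration, not the paper's code: it computes the MDPDE for a univariate normal family N(μ, σ²) by minimizing the empirical objective H_{α,n}(θ) = (1/n) Σ_t { ∫ f_θ^{1+α}(x) dx − (1 + 1/α) f_θ^α(X_t) } over a crude grid; the grid ranges and the contamination scheme are illustrative choices.

```python
import math
import random

def l_alpha(x, mu, sigma, alpha):
    """DPD objective term: integral of f^(1+alpha) minus (1 + 1/alpha) * f^alpha(x)."""
    int_term = (2 * math.pi) ** (-alpha / 2) * sigma ** (-alpha) / math.sqrt(1 + alpha)
    dens_pow = ((2 * math.pi) ** (-alpha / 2) * sigma ** (-alpha)
                * math.exp(-alpha * (x - mu) ** 2 / (2 * sigma ** 2)))
    return int_term - (1 + 1 / alpha) * dens_pow

def mdpde_normal(xs, alpha, mu_grid, sigma_grid):
    """Crude grid-search minimizer of H_{alpha,n}: a sketch, not an efficient optimizer."""
    best = None
    for mu in mu_grid:
        for sigma in sigma_grid:
            h = sum(l_alpha(x, mu, sigma, alpha) for x in xs) / len(xs)
            if best is None or h < best[0]:
                best = (h, mu, sigma)
    return best[1], best[2]

random.seed(1)
n = 300
xs = [random.gauss(0.0, 1.0) for _ in range(n)]
for i in range(0, n, 25):  # contaminate ~4% of the sample with gross outliers
    xs[i] = 10.0

mu_grid = [i / 20 for i in range(-20, 21)]          # -1.0, -0.95, ..., 1.0
sigma_grid = [0.5 + i / 10 for i in range(0, 26)]   # 0.5, 0.6, ..., 3.0
mu_hat, sigma_hat = mdpde_normal(xs, 0.3, mu_grid, sigma_grid)
```

Even with gross outliers at 10, the MDPDE with α = 0.3 stays close to (μ, σ) = (0, 1), whereas the sample mean and standard deviation would be pulled toward the outliers.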
A1. The parameter space Θ is a compact subset in R d .

A2. The density f_θ and the integral ∫ f_θ^{1+α}(x) dx are continuous in θ.
A3. There exists a function B(x) such that |l_α(x; θ)| ≤ B(x) for all x and θ, and E[B(X_t)] < ∞.
In the following subsection, we will introduce a detector and a stopping time for monitoring parameter changes using the MDPDE above and its objective function, and then investigate their asymptotic behaviors. Hereafter, ‖·‖ denotes any convenient vector norm and will also be used to denote the induced matrix norm without causing confusion; that is, for a square matrix A, ‖A‖ = sup_{‖x‖=1} ‖Ax‖. For notational convenience, we use ∂_θ and ∂²_{θθ} to denote ∂/∂θ and ∂²/∂θ∂θ′, respectively.

DPD based sequential change point test under the null hypothesis
Let {X_1, · · · , X_n} be i.i.d. observations from a density f_{θ_0} and suppose that we sequentially observe X_{n+1}, X_{n+2}, · · · . Here, the historical data {X_1, · · · , X_n} are assumed not to undergo a parameter change. At a monitoring instant t = n + k, we wish to test the following hypotheses, particularly in the presence of outliers:

H_0 : θ_0 does not change over t ≤ n + k   vs.   H_1 : θ_0 changes at some time t ≤ n + k.
A sequential procedure for testing the hypotheses above consists of a detecting statistic, called a detector, and a boundary function. When the detector crosses a pre-given boundary function, the sequential procedure gives an alarm signal indicating a parameter change. In the literature, mainly two types of detectors have been proposed: fluctuation based detectors and CUSUM based detectors. Detectors of the first type compare the estimates obtained from the historical data and the arrived data to calculate the fluctuations of the estimates. This approach was introduced by Chu et al. (1996) and has been generalized, for example, by Leisch et al. (2000) and Bardet and Kengne (2014). The second type, considered by Chu et al. (1996) and Aue and Kokoszka (2006), usually uses the CUSUM values of residuals to measure the stability of the fitted models. A score function based detector was also proposed by Berkes et al. (2004), which indeed can be considered a CUSUM type detector. For more details on sequential procedures, we refer the readers to the seminal paper of Chu et al. (1996).
In this study, we use the objective function of the MDPDE to propose a CUSUM type detector as follows: for each α ≥ 0,

D_{α,n}(k) = (√n / (n + k)) ‖ Î_{α,n}^{−1/2} Σ_{t=n+1}^{n+k} ∂_θ l_α(X_t; θ̂_{α,n}) ‖,   (2)

where θ̂_{α,n} and Î_{α,n} are the MDPDE in (1) and a consistent estimator of I_α defined in assumption A6 below, respectively, and both are obtained from the historical data {X_1, · · · , X_n}. Now, let b(·) be a boundary function. Then, the sequential procedure is stopped as soon as D_{α,n}(k) crosses b(k/n), rejecting H_0. In other words, the stopping time of the procedure is defined as the first hitting time of the detector and may then be expressed as

k_{α,n} := min { k ≥ 1 : D_{α,n}(k) ≥ b(k/n) },   (3)

where min ∅ := ∞. If k_{α,n} < ∞, H_0 is rejected and we conclude that θ_0 changed at some time t ≤ n + k_{α,n}.
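To illustrate how a detector and stopping time of this kind interact, here is a toy monitoring loop, a sketch based on one plausible reading of the truncated displays: it monitors the mean of i.i.d. N(μ, 1) data with known variance, where the DPD score reduces to ∂_μ l_α(x; μ) = −(1 + α) f_μ^α(x)(x − μ). The normalization √n/(n + k), the constant boundary b = 2.5, and the use of the sample mean in place of the MDPDE are illustrative simplifications, not the paper's exact specification.

```python
import math
import random

ALPHA = 0.2

def dpd_score(x, mu):
    """d/dmu of l_alpha for the N(mu, 1) family: -(1 + alpha) * f^alpha(x) * (x - mu)."""
    f_pow = (2 * math.pi) ** (-ALPHA / 2) * math.exp(-ALPHA * (x - mu) ** 2 / 2)
    return -(1 + ALPHA) * f_pow * (x - mu)

random.seed(7)
n = 300
hist = [random.gauss(0.0, 1.0) for _ in range(n)]

mu_hat = sum(hist) / n  # stand-in for the MDPDE (clean historical data)
i_hat = sum(dpd_score(x, mu_hat) ** 2 for x in hist) / n  # empirical I_alpha

# monitoring stream: the mean changes from 0 to 2 at k* = 100
k_star, horizon, b = 100, 600, 2.5
stream = [random.gauss(0.0 if k <= k_star else 2.0, 1.0) for k in range(1, horizon + 1)]

stop, cusum = None, 0.0
for k, x in enumerate(stream, start=1):
    cusum += dpd_score(x, mu_hat)
    detector = (math.sqrt(n) / (n + k)) * abs(cusum) / math.sqrt(i_hat)
    if detector >= b:  # first hitting time of the boundary
        stop = k
        break
```

With this configuration the alarm is raised shortly after k* = 100; outliers inserted into the stream would barely move the detector, since the factor f_μ^α(x) shrinks their scores.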
Remark 2. Since ∂_θ H_{α,n}(θ̂_{α,n}) = 0, the proposed detector can be expressed as

D_{α,n}(k) = (√n / (n + k)) ‖ Î_{α,n}^{−1/2} Σ_{t=1}^{n+k} ∂_θ l_α(X_t; θ̂_{α,n}) ‖.

Here, we note that D_{α,n} with α = 0 becomes the score based detector. This type of detector was introduced by Berkes et al. (2004). The score function is deduced from the KL divergence, so the detector D_{α,n} with α > 0 can be thought of as a DP divergence version of the score based detector. Thus, the proposed procedure is expected to be able to adjust robustness and efficiency by controlling the tuning parameter α, as does the MDPDE.
Remark 3. The robustness of the detector with α > 0 is obtained via the following two steps. First, as is well recognized in previous studies, the MDPDE θ̂_{α,n} with α > 0 estimates the true parameter robustly in the presence of outliers. Obviously, this must be a basic requirement in constructing a robust statistic. It is, however, not sufficient to make the sequential procedure robust against outliers. Note that the existing detectors are calculated from the observations X_1, · · · , X_{n+k} and the estimate θ̂_n. This means that the value of the detector can be distorted by outlying observations even if the true parameter is properly estimated. Thus, additional measures are needed to lessen the impact of outliers on the detectors. Our second step for the robustness comes from the term ∂_θ H_{α,k}(θ) := (1/k) Σ_{t=n+1}^{n+k} ∂_θ l_α(X_t; θ) in the proposed detector. This term down-weights outlying observations, thereby reducing the influence of outliers on our detector. This is actually similar to how the MDPDE attains its robustness (cf. Basu et al. (1998)). To see this, observe that

∂_θ H_{α,k}(θ) = (1 + α) [ ∫ f_θ^{1+α}(z) U_θ(z) dz − (1/k) Σ_{t=n+1}^{n+k} f_θ^α(X_t) U_θ(X_t) ],

where U_θ(x) = ∂_θ log f_θ(x). By comparison with the case α = 0, one can see that ∂_θ H_{α,k}(θ) with α > 0 assigns the density power weight f_θ^α(X_t) to each score U_θ(X_t), whereas ∂_θ H_{0,k}(θ) gives equal weight. In sum, the robustness of the sequential procedure based on D_{α,n}(k) with α > 0 is achieved via the robust estimator θ̂_{α,n} and the term ∂_θ H_{α,k}(θ) that down-weights outliers.
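The down-weighting described above is easy to see numerically: for the standard normal model, the density power weight f^α(x) attached to the score U_θ(x) decays rapidly in the tails. The values below are for α = 0.2; with α = 0 every observation would receive equal weight.

```python
import math

def normal_density(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

alpha = 0.2
w_typical = normal_density(0.0) ** alpha   # weight at a typical observation
w_mild = normal_density(3.0) ** alpha      # weight at a mild outlier
w_gross = normal_density(10.0) ** alpha    # weight at a gross outlier

ratio = w_gross / w_typical  # relative weight retained by a gross outlier
```

A gross outlier at x = 10 retains less than a ten-thousandth of the weight of a typical observation, which is the mechanism behind the stable sizes under contamination reported in Section 4.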
In order to perform the sequential procedure with the stopping time k_{α,n}, we need to derive the following limiting probability:

lim_{n→∞} P( k_{α,n} < ∞ | H_0 ),

which indeed represents the type I error of the test procedure. This limiting value is expressed as a function of the boundary function, so the boundary function may be chosen such that the above value equals a given significance level. For example, as in Remark 4 below, when a constant boundary function, i.e., b(·) = b, is employed, one can calculate the critical value b by solving the resulting equation.
To establish the limiting behavior of k α,n , particularly in the case of α > 0, we impose further assumptions.
A4. The integral ∫ f_θ^{1+α}(z) dz is twice differentiable with respect to θ, and the derivative can be taken under the integral sign.
A5. The true parameter θ 0 is in the interior of Θ.
A7. The matrices J_α and I_α defined by

J_α = E[ ∂²_{θθ} l_α(X_t; θ_0) ]  and  I_α = E[ ∂_θ l_α(X_t; θ_0) ∂_θ l_α(X_t; θ_0)^⊤ ]

exist and are positive definite.
We also assume that the boundary function satisfies the following condition:

B. The boundary function b(·) is continuous on (0, ∞) and inf_{t>0} b(t) > 0.
The following theorem is the first result in this study.
Theorem 2. Suppose that assumptions A1-A7 and B hold. Then, we have that for each α ≥ 0,

lim_{n→∞} P( k_{α,n} < ∞ | H_0 ) = P( ‖W_d(u)‖ ≥ b( u/(1−u) ) for some 0 < u < 1 ),

where {W_d(u) | u ≥ 0} denotes a d-dimensional standard Wiener process.

Remark 4. If the maximum norm and a constant boundary function, b(·) = b, are considered, then we have

lim_{n→∞} P( k_{α,n} < ∞ | H_0 ) = 1 − [ P( sup_{0<u<1} |W_1(u)| ≤ b ) ]^d = 1 − [ (4/π) Σ_{j=0}^{∞} ( (−1)^j / (2j+1) ) exp( − π²(2j+1)² / (8b²) ) ]^d.

Since the summands in the infinite series decrease exponentially with increasing j, the critical value b can be accurately computed by a numerical method. The critical values corresponding to the significance levels of 1%, 5%, and 10% are given in Table 1.
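Assuming the limit in Remark 4 takes the form 1 − [P(sup_{0<u<1} |W(u)| ≤ b)]^d, with the classical series expansion for the maximum of |W| on the unit interval (this form is our reading of the truncated display), the critical value can be recovered by bisection; for d = 3 parameters at the 5% level this reproduces a value close to the 2.632 quoted later for the GARCH(1,1) model.

```python
import math

def p_sup_abs_bm_le(b, terms=50):
    """P(sup_{0<=u<=1} |W(u)| <= b) for a standard Brownian motion (classical series)."""
    s = 0.0
    for j in range(terms):
        s += ((-1) ** j / (2 * j + 1)) * math.exp(
            -math.pi ** 2 * (2 * j + 1) ** 2 / (8 * b ** 2))
    return 4.0 / math.pi * s

def critical_value(d, level, lo=0.1, hi=10.0):
    """Solve 1 - P(sup |W| <= b)^d = level for the constant boundary b by bisection."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if 1.0 - p_sup_abs_bm_le(mid) ** d > level:
            lo = mid  # boundary too low: type I error still above the target level
        else:
            hi = mid
    return 0.5 * (lo + hi)

b_5pct = critical_value(d=3, level=0.05)
```

The series converges very fast because the exponents grow like (2j+1)², so a few dozen terms are more than enough.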

DPD based sequential change test under the alternative hypothesis
Let {X_{0,t} | t ∈ N} and {X_{1,t} | t ∈ N} be sequences of i.i.d. random variables from f_{θ_0} and f_{θ_1}, respectively, where θ_0 ≠ θ_1. Denote the observations up to the monitoring time by {X_1, · · · , X_{n+k}} and consider the following alternative hypothesis: for some fixed k* > 0,

H_1 : X_t = X_{0,t} for t = 1, · · · , n + k*  and  X_t = X_{1,t} for t = n + k* + 1, · · · , n + k.
The alternative hypothesis represents the situation where the historical data {X_1, · · · , X_n} follow the density f_{θ_0} and the parameter changes from θ_0 to θ_1 at t = n + k* + 1. To establish the limiting behavior of k_{α,n} under H_1, we assume the following:

A8. For some closed neighborhood N(θ_0) of θ_0, E[ sup_{θ∈N(θ_0)} ‖ ∂_θ l_α(X_{1,t}; θ) ‖ ] < ∞.

Theorem 3. Suppose that assumptions A1-A8 hold and sup_{t>0} b(t) < ∞. Then, we have that for each α ≥ 0, lim_{n→∞} P( k_{α,n} < ∞ | H_1 ) = 1.
Remark 5. Selection of an optimal α can be an issue in actual practice. In the sequential test procedure, parameters are assumed not to change in the historical data, so one could consider using existing selection criteria such as those of Warwick (2005), Fujisawa and Eguchi (2006), and Durio and Isaia (2011) to determine an optimal α. However, this may not be appropriate because the degree of contamination in the monitoring period can differ from that in the historical data. As an illustrative example, consider a situation where outliers are not included in the historical data but the arriving data are contaminated by outliers. In this case, α = 0 is usually chosen as the optimal α, but the sequential procedure implemented with α = 0 is likely to be distorted by the outliers occurring in the monitoring period. In short, determining α in advance may or may not be helpful depending on the situation, so it seems difficult to establish a systematic selection rule. Nevertheless, we recommend that practitioners use an α in [0.2, 0.3] based on our simulation results. This is because too large an α such as α ≥ 0.5 can lead to a significant loss in power, and the procedure with α ≤ 0.1 can be influenced to some degree by outliers. The procedure with an α in [0.2, 0.3] shows strong enough robustness while performing as efficiently as the score based procedure (see Section 4).
As aforementioned in Section 1, the MDPDE can be conveniently applied to various parametric models including time series models and multivariate models. Once such an MDPDE is set up, our sequential test procedure can be extended to the corresponding models. In the following section, we develop a sequential change point test for time series models based on the DP divergence. As an application, the sequential test procedure for GARCH models is provided. All the remarks mentioned in this section still hold for the extended cases.

DP divergence based sequential change point test in time series models
We first briefly introduce the MDPD estimation procedure in time series models. However, since our focus is not on estimation but on the sequential testing procedure, the asymptotic properties of the MDPDE for time series models are not dealt with in detail. Indeed, the MDPD estimation procedure has already been applied to various time series models, for example, GARCH models, multivariate Gaussian time series models, and time series models of counts, and its strong consistency and asymptotic normality are well established. See, for example, Lee and Song (2009), Kim and Lee (2011), and Kang and Lee (2014). Hence, in this section, we assume that the strong consistency and √n-consistency hold, and aim to provide sufficient conditions for deriving the asymptotic behaviors of our stopping time below. For simplicity of presentation, we consider a univariate time series model.

MDPDE for time series models
Assume that a time series {X_t | t ∈ Z} is strictly stationary and ergodic, and that its conditional mean and variance are specified by a d-dimensional parameter vector θ ∈ Θ. Then, the time series model that generates {X_t} may be represented by

X_t = μ_t(θ) + σ_t(θ) ε_t,   (4)

where μ_t(θ) and σ_t(θ) are the conditional mean and standard deviation of X_t given F_{t−1}, the σ-field generated by the past observations, and {ε_t | t ∈ Z} is a sequence of i.i.d. random variables from a density f with zero mean and unit variance. Various time series models such as ARMA models, GARCH-type models, and ARMA-GARCH models are included in the model above. It is noteworthy that, due to Theorem 20.1 in Billingsley (1995), each μ_t(θ) and σ_t(θ) can be expressed as a measurable function of {X_{t−1}, X_{t−2}, · · · } and θ. We assume that the parameter space Θ is a compact subset of R^d and the true parameter θ_0 is in the interior of Θ. To apply the MDPD estimation procedure, we consider the situation where the error distribution f is specified. Hereafter, we denote the process from the above model with θ by {X_t^θ | t ∈ Z}, and X_t^{θ_0} is simply denoted by X_t except when θ_0 changes in the observations. Let f_θ(x | F_{t−1}) be the conditional density of X_t^θ given F_{t−1}. Then, the DP divergence between the two conditional densities f_{θ_0}(x | F_{t−1}) and f_θ(x | F_{t−1}) is defined analogously to d_α(f, g) above. Given the observations X_1, · · · , X_n, the MDPDE for the model (4) is obtained in the same manner as in (1), with

l_α(X_t; θ) = ∫ f_θ^{1+α}(x | F_{t−1}) dx − (1 + 1/α) f_θ^α(X_t | F_{t−1})  for α > 0,  and  l_0(X_t; θ) = − log f_θ(X_t | F_{t−1}).

We note that, due to (4), the conditional density is given by

f_θ(x | F_{t−1}) = (1 / σ_t(θ)) f( (x − μ_t(θ)) / σ_t(θ) ).

Without confusion, we share the same notations l_α(X_t; θ), H_{α,n}(θ), I_α, and J_α as in Section 2. Throughout this section, l_α(x; θ) is assumed to be twice continuously differentiable with respect to θ. We impose the following further assumptions on {l_α(X_t; θ) | t ∈ Z}: for each α ≥ 0,

M1. {l_α(X_t; θ) | t ∈ Z} is strictly stationary and ergodic for each θ ∈ Θ.
When estimating the model (4), μ_t(θ) and σ_t(θ) are usually not explicitly obtained because of the initial value issue. In this case, they are generally approximated by using a recursion with some chosen initial values. For GARCH models and ARMA-GARCH models, one can find the approximated processes, for example, in Berkes and Kokoszka (2003) and Francq and Zakoïan (2004).
Hereafter, we denote the approximated processes by {μ̃_t(θ) | t = 1, · · · , n} and {σ̃²_t(θ) | t = 1, · · · , n}. Then, the MDPDE actually used is given by

θ̂_{α,n} = argmin_{θ∈Θ} H̃_{α,n}(θ),   (5)

where H̃_{α,n}(θ) = (1/n) Σ_{t=1}^{n} l̃_α(X_t; θ) and l̃_α(X_t; θ) is l_α(X_t; θ) with μ_t(θ) and σ_t(θ) replaced by μ̃_t(θ) and σ̃_t(θ), respectively.

Remark 7. In the case that the support of f is R, we can see that

∫ f_θ^{1+α}(x | F_{t−1}) dx = σ_t^{−α}(θ) ∫ f^{1+α}(x) dx.

Since ∫ f^{1+α}(x) dx does not include any parameters, for each α, one can calculate the integral in advance analytically or numerically. For example, if ε_t follows N(0, 1), the integral is equal to (2π)^{−α/2}(1 + α)^{−1/2}. Hence, by replacing ∫ f^{1+α}(x) dx with the pre-obtained value, the computational burden in optimizing the above objective function can be significantly reduced.
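The pre-computable constant in Remark 7 can be checked numerically. For standard normal errors, ∫ f^{1+α}(x) dx = (2π)^{−α/2}(1 + α)^{−1/2} (the (2π)^{−α/2} factor is constant in θ and is often absorbed into the objective); a crude midpoint Riemann sum confirms this:

```python
import math

def normal_density(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def integral_f_power(alpha, lo=-10.0, hi=10.0, step=1e-3):
    """Midpoint Riemann sum for the integral of f^(1+alpha), f = N(0,1) density."""
    n = int((hi - lo) / step)
    return sum(normal_density(lo + (i + 0.5) * step) ** (1 + alpha)
               for i in range(n)) * step

alpha = 0.3
numeric = integral_f_power(alpha)
closed_form = (2 * math.pi) ** (-alpha / 2) / math.sqrt(1 + alpha)
```

Tabulating this constant once per α avoids re-evaluating an integral inside every call of the objective function during optimization.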
As mentioned above, the strong consistency and √n-consistency of θ̂_{α,n} are assumed to hold:

M3. θ̂_{α,n} converges almost surely to θ_0 and √n(θ̂_{α,n} − θ_0) = O_P(1).

One can see that E[l_α(X_t; θ)] attains its minimum at the true parameter θ_0. Therefore, since {l_α(X_t; θ) | t ≥ 1} is strictly stationary and ergodic by assumption M1, the strong consistency of θ̂_{α,n} can be shown by standard arguments, for example, if the following two conditions are satisfied: E[ sup_{θ∈Θ} |l_α(X_t; θ)| ] < ∞ and E[l_α(X_t; θ)] has a unique minimizer θ_0. When fitting a model for which the MDPDE has not yet been established, one can verify the strong consistency by checking that the two conditions above hold for the model under consideration. Although we assume the √n-consistency of θ̂_{α,n} in assumption M3, this can indeed be derived under the conditions S1-S4 introduced in the following subsection. For more details, see Lemma 11 in Appendix.

DPD based sequential test in time series models
Suppose that {X_1, · · · , X_n} is a historical data set from (4) with θ_0 and that X_{n+1}, X_{n+2}, · · · are observed sequentially. At monitoring time n + k, we wish to test the following hypotheses:

H_0 : θ_0 does not change over t ≤ n + k   vs.   H_1 : θ_0 changes at some time t ≤ n + k.

Let Î_{α,n} be a consistent estimator of I_α obtained from the historical data. By assumption M2, Î_{α,n} is invertible for sufficiently large n. Hence, similarly to (2) and (3), we can define a DPD based detector and stopping time as follows:

D̃_{α,n}(k) = (√n / (n + k)) ‖ Î_{α,n}^{−1/2} Σ_{t=n+1}^{n+k} ∂_θ l̃_α(X_t; θ̂_{α,n}) ‖,   (7)

k̃_{α,n} := min { k ≥ 1 : D̃_{α,n}(k) ≥ b(k/n) },   (8)

where θ̂_{α,n} is the MDPDE defined in (5), also obtained from the historical data. The boundary function b(·) is assumed to satisfy assumption B in Section 2. We now present sufficient conditions under which the result in Theorem 2 also holds for the stopping time k̃_{α,n}.
Condition S1 is required to apply the functional central limit theorem (FCLT) and the Hájek-Rényi-Chow inequality in Lemma 10 below. This condition is commonly satisfied in stationary time series models. In particular, if the derivative can be taken under the integral sign in l_α(X_t; θ), the condition is deduced from the fact that E[ ∂_θ l_α(X_t; θ_0) | F_{t−1} ] = 0. Condition S4, together with the continuity of ∂²_{θθ} l_α(x; θ), is used to verify Lemma 7, which is very helpful for proving Lemma 8. As will be seen in Appendix, Lemmas 8-10 are the key lemmas for deriving the asymptotic behavior of k̃_{α,n}. In most time series models including GARCH-type terms, conditions similar to S2 and S3, usually in the case of α = 0, are often established and used in deriving asymptotic properties of estimators such as the QMLE.
Theorem 4. Suppose that assumptions M1-M3, B, and conditions S1-S4 hold. Then, we have that for each α ≥ 0,

lim_{n→∞} P( k̃_{α,n} < ∞ | H_0 ) = P( ‖W_d(u)‖ ≥ b( u/(1−u) ) for some 0 < u < 1 ).

Next, we establish the asymptotic property of k̃_{α,n} under the alternative hypothesis below. To be more specific, let {X_t^{θ_0} | t ∈ Z} and {X_t^{θ_1} | t ∈ Z} be the strictly stationary and ergodic processes from the model (4) with the parameters θ_0 and θ_1 (≠ θ_0), respectively. Then, the alternative hypothesis under consideration is expressed as follows: for some fixed k* > 0,

H_1 : X_t = X_t^{θ_0} for t ≤ n + k*  and  X_t = X_t^{θ_1} for t ≥ n + k* + 1.

To establish the consistency of the test procedure, we assume that assumption M1 holds for {l_α(X_t^{θ_1}; θ) | t ∈ Z}; in this case, condition S7 is obviously fulfilled provided that the differentiation and the expectation are interchangeable.

Theorem 5. Suppose that the assumptions in Theorem 4 still hold for {X_t^{θ_0} | t ≥ 1} and that assumption M1 holds for {l_α(X_t^{θ_1}; θ) | t ∈ Z}. If conditions S5-S7 are satisfied and sup_{t>0} b(t) < ∞, then we have that for each α ≥ 0, lim_{n→∞} P( k̃_{α,n} < ∞ | H_1 ) = 1.
Remark 8. Let {k_n} be an increasing sequence of positive integers satisfying k_n/√n → ∞. In the proof of Theorem 5, we show that lim_{n→∞} P( k̃_{α,n} ≤ k* + k_n | H_1 ) = 1, so the delay time of the procedure is asymptotically bounded by any such sequence {k_n}. For more theoretical results on the delay time in sequential procedures, we refer the reader to Aue and Horváth (2004) and Aue et al. (2009).

Application of DPD based sequential test to GARCH models
Consider the following GARCH(p, q) model:

X_t = σ_t(θ) ε_t,   σ²_t(θ) = ω + Σ_{i=1}^{p} α_i X²_{t−i} + Σ_{j=1}^{q} β_j σ²_{t−j}(θ),   (9)

where θ = (ω, α_1, · · · , α_p, β_1, · · · , β_q) and {ε_t | t ∈ Z} is a sequence of i.i.d. random variables with zero mean and unit variance. We assume that the process {X_t | t ∈ Z} from the model (9) with the true parameter θ_0 is strictly stationary and ergodic. It is well known that the GARCH model has a unique strictly stationary and ergodic solution if and only if the top Lyapunov exponent is strictly negative. For more details, see, for example, Francq and Zakoian (2019). We further impose the following assumption to ensure the stationarity and ergodicity of {σ²_t(θ) | t ∈ Z}:

G1. sup_{θ∈Θ} Σ_{j=1}^{q} β_j < 1.
As a robust estimator for the GARCH model, Lee and Song (2009) introduced the following MDPDE:

θ̂_{α,n} = argmin_{θ∈Θ} H̃_{α,n}(θ),   H̃_{α,n}(θ) = (1/n) Σ_{t=1}^{n} l̃_α(X_t; θ),   (10)

where, omitting multiplicative constants not depending on θ,

l̃_α(X_t; θ) = σ̃_t^{−α}(θ) { (1 + α)^{−1/2} − (1 + 1/α) exp( − α X²_t / (2 σ̃²_t(θ)) ) }  for α > 0,
l̃_0(X_t; θ) = log σ̃²_t(θ) + X²_t / σ̃²_t(θ),

and {σ̃²_t(θ)} is computed recursively from (9). Here, the initial values can be arbitrarily chosen.
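A minimal sketch of the Gaussian GARCH(1,1) objective H̃_{α,n}(θ), with the multiplicative constants not depending on θ dropped; the simulator, the initial value of the σ̃²_t recursion, and the comparison parameter below are illustrative choices of ours.

```python
import math
import random

def simulate_garch11(n, omega, a1, b1, seed=0, burn=500):
    """Simulate a GARCH(1,1) path with standard normal innovations."""
    rng = random.Random(seed)
    out, s2, x_prev = [], omega / (1.0 - a1 - b1), 0.0
    for _ in range(n + burn):
        s2 = omega + a1 * x_prev ** 2 + b1 * s2
        x_prev = math.sqrt(s2) * rng.gauss(0.0, 1.0)
        out.append(x_prev)
    return out[burn:]

def h_tilde(xs, theta, alpha):
    """Approximated objective H~_{alpha,n} for a Gaussian GARCH(1,1) model."""
    omega, a1, b1 = theta
    s2 = xs[0] ** 2 + omega  # arbitrary initial value for the recursion
    total, x_prev = 0.0, xs[0]
    for t in range(1, len(xs)):
        s2 = omega + a1 * x_prev ** 2 + b1 * s2   # sigma~_t^2(theta)
        x_prev = xs[t]
        total += s2 ** (-alpha / 2) * (
            1.0 / math.sqrt(1 + alpha)
            - (1 + 1 / alpha) * math.exp(-alpha * xs[t] ** 2 / (2 * s2)))
    return total / (len(xs) - 1)

xs = simulate_garch11(2000, omega=0.2, a1=0.2, b1=0.6, seed=3)
h_true = h_tilde(xs, (0.2, 0.2, 0.6), alpha=0.25)    # at the data-generating parameter
h_far = h_tilde(xs, (0.05, 0.05, 0.05), alpha=0.25)  # at a badly misspecified parameter
```

As expected from the discussion around assumption M3, the objective is smaller at the data-generating parameter than at a distant one, which is what any numerical minimizer of H̃_{α,n} exploits.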
Since {σ²_t(θ) | t ∈ Z} is strictly stationary and ergodic under assumption G1, {l_α(X_t; θ) | t ∈ Z} also becomes a strictly stationary and ergodic process, and thus assumption M1 in Subsection 3.1 is satisfied. The following assumptions are further made to establish the asymptotics of the MDPDE above.
G3. If q > 0, A_{θ_0}(z) and B_{θ_0}(z) have no common root, A_{θ_0}(1) ≠ 0, and α_{0p} + β_{0q} ≠ 0.

The following results are established by Lee and Song (2009); from these results and Lemma 12, we can see that assumptions M2 and M3 hold.
As a consistent estimator of I_α, let us consider

Î_{α,n} = (1/n) Σ_{t=1}^{n} ∂_θ l̃_α(X_t; θ̂_{α,n}) ∂_θ l̃_α(X_t; θ̂_{α,n})^⊤.   (11)

For the consistency of this estimator, see Lemma 6 in Song and Kang (2021). Then, based on (11) and H̃_{α,n}(θ) in (10), we can construct a detector and stopping time for GARCH models as in (7) and (8), respectively. We now check whether the conditions introduced in Subsection 3.2 are fulfilled. Under assumptions G1-G4, conditions S1-S3 were already shown in Lee and Song (2009) in the course of proving the asymptotic normality of the MDPDE. For details, see (i) and (iv) on page 337 of Lee and Song (2009). Condition S4 can be found in Lemma 3 of Song and Kang (2021). Hence, the result in Theorem 4 holds for GARCH models with Gaussian errors.
For the consistency of the test procedure, we assume that assumptions G1-G4 are also fulfilled for θ_1 (≠ θ_0). Then, similarly to the above, assumptions M1-M3 are also satisfied for θ_1. Since condition S7 is a generally accepted assumption when θ_1 ≠ θ_0, we only deal with conditions S5 and S6. For this, we introduce a further moment condition, namely E[(X_t^{θ_1})^{4+ε}] < ∞ for some ε > 0. For θ_1 satisfying this moment condition, Lemma 13 shows that E[ sup_{θ∈Θ̃} ‖ ∂_θ l_α(X_t^{θ_1}; θ) ‖ ] < ∞, where we note that Θ̃ can be taken as any subset of Θ whose elements have components α_i and β_j bounded away from zero. Since θ_0 is in the interior of Θ, one can take a neighborhood N_3(θ_0) satisfying conditions S5 and S6. Hence, the consistency of the test procedure is asserted.
Theorem 6. Let k̃_{α,n} be the stopping time in (8) constructed using (11) and H̃_{α,n}(θ) in (10). Suppose that assumptions G1-G4 hold for θ_0 and θ_1 (≠ θ_0). If the boundary function b(·) satisfies assumption B, then under H_0,

lim_{n→∞} P( k̃_{α,n} < ∞ | H_0 ) = P( ‖W_d(u)‖ ≥ b( u/(1−u) ) for some 0 < u < 1 ),

and if, in addition, sup_{t>0} b(t) < ∞ and E[(X_t^{θ_1})^{4+ε}] < ∞ for some ε > 0, then lim_{n→∞} P( k̃_{α,n} < ∞ | H_1 ) = 1.

Remark 9. The detector D̃_{0,n}(k) in the GARCH models above is essentially equal to the score based detector of Berkes et al. (2004), which is constructed based on the quasi-MLE of Berkes and Kokoszka (2003). Their sequential procedure has recently been extended to ARMA-GARCH models by Song and Kang (2020).

Simulation study
In the present section, we evaluate the finite sample performance of the proposed sequential procedure and compare it with the score based procedure in the following GARCH(1,1) model:

X_t = σ_t ε_t,   σ²_t = ω + α_1 X²_{t−1} + β_1 σ²_{t−1},

where {ε_t} is a sequence of i.i.d. random variables from N(0, 1) and θ = (ω, α_1, β_1). We use the maximum norm and the constant boundary function to implement the test procedures at the significance level of 5%. The corresponding critical value for the GARCH(1,1) model is 2.632, which is given in Table 1. The sample sizes of the historical data under consideration are n = 500, 1000, and 1500, and the detector D̃_{α,n}(k) is monitored until t = n + 2000. If the detector D̃_{α,n}(k) crosses over the critical value during the monitoring period, H_0 is rejected. The empirical sizes and powers are calculated as the rejection ratios out of 2000 repetitions. We consider α values in {0, 0.1, 0.2, 0.3, 0.5}. As mentioned in Remark 9, D̃_{α,n}(k) with α = 0 represents the score based detector. We check the robustness of the proposed procedure using D̃_{α,n}(k) with α > 0 and compare it with the procedure with α = 0.

(i) Simulation results for uncontaminated cases
We first address the cases where the data are not contaminated by outliers. Two parameter settings are considered to evaluate the empirical sizes, where the second one is employed to assess performance in a more volatile situation. The empirical sizes are depicted in Figure 1, where k denotes the monitoring time after n. We first note that some oversizes are observed, particularly when n = 500 and k = 2000, but all the empirical sizes fall below the nominal level 0.05 as n increases. This means that when no outlier exists, both the score based and the proposed procedures perform properly under H_0. Additionally, we can see that the empirical sizes decrease with an increase in α, and slightly higher sizes are observed in the case of the parameter θ_2.
To evaluate the empirical powers, we set θ_0 := (0.2, 0.2, 0.6) as the default parameter and then change it to θ_1, θ_2, and θ_3 = (0.5, 0.2, 0.6) at k* = 250. The first change is a change to a less volatile state and is set to be relatively larger than the second change to θ_2. The third one reflects a level shift of the volatility without any change in α_1 and β_1. The results are presented in Figure 2. One can see that all the procedures yield reasonably good powers approaching one after the change point k* = 250. In particular, the empirical powers of the score based procedure approach one more rapidly. As expected, the power of the proposed procedure generally decreases with an increase in α. It should be noted that although the powers of the procedure with a large α such as α = 0.5 get closer to one as k increases, the powers evaluated shortly after the change point are quite low compared to the procedure with α = 0 or α = 0.1; see, for example, the values at k = 500 in the middle and bottom panels of Figure 2. This indicates that too large an α can decrease the efficiency of the proposed procedure under H_1. When the change is comparatively large, all the procedures perform similarly; see the top panel. These findings can also be seen in the box plots of the delay times, i.e., k̃_{α,n} − k*, displayed in Figure 3, where we observe the general tendency that the smaller the α, the shorter the detection delay tends to be. Table 2 presents the average delay times and their ratios to that of the score based procedure. The delay time tends to be longer as the sample size n of the historical data increases. The procedure with α in [0.1, 0.3] is observed to take about 1.1 to 1.5 times longer than the score based one in the second and the third changes.
Overall, our proposed procedure performs reasonably well in the case of no outlying observations. The score based procedure generally outperforms the proposed procedure, and the performance of our procedure decreases as α increases. It should also be noted that a large tuning parameter α can lead to some loss of efficiency; thus, based on the above results, we recommend that practitioners use α less than 0.5 when the data appear not to be contaminated or the degree of contamination does not seem to be severe.

Table 2: Average delay times of the score based procedure (α = 0) and the proposed procedure (α > 0). The figures in brackets are the ratios of the average delay time to that of the score based procedure.
(ii) Simulation results for contaminated cases

Next, we explore the performance in contaminated cases. For this, we consider the following three scenarios: (i) outliers are observed in the historical data (H-type); (ii) outliers occur during the monitoring period (M-type); (iii) outliers appear in both the historical data and the monitoring period (HM-type). In case (i), we generate the historical data {X_1, ..., X_n} according to the scheme X_t = X_{t,o} + s · p_t · sign(X_{t,o}), where {X_{t,o}} is the uncontaminated path from the GARCH(1,1) model above, s = 5√(ω/(1 − α_1 − β_1)), i.e., 5 times the standard deviation of the GARCH process, and {p_t} is a sequence of i.i.d. Bernoulli random variables with parameter p. In this simulation, p = 0.03 is considered. {X_{t,o}} and {p_t} are assumed to be independent. For case (ii), we use the same method to contaminate {X_{n+1}, ..., X_{n+200}}, which describes the situation where outliers occur before the change point k* = 250. In case (iii), we contaminate both the historical data and {X_{n+1}, ..., X_{n+200}} using the aforementioned method.
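The contamination scheme above is easy to implement; the following sketch applies it to a white-noise stand-in path for brevity (the function name and seed are illustrative, and in the study proper the input would be a simulated GARCH(1,1) path).

```python
import numpy as np

def contaminate(x_o, omega, alpha1, beta1, p=0.03, mult=5.0, rng=None):
    """Contamination scheme: X_t = X_{t,o} + s * p_t * sign(X_{t,o}), where
    s = mult * sqrt(omega / (1 - alpha1 - beta1)) is mult times the stationary
    standard deviation of the GARCH(1,1) process, and p_t ~ iid Bernoulli(p)
    is drawn independently of the path."""
    rng = np.random.default_rng(rng)
    s = mult * np.sqrt(omega / (1.0 - alpha1 - beta1))
    p_t = rng.binomial(1, p, size=len(x_o))
    return x_o + s * p_t * np.sign(x_o)

# illustration on a white-noise stand-in for the uncontaminated path
x_o = np.random.default_rng(0).standard_normal(1000)
x_c = contaminate(x_o, omega=0.2, alpha1=0.2, beta1=0.6, rng=0)
```

Because the shift is applied in the direction of sign(X_{t,o}), every contaminated observation is pushed away from zero, so outliers always inflate the apparent volatility.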
The empirical sizes for the three contaminated cases are given in Figures 4-6. As shown in the figures, the score based procedure exhibits severe size distortions in all three scenarios. In contrast, the proposed procedure, in particular with α ≥ 0.2, yields fairly stable sizes. The procedure with α = 0.1 is observed to be somewhat distorted by outliers except in the M-type case. As can be seen in Figure 5, the score based procedure is distorted near the points where the outliers occur, whereas our procedure with α > 0 is hardly affected. From this, it can be surmised that the proposed procedure is more robust to outliers occurring in the monitoring period. Interestingly, the results for the third scenario shown in Figure 6 are similar to those for the first case in Figure 4. It appears that a procedure already damaged by the outliers in the historical data becomes insensitive to newly occurring outliers.
The empirical powers and the corresponding box plots of the delay times under H-, M-, and HM-type contaminations are presented in Figures 7-12, respectively. From Figures 7 and 11, we observe significant power losses for the score based procedure, whereas the procedure with α ≥ 0.2 shows power curves similar to those in Figure 2, i.e., those obtained in the uncontaminated cases. This indicates that the proposed procedure with α ≥ 0.2 is adequately robust against the outliers. Although some power losses are observed for the smaller α = 0.1 in the first and third scenarios, our procedure shows strong robustness to outliers in the newly arrived data for all α values considered; see Figure 9, where the higher powers of the score based procedure near k = 200 reflect the fact that it is compromised by the outliers in the monitoring period. The box plots in Figures 8 and 12 also consistently show the distortions of the score based procedure and the robustness of our procedure with α ≥ 0.2. The early detections of the score based procedure in Figure 10 are due to the outliers that occur before the change point. This phenomenon is clearly observed particularly when n is small. For example, in Table 3, we can see negative average delay times in the case of M-type outliers and n = 500. Here, d_α in the brackets is defined as follows: d_α := (average delay time of the procedure with α in the contaminated case) / (average delay time of the procedure with α in the uncontaminated case).
A value of d_α closer to one means that the procedure is less affected by the corresponding contamination. It can be seen in the table that d_α with α ≥ 0.2 is much closer to one than d_{α=0}.
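The summary quantities behind Table 3 reduce to simple averages and a ratio; a minimal sketch with made-up stopping times (the sample values below are hypothetical, purely to illustrate the arithmetic):

```python
import numpy as np

def avg_delay(stop_times, k_star):
    """Average delay time: mean of (stopping time - change point k*)."""
    return np.mean(np.asarray(stop_times, dtype=float) - k_star)

def d_alpha(delays_contaminated, delays_clean, k_star):
    """d_alpha = avg delay (contaminated case) / avg delay (uncontaminated
    case); values near one indicate insensitivity to the contamination."""
    return avg_delay(delays_contaminated, k_star) / avg_delay(delays_clean, k_star)

# hypothetical stopping times around a change at k* = 250
clean = [300, 320, 310]    # delays 50, 70, 60 -> average 60
contam = [310, 330, 320]   # delays 60, 80, 70 -> average 70
ratio = d_alpha(contam, clean, 250)  # 70 / 60, about 1.167
```

Note that an early false alarm before k* gives a negative delay, which is exactly how the negative averages for the M-type case in Table 3 arise.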
Overall, our findings strongly support the validity and robustness of the proposed sequential procedure. In particular, our procedure with a small α is sufficiently robust against outliers and as powerful as the score based procedure. For the reasons mentioned in Remark 5, we do not select an optimal α, but based on our simulation results, we recommend considering α values in [0.2, 0.3].

Table 3: Average delay times of the score based procedure (α = 0) and the proposed procedure (α > 0) in the presence of outliers. The figures in brackets are the ratios d_α.

Real data analysis
This section presents two real data applications. The first illustrates that our sequential procedure works as well as the score based procedure when there are no outlying observations. In the second application, data including some deviating observations are considered, and we demonstrate that the score based procedure can yield a misleading result whereas the proposed procedure produces a reliable one. We analyse the log return series of the S&P500 index from Jan 2000 to Dec 2004 and the Hang Seng index from Jan 1988 to Dec 1996. We fit the GARCH(1,1) model to each data set since each series shows typical features such as the ARCH effect and it is also the most commonly used model in empirical practice. Considering the possibility of outliers being present in the historical data, we use a robust test to check whether a parameter change exists in the historical data. In this study, we employ the following robust test for parameter change introduced by Song and Kang (2021): where α ≥ 0 is a tuning parameter controlling the trade-off between efficiency and robustness as in the MDPDE, and Î_α is a consistent estimator of I_α = E[∂_θ l_α(X_t; θ_0) ∂_θ l_α(X_t; θ_0)^T], usually given by (11). Under the null hypothesis of no parameter change, T̂_n^α converges in distribution to sup_{0≤s≤1} ‖W_d(s) − s W_d(1)‖, where {W_d(s)} is the d-dimensional standard Wiener process and d is the number of parameters. When T̂_n^α is large, H_0 is rejected and the change point is located as argmax_{1≤k≤n} T̂_n^α(k). We also note that T̂_n^α with α = 0 becomes the score test for parameter change. As demonstrated in Song and Kang (2021), T̂_n^α with α close to zero can effectively detect changes in parameters when seeming outliers are included in a data set suspected of having parameter changes.
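The exact form of T̂_n^α is given in Song and Kang (2021) and is not reproduced here. Purely to illustrate how a CUSUM-type retrospective statistic and the argmax change-point locator operate, the following generic sketch works on a scalar sequence of score-like contributions; the normalization by the sample standard deviation and the mean-shift toy example are our own illustrative assumptions, not the paper's statistic.

```python
import numpy as np

def cusum_of_scores(scores):
    """Generic CUSUM-type statistic: with partial sums S_k of per-observation
    contributions, compute max_k |S_k - (k/n) S_n| / (sigma * sqrt(n)) and the
    1-based argmax location. The bridge S_k - (k/n) S_n is self-centering, so
    raw observations can be used in this mean-shift illustration."""
    scores = np.asarray(scores, dtype=float)
    n = len(scores)
    S = np.cumsum(scores)
    bridge = np.abs(S - (np.arange(1, n + 1) / n) * S[-1])
    sigma = scores.std(ddof=1)
    stats = bridge / (sigma * np.sqrt(n))
    k_hat = int(np.argmax(stats)) + 1  # located change point
    return stats.max(), k_hat

# toy series with a mean shift of two standard deviations at t = 60
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 1.0, 60), rng.normal(2.0, 1.0, 40)])
stat, k_hat = cusum_of_scores(x)
```

With a change of this magnitude, the located point k_hat typically falls within a few observations of the true change at t = 60, mirroring how the change points c_α are located below.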
As in the simulation study above, we again employ the maximum norm and the constant boundary function to conduct the monitoring procedure. Here, we also consider α values in {0, 0.1, 0.2, 0.3, 0.5}. As mentioned in Remark 5, we do not try to select an optimal α. Instead, we implement the sequential procedure for each α and combine the results to make a decision. The log return series and the plots of the detector D̂_{α,n}(k) for the S&P500 index and the Hang Seng index are presented in Figures 13 and 14, respectively, where the dashed blue vertical line in each left subfigure denotes the monitoring start date and the dashed horizontal line in each right subfigure indicates the critical value, 2.381, corresponding to the significance level of 10%. More detailed analyses are as follows.
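With a constant boundary function, the stopping rule reduces to the first boundary crossing of the detector path. A minimal sketch (the detector values below are made up; 2.381 is the 10% critical value quoted above):

```python
def first_exceedance(detector_path, critical_value):
    """Stopping time of the monitoring procedure: the first monitoring index
    k at which the detector exceeds the constant boundary, or None if the
    boundary is never crossed (no alarm)."""
    for k, d in enumerate(detector_path, start=1):
        if d > critical_value:
            return k
    return None

# toy detector path crossing the 10% critical value at k = 4
path = [0.5, 1.2, 2.0, 2.9, 3.5]
k_stop = first_exceedance(path, 2.381)
```

A return value of None corresponds to the score based outcome in the Hang Seng application below, where the detector never crosses the boundary.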
(i) The S&P500 index from Jan 2000 to Dec 2004

As can be seen in the left part of Figure 13, there seem to be no observations that can be regarded as outliers compared to the log return series in Figure 14. We consider the series from Jan 2000 to Dec 2001 (n = 499) as the historical data and start monitoring at the first trading day of Jan 2002 (k = 1). In order to check parameter constancy in the historical data, we first implement T̂_n^α for each α considered. The results are summarized in the first row of Table 4, where we can see that all the p-values of T̂_n^α are greater than 10%; we thus accept the null hypothesis that there is no parameter change in the historical data.
The results of the sequential procedure, that is, the plots of the detector D̂_{α,n}(k), are presented in the right part of Figure 13. One can see that all the plots show similar shapes, crossing the critical value about two years later. More precisely, each procedure stops at k̂_{α,n} = 546, 540, 539, 539, and 538 for α = 0, 0.1, 0.2, 0.3, and 0.5, respectively, indicating that the parameter changed somewhere in the monitoring period. To locate the change point, for each α, we apply T̂_n^α again to the return series up to each stopping point, that is, n(= 499) + k̂_{α,n}. The change points, say c_α, are obtained to be Aug 30, 2002 (t = 667) and Nov 6, 2002 (t = 714) for α ∈ {0, 0.1, 0.2} and α ∈ {0.3, 0.5}, respectively.

Table 5: T̂_n^α and the estimated change points are obtained using the data up to t = n(= 499) + k̂_{α,n}. The 1st (resp. 2nd) period covers from t = 1 (resp. t = c_α + 1) to t = c_α (resp. t = n + k̂_{α,n}).

The results of the change point analyses and the parameter estimates are summarized in Table 5, where the estimates in the first and second periods are obtained based on the data up to t = c_α and the data from t = c_α + 1 to t = n + k̂_{α,n}, respectively. The estimates in each row are obtained by the MDPDE with the corresponding α. We recall that the MDPDE with α = 0 becomes the MLE; hence the estimates in the first row are the results of the MLE. From the table, we can clearly see that the parameters are estimated differently before and after each change point: ω and α_1 are estimated to be smaller and β_1 to be greater in the 2nd period. It should be noted that DPD based inferences prefer α close to zero when the data are not contaminated or the degree of contamination is small (cf. Basu et al. (1998) and Song and Kang (2021)). Since this example is presumed not to include outlying observations, we rely on the results obtained with α ≤ 0.2. Therefore, we conclude that t = 667 is a proper change point. The change point and the stopping point are also depicted in the left part of Figure 13.

(ii) The Hang Seng index from Jan 1988 to Dec 1996

In this application, we start the monitoring at January 4, 1991 (k = 1), and thus the series from 1988 to 1990 is considered as the historical data (n = 741). As is clearly shown in the log return series in Figure 14, this data set has some apparent outlying observations in the historical data. Since the score test for parameter change, i.e., T̂_n^α with α = 0, could be unduly influenced by such observations (cf. Song and Kang (2021)), we use T̂_n^α with α > 0 to judge whether or not the parameters change over the historical observations. The test results are presented in the second row of Table 4. From the p-values of T̂_n^α for α > 0, we accept the null hypothesis that the historical data do not undergo a parameter change.
The plots of D̂_{α,n}(k) are provided in the right part of Figure 14. Unlike the previous analysis, the proposed and score based procedures yield different results. That is, all the D̂_{α,n}(k) with α > 0 cross the dashed horizontal line, whereas the path of D̂_{α,n}(k) with α = 0 evolves below the line. In particular, it is noteworthy that D̂_{α,n}(k) with α = 0 shows a jump early on, which may be due to the deviating observation that occurred before 1992, while all D̂_{α,n}(k) with α > 0 move stably during that period. This can be taken as an indirect indication that D̂_{α,n}(k) with α > 0 is robust against such outlying observations. Each monitoring stops at k̂_{α,n} = 828, 804, 803, and 809 for α = 0.1, 0.2, 0.3, and 0.5, respectively. Thereafter, to locate a change point that may have occurred before each stopping date, we again conduct the test T̂_n^α using the data up to t = n + k̂_{α,n}. In the case of α = 0, since the score based procedure does not give an alarm signal for a parameter change, T̂_n^{α=0} is applied to the whole series. Table 5 shows the test results and the parameter estimates before and after the change points.

Table 5: In the case of α = 0, T̂_n^α and the estimates are obtained based on the whole series. The 1st (resp. 2nd) period covers from t = 1 (resp. t = c_α + 1) to t = c_α (resp. t = n + k̂_{α,n}).

We first note that T̂_n^α with α = 0 produces a p-value over 10% and thus retains the null hypothesis that the parameters do not change over the whole observation period. In contrast, all the p-values of T̂_n^α with α > 0 are less than 1%, rejecting the null hypothesis, and each T̂_n^α locates a change point before each stopping date. For each α > 0, we can see distinct differences between the estimates in the first and second periods. Here, one may suspect that the differences are due to the outliers observed in the first period. We note, however, that the MDPDE, as a robust estimator, can reduce the impact of outliers on the estimation. This means that the differences in the estimates are highly likely due to a genuine change in the parameters. Hence, we primarily conclude that the parameters changed in the monitoring period and presume that the score based procedure fails to catch the change owing to the outlying observations. Recalling one of our findings in the simulation study, namely that the procedure with α = 0.1 is somewhat affected by outliers, we estimate the change point based on the results obtained with α = 0.2 and 0.3. Therefore, we decide that Apr 4, 1992 (t = 1056) is a suitable change point, which, together with the stopping point, Mar 23, 1994 (t = 1544), is depicted by the red solid and dotted lines, respectively, in the left part of Figure 14.

Concluding remark
This study proposed a robust sequential procedure for monitoring parameter changes. We constructed the DPD based detector and investigated the asymptotic behavior of the induced stopping time under the null and alternative hypotheses. In particular, we provided a set of sufficient conditions for time series models under which the proposed procedure has an asymptotically controlled size and consistency in power. The simulation study showed that the score based procedure is strongly influenced by outliers, whereas our procedure is robust to them. The usefulness of our procedure was also demonstrated in the real data analyses, where the proposed procedure detected a change point that was missed by the score based procedure.
Extensions to other models, including multivariate models, are also of interest. In particular, we expect that our sequential procedure can be naturally applied to integer-valued time series models, which have recently attracted much attention. We leave the extension to multivariate models as a possible topic of future study.

Appendix
In this appendix, we provide the proofs of Theorems 2-5 for the case of α > 0. We begin with two technical lemmas, which are used repeatedly in proving the lemmas and theorems below.
Lemma 1. Let {X_n | n ≥ 1} be a sequence of random variables on a probability space (Ω, F, P) such that X_n converges almost surely to a random variable X. Then, for any fixed integer m, lim_{n→∞} sup_{k≥m} |X_{n+k} − X| = 0 a.s.
Proof. Let S := {ω ∈ Ω | X_n(ω) → X(ω)} and Z_n := sup_{k≥m} |X_{n+k} − X|. Then, for a fixed ω ∈ S and any ε > 0, there exists an integer N such that |X_n(ω) − X(ω)| ≤ ε for all n ≥ N + m. This implies that Z_N(ω) = sup_{k≥m} |X_{N+k}(ω) − X(ω)| ≤ ε. Since {Z_n(ω)} is a decreasing sequence, one can see that Z_n(ω) ≤ ε for all n ≥ N, which asserts the lemma.
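The mechanism of the proof can be checked numerically on a toy sequence. Here X_j = 1/j converges to X = 0, m = 2 is an arbitrary fixed integer, and the supremum is truncated to finitely many terms (harmless, since |X_j| is decreasing so the sup is attained at k = m):

```python
def Z(n, m=2, terms=10_000):
    """Z_n = sup_{k >= m} |X_{n+k} - X| for X_j = 1/j and X = 0; since |X_j|
    decreases, the supremum is attained at k = m, i.e. Z_n = 1/(n + m)."""
    return max(1.0 / (n + k) for k in range(m, m + terms))

# Z_n is decreasing in n and tends to zero, as Lemma 1 asserts
zs = [Z(n) for n in range(1, 50)]
```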
Lemma 2. Let C(Θ, R^d) be the space of continuous functions from Θ to R^d, where Θ is a compact subset of R^d, and let {f_t | t ≥ 1} be a stationary ergodic sequence of random elements in C(Θ, R^d).
If E sup_{θ∈Θ} ‖f_t(θ)‖ < ∞, then for a constant c > 0, where {k_n} is an increasing sequence of positive integers.
Proof. It is sufficient to show the above for d = 1, so we assume that f t (θ) is a random element in C(Θ, R). We follow the arguments in the proof of Theorem 16(a) in Ferguson (1996).
Proof. First, observe that by assumption A6, Let N_r(θ_0) := {θ ∈ Θ : ‖θ − θ_0‖ ≤ r}. Noting that ∂²_{θθ} l_α(X; θ) is continuous in θ, one can see from the dominated convergence theorem that lim_{n→∞} E sup Thus, for any ε > 0, we can take a neighborhood N(θ_0) such that E sup Since θ̂_{α,n} converges almost surely to θ_0, we have that for sufficiently large n, Noting that {l_α(X_t; θ)} is a sequence of i.i.d. random variables, we have that Hence, it follows from Lemma 1 that which establishes the lemma.
Proof. By Lemma 3, we have which yields the result.

Proof of Theorem 3
It suffices to show that there exists a sequence of real numbers, say {k_n | n ≥ 1}, such that

‖∂_θ H_{α,n+k_n}(θ̂_{α,n})‖ / (√n (1 + k_n/n)) →_P ∞ as n → ∞.
This completes the proof.