Nonparametric sequential change-point detection for multivariate time series based on empirical distribution functions

The aim of sequential change-point detection is to issue an alarm when it is thought that certain probabilistic properties of the monitored observations have changed. This work is concerned with nonparametric, closed-end testing procedures based on differences of empirical distribution functions that are designed to be particularly sensitive to changes in the contemporary distribution of multivariate time series. The proposed detectors are adaptations of statistics used in a posteriori (offline) change-point testing and involve a weighting that allows more importance to be given to recent observations. The resulting sequential change-point detection procedures are carried out by comparing the detectors to threshold functions estimated through resampling such that the probability of false alarm remains approximately constant over the monitoring period. A generic result on the asymptotic validity of this way of estimating a threshold function is stated. As a corollary, the asymptotic validity of the studied sequential tests based on empirical distribution functions is proven when they are carried out using a dependent multiplier bootstrap for multivariate time series. Large-scale Monte Carlo experiments demonstrate the good finite-sample properties of the resulting procedures. The application of the derived sequential tests is illustrated on financial data.


Introduction
Let X_1, . . . , X_m, m ≥ 1, be a stretch from a d-dimensional stationary time series of continuous random vectors with unknown contemporary distribution function (d.f.) F given by F(x) = P(X_1 ≤ x), x ∈ R^d. These available observations will be referred to as the learning sample as we continue. The context of this work is that of sequential change-point detection: new observations X_{m+1}, X_{m+2}, . . . arrive sequentially and we wish to issue an alarm as soon as possible if the contemporary distribution of the most recent observations is not equal to F anymore. If there is no evidence of a change in the contemporary distribution, the monitoring stops after the arrival of observation X_n for some n > m.
The theoretical framework of our investigations is that adopted in the seminal paper of Chu, Stinchcombe and White (1996). Unlike classical approaches in statistical process control (SPC) usually calibrated in terms of average run length (ARL) (see, e.g., Lai, 2001; Montgomery, 2007, for an overview) and leading in general to the rejection of the underlying null hypothesis of stationarity with probability one, the approach of Chu, Stinchcombe and White (1996) guarantees that, asymptotically, stationarity, if it holds, will only be rejected with a small probability α to be interpreted as a type I error and thus called the probability of false alarm. The latter paradigm is increasingly considered in the literature; see, e.g., Horváth et al. (2004), Aue et al. (2012a), Aue et al. (2012b), Fremdt (2015), Kirch and Weber (2018), Dette and Gösmann (2019) or Kirch and Stoehr (2019).
Among the approaches à la Chu, Stinchcombe and White (1996), one can distinguish between closed-end and open-end procedures. The latter can in principle continue indefinitely if no evidence against the null is observed. Our approach is of the former type in the sense that at most n − m new observations will be considered before the monitoring stops.
As already mentioned, the null hypothesis of the procedure that we shall investigate is that of stationarity and can be more formally stated as follows:

H_0: X_1, . . . , X_m, X_{m+1}, . . . , X_n is a stretch from a stationary time series with contemporary d.f. F. (1.1)

Notice that, if one is additionally ready to assume the independence of the observations, H_0 simplifies to

H_0^ind: X_1, . . . , X_m, X_{m+1}, . . . , X_n are independent random vectors with d.f. F. (1.2)

The aim of this work is to derive nonparametric sequential change-point detection procedures particularly sensitive to the alternative hypothesis

H_1: ∃ k ∈ {m, . . . , n − 1} such that X_1, . . . , X_k is a stretch from a stationary time series with contemporary d.f. F and X_{k+1}, . . . , X_n is a stretch from a stationary time series with contemporary d.f. G ≠ F. (1.3)
In other words, unlike some of the approaches reported for instance in Kirch and Weber (2018), Dette and Gösmann (2019) or Kirch and Stoehr (2019), we are not solely interested in being sensitive to a change in a given parameter of the d-dimensional time series such as the mean vector or the covariance matrix. We aim at deriving nonparametric monitoring procedures that, in principle, provided m and n are large enough, can detect all types of changes in the contemporary d.f. In the considered context, the two main ingredients of a sequential change-point detection procedure are a sequence of positive statistics D_m(k), k ∈ {m + 1, . . . , n}, and a sequence of suitably chosen strictly positive thresholds w_m(k), k ∈ {m + 1, . . . , n}. For k ∈ {m + 1, . . . , n}, the statistic D_m(k) (called a detector in the literature) is used to assess a possible departure from H_0 in (1.1) using only observations X_1, . . . , X_k and is such that the larger D_m(k), the more evidence against H_0. After the arrival of observation X_k, k ∈ {m + 1, . . . , n}, the detector D_m(k) is computed and compared to the threshold w_m(k). If D_m(k) > w_m(k), the available evidence against stationarity is considered to be large enough and an alarm is issued, causing the monitoring to stop. If D_m(k) ≤ w_m(k) and k < n, a new observation X_{k+1} is collected and the previous iteration is repeated. This monitoring process can be naturally represented by a graph illustrating the evolution of the sequence of detectors against the sequence of thresholds.

This paper is organized as follows. In the second section, we propose three classes of detectors based on differences of empirical d.f.s and study their asymptotics, both under H_0 and H_1.
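The iteration just described — compute the detector, compare it to the threshold, stop at the first exceedance or at time n — can be sketched as follows. This is only an illustrative skeleton under assumed interfaces: `detector` and `threshold` are hypothetical placeholder callables standing in for D_m(k) and w_m(k), whose concrete choices are the subject of the following sections.

```python
# Sketch of the generic closed-end monitoring loop described above.
# `detector` and `threshold` are hypothetical placeholders for D_m(k)
# and w_m(k); concrete choices are introduced later in the paper.
def monitor(x_learn, x_new, detector, threshold):
    """Return the alarm time k, or None if monitoring ends without alarm."""
    m = len(x_learn)
    data = list(x_learn)
    for j, x in enumerate(x_new):
        data.append(x)
        k = m + j + 1  # index of the most recent observation
        if detector(data, m) > threshold(k, m):
            return k  # evidence against stationarity: alarm and stop
    return None  # closed-end: monitoring stops after observation X_n
```

Plugging in any toy detector with a constant threshold reproduces the closed-end behavior: either an alarm time in {m + 1, . . . , n} or no alarm at all.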
A more general perspective is adopted in the third section: for an arbitrary detector in the considered closed-end setting, a procedure for estimating the threshold function such that the probability of false alarm remains approximately constant over the monitoring period is investigated and its asymptotic validity is proven under both H 0 and H 1 when the estimation is based on an asymptotically valid resampling scheme. The results of the third section are applied in the fourth section to the three proposed classes of detectors based on differences of empirical d.f.s. To do so, the consistency of a dependent multiplier bootstrap for the detectors is proven under strong mixing. The fifth section presents a summary of large-scale Monte-Carlo experiments demonstrating the good finite-sample properties of the resulting sequential testing procedures. An application on real financial data concludes the article.
Auxiliary results and proofs are deferred to a sequence of appendices, some of which are provided in the supplement Kojadinovic and Verdier (2020). The studied tests will soon be available in the package npcp (Kojadinovic, 2019) for the R statistical system (R Core Team, 2019).

The detectors and their asymptotics
After defining three classes of detectors based on empirical distribution functions and noticing that they are margin-free under H 0 in (1.1), we study their asymptotics under the null and H 1 in (1.3).

Detectors based on empirical distribution functions
Let F_{j:k} be the empirical d.f. computed from the stretch X_j, . . . , X_k of available observations. More formally, for any integers j, k ≥ 1 and x ∈ R^d, let

F_{j:k}(x) = (k − j + 1)^{−1} Σ_{i=j}^{k} 1(X_i ≤ x), (2.1)

where the inequalities X_i ≤ x are to be understood componentwise and 1(X_i ≤ x) is equal to 1 (resp. 0) if all (resp. some) of the d underlying inequalities are true (resp. false). After the kth observation has arrived, the available data take the form of the stretch X_1, . . . , X_k. If we were in the context of a posteriori change-point detection, a prototypical test statistic would be the maximally selected Kolmogorov–Smirnov-type statistic

R_k = max_{1 ≤ j ≤ k−1} {j(k − j)/k^{3/2}} sup_{x ∈ R^d} |F_{1:j}(x) − F_{j+1:k}(x)|, (2.2)

practically considered for instance in Gombay and Horváth (1999) or Holmes, Kojadinovic and Quessy (2013). The intuition behind R_k is the following: every j ∈ {1, . . . , k − 1} is treated as a potential break point in the sequence and the maximum in (2.2) implies that R_k will be large as soon as the difference between F_{1:j} and F_{j+1:k} is large for some j.
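For concreteness, the empirical d.f. of (2.1) and the offline statistic R_k can be sketched numerically as follows. This is only an illustrative sketch, not the paper's implementation: the supremum over x ∈ R^d is approximated by a maximum over the observed points, which is exact in the univariate case but only an approximation when d > 1.

```python
import numpy as np

def ecdf(block, x):
    """Empirical d.f. of (2.1): fraction of the rows of `block` (each row a
    d-dimensional observation) that are <= x componentwise."""
    return np.mean(np.all(block <= x, axis=1))

def offline_ks_stat(sample):
    """Maximally selected Kolmogorov-Smirnov-type statistic R_k of (2.2),
    with the supremum over x approximated over the observed points."""
    X = np.asarray(sample, dtype=float)
    k = len(X)
    best = 0.0
    for j in range(1, k):  # every j is a potential break point
        gap = max(abs(ecdf(X[:j], x) - ecdf(X[j:], x)) for x in X)
        best = max(best, j * (k - j) / k**1.5 * gap)
    return best
```

On a univariate sample with an obvious break in the middle, the maximum is attained at the true break point, with the weight j(k − j)/k^{3/2} multiplying an ECDF gap of one.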
The weighting j(k − j)/k^{3/2} ensures that R_k converges in distribution under stationarity as k → ∞. As explained for instance in Csörgő and Szyszkowicz (1994a) in the case of independent observations, replacing these weights simply by √k would result in a statistic that diverges in probability to ∞ under stationarity. The part j(k − j) in the weighting however favors the detection of potential break points in the middle of the sequence. Test statistics that are more sensitive to changes at the beginning or at the end of the sequence but still converge in distribution under stationarity can be obtained by considering weights of the form j(k − j)/{k^{3/2} q(j/k)} for some suitable strictly positive function q on (0, 1); see, e.g., Csörgő and Szyszkowicz (1994a,b) and Csörgő, Horváth and Szyszkowicz (1997).
Going back to the setting of sequential change-point detection considered in this work, a first meaningful modification of (2.2) is to restrict the maximum over j to j ∈ {m, . . . , k − 1} since a change cannot occur at the beginning of the sequence given that X_1, . . . , X_m is the learning sample known to be a stretch from a stationary sequence. Another modification is a rescaling consisting of replacing k^{3/2} by m^{3/2} in the weighting. The latter is made solely for asymptotic reasons as shall become clear in Sections 2.3 and 2.4. These modifications essentially lead to the first detectors considered in this work:

R_{m,q}(k) = max_{m ≤ j ≤ k−1} {j(k − j)/[m^{3/2} q(j/m, k/m)]} sup_{x ∈ R^d} |F_{1:j}(x) − F_{j+1:k}(x)|, k ∈ {m + 1, . . . , n}. (2.3)

In the previous display, q is a strictly positive function whose role is to potentially give more weight to recent observations. In the sequel, we consider the parametric form

q(s, t) = max{s^γ (t − s)^γ, δ}, 0 ≤ s ≤ t, (2.4)

where δ ∈ (0, 1) is a small constant and γ is a parameter in [0, 1/2]. If γ = 0, q is the constant function 1 and R_{m,q}(k) is then a straightforward adaptation of R_k in (2.2) to sequential change-point detection. In that case, the general form of R_{m,q}(k) can also be heuristically justified through a likelihood ratio approach; see Section 2 in Dette and Gösmann (2019). When γ = 1/2, as shall be discussed in Remark 2.6 (in Section 2.3) using asymptotic arguments, R_{m,q}(k) can be regarded, under H_0 in (1.1), as a maximum of random variables with, approximately, the same mean and variance. This heuristically implies that all the potential break points j ∈ {m, . . . , k − 1} are given roughly the same weight in the computation of R_{m,q}(k), unlike in the case γ = 0 in which potential break points closest to k/2 are given more weight. Hence, for certain types of alternatives to H_0 in (1.1), choosing γ ∈ (0, 1/2] might accelerate the detection of the corresponding change in the contemporary distribution of the underlying time series.
Examples of such alternatives will be given in Section 5, in which the results of numerous Monte Carlo experiments are summarized. Two similar Cramér–von Mises-like detectors are also considered in this work. They are defined, for k ∈ {m + 1, . . . , n}, by

S_{m,q}(k) = max_{m ≤ j ≤ k−1} {j(k − j)/[m^{3/2} q(j/m, k/m)]} {∫_{R^d} [F_{1:j}(x) − F_{j+1:k}(x)]^2 dF_{1:k}(x)}^{1/2} (2.5)

and

T_{m,q}(k) = {m^{−1} Σ_{j=m}^{k−1} [j(k − j)/{m^{3/2} q(j/m, k/m)}]^2 ∫_{R^d} [F_{1:j}(x) − F_{j+1:k}(x)]^2 dF_{1:k}(x)}^{1/2}. (2.6)

Remark 2.1. The detectors in (2.3) and (2.5) are related to those used in Ross and Adams (2012, pp 104–106). The latter are also based on differences of empirical d.f.s but deal only with independent univariate observations. The analogue of (2.3) is apparently defined as a maximum of the quantities sup_{x∈R} |F_{1:j}(x) − F_{j+1:k}(x)|, j ∈ {m, . . . , k − 1}, previously normalized using an empirical probability integral transformation, probably based on simulations. The analogue of (2.5) takes the form of a maximum of the quantities ∫_R [F_{1:j}(x) − F_{j+1:k}(x)]^2 dF_{1:k}(x), j ∈ {m, . . . , k − 1}, after centering and scaling. The asymptotics of these detectors were not studied. Given that the detectors of Ross and Adams (2012) are distribution-free (see also Section 2.2 hereafter), that their approach assumes serially independent observations and is based on simulations, the absence of asymptotic theory is not problematic.

Remark 2.2. The detectors in (2.3), (2.5) and (2.6) are almost of the Page-CUSUM type considered initially in Fremdt (2015); see also Kirch and Weber (2018). For instance, in the case of (2.3), the adaptation of the latter construction to the present setting would have instead involved a maximum of the quantities sup_{x∈R^d} |F_{1:m}(x) − F_{j+1:k}(x)|, j ∈ {m, . . . , k − 1}. As explained in Dette and Gösmann (2019), the use of such detectors may result in a loss of power in the case of a small learning sample and a rather late change point. In the Monte Carlo experiments carried out in Dette and Gösmann (2019), Page-CUSUM detectors were always outperformed by their analogues of type (2.3), which is why we do not consider them in this work.

Remark 2.3.
In addition to the detectors (2.3), (2.5) and (2.6), we also considered in our Monte Carlo experiments the following natural competitors, which are straightforward adaptations of the so-called CUSUM construction considered for instance in Horváth et al. (2004) and Aue et al. (2012b): their Kolmogorov–Smirnov and Cramér–von Mises versions, given for any k ∈ {m + 1, . . . , n} in (2.7) and (2.8), respectively, are based on the single difference F_{1:m} − F_{m+1:k} between the empirical d.f. of the learning sample and that of the observations collected during the monitoring. The asymptotic theory for these detectors being simpler than for the detectors (2.3), (2.5) and (2.6), it will not be stated in the forthcoming sections for the sake of readability.
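Under one plausible reading of the construction described above (restriction of the maximum to j ∈ {m, . . . , k − 1}, rescaling by m^{3/2}, and division by the weight q(j/m, k/m) of (2.4)), a Kolmogorov–Smirnov-type detector can be sketched as follows. This is an assumed form for illustration only, with the supremum again approximated over observed points.

```python
import numpy as np

def q_weight(s, t, gamma=0.5, delta=1e-4):
    """Weight function q of (2.4): q(s, t) = max{s^gamma (t - s)^gamma, delta}."""
    return max(s**gamma * (t - s)**gamma, delta)

def detector_R(data, m, gamma=0.5, delta=1e-4):
    """Sketch of a Kolmogorov-Smirnov-type sequential detector: maximum over
    potential break points j in {m, ..., k-1} of a weighted ECDF difference
    between X_1..X_j and X_{j+1}..X_k (sup over x taken at observed points)."""
    X = np.asarray(data, dtype=float)
    k = len(X)
    best = 0.0
    for j in range(m, k):
        w = j * (k - j) / (m**1.5 * q_weight(j / m, k / m, gamma, delta))
        gap = max(abs(np.mean(np.all(X[:j] <= x, axis=1))
                      - np.mean(np.all(X[j:] <= x, axis=1))) for x in X)
        best = max(best, w * gap)
    return best
```

With gamma = 0 the weight q is constant equal to one, so the detector reduces to the sequential adaptation of the offline statistic with the m^{3/2} rescaling.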

The detectors are margin-free under the null
The detectors defined previously are actually margin-free under H 0 in (1.1), a property that shall be exploited in the forthcoming sections to carry out the corresponding sequential change-point detection procedures.
Recall that X_1, . . . , X_n are assumed to be continuous random vectors. Saying that the detectors are margin-free under the null means that they do not depend on the d univariate margins F_1, . . . , F_d of F (the unknown d.f. of X_1) or, equivalently, that they can alternatively be written in terms of the unobservable random vectors U_1, . . . , U_n defined from X_1, . . . , X_n through marginal probability integral transformations:

U_i = (F_1(X_{i1}), . . . , F_d(X_{id})), i ∈ {1, . . . , n}. (2.9)

Notice that we can recover the X_i from the U_i by marginal quantile transformations:

X_i = (F_1^{−1}(U_{i1}), . . . , F_d^{−1}(U_{id})), i ∈ {1, . . . , n}, (2.10)

where, for any univariate d.f. G, G^{−1} denotes its associated quantile function defined by

G^{−1}(y) = inf{x ∈ R : G(x) ≥ y}, y ∈ [0, 1], (2.11)

with the convention that the infimum of the empty set is ∞.
To verify that the detectors are margin-free under the null, for any integers j, k ≥ 1 and u ∈ [0, 1]^d, let

C_{j:k}(u) = (k − j + 1)^{−1} Σ_{i=j}^{k} 1(U_i ≤ u) (2.12)

be the analogue of F_{j:k} in (2.1) based on the U_i in (2.9). For any j ∈ {1, . . . , d}, by (right) continuity of F_j, we have that 1{F_j^{−1}(u) ≤ x} = 1{u ≤ F_j(x)} for all u ∈ [0, 1] and x ∈ R; see, e.g., Proposition 1 (5) in Embrechts and Hofert (2013). The latter property combined with (2.10) implies that, under H_0 in (1.1), for any i ∈ {1, . . . , n},

1(X_i ≤ x) = 1{U_i ≤ F(x)}, x ∈ R^d, (2.13)

where F(x) = (F_1(x_1), . . . , F_d(x_d)). Hence, for any k ∈ {m + 1, . . . , n}, F_{j:k}(x) = C_{j:k}(F(x)) for all x ∈ R^d, so that the suprema over x ∈ R^d appearing in R_{m,q}(k) can be written as suprema over u ∈ [0, 1]^d of the differences C_{1:j}(u) − C_{j+1:k}(u). Similarly, S_{m,q}(k) and T_{m,q}(k) can be expressed in terms of the C_{j:k} only. In the case of univariate independent observations, the margin-free property under the null implies that the detectors are distribution-free under the null. When d > 1, this is not true anymore as the null distribution of the detectors depends on the copula C associated with F. The latter is merely the d.f. of the random vector U_1 obtained through (2.9). Equivalently, C is a d-dimensional d.f. with standard uniform margins further uniquely defined (see Sklar, 1959) through the relationships

F(x) = C(F_1(x_1), . . . , F_d(x_d)), x ∈ R^d, and C(u) = F(F_1^{−1}(u_1), . . . , F_d^{−1}(u_d)), u ∈ [0, 1]^d.

Remark 2.4. To be able to handle both the univariate and the multivariate situations, in the rest of the paper, we adopt the convention that C is the copula associated with F when d > 1 and merely the identity function when d = 1.
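The margin-free property can be illustrated numerically: since statistics of this kind depend on the data only through componentwise comparisons of observations, applying any strictly increasing transformation to each margin leaves them unchanged. Below is a minimal, self-contained illustration with a toy CUSUM-type statistic (not one of the paper's detectors), with the supremum evaluated over observed points.

```python
import numpy as np

def max_ecdf_gap(data, m):
    """Toy statistic: sup_x |F_{1:m}(x) - F_{m+1:k}(x)|, with the supremum
    approximated over the observed points of the sample."""
    X = np.asarray(data, dtype=float)
    return max(abs(np.mean(np.all(X[:m] <= x, axis=1))
                   - np.mean(np.all(X[m:] <= x, axis=1))) for x in X)

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))
# Strictly increasing marginal transformations preserve every componentwise
# comparison X_i <= X_j, hence the value of the statistic: it is margin-free.
Y = np.column_stack([np.exp(X[:, 0]), X[:, 1] ** 3])
```

Evaluating `max_ecdf_gap` on `X` and on the transformed sample `Y` yields exactly the same value.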

Asymptotics of the detectors under the null
As shall become clear in the forthcoming sections, the knowledge of the asymptotic behavior of the detectors under H_0 in (1.1) is instrumental in showing the asymptotic validity of the corresponding sequential change-point detection procedures. To study these asymptotics, we follow Dette and Gösmann (2019), among others, and set n = ⌊m(T + 1)⌋ for some fixed real number T > 0. This will imply that, in the asymptotics, as the size m of the learning sample goes to infinity, the maximum number of new observations considered in the monitoring increases proportionally. Let ∆ = {(s, t) ∈ [0, T + 1]^2 : s ≤ t} and let

λ_m(s, t) = (⌊mt⌋ − ⌊ms⌋)/m, (s, t) ∈ ∆. (2.14)

Then, for any (s, t) ∈ ∆ and u ∈ [0, 1]^d, let

G_m(s, t, u) = √m λ_m(0, s) λ_m(s, t) {C_{1:⌊ms⌋}(u) − C_{⌊ms⌋+1:⌊mt⌋}(u)}, (2.15)

where C_{1:⌊ms⌋} and C_{⌊ms⌋+1:⌊mt⌋} are generically defined by (2.12), and let

G_{m,q}(s, t, u) = G_m(s, t, u)/q(s, t), (2.16)

where q is defined in (2.4). Notice that, with the definitions adopted thus far, G_m(s, s, ·) = G_{m,q}(s, s, ·) = 0 for all s ∈ [0, T + 1]. For any k ∈ {m + 1, . . . , n} with n = ⌊m(T + 1)⌋ and any j ∈ {m, . . . , k − 1}, there exists (s, t) ∈ ∆ ∩ [1, T + 1]^2 such that k = ⌊mt⌋ and j = ⌊ms⌋. We can thus write R_{m,q}(k) as

R_{m,q}(⌊mt⌋) = max_{j ∈ {m, . . . , ⌊mt⌋−1}} sup_{u ∈ [0,1]^d} |G_{m,q}(j/m, t, u)|. (2.17)

Similarly, it can be verified that S_{m,q}(⌊mt⌋) and T_{m,q}(⌊mt⌋) admit analogous expressions, labeled (2.18) and (2.19) respectively, in which the supremum over u is replaced by integration with respect to dC_{1:⌊mt⌋}(u). As we continue, we adopt the convention that R_{m,q}(m) = S_{m,q}(m) = T_{m,q}(m) = 0. Furthermore, given a set S, the space of all bounded real-valued functions on S equipped with the uniform metric is denoted by ℓ^∞(S). The main purpose of this section is to study the asymptotics under the null of the elements R_{m,q}, S_{m,q} and T_{m,q} of ℓ^∞([1, T + 1]) defined respectively, for any t ∈ [1, T + 1], by

R_{m,q}(t) = R_{m,q}(⌊mt⌋), S_{m,q}(t) = S_{m,q}(⌊mt⌋), T_{m,q}(t) = T_{m,q}(⌊mt⌋). (2.20)

Specifically, we will provide conditions under which they converge weakly in the sense of Definition 1.3.3 in van der Vaart and Wellner (1996) under H_0 in (1.1). Throughout the paper, this mode of convergence will be denoted by the arrow '⇝' and all convergences will be for m → ∞ unless mentioned otherwise.
From the expressions given in (2.17), (2.18) and (2.19), we see that, under the null, the detectors studied in this work are functionals of G_{m,q} in (2.16), and thus of G_m in (2.15). Under stationarity, the latter is in turn a functional of the sequential empirical process defined, for any s ∈ [0, T + 1] and u ∈ [0, 1]^d, by

B_m(s, u) = m^{−1/2} Σ_{i=1}^{⌊ms⌋} {1(U_i ≤ u) − C(u)}. (2.21)

Indeed, under H_0 in (1.1), for any (s, t) ∈ ∆ and u ∈ [0, 1]^d,

G_m(s, t, u) = λ_m(0, t) B_m(s, u) − λ_m(0, s) B_m(t, u). (2.22)

In the forthcoming asymptotic results, we shall assume that the underlying stationary sequence (X_i)_{i∈Z} (or, equivalently, the corresponding unobservable stationary sequence (U_i)_{i∈Z} defined through (2.9)) is strongly mixing. Denote by F_j^k the σ-field generated by (X_i)_{j≤i≤k}, j, k ∈ Z ∪ {−∞, +∞}, and recall that the strong mixing coefficients corresponding to the stationary sequence (X_i)_{i∈Z} are then defined by α_0^X = 1/2 and, for r ≥ 1,

α_r^X = sup_{A ∈ F_{−∞}^0, B ∈ F_r^{+∞}} |P(A ∩ B) − P(A)P(B)|,

and that the sequence (X_i)_{i∈Z} is said to be strongly mixing if α_r^X → 0 as r → ∞. The following result is proven in the supplement Kojadinovic and Verdier (2020).
Proposition 2.5. Assume that H_0 in (1.1) holds and that, additionally, X_1, . . . , X_n is a stretch from a stationary sequence (X_i)_{i∈Z} of continuous d-dimensional random vectors whose strong mixing coefficients satisfy α_r^X = O(r^{−a}) for some a > 1 as r → ∞. Then, B_m ⇝ B_C in ℓ^∞([0, T + 1] × [0, 1]^d), where B_C is a tight centered Gaussian process with covariance function

cov{B_C(s, u), B_C(t, v)} = (s ∧ t) Σ_{k∈Z} cov{1(U_0 ≤ u), 1(U_k ≤ v)}, (2.23)

with ∧ the minimum operator. As a consequence, G_m ⇝ G_C and G_{m,q} ⇝ G_{C,q}, where G_m and G_{m,q} are defined in (2.15) and (2.16), respectively, and, for any (s, t) ∈ ∆ and u ∈ [0, 1]^d,

G_C(s, t, u) = t B_C(s, u) − s B_C(t, u) and G_{C,q}(s, t, u) = G_C(s, t, u)/q(s, t). (2.24)

Consequently, R_{m,q} ⇝ R_{C,q}, S_{m,q} ⇝ S_{C,q} and T_{m,q} ⇝ T_{C,q} in ℓ^∞([1, T + 1]), where the limiting processes are the corresponding functionals of G_{C,q}; in particular, for any t ∈ [1, T + 1],

R_{C,q}(t) = sup_{s ∈ [1,t]} sup_{u ∈ [0,1]^d} |G_{C,q}(s, t, u)|. (2.25)

Furthermore, for any interval [t_1, t_2] ⊂ [1, T + 1] such that t_2 > 1, the distributions of sup_{t∈[t_1,t_2]} R_{C,q}(t), sup_{t∈[t_1,t_2]} S_{C,q}(t) and sup_{t∈[t_1,t_2]} T_{C,q}(t) are absolutely continuous with respect to the Lebesgue measure.
Combined with a generic result on the threshold estimation procedure to be stated in Section 3 and additional bootstrap consistency results to be stated in Section 4, the last claims of Proposition 2.5 constitute a first step in proving that the derived change-point detection procedures hold their level asymptotically.

Remark 2.6. Proposition 2.5 can be used to heuristically justify the form of the weight function q in (2.4) appearing in the expression of the detectors (2.3), (2.5) and (2.6). From (2.24), for any (s, t) ∈ ∆ and u ∈ [0, 1]^d, we obtain that

Var{G_C(s, t, u)} = st(t − s) Σ_{k∈Z} cov{1(U_0 ≤ u), 1(U_k ≤ u)}.

As a consequence, for any 1 ≤ s < t ≤ T + 1 and u ∈ [0, 1]^d,

Var{s^{−1/2}(t − s)^{−1/2} G_C(s, t, u)} = t Σ_{k∈Z} cov{1(U_0 ≤ u), 1(U_k ≤ u)},

which does not depend on s. Under the conditions of the proposition, when γ = 1/2 and m is large, we could then expect that, very roughly, Var{G_{m,q}(s, t, u)} ≈ Var{s^{−1/2}(t − s)^{−1/2} G_C(s, t, u)} does not depend on s and thus regard the quantities sup_{u∈[0,1]^d} |G_{m,q}(j/m, t, u)|, j ∈ {m, . . . , ⌊mt⌋ − 1}, appearing in the expression of R_{m,q}(⌊mt⌋) in (2.17) as random variables with, approximately, the same mean and variance. The latter conveys the intuition that, when γ = 1/2, all the potential break points j ∈ {m, . . . , ⌊mt⌋ − 1} are given roughly the same weight in the computation of R_{m,q}(⌊mt⌋). This is, of course, only approximately true because of the presence of the constant δ in the expression of the weight function q in (2.4). In practice, the setting γ = 1/2 might accelerate the detection of certain types of changes.

Asymptotics of the detectors under H 1
Under H_1 in (1.3), the detectors are not margin-free anymore. As we shall see in the forthcoming proposition, their asymptotic behavior is then a consequence of that of the analogue of G_{m,q} in (2.16) computed from the empirical d.f.s, namely, for (s, t) ∈ ∆ and x ∈ R^d,

√m λ_m(0, s) λ_m(s, t) {F_{1:⌊ms⌋}(x) − F_{⌊ms⌋+1:⌊mt⌋}(x)}/q(s, t), (2.26)

where the empirical d.f.s F_{1:⌊ms⌋} and F_{⌊ms⌋+1:⌊mt⌋} are generically defined by (2.1), and q is defined in (2.4).
The following result is proven in the supplement Kojadinovic and Verdier (2020). The arrow '→_P' in its statement denotes convergence in probability.
Proposition 2.7. Assume that H_1 in (1.3) holds with k = ⌊mc⌋ for some c ∈ (1, T + 1). Assume additionally that X_1, . . . , X_{⌊mc⌋}, denoted equivalently by Y_1, . . . , Y_{⌊mc⌋}, is a stretch from a stationary sequence (Y_i)_{i∈Z} of continuous d-dimensional random vectors whose strong mixing coefficients satisfy α_r^Y = O(r^{−a}) for some a > 1 as r → ∞, and that X_{⌊mc⌋+1}, . . . , X_{⌊m(T+1)⌋}, denoted equivalently by Z_{⌊mc⌋+1}, . . . , Z_{⌊m(T+1)⌋}, is a stretch from a stationary sequence (Z_i)_{i∈Z} of continuous d-dimensional random vectors whose strong mixing coefficients satisfy α_r^Z = O(r^{−a}) for some a > 1 as r → ∞. Then, for any t ∈ (c, T + 1],

R_{m,q}(t) →_P ∞, S_{m,q}(t) →_P ∞ and T_{m,q}(t) →_P ∞. (2.27)

Combined with a generic result on the threshold estimation procedure to be stated in the forthcoming section and additional results on the asymptotic validity of adequate resampling methods to be stated in Section 4, the three last claims of the previous result will be instrumental in showing that the derived change-point detection procedures have asymptotic power one under H_1.

A generic threshold estimation procedure
In the studied context, the second ingredient of a sequential change-point detection procedure is a set of strictly positive thresholds to which detectors will be compared. In this section, we consider a generic threshold estimation procedure that can be employed with any type of detector, and provide conditions under which it is asymptotically valid. The derived results will be applied in the next section to establish the asymptotic validity of sequential change-point detection procedures based on the detectors studied in Section 2.

A constant probability of false alarm at each step
Within the context of closed-end monitoring from time m + 1 to time n, let D_m(k), k ∈ {m + 1, . . . , n}, be arbitrary detectors. As discussed in the introduction, it seems natural to choose the corresponding thresholds w_m(k), k ∈ {m + 1, . . . , n}, so that, under H_0 in (1.1), the probability of rejection of H_0 is the same at every step k ∈ {m + 1, . . . , n} of the procedure. More formally, this idea, possibly first appearing in Margavio et al. (1995) (see also, e.g., Hawkins and Zamba, 2005; Ross, 2014), consists of choosing the w_m(k), k ∈ {m + 1, . . . , n}, such that, under stationarity, for some small ξ_m > 0,

P{D_m(m + 1) > w_m(m + 1)} = ξ_m (3.1)

and, for all k ∈ {m + 2, . . . , n},

P{D_m(k) > w_m(k) | D_m(m + 1) ≤ w_m(m + 1), . . . , D_m(k − 1) ≤ w_m(k − 1)} = ξ_m.

We then obtain that, under H_0,

P{D_m(k) ≤ w_m(k) for all k ∈ {m + 1, . . . , n}} = (1 − ξ_m)^{n−m}.

Given a desired significance level α ∈ (0, 1/2) for the sequential testing procedure, a simple way to ensure that (1.4) holds under H_0 is then to choose ξ_m such that 1 − α = (1 − ξ_m)^{n−m}, that is, ξ_m = 1 − (1 − α)^{1/(n−m)}. As one can see from (3.1), w_m(m + 1) is then a quantile of order (1 − α)^{1/(n−m)} of D_m(m + 1) under stationarity and, for any k ∈ {m + 2, . . . , n}, w_m(k) is a quantile of order (1 − α)^{1/(n−m)} of the conditional distribution of D_m(k) given D_m(m + 1) ≤ w_m(m + 1), . . . , D_m(k − 1) ≤ w_m(k − 1) under stationarity.

Before we discuss the estimation of the thresholds and its validity, let us give an alternative view of (3.1). In Sections 2.3 and 2.4, in which n was taken equal to ⌊m(T + 1)⌋, we saw that the asymptotic results for the detectors are given in terms of elements of ℓ^∞([1, T + 1]). With the convention that D_m(m) = w_m(m) = 0, another equivalent way of looking at sequential change-point detection procedures of the considered type is then to consider that the piecewise constant detector function

D_m(t) = D_m(⌊mt⌋), t ∈ [1, T + 1], (3.2)

is compared to the piecewise constant threshold function τ_m(t) = w_m(⌊mt⌋), t ∈ [1, T + 1]. Some thought reveals that (3.1) is then equivalent to choosing the threshold function τ_m such that, under H_0 in (1.1), for any k ∈ {1, . . . , n − m},

P{D_m(t) > τ_m(t) for some t ∈ (1 + (k − 1)/m, 1 + k/m] | D_m(t) ≤ τ_m(t) for all t ∈ [1, 1 + (k − 1)/m]} = ξ_m. (3.3)
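The relation between the per-step probability ξ_m and the overall probability of false alarm α can be checked with a few lines of arithmetic, as a simple numerical illustration of the identity 1 − α = (1 − ξ_m)^{n−m}:

```python
# Per-step false-alarm probability xi_m solving (1 - alpha) = (1 - xi_m)^(n - m).
def per_step_level(alpha, m, n):
    return 1.0 - (1.0 - alpha) ** (1.0 / (n - m))

alpha, m, n = 0.05, 100, 200
xi = per_step_level(alpha, m, n)
# If, under stationarity, each of the n - m steps rejects with (conditional)
# probability xi, the overall probability of a false alarm is exactly alpha.
overall = 1.0 - (1.0 - xi) ** (n - m)
```

For α = 0.05 and 100 monitoring steps, ξ_m is of the order of 5 · 10^{-4}, much smaller than α.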

A formulation compatible with asymptotic validity results
With n = ⌊m(T + 1)⌋, the threshold setting procedure as given in (3.1) or (3.3) makes no sense asymptotically since the number of (conditional) probabilities tends to infinity as m → ∞. A natural solution consists of keeping the number of probabilities fixed, or, equivalently, of considering a time grid that does not depend on m. Let p ≥ 1 and let t_0 = 1 < t_1 < · · · < t_p = T + 1 be a fixed uniformly spaced time grid such that T/p > 1/m (a condition that will always be satisfied for m large enough). Let τ_m be a piecewise constant threshold function taking the value g_{i,m} on the interval I_i = (t_{i−1}, t_i], i ∈ {1, . . . , p}, chosen such that, under H_0 in (1.1), for any i ∈ {1, . . . , p},

P{sup_{t∈I_i} D_m(t) > g_{i,m} | sup_{t∈I_j} D_m(t) ≤ g_{j,m} for all j ∈ {1, . . . , i − 1}} = 1 − (1 − α)^{1/p}, (3.4)

with the convention that I_0 = ∅. Some thought reveals that the formulation in (3.4) is equivalent to choosing τ_m such that, under H_0 in (1.1),

P{sup_{t∈I_1} D_m(t) ≤ g_{1,m}} = (1 − α)^{1/p} and P{sup_{t∈I_i} D_m(t) ≤ g_{i,m} | sup_{t∈I_j} D_m(t) ≤ g_{j,m}, j ∈ {1, . . . , i − 1}} = (1 − α)^{1/p}, i ∈ {2, . . . , p}. (3.5)

In other words, g_{1,m} is a quantile of order (1 − α)^{1/p} of sup_{t∈I_1} D_m(t) under stationarity and, for any i ∈ {2, . . . , p}, g_{i,m} is a quantile of order (1 − α)^{1/p} of the conditional distribution of sup_{t∈I_i} D_m(t) given sup_{t∈I_j} D_m(t) ≤ g_{j,m}, j ∈ {1, . . . , i − 1}, under stationarity. Notice that the suprema in (3.5) are actually maxima since D_m is a piecewise constant function.

Remark 3.1. A further generalization of (3.4) or, equivalently, (3.5), would be to consider that τ_m is not necessarily piecewise constant but only defined up to a multiplicative constant on each of the intervals I_i, i ∈ {1, . . . , p}. For instance, it could have one of the parametric forms considered in Dette and Gösmann (2019, Section 5), among others. For the sake of simplicity, we shall not however consider such an extension in this work.

Estimation of the threshold function
As we continue, we shall focus on the threshold setting procedure as formulated in (3.4) or, equivalently, (3.5), mostly because its asymptotic validity can be studied. To estimate the threshold function τ_m in (3.4), or, equivalently, the g_{i,m}, i ∈ {1, . . . , p}, in (3.5), it is thus necessary to be able to compute, at least approximately, the distribution of the p-dimensional random vector

(sup_{t∈I_1} D_m(t), . . . , sup_{t∈I_p} D_m(t)). (3.6)

Monte Carlo estimation and asymptotic validity
Assume that the observations to be monitored are univariate and independent, and that D_m is distribution-free under H_0^ind in (1.2). Notice that the latter implies that so is the random vector (3.6). To obtain a Monte Carlo estimate of the distribution of (3.6), it then suffices to consider a large integer M, generate M independent samples of size n from the standard uniform distribution and compute, from each sample, a realization of the detector function; denote the resulting independent copies of D_m by D_m^{[1]}, . . . , D_m^{[M]}. The threshold g_{1,m} can then be estimated by the empirical quantile g_{1,m}^M of order (1 − α)^{1/p} of sup_{t∈I_1} D_m^{[1]}(t), . . . , sup_{t∈I_1} D_m^{[M]}(t) and, for any i ∈ {2, . . . , p}, g_{i,m} by the empirical quantile g_{i,m}^M of order (1 − α)^{1/p} of the values sup_{t∈I_i} D_m^{[s]}(t) computed from those samples s for which sup_{t∈I_j} D_m^{[s]}(t) ≤ g_{j,m}^M for all j ∈ {1, . . . , i − 1}; the associated empirical quantile functions are generically defined by (2.11). Notice that, in this particular case, the resulting estimate τ_m^M of the threshold function τ_m does not at all depend on the learning sample.
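The estimation scheme just described can be sketched as follows, assuming for simplicity that each simulated detector path has already been reduced to its interval suprema (an (M, p) array); the conditional empirical quantiles are obtained by successively discarding the simulated paths that would have triggered an alarm on an earlier interval. This is an illustrative sketch under those assumptions, not the paper's implementation.

```python
import numpy as np

def mc_thresholds(detector_sups, alpha):
    """Monte Carlo threshold estimation sketch: row s of `detector_sups`
    holds sup_{t in I_i} D_m(t), i = 1..p, for the s-th simulated sample.
    Returns thresholds g_1, ..., g_p giving each interval the same
    conditional non-alarm probability (1 - alpha)^(1/p)."""
    sups = np.asarray(detector_sups, dtype=float)
    M, p = sups.shape
    level = (1.0 - alpha) ** (1.0 / p)
    g = np.empty(p)
    alive = np.ones(M, dtype=bool)  # simulated paths with no alarm so far
    for i in range(p):
        g[i] = np.quantile(sups[alive, i], level)  # conditional quantile
        alive &= sups[:, i] <= g[i]
    return g

rng = np.random.default_rng(1)
# Toy "detector suprema": independent standard normals, M = 20000, p = 4.
sups = rng.normal(size=(20000, 4))
g = mc_thresholds(sups, alpha=0.05)
# Fraction of simulated paths that never cross the threshold function:
no_alarm = np.mean(np.all(sups <= g, axis=1))
```

By construction, the fraction of simulated paths that never alarm is close to 1 − α, here 0.95.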
By taking a sufficiently large M, the Monte Carlo estimates g_{i,m}^M, i ∈ {1, . . . , p}, can be made arbitrarily close to the quantiles g_{i,m}, i ∈ {1, . . . , p}. Interestingly enough, more can be said as a consequence of the fact that Monte Carlo simulation can be regarded as a particular resampling scheme. As shall become clear in the next section, the general result stated in Theorem 3.3 hereafter can actually be used to show the asymptotic validity of the Monte Carlo based threshold estimation procedure when both m and M tend to infinity, under both H_0^ind in (1.2) and H_1 in (1.3). This is discussed in more detail in Remark 3.4 below.

Bootstrap-based estimation and asymptotic validity
In settings in which D_m is not distribution-free anymore, a natural alternative is to rely on a resampling scheme making use of the available learning sample X_1, . . . , X_m known to be under H_0 in (1.1). Specifically, let B be a large integer and suppose that we have available bootstrap replicates D_m^{[1]}, . . . , D_m^{[B]} of D_m, computed from X_1, . . . , X_m and depending on additional sources of randomness involved in the resampling scheme. Mimicking the previous situation in which D_m was distribution-free, let g_{1,m}^B be the empirical quantile of order (1 − α)^{1/p} of sup_{t∈I_1} D_m^{[1]}(t), . . . , sup_{t∈I_1} D_m^{[B]}(t) and, for any i ∈ {2, . . . , p}, let g_{i,m}^B be the empirical quantile of order (1 − α)^{1/p} of the values sup_{t∈I_i} D_m^{[b]}(t) computed from those replicates b for which sup_{t∈I_j} D_m^{[b]}(t) ≤ g_{j,m}^B for all j ∈ {1, . . . , i − 1}. As we shall see below, the main result of this section is that, essentially, as soon as the underlying resampling scheme for D_m is consistent, the above bootstrap-based version of the threshold setting procedure (3.5) is asymptotically valid.

Remark 3.2. Informally, the underlying resampling scheme for D_m is consistent if "D_m^{[b]} converges weakly to the weak limit of D_m in ℓ^∞([0, T + 1]) conditionally on X_1, X_2, . . . in probability". A rigorous definition of the underlying mode of convergence is more subtle than that of weak convergence. From Lemma 3.1 of Bücher and Kojadinovic (2019), the aforementioned validity statement is actually equivalent to the joint unconditional weak convergence of D_m and two bootstrap replicates to independent copies of the same limit. Throughout the paper, all our bootstrap asymptotic validity results will take that form.
The following general result is proved in Appendix B.
Theorem 3.3. Assume that, under H_0 in (1.1), (D_m, D_m^{[1]}, D_m^{[2]}) ⇝ (D_F, D_F^{[1]}, D_F^{[2]}) in {ℓ^∞([1, T + 1])}^3, where D_F^{[1]} and D_F^{[2]} are independent copies of D_F. Assume furthermore that the random vector (sup_{t∈I_1} D_F(t), . . . , sup_{t∈I_p} D_F(t)) has a continuous d.f. Then, under H_0, g_{1,m}^B converges in probability to the quantile of order (1 − α)^{1/p} of sup_{t∈I_1} D_F(t) and, for any i ∈ {2, . . . , p}, g_{i,m}^B converges in probability to the corresponding conditional quantile of order (1 − α)^{1/p} of sup_{t∈I_i} D_F(t). As a consequence, on one hand, under H_0, the probability that the procedure issues a false alarm converges to α; on the other hand, under H_1, if D_m(t) →_P ∞ for some t ∈ (1, T + 1], the probability that the procedure issues an alarm converges to one.

Remark 3.4. Consider the Monte Carlo estimation setting of Section 3.3.1 in which the observations to be monitored are univariate independent and D_m is distribution-free. Then, the Monte Carlo replicates of D_m described therein can be regarded as bootstrap replicates to which Theorem 3.3 applies, which establishes the asymptotic validity of the corresponding threshold estimation procedure as m and M tend to infinity.


Estimation of the threshold functions for the detectors based on empirical d.f.s

The aim of this section is to apply the generic results of the previous section to estimate the threshold functions for the empirical d.f.-based detector functions R_{m,q}, S_{m,q} and T_{m,q} defined in (2.20). We distinguish two situations for the observations to be monitored: the independent univariate case and the possibly multivariate, time series case.

Monte Carlo estimation in the independent univariate case
As verified in Section 2.2, the detector functions R_{m,q}, S_{m,q} and T_{m,q} defined in (2.20) are margin-free under H_0 in (1.1). In the univariate case, they are thus distribution-free. When dealing with independent univariate observations, one can therefore proceed exactly as explained in Section 3.3.1 to estimate the corresponding threshold functions. Furthermore, from Proposition 2.5, Remark 3.4 and Proposition 2.7, we know that the assumptions of Theorem 3.3 are satisfied. The latter then implies that the corresponding sequential change-point detection procedures are asymptotically valid both under H_0^ind in (1.2) and H_1 in (1.3).

A dependent multiplier bootstrap in the time series case
When the monitored observations are multivariate or exhibit serial dependence, the approach considered in Section 4.1 is not meaningful anymore. Having the asymptotic results of Sections 2.3 and 3.3.2 in mind, our aim in the considered time series context is to define suitable bootstrap replicates of B_m in (2.21) such that, following Remark 3.2, B_m and two of its replicates jointly weakly converge to independent copies of the process B_C defined in Proposition 2.5. Subsequently defining corresponding bootstrap replicates of the detector functions R_{m,q}, S_{m,q} and T_{m,q} defined in (2.20) will lead to asymptotically valid corresponding sequential change-point detection procedures.
Following Bühlmann (1993, Section 3.3) and Bücher and Kojadinovic (2016), we opted for a dependent multiplier bootstrap in the considered time series context. In the rest of the paper, we say that a sequence of random variables (ξ_{i,m})_{i∈Z} is a dependent multiplier sequence if:

(M1) the sequence (ξ_{i,m})_{i∈Z} is strictly stationary, is independent of the available observations, and satisfies E(ξ_{0,m}) = 0, E(ξ_{0,m}^2) = 1 and sup_{m≥1} E(|ξ_{0,m}|^ν) < ∞ for all ν ≥ 1;

(M2) there exists a sequence ℓ_m → ∞ of strictly positive constants, called the bandwidth, such that ℓ_m = o(m) and the random variables ξ_{i,m} and ξ_{j,m} are independent whenever |i − j| ≥ ℓ_m, i, j ∈ Z;

(M3) there exists a function φ on R, symmetric around 0 and continuous at 0, with φ(0) = 1, such that E(ξ_{i,m} ξ_{j,m}) = φ{(i − j)/ℓ_m} for all i, j ∈ Z.

Let (ξ_{i,m}^{[b]})_{i∈Z}, b ∈ N, be independent copies of the same dependent multiplier sequence. If we had a learning sample of size n = ⌊m(T + 1)⌋, following Bücher and Kojadinovic (2016), a natural definition of a dependent multiplier replicate of B_m in (2.21) would be

B̌_m^{[b]}(s, u) = m^{−1/2} Σ_{i=1}^{⌊ms⌋} ξ_{i,m}^{[b]} {1(U_i ≤ u) − C_{1:n}(u)}, s ∈ [0, T + 1], u ∈ [0, 1]^d, (4.1)

where C_{1:n} is generically defined by (2.12). Since threshold functions need to be estimated prior to the beginning of the monitoring and the learning sample is only of size m, we consider a time-rescaled version of B̌_m^{[b]} in which, roughly, m′ = ⌊(m/n)m⌋ = ⌊m/(T + 1)⌋ and m play the roles of m and n, respectively. Hence, in the considered context, our definition of a dependent multiplier replicate of B_m is

B̂_m^{[b]}(s, u) = (m′)^{−1/2} Σ_{i=1}^{⌊m′s⌋} ξ_{i,m}^{[b]} {1(U_i ≤ u) − C_{1:m}(u)}, s ∈ [0, T + 1], u ∈ [0, 1]^d, (4.2)

thereby translating the fact that we can only rely on functionals computed from the learning sample to approximate the variability of the detector functions under the null. From the two previous displays, we see that the multipliers act as random weights and that the bandwidth ℓ_m defined in Assumption (M2) plays a role somehow similar to that of the block length in the block bootstrap of Künsch (1989). Note that, in our Monte Carlo experiments to be presented in Section 5, ℓ_m was estimated from the learning sample X_1, . . . , X_m as explained in detail in Section 5.1 of Bücher and Kojadinovic (2016), while corresponding dependent multiplier sequences were generated using the so-called moving average approach based on an initial standard normal random sample and Parzen's kernel as precisely described in Section 5.2 of the same reference. The latter construction based on a time-rescaling suggests to form a dependent multiplier replicate of G_m in (2.22) as

Ĝ_m^{[b]}(s, t, u) = λ_{m′}(0, t) B̂_m^{[b]}(s, u) − λ_{m′}(0, s) B̂_m^{[b]}(t, u),

with its weighted version being

Ĝ_{m,q}^{[b]}(s, t, u) = Ĝ_m^{[b]}(s, t, u)/q(s, t),

where λ_{m′} is defined as in (2.14).
Finally, for any b ∈ N and t ∈ [1, T + 1], let R [b] m,q , S [b] m,q and T [b] m,q be dependent multiplier replicates of R m,q , S m,q and T m,q , respectively, defined in (2.20), where C 1:⌊m̃t⌋ , with m̃ = (m/n)m, is defined generically by (2.12). The definitions given in (4.3) hide the fact that the proposed dependent multiplier replicates of the detector functions R m,q , S m,q and T m,q actually depend on the learning sample X 1 , . . . , X m . To see that this is the case, for any b ∈ N, (s, t) ∈ ∆ and x ∈ R d , let F [b] m denote the corresponding replicate expressed in terms of the original observations. Since (2.13) always holds for all i ∈ {1, . . . , m}, we immediately obtain that, for any b ∈ N, the replicates can be rewritten in terms of F 1:m , where F 1:m is generically defined by (2.1), and, furthermore, that, for any t ∈ [1, T + 1], the replicates in (4.3) admit an analogous expression in terms of the learning sample. The following result is proven in the supplement Kojadinovic and Verdier (2020).
Proposition 4.1. Assume that H 0 in (1.1) holds and that, additionally, X 1 , . . . , X n is a stretch from a stationary sequence (X i ) i∈Z of continuous d-dimensional random vectors whose strong mixing coefficients satisfy α r X = O(r −a ) for some a > 3 + 3d/2 as r → ∞. Then (B m , B [1] m , B [2] m ) converges weakly to (B C , B [1] C , B [2] C ), where B [1] m and B [2] m are defined in (4.2), B C is the weak limit of B m defined in Proposition 2.5, and B [1] C and B [2] C are independent copies of B C .
As a consequence, (R m,q , R [1] m,q , R [2] m,q ), (S m,q , S [1] m,q , S [2] m,q ) and (T m,q , T [1] m,q , T [2] m,q ) converge weakly to (R C,q , R [1] C,q , R [2] C,q ), (S C,q , S [1] C,q , S [2] C,q ) and (T C,q , T [1] C,q , T [2] C,q ), respectively, where R [b] C,q , S [b] C,q and T [b] C,q , b ∈ {1, 2}, are independent copies of R C,q , S C,q and T C,q , respectively.
The last claims of the previous proposition along with the last claim of Proposition 2.5 and Proposition 2.7 are the assumptions of Theorem 3.3 for D m ∈ {R m,q , S m,q , T m,q }. It follows that the sequential change-point detection procedures based on these detector functions, carried out as explained in Section 3.3.2 using the above dependent multiplier replicates, are asymptotically valid under H 0 in (1.1) and H 1 in (1.3). Note that, in practice, since in the considered approach m̃ = (m/n)m and m play the roles of m and n, respectively, the largest possible value for p, the number of steps of the estimated threshold function τ B m , is m − m̃, and, in this case, each of the p estimated thresholds covers approximately n/m time steps in the monitoring.
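To make the role of p concrete, here is a hypothetical sketch of how a p-step threshold function could be estimated from B bootstrap trajectories of a detector under the null; the conditional-quantile scheme, with ξ = 1 − (1 − α)^{1/p}, mirrors the description of Sections 3.2 and 3.3, but the function below is a simplified stand-in of ours, not the paper's implementation.

```python
import numpy as np

def step_thresholds(boot_paths, p, alpha=0.05):
    """Estimate a p-step threshold function from B bootstrap trajectories
    (rows) of the detector under the null, so that the probability of a
    false alarm is approximately constant over the p monitoring blocks.
    Each step is the empirical (1 - xi)-quantile, xi = 1 - (1 - alpha)**(1/p),
    of the blockwise maxima among trajectories that stayed below all
    previously estimated thresholds (conditional empirical quantiles)."""
    B, n_steps = boot_paths.shape
    blocks = np.array_split(np.arange(n_steps), p)
    xi = 1.0 - (1.0 - alpha) ** (1.0 / p)
    alive = np.ones(B, dtype=bool)      # trajectories with no exceedance yet
    thresholds = []
    for idx in blocks:
        block_max = boot_paths[:, idx].max(axis=1)
        tau = np.quantile(block_max[alive], 1.0 - xi)
        thresholds.append(tau)
        alive &= block_max <= tau       # condition on no alarm so far
    return np.array(thresholds)
```

With p = 1 this reduces to the (1 − α)-quantile of the global maximum; larger p makes each later threshold a conditional quantile estimated from fewer trajectories, which is one way to see the error propagation effect discussed in Section 5.
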

Monte Carlo experiments
Large-scale Monte Carlo experiments were carried out to investigate the finite-sample properties of the studied sequential change-point detection procedures. The aim was in particular to try to answer the following questions:
• How well do the procedures hold their level, in particular when the threshold functions are estimated using the dependent multiplier bootstrap of Section 4.2?
• What is the influence of the number of steps p of the estimated threshold function (see Sections 3.2 and 3.3) on the distribution of the false alarms?
• What is the effect of p on the power and the mean detection delay (the latter is the expectation under H 1 of the difference between the time at which the change was detected and the time k at which the change actually occurred)?
• What is the effect of the parameter γ appearing in the expression of the weight function q defined in (2.4) on the power and the mean detection delay?
• How do the detectors R m,q , S m,q , T m,q , P m and Q m defined in (2.3), (2.5), (2.6), (2.7) and (2.8) compare in terms of power and mean detection delay?
• How do the derived procedures compare with similar, more specialized procedures in terms of power and mean detection delay?
We tried to answer these questions in detail in the univariate independent case, when the estimation of the threshold functions of the sequential change-point detection procedures can rightfully be based on the Monte Carlo approach described in Sections 3.3.1 and 4.1. When the observations to be monitored are not univariate or independent, so that resampling as described in Section 4.2 is needed for the estimation of the threshold functions, we essentially investigated how well the procedures hold their level depending on the underlying data generating mechanism. Although many other questions could be formulated given the complexity of the problem, the following experiments should allow the reader to grasp the main finite-sample properties of the studied procedures.

Monte Carlo estimation in the independent univariate case
As already discussed in Section 3.3.1, the estimation of the threshold functions when monitoring independent univariate observations can be made arbitrarily precise by increasing the number M of Monte Carlo samples. We used the setting M = 10^5 in our experiments and estimated all the rejection percentages from 10^4 samples. The change-point detection procedures were always carried out at the α = 5% nominal level.
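To make the Monte Carlo approach concrete, here is a hypothetical sketch based on a stylized EDF-difference detector (a simplified stand-in for the paper's detectors, not the exact statistics in (2.20)); under the null with a continuous d.f., such a detector is rank-based, hence distribution-free, so its null trajectories can be simulated from Uniform(0, 1) samples.

```python
import numpy as np

def edf(sample, pts):
    """Empirical d.f. of `sample` evaluated at the points `pts`."""
    return np.searchsorted(np.sort(sample), pts, side="right") / len(sample)

def detector_path(x, m):
    """Stylized EDF-difference detector trajectory: at each monitoring step
    k > m, a scaled Kolmogorov-Smirnov distance between the e.d.f. of the
    observations m+1, ..., k and that of the learning sample 1, ..., m."""
    n, pts = len(x), np.sort(x[:m])
    f_learn = edf(x[:m], pts)
    return np.array([((k - m) / np.sqrt(m)) *
                     np.max(np.abs(edf(x[m:k], pts) - f_learn))
                     for k in range(m + 1, n + 1)])

def mc_threshold(m, n, M=1000, alpha=0.05, rng=None):
    """Monte Carlo threshold (p = 1): the (1 - alpha)-quantile of the
    supremum of the detector over the monitoring period, simulated under
    the null from Uniform(0, 1) samples; valid in the independent
    univariate case because the detector depends on the ranks only."""
    rng = np.random.default_rng() if rng is None else rng
    sups = [detector_path(rng.uniform(size=n), m).max() for _ in range(M)]
    return np.quantile(sups, 1 - alpha)
```

Increasing M makes the estimated threshold arbitrarily precise, which is the point made above.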

Under the null
Unsurprisingly, all the studied tests were found to hold their level very well. Furthermore, as could have been expected given the fact that the studied detector functions have a tendency to be increasing on average, it was observed that the setting p = 1 results in a concentration of false alarms at the end of the monitoring period, while the larger p, the more uniform the distribution of the false alarms over the monitoring period (these unsurprising findings are illustrated in Figure 1 of the supplement Kojadinovic and Verdier (2020)).

Change in mean
To answer the aforementioned questions related to the behavior of the procedures under H 1 in (1.3), we first considered a simple experiment consisting of a change in the expectation of a normal distribution. Specifically, m, k, n, F and G in H 1 were taken equal to 50, 75, 100, the d.f. of the standard normal and the d.f. of the N (δ, 1), respectively. The left graph in Figure 1 displays the estimated rejection percentages for the five detectors R m,q , S m,q , T m,q , P m and Q m with γ in (2.4) set to zero and p = 1. The right graph represents the corresponding mean detection delays, which were estimated only from the samples for which neither a false alarm was obtained (which occurs when the detector function becomes larger than the threshold function before the time of change k = 75) nor the change was left undetected (which occurs when the detector function remains below the threshold function during the entire monitoring period). Because the number of steps in the threshold function was set to p = 1, the left graph of Figure 1 is directly comparable with the top left graph given in Figure 1 of Dette and Gösmann (2019). An inspection of the latter seems to indicate that the powers of the procedures based on R m,q , S m,q and T m,q are not substantially different from those of the mean-specialized procedures considered in Section 5.1 of Dette and Gösmann (2019), even though the detectors R m,q , S m,q and T m,q are not specifically designed to be sensitive to changes in the expectation. The graphs are not substantially different for other values of γ and p. Overall, the procedures based on R m,q , S m,q and T m,q were always observed to be more powerful and superior in terms of mean detection delay to those based on P m and Q m . The latter is in full accordance with the empirical observations of Dette and Gösmann (2019) for more specialized procedures. Note that the procedure based on T m,q seems the most powerful for the alternative under consideration.
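The way rejection percentages and mean detection delays are tallied in these experiments can be sketched as follows; `classify_run` is a hypothetical helper of ours that classifies one monitored sample given its detector trajectory, a threshold (scalar or per-step array) and the change time k.

```python
import numpy as np

def classify_run(path, tau, m, k):
    """Classify one monitored sample: 'false alarm' if the detector first
    exceeds the threshold at or before the change time k, 'missed' if it
    never exceeds it, and otherwise 'detected' with the detection delay
    (first exceedance time minus k). `path[j]` is the detector value after
    observation m + 1 + j."""
    exceed = np.flatnonzero(path > tau)
    if exceed.size == 0:
        return "missed", None
    t_first = m + 1 + exceed[0]         # time index of first exceedance
    if t_first <= k:
        return "false alarm", None
    return "detected", t_first - k
```

The power is then the fraction of samples classified as "detected", and the mean detection delay is the average delay over those samples only, as described above.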
Figure 2 highlights the influence of the parameter γ in (2.4) on the rejection percentages and the mean detection delays of the procedure based on R m,q with p = 10. The graphs are very similar for other values of p or for the procedures based on S m,q and T m,q , and the conclusion is the same for all three procedures. For a change in expectation, while the parameter γ does not seem to affect the power of the procedures much, it has a clear influence on the mean detection delay: the greater γ, the shorter the mean detection delay. As we shall see, this behavior does not carry over to all types of alternatives.

Figure 3 displays the influence of the number of steps p of the threshold function on the rejection percentages and the mean detection delays of the procedure based on S m,q with γ = 0.5. The graphs are not qualitatively different for other values of γ or for the procedures based on R m,q and T m,q . Overall, the procedures with p = 1 have the highest rejection percentages. The latter is due to the fact that, because k = 75, detections occur mostly at the end of the monitoring period, and, at the end of the monitoring interval, by construction, the threshold functions for p = 1 are below the corresponding threshold functions obtained for larger values of p.

Change in variance

The setting is similar to that of the previous experiment except that, this time, it is the variance of the normal distribution that changes from 1 to δ 2 . The left graph in Figure 4 displays the estimated rejection percentages for the procedures based on R m,q , S m,q , T m,q , P m and Q m with γ = 0.5 and p = 50. The graph on the right represents the corresponding mean detection delays. Again, the procedures based on R m,q , S m,q and T m,q appear to be substantially more powerful and superior in terms of mean detection delay to those based on P m and Q m . The conclusion remains true for all values of γ and p.
The influence of γ and p is of the same nature as in the case of a change in mean: the greater γ, the shorter the mean detection delay and the lower p, the higher the power.
Change in distribution

As a final experiment for independent univariate observations, we considered a change in distribution that keeps the expectation and the variance constant. Specifically, F and G in H 1 were taken equal to the d.f. of the N (1, 2) and the d.f. of the Gamma distribution whose shape and rate parameters are both equal to 1/2, respectively. The parameter m was taken to be in {50, 100}, n was set to 2m and the parameter k to nc with c ∈ {0.51, 0.56, . . . , 0.96}. Figure 5 shows the rejection percentages of H ind 0 in (1.2) against k for the procedures based on R m,q , S m,q , T m,q , P m and Q m with γ = 0 and p = 50. The graphs for other values of γ and p are very similar. As one can see, the procedures based on P m and Q m appear to be the most powerful when the change occurs at the beginning of the monitoring period. The latter could have been expected from the definition of the underlying detectors and the simulation results of Dette and Gösmann (2019). As k increases, the procedures based on R m,q and T m,q become more powerful. Figure 6 displays the influence of the parameter γ on the rejection percentages of the procedure based on T m,q with p = 10. The graphs are not qualitatively different for other values of p and m or for the procedures based on R m,q and S m,q . As one can see, unlike for a change in expectation, γ has hardly any influence on the mean detection delay, and it is the setting γ = 0 that leads to the highest rejection percentages.
Finally, Figure 7 shows the influence of p for the procedure based on R m,q with γ = 0. The graphs are not qualitatively different for other values of γ or for the procedures based on S m,q and T m,q . As in previous experiments, the setting p = 1 leads to the highest rejection percentages. However, when the change occurs in the first third of the monitoring period, the mean detection delay for p = 1 is substantially larger than for p > 1.
Additional simulations show that the longer the monitoring period, the more pronounced this phenomenon becomes.
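A quick sanity check that the alternative considered in this last experiment keeps the first two moments fixed: a Gamma distribution with shape a and rate b has mean a/b and variance a/b², so shape = rate = 1/2 gives mean 1 and variance 2, matching the N (1, 2) distribution (mean 1, variance 2).

```python
shape, rate = 0.5, 0.5            # Gamma parameters used in the experiment
gamma_mean = shape / rate         # mean of Gamma(shape, rate)
gamma_var = shape / rate ** 2     # variance of Gamma(shape, rate)
# Both match the N(1, 2) distribution used for F (mean 1, variance 2),
# so the change affects the shape of the distribution only.
```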

Dependent multiplier bootstrap-based estimation in the time series case
The threshold function estimation approach based on the dependent multiplier bootstrap described in Section 4.2 can in principle be used as soon as the observations to be monitored are either multivariate or serially dependent. We used the setting B = 2000 in our experiments and estimated all the rejection percentages from 1000 samples at the α = 5% nominal level.

Under the null
One of the most important practical aspects is to assess how well the procedures hold their level when based on the dependent multiplier bootstrap. To attempt to answer this question, we conducted extensive simulations in the univariate case. For m ∈ {50, 100, 200, 400} and T ∈ {0.5, 1, 2, 3}, we generated samples of size n = m(T + 1) from an AR(1) model with normal innovations and autoregressive parameter β ∈ {0, 0.1, 0.3, 0.5}, and estimated the levels of the procedures based on R m,q , S m,q and T m,q with γ ∈ {0, 0.25, 0.5} and p ∈ {1, 2, 4}. The rejection percentages of H 0 in (1.1) for T m,q are given in Table 1 (the missing entries in the table correspond to parameter settings for which computations took too long given our computer cluster resources). As one can see, unsurprisingly, the larger β, the more liberal the procedure based on T m,q tends to be. This phenomenon is particularly visible for β ∈ {0.3, 0.5}. Reassuringly however, for a given β, T , p and γ, the estimated levels seem to get closer to the 5% nominal level as m increases. Hence, unsurprisingly, the stronger the serial dependence, the larger m needs to be so that the procedure can be expected to hold its nominal level. For β ∈ {0.3, 0.5} in particular and keeping T , γ and m fixed, we also see that the larger p, the more liberal the procedure based on T m,q tends to be. The latter could be explained by the fact that, as p increases, more thresholds need to be estimated, and that, except for the first, all the estimated thresholds are conditional empirical quantiles: the precision of the estimation of a threshold thus critically depends on the precision of the previously estimated thresholds with respect to which conditioning is performed. In other words, the fact that the empirical levels tend to become higher when p increases could be explained by an error propagation effect. Finally, for β, T , p and m fixed, we also see that the larger γ, the lower the rejection percentages tend to be. 
Overall, for β ≥ 0.1, the procedure based on T m,q holds its level best for γ = 0.5.
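The AR(1) data-generating mechanism used in this null experiment can be sketched as follows; the helper is ours, and drawing the initial value from the stationary N(0, 1/(1 − β²)) law avoids the need for a burn-in period.

```python
import numpy as np

def ar1_sample(n, beta, rng):
    """Stationary AR(1) sample X_t = beta * X_{t-1} + eps_t with standard
    normal innovations eps_t, started from the exact stationary
    distribution N(0, 1 / (1 - beta**2))."""
    eps = rng.standard_normal(n)
    x = np.empty(n)
    x[0] = eps[0] / np.sqrt(1.0 - beta ** 2)  # stationary initial draw
    for t in range(1, n):
        x[t] = beta * x[t - 1] + eps[t]
    return x
```

Setting beta = 0 recovers the i.i.d. case, so the same generator covers all the values β ∈ {0, 0.1, 0.3, 0.5} of the experiment.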
The conclusions for the procedures based on R m,q and S m,q are very similar, with the exception that the effect of γ seems "stronger": while the empirical levels are better for γ = 0.25 than for γ = 0, the procedures become far too conservative for γ = 0.5. The latter effect might be due to the fact that the detectors R m,q and S m,q involve maxima (unlike T m,q , which involves means) and to our too low setting of the constant δ in the definition of the weight function (2.4). The latter was arbitrarily set to 10^−4 (and had de facto no effect in our Monte Carlo experiments given the values of m that we considered). A similar experiment was conducted by generating samples from a GARCH(1,1) model with parameters ω = 0.012, β = 0.919 and α = 0.072 to mimic S&P 500 daily log-returns. The empirical levels for the procedure based on T m,q are reported in Table 1 of the supplement Kojadinovic and Verdier (2020) and appear to be closest to the 5% nominal level overall when γ = 0.5.
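The GARCH(1,1) mechanism with the parameters given above can be sketched as follows; the function and its burn-in handling are ours, but the recursion is the standard GARCH(1,1) model.

```python
import numpy as np

def garch11_sample(n, omega=0.012, alpha=0.072, beta=0.919,
                   rng=None, burnin=500):
    """GARCH(1,1) sample with the parameters used to mimic S&P 500 daily
    log-returns: sigma2_t = omega + alpha * x_{t-1}**2 + beta * sigma2_{t-1}
    and x_t = sqrt(sigma2_t) * z_t with z_t i.i.d. N(0, 1)."""
    rng = np.random.default_rng() if rng is None else rng
    z = rng.standard_normal(n + burnin)
    sigma2 = omega / (1.0 - alpha - beta)   # start at unconditional variance
    x = np.empty(n + burnin)
    for t in range(n + burnin):
        x[t] = np.sqrt(sigma2) * z[t]
        sigma2 = omega + alpha * x[t] ** 2 + beta * sigma2
    return x[burnin:]
```

With these parameters α + β = 0.991, so the volatility is highly persistent, which is precisely what makes this a demanding null model for level assessment.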
Finally, a bivariate experiment with independent observations was carried out by generating samples of size n = 2m for m ∈ {50, 100, 200} from a normal copula with a Kendall's tau of τ ∈ {−0.6, −0.3, 0, 0.3, 0.6}. The empirical levels for the procedure based on T m,q are reported in Table 2 of the supplement Kojadinovic and Verdier (2020). The effect of γ appears similar to that in the previous experiments. For fixed γ and p, it can also be observed that the procedure tends to be too conservative in the case of strong negative dependence but, reassuringly, the agreement with the 5% nominal level seems to improve as m increases.
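Sampling from a bivariate normal copula with a prescribed Kendall's tau, as in this experiment, can be sketched as follows, using the classical relation ρ = sin(πτ/2) between the correlation parameter of the normal copula and Kendall's tau; the helper functions are ours.

```python
import math
import numpy as np

def std_normal_cdf(z):
    """Standard normal c.d.f. via the error function."""
    return 0.5 * (1.0 + np.array([math.erf(v / math.sqrt(2.0)) for v in z]))

def normal_copula_sample(n, tau, rng):
    """Sample of size n from a bivariate normal copula with Kendall's tau
    `tau`: draw a correlated Gaussian pair with rho = sin(pi * tau / 2),
    then apply the probability integral transform to get uniform margins."""
    rho = math.sin(math.pi * tau / 2.0)
    z1 = rng.standard_normal(n)
    z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.standard_normal(n)
    return np.column_stack((std_normal_cdf(z1), std_normal_cdf(z2)))
```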

Change in the copula parameter
To further investigate the finite-sample behavior of the procedures in the case of bivariate independent observations, we simulated a change in the parameter of a normal copula. The left (resp. right) graph in Figure 8 displays the estimated rejection percentages of H ind 0 in (1.2) for the procedures based on R m,q , S m,q , T m,q , P m and Q m with γ = 0.25 and p = 4 under H 1 in (1.3) with m = 50, k = 75, n = 100, F the bivariate normal copula with a Kendall's tau of −0.6 (resp. 0.6) and G the bivariate normal copula with a Kendall's tau of τ ∈ {−0.6, −0.3, 0, 0.3, 0.6, 0.9} (resp. τ ∈ {−0.9, −0.6, −0.3, 0, 0.3, 0.6}). As one can notice, the procedure based on T m,q (resp. P m ) is always among the most (resp. least) powerful ones. Graphs for other values of γ and p are not qualitatively different. As for all previous experiments, we observed that the smaller p, the more powerful the procedures. For this experiment, the parameter γ appeared to have a rather small impact on the rejection percentages of T m,q .

Data examples
To illustrate the use of the proposed sequential change-point detection tests, we considered two fictitious scenarios, the first (resp. second) of a univariate (resp. bivariate) nature based on closing quotes of the NASDAQ composite index (resp. Microsoft and Intel stocks) for the period 2019-01-02 to 2020-04-11. The latter were obtained using the get.hist.quote() function of the tseries R package (Trapletti and Hornik, 2019). In both scenarios, it was assumed that, on the last day of 2019, one wished to monitor the (univariate or bivariate) daily log-returns for a change in contemporary distribution, possibly using the stretch of m = 251 (univariate or bivariate) log-returns of 2019 as learning sample. This choice was supported by the fact that the stationarity tests implemented in the functions stDistAutocop() and cpDist() of the npcp R package (Kojadinovic, 2019) provided no evidence against stationarity for the two candidate learning samples. Notice that, unsurprisingly, the rank-based test of serial independence of Genest and Rémillard (2004) implemented in the function serialIndepTest() of the copula R package (Hofert et al.) pointed to some (weak) serial dependence in the learning samples.

The closing quotes and corresponding daily log-returns of the NASDAQ composite index are represented in the left panel of Figure 9. The solid vertical lines mark the beginning of the monitoring. The dotted line in the right panel represents the detector function based on T m,q with γ = 0.5. The latter was chosen given its overall good performance in our Monte Carlo experiments in terms of empirical level, power and mean detection delay. In the right panel, the solid line represents the threshold function with p = 4 steps estimated using the dependent multiplier bootstrap with B = 10^5 , while the dashed line represents the threshold function with p = 4 steps estimated using Monte Carlo simulation with M = 10^5 . Note that the latter did not use the learning sample at all, as it is computed under the assumption that the observations are serially independent.
The relative proximity of the two threshold functions could be explained by the fact that, although present, serial dependence in the learning sample is probably very weak. The detector function exceeded the two threshold functions at the same date (2020-03-12), which is marked by the dotted vertical line in the left panel of Figure 9 and corresponds to the 49th daily log-return of 2020. Given the definition of T m,q in (2.6) and having that of S m,q in (2.5) in mind, a natural estimate of the point of change for an exceedance at position k = m + 49 = 300 is given by (6.1), which returned 285 and corresponds to the date 2020-02-20. The latter is marked by a dashed vertical line in the left panel of Figure 9 and corresponds to the beginning of the sharp decrease of the NASDAQ composite index as a consequence of the Covid-19 pandemic.

Figure 10 describes the monitoring of the bivariate daily log-returns of the Microsoft and Intel stocks using the procedure based on T m,q with γ ∈ {0, 0.25, 0.5} and a threshold function with p ∈ {1, 2, 4} steps estimated using the dependent multiplier bootstrap. All the dates of exceedance lie between 2020-03-09 (γ = 0.5 and p = 1) and 2020-03-12 (γ = 0 and p = 2). The estimated dates of change turn out not to depend on p and are 2020-01-24 for γ = 0, 2020-02-14 for γ = 0.25 and 2020-02-19 for γ = 0.5. This effect of γ on (6.1) was to be expected: larger values of γ give more weight to potential break points close to k.
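For completeness, the daily log-returns monitored in these examples are obtained from the closing quotes as r t = log(P t /P t−1 ); a minimal sketch:

```python
import numpy as np

def log_returns(prices):
    """Daily log-returns r_t = log(P_t / P_{t-1}) computed from a series
    of closing quotes (one fewer value than there are quotes)."""
    prices = np.asarray(prices, dtype=float)
    return np.diff(np.log(prices))
```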

Concluding remarks
In the context of closed-end sequential change-point detection, it can be argued that it is desirable that the underlying threshold function be such that the probability of false alarm remains approximately constant over the monitoring period. In this work, the asymptotic validity of the bootstrap-based estimation of such a threshold function was established for generic detectors. The result was applied to sequential change-point tests involving detectors based on differences of empirical d.f.s, whose threshold functions can be estimated either by Monte Carlo simulation or by a dependent multiplier bootstrap, depending on whether univariate independent or multivariate serially dependent observations are monitored. The proposed detectors are adaptations of statistics used in a posteriori change-point testing and include a weight function in the spirit of Csörgő and Szyszkowicz (1994a) that can be used to give more importance to recent observations. Extensive Monte Carlo experiments were used to investigate the finite-sample properties of the resulting sequential change-point tests. Among the proposed detectors, none led to a uniformly better testing procedure. When based on the dependent multiplier bootstrap, the procedure based on T m,q in (2.6) was observed to have the best behavior, overall, in terms of empirical level, power and mean detection delay. In the case of univariate independent observations, when the threshold function can be estimated using Monte Carlo simulation, the number of steps p of the threshold function can be chosen as large as the number of monitoring steps. However, in the time series case, when the estimation of the threshold function is based on the dependent multiplier bootstrap, p should not be taken too large because of an error propagation effect.
With the aim of monitoring multivariate independent observations, we were also able to establish the asymptotic validity of the proposed procedures when the threshold functions are estimated using a smooth bootstrap based on the empirical beta copula (Kiriliouk, Segers and Tsukahara, 2019). In future work, we plan to investigate the validity of additional bootstraps for monitoring multivariate time series. The current and future theoretical results would also need to be complemented by additional Monte Carlo simulations, involving in particular multivariate experiments. Such finite-sample investigations are however a real computational challenge given the complexity and cost of execution of such change-point detection procedures.
Appendix A: Auxiliary lemmas for the proof of Theorem 3.3

This section, which is, to a large extent, notationally independent of the rest of the paper, provides the proofs of two lemmas, possibly of independent interest, necessary for showing Theorem 3.3. Let X n denote the available data. No assumptions are made on X n apart from measurability. To fix ideas, one can think of X n as a sequence of n multivariate serially dependent random vectors. Let S n = S n (X n ) be an R p -valued statistic such that S n = (S 1,n , . . . , S p,n ) converges weakly to S = (S 1 , . . . , S p ) as n → ∞, where the random vector S is assumed to have a continuous d.f. We additionally suppose that we have available bootstrap replicates S [i] n = S n (X n , W [i] n ), i ∈ N, of S n , where the W [i] n , i ∈ N, are n-dimensional independent and identically distributed random vectors representing the additional sources of randomness involved in the underlying bootstrap mechanism. We shall further assume that (A.1) holds as n → ∞, that is, that (S n , S [1] n , S [2] n ) converges weakly to (S, S [1] , S [2] ), where S [1] and S [2] are independent copies of S. Note that, from Lemma 2.2 of Bücher and Kojadinovic (2019), (A.1) is equivalent to the usual conditional bootstrap consistency statement. Before stating and proving the two lemmas, we introduce some additional notation and list useful results. For any q ∈ {1, . . . , p}, {j 1 , . . . , j q } ⊂ {1, . . . , p}, x j 1 , . . . , x j q ∈ R and B ∈ N, let F S,{j 1 ,...,j q } (x j 1 , . . . , x j q ) = P(S j 1 ≤ x j 1 , . . . , S j q ≤ x j q ), F S n ,{j 1 ,...,j q } (x j 1 , . . . , x j q ) = P(S j 1 ,n ≤ x j 1 , . . . , S j q ,n ≤ x j q ), and let F B S n ,{j 1 ,...,j q } (x j 1 , . . . , x j q ) = B −1 Σ B b=1 1(S [b] j 1 ,n ≤ x j 1 , . . . , S [b] j q ,n ≤ x j q ) denote the corresponding empirical d.f. of the first B bootstrap replicates.
Starting from (A.11) for j = k, the desired result follows by combining the latter convergence with (A.18). The induction is thus complete. The fact that the statements (A.7) and (A.10) with '≤' replaced by '<' hold as well is a consequence of (A.2) and (A.3).
Appendix B: Proof of Theorem 3.3

Proof. Let S m = (sup t∈I 1 D m (t), . . . , sup t∈I p D m (t)) and let S [b] m , b ∈ N, be the corresponding bootstrap replicates of S m . From (3.7) and the continuous mapping theorem, we immediately obtain that, under H 0 , (S m , S [1] m , S [2] m ) converges weakly to (S, S [1] , S [2] ), where S [1] and S [2] are independent copies of S. Since S is assumed to have a continuous d.f., the assumptions of Appendix A are satisfied and, therefore, Lemma A.1 with n = m and ξ = 1 − (1 − α) 1/p implies that (3.8) and (3.9) hold. The fact that, under H 0 , P(D m ≤ τ B m ) → 1 − α as m, B → ∞ follows from a decomposition similar to the one used in (3.2).
To prove the last claim, it suffices to show that, when sup t∈[1,T +1] D m (t) converges in probability to infinity, the probability that the detector function exceeds the estimated threshold function tends to one.

Supplement

This supplement contains the proofs of Propositions 2.5, 2.7 and 4.1 as well as additional simulation results.
We start by showing that the finite-dimensional distributions of J m,q converge weakly to those of J C,q . Let p ∈ N, p > 1, and (s 1 , t 1 ), . . . , (s p , t p ) ∈ ∆ ∩ [1, T + 1] 2 be arbitrary. The result is proven if we show that (J m,q (s 1 , t 1 ), . . . , J m,q (s p , t p )) converges weakly to (J C,q (s 1 , t 1 ), . . . , J C,q (s p , t p )), that is, (C.2). The latter follows from the joint weak convergence of G m,q (s 1 , t 1 , ·), . . . , G m,q (s p , t p , ·), C 1:⌊mt 1 ⌋ , . . . , C 1:⌊mt p ⌋ , which can be combined with the fact that C is continuous, G C,q has continuous sample paths with probability one, Lemma 1 in Kojadinovic, Segers and Yan (2011) and the continuous mapping theorem to obtain (C.2). It remains to show that the process J m,q is asymptotically tight (see, e.g., van der Vaart and Wellner, 2000, Section 1.5). From Section 2.1.2 and Problem 2.1.5 in the same reference, the latter is shown if a certain asymptotic equicontinuity condition holds for every sequence δ m ↓ 0. The supremum on the left-hand side of the corresponding display is smaller than I m + J m for suitably defined terms I m and J m .

It then remains to verify that the laws L{ϑ 1 (G C,q )}, L{ϑ 2 (G C,q )} and L{ϑ 3 (G C,q )} are absolutely continuous with respect to the Lebesgue measure. Since the maps ϑ 1 , ϑ 2 and ϑ 3 are continuous and convex, from Theorem 7.1 in Davydov and Lifshits (1984), we obtain that, for any i ∈ {1, 2, 3}, L{ϑ i (G C,q )} is concentrated on [a i , ∞) and absolutely continuous on (a i , ∞), where a i = inf{ϑ i (f ) : f belongs to the support of L(G C,q )}.
To conclude the proof, it remains to show that L{ϑ 1 (G C,q )}, L{ϑ 2 (G C,q )} and L{ϑ 3 (G C,q )} have no atom at 0. For any f ∈ C(∆ × [0, 1] d ), we have that ϑ 2 (f ) = ϑ 3 (f ) = 0 if and only if f (s, t, u) = 0 for all (s, t) ∈ ∆ ∩ ([1, t 2 ] × [t 1 , t 2 ]) and all u in the support of the distribution induced by C. Let u * ∈ (0, 1) d be an arbitrary point in the latter support such that Var{B C (1, u * )} = Γ(u * , u * ) > 0, where Γ is defined in (2.23).

To prove (C.4), we shall show that each of the three suprema in the previous display converges in probability to zero. For any (s, t) ∈ ∆, x ∈ R d and H ∈ {F, G}, let F m,H be defined accordingly, with F (x) = (F 1 (x 1 ), . . . , F d (x d )). By the continuous mapping theorem, it immediately follows that, under H 0 , F m,F converges weakly in ℓ ∞ (∆ × R d ) to a tight limit. Some thought then reveals that, under the conditions of the proposition, F m,F (resp. F m,G ) converges weakly to a tight limit as well. From the expression of K c given in (C.6), the first supremum in (C.5) converges in probability to zero since H m converges weakly in ℓ ∞ {(∆ ∩ [0, c] 2 ) × R d } to a tight limit, as a consequence of the fact that, for any 0 ≤ s ≤ t ≤ c and x ∈ R d , H m (s, t, x) = λ m (s, t)F m,F (0, s, x) − λ m (0, s)F m,F (s, t, x), and the continuous mapping theorem.
Using the fact that sup s∈[0,T +1] |λ m (0, s) − s| ≤ 1/m, that the function q in (2.4) is continuous on ∆ and that sup (s,t)∈∆ {q(s, t)} −1 < ∞, we obtain, from the continuous mapping theorem, that sup t∈[1,T +1] R m,q (t) P → ∞. To show the two last remaining claims, we shall first prove (C.7), where the quantities involved are defined, for any t ∈ [1, T + 1] and x ∈ R d , as above. By the triangle inequality, the supremum on the left-hand side of (C.7) is smaller than the sum of two suprema. On the one hand, some thought reveals that the first supremum converges in probability to zero as a consequence of the continuous mapping theorem. On the other hand, from (2.27) and (C.8), the second supremum is smaller than max{K m , L m }. From the assumptions on the strong mixing coefficients and Theorem 1.2 in Berbee (1987) (see also Rio, 2017, Chapter 3), the strong law of large numbers yields an almost sure convergence as k → ∞, where the arrow ' as →' denotes almost sure convergence. Since the previous convergence is equivalent to the fact that lim sup m→∞ M m = 0 almost surely, we obtain that K m = sup m≤k≤⌊mc⌋ M k ≤ sup m≤k M k as → 0. Using the fact that, for any c ≤ t ≤ T + 1 and x ∈ R d , the corresponding decomposition holds, and that sup c≤t≤T +1 |⌊mc⌋/⌊mt⌋ − c/t| = O(1/m), we obtain an upper bound whose first term on the right-hand side is equal to M ⌊mc⌋ and thus converges to zero almost surely. The second term can be written as a supremum over ⌊mc⌋ + 1 ≤ k ≤ ⌊m(T + 1)⌋, denoted (C.9). Decomposing the sum in (C.9), by the triangle inequality, (C.9) is smaller than the sum of two terms. Let Λ F = ∫ R d {F (x) − G(x)} 2 dF (x) and Λ G = ∫ R d {F (x) − G(x)} 2 dG(x). Since F and G are continuous and F ≠ G, Λ F > 0 and Λ G > 0. As a consequence, the corresponding limit is strictly positive for all t ∈ [1, T + 1]. Furthermore, q(s, t) > 0 for all (s, t) ∈ ∆ and, for all c < t ≤ T + 1, the analogous lower bound holds.

Proof. Recall that n = m(T + 1) and, for any b ∈ N, s ∈ [0, 1] and u ∈ [0, 1] d , consider the multiplier process with summands ξ [b] i,m {1(U i ≤ u) − C 1:n (u)}.