Change-point Inference for High-dimensional Heteroscedastic Data

We propose a bootstrap-based test to detect a mean shift in a sequence of high-dimensional observations with unknown time-varying heteroscedasticity. The proposed test builds on the U-statistic based approach in Wang et al. (2022), targets a dense alternative, and adopts a wild bootstrap procedure to generate critical values. The bootstrap-based test is free of tuning parameters and accommodates unconditional time-varying heteroscedasticity in the high-dimensional observations, as demonstrated in our theory and simulations. Theoretically, we justify the bootstrap consistency by using the recently proposed unconditional approach in Bücher and Kojadinovic (2019). Extensions to testing for multiple change-points and to estimation using wild binary segmentation are also presented. Numerical simulations demonstrate the robustness of the proposed testing and estimation procedures with respect to different kinds of time-varying heteroscedasticity.


Introduction
Owing to advances in science and technology, high-dimensional data has become increasingly important in many areas, such as genomics, neuroscience and finance, among others. In the analysis of high-dimensional datasets, some kind of homogeneity assumption, such as iid (independent and identically distributed) sampling, is often made, but in reality the data may exhibit breaks in its stochastic properties, especially when the data are ordered by time (e.g., stock return data) or one-dimensional locations (e.g., gene expression levels indexed by genomic loci). This has motivated a growing literature on change-point testing and estimation for the mean shift in high-dimensional data. See Horváth and Hušková (2012); Cho and Fryzlewicz (2015); Jirak (2015); Wang and Samworth (2018); Wang et al. (2022); Enikeeva and Harchaoui (2019); Yu and Chen (2021); Zhang et al. (2021) for some recent work.
A common feature of all above-mentioned papers is that they assume the second-order properties (i.e., the covariance matrix for independent high-dimensional data) are time invariant, while the mean may undergo changes at unknown times. This is a strong assumption and may be violated for many high-dimensional datasets. See Section 6 for significant evidence of time-varying heteroscedasticity in a genomic dataset that has been analyzed by several researchers [Wang and Samworth (2018); Wang et al. (2022); Zhang et al. (2021)]. When heteroscedasticity is present, the existing change-point detection methods developed under the homoscedastic assumption may fail, or their validity remains unknown. For low-dimensional time series, novel change-point detection methods have been developed by Zhou (2013) and Górecki et al. (2018) to detect mean changes while allowing for second- or higher-order non-stationarity, but an extension of their methods to the high-dimensional setting is very nontrivial. In summary, there is a lack of methodology to detect mean changes for high-dimensional heteroscedastic data.
In this article, we develop a novel testing and estimation procedure that can detect change-points in the mean when unconditional heteroscedasticity is present in the sequence of high-dimensional observations. To facilitate our methodological development, we assume the following mathematical framework: the p-dimensional observation at the i-th time or location is X_i = µ_i + H(i/n) Z_i, where the Z_i's are i.i.d. p-dimensional random vectors with mean 0 and covariance matrix Σ, and H(i/n) is a p × p diagonal matrix that models the unconditional time/location-dependent heteroscedasticity. We are interested in testing H_0: µ_1 = µ_2 = ... = µ_n against the alternative H_1 under which the mean changes at unknown change points k_1, ..., k_s. The estimation of the number s and the locations (k_1, ..., k_s) of the change-points is also addressed in the present paper. Note that when p = 1, our model is similar to that in Górecki et al. (2018), except that the latter paper allowed serial dependence in {Z_i}_{i=1}^n. We do not pursue the more general, heteroscedastic and temporally dependent case, as there are methodological challenges in handling temporal dependence in the high-dimensional setting; see Section 7 for more discussion. Nevertheless, the temporal independence assumption is commonly made in the literature on change-point detection for genomic data; see Jeng et al. (2010) and Zhang and Siegmund (2012).
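To make framework (1) concrete, the following sketch simulates from the model with a scalar trend H(t) = f(t) I_p and AR-type dependence across coordinates; the trend shape, AR coefficient, and placement of the mean shift are illustrative choices of ours, not the paper's exact simulation designs.

```python
import numpy as np

def simulate(n, p, rho=0.5, delta=0.0, k0=None, trend=lambda t: 1.0, seed=0):
    """Draw X_i = mu_i + H(i/n) Z_i with H(t) = trend(t) * I_p.

    Z_i has AR(rho) dependence across its p coordinates (a simple way to
    induce componentwise weak dependence); mu_i jumps by `delta` at index
    `k0`, so k0=None gives data from the null H_0.
    """
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal((n, p))
    Z = np.empty((n, p))
    Z[:, 0] = eps[:, 0]
    for j in range(1, p):  # AR(1) recursion across coordinates, unit marginal variance
        Z[:, j] = rho * Z[:, j - 1] + np.sqrt(1.0 - rho**2) * eps[:, j]
    mu = np.zeros((n, p))
    if k0 is not None:
        mu[k0:, :] = delta  # mean shift of size delta from index k0 onward
    f = np.array([trend((i + 1) / n) for i in range(n)])
    return mu + f[:, None] * Z
```

For instance, `trend=lambda t: 1 + t` gives a linearly increasing variance while the mean stays constant under the null.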
In this paper, we propose to build on the U-statistic based approach in Wang et al. (2022), who extended the two-sample U-statistic used in Chen and Qin (2010) from high-dimensional two-sample testing to change-point testing. In Wang et al. (2022), the sequence of observations is assumed to be homoskedastic subject to mean shifts under the alternative, that is, H(•) = I_p (the p × p identity matrix). They adopt the idea of self-normalization (Shao (2010b), Shao and Zhang (2010)) in forming their test statistic, and the theoretical validity of their SN-based test is shown under homoskedasticity. When there is time-varying heteroscedasticity, we show that the asymptotic null distribution of the SN-based test statistic in Wang et al. (2022) is no longer pivotal, and it depends on the unknown H(•). To accommodate the unknown heteroscedasticity, we propose to use the wild bootstrap to directly approximate the finite sample distribution of the original class of U-statistics, instead of using self-normalization. With the aid of the recently proposed unconditional approach to justifying bootstrap consistency [Bücher and Kojadinovic (2019)], we are able to show the consistency of the wild bootstrap under the framework (1) and derive the local asymptotic power under the one-change-point alternative. In the context of testing for one change point in the mean, our bootstrap-based test is free of tuning parameters and performs well for a broad range of heteroscedastic models in our simulation studies. Extensions to testing against the multiple change-point alternative and to estimation of change-points using WBS (wild binary segmentation, Fryzlewicz (2014)) are also made. Note that, like Wang et al. (2022), our bootstrap-based test targets dense alternatives (i.e., when small changes occur for a substantial portion of the components), which can be well motivated by real data and is often the type of alternative we are interested in. For example, copy number variations in cancer cells are commonly manifested as change-points occurring at the same positions across many related data sequences corresponding to cancer samples and biologically-related individuals; see Fan and Mackey (2017).
The rest of the paper is structured as follows. Section 2 describes the test statistic and the wild bootstrap scheme for testing a single change point. An extension to testing multiple change points is also made. Section 3 provides the assumptions and theoretical results for the proposed testing procedure under the null and alternatives. In Section 4, we combine the WBS with our bootstrap-based test for change-point estimation. Section 5 compares the bootstrap-based testing and estimation methods with their counterparts in Wang et al. (2022) via simulations. Section 6 illustrates the usefulness of our method using a real dataset, and Section 7 concludes. All technical details and proofs are relegated to the appendix.

Single change point testing
We first focus on the single change point alternative. Our test statistic is motivated by Wang et al. (2022), which was in turn inspired by the two-sample test statistic in Chen and Qin (2010). For readers who are not familiar with those papers, we now provide a brief introduction to the main ideas. More precisely, suppose (U_2, V_2) is an independent copy of (U_1, V_1). Consider the kernel function h((u_1, v_1), (u_2, v_2)) = (u_1 − v_1)^T (u_2 − v_2). The expectation of this kernel function is E[h((U_1, V_1), (U_2, V_2))] = (E[U_1] − E[V_1])^T (E[U_1] − E[V_1]) = ∥E[U_1] − E[V_1]∥_2^2.

Note that this expectation equals zero if and only if E[U_1] = E[V_1].
Note that this is simply a two-sample U-statistic with kernel h. This statistic was proposed by Chen and Qin (2010) for comparing the means of two possibly high-dimensional vectors. The key observation of Chen and Qin (2010) was that this statistic is more appropriate than the seemingly natural alternative ∥Ū − V̄∥_2^2 (with Ū, V̄ denoting the corresponding sample means), because the latter contains terms of the form (U_i − V_j)^T (U_i − V_j) which do not have expected value zero under the null of equal means. This does not matter in fixed dimensions, but such terms can blow up if the dimension of the vectors grows with the sample size.
Suppose the change in the mean vector occurs at time k + 1. We can view X_1, ..., X_k and X_{k+1}, ..., X_n as two independent samples with different means. A natural test statistic for a change at time k is thus the two-sample U-statistic G_n(k) computed from these two segments; an equivalent representation, obtained after straightforward computations, facilitates the theoretical analysis of our test statistic. Since the location k of the change point is unknown, we consider the maximum value over all possible change-points.
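As a concrete illustration, a minimal implementation of the Chen–Qin-type two-sample statistic between the segments X_1, ..., X_m and X_{m+1}, ..., X_n might look as follows (the function name and the O(np) sum identities are ours; only the off-diagonal terms enter, as discussed above):

```python
import numpy as np

def chen_qin_stat(X, Y):
    """Chen-Qin (2010)-type two-sample U-statistic between samples X and Y.

    Only off-diagonal products X_i^T X_j (i != j) enter, since the diagonal
    terms X_i^T X_i have nonzero expectation even under equal means and
    would dominate as the dimension grows.
    """
    m, n = X.shape[0], Y.shape[0]
    sx, sy = X.sum(axis=0), Y.sum(axis=0)
    # sum_{i != j} X_i^T X_j = ||sum_i X_i||^2 - sum_i ||X_i||^2
    xx = sx @ sx - (X * X).sum()
    yy = sy @ sy - (Y * Y).sum()
    xy = sx @ sy
    return xx / (m * (m - 1)) + yy / (n * (n - 1)) - 2.0 * xy / (m * n)
```

The expectation of this statistic is ∥E[X_1] − E[Y_1]∥_2^2, so it has expectation exactly zero under equal means.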
To mimic the CUSUM process used in the low-dimensional setting, we define a rescaled version G̃_n(m) of G_n(m), where the rescaling is adopted to prevent the statistics G_n(m) on the two ends from blowing up. Note that this was implicitly done in the SN-based test statistic of Wang et al. (2022). We then define our test statistic T_n for the single change-point alternative as the maximum of the rescaled statistics over m. This formulation is similar to Wang et al. (2022) but does not require the self-normalization technique, which has its origin in Shao (2010b) and Shao and Zhang (2010). Under the null we have E[G_n(m)] = 0 for all n, m. Hence the statistic T_n is expected to converge to a non-degenerate distribution upon suitable standardization. Under the single change-point alternative with change at k_0, we have E[G_n(k_0)] > 0, with magnitude depending on k_0 and the size of the change. Hence the test statistic, with the same normalization as under the null, diverges under the one change-point alternative if the magnitude of the change is large enough. As will be shown later, the limiting null distribution of T_n depends on the unknown H(•); it is thus not asymptotically pivotal and the idea of self-normalization is not directly applicable. This motivates us to propose a bootstrap-based approach to approximate the finite sample distribution (or the limiting null distribution) of T_n under the null.
Specifically, we employ the Gaussian multiplier bootstrap. Let e_1, ..., e_n be i.i.d. N(0, 1) random variables independent of X_1, X_2, ..., X_n. Let X̄ = n^{-1} Σ_{i=1}^n X_i denote the sample mean. The bootstrap statistics G*_n(m) and T*_n are defined analogously to G_n(m) and T_n, with the observations centered by the overall mean and weighted by the multipliers; the centering ensures the bootstrap consistency. In practice, we also tried centering by the local mean (e.g., replacing X̄ by m^{-1} Σ_{i=1}^m X_i in the first summand of G*_n(m)), and the results are similar to those obtained by centering by the overall mean. Since the proof and implementation for the latter seem a bit simpler, we only present the latter.
In the low-dimensional setting, i.e., when p is fixed, the weighted bootstrap for degenerate U-statistics has been studied by Hušková and Janssen (1992), Janssen (1994), Dehling and Mikosch (1994), and Wang and Jing (2004), among others. We refer the reader to a recent paper by Huang et al. (2021) and the references therein. We are not aware of any results on bootstrap consistency for degenerate U-statistics for data of increasing dimension.
The theoretical bootstrap critical value for a size-α test is defined as the (1 − α)-quantile of the conditional distribution of T*_n given X = (X_1, ..., X_n). In practice, this theoretical value is typically approximated by Monte Carlo simulation. Let F*_M denote the empirical cdf of M bootstrap statistics T^{*,1}_n, ..., T^{*,M}_n, each based on an independent sequence of multipliers. We then define the Monte Carlo critical value c^{(M)}_{1,α} as the empirical (1 − α)-quantile of F*_M. This quantity can be computed through simulations. We reject the null hypothesis when T_n > c^{(M)}_{1,α}.
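In code, the Monte Carlo approximation reduces to order statistics of the bootstrap sample; this is a minimal sketch (function names ours), assuming the M bootstrap statistics have already been computed:

```python
import numpy as np

def bootstrap_critical_value(boot_stats, alpha):
    """Empirical (1 - alpha)-quantile of M bootstrap statistics,
    i.e. the smallest t in the sample with F*_M(t) >= 1 - alpha."""
    s = np.sort(np.asarray(boot_stats))
    M = len(s)
    k = int(np.ceil((1.0 - alpha) * M)) - 1  # 0-based order-statistic index
    return s[k]

def bootstrap_pvalue(t_obs, boot_stats):
    """Monte Carlo p-value: proportion of bootstrap statistics >= T_n."""
    boot_stats = np.asarray(boot_stats)
    return float((boot_stats >= t_obs).mean())
```

Rejecting when the observed statistic exceeds `bootstrap_critical_value(boot_stats, alpha)` is equivalent to rejecting when `bootstrap_pvalue` falls below α, up to the discreteness of the M bootstrap draws.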

Multiple change points testing
In practice, the number of change points is often unknown, so we consider a more general multiple change-points alternative. Inspired by the scanning approach developed by Zhang and Lavitas (2018) for change-point testing in the univariate time series setting, we incorporate the idea of forward and backward scanning into our test statistic for multiple change points detection.
To this end, we first introduce some more general notation. For any a ≤ m ≤ b with a, b, m ∈ {1, ..., n}, define the segment statistic G̃_n(m; a, b), and set G̃_n(m) = G̃_n(m; 1, n). Our test statistic T_{n,M} for the multiple change-points alternative combines a forward scanning part and a backward scanning part. Under H_0, since there is no change point, both the forward and backward scanning parts are expected to be small. When there is at least one change point, the first change point results in an inflation of the forward scanning part, and the last change point leads to a large value of the backward scanning part. Again, we use the Gaussian multiplier bootstrap to obtain the bootstrap distribution and calibrate the size; the bootstrap statistic T*_{n,M} is defined analogously to T_{n,M}. The bootstrap critical value is approximated in practice by c^{(M_n)}_{2,α}, which is computed from the M_n bootstrap samples in the same way as c^{(M_n)}_{1,α}. We then reject the null hypothesis when T_{n,M} > c^{(M_n)}_{2,α}. It is worth noting that the proposed bootstrap test avoids the trimming parameter that is required in Zhang and Lavitas (2018) and Wang et al. (2022), and is thus tuning-parameter free.

Theoretical results
In this section, we present theoretical results regarding the asymptotic properties of the test statistics and the bootstrap consistency. Throughout this paper, we work with triangular array asymptotics, where Z_1, ..., Z_n are independent across n but with dimension p = p_n that can grow with n. To keep the notation simple, we will not explicitly mark the dependence of the distribution and dimension of Z on n. All asymptotics are for n → ∞. For a symmetric matrix Σ, we denote by ∥Σ∥_F its Frobenius norm. Consider the model (1) under the following assumptions. Assumption 3.1 restricts the spectral behavior of Σ, and Assumption 3.2 requires summability of the joint cumulants of the components of Z_i up to order h = 2, 3, 4, 5, 6, with bounds of order ∥Σ∥_F^h for some positive constant C which does not depend on n. Assumption 3.3. For every t ∈ [0, 1], H(t) is a p × p diagonal matrix with all diagonal elements bounded by some finite constant B independent of n, that is, |H_{l,l}(t)| ≤ B for all t ∈ [0, 1] and l = 1, ..., p_n, n ≥ 1.
Assumption 3.4. Assume that the following limit exists for all 0 ≤ a ≤ b ≤ 1.
Assumptions 3.1 and 3.2 are also imposed in Wang et al. (2022). As shown in Wang et al. (2022), Assumption 3.1 is equivalent to ∥Σ∥_2 = o(∥Σ∥_F) and can only hold when p = p_n → ∞ as n → ∞. See page 813 of Chen and Qin (2010) for additional discussion of its implications for the eigenvalues of Σ. Assumption 3.2 was first imposed in Wang et al. (2022), where it is shown to be weaker than the factor-model-like assumption in Chen and Qin (2010); see Remark 3.3 in Wang et al. (2022). The summability-of-cumulants assumption is commonly used for the asymptotic analysis of low-dimensional time series [Brillinger (1975)]. In our setting, it is used to ensure that the dependence across the components of the vector is weak enough. This is a crucial technical ingredient in our asymptotic analysis when establishing finite-dimensional convergence of a suitably normalized version of the process G_n to a multivariate normal limit. Assumption 3.2 in general holds under uniform bounds on moments and 'short-range' dependence conditions on the components of X (possibly after permutation). For example, if the sequence corresponding to the ordered components of X (or a permutation of the components) satisfies certain mixing and moment conditions, then Assumption 3.2 holds. See Remark 3.2 of Wang et al. (2022) for more discussion and references.
Assumptions 3.3 and 3.4 concern the time-varying heteroskedasticity {H(t)}. Assumption 3.3 bounds the range of the heteroskedasticity and is mild. Assumption 3.4 appears when we study the process t → G̃_n(⌊nt⌋). More precisely, in the Appendix we decompose G̃_n(⌊nt⌋) into a linear combination of a process S̃_n evaluated at different points (see the beginning of the proof of Theorem 3.1). The limiting variance of this process S̃_n is directly related to the limit appearing in Assumption 3.4 (see the proof of Proposition A.1). For a transparent example, assume H(t) = f(t) I_p, where f is a real-valued function and I_p denotes the p × p identity matrix. In this case the assumption simplifies because tr(H(s) Σ H(t) Σ) = f(s) f(t) ∥Σ∥_F^2. The normalized sum can then be seen as a Riemann approximation of an integral, and convergence takes place provided that f is sufficiently regular (for instance, bounded and piecewise continuous with a finite number of jumps). In general settings, Assumption 3.4 boils down to requiring sufficient regularity of each component of H in a suitable uniform sense. In what follows, we use I(•) to denote the indicator function.
Theorem 3.1. Under Assumptions 3.1–3.4 and H_0, the normalized test statistic T_n converges, after proper standardization, to a non-pivotal limit T, which is a functional of a mean-zero Gaussian process Q on [0, 1]^2 whose covariance is determined by the limit in Assumption 3.4. The theorem implies that the normalized test statistic converges to a potentially non-pivotal distribution which depends on the time-varying heteroskedasticity function H(•). When the heteroskedasticity function is the identity matrix (i.e., H(t) = I_p for every t ∈ [0, 1]), the covariance structure of Q is the same as that in Theorem 3.4 of Wang et al. (2022) and the limit is pivotal. Self-normalization can then help to get rid of the unknown normalizing factor ∥Σ∥_F, leading to a pivotal test. In general, however, the distribution of the self-normalized statistic from Wang et al. (2022) is not pivotal, due to the presence of the unknown heteroskedasticity function H(•) in the definition of V.
Next we present results on the bootstrap consistency under H_0. Additional assumptions are needed to establish bootstrap consistency. In particular, we impose a higher-order cumulant condition, Assumption 3.5. As shown in the Appendix, Assumption 3.5 implies a summability bound which is comparable to Assumption 3.2 and can be verified under a similar weak dependence structure as described in Wang et al. (2022). This assumption is used when showing the negligibility of some remainder terms for the bootstrap process.
Theorem 3.2. Assume Assumptions 3.1–3.5 hold. Under H_0, the bootstrap is consistent for any sequence M_n → ∞ and any α < 1/2; in particular, the test based on the critical value c^{(M_n)}_{1,α} has asymptotic size α. Next we state the result regarding the power of the proposed test statistic.
Theorem 3.3. Suppose that Assumptions 3.1–3.5 hold and that there is a single change point. Then the limiting distribution of the normalized test statistic contains a drift term determined by the location and size of the change. Moreover, for two copies T^{*,1}_n, T^{*,2}_n of the bootstrap statistic based on independent sets of multipliers, the joint limit is expressed in terms of independent copies T, T^{(1)}, T^{(2)} of the limit T from Theorem 3.1.
The result in Theorem 3.3 shows that our test has nontrivial power when the L_2-norm of the change is large relative to ∥Σ∥_F, which targets the dense alternative. In the special homoscedastic case, i.e., H(t) = I_p for all t ∈ [0, 1], the power result is consistent with the one obtained in Wang et al. (2022), in the sense that both tests share the same rate of alternative under which nontrivial power occurs. This suggests that the bootstrap-based procedure brings extra robustness with respect to unconditional time-varying heteroscedasticity, as compared to the SN-based one in Wang et al. (2022), without sacrificing power. Finally, we note that by results in Bücher and Kojadinovic (2019), the result in part 3 remains true for an arbitrary number of bootstrap copies T^{*,1}_n, ..., T^{*,K}_n obtained from independent multipliers.
The following theoretical results can be derived similarly for multiple change point testing.
Theorem 3.4. Assume Assumptions 3.1–3.5 hold. Under H_0, the statistic T_{n,M} converges, after proper standardization, to a limit T_M. Moreover, for two copies T^{*,1}_{n,M}, T^{*,2}_{n,M} of the bootstrap statistic based on independent sets of multipliers, the joint limit is expressed in terms of independent copies T_M^{(1)}, T_M^{(2)} of T_M. Under the alternative, we show that the power of the proposed test for multiple change-point detection goes to 1 when there is a dense mean change.
Theorem 3.5. Assume Assumptions 3.1–3.5 hold. Suppose that there are change points at k_1, ..., k_s, that k_j = ⌊c_j n⌋ for constants 0 < c_1 < ... < c_s < 1, and that at least one of the change-point sizes, say that of the r-th change-point, satisfies the dense-alternative condition of Theorem 3.3. Then the power of the proposed test tends to one.

Change-point estimation
Wild binary segmentation (WBS) was introduced by Fryzlewicz (2014) as an alternative to the popular binary segmentation algorithm for estimating the change-point locations in a univariate sequence. Wang et al. (2022) combined WBS with their SN-based test and showed that WBS outperforms binary segmentation, especially when the changes are non-monotonic. Here, we combine our bootstrap-based test with the WBS algorithm to estimate the locations of change-points in the mean of high-dimensional heteroscedastic data. The algorithm involves generating N random segments {(s_m, e_m)}_{m=1,...,N}, calculating the single change point test statistic W(s_m, e_m) on each segment (s_m, e_m), and then taking the maximum over all random segments, that is, max_{m=1,...,N} W(s_m, e_m). A change point is detected when max_{m=1,...,N} W(s_m, e_m) > ξ_n, where ξ_n is a proper threshold parameter.
In the event that a change point is detected, let m̂ = arg max_m W(s_m, e_m). The change-point location t̂_1 is estimated by the maximizer of the single change-point statistic within the segment (s_{m̂}, e_{m̂}). The data are then split into two parts, (X_1, ..., X_{t̂_1}) and (X_{t̂_1+1}, ..., X_n), and WBS is applied to each part until no further change-points are detected.
In Wang et al. (2022), the threshold was obtained by applying the same test to simulated iid Gaussian data on the same set of random segments. This approach makes intuitive sense since the SN-based test statistic is asymptotically pivotal when there is no heteroscedasticity, but it is no longer meaningful in the presence of heteroscedasticity, as the asymptotic pivotality of the SN-based test statistic is lost and the function H(•) is unknown. To overcome this difficulty, we adopt a bootstrap-based approach for determining the threshold. Specifically, for the N random segments (s_m, e_m), we generate R independent copies of Gaussian multipliers. For each copy r = 1, ..., R, let ξ̃_n^r denote the maximum of the bootstrap statistics over the N segments. The threshold ξ_n is defined as the 95% quantile of the values {ξ̃_n^1, ..., ξ̃_n^R}. Note that we generate the multipliers once for each bootstrap replication and apply the same multipliers on all intervals.
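A minimal sketch of this threshold computation is given below; `stat_fn` is a placeholder for the bootstrap statistic evaluated on a segment, and the interval sampler and minimum segment length are illustrative assumptions rather than the paper's exact choices:

```python
import numpy as np

def wbs_segments(n, N, min_len=10, rng=None):
    """Draw N random intervals (s, e) from {0, ..., n-1} with e - s >= min_len."""
    rng = np.random.default_rng(rng)
    segs = []
    while len(segs) < N:
        s, e = sorted(rng.integers(0, n, size=2))
        if e - s >= min_len:
            segs.append((s, e))
    return segs

def wbs_threshold(X, segments, stat_fn, R=200, q=0.95, rng=None):
    """Bootstrap threshold xi_n: for each of R multiplier draws, apply the
    SAME multipliers e_1, ..., e_n on every segment, take the maximum over
    segments, and return the q-quantile of the R maxima."""
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    Xc = X - X.mean(axis=0)          # center by the overall mean
    maxima = []
    for _ in range(R):
        e = rng.standard_normal(n)   # one multiplier sequence per replication
        maxima.append(max(stat_fn(Xc, e, s, t) for s, t in segments))
    return float(np.quantile(maxima, q))
```

Reusing one multiplier sequence across all segments within a replication mirrors the scheme described above, where the multipliers are generated once per bootstrap replication.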

Simulation studies
In this section, we investigate the finite sample performance of our proposed bootstrap-based tests and the WBS+Bootstrap estimation method via simulations. In Section 5.1, we present the size and power of our bootstrap-based tests in comparison with the SN-based tests in Wang et al. (2022) for the settings of single and multiple change points in high-dimensional homoskedastic and heteroscedastic data. Section 5.2 examines the performance of the WBS+Bootstrap change point estimation method in comparison with the WBS+SN based approach in Wang et al. (2022) when unconditional heteroscedasticity is present.
Cases 1 and 2 both correspond to weakly dependent (across the coordinates of X) models, and it will be interesting to see how the increased dependence from Case 1 to Case 2 impacts the finite sample size accuracy. Case 3 corresponds to a model with strong dependence; it violates the componentwise weak dependence assumption imposed in our theory (see Assumptions 3.1 and 3.2). Nevertheless, it is interesting to see how robust our bootstrap-based tests are with respect to strong componentwise dependence. Next, we consider the following time-varying trend functions H(•), which specify the trend in the time-varying variance of each component but not a trend in the mean. We use the terminology "trend" with the understanding that it always refers to the time-varying variance.
Trend A4 has a cosine shape. Some of these trend functions, such as A1, A3 and A4, have been considered in Zhao and Li (2012), who studied inference on the mean of a univariate time series with time-varying variance. First, we investigate the case where there is at most one change point in the mean. Under the null hypothesis, we set µ_i = 0 for all i = 1, ..., n. We consider (n, p) = (400, 100), (100, 100), (400, 400), for all choices of Σ and H(•) described above. The empirical sizes at significance level α = 0.05, together with those of the SN-based test in Wang et al. (2022), are reported in Table 1 for comparison. From Table 1, we can see that for the AR covariance matrix with ρ = 0.5, 0.8, both tests achieve size accuracy, i.e., the empirical sizes are close to the nominal level, when there is no time-varying heteroscedasticity. However, when time-varying heteroscedasticity is present, the SN-based test exhibits pronounced over-size distortion in the cases of A1, A2, A1+A2, and A1+A3. By contrast, the bootstrap-based test we propose achieves accurate size across all trend types. When the covariance matrix is compound symmetric, the model assumptions required for the validity of both the SN-based test and the bootstrap-based test are violated. It is observed that the SN-based test over-rejects even when there is no trend, which is consistent with the results in Wang et al. (2022). Interestingly, the bootstrap-based test still maintains accurate size in all settings. This suggests that the applicability of the bootstrap-based test may be broader than what we are able to justify. It would be interesting, but may be challenging, to provide a new theory that supports the robustness of our bootstrap-based test when the panel dependence is strong.
Next, we investigate the power of the proposed bootstrap test under the alternative of one change point. We consider (n, p) = (100, 100), and the mean shift occurs at the center of the data, i.e., µ_i = ∆1{i ≥ ⌊n/2⌋}. We provide the power curves of the proposed bootstrap test and the SN-based test for the two AR covariance matrices and all trend types. We let ∆ steadily increase from 0 and evaluate the empirical power at different change magnitudes based on 1000 Monte Carlo simulations. In Figure 1, the solid lines correspond to the power of the bootstrap-based test and the dashed lines to the SN-based test, with the colors red and black indicating the results for ρ = 0.8 and ρ = 0.5, respectively. When there is no time-varying trend in the variance, the two tests have similar size and power. A similar phenomenon holds for trend types A3, A4, and A1+A4, for all of which we observe size accuracy for both tests. On the other hand, the SN-based test shows significant size distortion for trend types A1, A2, A1+A2 and A1+A3, making it difficult to compare the power of the two tests directly. To make a fair comparison, we also report the size-adjusted power of the SN-based test for these cases. To be more specific, we calibrate the empirical critical values used in the SN-based test such that the empirical sizes are exactly 0.05. The size-adjusted powers of the SN-based test are shown as dotted lines in the figures for trend types A1, A2, A1+A2, and A1+A3. The size of the bootstrap-based test is fairly close to 0.05, so we did not make any power adjustment for it. A direct comparison between the size-adjusted power of the SN-based test and the raw power of the bootstrap-based test suggests that the powers are quite comparable, with a slight advantage for the bootstrap-based test in some settings, such as A1, A2, and A1+A2.
Next, we investigate the performance of the bootstrap-based test that targets an unknown number of change points, where there is more than one change point under the alternative. We only present the results for trend types A0, A1, A2 and A1+A2. Following Wang et al. (2022), we consider a two-change-points alternative (2CP) and a three-change-points alternative (3CP). We consider the two AR covariance matrices used before when generating Z_i and set (n, p) = (50, 50) and ∆ = 0.2. We compare the empirical size and power with those of the SN-based test statistic T^⋄_n in Wang et al. (2022) based on 1000 replications. According to Table 2, we can see that even in the homoscedastic case (trend type A0), the SN-based test is unable to control the size, which is presumably due to the relatively small sample size n and dimension p. The size distortion of the SN-based test when there is time-varying heteroscedasticity is obvious. In comparison, the bootstrap test shows quite accurate size in both the homoscedastic and heteroscedastic cases. Notice that under the alternatives with 2 or 3 change points, the proposed bootstrap test still shows respectable power in most cases. Due to the size inflation of the SN-based test, the interpretation of its high power needs to be done with caution. Overall, we would not recommend the SN-based test when there is time-varying heteroscedasticity; the bootstrap-based test is preferred.

Estimation
In this subsection, we examine the finite sample performance of the WBS+Bootstrap change point estimation method described in Section 4. We follow the same setting used in Wang et al. (2022). Let n = 120 and p = 50, with change point locations 30, 60 and 90. These change points partition the data into four zones. We draw i.i.d. normal samples from N(ν_j, I_p), j = 1, 2, 3, 4, for each zone. Let θ_j = ν_{j+1} − ν_j be the strength of the signals. For the dense case, we consider the two signal strengths κ = √(2.5/p) and κ = 2√(2.5/p). We consider all the trend functions.
In addition to reporting the frequency of the difference between the estimated and the actual number of change points (N̂ − N), we also use the mean squared error (MSE) of (N̂ − N) to measure the estimation accuracy for the number of change points. Similar to the comparison in Wang et al. (2022), we can view the change point estimation problem as a special case of classification. We treat the data between two successive change points as if they were in the same category, and evaluate the classification accuracy based on the Adjusted Rand Index (ARI) (Rand, 1971; Hubert and Arabie, 1985; Wang and Samworth, 2018). The ARI is at most 1 (and can be negative for worse-than-chance agreement), with larger values indicating better accuracy. When all change points are estimated perfectly, the ARI is 1; if no change point is estimated, the corresponding ARI is 0. The results are summarized in Table 3. Notice that for the weaker signal κ = √(2.5/p), even when there is no trend (Type A0), WBS+SN is unable to provide an accurate estimate, while our WBS+Bootstrap correctly estimates the number and locations of the change points. When there is heteroscedasticity, our WBS+Bootstrap also substantially outperforms WBS+SN, in particular for trend types A2, A4, A1+A2 and A1+A4. Both methods perform worse for trend type A2, which corresponds to a linear trend, while WBS+Bootstrap still maintains reasonably good MSE and ARI. For the strong signal κ = 2√(2.5/p), WBS+SN still cannot compete with WBS+Bootstrap when there is no trend (A0). For the heteroscedastic cases, WBS+SN seems to perform better than in the no-trend case. However, this is because the trend function yields a smaller variance, which makes the signal-to-noise ratio larger and the change point locations easier for WBS+SN to estimate. For all four trend types we considered, the number of change points N̂ estimated by WBS+Bootstrap is correct in all 200 replications (i.e., MSE = 0), which is another piece of evidence supporting the superiority of WBS+Bootstrap over WBS+SN.
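For reference, the ARI can be computed directly from the two segmentations; in the pure-Python sketch below (function names ours), change points are mapped to segment labels and the index is evaluated from the contingency counts:

```python
from math import comb
from collections import Counter

def segments_to_labels(n, cps):
    """Map change points (1-based last index of each segment) to labels."""
    labels, seg = [], 0
    cps = sorted(cps)
    for i in range(1, n + 1):
        labels.append(seg)
        if seg < len(cps) and i == cps[seg]:
            seg += 1  # next observation starts a new segment
    return labels

def adjusted_rand_index(a, b):
    """Hubert-Arabie adjusted Rand index between two label sequences."""
    n = len(a)
    pairs = Counter(zip(a, b))               # contingency-table counts
    sum_ij = sum(comb(v, 2) for v in pairs.values())
    sum_a = sum(comb(v, 2) for v in Counter(a).values())
    sum_b = sum(comb(v, 2) for v in Counter(b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:                # both partitions trivial
        return 1.0
    return (sum_ij - expected) / (max_index - expected)
```

For example, the true segmentation of the simulation above corresponds to `segments_to_labels(120, [30, 60, 90])`, and the ARI of an estimated segmentation against it is 1 exactly when the change points are recovered perfectly.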

Real data application
In this section, we illustrate the performance of the proposed change point location estimation method on a micro-array bladder tumor dataset. The ACGH (Array Comparative Genomic Hybridisation) data is publicly available and contains log intensity ratio measurements for 43 individuals at 2215 different loci on their genome. The dataset is available in the R package "ecp" and was also studied by Wang and Samworth (2018) and Wang et al. (2022). Following the latter paper, we only consider the first 200 loci, and we perform change point estimation using WBS+Bootstrap and compare with WBS+SN.
To examine whether there are changes in the variance of each component, we apply the test for constant variance proposed by Schmidt et al. (2021) for a univariate time series to each of the 43 subjects. For a sequence of univariate random variables D_1, ..., D_n, the test statistic involves an estimator κ̂^2_* of the long-run variance. We set the tuning parameters s = 0.7 and q = 0.5, following the recommendation in Schmidt et al. (2021). An appealing feature of this test is that it allows for changes in the mean, in particular a piecewise Lipschitz-continuous mean function. We treat the resulting 43 p-values as independent and apply the Higher Criticism test (Donoho and Jin, 2004) to determine whether there is a variance change in any of the 43 sequences. The resulting p-value is 0.022, which indicates quite strong evidence against the constant variance assumption for all components. WBS+SN yields 6 change points, {39, 74, 87, 134, 173, 191}, while WBS+Bootstrap reports only 3 change points, at {73, 135, 173}, which largely coincide with the three change-points {74, 134, 173} detected by WBS+SN. The additional change-point locations obtained from WBS+SN could be spurious, owing to the variance instability unaccounted for in that procedure.
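The Higher Criticism statistic applied to a collection of p-values can be sketched as follows; we use the common convention of maximizing over the smallest half of the ordered p-values, and the function name is ours:

```python
import math

def higher_criticism(pvals, alpha0=0.5):
    """Donoho-Jin (2004) Higher Criticism statistic.

    Maximizes sqrt(n) * (i/n - p_(i)) / sqrt(p_(i) * (1 - p_(i))) over the
    smallest alpha0-fraction of the sorted p-values p_(1) <= ... <= p_(n).
    """
    p = sorted(pvals)
    n = len(p)
    hc = -math.inf
    for i, pi in enumerate(p, start=1):
        if i > alpha0 * n:
            break
        if 0.0 < pi < 1.0:  # guard against degenerate p-values
            hc = max(hc, math.sqrt(n) * (i / n - pi) / math.sqrt(pi * (1 - pi)))
    return hc
```

A large value of the statistic indicates that unusually many small p-values are present, i.e., that at least some of the 43 sequences violate the constant-variance null; the overall p-value is then obtained from the null distribution of the statistic.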

Conclusion
In this paper, we develop a bootstrap-based test for mean changes in high-dimensional heteroscedastic data. The existing literature on high-dimensional mean change detection focuses exclusively on the homoscedastic case, and the applicability of existing tests is questionable when there is time-varying heteroscedasticity. Building on the U-statistic approach proposed in Wang et al. (2022), we develop a new test statistic and a bootstrap-based approximation for single change-point testing. The bootstrap consistency is justified under mild assumptions on the heteroscedasticity and the componentwise dependence. Our test involves no tuning parameters and is easy to implement. Extensions to multiple change-point testing and estimation using WBS are also presented. Numerical comparisons demonstrate the robustness of our proposed testing and estimation procedures with respect to time-varying heteroscedasticity and the degree of panel dependence.
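To make the calibration step concrete, here is a schematic of a multiplier (wild) bootstrap test for a single mean change. We stress that this sketch uses a simple sup-CUSUM statistic rather than the paper's U-statistic, so the function and its details are illustrative assumptions only; the point it conveys is that multiplying centered observations by i.i.d. N(0, 1) multipliers preserves a time-varying variance profile in the bootstrap samples.

```python
import math
import random


def wild_bootstrap_cusum_test(X, B=500, seed=0):
    """Illustrative wild bootstrap calibration for a one-change-point test.

    Uses a sup-CUSUM statistic on centered data; the bootstrap replicates
    multiply the centered observations by i.i.d. standard normal multipliers,
    so heteroscedasticity in X is inherited by each bootstrap sample.
    Returns the test statistic and a bootstrap p-value.
    """
    n = len(X)
    xbar = sum(X) / n
    Z = [x - xbar for x in X]

    def sup_cusum(v):
        s, m = 0.0, 0.0
        for z in v:
            s += z
            m = max(m, abs(s))
        return m / math.sqrt(n)

    T = sup_cusum(Z)
    rng = random.Random(seed)
    exceed = 0
    for _ in range(B):
        e = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if sup_cusum([z * g for z, g in zip(Z, e)]) >= T:
            exceed += 1
    pval = (1 + exceed) / (1 + B)   # add-one correction for a valid p-value
    return T, pval
```

On data with a pronounced level shift the statistic dwarfs its bootstrap replicates and the p-value is near its lower bound 1/(B + 1), while on homogeneous data it is not extreme.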
To conclude, we mention a few possible extensions. First, it would be interesting to extend our method to allow temporal dependence, that is, assuming {Z_i} to be stationary and weakly dependent instead of independent. Under this setting, the Gaussian multiplier bootstrap may not be adequate, and the dependent wild bootstrap proposed in Shao (2010a) may be needed to capture the serial dependence. Second, as the numerical results suggest, the bootstrap-based test may still work when the panel dependence is strong, e.g., in the compound symmetric case. It would be desirable to expand our theory to cover this interesting case. Third, we did not provide theoretical support for the consistency of WBS+Bootstrap, although its empirical performance is very encouraging. Further theoretical investigation is left for future work.
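Regarding the first extension, the dependent wild bootstrap of Shao (2010a) replaces i.i.d. multipliers by a stationary dependent sequence. One standard construction (our illustration; the bandwidth l and the resulting Bartlett-kernel covariance are choices, not prescriptions from the paper) generates the multipliers as a moving average of i.i.d. normals:

```python
import math
import random


def dwb_multipliers(n, l, rng):
    """Dependent wild bootstrap multipliers.

    Each e_i is a moving average of l consecutive i.i.d. N(0, 1) innovations,
    scaled to have mean 0, variance 1, and covariance
    cov(e_i, e_{i+k}) = max(0, 1 - k/l)  (Bartlett-kernel shape),
    so dependence vanishes beyond lag l.
    """
    w = [rng.gauss(0.0, 1.0) for _ in range(n + l - 1)]
    return [sum(w[i:i + l]) / math.sqrt(l) for i in range(n)]
```

Plugging such multipliers into the bootstrap statistic in place of i.i.d. ones would mimic serial dependence up to lag l; whether this yields a consistent test in the present high-dimensional heteroscedastic setting is exactly the open question raised above.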
Proposition A.1. Let Assumptions 3.1–3.4 hold. Then where the centered Gaussian process Q is defined in Theorem 3.1. If Assumption 3.5 also holds, then where Moreover, the sample paths of each process are asymptotically uniformly equicontinuous in probability with respect to the Euclidean metric on [0, 1]².
The proof of Proposition A.1 is long and technical and will be split over several subsections. Since the proof of the second statement contains that of the first, we only provide the former. A close look reveals that all parts relevant to the first statement go through without Assumption 3.5.

Proof of Theorem 3.1. Begin by observing the representation
where Proposition A.1 and the uniform asymptotic equicontinuity of the sample paths of S_n in probability, together with some simple calculations, yield Since the sample paths of Q are uniformly continuous with respect to the Euclidean metric on [0, 1]², a simple calculation shows that the sample paths of G(r; 0, 1) are uniformly continuous with respect to the Euclidean metric on [0, 1]. Consider the maps Applying the extended continuous mapping theorem (see Theorem 1.11.1 in van der Vaart and Wellner (1996)), this implies which completes the proof.
Proof of Theorem 3.2. Similar arguments as given in the proof of Theorem 3.1, but utilizing equation (2) instead of the first part of that proposition, show that where T′, T′′ are i.i.d. copies of T and T*_{n,1}, T*_{n,2} are two copies of the bootstrap statistic, each with an independent set of multipliers e_i.
Next observe that by Corollary 1.3 and Remark 4.1 in Gaenssler et al. (2007) the function is continuous on R and strictly increasing on R₊. This implies that the function since the latter is strictly increasing on R₊. Thus the left support point of the cdf H must satisfy t ≤ 0. Hence by Theorem 1 in Tsirel'son (1976) the function H is continuous on (0, ∞). Clearly The proof is completed by observing that the conclusion of Lemma 4.2 in Bücher and Kojadinovic (2019) remains true if continuity of the cdf F therein (corresponding to H in our case) is replaced by continuity on (0, ∞) and the additional assumption G⁻¹(1 − α) ∈ (0, ∞) is made (note that 1 − α in our notation corresponds to α in Bücher and Kojadinovic (2019)). The condition G⁻¹(1 − α) ∈ (0, ∞) is guaranteed by the assumption α < 1/2. Next observe that (3) verifies Condition (a) in Lemma 2.2 of the latter paper. Condition 4.1 in Bücher and Kojadinovic (2019) is satisfied as well, and the relaxation of the continuity of the cdf of the limit was described above. This completes the proof. □

A.2. Proof of Proposition A.1
We begin by providing an overview of the proof: first, we show that the process S^{k,*}_n admits the representation This implies that each of the processes S_n/(n∥Σ∥_F), S^{1,*}_{n,1}/(n∥Σ∥_F), S^{2,*}_{n,1}/(n∥Σ∥_F) is tight. Finally, we show joint finite-dimensional convergence of the processes S_n/(n∥Σ∥_F), S^{1,*}_{n,1}/(n∥Σ∥_F), S^{2,*}_{n,1}/(n∥Σ∥_F) to the joint limit in (2), again under Assumptions 3.1–3.4 (see Section A.2.3). Combined, the results above imply the statement in (2). Note in particular that process convergence of S_n/(n∥Σ∥_F) follows under just Assumptions 3.1–3.4, without utilizing Assumption 3.5.
Before proceeding, we state a useful technical lemma that we will utilize in several places throughout the proof.
Lemma A.1. Under Assumptions 3.2 and 3.3, there is a constant C independent of n, p and the distribution of Z such that, for s = 4, 6, max_{j_1,...,j_s,k_1,...,k_s = 1, k_i ≠ j_i} Observe that by the generalized version of Hölder's inequality Now let P_s denote the set of disjoint partitions π of the set {1, . . ., s} such that With this notation we obtain, for k ≠ where the second equality uses stationarity and independence across t of {Z_t} and the last inequality follows by Assumption 3.2. Setting C = |P_s| completes the proof. □

A.2.1. Proof of (5) and (6)

Both proofs follow the same principle. Observe that the processes S_n, S*_{n,1} are piecewise constant on their index set and their values are entirely determined by their values on the grid {(i/n, j/n) : i, j = 0, . . ., n}. Following the arguments in Section 8.8.1 of Wang et al. (2022), it is clear that (5) and (6) follow if we prove that there exists a constant C independent of n such that sup Next, a close look at the proof of (8.18) in Wang et al. (2022) shows that it suffices to show, for a possibly different constant C, The first bound is a direct consequence of Lemma A.1. For the second bound, note that by independence of the e_i, and the claim follows from Lemma A.1 since the e_i are standard normal and have finite moments of all orders. Note that the proofs in this section did not make use of Assumption 3.5, and all arguments hold under Assumptions 3.1–3.4. □

A.2.2. Proof of (4)

Throughout this section we drop the index k in S^{k,*}_n for notational convenience. Consider the decomposition Observe that 0 ≤ X^T X and that since we are under the null and the X_i are centered. Thus since the above term is simply the process S_n with e_i instead of X_i, and S_n converges weakly under Assumptions 3.1–3.4 as argued in the beginning of Section A.
We first deal with the second term. Observe that by the Cauchy–Schwarz inequality sup Now we have since this is simply the process S_n with the new random vectors X̃_i = X_i e_i². It is straightforward to check that the X̃_i satisfy Assumptions 3.1–3.4, and thus convergence of the process follows (recall that in the beginning of Section A.2 we argued that Assumptions 3.1–3.4 suffice for process convergence of S_n). Combining all results so far we find that sup X_i e_i ≤ max_{k=1,...,n} By Kolmogorov's maximal inequality max_{k=1,...,n} where the last line follows since by Assumption 3.5. Hence it remains to show that max_{k=1,...,n} To this end observe that for 1 where C₁, C₂ are constants independent of n, k, ℓ and the distribution of the X_i, and C is the constant from Assumption 3.2. Here the last line uses Lemma A.1. The second-to-last line follows since the e_i are centered and independent across i, and so are the X_i. Thus we can have at most two different values for the j_i. Further, each k_i has to equal either at least one j_i or at least one other k_i. This gives at most K(k − ℓ)²n² different choices for a universal constant K.
To conclude the proof, define the process with index set T_n = {i/n : i = 0, . . ., n}, where G_n(0) ≡ 0. The computation above implies that for a universal constant C₃ where we used the fact that s ≠ t, s, t ∈ T_n implies |s − t| ≥ 1/n. Applying Corollary 2.2.5 from van der Vaart and Wellner (1996) with T, Ψ, d, X in the latter result defined as follows: here C₄, K are constants that depend on ψ, C₃ only and are thus independent of n. An application of the Markov inequality yields sup Finally, observe that max_{k=1,...,n} since by definition G_n(0) = 0. This completes the proof of (8) and thus of (4). □

A.2.3. Finite dimensional convergence result
Proposition A.2. Under Assumptions 3.1–3.5, for any (a_{u,k}, b_{u,k}) ∈ (0, 1)², a_{u,k} < b_{u,k}, and contrasts α_{u,k} ∈ R, where u = 1, 2, 3, k = 1, . . ., K, it holds that Proof. Consider the following decomposition: where and Letting F_i = σ(. . ., e′_i) be a filtration, it is easy to check that is still a martingale. To obtain the convergence result, we need to check the following conditions: 1. ∀ε > 0, For Condition 1, it suffices to check that for any fixed interval (a, b) and u ∈ {1, 2, 3}, For the case u = 1, observe that by independence of the X_i, and since the X_i are centered, where we applied Lemma A.1 in the last step. Thus Similarly we obtain for u = 2 and thus The case u = 3 is treated by exactly the same arguments, and the proof of part 1 is complete.
For Condition 2, observe that the bootstrap multipliers are independent of the X_i's, which implies for u ≠ v, E[ξ^u_{a_1,i+1} ξ^v_{a_2,i+1} | F_i] = 0. Therefore, we have the following simplification To complete the proof, it remains to show, for u ∈ {1, 2, 3}, Given this structure, it suffices to show for a X_j e_j)² | F_i], Since M₁, M₂, M₃ have a very similar structure, we will only prove the claim for M₂; the other two cases follow similarly. In what follows we write M₂ for M₂(a, b). Consider the following decomposition.
For M₂⁽¹⁾, Here the last equality follows since for symmetric matrices A, B, C we have and since the H(·) are diagonal matrices. Thus, letting A = H(j/n)H((i + 1)/n)Σ, B = H(j/n)H((i + 1)/n), C = Σ, the claim follows after some simple computations. Next observe
For the first part, let Σ̃^{i,j} := H denote the matrix with entries σ̃^{i,j}_{k,l}. Then where the last inequality follows from Assumption 3.2 by repeated application of the Cauchy–Schwarz inequality; for instance, and similarly for the other terms. The second sum in the representation of E[(M₂⁽¹⁾)²] can be rewritten as where we used the fact that × E[e_{j₁} e_{j₂} e_{j₃} e_{j₄}].
The expectation is nonzero only when j₁ = j₃ and j₂ = j₄. Therefore, Here the inequality follows since, by repeated application of the identity valid for symmetric matrices A, B, C, as well as the cyclic permutation property of the trace operator, we have Now the largest entry of the diagonal matrix is bounded by B⁸, and Σ⁴ has positive diagonal entries (since it can be seen as the covariance matrix of (Σ^{1/2})³ Z₁), so that This shows that M₂⁽²⁾ → 0 in probability. Together with the previous results, we have shown that This completes the proof.

A.3. Proof of Theorem 3.3: Power of the test
The following equivalent representation of the quantity G_n(k) will be useful: This expression can be obtained by elementary calculations after multiplying out the products in the expression above. Further, recall that we assumed k* = ⌊nc⌋ for some constant c ∈ (0, 1). Define a new sequence of random vectors Y_i, This sequence has no change point. Without loss of generality, assume the Y_i are centered.
The remaining proof consists of a detailed analysis of the original test statistic and the bootstrap statistic under different types of alternatives.
For the bootstrap statistic, we will prove that under the null and any alternative S^{*X}_n satisfies where the remainder terms are uniform in a, b ∈ [0, 1] and S^{*Y}_n is defined in exactly the same way as S^{*X}_n but with Y_i in place of X_i. We will further show that where The remaining two cases are discussed below.
The case n∥∆∥₂²/∥Σ∥_F → 0. In this case (10) implies Since the sequence Y_i contains no change points and satisfies all assumptions of Theorem 3.2, the proof follows from exactly the same arguments as the proof of the latter result.
The case n∥∆∥₂²/∥Σ∥_F → ∞. From expression (10) we find that in this case T we obtain This completes the proof of Theorem 3.3. □

A.3.1. Proof of (11): Behaviour of G_n(k) under the alternative

Simple computations show that in the case k* > k, the statistic D_n(k) admits the following decomposition where D^Y_n is defined as D_n but with Y_i replacing X_i. By Kolmogorov's maximal inequality we have where we used the bound max and the fact that under Assumption 3.1 we have ∥Σ∥₂ = o(∥Σ∥_F); see Remark 3.2 in Wang et al. (2022).
Combining this with the representation in (9) we find that, uniformly in 1 Finally, elementary computations show that for k We now discuss the consequences of this result for three types of alternatives.
Case 1: n∥∆∥₂²/∥Σ∥_F → 0. In this case we have G̃_n(k) Hence by the continuous mapping theorem The first term corresponds to the case with no change point and has the same limiting behaviour as under the null. We now study the behaviour of the remainder terms.
The case c < a < b. The remainder terms take the form Similar to the arguments in the proof of Theorem 3.3, the result stated in Theorem 3.4 holds.

Fig 1. Power curves for single change point testing under different trend functions.

(a, b) = O_P(tr(Σ)) = o_P(n∥Σ∥_F) by Assumption 3.5. Next observe the decomposition

∆^T ∆ e_{i+1} e_j. For the last term, observe that it has the same form as S_n(a, b)/n with the X_i replaced by the e_i. The corresponding covariance matrix is Σ = 1 and hence, by weak convergence of S_n/(n∥Σ∥_F) under general conditions which are satisfied in this special case, the last term is of order O_p(n∥∆∥₂²). The terms S^{*Y}_{n,2}(a, b) and S^{*Y}_{n,3}(a, b) will be handled together. Note that

S^{*Y}_{n,2}(a, b) + S^{*Y}_{n,3}(a, b) = ∆^T(Y_j + Y_i) e_i e_j − Σ_{i=⌊na⌋+1}^{⌊nb⌋−1} 2∆^T Y_i e_i² − ∆^T Ȳ Σ_{i,j=⌊na⌋+1, i≠j}^{⌊nb⌋−1} (k*/n) e_i e_j.

For the first term in the bracket, observe that E[Y_i] = 0 and the Y_i are independent of the e_i, so by Kolmogorov's maximal inequality,

sup |Σ ∆^T Y_i e_i| = O_p(n^{1/2} max_j Var(∆^T Y_j e_j)^{1/2}) = O_p(n^{1/2} (∆^T Σ∆)^{1/2}),
sup |Σ ∆^T Y_i e_i²| = O_p(n^{1/2} max_j Var(∆^T Y_j e_j²)^{1/2}) = O_p(n^{1/2} (∆^T Σ∆)^{1/2}) = O_p(n^{1/2}). (15)

For the last term we have, by (18) and an elementary calculation using the independence of the Y_i,

sup = O_p(Var(∆^T Ȳ)^{1/2}) O_p(n) = O_p(n^{1/2} (∆^T Σ∆)^{1/2}). (16)

In summary, we have proved that in the case c < a ∆^T ∆ e_{i+1} e_j. This can be handled similarly to the case c < a < b.

The case a < c < b. Compared to the case a < b < c we have the additional terms

∆^T(Y_j − Ȳ) w_{i+1} e_{i+1} e_j + ∆^T(Y_{i+1} − Ȳ) w_j e_{i+1} e_j + ∆^T ∆ w_{i+1} w_j e_{i+1} e_j,

where w_t = I{k* + 1 ≤ t ≤ n}.
For the last term, note that w_t = H(t/n) + r_n(t), where H(x) = I{c ≤ x} and |r_n(t)| ≤ I{|t − ⌊nc⌋| ≤ 1}. A direct computation shows that the pieces involving r_n(t) are negligible, while the remaining term takes the form

∆^T ∆ H((i + 1)/n) H(j/n) e_{i+1} e_j.

This has the same form as S_n(a, b)/n with μ_i = 0, H as above, and the Z_i replaced by the e_i. The corresponding covariance matrix is Σ = 1 and hence, by weak convergence of S_n/(n∥Σ∥_F) under general conditions which are satisfied in this special case, we have

sup |Σ H((i + 1)/n) H(j/n) e_{i+1} e_j| = O_p(1). (18)

Therefore, the last term is of order O_p(n∥∆∥₂²). Next we bound the first two terms. We have ∆^T(Y_j − Ȳ) w_{i+1} e_{i+1} e_j + (Y_{i+1} − Ȳ) e_{i+1} e_j − Σ_{i=⌊nc⌋}^{⌊nb⌋−1} ∆^T(Y_{i+1} − Ȳ) e_{i+1} e_{⌊nc⌋}. The terms involving (Y_{i+1} − Ȳ) e_{i+1} e_j have a similar structure as in the case a < b < c and can be treated similarly. For the remaining two terms, note that

|Σ ∆^T(Y_j − Ȳ) e_{i+1} e_j − e_{⌊nc⌋} Σ_{i=⌊nc⌋}^{⌊nb⌋−1} ∆^T(Y_{i+1} − Ȳ) e_{i+1}| ≤ Σ |∆^T(Y_j − Ȳ) e_j| + |e_{⌊nc⌋}| · O_p(n^{1/2} (∆^T Σ∆)^{1/2}) O_p(n^{1/2}) + O_p(n^{−1/2} (∆^T Σ∆)^{1/2}) O_p(n) = O_p(n (∆^T Σ∆)^{1/2}),

where the last line uses the bounds in (13)–(16). Summarizing, we have proved the claim in the case a < c < b.

Proof of Theorem 3.4 and Theorem 3.5: Theory for multiple change point testing

Under the null, the process convergence result for S*_n(a, b) and the continuous mapping theorem give, in probability,
Table 1
Size for single change point testing under different trend functions. Rejection rates at the nominal levels 0.05 and 0.1 are reported, based on 1000 Monte Carlo simulations. The results of the SN-based test statistic for one change point (i.e., T_n in Wang et al. (2022)) are included for comparison.

Table 2
Size and power of multiple change points testing.

see Section A.2.2. Thus it suffices to establish (2) with S^{k,*}_{n,1} instead of S^{k,*}_n. Then, in Section A.2.1, we show that under Assumptions 3.1–3.4,