Two-Sample and Change-Point Inference for Non-Euclidean Valued Time Series

Data objects taking value in a general metric space have become increasingly common in modern data analysis. In this paper, we study two important statistical inference problems, namely, two-sample testing and change-point detection, for such non-Euclidean data under temporal dependence. Typical examples of non-Euclidean valued time series include yearly mortality distributions, time-varying networks, and covariance matrix time series. To accommodate unknown temporal dependence, we advance the self-normalization (SN) technique (Shao, 2010) to the inference of non-Euclidean time series, which is substantially different from the existing SN-based inference for functional time series that reside in Hilbert space (Zhang et al., 2011). Theoretically, we propose new regularity conditions that could be easier to check than those in the recent literature, and derive the limiting distributions of the proposed test statistics under both null and local alternatives. For the change-point detection problem, we also establish the consistency of the change-point location estimator, and combine our proposed change-point test with wild binary segmentation to perform multiple change-point estimation. Numerical simulations demonstrate the effectiveness and robustness of our proposed tests compared with existing methods in the literature. Finally, we apply our tests to two-sample inference in mortality data and change-point detection in cryptocurrency data.


Introduction
Statistical analysis of non-Euclidean data that reside in a metric space is gradually emerging as an important branch of functional data analysis, motivated by the increasing encounter of such data in many modern applications. Examples include the analysis of sequences of age-at-death distributions over calendar years (Mazzuco & Scarpa 2015, Shang & Hyndman 2017), covariance matrices in the analysis of diffusion tensors in medical imaging (Dryden et al. 2009), and graph Laplacians of networks (Ginestet et al. 2017). One of the main challenges in dealing with such data is that the usual vector/Hilbert space operations, such as projection and inner product, may not be well defined, and only the distance between two non-Euclidean data objects is available.
Despite the challenge, the list of papers that propose new statistical techniques to analyze non-Euclidean data has been growing. Building on Fréchet mean and variance (Fréchet 1948), which are counterparts of the mean and variance for metric space valued random objects, Dubey & Müller (2019) proposed a test for comparing N (≥ 2) populations of metric space valued data. Dubey & Müller (2020a) developed a novel test to detect a change point in the Fréchet mean and/or variance in a sequence of independent non-Euclidean data. Classical linear and nonparametric regression has also been extended to metric space valued data; see Petersen & Müller (2019), Tucker et al. (2022), and Zhang et al. (2021), among others. So far, the majority of the literature on non-Euclidean data has been limited to independent data; the only exceptions are Zhang et al. (2022) and Zhu & Müller (2021, 2022), which mainly focused on the autoregressive modeling of non-Euclidean valued time series. To the best of our knowledge, no inferential tools are available for non-Euclidean valued time series in the literature.
In this paper, we address two important problems, two-sample testing and change-point detection, in the analysis of non-Euclidean valued time series. These two problems are also well motivated by the data we analyze in the paper, namely, the yearly age-at-death distributions for countries in Europe and daily Pearson correlation matrices for five cryptocurrencies. For time series data, serial dependence is the rule rather than the exception. This motivates us to develop new tests for non-Euclidean time series that are robust to temporal dependence. Note that the two testing problems have been addressed by Dubey & Müller (2019) and Dubey & Müller (2020a), respectively, for independent non-Euclidean data; but, as expected, their tests fail to control the size when there is temporal dependence in the series; see Section 5 for simulation evidence.
To accommodate unknown temporal dependence, we develop test statistics based on self-normalization (Shao 2010, Shao & Zhang 2010), which is a nascent inferential technique for time series data. It has been mainly developed for vector time series and has been extended to functional time series in Hilbert space (Zhang et al. 2011, Zhang & Shao 2015). The functional extension is, however, based on reducing the infinite-dimensional functional data to finite dimension via functional principal component analysis, and then applying SN to the finite-dimensional vector time series. Such SN-based inference developed for time series in Hilbert space cannot be applied to non-Euclidean valued time series, since the projection and inner product commonly used for data in Hilbert space are not available for data objects that live in a general metric space. The SN-based extension to non-Euclidean valued time series is therefore fairly different from that in Zhang et al. (2011) and Zhang & Shao (2015), in terms of both methodology and theory. For independent non-Euclidean valued data, Dubey & Müller (2019, 2020a) build on empirical process theory (van der Vaart & Wellner 1996) by regulating the complexity of the analyzed metric space, which is in general abstract and may not be easy to verify. In our paper, we take a different approach that is inspired by the M-estimation theory in Pollard (1985) and Hjort & Pollard (2011) for Euclidean data, and extend it to the non-Euclidean setting. We assume that the metric distance between the data and the estimator of the Fréchet mean admits a certain decomposition, which includes a bias term, a leading stochastic term, and a remainder term. Our technical assumptions are more intuitive and could be easier to check in practice. Furthermore, under our assumptions we are able to obtain explicit asymptotic distributions of our test statistics under local alternatives of rate O(n^{-1/2}), where n is the sample size, whereas such distributions seem difficult to derive under the entropy integral type conditions employed by Dubey & Müller (2019, 2020a).
The remainder of the paper is organized as follows. Section 2 provides background on the non-Euclidean metric spaces in which the random objects of interest reside, and some basic assumptions that will be used throughout the paper. Section 3 proposes SN-based two-sample tests for non-Euclidean time series. Section 4 considers SN-based change-point tests. Numerical studies for the proposed tests are presented in Section 5, and Section 6 demonstrates the applicability of these tests through real data examples. Section 7 concludes. Proofs of all results are relegated to Appendix A. Appendix B summarizes the examples that satisfy the assumptions in Section 2, and Appendix C provides simulation results for functional time series. Some notations used throughout the paper are defined as follows. Let ||·|| denote the conventional Euclidean norm. Let D[0, 1] denote the space of functions on [0, 1] that are right continuous with left limits, endowed with the Skorokhod topology (Billingsley 1968). We use ⇒ to denote weak convergence in D[0, 1] or, more generally, in the R^m-valued function space D^m[0, 1], where m ∈ N; →_d to denote convergence in distribution; and →_p to denote convergence in probability. A sequence of random variables X_n is said to be O_p(1) if it is bounded in probability. For x ∈ R, define ⌊x⌋ as the largest integer that is smaller than or equal to x, and ⌈x⌉ as the smallest integer that is greater than or equal to x.

Preliminaries and Settings
In this paper, we consider a metric space (Ω, d) that is totally bounded, i.e., for any ε > 0, there exist a finite number of open ε-balls whose union covers Ω. For a sequence of stationary random objects {Y_t}_{t∈Z} defined on (Ω, d), we follow Fréchet (1948) and define their Fréchet mean and variance by
µ = argmin_{ω∈Ω} E[d²(Y_t, ω)]  and  V = min_{ω∈Ω} E[d²(Y_t, ω)] = E[d²(Y_t, µ)],
respectively. The Fréchet mean extends the traditional mean in linear spaces to more general metric spaces by minimizing the expected squared metric distance between the random object Y_t and the centroid, akin to the conventional mean minimizing the expected sum of squared residuals. It is particularly useful for objects that lie in abstract spaces without explicit algebraic structure. The Fréchet variance, defined by this expected squared metric distance, is then used for measuring the dispersion in the data.
Given finite samples {Y_t}^n_{t=1} and 0 ≤ a < b ≤ 1, we define the Fréchet subsample mean and variance as
µ̂_{[a,b]} = argmin_{ω∈Ω} (⌊nb⌋ − ⌊na⌋)^{-1} Σ_{t=⌊na⌋+1}^{⌊nb⌋} d²(Y_t, ω),  V̂_{[a,b]} = (⌊nb⌋ − ⌊na⌋)^{-1} Σ_{t=⌊na⌋+1}^{⌊nb⌋} d²(Y_t, µ̂_{[a,b]}).  (2)
Note that both the Fréchet (subsample) mean and variance depend on the space Ω and the metric d, which require further regulation for the desired inferential purposes. In this paper, we do not impose independence assumptions, and our technical treatment differs substantially from those in the literature, c.f. Petersen & Müller (2019), Dubey & Müller (2019, 2020a,b), Dubey & Müller (2021).
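As a concrete illustration of these definitions, the sketch below computes the Fréchet sample mean and variance under the 2-Wasserstein metric for univariate distributions represented by their quantile functions on an equally spaced probability grid. For d_W, the Fréchet mean is the quantile-wise average; the grid representation and function names here are our own illustrative choices, not from the paper.

```python
import numpy as np

def w2_distance(q1, q2):
    # Approximate 2-Wasserstein distance between two univariate
    # distributions given their quantile functions evaluated on a
    # common, equally spaced probability grid.
    return np.sqrt(np.mean((q1 - q2) ** 2))

def frechet_mean_var(quantiles):
    """Fréchet sample mean and variance under the 2-Wasserstein metric.

    quantiles: (n, m) array; row t is the quantile function of Y_t.
    The Fréchet mean is the quantile-wise (barycenter) average; the
    Fréchet variance is the average squared W2 distance to that mean.
    """
    mean_q = quantiles.mean(axis=0)
    var = np.mean([w2_distance(q, mean_q) ** 2 for q in quantiles])
    return mean_q, var
```

For shifted uniform distributions U(t, t+1), t = 0, ..., 4, the barycenter is U(2, 3) and the Fréchet variance equals the mean of (t − 2)², i.e., 2.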
Assumption 2.1. The Fréchet mean µ exists and is unique, and for some δ > 0, there exists a constant K > 0 such that
E[d²(Y_t, ω)] − V ≥ K d²(ω, µ)  whenever d(ω, µ) < δ.

Assumption 2.2. For any ε > 0, inf_{ω: d(ω,µ) > ε} E[d²(Y_t, ω)] > V.

Assumption 2.3. For each fixed ω ∈ Ω, n^{-1} Σ_{t=1}^{n} d²(Y_t, ω) →_p E[d²(Y_t, ω)].

Assumption 2.4. There exists σ > 0 such that n^{-1/2} Σ_{t=1}^{⌊nr⌋} {d²(Y_t, µ) − V} ⇒ σB(r) in D[0, 1], where B(·) is a standard Brownian motion.

Assumption 2.5. For ω in a neighborhood of µ, d²(Y_t, ω) admits the decomposition
d²(Y_t, ω) = d²(Y_t, µ) + K_d d²(ω, µ) + g(Y_t, ω, µ) + R(Y_t, ω, µ),
where K_d > 0 is a constant, g(Y_t, ω, µ) is a mean-zero stochastic term proportional to d(ω, µ), and R(Y_t, ω, µ) is an asymptotically negligible remainder term.
Several remarks are in order. Assumptions 2.1-2.3 are standard, and similar conditions can be found in Dubey & Müller (2019, 2020a) and Petersen & Müller (2019). Assumptions 2.1 and 2.2 are adapted from Assumption (A1) in Dubey & Müller (2020a), and are required for identification purposes. In particular, Assumption 2.1 requires that the expected squared metric distance E[d²(Y_t, ω)] be well separated from the Fréchet variance, and the separation is quadratic in the distance d(ω, µ). Assumption 2.2 is useful for obtaining the uniform convergence of the subsample estimate of the Fréchet mean, i.e., µ̂_{[a,b]}, which is a key ingredient in forming the self-normalizer in SN-based inference. Assumption 2.3 is a pointwise weak law of large numbers, c.f. Assumption (A2) in Dubey & Müller (2020a). Assumption 2.4 requires an invariance principle to hold, to regularize the partial sum that appears in the Fréchet subsample variances. Note that d²(Y_t, ω) takes value in R for any fixed ω ∈ Ω; thus both Assumptions 2.3 and 2.4 could be implied by high-level weak temporal dependence conditions (e.g., strong mixing) in conventional Euclidean space; see Shao (2010, 2015) for discussions.
Assumption 2.5 distinguishes our theoretical analysis from the existing literature. Its idea is inspired by Pollard (1985) and Hjort & Pollard (2011) for M-estimators. In the conventional Euclidean space, i.e., (Ω, d) = (R^m, ||·||) for m ≥ 1, it is easy to see that the expansion in Assumption 2.5 holds with K_d = 1, g(Y_t, ω, µ) = 2⟨Y_t − µ, µ − ω⟩, and R(Y_t, ω, µ) ≡ 0, since ||Y_t − ω||² = ||Y_t − µ||² + ||µ − ω||² + 2⟨Y_t − µ, µ − ω⟩. In more general cases, Assumption 2.5 can be interpreted as an expansion of d²(Y_t, ω) around the target value d²(Y_t, µ). In particular, K_d d²(ω, µ) can be viewed as the bias term, g(Y_t, ω, µ) serves as the asymptotically leading stochastic term that is proportional to the distance d(ω, µ), while R(Y_t, ω, µ) is the asymptotically negligible remainder term. More specifically, the expansion reads as
d²(Y_t, ω) − d²(Y_t, µ) = K_d d²(ω, µ) + g(Y_t, ω, µ) + R(Y_t, ω, µ).
The verification of this assumption can be done by analyzing each term. In comparison, the existing literature, e.g., Petersen & Müller (2019), Dubey & Müller (2019, 2020a), Dubey & Müller (2021), imposes assumptions on the complexity of (Ω, d). These assumptions typically involve the behavior of entropy integrals and covering numbers rooted in empirical process theory (van der Vaart & Wellner 1996), which are abstract and difficult to check in practice; see Propositions 1 and 2 in Petersen & Müller (2019). Assumption 2.5, on the contrary, imposes conditions directly on the metric d and can be easily checked for the examples below. Moreover, Assumption 2.5 is useful for deriving local powers of the tests to be developed in this paper; see Sections 3.2 and 4.2 for more details. Examples that satisfy Assumptions 2.1-2.5 include:
• the L² metric d_L, for Ω being the set of square integrable functions on [0, 1];
• the 2-Wasserstein metric d_W, for Ω being the set of univariate probability distributions on R;
• the Frobenius metric d_F, for Ω being the set of square matrices, including the special cases of covariance matrices and graph Laplacians;
• the log-Euclidean metric d_E, for Ω being the set of covariance matrices.
We refer to Appendix B for more details of these examples and verifications of above assumptions for them.
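As a small concrete illustration of two of these metrics, the following sketch (function names are ours) computes the Frobenius and log-Euclidean distances between symmetric positive-definite matrices, with the matrix logarithm obtained via an eigendecomposition:

```python
import numpy as np

def spd_log(S):
    # Matrix logarithm of a symmetric positive-definite matrix via
    # its eigendecomposition: log(S) = V diag(log w) V^T.
    w, V = np.linalg.eigh(S)
    return (V * np.log(w)) @ V.T

def frobenius_dist(A, B):
    # d_F(A, B) = || A - B ||_F
    return np.linalg.norm(A - B, ord="fro")

def log_euclidean_dist(A, B):
    # d_E(A, B) = || log(A) - log(B) ||_F
    return frobenius_dist(spd_log(A), spd_log(B))
```

For example, for A = diag(1, e) and B = I, log(A) = diag(0, 1) and log(B) = 0, so d_E(A, B) = 1 while d_F(A, B) = e − 1.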

Two-Sample Testing
This section considers two-sample testing in metric space under temporal dependence. For two sequences of temporally dependent random objects {Y^{(1)}_t}_{t∈Z} and {Y^{(2)}_t}_{t∈Z} on (Ω, d), we denote Y^{(i)}_t ∼ P^{(i)}, where P^{(i)} is the underlying marginal distribution of Y^{(i)}_t with Fréchet mean µ^{(i)} and variance V^{(i)}, i = 1, 2. Given finite sample observations {Y^{(1)}_t}^{n_1}_{t=1} and {Y^{(2)}_t}^{n_2}_{t=1}, we are interested in the following two-sample testing problem:
H_0: P^{(1)} = P^{(2)}  vs.  H_a: P^{(1)} ≠ P^{(2)}.
Let n = n_1 + n_2. We assume the two samples are balanced, i.e., n_1/n → γ_1 and n_2/n → γ_2 with γ_1, γ_2 ∈ (0, 1) and γ_1 + γ_2 = 1 as min(n_1, n_2) → ∞. For r ∈ (0, 1], we define the recursive Fréchet sample mean and variance of each sample by
µ̂^{(i)}_r = argmin_{ω∈Ω} ⌊n_i r⌋^{-1} Σ_{t=1}^{⌊n_i r⌋} d²(Y^{(i)}_t, ω),  V̂^{(i)}_r = ⌊n_i r⌋^{-1} Σ_{t=1}^{⌊n_i r⌋} d²(Y^{(i)}_t, µ̂^{(i)}_r),  i = 1, 2.
A natural candidate test of H_0 is to compare the full-sample Fréchet means and variances by contrasting (µ̂^{(1)}_1, V̂^{(1)}_1) and (µ̂^{(2)}_1, V̂^{(2)}_1). For the mean part, it is tempting to use d(µ̂^{(1)}_1, µ̂^{(2)}_1) as the test statistic. However, this is a non-trivial task, as the limiting behavior of d(µ̂^{(1)}_1, µ̂^{(2)}_1) depends heavily on the structure of the metric space, which may not admit conventional algebraic operations. Fortunately, both V̂^{(1)}_1 and V̂^{(2)}_1 take value in R, and it is thus intuitive to compare their difference. In fact, Dubey & Müller (2019) propose a test statistic of the form (up to normalization)
U_n = (V̂^{(1)}_1 − V̂^{(2)}_1)² / (σ̂²_1/n_1 + σ̂²_2/n_2),
where σ̂²_i is a consistent estimator of lim_{n_i→∞} Var{√n_i (V̂^{(i)}_1 − V^{(i)})}, i = 1, 2. However, U_n requires both within-group and between-group independence, which is too stringent to be realistic for the applications in this paper. When either form of independence is violated, the test may fail to control size; see Section 5 for numerical evidence. Furthermore, accounting for temporal dependence requires replacing the variance with the long-run variance, whose consistent estimation usually involves laborious tuning, such as choices of kernels and bandwidths (Newey & West 1987, Andrews 1991). To this end, we invoke the self-normalization technique to bypass these issues.
The core principle of self-normalization for time series inference is to use an inconsistent long-run variance estimator, formed as a function of recursive estimates, to yield an asymptotically pivotal statistic. The SN procedure involves either no tuning parameter or fewer tuning parameters than traditional counterparts. See Shao (2015) for a comprehensive review of recent developments for low-dimensional time series. For recent extensions to inference for high-dimensional time series, we refer to Wang & Shao (2020) and Wang et al. (2022).

Test Statistics
Define the recursive subsample test statistic based on Fréchet variance as
T_n(r) = (⌊nr⌋/n)(V̂^{(1)}_r − V̂^{(2)}_r),  r ∈ [η, 1],
and then construct the SN-based test statistic as
D_{n,1} = n[T_n(1)]² / ∫_η^1 n{T_n(r) − rT_n(1)}² dr,  (3)
where η ∈ (0, 1) is a trimming parameter for controlling the estimation effect of T_n(r) when r is close to 0, which is important for deriving the uniform convergence of {√n T_n(r), r ∈ [η, 1]}; see Zhou & Shao (2013) and Jiang et al. (2022) for similar technical treatments.
The test statistic (3) is composed of the numerator n[T_n(1)]², which captures the difference in Fréchet variances, and the denominator ∫_η^1 n{T_n(r) − rT_n(1)}² dr, which is called the self-normalizer and mimics the behavior of the numerator with suitable centering and trimming. For each r ∈ [η, 1], T_n(r) is expected to be a consistent estimator of r(V^{(1)} − V^{(2)}). Therefore, under H_a, T_n(1) is large when there is a significant difference in Fréchet variance, whereas the key element T_n(r) − rT_n(1) in the self-normalizer remains small. This suggests that we should reject H_0 for large values of D_{n,1}. Note that (3) only targets the difference in Fréchet variances. To detect the difference in Fréchet means, we can use the contaminated Fréchet variance (Dubey & Müller 2020a), in which each sample's variance is recomputed around the Fréchet mean of the other sample, together with its recursive contrast T^C_n(r). Under H_0, the contaminated variances V̂^{C,(i)}_r are close to V̂^{(i)}_r, i = 1, 2, which indicates a small value for T^C_n(r). On the contrary, when d(µ^{(1)}, µ^{(2)}) > 0, V̂^{C,(i)}_r could be much larger than V̂^{(i)}_r. The power-augmented test statistic D_{n,2} is thus defined by adding n[T^C_n(1)]² to the numerator, where the additional term ∫_η^1 n{T^C_n(r) − rT^C_n(1)}² dr that appears in the self-normalizer is used to stabilize finite sample performance.
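To make the construction concrete, here is a schematic implementation of D_{n,1} for the simplest case of scalar-valued series, where the Fréchet variance reduces to the ordinary sample variance. The specific contrast T_n(r) = r(V̂^{(1)}_r − V̂^{(2)}_r) and the Riemann-sum approximation of the self-normalizer integral are illustrative assumptions, not the paper's exact display.

```python
import numpy as np

def recursive_frechet_var(x, r):
    # Fréchet variance of the first floor(n*r) observations; for scalars
    # with d(a, b) = |a - b| this is the (biased) sample variance.
    k = int(np.floor(len(x) * r))
    seg = x[:k]
    return np.mean((seg - seg.mean()) ** 2)

def sn_two_sample_stat(x1, x2, eta=0.15, ngrid=200):
    """Sketch of the SN two-sample statistic D_{n,1}.

    Numerator: n * T_n(1)^2.  Self-normalizer: a Riemann-sum
    approximation of int_eta^1 n * (T_n(r) - r * T_n(1))^2 dr, with
    T_n(r) = r * (V1_r - V2_r) as an assumed recursive contrast.
    """
    n = len(x1) + len(x2)
    rs = np.linspace(eta, 1.0, ngrid)
    T = np.array([r * (recursive_frechet_var(x1, r)
                       - recursive_frechet_var(x2, r)) for r in rs])
    T1 = T[-1]
    normalizer = np.mean(n * (T - rs * T1) ** 2) * (1.0 - eta)
    return n * T1 ** 2 / normalizer
```

Larger values of the returned statistic indicate stronger evidence against equal Fréchet variances; no nuisance long-run variance is estimated anywhere.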
Remark 3.1. Our proposed tests can be adapted to the comparison of N-sample populations (Dubey & Müller 2019), where N ≥ 2. A natural way of extension is to aggregate all the pairwise differences in Fréchet variance and contaminated variance. Specifically, let the N groups of random data objects be {Y^{(i)}_t}^{n_i}_{t=1}, i = 1, ..., N, and let the null hypothesis be H_0: P^{(1)} = ... = P^{(N)}, where P^{(i)} is the marginal distribution of the ith group. Let µ̂^{(i)}_{[a,b]} and V̂^{(i)}_{[a,b]} be the Fréchet subsample mean and variance, respectively, for the ith group, i = 1, ..., N. For 1 ≤ i ≠ j ≤ N, one can define the pairwise contaminated Fréchet subsample variances and the corresponding recursive statistics in the same manner as above. In the same spirit as the test statistics D_{n,1} and D_{n,2}, for n = Σ^N_{i=1} n_i, we may construct their counterparts for the N-sample testing problem by aggregating these pairwise contrasts. Compared with the classical N-sample testing problem in Euclidean spaces, e.g., analysis of variance (ANOVA), the above modification requires neither Gaussianity, equal variance, nor serial independence. Therefore, it could work for broader classes of distributions. We leave out the details for the sake of space.

Asymptotic Theory
Before we present the asymptotic results of the proposed tests, we need a slightly stronger assumption than Assumption 2.4 to regulate the joint behavior of the partial sums of both samples.

Assumption 3.1. There exist σ_1, σ_2 > 0 such that, jointly for i = 1, 2,
n_i^{-1/2} Σ_{t=1}^{⌊n_i r⌋} {d²(Y^{(i)}_t, µ^{(i)}) − V^{(i)}} ⇒ σ_i B^{(i)}(r)  in D[0, 1],
where B^{(1)}(·) and B^{(2)}(·) are standard Brownian motions with correlation parameter ρ = Corr(B^{(1)}(1), B^{(2)}(1)).
Theorem 3.1. Suppose Assumptions 2.1-2.5 (with 2.4 replaced by Assumption 3.1) hold for both {Y^{(1)}_t}^{n_1}_{t=1} and {Y^{(2)}_t}^{n_2}_{t=1}. Then under H_0, both D_{n,1} and D_{n,2} converge in distribution to the same limit, a functional of the Gaussian process ξ_{γ_1,γ_2}(r; σ_1, σ_2) defined in (5).
Theorem 3.1 obtains the same limiting null distribution for the Fréchet variance based test D_{n,1} and its power-augmented version D_{n,2}. Although D_{n,2} contains the contaminated variance contrast T^C_n(1), its contribution is asymptotically vanishing as n → ∞ under H_0, since sup_{r∈[η,1]} n[T^C_n(r)]² = o_p(1); see the proof of Theorem 3.1 in Appendix A. A similar phenomenon has been documented in Dubey & Müller (2019) under different assumptions. We next consider the power behavior under Pitman local alternatives, in which the difference in Fréchet variances is of order κ_V n^{-1/2} and the difference in Fréchet means is characterized by κ_M at the same n^{-1/2} scale.
Theorem 3.2. Suppose Assumptions 2.1-2.5 (with 2.4 replaced by Assumption 3.1) hold for both {Y^{(1)}_t}^{n_1}_{t=1} and {Y^{(2)}_t}^{n_2}_{t=1}. Then under the local alternatives, D_{n,1} and D_{n,2} converge in distribution to noncentral versions of the null limit, with noncentrality determined by κ_V and, through the constant K_d defined in Assumption 2.5, by κ_M.
Theorem 3.2 presents the asymptotic behavior of both test statistics under local alternatives in various regimes. In particular, D_{n,1} can detect differences in Fréchet variance at the local rate n^{-1/2}, but possesses trivial power against Fréchet mean differences regardless of the regime of κ_M. In comparison, D_{n,2} is powerful for differences in both Fréchet variance and Fréchet mean at the local rate n^{-1/2}, which validates our claim that D_{n,2} indeed augments power.
Our results merit additional remarks when compared with Dubey & Müller (2019). There, only the consistency of the test is obtained, under alternatives whose scaled differences in Fréchet means or variances diverge, while Theorem 3.2 explicitly characterizes the asymptotic distributions of our test statistics under local alternatives of order O(n^{-1/2}), which depend on κ_V and κ_M. This theoretical improvement relies crucially on our newly developed proof techniques based on Assumption 2.5, and it seems difficult to derive such limiting distributions under the empirical-process-based assumptions in Dubey & Müller (2019). However, we do admit that self-normalization could result in moderate power loss compared with t-type test statistics; see Shao & Zhang (2010) for evidence in Euclidean spaces.
Note that the limiting distributions derived in Theorems 3.1 and 3.2 contain a key quantity ξ_{γ_1,γ_2}(r; σ_1, σ_2) defined in (5), which depends on the nuisance parameters σ_1, σ_2 and ρ. This may hinder the practical use of the tests. The following corollary (Corollary 3.1), however, justifies the wide applicability of our tests: when either the two samples are of the same length (γ_1 = γ_2) or the two samples are asymptotically independent (ρ = 0), the limiting null distribution reduces to the pivotal
D_η = B(1)² / ∫_η^1 {B(r) − rB(1)}² dr,
where B(·) is a standard Brownian motion. The asymptotic distributions in Theorem 3.2 can be similarly simplified. In practice, we reject H_0 at significance level α if the test statistic exceeds the (1 − α) quantile of D_η. In Table 1, we tabulate commonly used critical values under various choices of η, obtained from 10,000 replications in which a standard Brownian motion is approximated by standardized partial sums of 50,000 i.i.d. N(0, 1) random variables.
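Critical values of this pivotal limit can be simulated along the lines just described; the sketch below (grid sizes and the function name are illustrative choices) approximates a standard Brownian motion by standardized partial sums of i.i.d. N(0, 1) variables and returns an empirical quantile of D_η:

```python
import numpy as np

def simulate_Deta_quantile(eta=0.15, nsteps=2000, nrep=2000,
                           level=0.95, seed=0):
    """Approximate the `level` quantile of the pivotal limit
    D_eta = B(1)^2 / int_eta^1 {B(r) - r B(1)}^2 dr
    via standardized partial sums of i.i.d. N(0,1) variables."""
    rng = np.random.default_rng(seed)
    grid = np.arange(1, nsteps + 1) / nsteps
    lo = int(np.ceil(eta * nsteps)) - 1       # index of r = eta
    stats = np.empty(nrep)
    for i in range(nrep):
        # B approximated on {1/nsteps, ..., 1}
        B = np.cumsum(rng.standard_normal(nsteps)) / np.sqrt(nsteps)
        bridge = B[lo:] - grid[lo:] * B[-1]   # B(r) - r B(1) on [eta, 1]
        denom = np.mean(bridge ** 2) * (1.0 - eta)  # Riemann sum
        stats[i] = B[-1] ** 2 / denom
    return np.quantile(stats, level)
```

Increasing `nsteps` refines the Brownian motion approximation and `nrep` controls the Monte Carlo error of the quantile estimate.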
Change-Point Detection

We next consider detecting a change point in a sequence of non-Euclidean observations {Y_t}^n_{t=1}, testing the null hypothesis of distributional stationarity,
H_0: Y_1, ..., Y_n ∼ P,
against the single change-point alternative,
H_a: there exists τ ∈ (0, 1) such that Y_1, ..., Y_{⌊nτ⌋} ∼ P^{(1)} and Y_{⌊nτ⌋+1}, ..., Y_n ∼ P^{(2)}, with P^{(1)} ≠ P^{(2)}.
The single change-point testing problem can be roughly viewed as two-sample testing without knowing where the two samples split, and the two problems share certain similarities in terms of statistical methods and theory.
Recalling the Fréchet subsample mean µ̂_{[a,b]} and variance V̂_{[a,b]} in (2), we further define the pooled contaminated variance separated by r ∈ (a, b), in which the observations in [a, r] and (r, b] are each evaluated around the Fréchet mean of the complementary segment, and we define subsample test statistics that contrast the (contaminated) Fréchet variances before and after the candidate split point r.

Test Statistics
For trimming parameters η_1 ∈ (0, 1/2) and η_2 such that η_1 > 2η_2, in the same spirit as D_{n,1} and D_{n,2}, and with a slight abuse of notation, we define the test statistics SN_1 and SN_2 by maximizing the self-normalized subsample contrasts over candidate change-point locations r ∈ [η_1, 1 − η_1]. The trimming parameter η_1 plays a similar role as η in the two-sample testing problem, stabilizing the estimation effect for relatively small subsample sizes, while the additional trimming η_2 is introduced to ensure that the subsample estimates in the self-normalizers are constructed with subsample sizes proportional to n. Furthermore, we note that the self-normalizers here are modified to accommodate the unknown change-point location; see Shao & Zhang (2010) and Zhang & Lavitas (2018) for more discussion.
We next consider local alternatives H_an under which the Fréchet variance and the Fréchet mean change at the n^{-1/2} scale with magnitudes ∆_V and ∆_M, where ∆_V ∈ R and ∆_M ∈ (0, ∞). The following theorem states the asymptotic power behaviors of SN_1 and SN_2. Theorem 4.2. Suppose Assumptions 2.1-2.5 (with 2.4 replaced by Assumption 3.1) hold. If ∆_V and ∆_M are fixed, then under H_an with τ ∈ (η_1, 1 − η_1), SN_1 and SN_2 converge in distribution to noncentral limits as n → ∞. We note that Theorem 4.2 deals with an alternative involving two different sequences before and after the change-point, while Theorem 4.1 only involves one stationary sequence. Therefore, we need to replace Assumption 2.4 by Assumption 3.1.
Theorem 4.2 demonstrates that our tests are capable of detecting local alternatives at rate n −1/2 .In addition, it is seen from Theorem 4.2 that SN 1 is consistent under the local alternative of Fréchet variance change as |∆ V | → ∞, while SN 2 is consistent not only under |∆ V | → ∞ but also under the local alternative of Fréchet mean change as ∆ M → ∞.Hence SN 2 is expected to capture a wider class of alternatives than SN 1 , and these results are consistent with findings for two-sample problems in Theorem 3.2.
When H_0 is rejected, it is natural to estimate the change-point location by the maximizer of the corresponding subsample statistic, i.e., τ̂_i = argmax_{r∈[η_1, 1−η_1]} of the statistic underlying SN_i, i = 1, 2. We will show that these estimators are consistent under the fixed alternative H_a. Before that, we need to regulate the behaviour of the Fréchet mean and variance under H_a. Let µ(α) and V(α) be the limiting Fréchet mean and variance of the two mixture distributions formed from P^{(1)} and P^{(2)}, indexed by the mixing proportion α ∈ [0, 1].
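For intuition, the argmax estimator can be mimicked in the simplest scalar case, where the Fréchet variance is the ordinary sample variance. The weighting k(n − k)/n² in the contrast below is an illustrative choice (not the paper's exact statistic), as are the function names.

```python
import numpy as np

def frechet_var(seg):
    # For scalars with d(a, b) = |a - b|, the Fréchet variance is the
    # (biased) sample variance.
    return np.mean((seg - seg.mean()) ** 2)

def estimate_change_point(x, eta=0.15):
    """Estimate the change-point fraction tau by maximizing a weighted
    contrast of pre/post-split Fréchet variances over the trimmed range
    [eta, 1 - eta]; a proxy for the argmax estimator tau-hat."""
    n = len(x)
    best, tau = -np.inf, None
    for k in range(int(n * eta), int(n * (1 - eta)) + 1):
        w = k * (n - k) / n ** 2              # down-weights the edges
        contrast = w * abs(frechet_var(x[:k]) - frechet_var(x[k:]))
        if contrast > best:
            best, tau = contrast, k / n
    return tau
```

On a series whose variance jumps from 1 to 9 at the midpoint, the estimator recovers tau = 0.5.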
Theorem 4.3 obtains the consistency of τ̂_i, i = 1, 2, when the Fréchet variance changes. We note that it is very challenging to derive the consistency result when H_a is caused by a Fréchet mean change alone, partly due to the lack of explicit algebraic structure on (Ω, d) that we can exploit, and partly due to the use of self-normalization. We leave this problem for future investigation.

Wild Binary Segmentation
To detect multiple change-points and identify their locations given the time series {Y_t}^n_{t=1}, we can combine our change-point test with wild binary segmentation (WBS) (Fryzlewicz 2014). The testing procedure in conjunction with WBS can be described as follows.
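Before the detailed threshold calibration, the overall recursion of WBS combined with a change-point test can be sketched as follows. A simple mean-based CUSUM statistic stands in here as a placeholder for the SN_2 statistic, and integer indices replace the fractional interval endpoints used in the paper; all names are illustrative.

```python
import numpy as np

def mean_cusum(seg):
    # Placeholder change-point statistic on a scalar segment:
    # weighted CUSUM contrast of means; returns (max stat, argmax).
    n = len(seg)
    best, loc = -np.inf, 0
    for t in range(1, n):
        st = abs(seg[:t].mean() - seg[t:].mean()) * np.sqrt(t * (n - t) / n)
        if st > best:
            best, loc = st, t
    return best, loc

def wbs(stat_fn, data, intervals, threshold, s=0, e=None, out=None):
    """Wild binary segmentation skeleton (Fryzlewicz 2014).

    stat_fn(segment) -> (max_statistic, argmax_location); here it is
    a stand-in for the SN_2 change-point statistic."""
    if e is None:
        e = len(data)
    if out is None:
        out = []
    if e - s < 20:                    # minimal segment length, as in the text
        return sorted(out)
    # candidate intervals fully contained in (s, e), plus (s, e) itself
    cand = [(a, b) for (a, b) in intervals if s <= a < b <= e]
    cand.append((s, e))
    best_stat, best_loc = -np.inf, None
    for a, b in cand:
        st, loc = stat_fn(data[a:b])
        if st > best_stat:
            best_stat, best_loc = st, a + loc
    if best_stat > threshold:         # declare a change-point and recurse
        out.append(best_loc)
        wbs(stat_fn, data, intervals, threshold, s, best_loc, out)
        wbs(stat_fn, data, intervals, threshold, best_loc, e, out)
    return sorted(out)
```

In the actual procedure, `threshold` is the simulated 95% quantile ξ described next, and `stat_fn` is the SN_2 statistic computed on the non-Euclidean subsamples.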
Let I_M = {(s_m, e_m)}_{m=1,...,M}, where s_m and e_m are drawn uniformly from {0, 1/n, 2/n, ..., 1} such that ne_m − ns_m ≥ 20. Then we simulate J i.i.d. samples, each of size n, from the standard Gaussian distribution, i.e., Z^j_t ∼ N(0, 1) for j = 1, ..., J and t = 1, ..., n. For the jth sample, let ξ_j denote the maximal SN test statistic over the random intervals, computed based on {Z^j_{⌈ns_m⌉}, ..., Z^j_{⌊ne_m⌋}} with ñ_m = ⌊ne_m⌋ − ⌈ns_m⌉ + 1. Setting ξ as the 95% quantile of ξ_1, ξ_2, ..., ξ_J, we can apply our test in combination with the WBS algorithm to the data sequence {Y_1, Y_2, ..., Y_n} by running Algorithm 1 as WBS(0, 1, ξ). The main rationale behind this algorithm is that we exploit the asymptotic pivotality of our SN test statistic: its limiting null distribution applied to random objects is identical to that applied to i.i.d. N(0, 1) random variables. Thus this threshold is expected to approximate well the 95% quantile of the finite sample distribution of the maximum SN test statistic over the M random intervals under the null.

Numerical Studies

In this section, we examine the size and power performance of our proposed tests for two-sample testing (Section 5.1) and change-point detection (Section 5.2), and provide simulation results for WBS-based change-point estimation (Section 5.3). We refer to Appendix C for additional simulation results comparing with the FPCA approach for two-sample tests in functional time series. The time series random objects considered in this section include: (i) univariate Gaussian probability distributions equipped with the 2-Wasserstein metric d_W; (ii) graph Laplacians of weighted graphs equipped with the Frobenius metric d_F; (iii) covariance matrices (Dryden et al. 2009) equipped with the log-Euclidean metric d_E. Numerical experiments are conducted according to the following data generating processes (DGPs): (i) Gaussian univariate probability distributions, with location and scale driven by an underlying dependent sequence; (ii) graph Laplacians: each graph has N nodes (N = 10 for the two-sample test and N = 5 for the change-point test) that are categorized into two communities with 0.4N and 0.6N nodes, respectively, with the edge weights within the first community, within the second community, and between communities set as transformations of the form 0.4 + arctan(U) of the underlying sequence; (iii) covariance matrices, constructed such that all entries of Z_{t,1} (resp. Z_{t,2}) are independent copies of arctan(U_{t,1}) (resp. δ_1 + δ_2 arctan(U_{t,2})).
Furthermore, δ_1 ∈ [0, 0.3] and δ_2 ∈ [0.7, 1] are used to characterize the change in the underlying distributions. In particular, δ_1 can only capture the location shift, while δ_2 measures the scale change, and the case (δ_1, δ_2) = (0, 1) corresponds to H_0. For DGPs (i) and (ii), i.e., Gaussian distributions with the 2-Wasserstein metric d_W and graph Laplacians with the Frobenius metric d_F, the location parameter δ_1 directly shifts the Fréchet mean while keeping the Fréchet variance constant; and the scale parameter δ_2 affects the Fréchet variance only while holding the Fréchet mean fixed. For DGP (iii), i.e., covariance matrices, the log-Euclidean metric d_E operates nonlinearly, and thus changes in either δ_1 or δ_2 will be reflected in changes in both the Fréchet mean and variance.
The comparisons of our proposed methods with Dubey & Müller (2019) for two-sample testing and Dubey & Müller (2020a) for change-point testing are also reported, which are generally referred to as DM.

Two-Sample Test
For the two-sample testing problems, we set the sample sizes as n_1 = n_2 ∈ {50, 100, 200, 400} and the trimming parameter as η = 0.15. Table 3 presents the sizes of our tests and the DM test for the three DGPs, based on 1000 Monte Carlo replications at the nominal significance level α = 5%.
In all three subtables, we see that: (a) both D_{n,1} and D_{n,2} deliver reasonable size under all settings; (b) DM suffers from severe size distortion when the dependence among the data is strong; (c) when the two samples are dependent, i.e., a = 0.5, DM is a bit undersized even when the data are temporally independent. These findings suggest that our SN-based tests provide more accurate size relative to DM when either within-group temporal dependence or between-group dependence is present.

Change-Point Test
For the change-point testing problems, we set the sample size n ∈ {200, 400, 800} and the trimming parameters as (η_1, η_2) = (0.15, 0.05). Table 4 outlines the size performance of our tests and the DM test for the three DGPs, based on 1000 Monte Carlo replications at the nominal significance level α = 5%. DM tests based on the asymptotic critical value and on the bootstrap (with 500 replications) are denoted as DM_a and DM_b, respectively.
From Table 4, we find that SN_1 always exhibits accurate size, while SN_2 is a bit conservative. As a comparison, the tests based on DM_a and DM_b suffer from severe distortion when strong temporal dependence is present, although DM_b is slightly better than DM_a in DGPs (i) and (ii). In Figure 2, we plot the size-adjusted power of our tests and of the DM test based on bootstrap calibration. Here, the size-adjusted power of DM_b is implemented following Domínguez & Lobato (2000). Similar to the findings for the two-sample tests, we find that SN_1 has trivial power in DGPs (i) and (ii) when there is only a Fréchet mean change, and it is the worst among all three tests. Furthermore, SN_2 is slightly less powerful than DM, but the power loss is moderate. Considering its better size control, SN_2 is preferred. We further provide numerical evidence on estimation accuracy by considering the alternative hypothesis δ_1 = 1 − δ_2 = 0.3 with true change-point location τ = 0.5 for DGPs (i)-(iii) in the main context. When varying the sample size n ∈ {400, 800, 1600}, we find that, for all DGPs, the histograms of τ̂ (based on SN_2) plotted in Figure 3 become more concentrated around the truth τ = 0.5 as the sample size increases, which is consistent with the theoretical consistency of τ̂.

Multiple Change Point Detection
For simulations of multiple change-point estimation, we consider non-Euclidean time series of length n = 500 generated from two models, Gaussian univariate probability distributions and covariance matrices. These models are the same as before, but reformulated for better presentation. In both models, the underlying sequence U_t is generated according to the AR(1) process U_t = ρU_{t-1} + ε_t with ε_t i.i.d. N(0, 1). Cases 1 and 2 correspond to non-monotone changes, and Case 3 considers a monotone change. Here, our method described in Section 4.3 is denoted as WBS-SN_2 (that is, a combination of WBS and our SN_2 test statistic). The method DM in conjunction with binary segmentation, referred to as BS-DM, is included in this simulation for comparison purposes. In addition, our statistic SN_2 in combination with binary segmentation, denoted as BS-SN_2, is implemented and included as well. The critical values for BS-DM and BS-SN_2 are obtained from their respective asymptotic distributions.
The simulation results are shown in Table 5, where we present the ARI (adjusted Rand index) and the number of detected change-points for two dependence levels, ρ = 0.3 and 0.6. Note that ARI ∈ [0, 1] measures the accuracy of change-point estimation, and a larger ARI corresponds to more accurate estimation. We summarize the main findings as follows. (a) WBS-SN_2 is the best method in general, as it can accommodate both monotonic and non-monotonic changes, and it appears quite robust to temporal dependence. For Cases 1 and 2, we see that BS-SN_2 does not work for non-monotone changes, due to the use of the binary segmentation procedure. (b) BS-DM tends to have more false discoveries compared to the other methods. This is expected, as the method DM is primarily proposed for i.i.d. data sequences and exhibits serious oversizing under temporal dependence, as seen in Section 5.2. (c) When we increase ρ = 0.3 to ρ = 0.6, the performance of WBS-SN_2 remains quite stable for both distributional time series and covariance matrix time series.

Applications
In this section, we present two real data illustrations, one for two-sample testing and the other for change-point detection. Both datasets are in the form of non-Euclidean time series, and neither appears to have been previously analyzed with techniques that account for unknown temporal dependence.

Two-sample tests
Mortality data. Here we are interested in comparing the longevity of people living in different European countries. From the Human Mortality Database (https://www.mortality.org/Home/Index), we can obtain a time series of yearly age-at-death distributions for each country. We focus on distributions for females from 1960 to 2015; 26 countries are included in the analysis after excluding countries with missing data. Pairwise two-sample tests between the included countries are performed using our statistic D2 to assess the similarity of age-at-death distributions across countries. The resulting p-value matrix is plotted in Figure 4 (left).
To better present the testing results and gain more insight, we define the dissimilarity between two countries as one minus the p-value of their pairwise test. Treating these dissimilarities as "distances", we apply multidimensional scaling to "project" each country onto a two-dimensional plane for visualization; see Figure 4 (right) for the plot of "projected" countries. It appears that several western European countries, including the UK, Belgium, Luxembourg, Ireland, Austria, and Denmark, form a cluster, whereas several central and eastern European countries, including Poland, Latvia, Russia, Bulgaria, Lithuania, and Czechia, share similar distributions. We suspect the similarity in mortality distributions is closely related to the similarity in economic development and healthcare systems, and less dependent on geographical location.
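The embedding step above can be sketched with classical multidimensional scaling via double centering; the 4×4 p-value matrix below is a hypothetical toy example, not the paper's 26-country data:

```python
import numpy as np

def classical_mds(D, k=2):
    """Classical multidimensional scaling of an n x n dissimilarity
    matrix D into k dimensions via double centering."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J            # double-centered squared dissimilarities
    vals, vecs = np.linalg.eigh(B)
    idx = np.argsort(vals)[::-1][:k]       # keep the k largest eigenvalues
    scale = np.sqrt(np.clip(vals[idx], 0, None))
    return vecs[:, idx] * scale            # n x k coordinates

# dissimilarity = 1 - pairwise p-value (toy symmetric p-value matrix)
P = np.array([[1.00, 0.80, 0.10, 0.20],
              [0.80, 1.00, 0.15, 0.25],
              [0.10, 0.15, 1.00, 0.90],
              [0.20, 0.25, 0.90, 1.00]])
coords = classical_mds(1.0 - P, k=2)
```

Countries whose tests yield large p-values (small dissimilarity) end up close in the embedded plane, which is how the clusters in Figure 4 (right) arise.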

Change-point detection
Cryptocurrency data. Detecting change points in the Pearson correlation matrices of a set of cryptocurrencies of interest can uncover structural breaks in their correlation structure and can play an important role in investors' decisions. Here, we construct daily Pearson correlation matrices from minute prices of Bitcoin, Dogecoin, Cardano, Monero and Chainlink for the year 2021. The cryptocurrency data can be downloaded at https://www.cryptodatadownload.com/analytics/correlation-heatmap/. See Figure 5 for the plot of the time series of pairwise correlations. Three methods, namely, our SN2 test combined with WBS (WBS-SN2), the SN2 test combined with binary segmentation (BS-SN2), and the DM test of Dubey & Müller (2019) in conjunction with binary segmentation (BS-DM), are applied to detect potential change points in this time series. Method WBS-SN2 detects an abrupt change on 2021-05-17 and method BS-SN2 detects a change point on 2021-04-29. By comparison, more than 10 change points are detected by BS-DM, and we suspect that many of them are false discoveries (see Section 5.3 for simulation evidence of BS-DM's tendency toward over-detection). The change point in mid-May 2021 is well expected and corresponds to a major crash in the crypto market that wiped out 1 trillion dollars. The major causes of this crash were the withdrawal of Tesla's commitment to accept Bitcoin as payment and warnings regarding cryptocurrency sent by the Chinese central bank to financial institutions and businesses in China. Since this major crash, the market has been dominated by negative sentiment and fear of a recession. We refer to the following CNN article for some discussion of this crash: https://www.cnn.com/2021/05/22/investing/crypto-crash-bitcoin-regulation/index.html.
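The construction of the daily correlation-matrix time series can be sketched as follows; the data layout (a matrix of minute prices, one column per coin) and the helper name `daily_correlations` are our assumptions for illustration, not the paper's code:

```python
import numpy as np

def daily_correlations(minute_prices, minutes_per_day=1440):
    """Split a (T x p) matrix of minute prices into days and return
    one p x p Pearson correlation matrix per day, computed from
    within-day minute log-returns."""
    T, p = minute_prices.shape
    mats = []
    for d in range(T // minutes_per_day):
        block = minute_prices[d * minutes_per_day:(d + 1) * minutes_per_day]
        rets = np.diff(np.log(block), axis=0)          # minute log-returns
        mats.append(np.corrcoef(rets, rowvar=False))   # p x p correlation matrix
    return mats

# two days of simulated minute prices for 5 coins
rng = np.random.default_rng(0)
prices = np.exp(np.cumsum(rng.normal(0, 0.001, size=(2 * 1440, 5)), axis=0))
mats = daily_correlations(prices)
# distance between consecutive days under the Frobenius metric
d_F = np.linalg.norm(mats[0] - mats[1])
```

The resulting sequence of correlation matrices, compared under the Frobenius metric, is exactly the kind of non-Euclidean time series that the change-point tests of Section 4 take as input.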

Conclusion
Motivated by the increasing availability of non-Euclidean time series data, this paper considers two-sample testing and change-point detection for temporally dependent random objects. Our inferential framework builds upon the nascent SN technique, which has mainly been developed for conventional Euclidean time series or functional time series in Hilbert space; the extension of SN to time series of objects residing in metric spaces is the first in the literature. The proposed tests are robust to weak temporal dependence, enjoy effortless tuning, and are broadly applicable to many non-Euclidean data types with easily verified technical conditions. On the theory front, we derive the asymptotic distributions of our two-sample and change-point tests under both the null and local alternatives of order O(n^{-1/2}). Furthermore, for the change-point problem, the consistency of the change-point estimator is established under mild conditions. Both simulations and real data illustrations demonstrate the robustness of our tests with respect to temporal dependence and their effectiveness in testing and estimation problems.
To conclude, we mention several interesting but unsolved problems in the analysis of non-Euclidean time series. For example, although powerful against Fréchet mean differences/changes, the test statistics developed in this paper rely on the asymptotic behavior of Fréchet (sub)sample variances. It is imperative to construct formal tests that directly target Fréchet mean differences/changes. For the change-point detection problem in non-Euclidean data, the existing literature, including this paper, only derives the consistency of the change-point estimator. It would be very useful to derive an explicit convergence rate and the asymptotic distribution of the change-point estimator, which is needed for confidence interval construction. It would also be interesting to study how to detect structural changes when the underlying distributions of the random objects change smoothly. We leave these topics for future investigation.

A.1 Auxiliary Lemmas
We first introduce some notation. We denote by o_up(·) the uniform o_p(·) with respect to the partial-sum index (a, b). The following three main lemmas are verified under Assumptions 2.1-2.5, and they are used repeatedly throughout the proofs of the main theorems.
Then, by the triangle inequality, we have Without loss of generality, we assume v > u, and by the boundedness of the metric, we have for some K > 0, Hence, the result follows by letting δ_1 and δ_2 be sufficiently small. Thus, the uniform convergence holds.
(2). We then derive the convergence rate based on Assumption 2.5. By the consistency, we have for any δ > 0, P(sup where the last inequality holds by Assumption 2.5 and the fact that (⌊nb⌋ − ⌊na⌋)/n > η/2 for large n.
Note that the above analysis holds uniformly for (a, b) ∈ I_η, which implies that sup_{(a,b)∈I_η} Proof. By Lemma A.1 and Assumption 2.5, we have sup Hence, we have that sup and the result follows.
Proof. By the triangle inequality and Lemma A.2, √n sup Note that by the triangle inequality for the metric, we know that d(ω, µ) < δ with probability tending to 1, and on this event, by Assumption 2.5, Similar to the proof of Lemma A.2, we obtain the result.
The result follows by the continuous mapping theorem.

A.3 Proof of Theorems in Section 4
With a slight abuse of notation, we define Proof of Theorem 4.1.
Furthermore, we note that The result follows by the continuous mapping theorem.
Proof of Theorem 4.2

Note that for any
We focus on k* = ⌊nτ⌋. In this case, the left and right parts of the self-normalizer both come from stationary segments; hence, by similar arguments as under H_0, and for i = 1, 2. Hence, we only need to consider the numerator, where For √n T_n(τ; 0, 1), we have = T_11 + T_12 + T_13.
By Lemma A.2, we know that √n(V̂_[0,τ] − Ṽ_[0,τ]) = o_p(1), and by Assumption 3.1, we have Similarly, we can obtain Hence, using this fact, and by the triangle inequality, we know that Then, by Assumption 2.5, we know Hence, combining the results of (20)-(24), we have Therefore, we know that for the 1 − α quantile of S_η, denoted by
By the proof of (1) in Lemma A.1, for i = 1, 2, we have where the second inequality holds by the boundedness of the metric and (25), and the third inequality holds by the triangle inequality of the metric.
(3). The proof is similar to (2). By the continuous mapping theorem, we obtain that for i = 1, 2,

Let Ω be the set of univariate CDF functions on R, and consider the 2-Wasserstein metric defined by where G_1 and G_2 are the two inverse CDFs, i.e., quantile functions. The verification of Assumptions 2.1 and 2.2 can be found in Proposition C.1 of Dubey & Müller (2020a). Furthermore, by similar arguments as in Example 1, Assumption 2.5 holds under weak temporal dependence conditions; see Berkes et al. (2013).
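Since the 2-Wasserstein distance between univariate distributions is the L2 distance between their quantile functions, it can be approximated from samples on a grid of probability levels. A minimal sketch (the function name `wasserstein2` and the grid construction are ours):

```python
import numpy as np

def wasserstein2(x, y, grid_size=1000):
    """Empirical 2-Wasserstein distance between two univariate samples,
    computed as the L2 distance between empirical quantile functions
    evaluated on a common grid of probability levels."""
    u = (np.arange(grid_size) + 0.5) / grid_size   # interior grid in (0, 1)
    qx = np.quantile(x, u)                         # empirical quantile function of x
    qy = np.quantile(y, u)
    return np.sqrt(np.mean((qx - qy) ** 2))
```

A useful sanity check: shifting a sample by a constant c shifts its quantile function by c, so the distance between a sample and its shifted copy equals |c|.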

B.3 Example 3: Frobenius metric d F for graph Laplacians or covariance matrices
Let Ω be the set of graph Laplacians or covariance matrices of a fixed dimension r, with uniformly bounded diagonals, and equipped with the Frobenius metric d F , i.e.
for two r × r matrices Σ_1 and Σ_2. The verification of Assumptions 2.1 and 2.2 can be found in Proposition C.2 of Dubey & Müller (2020a). We only consider Assumption 2.5.
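The Frobenius metric on this space is simply the entrywise Euclidean norm of the matrix difference. A minimal sketch with a small hypothetical example (the two graph Laplacians below, for a path and a triangle on 3 nodes, are ours for illustration):

```python
import numpy as np

def d_F(S1, S2):
    """Frobenius metric between two r x r matrices."""
    return np.linalg.norm(S1 - S2, ord="fro")

# graph Laplacians of a 3-node path graph and a 3-node triangle
L_path = np.array([[ 1, -1,  0],
                   [-1,  2, -1],
                   [ 0, -1,  1]], dtype=float)
L_tri  = np.array([[ 2, -1, -1],
                   [-1,  2, -1],
                   [-1, -1,  2]], dtype=float)
dist = d_F(L_path, L_tri)
```

The same function applies unchanged to covariance matrices, since d_F only uses the ambient matrix structure.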
Note that ZSM and ZSV still suffer from size distortion when n is small. In particular, the size distortion for K = 4 is considerably larger than that for K = 2 in the presence of temporal dependence. Figure 6 further compares their size-adjusted powers when n_1 = n_2 = 400 and ρ = 0.4. As can be seen, D1 possesses trivial power against mean differences, while D2 is rather stable in all settings, with evident advantages in Cases 2m and 3m. In contrast, the power performances of DM, ZSM and ZSV vary across settings. For example, when the alternative signal function lies in the span of the leading eigenfunctions, i.e., Cases 1m and 1v, ZSM and ZSV with K = 2 deliver the (second) best power performances, as expected, while they are dominated by the other tests when the alternative signal function is orthogonal to the eigenfunctions in Cases 3m and 3v. As for DM, it is largely dominated by D2 for mean differences, although it exhibits a moderate advantage over D2 for covariance operator differences.
In general, whether or not the difference in mean/covariance operator is orthogonal to the leading eigenfunctions is unknown to the user. Our test D2 is robust to unknown temporal dependence, exhibits quite accurate size, and delivers comparable power in all settings, and thus should be preferred in practice.
Note that T_n(r; a, b) and T_n^C(r; a, b) are natural extensions of T_n(r) and T_n^C(r) from the two-sample testing problem to the change-point detection problem, obtained by viewing {Y_t}_{t=⌊na⌋+1}^{⌊nr⌋} and {Y_t}_{t=⌊nr⌋+1}^{⌊nb⌋} as two separate samples. Intuitively, the contrast statistics T_n(r; a, b) and T_n^C(r; a, b) are expected to attain their maxima (in absolute value) when r is set at or close to the true change-point location τ.
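The idea of scanning a candidate split point r can be illustrated with a generic Fréchet-variance contrast; this is a simplified sketch under our own assumptions (candidate Fréchet means restricted to the sample, a plain variance-drop contrast), not the paper's exact T_n(r; a, b) or its self-normalizer:

```python
import numpy as np

def frechet_variance(objs, d):
    """Sample Fréchet variance with the candidate mean restricted to the
    sample: min over candidates m of the average of d(Y_t, m)^2."""
    n = len(objs)
    D2 = np.array([[d(objs[i], objs[j]) ** 2 for j in range(n)] for i in range(n)])
    return D2.mean(axis=1).min()

def scan_change_point(objs, d, trim=0.1):
    """Return the split point r that maximizes the drop in Fréchet
    variance when the sequence is cut into [0, r) and [r, n)."""
    n = len(objs)
    lo, hi = int(n * trim), int(n * (1 - trim))   # trimmed scan range
    total = frechet_variance(objs, d)
    best_r, best_gain = lo, -np.inf
    for r in range(lo, hi):
        gain = total - (r / n) * frechet_variance(objs[:r], d) \
                     - ((n - r) / n) * frechet_variance(objs[r:], d)
        if gain > best_gain:
            best_gain, best_r = gain, r
    return best_r

# toy example: scalars with the absolute-difference metric,
# a single mean shift at t = 20
objs = [0.0] * 20 + [5.0] * 20
best = scan_change_point(objs, lambda a, b: abs(a - b))
```

For any metric space one only needs to supply the distance function d, which is the sense in which the paper's scan statistics apply to distributions, networks, and covariance matrices alike.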

Figure 4 :
Figure 4: Application to mortality data. The left figure is the plot of the p-value matrix, where the 26 countries, Austria, Belarus, Belgium, Bulgaria, Czechia, Denmark, Estonia, Finland, France, Hungary, Iceland, Ireland, Italy, Latvia, Lithuania, Luxembourg, Netherlands, Norway, Poland, Portugal, Russia, Slovakia, Spain, Sweden, Switzerland, UK, are labeled 1, 2, . . ., 26, respectively. The right figure plots the points from multidimensional scaling, with the dissimilarity between any two countries defined as one minus the corresponding p-value.

Figure 5 :
Figure 5: Plot of the time series of pairwise correlations. The red vertical line indicates the change point detected by WBS-SN2.

Table 1 :
Simulated critical values Q_{D_η}(1 − α)

Inspired by the two-sample tests developed in Section 3, this section considers the change-point detection problem for a sequence of random objects {Y_t}_{t=1}^n, i.e.

Table 2 :
Simulated critical values Q_{S_η}(1 − α)

Recall that in Theorem 3.2 we obtained the local power of the two-sample tests D_{n,1} and D_{n,2} at rate n^{-1/2}. To this end, consider the local alternative

Table 3 :
Size performance (×100%) at α = 5% for all three DGPs: Gaussian distributions based on d_W, graph Laplacians based on d_F, and covariance matrices based on d_E.

Table 5 :
Simulation results for multiple change-point detection for sequences of Gaussian distributions and covariance matrices, based on 200 Monte Carlo repetitions.