Detecting structural breaks in eigensystems of functional time series

Detecting structural changes in functional data is a prominent topic in the statistical literature. However, not all changes in the data are important in applications, but only those of sufficiently large influence. In this paper we address the problem of identifying relevant changes in the eigenfunctions and eigenvalues of covariance kernels of $L^2[0,1]$-valued time series. Using self-normalization techniques we derive pivotal, asymptotically consistent tests for relevant changes in these characteristics of the second order structure and investigate their finite sample properties in a simulation study. The applicability of our approach is demonstrated by analyzing German annual temperature data.


Introduction
The analysis of functional data has gained increasing attention during the past decades, due to recent advances in computing and data collecting technologies. This surging interest is reflected in a rapidly expanding range of new statistical methods, as reviewed in the monographs of Bosq (2000), Ramsay and Silverman (2005), Ferraty and Vieu (2010), Horváth and Kokoszka (2012) and Hsing and Eubank (2015).
Applications of functional data analysis include such diverse topics as imaging, meteorology, genomics, and economics. The analytical link between these fields lies in modelling observations as random functions, whether they are temperature curves or stock prices. While this approach facilitates the interpretation of the data for users, it poses theoretical challenges in exchange, since each observation is now an element of a complex function space.
Consequently, dimension reduction procedures play a key role in this field, as they make functional data amenable to the methods of finite dimensional statistics. Among these, functional principal component analysis (fPCA) has taken the most prominent position. Like principal component analysis (PCA) in finite dimensions, fPCA is based on projecting the data onto linear subspaces that explain most of its variance. These spaces are spanned by the eigenfunctions of the estimated covariance operator. An overview of the mathematical aspects of this procedure can be found in the monographs of Ramsay and Silverman (2005) and Horváth and Kokoszka (2012) and in the survey of Shang (2014). Recently, Aue et al. (2015) used functional principal components for prediction in functional autoregressive models, Shang (2017) constructed forecasts with dynamic updating based on functional principal component regression and Gao et al. (2019) proposed dynamic fPCA for forecasting mortality rates. Obviously, dimension reduction procedures rest upon the assumption of a "stable" eigensystem, i.e. that most of the variance of all data is confined to the same, low-dimensional subspace. This insight has generated interest in methods to validate this assumption.
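Since fPCA is used throughout the paper, the following minimal sketch may help fix ideas: it estimates the eigenvalues and eigenfunctions of an empirical covariance kernel from curves observed on a common grid. This is a generic illustration under our own conventions (simple quadrature weights from the grid spacing, a symmetrized weighted eigenproblem), not the procedure of any of the cited works.

```python
import numpy as np

def fpca(curves, grid):
    """Functional PCA for curves sampled on a common grid.

    curves: (N, T) array, each row one curve observed at the grid points.
    Returns eigenvalues and eigenfunctions of the empirical covariance
    kernel, approximating the L^2 inner product by quadrature weights.
    """
    N, T = curves.shape
    centered = curves - curves.mean(axis=0)
    # empirical covariance kernel c_hat(s, t) on the grid
    c_hat = centered.T @ centered / N
    w = np.gradient(grid)                      # quadrature weights
    sw = np.sqrt(w)
    # symmetrize the weighted eigenproblem so eigenfunctions are L^2-orthonormal
    M = sw[:, None] * c_hat * sw[None, :]
    evals, evecs = np.linalg.eigh(M)
    order = np.argsort(evals)[::-1]
    eigenvalues = evals[order]
    eigenfunctions = (evecs[:, order] / sw[:, None]).T   # rows are functions
    return eigenvalues, eigenfunctions

# usage: curves whose variance is concentrated on sqrt(2)*sin(2*pi*t)
rng = np.random.default_rng(0)
grid = np.linspace(0, 1, 101)
scores = rng.normal(size=(500, 1))
curves = scores * np.sqrt(2) * np.sin(2 * np.pi * grid)[None, :] \
         + 0.1 * rng.normal(size=(500, 101))
tau, v = fpca(curves, grid)
```

Here the first estimated eigenvalue should be close to the variance of the scores, and the first eigenfunction close to $\sqrt{2}\sin(2\pi t)$ up to sign.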
One option to investigate the stability of the eigensystem is change point analysis, where one is monitoring a functional time series for structural breaks in the corresponding covariance operators. In particular, Aston and Kirch (2012) and Stoehr et al. (2019) develop a powerful methodology to detect changes in the covariance operator. Similarly changes in the cross-covariance operator have been investigated by Rice and Shum (2019). However, with the specific focus on fPCA it might be reasonable to conduct a refined analysis and to search directly for changes in the eigenfunctions and eigenvalues of the covariance operator. Yet, not much literature can be found in this direction.
The present paper contributes to this problem in two respects. On the one hand we develop a new statistical methodology for change point analysis of the eigenvalues and eigenfunctions of a covariance operator of a functional time series. While a test for a change in the spectrum of the covariance operator has already been proposed by Aue et al. (2018), we are not aware, to the best of our knowledge, of any procedure addressing the problem of change point detection in the eigenfunctions corresponding to a sequence of functional data. On the other hand, in contrast to the cited work, which focuses on the "classical hypotheses" of strict equality, we propose tests for "precise" hypotheses as introduced in Berger and Delampady (1987). This means that we are NOT interested in asserting arbitrarily small differences in the eigensystem before and after the change versus the hypothesis of exact equality. Rather, we try to detect or reject changes of relevant magnitude.
For example, for the maximal eigenvalue of the covariance operator of the random function $X_n$, say $\tau_{1,n}$, this means that - in contrast to Aue et al. (2018) - we do not consider the null hypothesis $\tau_{1,1} = \ldots = \tau_{1,N}$, but develop a statistical methodology to test the hypothesis of no relevant deviation of the eigenvalue before and after the change point, that is
$$H_0: \big(\tau_1^{(1)} - \tau_1^{(2)}\big)^2 \le \Delta_\tau \quad \text{versus} \quad H_1: \big(\tau_1^{(1)} - \tau_1^{(2)}\big)^2 > \Delta_\tau .$$
Here $\tau_1^{(1)} := \tau_{1,1} = \ldots = \tau_{1,\lfloor N\theta_0 \rfloor}$ and $\tau_1^{(2)} := \tau_{1,\lfloor N\theta_0 \rfloor + 1} = \ldots = \tau_{1,N}$ for some $\theta_0 \in (0,1)$, and $\Delta_\tau$ is a given threshold defined by the concrete application (under the null hypothesis the change is considered as not relevant). The hypotheses regarding the other eigenvalues and the eigenfunctions are formulated similarly (see equations (2.4) and (2.5) for more details).
The consideration of relevant hypotheses in the context of change point detection has been introduced in Dette and Wied (2016) and is motivated by the observation that in many applications one is not interested in small changes of a parameter. For example, in forecasting of functional time series, it is not reasonable to use only part of the data if a structural break in an eigenvalue (or eigenfunction) is detected but the difference before and after the change is rather small. In this case discarding the data before the change could admittedly reduce the prediction bias, but comes at the cost of a substantially increased variance due to the smaller sample size used for prediction.
Relevant hypotheses have been considered in statistics to different degrees since the mid 20th century (see Hodges and Lehmann (1954)), and have been investigated intensively in biostatistics, where tests for "bioequivalence" of certain drugs have nowadays become standard (see for example Wellek (2010)). In the context of change point analysis for functional data relevant hypotheses have recently been considered by Dette et al. (2019) for Banach-space valued random variables and by Dette et al. (2018) in Hilbert spaces. The first named paper concentrates on inference regarding the mean functions while Dette et al. (2018) developed tests for a relevant structural break in the mean function and in the covariance operator. The detection of structural breaks in the eigenvalues and eigenfunctions is a substantially more difficult problem due to their implicit definition and statistical tests have mainly been developed for the two sample case (see Zhang and Shao (2015), who consider classical hypotheses and Aue et al. (2019), who discuss relevant hypotheses).
The aim of the present work is to develop statistical methodology for detecting relevant changes in the eigensystem of a functional time series. In Section 2 we introduce the testing problems, define corresponding test statistics and give the main theoretical results. Typically, change point problems of this type require estimation of the long run covariance structure, which is nearly intractable in the present context, because it involves all eigenvalues and eigenfunctions of the covariance operators before and after the change point (see, for example, Dauxois et al. (1982) or Hall and Hosseini-Nasab (2006) for an explicit representation of the estimated eigenvalues and eigenfunctions in terms of the empirical covariance operator). We propose a self-normalization approach which avoids this problem. In Section 3 we illustrate our approach by means of a small simulation study, as well as an investigation of German weather data. Finally, in Appendices A and B we provide the proofs of our findings and also give some auxiliary results.

Testing for relevant changes
In this section we provide a precise outline of the testing problems considered in this paper and subsequently present the main theoretical results. Let $L^2[0,1]$ denote the Hilbert space of square integrable functions $f: [0,1] \to \mathbb{R}$, equipped with the common inner product
$$\langle f, g \rangle = \int_0^1 f(t)\, g(t)\, dt .$$
The corresponding norm is denoted by $\|\cdot\|$. Notice that according to the induced metric, functions that differ only on a set of Lebesgue measure 0 are identified. Now suppose we observe a sample of $N$ random functions $X_1, \ldots, X_N \in L^2[0,1]$, where
$$X_n = X_n^{(1)} \text{ for } 1 \le n \le \lfloor N\theta_0 \rfloor \quad \text{and} \quad X_n = X_n^{(2)} \text{ for } \lfloor N\theta_0 \rfloor < n \le N , \qquad (2.1)$$
$(X_n^{(i)})_{n \in \mathbb{Z}}$ ($i = 1, 2$) are stationary sequences of centered random functions in $L^2[0,1]$ and $\theta_0 \in (0,1)$ is a constant of proportionality. The assumption of vanishing expectations is made for the sake of a simple notation, and all results presented in this paper also hold in the case $\mathbb{E} X_n \neq 0$. For a more detailed discussion of this case see Remark 2.7. A general definition of expectations of random functions in $L^2[0,1]$ can be found in Bücher et al. (2019). However, in the subsequent discussion we will always assume that
$$\mathbb{E}\, \|X_n\|^2 < \infty , \qquad (2.2)$$
which implies that expectations can be defined point-wise (compare Horváth and Kokoszka (2012), Section 2.2). Under assumption (2.2) the covariance kernel $c^{(i)}$ of $X_1^{(i)}$ ($i = 1, 2$) is almost everywhere defined and given by
$$c^{(i)}(s,t) = \mathbb{E}\big[ X_1^{(i)}(s)\, X_1^{(i)}(t) \big] .$$
Regarded as a function of two arguments it is an element of $L^2([0,1] \times [0,1])$, the space of square integrable functions on the unit square, which can be isomorphically identified with the tensor product Hilbert space $L^2[0,1] \otimes L^2[0,1]$ (for details see Weidmann (1980)). We will also denote the induced norm of this space by $\|\cdot\|$, since it will always be clear from the context which space we refer to. By Mercer's theorem (see König (1986), p. 145) the kernels $c^{(1)}$ and $c^{(2)}$ permit the $L^2$-expansions
$$c^{(i)}(s,t) = \sum_{j=1}^{\infty} \tau_j^{(i)}\, v_j^{(i)}(s)\, v_j^{(i)}(t), \qquad i = 1, 2,$$
where $v_1^{(i)}, v_2^{(i)}, \ldots$ are the eigenfunctions and $\tau_1^{(i)}, \tau_2^{(i)}, \ldots$ are the corresponding eigenvalues. For simplicity of reference we assume for some fixed $p \in \mathbb{N}$ that the first $p+1$ eigenvalues are the largest and that they are arranged in descending order, i.e. $\tau_1^{(i)} \ge \tau_2^{(i)} \ge \ldots \ge \tau_{p+1}^{(i)}$.
Furthermore the sets of eigenfunctions are supposed to form orthonormal bases of the space L 2 [0, 1], which can always be enforced by adding further, orthogonal functions (with corresponding eigenvalues 0).
Based on the sample of observations $X_1, \ldots, X_N$ we want to investigate relevant changes in the eigensystems corresponding to $c^{(1)}$ and $c^{(2)}$. More precisely, for a fixed index $j \in \{1, \ldots, p\}$ we test whether the difference of the $j$-th eigenvalues or eigenfunctions exceeds a predetermined threshold, that is, we consider the hypotheses
$$H_0^{\tau}: \big(\tau_j^{(1)} - \tau_j^{(2)}\big)^2 \le \Delta_\tau \quad \text{versus} \quad H_1^{\tau}: \big(\tau_j^{(1)} - \tau_j^{(2)}\big)^2 > \Delta_\tau \qquad (2.4)$$
and
$$H_0^{v}: \big\| v_j^{(1)} - v_j^{(2)} \big\|^2 \le \Delta_v \quad \text{versus} \quad H_1^{v}: \big\| v_j^{(1)} - v_j^{(2)} \big\|^2 > \Delta_v . \qquad (2.5)$$
Here $\Delta_\tau$ and $\Delta_v$ are prespecified constants, denoting the maximal values for which the distances between the eigenvalues and eigenfunctions are still considered scientifically irrelevant. The particular choice of $\Delta_\tau$ and $\Delta_v$ depends on the concrete application. Note also that for $\Delta_\tau = 0$ or $\Delta_v = 0$ the hypotheses (2.4) and (2.5) reduce to the classical change point detection problems for eigenvalues and eigenfunctions, respectively. In order to decide whether a relevant change either in the eigenvalues or in the eigenfunctions has occurred, we first have to identify the change point $\theta_0$.
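To make the role of the thresholds concrete, the following toy computation evaluates both relevant hypotheses for a pair of known eigensystems. The grid, the eigenvalues and the rotated eigenfunction are made-up illustrative choices; the hypotheses are assumed to compare the squared eigenvalue difference and the squared $L^2$-distance of the eigenfunctions to $\Delta_\tau$ and $\Delta_v$, respectively.

```python
import numpy as np

# Illustrative check of the relevant hypotheses for a known j-th eigensystem
# before (1) and after (2) a change; all numbers are made up.
grid = np.linspace(0, 1, 201)
w = np.gradient(grid)                       # quadrature weights

tau1, tau2 = 1.0, 0.7                       # j-th eigenvalues before/after
v1 = np.sqrt(2) * np.sin(2 * np.pi * grid)  # j-th eigenfunction before
phi = np.pi / 8                             # rotate by 22.5 degrees
v2 = np.cos(phi) * v1 + np.sin(phi) * np.sqrt(2) * np.cos(2 * np.pi * grid)

E_j = (tau1 - tau2) ** 2                    # squared eigenvalue difference
D_j = np.sum(w * (v1 - v2) ** 2)            # squared L^2 distance

delta_tau = 0.05
delta_v = 2 - 2 * np.cos(np.pi / 4)         # threshold from a 45 degree angle
relevant_eigenvalue_change = E_j > delta_tau        # True:  0.09 > 0.05
relevant_eigenfunction_change = D_j > delta_v       # False: ~0.15 < ~0.59
```

Since both eigenfunctions have unit norm and the rotation takes place in an orthonormal plane, $D_j \approx 2 - 2\cos(\pi/8)$.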

Change point estimation
The change point estimator is constructed by the CUSUM principle and defined by
$$\hat\theta = \frac{1}{N}\, \underset{\lfloor N\varepsilon \rfloor \le k \le N - \lfloor N\varepsilon \rfloor}{\operatorname{argmax}}\, f(k) , \qquad (2.6)$$
where the function $f$ is given by
$$f(k) = \frac{1}{N} \Big\| \sum_{n=1}^{k} X_n \otimes X_n - \frac{k}{N} \sum_{n=1}^{N} X_n \otimes X_n \Big\| .$$
Note that in definition (2.6) we confine the maximization of $f$ to a subset of $\{1, \ldots, N\}$ to obtain stable estimators. In practice this restriction is not an issue and even very small values of $\varepsilon$ can be used in (2.6). We refer to Section 3.3, where we demonstrate the stability of the estimator with respect to the choice of the threshold $\varepsilon$ by means of a simulation study. In the same section (Remark 3.1) we discuss advantages and disadvantages of our change point estimator and consider alternative approaches. Before we proceed we specify the basic assumptions required for the theoretical statements presented in this paper.
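For discretized curves, a CUSUM estimator of this type can be sketched as follows. The function implemented here is the standard covariance CUSUM $\| \sum_{n \le k} X_n \otimes X_n - (k/N) \sum_{n \le N} X_n \otimes X_n \| / N$, which we take as a stand-in for the function $f$ in (2.6); it is a sketch under our own conventions, not the paper's exact formula.

```python
import numpy as np

def estimate_change_point(curves, eps=0.05):
    """CUSUM-type estimator of the break location in the covariance
    structure; curves is an (N, T) array of discretized, centered
    functions, eps is the boundary trimming parameter."""
    N = curves.shape[0]
    # running sums of the outer products X_n (x) X_n, flattened
    outer = np.einsum('ns,nt->nst', curves, curves).reshape(N, -1)
    csum = np.cumsum(outer, axis=0)
    total = csum[-1]
    lo, hi = int(np.floor(N * eps)), N - int(np.floor(N * eps))
    ks = np.arange(lo, hi + 1)
    # || sum_{n<=k} X_n(x)X_n - (k/N) sum_{n<=N} X_n(x)X_n || / N
    diffs = csum[ks - 1] - (ks[:, None] / N) * total[None, :]
    f = np.linalg.norm(diffs, axis=1) / N
    return ks[np.argmax(f)] / N

# usage: the variance of the curves doubles after the midpoint
rng = np.random.default_rng(1)
grid = np.linspace(0, 1, 50)
base = rng.normal(size=(400, 1)) * np.sin(2 * np.pi * grid)[None, :]
base[200:] *= 2.0
theta_hat = estimate_change_point(base)
```

With a break of this size the estimate should land close to the true fraction 0.5.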
Assumption 2.1. The sequences of random functions $(X_n^{(i)})_{n \in \mathbb{Z}}$ ($i = 1, 2$) are Bernoulli shifts, that is
$$X_n^{(i)} = g^{(i)}(\epsilon_n, \epsilon_{n-1}, \ldots)$$
for some measurable, non-random function $g^{(i)}$ and an i.i.d. sequence of innovations $(\epsilon_n)_{n \in \mathbb{Z}}$, and the approximating sequences obtained by truncating the innovations satisfy the moment and summability conditions of $m$-approximability, in particular the summability condition (2.8).
Note that these assumptions match those in Berkes et al. (2013), who derived weak invariance principles for $m$-approximable sequences. However, the stronger summability condition (2.8) is imposed here, since we are not estimating mean functions but covariance kernels. We now state a first result concerning the convergence rate of the change point estimator defined in (2.6). The proof follows by arguments similar to those given in the proof of Proposition 3.1 in Dette et al. (2018) and is omitted for the sake of brevity.

Relevant changes in the eigenvalues
We now proceed to construct a test for the hypothesis (2.4) of a relevant change in the $j$-th eigenvalue. For this purpose we define, for $i = 1, 2$ and $\lambda \in [0,1]$, the eigenfunctions $\hat v_j^{(i)}(\lambda)$ (2.11) and the eigenvalues $\hat\tau_j^{(i)}(\lambda)$ of the sequential estimates $\hat c^{(i)}(\lambda, \cdot, \cdot)$ (defined in (2.9) and (2.10)). Finally we denote by
$$\big( \hat\tau_j^{(i)}, \hat v_j^{(i)} \big)_{j \in \mathbb{N}}, \qquad i = 1, 2, \qquad (2.13)$$
the eigensystems of the estimated covariance operators of the full samples $X_1, \ldots, X_{\lfloor N\hat\theta \rfloor}$ ($i = 1$) and $X_{\lfloor N\hat\theta \rfloor + 1}, \ldots, X_N$ ($i = 2$). Note that the eigenfunctions are only determined up to a sign. Additionally, we define the estimated squared difference of the $j$-th eigenvalues by
$$\hat E_{j,N} = \big( \hat\tau_j^{(1)} - \hat\tau_j^{(2)} \big)^2 .$$
In view of the testing problem (2.4) the natural quantity of interest is the statistic $\hat E_{j,N}$, based on the eigenvalues defined in (2.13). The null hypothesis (2.4) of no relevant change in the $j$-th eigenvalue is now rejected for large values of $\hat E_{j,N}$. To find critical values for such a test we determine the asymptotic distribution of $\hat E_{j,N}$, which presupposes the following standard identifiability assumption (see e.g. Horváth and Kokoszka (2012), Hall and Hosseini-Nasab (2006)).
Assumption 2.3. The first $p+1$ eigenvalues of the covariance kernel $c^{(i)}$ satisfy $\tau_1^{(i)} > \tau_2^{(i)} > \ldots > \tau_{p+1}^{(i)} > 0$ for $i = 1, 2$.

It will be shown in the Appendix that under Assumptions 2.1 and 2.3 the statistic $\hat E_{j,N}$ is asymptotically normal in the sense that
$$\sqrt{N}\, \big( \hat E_{j,N} - E_j \big) \to^{D} \mathcal{N}(0, \sigma_j^2) , \qquad (2.14)$$
where the symbol $\to^{D}$ denotes weak convergence, $E_j := (\tau_j^{(1)} - \tau_j^{(2)})^2$ is the squared (unknown) difference between the $j$-th eigenvalues of the kernels $c^{(1)}$ and $c^{(2)}$, and $\mathcal{N}(\mu, \sigma^2)$ denotes a normal distribution with mean $\mu$ and variance $\sigma^2$. In particular, if $\sigma^2 = 0$ this distribution is defined as the point measure with probability mass 1 at the point $\mu$. The variance of the normal distribution in (2.14) can be decomposed into contributions of the two samples, and these long run variances are notoriously difficult to estimate. We circumvent this problem using self-normalization techniques. This concept has been introduced for change point detection in a seminal paper by Shao and Zhang (2010) and has since been used by many authors. While most of this literature concentrates on classical change point problems, Dette et al. (2018) introduced a novel type of self-normalization for relevant hypotheses and used it to define a self-normalized test for a relevant change in the mean of a time series. In the following we will further develop this concept to detect relevant changes in the spectrum. For this purpose we define a normalizing factor
$$\hat V_{j,N} = \Big( \int_0^1 \lambda^2 \big( \hat E_j(\lambda) - \hat E_{j,N} \big)^2 \, \nu(d\lambda) \Big)^{1/2} , \qquad (2.16)$$
where $\hat E_j(\lambda)$ denotes the sequential analogue of $\hat E_{j,N}$ based on the estimates in (2.11) and $\nu$ is a probability measure on the interval $(0,1)$. Even though the specific choice of $\nu$ in (2.16) is generally not influential, it is numerically convenient to use a discrete measure in applications rather than some mathematically more natural choice like the Lebesgue measure. The next proposition is the central building block in the proof of the feasibility of the normalization approach.

Proposition 2.4. Suppose that Assumptions 2.1 and 2.3 hold, that $\varepsilon < \min\{\theta_0, 1 - \theta_0\}$ and that the corresponding long run variances are positive for some $j \in \{1, \ldots, p\}$. Then the weak convergence (2.17) holds for the statistic $\hat E_{j,N}$ jointly with its self-normalizer $\hat V_{j,N}$.

Combining the weak convergence in (2.17) with the continuous mapping theorem yields
$$\frac{\hat E_{j,N} - E_j}{\hat V_{j,N}} \to^{D} W ,$$
where the random variable $W$ is a pivot. Some quantiles of the distribution of $W$ can be found in Table 1 (where $\nu$ is a discrete uniform distribution). We can now construct an asymptotic level-$\alpha$ test rejecting the null hypothesis in (2.4) whenever
$$\hat E_{j,N} > \Delta_\tau + q_{1-\alpha} \hat V_{j,N} , \qquad (2.20)$$
where $q_{1-\alpha}$ is the asymptotic $(1-\alpha)$-quantile of the distribution of $W$. These considerations are summarized in Theorem 2.5 (see (2.21)).

It should be noted that by the same arguments as above a test can be constructed for the hypothesis of a relevant difference in the eigenvalues before and after the change point, that is
$$H_0: \big( \tau_j^{(1)} - \tau_j^{(2)} \big)^2 \ge \Delta_\tau \quad \text{versus} \quad H_1: \big( \tau_j^{(1)} - \tau_j^{(2)} \big)^2 < \Delta_\tau . \qquad (2.22)$$
The corresponding test rejects the null hypothesis whenever
$$\hat E_{j,N} < \Delta_\tau + q_{\alpha} \hat V_{j,N} , \qquad (2.23)$$
and the same arguments as in the proof of Theorem 2.5 show that this decision rule defines a consistent and asymptotic level-$\alpha$ test. The formulation of the hypotheses in the form (2.22) is useful if one wants to establish the similarity between the eigenvalues at a controlled type I error. Hypotheses of the form (2.22) are frequently investigated in biostatistics, in particular in bioequivalence studies (see, for example, Wellek (2010)).
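The construction can be mimicked numerically. The sketch below computes the $j$-th eigenvalue of empirical covariance kernels, the statistic $\hat E_{j,N}$ for two given samples and a self-normalizer built from sequential eigenvalue differences over a discrete measure $\nu$. The $\lambda^2$ weighting and the use of the first $\lfloor \lambda n_i \rfloor$ observations of each sample are assumptions of this illustration, not formulas quoted from the paper.

```python
import numpy as np

def jth_eigenvalue(curves, j, h):
    """j-th largest eigenvalue of the empirical covariance kernel of
    centered curves sampled on a uniform grid with spacing h."""
    c = (curves.T @ curves) / len(curves)       # kernel on the grid
    evals = np.linalg.eigvalsh(c * h)           # discretized operator
    return np.sort(evals)[::-1][j - 1]

def self_normalized_statistic(sample1, sample2, j, h, nu=np.arange(1, 20) / 20):
    """Squared eigenvalue difference E_hat and a self-normalizer V_hat
    built from its sequential analogues over the discrete measure nu."""
    n1, n2 = len(sample1), len(sample2)
    E_full = (jth_eigenvalue(sample1, j, h) - jth_eigenvalue(sample2, j, h)) ** 2
    seq = np.array([
        (jth_eigenvalue(sample1[: max(int(lam * n1), 2 * j)], j, h)
         - jth_eigenvalue(sample2[: max(int(lam * n2), 2 * j)], j, h)) ** 2
        for lam in nu
    ])
    V = np.sqrt(np.mean(nu ** 2 * (seq - E_full) ** 2))
    return E_full, V

# usage: first eigenvalue 1 before the change, 0.5 after, so E_1 = 0.25
rng = np.random.default_rng(2)
grid = np.linspace(0, 1, 101)
h = grid[1] - grid[0]
f1 = np.sqrt(2) * np.sin(2 * np.pi * grid)
s1 = rng.normal(size=(2000, 1)) * f1[None, :]
s2 = rng.normal(scale=np.sqrt(0.5), size=(2000, 1)) * f1[None, :]
E_full, V = self_normalized_statistic(s1, s2, 1, h)
```

A decision rule of the form "reject if $\hat E_{j,N} > \Delta_\tau + q_{1-\alpha}\hat V_{j,N}$" then needs no estimate of the long run variance, only the pivot quantile.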

Relevant changes in the eigenfunctions
Similar techniques as in the preceding section can be employed in the analysis of the hypothesis (2.5) of no relevant change in the $j$-th eigenfunction. This task is slightly more intricate, as we are now dealing with functions in $L^2[0,1]$ instead of real numbers.
Recall the definition of the estimated eigenfunctions in (2.13). As we have already noticed, such functions are only determined up to a sign. Thus, to make comparisons meaningful, we always assume that the inner products $\langle \hat v_j^{(1)}(\lambda), \hat v_j^{(2)}(\lambda) \rangle$ are non-negative for all $\lambda \in [0,1]$. This assumption is made solely for the sake of a clear presentation. It can be dropped if, in the testing problem (2.5) and in the subsequently presented test statistic, all occurring distances $\|v - w\|$ are replaced by $\min(\|v - w\|, \|v + w\|)$. This is also how the statistic should be understood in applications.
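The sign-invariant distance just described can be sketched in a few lines; the grid and quadrature weights are our own illustrative conventions.

```python
import numpy as np

def aligned_distance(v, w_func, weights):
    """Squared L^2 distance between eigenfunctions, invariant to the
    arbitrary sign of each estimate: min(||v - w||^2, ||v + w||^2)."""
    d_minus = np.sum(weights * (v - w_func) ** 2)
    d_plus = np.sum(weights * (v + w_func) ** 2)
    return min(d_minus, d_plus)

grid = np.linspace(0, 1, 101)
w = np.gradient(grid)
v = np.sqrt(2) * np.sin(2 * np.pi * grid)
u = np.sqrt(2) * np.cos(2 * np.pi * grid)
# an estimator may return the eigenfunction with either sign
assert np.isclose(aligned_distance(v, -v, w), 0.0)
```

For a genuine rotation by an angle $\varphi$ in the plane spanned by two orthonormal functions, the aligned distance is approximately $2 - 2\cos(\varphi)$.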
We estimate the squared difference $D_j = \| v_j^{(1)} - v_j^{(2)} \|^2$ by
$$\hat D_{j,N} = \big\| \hat v_j^{(1)} - \hat v_j^{(2)} \big\|^2 ,$$
where $\hat v_j^{(1)}$ and $\hat v_j^{(2)}$ are defined as the eigenfunctions of the estimated covariance operators from the samples $X_1, \ldots, X_{\lfloor N\hat\theta \rfloor}$ and $X_{\lfloor N\hat\theta \rfloor + 1}, \ldots, X_N$, respectively (see (2.11)). We also introduce the normalizing factor $\hat U_{j,N}$, defined analogously to (2.16) with the sequential estimates $\hat E_j(\lambda)$ replaced by the sequential squared distances $\hat D_j(\lambda)$. We propose to reject the null hypothesis of no relevant change in the $j$-th eigenfunction in (2.5) whenever
$$\hat D_{j,N} > \Delta_v + q_{1-\alpha} \hat U_{j,N} . \qquad (2.25)$$
The following result shows that this test has asymptotic level $\alpha$ and is consistent. The proof can be found in Section A.3 in the appendix.
Theorem 2.6. Suppose that Assumptions 2.1 and 2.3 hold, that $\varepsilon < \min\{\theta_0, 1 - \theta_0\}$, that $j \in \{1, \ldots, p\}$ and that the corresponding long run variances are positive. Then the test defined in (2.25) has asymptotic level $\alpha$ and is consistent for the hypotheses (2.5), that is (2.26).

We conclude this section with a brief remark that extends our results to non-centered data. This is of particular importance in applications such as the one presented in Section 3.4.
Remark 2.7. We conclude this section with some application-oriented remarks.

1. A careful inspection of the proofs in Appendix A shows that all results in this section remain true if the sequences of random variables $(X_n^{(i)})_{n \in \mathbb{Z}}$ are not centered. In this case the estimators of the covariance kernels in (2.9) and (2.10) have to be modified by replacing each observation $X_n$ by the centered observation $X_n - \hat\mu^{(i)}$, where $\hat\mu^{(1)}$ and $\hat\mu^{(2)}$ denote the empirical mean functions of the samples $X_1, \ldots, X_{\lfloor N\hat\theta \rfloor}$ and $X_{\lfloor N\hat\theta \rfloor + 1}, \ldots, X_N$, respectively.
2. In practice, an important question is the appropriate choice of the threshold, which has to reflect the specific scientific context of each particular application. The classical approach avoids this choice by simply putting $\Delta = 0$, but we highly recommend choosing this threshold carefully, considering the scientific background of the testing problem. Sometimes theoretical arguments can be made to support a proper choice. For instance, in the domain of PCA one might be interested, for two samples $X_1, \ldots, X_{\lfloor N\theta_0 \rfloor}$ and $X_{\lfloor N\theta_0 \rfloor + 1}, \ldots, X_N$, in how much variance occurs in the direction $v_1^{(1)}$ in the second sample. For this purpose we consider the relative variance measure $R$, the ratio of the variance explained by $v_1^{(1)}$ in the second sample to the maximally explainable variance $\tau_1^{(2)}$.

If $v_1^{(1)} = v_1^{(2)}$, it follows that $R = 1$ (maximal amount of variance explained), and if $v_1^{(1)}$ is orthogonal to the data from the second sample, then $R = 0$ (no variance explained). The numerator of $R$ can be bounded in terms of the leading eigenvalues. One might want to choose a threshold $\Delta_v$ such that, e.g., $R > 0.7$, to explain at least 70% of the variance along $v_1^{(2)}$. In this case we can use a lower bound for $R$ to solve for a suitable $\Delta_v$.
3. Occasionally, specifying a threshold of relevance may be inherently difficult, and hence the complementary approach of considering confidence intervals may be more suitable, which is also covered by our theory. More precisely, an asymptotic, one-sided $(1-\alpha)$ confidence interval for $D_j$ is given by $\big[ \max\big(0, \hat D_{j,N} - q_{1-\alpha} \hat U_{j,N}\big), \infty \big)$. Similarly, a confidence interval for the distance between the $j$-th eigenvalues of the covariance operators before and after the change point is given by $\big[ \max\big(0, \hat E_{j,N} - q_{1-\alpha} \hat V_{j,N}\big), \infty \big)$.
4. Sometimes it may be of interest to test for relevant changes at two or more thresholds simultaneously. We want to point out that, as long as the number of thresholds is fixed, the convergence in (2.26) guarantees (asymptotic) rejection for all choices of $\Delta_v < D_j$ simultaneously, with probability 1, as well as asymptotic level $\alpha$ (since asymptotically the only potential rejection under the null occurs for $\Delta_v = D_j$, the probability of which converges to $\alpha$). It is also important to point out that for finite samples the self-normalized test is monotone in $\Delta_v$. In particular, if the hypothesis defined in (2.5) is accepted for some $\Delta_v$, it is also accepted for all larger $\Delta_v$. Correspondingly, rejection for some $\Delta_v$ means rejection for all smaller thresholds. In this sense the results of evaluating the test for multiple choices of $\Delta_v$ are logically consistent for the user.
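One way to make the measure $R$ computable is the following sketch, in which $R$ is formalized as the variance of the scores along the old direction $v$ divided by the largest eigenvalue of the second sample. This formalization is our own assumption for illustration, not a formula quoted from the paper.

```python
import numpy as np

def relative_variance(v, curves2, w):
    """R = Var(<X, v>) / tau_1^(2): the share of the second sample's
    maximal (first-eigenvalue) variance captured along the unit-norm
    direction v.  Assumed formalization of the measure R discussed above."""
    centered = curves2 - curves2.mean(axis=0)
    scores = centered @ (w * v)                  # inner products <X_n, v>
    sw = np.sqrt(w)
    c_hat = centered.T @ centered / len(centered)
    tau1 = np.linalg.eigvalsh(sw[:, None] * c_hat * sw[None, :]).max()
    return scores.var() / tau1

# usage: the second sample varies along sqrt(2)*sin(2*pi*t); the old
# direction explains almost all variance if it equals that function,
# almost none if it is orthogonal to it
rng = np.random.default_rng(4)
grid = np.linspace(0, 1, 101)
w = np.gradient(grid)
f_sin = np.sqrt(2) * np.sin(2 * np.pi * grid)
f_cos = np.sqrt(2) * np.cos(2 * np.pi * grid)
curves2 = rng.normal(size=(1000, 1)) * f_sin[None, :] \
          + 0.05 * rng.normal(size=(1000, 101))
R_aligned = relative_variance(f_sin, curves2, w)
R_orthogonal = relative_variance(f_cos, curves2, w)
```

A requirement such as $R > 0.7$ can then be translated back into an admissible threshold $\Delta_v$ for the distance between the eigenfunctions.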

Finite sample properties
In this section we investigate the performance of the new tests by means of a small simulation study and illustrate potential applications in a data example. All simulations are based on 4000 simulation runs. We are interested in tests of the hypotheses of no relevant differences in the eigenvalues and eigenfunctions as defined in (2.4) and (2.5), respectively. In the subsequent results the measure $\nu$ in the statistics $\hat V_{j,N}$ and $\hat U_{j,N}$ is the uniform measure on the points $1/20, 2/20, \ldots, 19/20$ (see Table 1, $K = 20$, for the critical values of the distribution of $W$). Furthermore we assume that the change point is located at $\lfloor N/2 \rfloor$, that is $\theta_0 = 1/2$.

Relevant changes in the eigenvalues
We investigate the rejection probabilities of the test (2.20) for the hypothesis of no relevant change in the first and second eigenvalue. To generate data we assume that the observed functions are smoothed over the real Fourier basis of order $T$, which is defined for odd $T$ as
$$f_1(t) = 1, \quad f_{2k}(t) = \sqrt{2}\sin(2\pi k t), \quad f_{2k+1}(t) = \sqrt{2}\cos(2\pi k t), \quad k = 1, \ldots, (T-1)/2 . \qquad (3.27)$$
Following Aue et al. (2009) we set $T = 21$, even though higher dimensions are feasible.
We define the covariance kernels in terms of the Fourier basis as
$$c^{(1)}(s,t) = \sum_{k=1}^{T} \tau_k f_k(s) f_k(t), \qquad c^{(2)}(s,t) = \sum_{k=1}^{T} \tau_k \big( 1 - \sqrt{E}\, \mathbb{1}\{k \le 4\} \big) f_k(s) f_k(t),$$
where $\tau_k := 1/k^2$ for $k = 1, \ldots, T$ and the parameter $E$ varies in the interval $[0,1]$. Obviously the squared difference of the $j$-th eigenvalues of $c^{(1)}$ and $c^{(2)}$ is $E_j = E/j^4$ for $j = 1, \ldots, 4$ and 0 for all larger $j$.
As we have seen in Section 2, the squared $L^2$-distance between the kernels $c^{(1)}$ and $c^{(2)}$ is important for the performance of the change point estimator (2.6). In the present case it is given by
$$\big\| c^{(1)} - c^{(2)} \big\|^2 = E \sum_{k=1}^{4} \tau_k^2 = E \sum_{k=1}^{4} \frac{1}{k^4} .$$
The simulated data are generated by randomly sampling sets of Fourier coefficients according to the above kernels. First we generate $N+1$ i.i.d. random vectors $\epsilon_n := (\epsilon_{n,1}, \ldots, \epsilon_{n,T}) \sim \mathcal{N}(0, \mathrm{diag}(\tau_1, \ldots, \tau_T))$, $n = 0, \ldots, N$. To introduce potential dependence, we define a matrix $\Psi \in \mathbb{R}^{T \times T}$ with i.i.d. normally distributed entries $\Psi_{l,k} \sim \mathcal{N}(0, \psi)$ and consider the coefficients $a_n := \epsilon_n + \Psi \epsilon_{n-1}$. For $n = \lfloor N\theta_0 \rfloor + 1, \ldots, N$ we downscale the first four components of $a_n := (a_{n,1}, \ldots, a_{n,T})^\top$ by the factor $(1 - \sqrt{E})^{1/2}$. Finally the process $\{X_n\}_{n = 1, \ldots, N}$ is defined by
$$X_n(t) = \sum_{k=1}^{T} a_{n,k} f_k(t) . \qquad (3.28)$$
An immediate calculation reveals that for $n = 1, \ldots, \lfloor N\theta_0 \rfloor$ the random variable $X_n$ has covariance kernel $c^{(1)}$ and for $n = \lfloor N\theta_0 \rfloor + 1, \ldots, N$ the covariance kernel of $X_n$ is given by $c^{(2)}$. The dependence of the data is determined by the choice of $\psi$: for $\psi = 0$ we generate i.i.d. data and for $\psi > 0$ an fMA(1) process. In the latter case we choose $\psi$ such that $\mathbb{E}\|\Psi\|_{L^1} = 1$. In each simulation run we use a new realization of $\Psi$ to generate the complete sample $X_1, \ldots, X_N$. In Figures 1 and 2 we display the rejection probabilities of the test (2.20) for the hypothesis (2.4) of no relevant change in the first and second eigenvalue, at level $\alpha = 5\%$. The threshold $\Delta_\tau$ is 0.1 for $j = 1$ and 0.005 for $j = 2$, and the tuning parameter in the estimator (2.6) is chosen as $\varepsilon = 0.05$.
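As far as the mechanism is recoverable from the text, the data generation of this section can be sketched as follows. The downscaling factor $(1-\sqrt{E})^{1/2}$ is chosen so that the $j$-th eigenvalue becomes $(1-\sqrt{E})/j^2$ and hence $E_j = E/j^4$; the evaluation grid is our own choice.

```python
import numpy as np

def fourier_basis(T, grid):
    """Real Fourier basis (3.27): f_1 = 1, f_{2k} = sqrt(2) sin(2 pi k t),
    f_{2k+1} = sqrt(2) cos(2 pi k t), for odd T."""
    B = [np.ones_like(grid)]
    for k in range(1, (T - 1) // 2 + 1):
        B.append(np.sqrt(2) * np.sin(2 * np.pi * k * grid))
        B.append(np.sqrt(2) * np.cos(2 * np.pi * k * grid))
    return np.array(B)

def simulate_sample(N, T=21, E=0.25, psi=0.0, theta0=0.5, seed=0):
    """Fourier scores with variances tau_k = 1/k^2, optional fMA(1)
    dependence via a random matrix Psi, and a break that downscales the
    first four score components after time floor(N*theta0)."""
    rng = np.random.default_rng(seed)
    tau = 1.0 / np.arange(1, T + 1) ** 2
    eps = rng.normal(size=(N + 1, T)) * np.sqrt(tau)
    if psi > 0:
        Psi = rng.normal(scale=np.sqrt(psi), size=(T, T))
        a = eps[1:] + eps[:-1] @ Psi.T           # fMA(1) coefficients
    else:
        a = eps[1:]                              # i.i.d. coefficients
    k = int(np.floor(N * theta0))
    a[k:, :4] *= np.sqrt(1 - np.sqrt(E))         # break in first 4 eigenvalues
    grid = np.linspace(0, 1, 101)
    return a @ fourier_basis(T, grid), grid

X, grid = simulate_sample(400)
```

The average squared curve values drop visibly after the break, since the leading eigenvalues carry most of the variance.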
According to the theoretical discussion in Theorem 2.5 the test should have rejection probabilities smaller than, close to, and larger than $\alpha$ if $E_j < \Delta_\tau$ (interior of the null hypothesis), $E_j = \Delta_\tau$ (boundary of the null hypothesis) and $E_j > \Delta_\tau$ (alternative), respectively. For the first eigenvalue (Figure 1) we observe a good approximation of the nominal level at the boundary of the null hypothesis, even if the sample size is $N = 200$, and reasonable power. For the second eigenvalue (Figure 2) the test is slightly conservative for $N = 200$ at the boundary of the null hypothesis, but the level is close to $\alpha$ for $N = 400$ and $N = 600$. A comparison of the left and right parts of Figures 1 and 2 shows that dependence in the data has only a small impact on both type I and type II errors, even though a subtle increase is visible. Further simulations with different distributions of the Fourier coefficients show that the results are stable in this respect, although heavier tails lead to a loss of power. These results are not reported for the sake of brevity.

Relevant changes in the eigenfunctions
To investigate the finite sample properties of the test (2.25) for the hypothesis of no relevant change in the $j$-th eigenfunction, we define covariance kernels that share the eigenvalues of the previous section, that is $\tau_k = 1/k^2$ for $k = 1, \ldots, T$, but differ in their eigenfunctions: the eigenfunctions of $c^{(2)}$ are obtained from those of $c^{(1)}$ by a rotation of the first two elements of the basis $\{f_1, \ldots, f_T\}$ defined in (3.27). In Figure 3 we display representative samples of functions before and after the change has occurred, for a true distance $\| v_1^{(1)} - v_1^{(2)} \|^2 = 0.3$. Despite this relatively large difference it is difficult to distinguish the two samples visually.
A simple calculation yields the $L^2$-distance between the kernels and the distance between the first and second eigenfunctions. By construction the two kernels $c^{(1)}$ and $c^{(2)}$ in this example are very similar, and therefore the estimation of the change point is a challenging task. Any further difference in the eigensystems would increase the $L^2$-distance of the kernels and thus facilitate the problem of change point detection.
Here $R_{1,2}(\varphi)$ is the rotation matrix of the first two components by an angle $\varphi$. Note that for $n = 1, \ldots, \lfloor N\theta_0 \rfloor$ the random variable $X_n$ in (3.28) has covariance kernel $c^{(1)}$ and for $n = \lfloor N\theta_0 \rfloor + 1, \ldots, N$ the covariance kernel of $X_n$ is given by $c^{(2)}$. In Figures 3 and 4 we display the rejection probabilities of the test (2.25) for the hypothesis (2.5) of no relevant change in the first and second eigenfunction (threshold $\Delta_v = 0.1$), at level $\alpha = 5\%$. The tuning parameter in the change point estimator (2.6) is given by $\varepsilon = 0.05$.
We observe a good approximation of the nominal level at the boundary of the null hypothesis ($D_j = \Delta_v$), and the test detects alternatives with decent power. Similar to the investigation of the eigenvalues, additional dependence has little impact on the results. An interesting difference occurs in the consideration of the second eigenfunction: whereas for the second eigenvalue the self-normalized test (2.20) is slightly more conservative for small sample sizes, for the second eigenfunction the test (2.25) becomes slightly more liberal in this case.

Details on the change point estimator in (2.6)
In this section we discuss aspects of the change point estimator defined in (2.6) in greater depth. First we consider the impact of the choice of the boundary parameter $\varepsilon$ on the performance of the tests. In practice the choice $\varepsilon = 0.05$ encodes only moderate prior knowledge, but one might want to use even smaller values, or more radically put $\varepsilon = 0$. We therefore consider the model of the preceding section, for the first eigenfunction with an i.i.d. sample of size $N = 400$, and investigate the impact of the choices $\varepsilon = 0, 0.005, 0.01, 0.05$ on the performance of the test (2.25) for the hypothesis (2.5) of no relevant difference in the first eigenfunction. The results are displayed in Figure 6a and indicate a high stability with respect to the choice of $\varepsilon$. Whereas the power of the test is hardly influenced by the choice of $\varepsilon$, we observe that the choice $\varepsilon = 0$ produces comparatively high rejection probabilities under the null hypothesis, particularly for small values of $D_j$. This effect can be explained as follows. Under the alternative the two samples $X_1, \ldots, X_{\lfloor \theta_0 N \rfloor}$ and $X_{\lfloor \theta_0 N \rfloor + 1}, \ldots, X_N$ have quite distinct covariance structures, and so the change point estimator performs well regardless of the choice of $\varepsilon$. However, if $D_j$ is close to 0, the samples are nearly indistinguishable, so that the change point $\theta_0$ is estimated less accurately. In such cases the test has larger rejection probabilities. This interpretation is visualized in Figure 6b, where we display a histogram of the estimated change points using the estimator (2.6) with $\varepsilon = 0$ and true distance of the eigenfunctions $D_j = 0$. We observe that in this case the estimator frequently localizes the change point in the first or last bin. For the problem of testing for relevant differences in the eigenvalues, as considered in Section 3.1, similar effects can be observed; these are not reported to avoid redundancy.
We conclude this section by pointing out some advantages and weaknesses of the change point estimator for finite samples, as well as potential alternatives.
Remark 3.1. The change point estimator $\hat\theta$ defined in (2.6) and used throughout this paper detects the change $\theta_0$ by searching for structural breaks in the covariance operators of the samples. The benefit of this approach is that the estimate is based not only on information from one, but from all principal components simultaneously. However, a potential drawback is that subtle changes in later elements of the eigensystems could be difficult to detect, because they receive less weight. Under these circumstances one might consider solutions tailored to the specific testing problem, such as CUSUM-type estimators $\hat\theta^{(\tau)}$ and $\hat\theta^{(v)}$ in (3.30) and (3.31), which are based on the $j$-th eigenvalues $\hat\tau_{k,j}$ and eigenfunctions $\hat v_{k,j}$ of the kernels estimated from the subsamples $X_1, \ldots, X_k$ and $X_{k+1}, \ldots, X_N$, respectively. These solutions promise to focus more specifically on changes in the component under investigation.
While estimators of the form (3.30) and (3.31) might have their place in practice, it is important to note that in general these change point estimators do not consistently estimate the location $\theta_0$ of the change point, even if a change in the $j$-th component occurs. More precisely, consider an example of two-dimensional data from two samples of i.i.d. normally distributed random variables whose covariance matrices concentrate the variance on the first and the second coordinate, respectively. The first eigenvector (discrete eigenfunction) of the first sample is $(1,0)^\top$ and that of the second sample is $(0,1)^\top$. These eigenvectors have maximal distance, and more specifically the two data sets live on perpendicular subspaces of $\mathbb{R}^2$. A change could not be easier to detect. However, it can be shown that for large $N$ the distance between the estimated first eigenvectors $\hat v_{k,1}$ from the samples $X_1, \ldots, X_k$ and $X_{k+1}, \ldots, X_N$ is almost constant over a wide range of $k$. In particular the estimator $\hat\theta^{(v)}$ defined in (3.31) converges to $1/2$ for any true $\theta_0 \in (1/4, 3/4)$, due to the weighting factor $k(N-k)/N^2$. Similar examples of complete failure exist for the eigenvalues. It hence seems advisable to employ component-specific change point estimators with some caution.
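The failure mode described in this remark can be reproduced numerically. The sketch below uses a non-degenerate variant of the example (covariances diag(4,1) and diag(1,4) instead of rank-one ones, to keep the eigenproblems stable) and an assumed weighting $k(N-k)/N^2$ for an estimator of type (3.31); with a true break at $\theta_0 = 0.7$, the estimate still lands near $1/2$.

```python
import numpy as np

def first_evec(X):
    """First eigenvector of the empirical covariance of the rows of X."""
    c = np.cov(X, rowvar=False, bias=True)
    evals, evecs = np.linalg.eigh(c)
    return evecs[:, -1]

def theta_v(X, eps=0.05):
    """Component-specific estimator in the spirit of (3.31): weighted
    CUSUM of the (sign-aligned) distance between the first eigenvectors
    of the two subsamples."""
    N = len(X)
    ks = np.arange(int(N * eps), N - int(N * eps))
    def f(k):
        v1, v2 = first_evec(X[:k]), first_evec(X[k:])
        d = min(np.linalg.norm(v1 - v2), np.linalg.norm(v1 + v2))
        return k * (N - k) / N ** 2 * d
    return ks[np.argmax([f(k) for k in ks])] / N

rng = np.random.default_rng(3)
N, theta0 = 1000, 0.7
X = np.vstack([
    rng.normal(size=(int(N * theta0), 2)) * np.array([2.0, 1.0]),
    rng.normal(size=(N - int(N * theta0), 2)) * np.array([1.0, 2.0]),
])
theta_hat = theta_v(X)   # tends to land near 1/2, not at 0.7
```

The eigenvector distance is close to its maximum over a wide range of split points, so the weighting factor pulls the argmax toward the middle of the sample.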

A data example
In this subsection we apply the new methodology to identify relevant changes in temperature measurements in northern Germany. The data consist of daily temperature averages, published by the national meteorological agency "Der Deutsche Wetterdienst" (https://www.dwd.de/DE/Home/home_node.html), for the state of Bremen (Lat = 53.1°, Long = 8.7°) over the years 1890 to 2018. Due to incomplete data in the first years, as well as immediately after WWII, the years 1890-1893 and 1945-1946 have been removed from the data. This leaves us with 123 years of daily measurements.
We smooth this data and, given the estimated change point, calculate the respective eigenfunctions and eigenvalues before and after the change. As an illustration, Figure 7 shows the first three eigenfunctions before and after the estimated change point. We observe that the first eigenfunctions are quite similar, but larger differences can be found between the second and third eigenfunctions (see the lower panel in Figure 7). For the first five eigenfunctions we have applied the test (2.25) for the hypothesis (2.5) of no relevant change, to see whether we can find relevant differences for varying sizes of $\Delta_v$. To make our results more interpretable we translate the measure of similarity $\Delta_v$ into an angle $\varphi$: if the squared distance of two eigenfunctions is at least $\Delta_v \approx 0.58$, the "geometric angle" between them is at least $\varphi = \pi/4$. This is due to the fact that the angle $\varphi$ between two (unit norm) eigenfunctions $v, w$ determines their distance via $\|v - w\|^2 = 2 - 2\cos(\varphi)$.
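The conversion between the angle $\varphi$ and the threshold $\Delta_v$ can be checked in a few lines; this is just the planar identity for unit vectors, not code from the paper.

```python
import numpy as np

def threshold_from_angle(phi):
    # squared L2-distance of two unit vectors enclosing the angle phi:
    # ||v - w||^2 = ||v||^2 + ||w||^2 - 2<v, w> = 2 - 2*cos(phi)
    return 2.0 - 2.0 * np.cos(phi)

# sanity check with two explicit unit vectors at angle pi/4
phi = np.pi / 4
v = np.array([1.0, 0.0])
w = np.array([np.cos(phi), np.sin(phi)])
print(threshold_from_angle(phi))        # ≈ 0.586
print(np.linalg.norm(v - w) ** 2)       # same value
```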
In Table 2 we display the decisions of the test (2.25) for the hypothesis (2.5) of no relevant changes in the eigenfunctions, where different thresholds are considered. We observe that the test does not detect relevant changes in the first and second eigenfunctions. In contrast, the eigenfunctions of larger order display significant differences, which confirms the visualization in Figure 7. The test detects relevant changes in the third, fourth and fifth eigenfunctions for nearly all thresholds (and the same holds true for eigenfunctions of larger order).

Table 2
Relevance of the differences of the first five eigenfunctions for different relevant angles $\varphi$. Acceptance of the null hypothesis in (2.5) with $\Delta_v = 2 - 2\cos(\varphi)$ is denoted by "TRUE" (p-value > 10%), rejection by "FALSE". For rejections, superscripts indicate the probability of a less extreme event under the null.
To fully appreciate these results we have to take the estimated eigenvalues into account. The rapid decay of the first five estimates $\hat\tau^{(1)}_j$ and $\hat\tau^{(2)}_j$ before and after the change point indicates that most of the data's variance (in fact about 90% for each sample) is explained by the first principal components. Due to the similarity of the first eigenfunctions, a low-dimensional representation of all data may be obtained by projecting on a common, low-dimensional function space, which greatly facilitates subsequent analysis. In contrast, due to the great dissimilarity of the higher order eigenfunctions, finding a common space that captures, say, 95% of all variance will require much higher dimensions.
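The dimension needed to capture a given share of variance follows directly from the eigenvalue decay. A minimal sketch (the eigenvalues below are made up for illustration; the paper's estimates $\hat\tau_j$ are not reproduced here):

```python
import numpy as np

# hypothetical eigenvalue decay; not the paper's estimated values
tau = np.array([45.0, 8.0, 3.0, 1.5, 0.8, 0.4, 0.2, 0.1])
explained = np.cumsum(tau) / tau.sum()   # cumulative share of variance

# smallest dimension whose leading eigenspace explains at least 90% / 95%
d90 = int(np.searchsorted(explained, 0.90)) + 1
d95 = int(np.searchsorted(explained, 0.95)) + 1
print(d90, d95)
```

With this made-up decay a 90% representation needs only a few components, while pushing the target to 95% already requires a larger dimension, mirroring the discussion above.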
Beyond such fPCA-related considerations, the eigenvalues encode further valuable information about the data: they indicate how strongly each eigenfunction contributes to the measured functions. Displaying the eigenvalues 2-12 (those still of larger order than 0.01) in the graphic below reveals a striking trend: the estimated eigenvalues of the time period 1894-1988 decay faster than those of 1989-2018.
It should be noted that this trend persists if we base our estimates on a change point earlier in time, even if we split the data into equally sized halves. Of course, contamination of the second data set then leads to smaller differences, but the underlying trend of faster decay of the eigenvalues from the earlier time period remains visible. This indicates that the observed effects are not simply due to a suggestive change point selection.
Figure: Estimated eigenvalues for 1894-1988 (solid) and 1989-2018 (dotted).
To establish the relevance of these differences, we consider each eigenvalue with an individual threshold of relevance suited to its magnitude (proportional to the size of $\hat\tau^{(1)}_j$) and apply the test (2.20).
Table 3
Differences of the eigenvalues from the samples 1894-1988 and 1989-2018.
The visual inspection of the eigenvalues is consistent with the testing results: the eigenvalues of the covariance operators differ up to order 9, with decreasing relevance.
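Magnitude-proportional thresholds of this kind can be sketched as follows. The eigenvalues and the proportionality constant c are invented for illustration, and the plain comparison below only stands in for the actual self-normalized test (2.20).

```python
import numpy as np

# hypothetical eigenvalue estimates before/after the change and an assumed
# proportionality constant c; none of these values are taken from the paper
tau1 = np.array([45.0, 8.0, 3.0, 1.5, 0.8])
tau2 = np.array([40.0, 9.5, 2.0, 1.4, 0.75])
c = 0.10
delta = c * tau1   # per-eigenvalue relevance thresholds, scaled to magnitude

# a difference is flagged as relevant only if it exceeds its own threshold
relevant = np.abs(tau1 - tau2) > delta
print(relevant.tolist())
```

Scaling each threshold to $\hat\tau^{(1)}_j$ prevents the large leading eigenvalue from dominating the notion of relevance for the small trailing ones.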
One practical interpretation of these differences may be as follows: the faster decay of the eigenvalues of $\hat c^{(1)}$ compared to those of $\hat c^{(2)}$ indicates that the observations from 1894-1988 are less influenced by higher order eigenfunctions than those from 1989-2018. Given that the eigenfunctions become more irregular with increasing order (compare Figure 7), this might imply rougher data, i.e. more short-term variability in temperatures in recent years than in the first part of the 20th century.
Acknowledgements
The authors thank the referees for their constructive comments on an earlier version of this manuscript.

Appendix A: Proofs
For clarity of presentation we confine ourselves to the case j = 1, i.e. differences in the first eigenfunction and eigenvalue. The general case follows by exactly the same arguments.
An important feature of the proofs is the replacement of the estimated change point $\hat\theta$ by the deterministic, true change point $\theta_0$. If we knew the true change point, we could construct the ideal estimated covariance kernels $\tilde c^{(1)}(\lambda, s, t)$ and $\tilde c^{(2)}(\lambda, s, t)$, defined for $\lambda \in [0, 1]$ in (A.1). These kernels, as well as the corresponding eigensystems for $i = 1, 2$, will be frequently referred to in the following section.

A.1. Proof of Proposition 2.4
Recall the definition of the eigenvalues $\hat\tau^{(i)}_{1,\lambda}$, $i = 1, 2$, of the estimated kernels $\hat c^{(i)}(\lambda, \cdot, \cdot)$ in (2.9) and (2.10). In order to prove the proposition, we establish the weak convergence (A.2), where $B$ is a standard Brownian motion. The statement then follows by an application of the continuous mapping theorem. The proof of (A.2) consists of two steps.
Step 1: First we demonstrate that using the change point estimate $\hat\theta$ is asymptotically as good as knowing the true location $\theta_0$ of the change point; more precisely, the approximation (A.3) holds uniformly with respect to $\lambda \in [0, 1]$. To establish this equality we show (A.4) uniformly in $\lambda \in [0, 1]$. Deducing (A.3) from (A.4) is then a simple calculation.
To obtain an upper bound on the left side of (A.4), we employ Lemma 2.2 from Horváth and Kokoszka (2012), which bounds the eigenvalue difference by the distance of the kernels. For the difference of the kernels we obtain the decomposition (A.6), where the second equality follows by a straightforward rearrangement of the terms. Notice that $|1/\hat\theta - N/\lfloor N\theta_0 \rfloor| = o_P(1/\sqrt{N})$, which follows immediately from Proposition 2.2. An application of the triangle inequality, together with Birkhoff's theorem, shows that the $L^2([0,1] \times [0,1])$-norm of the second term on the right side of (A.6) is of order $o_P(1/\sqrt{N})$. The $L^2([0,1] \times [0,1])$-norm of the first term on the right of (A.6) is also of order $o_P(1/\sqrt{N})$. Counting the summands we see that centering only yields a further term of order $o_P(1/\sqrt{N})$. We now use Theorem B.1 from Appendix B, which yields an approximation by a Gaussian process $\Gamma_N$.

The expectation of the resulting bound can be controlled in terms of a standard Brownian motion $B$; it converges to 0 by the dominated convergence theorem together with the almost sure continuity of the paths of the Brownian motion (see Billingsley, Section 37). These considerations yield (A.4) and hence (A.3).
Step 2: We now prove the weak convergence (A.2).
Step 1 and straightforward calculations yield a representation of the process $\{G_N(\lambda)\}_{\lambda \in [0,1]}$, which we analyze further using Proposition B.2 in Appendix B. Weak convergence of the process $\{G_N(\lambda)\}_{\lambda \in [0,1]}$ then follows by an application of the continuous mapping theorem and the weak convergence of the vector valued process in (A.8). For a proof of this statement we show asymptotic tightness and convergence of the finite dimensional distributions. We therefore introduce the random variables $Y^{(i)}_n$ in (A.9) and their $m$-dependent counterparts $Y^{(i)}_{n,m}$ in (A.10), where the random functions $X^{(i)}_{n,m}$ are defined in (2.7). Asymptotic tightness can be shown coordinate-wise, so we verify it exemplarily for the first component. As tightness of a stochastic process $(G(\lambda))_{\lambda \in [0,1]}$ in $\ell^\infty[0,1]$ implies tightness of $(\lambda G(\lambda))_{\lambda \in [0,1]}$, it suffices to show tightness of the corresponding partial sum process. Adapting the proof of Lemma 2.1 in Berkes et al. (2013) shows that under the assumptions stated in Section 2 the approximation (A.13) holds for any $x > 0$. By this result and the Cauchy-Schwarz inequality, the second term on the right side of (A.12) converges to 0 as $m \to \infty$.
Next we split the sum of the $m$-dependent variables into $m$ groups of independent, identically distributed random variables. In the last step we use that all sums are identically distributed, and we assume without loss of generality that $\lfloor N\theta_0 \rfloor q$ and $\lfloor N\theta_0 \rfloor r$ are divisible by $m$ (otherwise the remaining term is asymptotically negligible). Taking the limit with respect to $N$ we observe that the right side converges to 0, which follows from the asymptotic tightness of partial sums of independent random variables, as presented in van der Vaart and Wellner (1996). The remaining part of this proof consists in verifying the marginal convergence in (A.8). More precisely, for a finite number of parameters $0 \le \lambda_1 \le \ldots \le \lambda_K \le 1$ we prove weak convergence of the corresponding random vector by an application of the Cramér-Wold device. Again, (A.13) and the Cauchy-Schwarz inequality show that we may replace this vector by its $m$-dependent version. Now let $a_1, \ldots, a_K$ and $b_1, \ldots, b_K$ be arbitrary real numbers and consider the sum (A.14) (here we put $\lambda_0 = 0$). To establish weak convergence we use the central limit theorem from Berk (1973) for $m$-dependent triangular arrays of random variables. The only non-trivial assumption of this theorem in our case is the convergence of the variance of (A.14), which will be established next. The covariance between the two groups of terms (corresponding to $i = 1, 2$) is of order $o(1)$ due to $m$-dependence. Consequently, we can investigate the variance of each of the two terms in (A.14) separately. Iterating the argument shows that we may confine ourselves to the variances of the terms for $j = 1, \ldots, K$ separately. The subsequent convergence arguments are the same for all remaining terms, and we exemplarily consider only the first one.
Finally we have to show that for $m \to \infty$ the sum on the right-hand side converges to the desired limiting variance, which follows from (A.17); note that $\sigma^{(1)}_E$ is positive by assumption. To prove (A.17), consider the estimate in which each term can be bounded, according to the Cauchy-Schwarz inequality, by (A.18). The expectation on the right can be further analyzed by plugging in the definitions of $Y_0$ and $Y_{0,m}$ (see (A.9), (A.10)), yielding a bound of the form $L(2m+1)\big(E\|X_0 - X_{0,m}\|^4\big)^{1/4}$ for some fixed constant $L > 0$. According to Assumption 2.1 there exists a sufficiently small number $\eta > 0$ such that the sequence $\big(E\|X_0 - X_{0,m}\|^4\big)^{1/4+\eta}$ is summable, and we obtain $\lim_{m\to\infty} (2m+1)\big(E\|X_0 - X_{0,m}\|^4\big)^{1/4} = 0$.
Analogously to $\sigma^{(1)}_E$ we define the long run variance $\sigma^{(2)}_E$ for the random variables $Y^{(2)}_n$ ($\sigma^{(2)}_E$ is positive by assumption). We have now verified all conditions of the theorem in Berk (1973), which implies convergence of the finite dimensional distributions. Consequently the weak convergence in (A.8) follows, where $B_1$ and $B_2$ are independent, standard Brownian motions. Combining these considerations with the continuous mapping theorem yields the weak convergence of $\{G_N(\lambda)\}_{\lambda \in [0,1]}$, which completes the proof.

A.2. Proof of Theorem 2.5
Recall that the probability of rejecting the null hypothesis in (2.5) is given by (A.21). Suppose that $E_1 > 0$, i.e. that the first eigenvalues of $c^{(1)}$ and $c^{(2)}$ are different. Then Proposition 2.2 yields the limiting behavior of $\hat E_{1,N}$, where the random variable $W$ is defined in (2.19). The same result shows that $\hat V_{1,N}$ is of order $o_P(1)$, which implies that $(\Delta_\tau - E_1)/\hat V_{1,N}$ converges to $+\infty$ if $\Delta_\tau > E_1$ and to $-\infty$ if $\Delta_\tau < E_1$, both in probability. This implies consistency and level $\alpha$ as stated in (2.21) in the case $E_1 > 0$. Now suppose that $E_1 = 0$. In this case we show that $\hat E_{1,N}$ and $\hat V_{1,N}$ are of order $o_P(1)$. Then, with probability converging to 1, $\hat E_{1,N}/\hat V_{1,N}$ is asymptotically negligible compared to $\Delta_\tau/\hat V_{1,N}$, which converges to infinity. First, suppose that the kernels $c^{(1)}$ and $c^{(2)}$ are not equal. Proposition 2.2 implies that we may replace the change point estimator in the numerator $\hat E_{1,N}$ and the denominator $\hat V_{1,N}$ by the actual change point, only incurring asymptotically vanishing terms. More precisely, the denominator in (A.21) can be rewritten using the equality of the eigenvalues. Now Proposition B.2 in Appendix B shows that $\hat V_{1,N} = o_P(1)$. Finally, similar but simpler arguments show that $\hat E_{1,N} = o_P(1)$.
In the case of equality $c^{(1)} = c^{(2)}$, the estimator of the change point assumes some uninformative value inside the interval $[\varepsilon, 1-\varepsilon]$. Consider now the inequality which follows from Lemma 2.2 in Horváth and Kokoszka (2012): the resulting expression is of order $o_P(1)$ uniformly in $\lambda$ by Lemma B.1 in the supplementary material of Aue et al. (2018). Applying the same argument to the second term yields $\hat V_{1,N} = o_P(1)$. The corresponding arguments for the estimate $\hat E_{1,N} = o_P(1)$ are similar and therefore again omitted.

A.3. Proof of Theorem 2.6
Recall the definition of the eigenfunctions $\hat v^{(i)}_{1,\lambda}$ of the estimated kernels $\hat c^{(i)}(\lambda, \cdot, \cdot)$ in (2.9) and (2.10). Similarly as in the proof of Theorem 2.5, we prove the weak convergence of the process $\{H_N(\lambda)\}_{\lambda \in [0,1]}$. The result then follows by similar arguments as given in the proof of Theorem 2.5 and the continuous mapping theorem. First we replace the estimated by the true change point, showing (A.23) uniformly in $\lambda \in [0, 1]$; the estimate (A.24) then follows from (A.23) by a simple application of the Cauchy-Schwarz inequality. Note that we may confine ourselves to considering $\lambda \in (1/\sqrt{N}, 1)$, since for $\lambda \in (0, 1/\sqrt{N})$ the left side of (A.24) is bounded above by $2/N$. To derive (A.24) for $\lambda > 1/\sqrt{N}$ we use Lemma 2.3 from Horváth and Kokoszka (2012) and obtain (A.25), where $\tilde\tau^{(1)}_{1,\lambda}$ and $\tilde\tau^{(1)}_{2,\lambda}$ are the eigenvalues of the covariance kernel $\tilde c^{(1)}(\lambda, \cdot, \cdot)$ defined in (A.1). We now consider numerator and denominator separately.
Beginning with the denominator, we first note that by consistency of the estimated eigenvalues and Assumption 2.3 it is bounded away from 0 with probability converging to 1. To see the consistency of the eigenvalues, we use an upper bound whose second equality follows from Lemma B.1 in the supplementary material of Aue et al. (2018) and holds uniformly in $\lambda \in (1/\sqrt{N}, 1)$. Applying the same argument to the second eigenvalue also yields consistency of $\hat\tau^{(1)}_{2,\lambda}$. In the proof of Proposition 2.2 (Step 1) we have already shown that the numerator on the right side of (A.25) is of order $o_P(1/\sqrt{N})$, and hence (A.23) follows.
We now turn to an investigation of the process $\{K_N(\lambda)\}_{\lambda \in [0,1]}$, for which a simple calculation, using Proposition 2.1 from Aue et al. (2019) in the second equality, yields a tractable form. In order to determine the limiting behavior of this expression, we make several technically helpful transformations, beginning with a linearization. Similar calculations as in the proof of Proposition 2.3 in Aue et al. (2019) yield the representation (A.27), where the random variables $\bar X^{(1)}_n$ and $\bar X^{(2)}_n$ are defined in terms of functions $f^{(i)} \in L^2[0,1]$. Weak convergence of the process $\{K_N(\lambda)\}_{\lambda \in [0,1]}$ defined in (A.27) follows from weak convergence of the two dimensional process (A.31) and the continuous mapping theorem. Similar arguments as given in Aue et al. (2019) show that the components of (A.31) converge weakly to stochastic processes of the form $\sigma^{(1)}_D \sqrt{\theta_0}\,\lambda B_1(\lambda)$ and $\sigma^{(2)}_D \sqrt{1-\theta_0}\,\lambda B_2(\lambda)$ for suitable constants $\sigma^{(1)}_D$ and $\sigma^{(2)}_D$ (see equation (A.33) below), where $B_1$ and $B_2$ are independent, standard Brownian motions. In particular both processes are asymptotically tight, and consequently the vector in (A.31) is also asymptotically tight. To complete the proof of weak convergence of (A.31), it therefore remains to prove the convergence of the finite dimensional distributions.
For this purpose we replace the random variables $\bar X^{(i)}_n$ by their $m$-dependent versions; for $i = 1$ the resulting difference converges to 0 in probability according to (A.13) as $m \to \infty$, and the case $i = 2$ can be treated analogously. Therefore it is sufficient to prove the convergence of the finite dimensional distributions of the $m$-dependent version of the vector (A.31), which can be shown in the same way as in Step 2 of the proof of Proposition 2.4. Finally, we define the constants $\sigma^{(1)}_D$ and $\sigma^{(2)}_D$ in (A.33) (these quantities are positive by assumption) and obtain, by the continuous mapping theorem, the weak convergence of $\{H_N(\lambda)\}_{\lambda \in [0,1]}$. Now the same steps as in the proof of Theorem 2.5 yield the desired result.

B.1. Weak convergence of the covariance kernel
In this section we provide an adaptation of Theorem 1.1 in Berkes et al. (2013) to the estimation of covariance kernels. Let $(X_n)_{n\in\mathbb{Z}}$ be a sequence of random functions satisfying Assumption 2.1 and consider the sequential process of empirical covariance kernels. We are thus interested in a sum of random elements $X_n \otimes X_n \in L^2([0,1] \times [0,1])$. These products can be approximated by products of the $m$-dependent random functions $X_{n,m} \otimes X_{n,m}$, where the random variables $X_{n,m}$ are defined in Assumption 2.1 (note that $X_n$ and $X_{n,m}$ have the same distribution). Using Assumption 2.1 and the notation $\delta' = \delta/2$ and $\kappa' = \kappa/2$, we obtain, for a suitable constant $K > 0$, a bound showing that the series $\sum_{m \ge 1} \big(E\|X_n \otimes X_n - E[X_n \otimes X_n] - X_{n,m} \otimes X_{n,m} + E[X_{n,m} \otimes X_{n,m}]\|^{2+\delta'}\big)^{1/\kappa'}$ is finite. This consideration demonstrates that the random functions $X_n \otimes X_n$ have the same approximation properties as required in Berkes et al. (2013).
By analogous arguments as presented in Lemma 2.2 of Berkes et al. (2013), the limiting covariance kernel $C$ is square integrable and thus defines a Hilbert-Schmidt operator (see e.g. Bump (1996), p. 168). It follows that there exists a spectral decomposition of the integral operator with kernel $C$; let us call its eigenfunctions $\Psi_1, \Psi_2, \ldots$ and its corresponding eigenvalues $\Lambda_1, \Lambda_2, \ldots$. With this eigensystem we can define the limiting Gaussian process, where $B_l$, $l \ge 1$, are independent Brownian motions and $x \in [0, 1]$. We now state an analogue of Theorem 1.1 in Berkes et al. (2013). The proof runs along the same lines and is therefore omitted.

B.2. Eigenvalue-expansion
In this section we investigate a stochastic linearization of the estimated eigenvalues of the empirical covariance operator. For this purpose let $(X_n)_{n\in\mathbb{Z}}$ be a stationary functional time series with vanishing mean function that complies with Assumptions 2.1 and 2.3. We denote the corresponding covariance kernel by $c$, its eigenvalues by $\tau_1 \ge \tau_2 \ge \ldots$ and its eigenfunctions by $v_1, v_2, \ldots$. For the data sample $X_1, \ldots, X_N$ we define the sequential estimator of the covariance kernel, where the last equality defines the terms $A$ and $B$ in an obvious way. We now investigate the terms $A$ and $B$ separately. For the term $A$ we observe that $\hat v_j$ is the eigenfunction of the integral operator associated with $\hat c$. Here the second equality follows by the parallelogram law, and in the last step we used an estimate uniform in $\lambda$, where the last equality defines the random variables $R_1$ and $R_2$ in an obvious manner. For the term $R_1$ we obtain a bound by the Cauchy-Schwarz inequality. Again by Lemma B.1 from the supplement of Aue et al. (2018) we observe that $\sup_{\lambda \in [0,1]} \sqrt{\lambda}\, \|\hat c(\lambda, \cdot, \cdot) - c\| = O_P(\log^{1/\kappa}(N)/\sqrt{N})$.
The bound (B.7) shows that $R_1 = o_P(1/\sqrt{N})$. We use similar arguments to treat $R_2$. Combining these considerations proves the first assertion (B.3). For a proof of (B.4), we bound the eigenvalue distance by the operator distance, and this in turn by the $L^2$-distance of the kernels; the assertion then follows from Lemma B.1 in the supplementary material of Aue et al. (2018).