A wavelet-based approach for detecting changes in second order structure within nonstationary time series

: This article proposes a test to detect changes in general autoco-variance structure in nonstationary time series. Our approach is founded on the locally stationary wavelet (LSW) process model for time series which has previously been used for classiﬁcation and segmentation of time series. Using this framework we form a likelihood-based hypothesis test and demonstrate its performance against existing methods on various simulated examples as well as applying it to a problem arising from ocean engineering


Introduction
This article considers the problem of detecting changepoints in the second-order (i.e.autocovariance) structure of time series.The ability to detect such structural changes in temporal dependence retrospectively is of value in a number of applications, for example in renewable energy and biomedical research.It is therefore surprising that more attention has not been devoted to this topic within the literature to date, particularly when compared to the numerous contributions which already exist for the change in mean or change in variance problems (see Chen and Gupta (2000), Eckley et al. (2011) and references therein).Below we review the current state of the art in the area of autocovariance changepoint detection before outlining our approach and the structure of this paper.
Previous work.As observed, comparatively little attention has been given to the problem of autocovariance changepoint detection.A likelihood approach was taken by Davis et al. (2006) and Gombay (2008) to detect changes in autoregressive (AR) models.More specifically, Davis et al. (2006) introduce the Auto-PARM algorithm for segmenting time series with a piecewise AR structure.The test statistic used in Auto-PARM is a penalised likelihood ratio, the penalty being the Minimum Description Length (MDL).The multiple changepoint solution space is then traversed using a genetic algorithm to identify both the number and location of changepoints.In practice Auto-PARM is powerful in detecting changepoints due to the likelihood-based test statistic but has varied results when the assumption of piecewise AR structure is violated.
Other, non-parametric, approaches to this retrospective changepoint problem include the work of Ombao et al. (2001).Unfortunately this approach suffers when the change occurs at non-dyadic time points.An alternative method has been suggested by Ahamada et al. (2004) who consider the problem in a local Fourier setting, whereby changepoints are identified by considering the mean of the log-spectrum across a range of frequencies.As such, some changepoints may not be detectable due to frequency cancellation.Most recently, Cho and Fryzlewicz (2012) proposed a nonparametric test for autocovariance changepoint detection by considering the problem in a locally stationary wavelet paradigm.The results reported in their simulation study appear to be competitive with those of Davis et al. (2006), though as we shall see later (Section 4) their method can lack precision when one considers the locations of the detected changepoint locations in certain circumstances.In addition, their approach assumes that the variance of the summary statistics is contant across proposed changepoint locations, a feature which is very difficult to validate in practice.
Our approach.The changepoint method which we propose is founded on the Locally Stationary Wavelet time series model of Nason et al. (2000), a brief review of which is provided in the next section.In contrast to the approach proposed in Cho and Fryzlewicz (2012), we cast our changepoint problem in a parametric framework.The novelty of our approach comes by constructing a likelihood-based test for changes in the autocovariance structure in the time domain which is equivalent to performing a test on the evolutionary wavelet spectral structure in the wavelet domain.Consequently, unlike the approach adopted by Davis et al. (2006) which assumes the process follows an AR model, the LSW model allows more general time series forms.
The article is structured as follows: Section 2 briefly reviews LSW processes before we proceed to describe our method for detecting changes in autocovariance structure in Section 3. The performance of the method is compared against Auto-PARM and the method from Cho and Fryzlewicz (2012) using simulated data in Section 4, and for detection of storm seasons from oceanographic time series in Section 5.

Background to locally stationary wavelet processes
We begin by providing a brief review of the LSW modelling paradigm.Recall that a wavelet is a (compactly supported) oscillating function with several properties that allow efficient location-scale decompositions; see Percival and Walden (2006) and Nason (2008) for accessible introductions.
Below we give a brief review of the LSW approach and the properties which we will utilise in Section 3.
Following Fryzlewicz and Nason (2006), a triangular stochastic array {X t,n } n−1 t=0 for n = 1, 2, . .., is in a class of Locally Stationary Wavelet (LSW) processes if there exists a mean-square representation with j in {1, 2, . ..} and k ∈ Z as scale and location parameters, respectively.The ψ j = (ψ j,0 , . . ., ψ i,Lj −1 ) are discrete, real-valued, compactly supported, non-decimated wavelet vectors with support lengths L j = O(2 j ), and the ξ j,k are zero-mean, orthonormal, identically distributed random variables.For each j ≥ 1, the time-varying amplitudes, W j (z) : [0, 1] → R are real valued, piecewise constant functions with a finite (but unknown) number of jumps.Let P j denote the total magnitude of jumps in W 2 j (z), then the variability of functions W j (z) is controlled so that This modification of the constraints on the W j (z) extends the LSW model class to include time series with piecewise stationary second order structure.
Recall from classical time series theory that the second order structure of a time series is characterised by the spectrum.The spectrum for a LSW process is defined as S j (k/n) = |W j (k/n)| 2 and changes over time as well as scale.Nason et al. (2000) show that if X t is a stationary process, i.e. does not change over time, then W 2 j (k/n) is constant across each scale j.Thus if the second order structure of a time series is piecewise stationary, this manifests itself as piecewise constant W 2 j (k/n).Figure 1 provides an example of this where (a) shows a realisation from a piecewise MA model with a changepoint at time 128 and (b) shows the spectral estimate which, in the finer scales, clearly demonstrates a step change at 128.
The specific relationship between the autocovariance structure of X t and In the next section we utilise this decomposition to construct a test for a change in second order structure.

Detecting changes in second order structure
The Locally Stationary Wavelet (LSW) process can capture many dependence structures.Of particular interest in changepoint applications is the fact that a piecewise second-order time series will have its structure encoded as piecewise constant sequences in the local wavelet periodogram -a feature noted by Cho and Fryzlewicz (2012).Consequently when using the LSW framework for changepoint methods we do not need to be prescriptive about the structure of the dependence beyond the requirements of the LSW definition.
In this article we restrict our consideration to LSW processes where the innovations {ξ j,k } in (1) are Gaussian although extensions to other distributions are possible.
Below we seek to identify a finite number of changes in autocovariance structure.This is achieved by first expressing our hypothesis for a single autocovariance change in the traditional setting.We then re-formulate the hypothesis in terms of the W 2 j (k/n) parameters from the LSW model.Following this, we derive the form of the likelihood of a Gaussian LSW process under the null and alternative hypotheses which allows us to construct a test statistic in the usual way.We then conclude this section by extending the single changepoint model to multiple changes.

Detecting a single change in autocovariance structure
Our formulation of the hypothesis test for a single autocovariance changepoint follows that of Berkes et al. (2009): Definition 3.1.The hypothesis to test for a single change in second order structure at any lag v ≥ 0 is For notational simplicity we assume that X −v , . . ., X −1 are observed.
Typically an upper bound is placed on v to ensure that identifiability problems do not occur but this should not be so small that the model fails to capture the second order structure adequately (e.g. using v = 0 when there is temporal dependence).Auto-PARM and similar time domain methods require the practitioner to decide how many lags, v should be considered.However, if we assume that the time series X t can be well modelled by a LSW process then (2) demonstrates that the autocovariance structure across all lags is defined in terms of the W 2 j (k/n).Thus there is no requirement to choose a maximum lag to consider.In addition, if the time series X t is stationary (H 0 ) then W 2 j (k/n) = γ j at each scale j.In other words, under the null hypothesis the spectrum is constant over time and hence only one parameter per scale need be estimated.We therefore propose an equivalent form of the hypothesis provided in Definition 3.1, given here: Definition 3.2.The hypothesis to test for a single change in autocovariance structure for a LSW time series can be stated as follows: for some j ∈ {1, 2, . ..}.
Given this alternative definition we must now derive the form of the likelihood under the null and alternative hypotheses (Section 3.1.2).Before we consider this, let us first describe the likelihood of a general LSW process with Gaussian innovations.

Likelihood of a LSW process with Gaussian innovations
Let x = {x 1 , . . ., x n } be an observed time series which is a LSW process with Gaussian innovations {ξ j,k }.Then the log-likelihood of the time series can be expressed as follows: where the variance-covariance matrix, Σ W , has elements; Here Σ W (k, k ′ ) denotes the element in the k-th row and k ′ -th column of the Σ W matrix. Henceforth, for notational simplicity, we shall omit the indexing of Σ by W although its dependence on W will still be implicit.
We now turn to consider the formulation of the likelihood under the null and alternative hypotheses.

Likelihood-based test statistic
Recall our equivalent form of the single changepoint hypothesis (Definition 3.2).Under the null hypothesis the spectrum of a stationary process is constant, i.e.W 2 j (k/n) = γ j for all locations k and scales j.Similarly under the alternative hypothesis the spectrum is piecewise constant.Hence, given a time series x = {x t } n t=1 which can be modelled as a LSW process with Gaussian innovations the likelihood ratio test statistic for a change in autocovariance structure is given by where J = ⌊log 2 n⌋.Here Σ0 and Σ1 are the maximum likelihood estimates of the null and alternative variance-covariance matrices respectively.The entries of these matrices are given by, where γ0,l , γ1,l and γn,l are the maximum likelihood estimates of the null, pre-change and post-change transfer function, W 2 l (m/n) respectively.Using the above test statistic a changepoint is detected when, λ > c, where c is a constant.If a changepoint is detected, the location τ of the change is estimated as arg τ λ.The choice of c is discussed in Section 3.3.
Remark In practice, the variance-covariance matrix estimates occasionally produce singular matrices thus when this occurs we regularize the estimates to obtain invertible matrices.The work reported in this article adopts the regularization approach proposed by Schnabel and Eskow (1999), one of many regularization methods available.
Unfortunately, due to the structure of Σ, the maximum likelihood estimates of W 2 j (k/n) have no closed form.However several authors including (Nason et al., 2000;Fryzlewicz and Nason, 2006) have derived alternative means of estimating {W 2 j (k/n)}.We turn to these to provide plug-in estimates as an alternative for γ.In what follows we estimate W 2 j (k/n) by averaging the empirical estimates from Nason et al. (2000) within each segment and using these as plug-in estimates in equations 4 and 5. Due to the use of plug-in estimates the test statistic is thus a likelihood-based test rather than a formal likelihood ratio test.
The consistency of likelihood methods for detecting changepoints has been shown in Csorgo and Horváth (1997).It is well known that plug-in estimates that are consistent possess the same properties as maximum likelihood estimates.Thus the proof of consistency for identifying changepoints using plug-in estimates follows directly provided the estimates used are consistent.

Multiple changes in second order structure
Thus far we have considered the problem of detecting a single changepoint.More generally we are interested in detecting (potentially) multiple changes.The likelihood ratio test statistic was shown to be consistent when estimating a single changepoint in the presence of multiple changes in Vostrikova (1981).Several algorithms have been proposed in the changepoint literature which extend single changepoint tests to multiple changes (e.g.Scott and Knott (1974); Bai and Perron (2003); Killick et al. (2012) and references therein).Typically the consistency of these search methods are proved for additive error whereas the LSW model has a multiplicative error structure.However, Cho and Fryzlewicz (2012) extend the consistency of the Binary Segmentation algorithm (Scott and Knott, 1974) to multiplicative errors and thus we implement this algorithm here.
In essence, the Binary Segmentation algorithm (Algorithm 1) applies the single changepoint test and upon identifying a change, iteratively implements the test statistic on sub-segments of the time series.In practice, we require an upper bound on the number of changepoints in order that the model remains identifiable.Thus we assume that there exists an upper bound, Q, on the number of
Output the set of changepoints recorded C.
Algorithm 1: The Binary Segmentation Algorithm.

Penalty Choice
It is common in the changepoint literature to introduce a penalty to guard against overfitting (too many changepoints).When consideration extends to (potentially) multiple changepoints the penalty typically takes the form of a function dependent on the number of changepoints, e.g.βf (m) where β is a constant and f (•) is a function of the number of changepoints m.See, for example Lavielle (2005); Harchaoui and Levy-Leduc (2010).
The approach we adopt follows the work of Lavielle (2005); a data driven method which is intuitive whilst also conveying the complexity of the decision through a graphical manner.The intuition behind the approach is the same as that of parameter selection in linear modelling, i.e. when a true changepoint (informative parameter) is included the negative log-likelihood will decrease by a larger amount than when a false changepoint (uninformative parameter) is included.Hence, for a given sequence of data, we seek to identify the point at which the negative log-likelihood ceases to reduce significantly.More formally, let K max be an upper bound on the number of segments; 1 ≤ K ≤ K max ; τ be the ordered locations of changepoints; J(τ , x) be the negative log-likelihood for time series x with changepoints τ ; and τ K the locations of K changepoints that minimise J(τ K , x).Then, the approach can be summarised as follows: 1.For each K ∈ {0, . . ., K max } calculate τ K and J(τ K , x). 2. Determine, l K = J(τ K , x) − J(τ K+1 , x) for K ∈ {0, . . ., K max − 1}. 3. The number of changepoints is the largest K such that l K ≫ l j for j > K.
A simple graphical method can also be implemented for the above which may prove more intuitive in practice.This involves producing a plot of J(τ K , x) against K and visually identifying the point of maximum curvature.Figure 2(a) shows an example, from model B in Section 4, of the type of graphic produced by this method where the true number of changes is 2. For this example the values of l k are 150, 138, 29, 29, 20, . . .from which it is clear that l i ≫ l j for i = 2.This is the procedure which we implement in Section 4. As such this is reminiscent of using a scree plot in Principal Components Analysis to identify the number of principal components.

Simulation study
In this section we compare the performance of the method presented in Section 3 against Auto-PARM (AP) proposed by Davis et al. (2006) and the work of Cho and Fryzlewicz (2012) (CF).We replicate a selection of simulations from these earlier papers to demonstrate how the proposed method, henceforth referred to as WL (wavelet-based likelihood) compares.In addition, new simulation examples are introduced to highlight the flexibility of the WL method.In all simulations we report results obtained for Haar wavelets though we note that similar results were obtained for other wavelet families.For comparison we used the default values specified in Davis et al. (2006) and Cho and Fryzlewicz (2012) for implementation of the AP and CF methods respectively.For the WL method, the number of changepoints were identified using the method described in Section 3.3 and the practitioner implementing the approach had no knowledge of the true number of changepoints.
As Eckley et al. (2011) observe, consideration of both the number and distribution of changepoint estimates is helpful in assessing the benefits of the different methods.Thus for each simulation case we consider (i) the number and (ii) the location of the identified changepoints.Tables 1, 2 and 3 report the number of changepoints detected by simulation scenario whilst Figure 3 displays the density of identified changepoints for scenarios where the methods are materially different.In each of the scenarios detailed below the innovations, ǫ t have standard Normal distribution unless specified otherwise.
(A) Stationary AR(1) process with no changepoints This scenario is designed to assess the performance of the test when there are no changepoints.We fit an AR(1) model for a variety of parameter values, a (as used by Cho and Fryzlewicz (2012)).More specifically we simulate from, Results from Table 1 show that as expected, Auto-PARM outperforms both CF and WL.However, the WL method has similar performance to Auto-PARM for all but the highest positive auto-covariances.
(B) Piecewise stationary AR process with clearly observable changes Datasets are simulated from, i.e. the changes occur at dyadic points in the time series.This example highlights the increased power the WL method has over the traditional likelihood when the changepoints are at dyadic points in the time series.This is because there are more scales where the changepoint can be identified and thus more evidence for a change.Table 2 shows that we have comparable performance to Auto-PARM and substantially outperform CF in this type of scenario.
(C) Piecewise stationary AR process with less clearly observable changes Datasets are simulated from, In this example the changes no longer occur at dyadic time points and the changes are less clear.As with scenario B, our method has a similar performance to that of Auto-PARM whilst CF over estimates the number of changepoints 24% of the time.
(D) Piecewise stationary AR process with a short segment Datasets are simulated from, The results for this example demonstrate that with relatively short segments the three methods perform well.
(E) Piecewise stationary AR process with high autocorrelation Datasets are simulated from, The changepoints in this model are hard to identify because the autocorrelation function does not change very much between segments.For number of changepoints, the CF method outperforms both Auto-PARM and WL due to its variance stabilizing techniques.However, when we consider location of changepoints Figure 3(b) shows that the performance of our method is perhaps closer to CF than first appears.The distribution of changepoints for CF and our method are similar in shape and noticably different from that of Auto-PARM which detects spurious changepoints towards the start of the data.
(F) Piecewise stationary ARMA(1,1) process Datasets are simulated from, Figure 3(c) shows that all the methods identify the final changepoint more often than the first two.This is because the autocovariance function has greater difference between the second and third segments than between the first and second.For the remaining two changepoints, the WL method identifies both with similar probability whereas Auto-PARM prefers the second and CF the first changepoint.This again shows that the WL method is, in some sense, a compromise between the general structure of CF and the likelihood power of Auto-PARM.
(G) Piecewise stationary MA process Datasets are simulated from, In this scenario the WL method outperforms both Auto-PARM and CF methods.Although Auto-PARM detects the largest number of correct changepoints, Figure 3(d) shows that the location of these changepoints is split across two modes; around 125 and 135.The CF method performs poorly in this example, particularly when we consider Figure 3(d) which demonstrates that CF can severely mislocate the changes.In contrast to both these methods, the WL method has its mode around 127 and retains good detection of the correct number of changepoints.
The simulation results can be summarized as follows.As expected, when the Auto-PARM assumptions of piecewise AR model and independent segments are valid, it tends to outperform both CF and the proposed method.However, when the underlying assumption of a piecewise AR model is not valid, there are cases where the proposed method outperforms both Auto-PARM and CF.Indeed in 7 out of 10 scenarios the WL method clearly outperforms that of its nonparametric counterpart CF.Of the 3 remaining cases, results also highlighted that when considering performance of a changepoint method, the location of the changes should be taken into account in addition to the number of changepoints detected.This is particularly apparent for Model G where the number of changepoints identified by CF was reasonable but Figure 3(d) showed many of these changes were far from the true location.The simulation study used the Haar wavelet.Similar results were obtained for other wavelet families as expected, see Gott and Eckley (2013).In each plot the red line is the WL method; the blue line is the CF method and the green line is the AP method.

Table 1
Results for scenario A from our method (WL), the method from Cho and Fryzlewicz (2011) (CF) and Auto-PARM (AP).We report the percentage of repetitions that identified that number of changepoints.

Storm season identification
We now consider storm season identification using significant wave heights.
There is evidence (see for example Hidgkins et al. (2002)) that the onset of storm seasons is shifting.This is of interest as short-term operations, such as inspection and maintenance of marine structures, are typically performed outside seasons when storms are prevalent to minimise risk.As such knowledge of the date of onset of the storm season, e.g. of winter in the North Sea, or of hurricane season in the Gulf of Mexico, is particularly important if this is changing.For example, within the oil and gas industry, the current popularity of liquefying natural gas on an offshore production platform and offloading it to a shuttle gas carrier, requires knowledge of environmental winds, waves and currents that excite vessel motions, limiting floating liquified natural gas (FLNG) operations.If the time periods that allow production and offloading are sufficient, the system has the potential to work safely and efficiently, and the economic benefit of FLNG can be realised, maximising the time periods during which FLNG platforms can be offloaded.Similarly, for offshore wind turbines, environmental conditions frequently cause costly delays in the installation and maintenance process.A systematic approach to identify approximately homogeneous periods in the ocean environment, and transitions between calm and stormy seasons in particular, is therefore highly desirable.Consider, by way of example, significant wave heights for a location in the central North Sea. Figure 4 intervals for the period March 1992 -December 1994.Ocean engineers note that whilst certain features such as the cyclic nature of the series or the fact that wave heights tend to be lower during the summer than the winter are clear, other features are harder to discern.Specifically the points at which the storm season starts and ends are much more challenging to identify.For example methods of modelling the trend alone do not produce sufficiently accurate estimates of the storm season start and end.Use of established change in mean or change in regression methods such as those implemented in Killick and Eckley (2010) confirm this.However ocean engineers have suggested that looking at the second order structure could be more helpful since the variability of wave heights is expected to be small during calm and larger during storm seasons.
To identify whether a change in second order structure has occurred we must first remove the trend from the data.We choose to remove the trend by differencing but other methods of trend removal could be used.The resulting series is depicted in Figure 4(b).The change in second order structure is now more evident and the series appears to have a locally stationary structure.The question we therefore consider is how to detect the changes in autocovariance structure in this series and whether these changes correspond to the start and end of the storm season.
We apply the WL, AP and CF methods to detect changes in auto-covariance of the differenced data shown in Figure 4(b).Applying the WL method from Section 3 we identify changepoints at 12th May 1992, 5th October 1992, 1st April 1993, 19th November 1993, 10th April 1994and 10th August 1994.Each of these dates is indicated by a dashed vertical line in Figure 4(b).The changepoints identified by the CF and AP methods are indicated by vertical lines at the top and bottom of Figure 4(b) respectively.It is clear that the automatic penalty selection contained within the CF and AP methods results in more changepoints being identified than storm seasons.Both methods identify changes at approximately the same times as the WL method with additions between seasons.The exception to this is the CF method which does not identify a change around October 1992 where the WL and AP methods do.The results for the three methods were (blindly) presented to ocean engineers who selected the WL method as the set of changepoints which most closely matched their own assessment of the start and end of the storm season.

Concluding remarks
As discussed in Section 1, oceanographic applications require methods for segmenting time series where the observations cannot be assumed to be independent.The work presented in this paper has addressed this issue by developing a test for a change in autocovariance structure using the LSW framework and assuming no prior knowledge of the structure of the autocovariance.A benefit of this approach is that it does not assume a specific structure for the autocovariance.
The proposed approach was compared with both the Auto-PARM method of Davis et al. (2006) and a wavelet-based method recently proposed by Cho and Fryzlewicz (2012).The simulation study in Section 4 showed that the WL method provides an extra degree of refinement between a restrictive model assumption (Auto-PARM) and a nonparametric approach (Cho and Fryzlewicz, 2012).The refinement comes by using a more flexible model structure, namely the LSW model within a likelihood based framework.Intuitively since it is founded on the likelihood framework, when the LSW model assumptions hold the WL method should outperform a nonparametric approach.
Whilst all the changes in the simulations were detected, situations could arise where a change is too small in magnitude or too close to another change to be identified.This is a challenge for all the methods detailed in this paper.For changes in mean and variance there are clear guidelines on how small a change can be detected by specific methods.For changes in autocovariance there are no guidelines due to the complexities of possible pre and post change structures; as such it would be an interesting avenue for future research.
The proposed method was also applied to oceanographic data encountered by an industrial collaborator.The results demonstrate that the proposed method gives an automatic approach for segmenting series into seasons when storms are prevalent for further analysis or planning of maintenance.An added advantage of the WL method is that a specific number of changepoints can be identified whereas the CF and AP methods would require postprocessing procedures for selecting the changes that correspond to storm seasons.

Fig 1 .
Fig 1. Example of a piecewise stationary MA process (model H) with changepoint at 128 (a) Original Time Series, (b) Average Evolutionary Wavelet Spectrum estimate over 100 realisations.

Fig 2 .
Fig 2.An example plot, from model B, of the negative log-likelihood against number of changepoints for the purpose of identifying the number of changepoints.

Fig 3 .
Fig 3. (a) Realisation from model E. (b)-(d) Density plots of the changepoints identified for (b) model E, (c) model F and (d) model G.In each plot the red line is the WL method; the blue line is the CF method and the green line is the AP method.

Fig 4 .
Fig 4. Central North sea application: (a) Significant wave heights (b) First difference of (a) with identified changepoints.Vertical dotted lines: WL method, short lines at the top of the plot: CF method, short lines at the bottom of the plot: AP method.

Table 2
True number of changepoints is in bold Results for scenarios B-D from our method (WL), the method from Cho and Fryzlewicz (2011) (CF) and Auto-PARM (AP).We report the percentage of repetitions that identified that number of changepoints.True number of changepoints is in bold

Table 3
Results for scenarios E-G from our method (WL), the method from Cho and Fryzlewicz (2011) (CF) and Auto-PARM (AP).We report the percentage of repetitions that identified that number of changepoints.True number of changepoints is in bold