SiZer for time series: A new approach to the analysis of trends

Smoothing methods and SiZer are a useful statistical tool for discovering statistically significant structure in data. Based on scale space ideas originally developed in the computer vision literature, SiZer (SIgnificant ZERo crossing of the derivatives) is a graphical device to assess which observed features are `really there' and which are just spurious sampling artifacts. In this paper, we develop SiZer like ideas in time series analysis to address the important issue of significance of trends. This is not a straightforward extension, since one data set does not contain the information needed to distinguish `trend' from `dependence'. A new visualization is proposed, which shows the statistician the range of trade-offs that are available. Simulation and real data results illustrate the effectiveness of the method.


Introduction
Smoothing methods in statistics provide a useful tool for showing structure in data.Many monographs on smoothing are available in the statistical literature which in the last years include [2,9,11,15,22,23].
When smoothing methods are used for exploratory data analysis, a question that often arises is: which features are 'really there' (i.e.represent important underlying structure) and which are just spurious sampling artifacts (i.e.reflect unimportant random variation)?[5] proposed answering this question with a graphical device called SiZer (SIgnificant ZERo crossings of the derivatives), which is based on scale space ideas from computer vision.
Scale space is a family of kernel smooths indexed by the 'scale' h, which is usefully viewed as the level of resolution of the data.The scale h is usually called the 'smoothing parameter', or the 'bandwidth' in the statistical literature.[19], for example, suggest using an overlay of these smooths for data analysis, and call it the 'family plot'.This method is illustrated in Figure 1 (a), using the Chocolate data set, i.e. the monthly production of chocolate in Australia from July of 1957 to October of 1990 (kilotonnes).This data set comes with the software companion to [4].The dots show the Chocolate production after deseasonalising and linearly detrending the time series.The family of smooths suggests at different levels of resolution a dip around the year 1978 preceeded by two minor bumps and an increase in the last years.At the finest levels of resolution, many other features appear.Are all these features 'really there' ?SiZer answers this question by assessing statistical significance of such features.
In particular, SiZer extends the usefulness of the family plot by visually displaying the statistical significance of features over both location t and scale h.SiZer is based on confidence intervals for the derivative of the underlying function.The graphical device is a grey-scale map, reflecting statistical significance of the slope at (t, h) locations in scale space.At each (t, h) location, the curve is significantly increasing (decreasing) if the confidence interval is above (below) 0, so that map location is colored black (white).If the confidence interval contains 0, the curve at the level of resolution h and at the point t does not have a statistically significant slope, so the intermediate grey shade is used.Finally, if there is not enough information in the data set (according to a rule that will be illustrated in the following sections), at this scale space (t, h) location, then no conclusion can be drawn, so the darker shade of grey is used to indicate that the data are too sparse.
The SiZer view of the Chocolate data is shown in Figure 1 (b).This data set was studied in [18], who viewed the errors, which will be defined later in (1), as independent and identically distributed (i.i.d.).
At the coarsest levels of resolution (largest bandwidths), at the top of the SiZer map, the intermediate grey appears everywhere, indicating that there is no statistically significant increase or decrease in the corresponding smooths.As it can be seen Figure 1 (a), these curves are close to the simple linear regression line, so the conclusion is that this line has no significant slope.Moving down in the plot, meaning decreasing the scale (i.e. the bandwidth gets smaller) the smooths are first significantly decreasing (white), then significantly increasing (black) after a short time interval in which the smooths are neither significantly increasing nor decreasing (intermediate grey).This shows that the minimum near year 1978 is statistically significant.As the bandwidth gets smaller, i.e. further down the plot, some additional significant features appear but only for a limited number of bandwidth values.At the finest level of resolution (smallest bandwidth) there is not enough information to assess the significance of any structure in the data set (darker shade of grey).
Overall, the SiZer map reveals that only the decrease followed at around year 1978 by the increase in chocolate production are important features of the data.As pointed out by [5], the statistical inference, which is the basis of SiZer, makes heavy use of the assumption of i.i.d.errors.This assumption is inappropriate in time series contexts, where dependence is omnipresent, and in fact is usually the focus of statistical analysis.For SiZer to fulfill its potential to flag significant trends in time series, its underlying confidence intervals must be adjusted to properly account for the correlation structure of the data.This adjustment is straightforward when the correlation structure is known.But in the more important and common case of unknown correlation, SiZer for time series is not a straightforward extension.This is because of the identifiability problem between 'trend' and 'dependence artifacts'.Distinction between these can never be made on the basis of a single time series, as shown in Figure 2.
Figure 2 (a) shows the Strikes data from [4].One view of this series is that its underlying distribution is nonstationary.In particular the mean function seems to decrease at the beginning, with an increase in the middle, and perhaps another decrease later.The variance also appears to be time varying, being large at both ends, and smaller in the middle.But this apparent 'nonstationarity' can not be proven on the basis of that single time series, as there are stationary stochastic processes whose realizations can look quite similar.An example is the simulated series shown in Figure 2 (b), which is a single realization of a stationary stochastic process.Note that the qualitative features, in terms of mean and variance, are very similar to those of Figure 2 (a).The remaining parts of Figure 2 reveal how this simulation was performed.The particular series of Figure 2 (b) was carefully selected from the much longer series shown in Figure 2 (d).It is the part between the vertical bars, chosen to match qualitatively the features of Figure 2 (a).The long series in Figure 2 (d) was generated as X t = µ t + σ t Z t where the curves µ t and σ t are generated in the same way as drivers of the intensity function of a Cox process (shown in Figure 2 (c)), and where the Z t are independent standard normal realizations.Because µ t and σ t are stationary stochastic processes, so is X t .Now consider this example from the viewpoint of 'trend'.The time series in Figures 2 (a) and (b) can be viewed as containing a large amount of 'trend' in the mean, but that structure can be equally well explained as 'serial dependence artifacts', or in fact some of each.The challenge for applications of SiZer in the trend estimation context is that these cases cannot be distinguished.This paper proposes an approach to this dilemma via a visualization which displays the range of trade-offs to the statistician.
Background on the local linear fit and on the SiZer approach will be found in Section 2. In Section 3 we will discuss the development of SiZer for studying trends in time series.The performance of our method is studied via simulated and real data sets in Sections 4 and 5.

Background
The application of nonparametric kernel techniques to estimating a deterministic trend has been popular in time series analysis.[1,6,14] used the kernel smoother for trend estimation and estimated the correlation function parametrically for adjusting the bandwidth selection criterion.Several methods have been proposed that utilize completely nonparametric approaches including [7,13,16,20].[8,12,13] studied the same problem with long-range dependent errors.
Exponential smoothing is a common means of forecasting a future realization of a time series.[10] pointed out that exponential smoothing can be viewed as a special type of nonparametric regression procedure where fitting a particular point uses only data to the left of that location.They also showed that the most common adaptive choice of the smoothing factor in exponential smoothing is identical with the cross-validation technique from nonparametric regression.
While classical kernel methods seek to find the "optimal" smoothing parameter, SiZer is based on scale-space ideas from computer vision, see [17].Scalespace is a family of kernel smooths indexed by the bandwidth h.SiZer considers a wide range of bandwidths which avoids the classical problem of bandwidth selection.Furthermore, the target of a SiZer analysis is shifted from finding features in the "true underlying curve" to inferences about the "smoothed version of the underlying curve".The idea is that this approach uses all the information that is available in the data at each given scale.
Wavelet-based methods are similar to SiZer in the sense that they also look at the data at different locations and scales simultaneously.Specifically, while SiZer tries to find local minima and maxima, a wavelet analysis is usually used to achieve different goals, e.g.reconstructing a function or detecting change points or structural breaks.It is hard to determine the significance of trends for low frequency behavior using a wavelet analysis.In addition, it needs to control the vanishing moments parameter.Thus, smooth slow deterministic trends are difficult to be detect if one uses wavelets with a high number of zero moments because the smooth trends will be ignored by the high vanishing moments.On the contrary, SiZer does not suffer from this problem and can detect both slow and sharp trends by studying the derivatives of the smooths.Another advantage of SiZer analysis is that the inference is summarized with visualization.One can easily recognize whether a trend is significantly going up (down) or not from the family of smooths and the SiZer map.Its simplicity, interpretability and direct connection to the scale-space ideas from computer vision make SiZer attractive in general contexts.
[21] developed a dependent SiZer which conducts a goodness-of-fit test for time series.The dependent SiZer uses a true autocovariance function of an assumed model instead of estimating it from the observed data.By doing so, a goodness-of-fit test can be conducted and we can see how different the behavior of the data is from that of the assumed model.The method proposed in this paper does not need to specify an assumed model and estimates the covariance structure from the data.
The following subsections describe the local linear fit and the conventional SiZer method.

The local linear fit
One interesting approach to smoothing is local polynomial modeling (see e.g.[9], for a detailed overview of this method).Here are the main ideas in the time series context for trend estimation.
The data are collected at discrete points at t = t i for i = 1, . . ., n, and since trend estimation in time series can be viewed as a regression problem for fixed and equispaced design with correlated errors, we can set t i = i.Given the data {(i, Y i ), i = 1, ..., n}, the regression setting is where m is assumed to be a smooth function and the error is assumed to be a zero mean weakly stationary process, i.e.E(ǫ i ) = 0, V (ǫ i ) = 1, for all i = 1, . . ., n, and for all i, j = 1, . . ., n.
Of interest is the estimation of the regression function m and its derivatives at i 0 , i.e. m(i The flexibility of smoothing techniques comes from only assuming smoothness of the underlying curve.This means, intuitively, that data information about the value of the regression function m at i 0 is given by the observations at i near i 0 and therefore these observations can be used to construct an estimator of m(i 0 ).
By assuming that the (p + 1) th derivative of m(i) at the point i 0 exists, the local polynomial approach approximates the regression function m(i) locally by a polynomial of order p.This polynomial is then fitted locally by solving a weighted least squares regression problem.In particular, in the local linear fit the function m(i) is approximated by Taylor expansion of order 1 for x in the neighborhood of x 0 .The problem to be solved is then min , where K is a kernel function, often taken to be a symmetric probability density, assigning weights to each datum point.
By Taylor expansion β 0 = m(i 0 ) and β 1 = m ′ (i 0 ), so the solution to this problem gives estimates of the regression function and its first derivative at i 0 .More specifically, where Y = (Y 1 , . . ., Y n ) T , the design matrix of the local linear fit at i 0 is Because of the excellent interpretability properties of kernel type smoothers and because of their simplicity, [19] recommend the local linear fit to construct the family plot.Moreover, for a better visualization of the family of smooths, it is suggested to use a 'very wide range' of h values in the log scale and for symmetry an odd number of curves should be chosen (for details see [19]).

The SiZer approach
To understand the development of SiZer for time series let us briefly illustrate the mathematical aspects of the original version of SiZer for regression function estimation in the specific case of fixed and equispaced design but where errors are i.i.d..As previously stated, the SiZer approach is based on confidence intervals for the derivative of the smoothed underlying function.These are of the form where q(h) is an appropriate Gaussian quantile.There are two critical points of interpretation.
The first point is that from the scale space point of view, the target of the SiZer analysis is shifted from the 'true underlying curve' to 'smoothed versions of the underlying curve'.In particular, instead of seeking confidence intervals for m ′ (x), we seek confidence intervals for the scale-space version m ′ h (x) ≡ E m′ h (x).In this way, the center point of each confidence interval is automatically 'correct', i.e. the interval is unbiased.This makes sense because it reflects the part of the regression curve that is available from the data at the level of resolution, h.In other words, for each value of the bandwidth, all the information available in the data set is considered in the inference process.For more details see [5].
The second point is that the confidence intervals are constructed in a simultaneous way.The approach is, for each level of resolution h, to approximate the full simultaneous confidence limit problem by l independent confidence interval problems.The quantity l, which reflects the number of 'independent blocks', is estimated through the quantity Effective Sample Size (ESS), i.e. for the bandwidth h and at i 0 Note that when using the uniform kernel, ESS(i 0 , h) is equal to the number of data points in the kernel window centered at i 0 with the bandwidth h.The number of independent blocks is then approximated by the quotient: This results in the quantile where α is a significance level.
ESS is also considered to decide when the normal approximation is inadequate, i.e. for ESS < 5 no conclusion can be drawn.In particular, the black, white, and intermediate grey are used on the scale space set Again detailed discussion of SiZer is available in [5].

SiZer for time series
For time series trend estimation, i.e. in the context of a regression problem with fixed and equispaced design with correlated errors, SiZer can be developed by modeling the error structure and by adjusting the confidence limits for the derivative of the smoothed underlying function according to the assumed error structure.
As suggested by [19], the family of smooths is constructed by considering a 'very wide range' of bandwidths h in the log scale and, in particular, the number of curves is here taken to be 11.

The variance
For correlated data, the variance of the local polynomial estimator is given by where, for the assumed correlation structure, Σ is the kernel weighted covariance matrix of the errors where the generic element is given by A sensible estimate of the variance ( 5) is based on estimating γ in (6), by the sample autocovariance function of the observed residuals from a 'pilot smooth', using the pilot bandwidth h p .One could take h p = h, but this leads to a confounding of the different notions of 'scale' and 'dependence structure'.In other words, for dependent errors, the estimate of γ depends on the choice of h p , unlike independent errors.A small h p assumes i.i.d. or weakly correlated errors, and a large one corresponds to strongly correlated errors.Thus, if one takes h p = h, those features which appear in a particular row of a SiZer map might not be clearly interpreted as significant trends or a wrong assumption on dependence structure.Hence, we treat h and h p separately, which means that in the dependent case, another dimension needs to be added to the SiZer plot.We approach this via a series of SiZer plots, indexed by the pilot bandwidth h p , which represent the different trade-offs available between trend and dependence.This is the key to our visualization, which is further developed in Section 4.

The quantile
For positively autocorrelated errors, the family of smooths typically varies substantially as the bandwidth is varied.The reason is that the strong sporadic patterns that characterize such errors behave similarly to high frequency regression components, which appear in the smooth for a wide range of bandwidths.On the other hand, for negatively autocorrelated errors, the tendency of data points to alternate above and below the regression function gives a family of smooths which changes less as a function of the bandwidth.This is one way of seeing that the amount of information, about the underlying smooth regression function, that is available in i.i.d.data, is not the same as the amount of information available in correlated data.Positively correlated data contain 'less information' about the regression function than i.i.d.data, while negatively correlated data contain 'more information' about the regression function than i.i.d.data.Using statistical information ideas, a simple measure of 'information in the data', on the scale of sample size, is provided by the ratio , where The ratio of n ⋆ to n gives a version of the ESS, i.e. the Effective Sample Size, which properly reflects the type and the magnitude of the correlation structure to give correct simultaneous inference: For independent data, n ⋆ = n, so ESS ⋆ = ESS from conventional SiZer.But for correlated data, n ⋆ is smaller or larger than n depending on the type of correlation, i.e. observations have less effect if they are positively correlated (since they are 'less informative' than i.i.d.observations) and more effect if they are negatively correlated (since they are 'more informative' than i.i.d.observations).
The computation of l in (2) and q in (3) remains unchanged.
When the data are positively autocorrelated, the number of independent problems is larger than in the i.i.d.case, so the resulting confidence intervals are longer.Longer intervals are more likely to contain the value 0, so fewer features are flagged as statistically significant.On the other hand, for negatively correlated data (a situation that is rare in real data but is worth considering), SiZer for time series can detect those features that could be hidden in the family of smooths by the alternating pattern of data points above and below the regression function.
In the long-range dependent case, the effective sample size becomes much smaller than that of the i.i.d.case.Since the time series is strongly positively correlated, n ⋆ will be much smaller than n, which results in the smaller ESS ⋆ .As a result, there will be more chances for the effective sample size ESS ⋆ of each pixel to be less than 5, which produces more darker grey regions in a SiZer map (no test being made).
Finally the quantity n ⋆ has to be estimated (since σ 2 and γ are unknown).The estimate given by the sample autocovariance function cannot be expected to give good results since it is inconsistent.A simple approach to this problem is to divide the sample into p = √ n groups (so as to have a reasonably large number of groups with a reasonably large number of observations), and to estimate the variance of Ȳ by where Ȳj is the mean in the group j and where Ȳj .

Simulated examples
In this section, we will illustrate some examples of SiZer for time series, chosen from many simulated data sets.In Section 4.1, we will illustrate the development of the methodology and graphical presentation of the results.Section 4.2 addresses a coverage probability of SiZer.

Illustration of SiZer
The simulation study has been carried out by considering different combinations of trend, error structure and noise level.A series of different data sets have been simulated from the model (1) for different choices of m, {ǫ i } and σ, and for different values of n.The trends that were considered are characterized by a different number of 'peaks and valleys', differently located and with different amplitudes.For each chosen trend we have considered i.i.d.errors and positively and negatively correlated errors.In particular, i.i.d.errors have been simulated from N (0, 1) while correlated errors have been simulated from an autoregressive process of order 2 with high correlation at lag one (for the examples of the simulation study shown in this paper, the value of the autocorrelation function at lag one for positively correlated errors is 0.97 while for negatively correlated errors the value of the autocorrelation function at lag one is −0.89).Moreover, in each case, three different noise levels were chosen to represent 'low', 'medium' and 'high' variability (in the examples shown below, these values were respectively 1, 20 and 50).
While we do not report all the results in this paper, several interesting cases among the models considered had the trend: Given the model in (1) and the trend in (7), let us consider a simulated time series, that we will indicate by sm1, where errors are positively correlated and a 'medium' level of variability is chosen, and n = 200.An important issue is how many SiZer plots, should be shown.There will be a different plot for each pilot bandwidth h p , showing different trade-offs between trend and serial correlation.If h p takes all 11 values of the bandwidth h then the complete series of SiZer plots will be 11 in number.The family plot and the complete series of 11 SiZer plots for the simulated time series sm1 is given in Figure 3.
The family plot shows a family of smooths with a strong upward trend.The first SiZer plot (titled hp(1)) has very small h p .Thus, the error structure used to construct this plot is estimated from the residuals of the closest curve of the family of smooths to the data points, which is essentially i.i.d.. From this plot we observe some significant structure at the finest levels of resolution, i.e. for the smallest values of the bandwidth.As the level of resolution decreases, less and less structure is significant until the curves are significantly increasing everywhere.
In the second SiZer plot (titled hp( 2)), which represents a situation of slightly correlated errors, less structure than in the previous plot, as expected, appears as significant.Moreover, at the finest level of resolution the data are too sparse, in terms of ESS ⋆ , to draw any conclusion, i.e. the bottom line is now shaded darker grey, as discussed in Section 2.2.
As we move from the third to the last SiZer plot (i.e. from the plot titled hp(3) to the plot titled hp( 11)), an increasing amount of correlation appears in the error component, so that only the linear upward trend seems to be significant at every level of resolution, where conclusions can be drawn.Also, at the finest levels of resolution there is less perceived useful information in the data, which means more data sparsity, so more bottom lines of the SiZer plots are shaded darker grey.
Statistically significant structure in a curve can be hidden by a strong linear component.When this happens, as in this example, it can be useful to detrend the time series before using SiZer.Figure 4 shows the complete series of SiZer plots, for the same data set after linear detrending (indicated by sm1d).Now, many features appear in the family of smooths which is no longer closely following an increasing line.In particular, in the first SiZer plot, many peaks and valleys are significant for most of the levels of resolution.As the errors include increasing correlation, these features are significant but with a less precise location and for a smaller number of levels of resolution.From the seventh plot to the last, all of the structure in the data is explained by the error component and no significant structure is highlighted in the trend at any level of resolution.
This example clearly shows that the complete series of SiZer plots is too long.The simultaneous view of all 11 SiZer plots is hard to comprehend and the information contained in several such plots is often redundant.This motivates the choice of a subset of SiZer plots.
We develop a method for effective choice of a subset of these, which usually gives good representatives of the major different correlation structures.We found 4 plots usually conveyed the needed information.Our choice among the 11 plots is intended to reflect 'a wide array of trade-offs between trend and dependence'.A simple numerical measure of this trade-off is the IR (Indicator of the Residual component), which takes values from 0 to 1: where e hp,• are the residuals obtained from the pilot bandwidth h p .When the pilot bandwidth h p is large, the 'dependence component of the data' appears strongly in the residuals, i.e. is viewed as 'noise', and IR is close to 1. On the other hand, when the pilot bandwidth h p is small, the 'dependence component of the data' appears strongly in the pilot smooth, i.e. is viewed as 'trend', and the IR is close to 0. Intermediate values of IR reflect intermediate trade-offs.
A good reflection of the range of trade-offs came essentially from using the SiZer plots chosen (from the set of 11) for which IR is closest to 0%, 25%, 50% and 75%.This implies that the first plot to be chosen should always be the first of the 11 plots.However, since the error structure used to construct this plot is estimated from the residuals given by a smooth for which the degree of overfitting may be too high (mostly when data are positively correlated), a second option is considered.When this is the case, the second of the 11 plots is chosen as the closest to 0%, i.e. as 'more representative of the independent case'.And if this plot corresponds to a value of IR which is also the closest to 25%, the 3 remaining plots will automatically be chosen by considering 25%, 50% and 75% of the difference between the values of IR for the 11th and the 2nd plots of the complete series.
When only 4 of the SiZer plots are displayed, it is useful to stay in touch with the trend-dependence trade-off that each represents.This is done by several graphical devices.First, all 11 values of IR that are considered are displayed as a bar graph, and the chosen 4 are highlighted.Second, for each chosen trade-off, we add plots showing the pilot smooth with bandwidth h p , and the residuals from that smooth.The pilot smooths show which component of the data are viewed as trend, in that particular trade-off.The residuals give a visual impression of the component of the data used in the covariance estimate.This is the visualization chosen for SiZer for time series.Figure 5 shows this graphical device for sm1d.
The data are shown in the first plot above on the left (the continuous line shows m(i), the deterministic part of the simulated time series), while the next graphic on the right is the family plot.Further right, is the bar diagram previously discussed.The second and the third series of plots represent, respectively, the smooths and the residuals.
Note that the chosen SiZer plots give a representative sample of the complete series of SiZer plots shown in Figure 4.No relevant information is lost and the graphical representations of the smooths and the residuals associated to each SiZer give useful insights about each trend-dependence trade-off.This demonstrates the power of the SiZer method for investigating trends in time series: one graphical presentation displays all of the relevant information about statistical significance.

Type I error and power of SiZer plot
In this section, a type I error and a power of multiple tests in a SiZer map are calculated by simulating three different types of noise.These probabilities will provide how often the SiZer plots are in agreement with the 'truth' in presence of noises.The noises considered here are white noise (i.i.d.) with σ = 1, MA(1) (weakly correlated) with the coefficient 0.9 and σ = 1, and fractional Gaussian noise (strongly correlated) with the Hurst exponent H = 0.9 and σ = √ 20.The trend is added to these three noises is where n = 400.Thus, the trend has a sine wave and a decreasing trend as depicted in the upper leftmost of Figure 6.The upper middle plot shows the family smooths of the simulated MA(1).The type I error and the power are calculated as follows.First, the SiZer map of true derivative is created by plugging m in (9) into SiZer.Since the estimation of an autocovariance function is not needed, only one SiZer map is produced, and the second row of Figure 6 shows the true derivative map (the plot is repeated four times at each column for easy comparison with the third row) for (9).Second, SiZer maps are created by plugging Y i ((9) with noise MA(1)) into SiZer and four of them are selected by IR statistics in (8), which are shown in the third row of Figure 6.Then, each SiZer map is compared with the true derivative map pixel by pixel over the set (4).Here, we define the Type I error and the power as follows.The type I error counts pixels with (true = positive, SiZer map = negative) or (true = negative, SiZer map = positive) or (true=zero, SiZer map = positive or negative).The final error probability is obtained by dividing these counts by total pixels in (4).The power is defined as 1−(# of (true = positive or negative, SiZer map = zero)/ (total pixels in (4))).This can be considered as the probability of decision undeferred.In Figure 6, the titles of the second and third rows are examples of the type I error and the power.
Table 1 shows the type I error and the power for the three simulated examples.In the table, the hp1 means the first bandwidth selected by IR, and the hp2 the second selected by IR, and so on.Simulations are repeated 100 times and the table reports their mean, median and maximum.
For the white noise and MA(1) cases, the maximum of Type I error is 0.0459 throughout all the bandwidths and the 100 repetitions, which is very low.This implies that SiZer analysis provides high coverage probabilities which are close to 1 and makes few mistakes in its decision for weakly correlated errors.However, for the fractional Gaussian case, the mean and the median of Type I error is above 0.3.Also, considering the difference between the maximum and the minimum, the variation is also large throughout the repetitions.This suggests that SiZer needs a better variance estimator for strongly correlated errors such as fractional Gaussian noise.We leave this for future work.As the pilot bandwidth increases, the type I error decreases, but the power also decreases because the SiZer map cannot find the detailed features because of the coarse resolutions.

Real examples
The real data sets shown here are the Deaths data set, i.e. the monthly number of accidental deaths in US from 1973 to 1978 (thousands) and the Chocolate data set, that is considered in the introduction of this paper.As the Chocolate data set, the Deaths data set comes with the software companion to [4].

Deaths data
Figure 7 shows SiZer for time series for the Deaths data set, after deseasonalizing and linear detrending.Assuming 'i.i.d.' errors (first SiZer plot on the left) only some features of the family of smooths are significant at the finest levels of resolution.The strongest feature is the minimum around i = 36 for intermediate values of the bandwidth.But for smaller bandwidths, there is a significant increase near i = 20.For the largest value of the bandwidth the curve is neither significantly increasing nor decreasing.
For slightly correlated errors (second SiZer plot on the left), the major minimum again turns out to be significant but for a smaller range of bandwidths

Chocolate data
Figure 8 shows SiZer for time series for the Chocolate data set.As in the previous example, this data set has been deseasonalized and linearly detrended.But here the strong upward trend, driven by increasing chocolate production due to increasing population size, seems to hide much more structure than in the previous case, as we can see once we detrend the data, especially when the errors are assumed to be 'i.i.d.'.Nevertheless, SiZer for time series still gives useful insights about the trends under study, using the same 4 choices of SiZer plots from the complete series of 11.For 'i.i.d.' errors (first SiZer plot on the left) many 'peaks and valleys' are significant at the finest levels of resolution, i.e. for the smallest values of the bandwidth.As the level of resolution decreases, fewer and fewer features appear to be significant.
For slightly correlated errors, no conclusion can be drawn at the finest levels of resolution while for intermediate values of the bandwidth the major minimum around i = 250 (which corresponds to year 1978) is the only significant feature.For the highest value of the bandwidth the curve is neither significantly increasing nor decreasing.
As the correlation increases no significant structure is highlighted with the only exception being some intermediate values of the bandwidth where the curves are significantly increasing on the right end of the time domain.
For the strongly correlated error assumption, no significant structure appears at any level of resolution.

Fig 1 .
Fig 1. Exploratory analysis of the Chocolate data set through the SiZer approach: (a) Family Plot; (b) SiZer Plot.

Fig 2 .
Fig 2.An example for the identifiability problem between 'trend' and 'dependence artifacts': the Strikes data set.

Fig 3 .
Fig 3.The family plot and the complete series of 11 SiZer plots, indexed by hp, for the simulated time series sm1.

Fig 4 .
Fig 4. The family plot and the complete series of 11 SiZer plots, indexed by hp, for the simulated and linearly detrended time series sm1d.

Fig 5 .
Fig 5. SiZer for time series for the simulated and linearly detrended time series sm1d.

Fig 6 .
Fig 6.SiZer for the simulated time series examples with type I error and power.

Fig 8 .
Fig 8. SiZer for time series for the Chocolate data set.

Table 1
Type I error and Power