Simultaneous confidence bands for derivatives of dependent functional data

Abstract: In this work, consistent estimators and simultaneous confidence bands for the derivatives of mean functions are proposed when curves are repeatedly recorded for each subject. The within-curve correlation of trajectories has been considered while the proposed novel confidence bands still enjoys semiparametric efficiency. The proposed methods lead to a straightforward extension of the two-sample case in which we compare the derivatives of mean functions from two populations. We demonstrate in simulations that the proposed confidence bands are superior to existing approaches which ignore the within-curve dependence. The proposed methods are applied to investigate the derivatives of mortality rates from period lifetables that are repeatedly collected over many years for various countries.


Introduction
Advanced data collection technology has evolved over the last decade so as to permit observations densely sampled over time, space, and other continua.These observed data are often sets of functions represented in the form of curves, images or shapes; they meanwhile demand more powerful and flexible statistical analysis techniques.As a result, functional data analysis (FDA), which addresses emerging issues in the analysis of these complex data, has seen significant development in theory, methods, and computation.We refer to [15] and [11] for monographs on FDA.In the field of FDA, statistical tools are mainly investigated under the situations of independently sampled functions.In recent years, many longitudinal studies are collecting functional measurements at each visit, for example, mortality data [5] in which the age specific lifetables are recorded over years for various countries; longitudinal diffusion tensor imaging (DTI) [20] and repeated white matter tract data [18].
Our objective is to conduct estimation and simultaneous inference of derivatives of repeatedly observed random trajectories.In the fields of engineering and biomedical science, the inference of velocity or acceleration is also highly needed, for example, magnetic gradients and white matter tract.This work is motivated by characterizing the velocity of mean curves for mortality data in [5].The derivative function reflects the overall trend or direct estimation of an underlying population progress and can be used as an important index for the population response.Hence, it is of particular interest in data analysis to construct simultaneous confidence bands (SCB) for the derivative functions instead of point-wise confidence intervals and to develop global test statistics for the general hypothesis testing problem on the derivative functions.In the previous works, the theoretical focus has mainly been on obtaining consistency and asymptotic normality of the nonparametric estimators, thereby providing the necessary ingredients to construct pointwise confidence intervals for the derivative functions [14].Most SCB construction studied in the existing FDA literature only considers independent and no repeatedly observed trajectories cases.Particularly, [3] constructed asymptotic SCB for the derivative of mean curves in function data analysis without considering any within-curve dependence.[1,8] and [4] considered asymptotic SCB for mean functions of the functional regression model when within-curve dependence is not present.
Recently, there have been some attempts to study such dependent functional data in various models.For instance, [9] emphasised a general hierarchical model.[5] proposed a flexible longitudinally observed functional model and provided consistency results and asymptotic convergence rates for the estimated model components.[20] established the uniform convergence rate and confidence band for each estimated individual effect curve in multivariate varying coefficient models.[18] developed a functional mixed effects modeling framework to delineate the dynamic changes of diffusion properties along major fiber tracts and the structure of the variability of these white matter tract properties in various longitudinal studies.More recently, [2] studies polynomial spline confidence bands for mean curves of dependent functional data.However, we note that all currently available statistical nonparametric methods cannot be immediately used for constructing SCB for derivatives of mean functions when curves are repeatedly recorded.
To develop simultaneous inference for dependent functional responses, we encounter many new challenges.First, the greater technical difficulty to formulate SCB for infinite dimensional functional response and establish their theoretical properties.Second, unlike the scenarios considered in the classical FDA literature where the data consist of n independent units, in out settings, however, there is complex within-subject or spatial-temporal correlation structure.In this work we use polynomial splines to approximate the derivative functions.We show that the proposed spline confidence bands are asymptotically correct and satisfying semiparametric efficiency in the sense that they are asymptotically the same as if all random trajectories are fully recorded and without measurement errors as in [3].In this context, we further extend the simultaneous inference to the two-sample case and evaluate the equality of derivative functions from two groups.Our Monte Carlo results show that the proposed bands are superior to existing methods which ignore the dependence within the repeatedly observed curves.
The paper is organized as follows.Section 2 states the model and introduces the spline estimates of the derivative functions for dependent functional data.Section 3.1 describes the asymptotic distribution of the spline derivative estimator in the framework of allowing unknown within-curve dependence of the trajectories.Using this asymptotic result, we construct SCB for derivative functions.Section 3.2 develops the confidence bands to study the difference of derivative functions from two populations.The actual steps to implement the confidence bands are provided in Section 4. A simulation study is presented in Section 5. We present applications to age-specific human mortality datasets in Section 6.
Technical proofs are collected in the Appendix.

Background on dependent functional data
The sample of functions that give rise to the data are viewed as realizations of a smooth and square integrable random process {X ij (s), s ∈ X } for i = 1, . . ., n, j = 1, . . .m i , where X is a compact interval, i is the subject index, and j is the repeated observation index for the i-th subject.Without loss of generality, let X = [0, 1] and m i = m for i = 1, . . ., n.If the repeated sizes are not equal, we do transformation of the data, which is discussed in details in Section 4.
The process X ij (s) is assumed to have covariance function Cov(X ij (s), X ij ′ (t)) = G jj ′ (s, t), where j, j ′ = 1, . . ., m and s, t ∈ [0, 1] and G(s, t) = {G jj ′ (s, t)} m j,j ′ =1 is an m × m matrix of symmetric, positive definite, and continuous function.Therefore, G(s, t) mainly characterizes within-subject correlation structure.For each (i, j), random process X ij (s) can be viewed as an L 2 process on [0, 1], and hence by the Karhunen-Loève expansion, where µ(s) = E{X ij (s)}, ψ jk (•)'s are orthonormal eigenfunctions of the covariance function G jj (s, t) and for each fixed (i, j), the ξ ijk 's are uncorrelated random coefficients with mean 0 and variance 1.The random coefficients ξ ijk 's are also referred to as the (jk)-th functional principal component (FPC) scores of the i-th subject.We assume that ψ jk (•)'s are kept in a descending order of λ jk 's, i.e. λ j1 ≥ λ j2 ≥ • • • ≥ 0. The number of principal components is ∞ in our theories, but is often assumed to be finite for practical considerations.
We consider a typical functional data setting where Y i (s) is measured at the same grid of equidistant location points, i.e., s = l/N , l = 1, 2, . . ., N , for each subject.The above leads to the following model for dependent functional measurements The sequences {λ jk } m,∞ j,k=1 , {φ jk (•)} m,∞ j,k=1 and the random coefficients ξ ijk 's only exist mathematically and they are unknown and unobservable respectively.The problem addressed here is estimation and inference of ν-th order derivative function µ (ν) (s).A similar model has been considered in [2].

Asymptotic confidence bands
First we study the asymptotic properties of the spline estimator given in (2.2).Pretending X (ν) ij , i = 1, 2, . . ., n, j = 1, 2, . . ., m is known, the "ideal" or "infea- Theorem 1 below shows that the spline estimator μ(ν) in (2.2) is asymptotically equivalent to the "ideal" estimator μ(ν) in (3.1) with √ n-rate, which is the same rate of convergence in the parametric setting.Thus, we have the oracle efficiency of the nonparametric estimator μ(ν) .We need more notations for Theorem 1. Define the derivative of covariance function while the spline estimator μ(ν) is asymptotically equivalent to μ(ν) up to √ n order, i.e.
The oracle efficiency in Theorem 1 immediately indicates the following result, which can be used to construct SCB or pointwise confidence intervals for µ )

Confidence bands for the difference between two derivative functions
In this section, we focus our review on the statistical inference for the derivative functions in the two-sample problems drawn from dependent function data sets.The aforementioned confidence bands for one sample derivative function can be extended to the two-sample case.Analogous to the previous notations, we denote two samples indicated by d = 1, 2, which satisfy Define the ratio of two-sample sizes as τ = n 1 /n 2 and assume that lim n1→∞ τ = τ > 0.
For d = 1, 2, let μ(ν) d be the spline estimate of derivative function µ We mimic the two-sample t-test and state the following theorem whose proof is analogous to that of Theorem 1.

Implementation
To apply the proposed confidence bands, there are a number of important function and parameter estimation issues that need to be addressed, such as the unknown function Ω(•, •) and the quantiles Q 1−α and Q diff,1−α , as well as choosing the number of principal components.
In practice, very often the repeated sizes are not equal, and we suggest the following specific data transformation.For 1 In the following, for simple notations we do not distinguish

Estimating Ω(s, s)
We first apply tensor product spline approach addressed in [3] to estimate the covariance function G jj (s, t).The pilot spline estimator of where N G is the number of interior knots used to build the tensor product B-spline basis and the spline coefficients bJ, The estimation procedure of eigenvalues λjk , eigenfunctions ψjk and FPC ξijk follows standard method discussed in [17].We focus on estimation of the ν-th derivative of eigenfunctions ψ(ν) jk .According to [3], one has the eigenequation where ψjk are subject to 1 0 ψ2 jk (s)ds = 1 and 1 0 ψjk (s) ψjk ′ (s)ds = 0 for k ′ < k.If N is sufficiently large, the left hand side of (4.2) can be approximated by Thus, Ω(s, s) can be estimated by 3) The following theorem shows that Ω(•, •) and Ω(•, •) are asymptomatically equivalent.
The proof of Theorem 3 is given in Appendix A.3.According to our experience, the aforementioned approach is less efficient than imposing a structure on the within-curve correlation.For example, exchangeable (EC), first-order autoregressive (AR-1) and toeplitz (TOEP) are some common options of the structure.Similar methods are introduced in the repeated measurement studies [13,2].In the following we assume the covariance structure between ξ 1jk and Define the empirical version of with-subject covariance as Without loss of generality and for notation simplicity, let us assume λ jk = λ k , ψ jk = ψ k , and estimate ρ jj ′ under three popular correlation structures: EC, AR-1 and TOEP.For any j = j ′ , the estimator of ρ jj ′ can be defined as where R T = ( Σj1 , Σj2 , . . ., Σj,m−1 , j = 1, . . ., m − 1) and D T = ( Σj2 , Σj3 , . . ., Σj,m , j = 1, . . ., m − 1).Therefore, we can estimate Ĝ(ν) jj ′ (s, s).Remark 1.Note that under the assumption of the aforementioned with-subject correlation structures and simple notations, Ω(s, s) is equivalent to Ω(s, s) defined in (4.3).Therefore, according to Theorem 3, under Assumptions (A1)-

Estimating the quantiles
We generate independent R m -valued Gaussian vectors, Here b M is a preset large integer, the default of which is 1000.Let One estimates Q 1−α by the empirical quantile Q1−α of these maximal absolute value for each copy of ζb (s).
In two-sample situation, for One has Qdiff,1−α by taking the empirical quantile of these maximum values of ζdiff,b (s).

Selecting correlation/covariance structure
In the practice, an appropriate correlation matrix based on available data is needed.To choose correlation structure, it is natural to consider bootstrap method using the empirical coverage rates of inducing confidence bands as the choosing criterion.Following the suggestions in [2], we select the structure which provides the closest coverage rate to the nominal level.First, one computes the spline estimator μ(ν) for observed data.Second, given a correlation structure, one constructs the corresponding confidence band based on the bootstrap sample data for 500 times and compute the coverage rate of the confidence band under each correlation structure from a given set of candidates.Here, μ(ν) is assumed as the true underlying function when checking the empirical coverage rate.Finally, one chooses the covariance structure whose empirical coverage rate is the closest one to the nominal level.

Selecting spline knots and the number of eigenfunctions
The numbers of knots serve as the smoothing parameters, playing the same role as bandwidths in the local linear method.According to Assumption (A3) in Appendix, the number of interior knots in estimating µ (ν) (s) is taken to be in which [a] denotes the integer part of a.Meanwhile, the spline estimator Gjj (s, t) in (4.1) is calculated with the number of interior knots N G = [2n 1/(2p) log(log(n))].Similar knots selections are suggested in [3].These choices of knots also satisfy Assumption (A3) in Appendix.

Simulation results
To access the practical behavior of the proposed methodology, we conduct two simulation examples to explore the performance of the proposed asymptotic SCB.
We use the proposed method in (3.2) and its "oracle" version Ω(s, s) in (4.3) to construct the confidence bands for µ (1) (•) under different correlation structures: "IND", "EC", "AR-1" and "TOEP".We consider two confidence levels: 1 − α = 0.95, 0.99.The number of trajectories n is taken to be 30, 50, 100, and for each n, the number of observations on the trajectory is N = n.Each simulation is repeated 1000 times.
For each setting, we adopt cubic spline (p = 4) smoothing and calculate the percentage of coverage of the true function on regular grid with increments 0.01 by the bands constructed using four different correlation structures.Tables 1  and 2 summarize the percentage that the true curve µ (1) (•) is covered by the cubic spline confidence bands respectively.
In Table 1, the true underlying within-curve correlation structure is chosen as independent ("IND"), i.e.Ω is identity matrix.Hence, one would expect the bands constructed using the "IND" structure to be the best performer.In  1 shows that the bands constructed using correlated structures are very competitive, too.In several cases, the coverage rates of the bands with correlated structures ("EC", "AR-1" and "TOEP") are very close to or even better than the bands with "IND" structure.As we expected coverage rate of "oracle" bands approach the nominal levels is faster than the estimated one, but both of them converge to the nominal level clearly, which show positive confirmation of Theorems 1 and 3. To summarize, in many correlated structure settings, our method can lead to substantially better coverage rates than the "naive" ("IND")-type band, while the loss in the case of an underlying "IND" structure is quiet small when sample sizes increase.
In Tables 2 and 3, we investigate the coverage percentages when the true correlation structures are "EC", "AR-1" or "TOEP".For each true correlation structure, we construct spline confidence bands based on all four correlation assumptions including "IND".As expected, the coverage percentages for the confidence bands are close to the nominal levels when applying the correct correlation structure.The bands obtained by "IND" have significantly smaller coverage rates than the nominal levels regardless of the sample sizes and correlation structures.Particularly, when the correlation structure becomes stronger, for example, increasing ρ jj ′ in Ω jj ′ , the "IND"-type of band performs much worse than its counterparts.While the advantage of applying our proposed bands over the "naive" band seems to persist across the table, it is more evident in the situation with stronger correlation structure, i.e., increasing ρ jj ′ in Ω jj ′ .From Tables 2  and 3, we can see that in most of the scenarios the "oracle" confidence bands outperform the estimated bands, and the "oracle" bands approach the nominal coverage with sample sizes increasing.In Tables 1 to 3, when nominal level is 0.05, the standard error for each empirical coverage rate ranges from 0.0001 to 0.00025 and the average standard error is around 0.0002.When nominal level is 0.01, the standard errors of empirical coverage rates are less than 0.0001.
We depict a typical comparison between two 95% confidence bands constructed using "IND" and "TOEP" correlation structures in Figure 1.The true correlation structure is "TOEP" (with ρ = 0.3 r + 0.1).This plot is based on n = 30, m = 10, 20.In Figure 1, one sees that the "IND" and "TOEP" produce the same estimator for the true curve µ (1) (•), however, the confidence bands obtained are very different.From Figure 1, one sees that the true curve µ (1) (•) lies completely inside the "TOEP"-type of band, but it can not be covered entirely by the "IND"-type of band.

Simulation 2: Empirical sizes and power
We conduct a numerical study to evaluate the empirical size and power of hypothesis tests for the derivative functions based on the proposed confidence bands.We compare the results of the bands developed under the "IND" as- sumption with those taking into account the within-curve correlations.In order to mimic the two-sample testing problems, we consider the following hypothesis test: We generate two groups of data from the model given in (5.1) with µ 1 (s) = 5s + δs + 4 sin{2π(s − 1/2)} and µ 2 (s) = 5s + 4 sin{2π(s − 1/2)}, and all the remaining settings are the same as in Example 1 under the same parameter values.The constant δ takes 6 values {0, 0.25, 0.5, . . ., 1.25}.We choose the sample sizes n 1 = n 2 = 100, and the number of observation points of each curve N = 100.For each subject, the noisy trajectory is repeatedly observed m 1 = m 2 = 10 times under three true correlation structures:"AR-1"(ρ = 0.2), "EC"(ρ = 0.05) and "TOEP"(ρ r = 0.2 r + 0.05, r = 1, . . ., m − 1).For each simulation, the significance level was set at α = 0.05, and 500 replications were used to estimate the rejection rates.Indeed, large values of δ shift the derivative of mean function for the first group data further away from second one, therefore the rejection rates increase with the values of δ.Table 4 summarizes the empirical type I errors and powers of the hypothesis test in (5.2) under different true correlation structures respectively.For each correlation structure, we compare the performance of the "IND"-type band and the band using the true structure at 5% nominal level.It can be seen from Table 4, when δ = 0, the "IND" does not maintain its sizes regardless of the true correlation structures and sample sizes.This finding also correspond with its low coverage rates in Tables 2 and 3. Therefore, although the power of "IND" is larger than our proposed method, it is not recommended for practical applications.Overall, the tests based on our proposed method maintain their sizes at 5% nominal level, while increasing values of δ and sample sizes improve the power of detecting the alternatives.

G. Cao
6. Real data analysis

Confidence bands for derivative function of human mortality rates
Assessing demographic trends in mortality is of growing interest due to the demographic impacts of aging and extended longevity on the future viability of social security and retirement systems [6].Such trends not only raise basic biodemographic questions about the nature and the limits of human lifespan but also about the demographic future of aging societies.Our dataset contains the lifetables for 41 countries and areas from the calendar year 1992 to year 2005 obtained from the Human Mortality Database [12] (www.mortality.org).Figure 2 depicts mortality rates within age 60 to age 100 in two countries (Australia and Iceland) from year 1992 to year 2005.It can be found that the mortality rate decreases slightly over the years in Australia, while there are no clear trends happening in Iceland during these years.Various approaches have been proposed for the modeling of mortality data including i.i.d.functional approach [6] and two-dimensional functional principal analysis approach [5].There is one interesting issue: the analysis of recurring mortality patterns and their structure.In our analysis, we focus on the estimation and inference of the first order derivative of mortality rate over calendar years for elder population.We also apply the proposed confidence bands to period lifetables to recur mortality trends.Let Y ij (s) denote the mortality rate observed for the ith country, jth calendar year at age s, where i = 1, . . ., 41, j = 1992, . . ., 2005.We focus on the older individuals within the age range 60 ≤ s ≤ 100.
Applying cubic spline smoothing to the dataset, we obtain the estimated overall first order derivative function of mortality rates μ(1) (•) for age from 60 to 100.To construct the confidence band, we first select the correlation structure using the resampling method in Section 4, and "AR-1" gives the closest coverage rate to the nominal level based on 1000 resampling data."IND" and "AR-1" supply the same estimated derivative function µ (1) ; see the middle solid line in Figure 3.To explore the comparison of the bands under different structure assumptions, in Figure 3 we present both the "IND"-type of band (upper and lower dot-dashed lines) and the "EC"-type of band (upper and lower dashed lines) at 95% confidence level.As shown in Figure 3, the estimated derivative function of mortality rate is increasing quickly from 0, which corroborates with the convex trends in Figure 2.

Two-sample test for the difference between male and female mortality rates
We further compare the mortality velocity of female and male subjects.From the Human Mortality Database (www.mortality.org),we obtain the lifetables for female and male subjects respectively.We focus on the same 41 countries and areas analyzed in Section 6.1 from the calendar year 1992 to year 2005.Our hypothesis of interest is: F (s) are the first derivative functions of mortality rates for males and females in these n 1 = n 2 = 41 countries and areas respectively.
Figure 4 depicts the cubic spline confidence bands at confidence level 0.99 (upper and lower dashed lines), with the center dashed-dotted line representing the spline estimator μ(1) M (s) − μ(1) F (s) and a solid line representing zero.Resampling method in Section 4 chooses "AR-1" and "EC" as proper correlation structures for male and female groups respectively.The 99% confidence band increases slightly above the zero line and is relatively flat.After the single tuning point around age 90, it decreases under zero fairly fast.Due to few data collected for aged population, Figure 4 also indicates the variation of the discrepancy increases when people become aged.By comparing the 99% confidence band and zero reference line, the difference between male and female death rates velocity is extremely significant and female elders has lower mortality velocity than male elders at the same age before age around 90 and after this tuning point male has lower mortality velocity.To our best knowledge, there has been no demographic literature discussing the similar findings.Hence, a potentially interesting exten- sion of this work is to consider including some demographic variables and study the relationship of the mortality rate and these demographic regressors.φ jk (l/N ) respectively.Alternatively, μ(ν) (s) = N −1 B (ν) (s) V−1 X T µ, ẽ(ν) (s) = N −1 B (ν) (s) V−1 X T e, ξ(ν) k (s) = N −1 B (ν) (s) V−1 X T φ k , for 1 ≤ k ≤ κ, where µ (ν) = (µ (ν) (1/N ), . . ., µ (ν) (N/N )) T is the signal vector, e = (ε ••1 , . . ., ε••N ) T with ε••l = (nm) −1 n i=1 m j=1 ε ij (l/N ), 1 ≤ l ≤ N , is the noise vector, and vector φ k = (nm) −1 n i=1 m j=1 ξ ijk φ jk with φ jk = (φ jk (1/N ), . . ., φ jk (N/N )) T .In the following, we denote by Q ̟ (µ) the p-th order quasi-interpolant of µ corresponding to the knots ̟, see equation (4.12), page 146 of [10].According to Theorem 7.7.4 in [10], the following lemma holds.

Fig 1 .
Fig 1.The estimated derivative function (middle dashed line), the true derivative function (middle solid line) and its 95% confidence bands (3.2) using "TOEP" structure (upper and lower dotted lines) and using "IND" structure (upper and lower dot-dashed lines).

Fig 2 .
Fig 2. Plot of mortality rates in Australia and Iceland from year 1992 to year 2005.

Table 1
Empirical coverage rates of the bands using various correlation structures (Case 1)

Table 2
Empirical coverage rates of the bands using various correlation structures (Cases 2, 3 and 4) and each trajectory is repeatedly observed m 1 = m 2 = 10 times

Table 3
Empirical coverage rates of the bands using various correlation structures (Cases 2, 3 and 4) and each trajectory is repeatedly observed m 1 = m 2 = 20 times

Table 4
Empirical type I errors and powers for hypothesis testing (5.2) at 5% nominal levels with various correlation structures.For each case, sample sizes are n 1 = n 2 = N = 100 and each trajectory is repeatedly observed m 1 = m 2 = 10 times