Two-sample inference for sparse functional data

: In this paper, we develop an asymptotic χ 2 test for detecting diﬀerences among mean functions when sparse and irregular observations are drawn from the underlying continuous stochastic processes for each subject. We provide theoretical arguments to justify the eﬀectiveness of the proposed test procedure. Numerical experiments, including simulation studies and applications to a CD4 count data set and an eBay online auction data set, are presented to demonstrate the good performance of the developed test.


Introduction
Functional data refers to data drawn from continuous underlying processes. Examples include time series, image data, and tracing data such as hand-writings. Statistical methods for this type of data, called functional data analysis (FDA), have been under rapid development in recent decades. In FDA, there are two typical types of data: dense functional data, where a large number of regularlyobserved measurements for each subject are attainable; and sparse functional data, where only a few irregularly-spaced measurements are available for each subject. Dense functional data is common when automated instruments are utilized to record the data. Sparse functional data arise frequently in many realworld applications as well, such as longitudinal studies (e.g., the AIDS Clinical Trial Study in Section 4.2) and eBay online auctions in Section 4.3. These two types of functional data often require different sets of techniques to model. The book [33] offers a comprehensive perspective of FDA methods for denselyobserved functional data, and the paper [26] provides a nice review for sparse functional data analysis.
There is substantial literature on modeling and estimation for functional data. In this paper, we focus on the mean function inference problem. Let X g (t) ∈ L 2 (T ), g = 1, 2, denote the g-th random process, with a mean function μ g (t) and a covariance function G g (s, t), s, t ∈ T . Both μ g (t) and G g (s, t) are unknown. X 1 (t) and X 2 (t) are assumed to be independent. The two-sample mean function 1396 Q. Wang testing problem can be stated as testing μ 1 (t) = μ 2 (t) for all t ∈ T against the general alternative that the two mean functions are not equal at some time points within the time range T . The statistical framework of one-way functional ANOVA is analogous.
For dense functional data, where a large number of regularly-observed measurements for each subject are attainable, some well-developed methods for the two-sample mean function testing problem exist, including the pointwise t-test [33], the L 2 -norm-based test and the F-type test [41,9,42,43], the Hotelling's T 2 test with permutation [31], tests involving basis representations [1,16,15,17,21,36], and the pseudo-likelihood ratio test [37]. Extensions to multiple functional samples are discussed in [35,8,6,27]. Besides, tests aiming at detecting differences in functional distributions [32,13] are also applicable to functional mean testing problems under further assumptions. Recently, [12] developed a method for test of the equivalence for functional data. For a related purpose, [4] developed simultaneous confidence bands for mean functions. The point-wise F-test and a simultaneous test based on the permutation distribution of the supremum of the point-wise F-test are proposed to test the parameter functions in function-on-scalar regressions [33,34]. When defining the scalar-valued predictor as a dummy variable that indicates the group label of functional response, the tests for the parameter function being 0 is equivalent to the two-sample mean function inference problem in the paper. It is noteworthy that the F-type tests in function-on-scalar regressions, under these circumstances, are the same as the F-type mean function tests in [41,9,42,43].
When it comes to sparse functional data, where only a few irregularly-observed measurements are available for each subject, the testing problem is more challenging and complicated. Most of the aforementioned methods assume one can fully observe the individual functions, such that the regular sample mean and sample covariance functions are attainable. In practice, these tests require the individual curves from each subject being observed at the same dense regular grid. When observing times are not the same for all the subjects or observed data are seriously contaminated with measurement errors, pre-smoothing for each curve is used [13]. However, pre-smoothing techniques based on individual curves are no longer reliable for sparse functional data, considering the limited amount of data for each subject. In the simulation studies, we implement the PACE method proposed [40] to jointly use data from all curves to conduct curve recovery. But note that the curves recovered this way are not consistent to the true curves; therefore the tests either have inflated type I errors or very low powers. There is some recent work regarding one-sample inference for sparse functional data [25]. These methods cannot be easily generalized to two or more functional samples.
To our best knowledge, the pseudo likelihood ratio test [37] and the nonparametric distribution test [32] are the only two tests that are also applicable to sparsely sampled functional data. In [37], they proposed an extension to sparse data with some details, but the numerical studies focused on the experiments for dense data, and the most sparse case in simulations has 10 data points on each curve. We found that the test has inflated type I error in very sparse cases. [32] proposed to apply the Anderson-Darling test on FPC scores to test the equality of two distributions. When the data are sparsely observed, they extended the FPC scores in [40] to pooled data of two groups and used the so-called marginal FPC scores in the distribution test, but they only briefly discussed that this replacement will still lead to "close" test results. All their numerical experiments are for dense data.
In this paper, we propose a test procedure specially designed for the sparse functional mean testing problem and theoretically derive its asymptotic null distribution and large sample consistency. We propose a Hotelling's T 2 type statistic based on a shrinkage eigen-projection score vector. This is an alternative way of extending the sparse FPC in [40] to multiple groups of sparse functional data, in addition to the marginal FPC in [32]. Unlike the regular eigen-projection score vector, the distribution of the individual shrinkage score depends on its specific design, i.e., the observing time points. We notice that if we account for the randomness of designs, the distribution of the shrinkage score is still i.i.d across all the subjects. Based on the distribution of the estimated shrinkage score, we propose an asymptotic χ 2 test statistic. The development of the asymptotic null distribution is nontrivial since we need to consider the estimation error of the shrinkage score for each curve and control the overall error uniformly across all subjects.
The proposed statistic can be roughly viewed as a Hotelling's T 2 type statistic applied on the estimated shrinkage scores, with a new variance estimator. [31] proposed a functional Hotelling's T 2 statistic for fully observed functional data, which is based on the sample mean and sample covariance of the data. When the data are observed on a discrete but dense grid, one can use smoothing methods to recover the underlying function, and then the Hotelling's T 2 statistic can be empirically computed by truncating at the first p eigen components, and they proposed to use a permutation distribution for the statistic. For sparsely observed functional data, one can consider the permutation test constructed on the recovered curve, which can still control type I error, but our numerical results show that it has low power.
The remainder of this paper is organized as follows. The main results for the two-sample mean testing problem for sparse functional data, including the problem settings, the proposed test procedure, the corresponding asymptotic results, and an extension to common principal component cases, are stated in Sections 2 and 3. The performance of our test procedure is illustrated by three simulation studies and applications to a CD4 count data set and an eBay online auction data set are described in Section 4. The paper is concluded in Section 5. All the proofs are in Appendix B. An extension to the one-way functional ANOVA problem is briefly described in Appendix C.

Problem settings
The model we consider is where Y gij denotes the j-th observation of the i-th subject in group g, for g = 1, 2; i = 1, .., n g ; j = 1, .., N gi . The random samples X gi (t) are realizations of two independent stochastic processes with homogeneous covariance structures in a bounded time domain T . Without loss of generality, let's assume T = [0, 1], and then X gi are random samples from X g ∼ SP (μ g , G). Here gij ∼ N (0, σ 2 ) are i.i.d measurement errors. The number of observations for the i-th subject in group g is denoted as N gi , and we focus on sparse scenarios where N gi is finite. Given N gi , random observing times T gi1 , ..., T giNgi are i.i.d with a bounded density function within the time domain T . It is also assumed that X gi (t), N gi and gij are mutually independent. The hypothesis testing of interest is

Proposed test procedure based on shrinkage estimator
Let μ pool (t) denote the mean function of the mixture process of X 1 (t) and X 2 (t).
We first achieve a centered model by deducting μ pool (t) from both sides of model in Eq.
.., ∞} be orthonormal eigenfunctions of G(s, t), corresponding to the non-increasing eigenvalue sequence {λ k , k = 1, 2, ..., ∞}. We define the projection score of μ g (t) − μ pool (t) onto the k-th eigenfunction φ k (t) as θ c gk , which is the inner product of μ g (t) − μ pool (t) and φ k (t), i.e, , it can be seen that testing defined in Eq. (2.2) is equivalent to testing Given the following projection of X c gi (t), 7) under dense data settings, projection-based tests truncate at the first p dimensions and construct test statistics based on r c gik , g = 1, 2; i = 1, .., n g ; k = 1, .., p.
For dense functional data, to obtain a consistent estimate of the projection score r c gik , one can utilize the estimator t∈T X c gi (t)φ k (t) dt. However, for sparse functional data, we do not have the functions X c gi (t). We propose to consider the best linear unbiased predictor, def =r c gi . When the projection scores r c gi is Gaussian (i.e., the random processes X 1 (t) and X 2 (t) are Gaussian processes) and are jointly Gaussian distributed with random errors gij , the explicit formula ofr c gi under H 0 is with the kth column being φ gik = (φ k (T gi1 ), ..., φ k (T giNgi )) T , and Σ Y c gi is the covariance matrix of Y c gi , whose (j, j ) element equals to G(T gij , T gij )+σ 2 1(j = j ). This is referred as shrinkage score in the rest of the paper.
The shrinkage score in Eq. (2.9) is not the same as the original score r c gi in Eq. (2.6), but it gets closer to r c gi when the number of observations on each curve goes large and the measurement error gets small. Best linear predictors have been first proposed in [40] for principal component analysis for one-sample i.i.d sparse functional data. Our proposed shrinkage score for multiple groups of sparse data is different from the marginal FPC score predictor in [32]. The eigenfunctions and eigenvalues in our proposed shrinkage score correspond to the covariance matrix G(s, t) instead of the mixture covariance matrix G mix (s, are respectively the probability of random samples of the mixture process coming from X 1 (t) and X 2 (t).
To construct a test statistic based on the shrinkage score vectorr c gi , let's first calculate its mean vector and covariance matrix through the following two steps. First, conditional on the observing times T gi = (T gi1 , ..., T giNgi ) T , the conditional mean and variance ofr c gi are

Q. Wang
Next, by taking the randomness of T gi into account, we have the mean and variance ofr c gi are (2.11) Here μ gi = (μ g (T gi1 ), ..., μ g (T giNgi )) T . Note that the second term in cov[r c gi ] is zero under the null hypothesis. All the quantities involved in estimatingr c gi and cov[r c gi ] can be obtained from data Y gij , g = 1, 2; i = 1, .., n g ; j = 1, .., N gi , with details included in the next subsection. Let's denote the estimated quantities aŝ G,λ k ,φ k (t),μ pool (t), andσ. Under the null hypothesis, we have the following empirical estimatorŝ whereĜ gi is the estimated covariance function evaluated at timestamps T gi = (T gi1 , ..., T giNgi ) T . Given all these arguments, we propose the following test statistic for the hypothesis testing problem in Eq. (2.2), wherer c g· = ng i=1r c gi /n g , g = 1, 2. It is noteworthy that, even though the proposed test statistic T p,N is motivated from joint Gaussian situations, we later derive that the asymptotic distribution of this statistic under H 0 is a χ 2 p distribution, regardless of whether the random functions are Gaussian or not. For a specific significance level α, we is the upper α quantile of χ 2 p . The derivation of the asymptotic distribution is given in Section 3. Finite sample performances are demonstrated by three simulation studies and real data analyses in Section 4.
For modeling and testing procedures based on basis projections, one needs to specify the number p in practice. Popular approaches in functional data analysis literature include the cross-validation approach, the percentage of variance explained method, and other penalty criteria such as AIC and BIC. These approaches have been adopted in two-sample tests for dense functional data [1,16,15,17,21,36,31]. In the context of functional linear regression, [14] and [22] have proposed to combine P -values from p ∈ (p min , p max ), where p min and p max grow slowly with n. Since we used the restricted maximum likelihood method [30] to estimate the eigenfunctions, the default method in our package is to choose p by cross-validation, which is described in the next subsection.

Estimation of model components
In this subsection, we discuss the estimation procedures for μ pool (t), G, λ k , φ k (t), and σ. We propose an effective procedure to extend the restricted maximum likelihood covariance function estimation method [30] to multiple groups of sparse functional data scenarios.
Estimations of the pooled mean function μ pool (t) are performed by a local linear smoothing technique, which has been used in various studies [7,40,30]. To be more specific, we define the local linear smoother of the pooled mean function μ pool (t) by minimizing with respect to β 0 and β 1 , where K(·) is a smoothing kernel and h μ pool is the bandwidth. Thenμ pool (t) =β 0 (t).
To estimate the covariance function G(s, t), we first estimate the group-wise mean functions μ g (t), g = 1, 2. For each g, we define the local linear smoother of the mean function μ g (t) by minimizing with respect to β * g0 and β * g1 . Thenμ g (t) =β * g0 (t). Given the mean function estimationμ g (t), G(s, t) is estimated by modifying the restricted MLE method [30] as follows. It is assumed that the eigenvalues of G(s, t) decay to zero efficiently fast such that the difference between G(s, t) = ∞ k=1 λ k φ k (t)φ k (s) and r k=1 λ k φ k (t)φ k (s) is small. Under some weak smoothness conditions on X g (·), the first r eigenfunctions {φ 1 (t), ..., φ r (t)} can be modeled as, A Newton-Raphson algorithm is utilized to achieveD,λ andσ 2 such that they minimize the negative log-likelihood, subject to the constraint that D T D = I r .
Then the corresponding eigenfunctions are estimated byφ k (t) = M l=1d lk ψ l (t). Note that this method directly targets the eigen components. In our numerical studies, we found that the restricted MLE method produces more accurate estimates for the eigen components than the local linear smoothing method developed in [40], which first estimates the covariance and then does eigen decomposition on the discretized covariance.
To be able to obtain consistent estimates of G(s, t), φ k (t) and λ k using restricted MLE, one has to have large enough M and r such that the true covariance is well-approximated by a member of the model space M(M, r). The model space consists of a class of covariance kernels C, which have rank r, and whose eigenfunctions are represented in a known orthonormal basis {ψ k } M k=1 [28]. In current implementations, the values of (M, r) are chosen by leave one curve out cross-validation [30]. Particularly, the likelihood-based cross-validation score CV(M, r) to be maximized is ),σ 2−(g,i) are the estimates of D, λ, σ 2 based on the data excluding the Y gi , i.e., the i-th curve in group g.
The next step is to select the dimension of truncation p. Intuitively, we want to identify the appropriate value such that the difference in the group mean functions is well approximated (i.e., p is sufficiently large) and non-essential difference that should be considered as random errors are excluded (i.e., p is not too large). In the literature, the dimension of truncation p for mean is often selected based on the covariance function. For instance, the percentage of variance explained method is used in [16,15,17,21,32]. Besides, the same number of projections for the mean and covariance are used in the mixed-effect model in [18]. In this paper, since we used restricted MLE to estimate the eigen components, the default method in the our package is to choose p = r, i.e., p is selected by the leave-one-curve cross-validation method in Eq. (2.18). Now, by plugging in all these estimates, we achieve our test statistic T p,N as in Eq. (2.13).

An extension to common principal component cases
Our proposed test statistic T p,N is created based on the homogeneous covariance function assumption. In this section, we describe how to extend the test procedure to a certain type of heterogeneous covariance structure called the common principal component (CPC) scenarios [11,2,3,5]. Under CPC, the eigenvalues are different across different groups, while the eigenfunctions are the same.
with φ k (t) being the common eigenfunctions and λ gk be the group specific eigenvalues.
The specific modifications for CPC scenarios are summarized as follows. In the test statistic T p,N ,λ should be replaced withλ g . The estimation procedure also needs to be modified. In particular,φ k (t) is achieved by applying the restricted MLE method to the pooled data.Ĝ g (s, t) is calculated by using data from group g alone. Then the group specific eigenvaluesλ gk is obtained by the following formula, Further extensions to cases with the general heterogeneous covariance functions (i.e., eigenvalues and eigenfunctions are both different) are not easy. This is a future topic of interest.

Asymptotic results
In this section, we develop the asymptotic theory of T p,N under both H 0 and H a in test (2.2). The shrinkage score vectorr c gi is not observable, needing to be estimated from data and this estimation error needs to be considered when deriving the asymptotic results. The projection-based method for dense functional data also needs to control errors in the estimated scores. What makes our arguments more complicated than dense data cases is that we need to consider each estimation error conditional on the individual design points and ensure that the overall error is uniformly controlled over the random designs.
To achieve the asymptotic results, we need the following assumptions.
1. The number of measurements N gi is bounded. The observing time T gij for g = 1, 2, i = 1, . . . , n g , j = 1, ..., N gi are random design points according to a density function f , with One can obtain consistency results, 4. The leave-one-curve-out estimates of the quantities in Eq. (3.1) are close to the whole sample estimates in the sense that the difference is Assumption 1 is a common assumption for sparsely observed functional data. Assumption 2 requires that the variance of the shrinkage scores are bounded. Assumption 3 requires consistent estimates of model components. We estimatê μ pool (t) using local linear smoothing techniques and the consistency result holds under some regular and mild conditions, which are discussed in [40,23,44]. The other quantities are estimated using the restricted MLE method [30]. The conditions needed for the restricted MLE to provide consistent estimates are given in Appendix A for completeness; details can be found in their original papers [30,28]. We note that they proved the consistency of the estimated eigen components under the Gaussian assumption and discussed that their asymptotic results still hold under some relaxed conditions. Our simulation studies in Section 4.1 also show that restricted MLE has reasonable performances under non-Gaussian cases. One can use other methods to estimate the covariance, such as the local linear smoothing in [40,23,44], which do not require Gaussian assumptions. Therefore, Gaussian distribution is not an essential requirement for the asymptotic results developed for the proposed method. Assumption 4 is satisfied by the restricted MLE method and the local linear smoothing method, as the leave-one-curve-out estimates will be close to the whole sample estimates with difference O p (1/n). This assumption is needed to control the correlation between the estimation error of model components and observations from an individual curve.
is basically some type of projection of μ g (t)−μ pool (t). Note that the only randomness here is the random design points T gi . When μ g (t) − μ pool (t) = 0 and p is sufficiently large, it is unlikely that this projection scores equal to 0 on all the first p directions. Theorem 3.2 ensures that, under any alternative H a when the difference between μ g (t) and μ pool (t) is captured by some of the first p directions of such projection, the power of our test procedure goes to 1. Proof of Theorem 3.2 is in Appendix B.
Theorem 3.1 and Theorem 3.2 ensure that for a given p < ∞, the proposed test procedure exhibits the desired asymptotic behaviors. In particular, it controls the type I error within the pre-specified significance level and the power of the test converges in probability to 1. Due to the smoothness of functional data, the majority of information within data is captured by a finite number of projection scores. In this paper, we focus on scenarios p < ∞, following the convention in many functional data analysis literature.

Simulation studies
In this subsection, through three simulations we evaluate the performances of the proposed shrinkage-score-based Hotelling's T 2 type test, which has an asymptotic χ 2 distribution. Simulation I-(1): In all the simulation studies, the data are always generated from model (2.1) Y gij = X gi (T gij )+ gij , with measurement error gij ∼ N (0, 1), and X gi (T gij ) = μ g (T gij ) + In this simulation study, we compare performances of the proposed asymptotic χ 2 test T p,N ('Shrink Hotelling'), with the distribution test in [32] ('Dist'), and the pseudo likelihood ratio test in the [37] ('pLRT-l' for linear and 'pLRT-c' for cubic). The distribution test in [32] aims to test whether X 1 (t) and X 2 (t) have the same distribution. It projects each individual curve to the eigenspace spanned by the eigenfunctions of the mixture process of X 1 (t) and X 2 (t), where a similar shrinkage is employed if the functional data is sparse. The equality of distributions of the first p projection scores is tested through the Anderson-Darling test with Bonferroni corrections. It is not a two-sample location test. Under Gaussian assumption with equal covariance settings, this test can be used to infer whether X 1 (t) and X 2 (t) have the same mean function. However, due to the non-parametric of the nature, the test is expected to have low powers. Besides, in [32], there are no theoretical results in terms of the size and the power of the proposed test. For the pLRT test, the original paper focuses on the one-sample mean function inference problem, and the generalization to two-sample mean function testing problems is described in their paper. We were able to modify Q. Wang their package to perform the two-sample testing problem. We also compare the performance with several dense data methods, where the individual curves are first recovered at a dense regular grid using methods developed in [40], and then dense data tests are applied on the recovered curves. Three different dense data tests, including the projection-based normalized l 2 test [16] ('normalized l 2 '), the Globalized-F test [42] ('GF'), and the L 2 -norm-based test [41] ('l 2 '). We also consider the permutation version of the dense-recovered methods. The p-values of the permutation based dense-recovered tests ('normalized l 2 -pm', 'GF-pm', 'l 2 -pm') are calculated from 5000 Monte Claro permutations. The Hotelling's T 2 permutation test proposed in [31] calculates a test statistic the same as the projection based normalized l 2 test and computes the p-value from the permutated samples of raw functional samples. This is the same as the permutation version of the projection-based normalized l 2 test, i.e., 'normalized l 2 -pm'.
We report the simulation results in Table 1 to 3, for different sample sizes n 1 = n 2 = 50, 100, 200, and different number of observations on each curve N = 4, 8. The significance level considered is α = 0.05. The cross validation method mostly choose p to be 4 in the proposed method. Therefore, the true number of components p = 4 is used in the proposed test, the distribution test, and the dense-recovered projection method. We can see that our asymptotic χ 2 test, the distribution test and the permutation based dense-recovered tests effectively control the type I error at the pre-determinted significance level. Among these tests, our proposed χ 2 test has the largest power than the other tests under all circumstances. The pLRT tests sometimes yield inflated type I errors, especially when the sample sizes and the number of observations on each curve are small. The original pLRT paper include some results for sparse data but with a relatively large number of observations on each curve, i.e., with N = 10. As expected, the dense-recovered methods based on asymptotic null distributions do not work, since the recovered curves from sparse functional data converge to E[X gi (t)|Y gi ] instead of X gi (t) [40]. The permutation based versions of the dense data tests are capable of controlling the type I error, however, they suffer from lower power issues.

Simulation I-(2):
In this simulation study, we assume that the mean functions are specified by a different set of basis functions other than the eigenfunctions {φ k (t) = √ 2 sin(kπt)} ∞ k=1 . In particular, the basis functions for the mean are {φ k (t) = √ 2 cos(kπt)} ∞ k=1 . Similar to Simulation I, we also truncate at the first 4 dimensions. The mean functions are accordingly μ 1 (t) = 4 k=1 a 1k φ k (t) and μ 2 (t) = 4 k=1 a 2k φ k (t). The definition for a 1k and a 2k are the same as Simulation I.
The testing results for the proposed test are summarized Table 4. When implementing the proposed test procedure, we use the cross-validation approach to select the number of truncation, with p being mostly 4. As shown by Table 4, the proposed test is working reasonably in terms of size and power. The performance under the null is supported by Theorem 3.1. The test has reasonable powers is due to the fact the difference in the means on the first dimension (i.e., φ 1 (t)) is captured by the second and fourth projections onto the eigenfunctions Table 1 Simulation I-(1) (n 1  (i.e., φ 2 (t) and φ 4 (t)), although the projection onto the first eigenfunction (i.e., φ 1 (t)) cannot capture it. Simulation II: The second simulation focuses on exploring the performance of the proposed test under non-Gaussian circumstances, and we also include the results from the distribution test as a comparison. Simulation settings are the same as Simulation I, except that ξ gik , g = 1, 2; i = 1, .., n g ; k = 1, ..., p are now generated from a mixture of two normal distributions, i.e., they are distributed as N ( λ k /2, λ k /2) with probability 1/2 and N (− λ k /2, λ k /2) with probability 1/2. In this way, we get samples from mixture Gaussian processes instead of Gaussian processes. According to the results calculated from 1000 repetitions in  To evaluate the actual performance of the proposed method, we exploit the two-sample restricted MLE estimate and the leave-one-curve-out crossvalidation based selection of p in Section 2.3 to construct the proposed asymptotic χ 2 test statistic. The size and power over 1000 Monte Carlo repetitions are summarized in Table 6. We can see our proposed procedure works well. It can 1408 Q. Wang Table 2 Simulation I-(1) (n 1 Hotelling 0.045 0.257 0.515 0.821 0.050 0.352 0.719 0. control the type I error at the pre-determined significance level. The powers are increasing with larger sample sizes and the magnitude of difference between the two mean curves.
To study the sensitivity of the proposed test to the selection of p, we fix p at three values p = 3, 4, 5, which respectively captures 99.1%, 99.5%, and 99.7% of the difference among the mean functions. The results are summarized in Table 7. It can be seen that the size is well controlled at the pre-specified significance level α = 0.05 for p = 3, 4, 5. This is expected as the asymptotic null distribution is valid for any p, as described in Theorem 3.1. Besides, for each p, the power of the proposed test increases with the increase of the sample size and the magnitude of the difference between the two mean curves. This observation adheres to the arguments in Theorem 3.2.
In Table 7, the power under different values of p is overall comparable, with some moderate variation across different p. These results numerically justify that the proposed test procedure is insensitive to the dimension of truncation when the dimension p is sufficiently large. In general, the power for p = 3 is slightly larger than the other two choices of p. Intuitively, unnecessarily large values of p tend to introduce the non-essential signal that should be considered as random noises into the test. The results in Table 6 is close to those with respect to p = 3 in Table 7. This is because the selected p from the cross-validation approach is mostly 3 in this simulation study.

Application to CD4 count data
Now we apply the proposed asymptotic χ 2 test to a CD4 count data set from the AIDS Clinical Trials Group (ACTG) 193 A study, which aims at comparing Table 3 Simulation I-(1) (n 1 Hoteling 0.045 0.513 0.877 0.992 0.051 0.664 0.940 0.   the effectiveness of two different therapies, 600mg of zidovudine alternating monthly with 400mg didanosine (group A) and 600mg of zidovudine plus 400mg didanosine (group B), for advanced AIDS patients with CD4 counts less than or equal to 50 cells per cubic millimeter. Totally we have 655 advanced AIDS patients, with 325 subjects in group A and 330 in group B. These patients were followed for 40 weeks after they started to receive either of the two treatments mentioned above. The number of observations available for each subject ranges from 1 to 9, and the observing times are randomly distributed within the 40 weeks.
We model the data from the sparse functional data perspective. The 325 CD4 count trajectories in group A are assumed to be i.i.d random samples of an unknown stochastic process. The 330 CD4 count trajectories in group B are i.i.d samples of another unknown random process. Through inferring whether the two population mean functions are significantly different within the 40-weeks period, we can understand whether the effectiveness of these two treatments is different over time. As suggested by previous literature [10,39], we transform the original CD4 count using function log(x + 1). The Spaghetti plots of the 1410 Q. Wang Table 5 Simulation II. The number of points per curve is N = 4, 8. Sample sizes are n 1 = n 2 = 50, 100, 200. The significance level is α = 0.05. Consider two tests: our asymptotic χ 2 test ('Shrink Hotelling') and the distribution test ('Dist'). Type I error (ς = 0) and powers (ς = 0.2, 0.3, 0.4) are from 1000 repetitions. Shrink Hotelling 0.054 0.144 0.256 0.483 0.053 0.220 0.401 0. We can see that under all different values of p, the proposed test is able to detect the discrepancy between the effectiveness of group A and group B. Note that the selected p from cross-validation is 3, i.e., the result of the proposed test procedure based on the cross-validation method of choosing the truncation dimension is pvalue = 0.0779 when assuming homogeneous covariance functions, and 7.73e−61 under the CPC assumption. Given the estimated mean functions provided on the left panel of Figure 2, it can be seen that the therapy used by group B tends to be more effective in helping advanced AIDS patients to recover. We also implemented the permutation-based dense-recovered tests. These tests do not detect a significant difference between the two groups. We have seen that these tests have low power in simulations. We applied the proposed method on different sample sizes, i.e., 25%, 50%, 75% and 100% of subjects in each group. The proposed method can always detect the signal for 75% and 100%, and the power is lower for 25%, and 50%.

Application to eBay online auction data
EBay.com is one of the largest online auction markets. The most common auctions on eBay are single-item auctions, even though multiple-item auctions are also feasible. The problem that we consider belongs to the single-item auction 1412 Q. Wang category. For this kind of auctions, eBay adopts a second-price rule to decide the winner. To be more specific, within the pre-selected bidding period (3 days, 5 days, or 7 days), bidders can submit the maximum amounts that they are willing to pay (WTP). The first bidder to provide the second largest WTP within the bidding period is the winner. The WTP's are hidden from the public, instead, eBay's proxy bidding system automatically increases each bidder's bid by a minimum increment determined by the current bidding price and certain rules set by eBay. And it displays these prices lively on the item page. These real-time price trajectories come out of this system are often referred to as live bid. The auction data in our problem are live bid data. More details about the mechanism of eBay online auctions are provided by their official website http://www.ebay.com/. eBay auction data for all items are appropriately stored by eBay and are completely accessible to all registered users for up to 90 days.
As mentioned in the previous paragraph, the auctions can last for 3 days, 5 days, or 7 days. We consider the problem that whether the price trends for a certain type of item are different for different bidding length choices. We use the live bid for Palm M515 Personal Digital Assistant as an example. The data set can be found at [19]. The two bidding lengths that we compare are 3 days and 7 days. In the 3 days group, there are 90 auctions of Palm M515 that happened between March and May 2003. In the 7 days group, there are 158 auctions of Palm M515 happened within the same period of time. The number of bids for each item is given in Figure 3, which are not small. However, as shown in Figure 4, where 9 randomly selected auctions in the 7 days group are visualized, the bids are extremely irregular. There are big gaps among individual live bid trajectories.
We use sparse functional data approaches to model this ebay online live bid data. We assume that the 90 auctions with duration 3 days are i.i.d random samples from an unknown population distribution and the 158 auctions with duration 7 days are i.i.d random sample from another unknown stochastic process. [29,24,38,20] also analyze ebay online auction data from functional   perspectives. Following previous papers [29,20], we transform live bid data into a log-scale. We also scale the bidding time variables of both the 3 days group and the 7 days group to [0,1]. Now the problem of detecting differences in average price trends becomes testing whether the population mean curves are significantly different within some parts of [0, 1].
As indicated by the right panel of Figure 5, the estimated first three eigenfunctions of the 7 days group (black) and the 3 days group (blue) are very close. The corresponding eigenvalues are (0.344, 0.101, 0.020) T for the 7 days group and (0.416, 0.162, 0.011) T for the 3 days group. It is reasonable to assume that they share the same covariance function. The proposed χ 2 test is implemented to this mean function testing problem. We use the cross-validation method to choose p = 5 and the p-value of our proposed test is less than 0.000001. This means that we have enough confidence to conclude that the mean price evolution curves are different when different bidding time periods are selected. The estimated mean functions by local linear smoothing are included on the left panel of Figure 5. From the graph, we can see that the price of the 7 days group (black) is larger than that of the 3 days group (blue) over the entire time domain. The permutation based dense-recovered tests are also implemented but do not detect a significant difference between the two groups.

Conclusions and Discussions
In this paper, we developed an asymptotic χ 2 test for detecting differences among the mean functions of two independent stochastic processes with homogeneous covariance functions, when only a few irregularly spaced measurements are given for each subject. The proposed test statistic is constructed based on the shrinkage score vector, a counterpart of the one-sample PACE designed especially for multiple-sample mean testing problems. Note that as shown by the theoretical derivations, our proposed test does not require Gaussian random processes, although the test is motivated by Gaussian.
We described in detail the numerical implementation of the proposed test procedure. To benchmark the proposed test, we considered four baselines, including two existing tests that apply to sparse data settings (i.e., the non-parametric distribution test and the pseudo-likelihood ratio test), and two promising combinational approaches (i.e., dense tests equipped with PACE as a pre-processing step, and the permutation dense test with PACE). The numerical experiments demonstrated the superior performance of our proposed test, in comparison with these methods. In particular, the pLRT test yielded inflated type I errors when the sample size and the number of observations per curve are small. The dense data methods with the help of PACE to recover the individual curves did not work, since the recovered curves from sparse functional data do not converge to the actual random curves. The non-parametric distribution test performed reasonably, however, it had low powers due to the non-parametric nature.
We proposed two useful extensions of the proposed test. One extension aims to generalize the test to a certain type of heterogeneous covariance scenario called the common principal component structure, where the eigenvalues can be different across groups. The other extension that handles scenarios where more than two groups are considered in the hypothesis testing problem (see Appendix C).
The test procedure developed in the paper cannot handle cases with general heterogeneous covariance functions, i.e. when the eigenvalues and eigenfunctions are both different across groups. This is an important piece of future work to study. Condition 1 is a working assumption. Note that the derivation of the asymptotic distribution of the proposed test statistic T p,N does not need Gaussian distribution as long as the estimates listed in Assumption 3 are consistent. Conditions 2-5 are common assumptions for sparse functional data. We focus on the sparse scenario that the number of observations on each curve is bounded; while the proposed method also works whenN grows with the sample size n. If one can achieve rate α n consistency for all the estimated quantities in Assumption 3 and haveN 2 α 2 n = o(1) satisfied, then the current proofs for Theorem 3.1 and Theorem 3.2 can go through. Given that we use restricted MLE to estimate G, λ, φ and σ and use local linear smoothing for μ pool , we can at least allow N = O(n 1/5 ) with M (nN 2 / log n) 1/9 and an appropriately chosen bandwidth in local linear smoothing. Conditions 6-7 put conditions on (M, r) such that the true covariance can be approximated well by a member in the model space M (M, r). The restricted MLE method developed in [30] provides an estimate of σ 2 , which we callσ 2 . They do not directly provide a theorem for the consistency ofσ 2 , but rather they note that all the consistency results for G, φ and λ hold whenσ 2 is used.