Electronic Journal of Statistics

Principal components analysis for sparsely observed correlated functional data using a kernel smoothing approach

Debashis Paul and Jie Peng


Abstract

We consider the problem of functional principal component analysis for correlated functional data. In particular, we focus on a separable covariance structure and consider irregularly and possibly sparsely observed sample trajectories. Observing that, in the sparse measurement setting, the empirical covariance of pre-smoothed sample trajectories is a highly biased estimator along the diagonal, we propose to modify the empirical covariance by estimating the diagonal and off-diagonal parts of the covariance kernel separately. We prove that, under a separable covariance structure, this method consistently estimates the eigenfunctions of the covariance kernel. We also quantify the role of the correlation in the $L^2$ risk of the estimator and show that, under a weak correlation regime, the risk achieves the optimal nonparametric rate when the number of measurements per curve is bounded.
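The diagonal bias mentioned in the abstract can be illustrated with a small simulation. The sketch below is not the authors' estimator. It swaps in a generic Nadaraya-Watson smoother of raw products Y_ij * Y_ik, and it assumes independent, mean-zero curves with a single component, so the correlated setting of the paper is not reproduced. The function names, kernel, bandwidth `h`, noise level `sigma`, and sample sizes are all hypothetical choices made for illustration. It compares a smoother that uses only off-diagonal pairs (j != k) with one that also keeps the squared terms (j == k), which carry the measurement noise variance.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel supported on [-1, 1]."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)

def smooth_covariance(times, values, grid, h, include_diagonal=False):
    """Kernel-weighted average of raw products Y_ij * Y_ik on grid x grid.

    times, values: lists of 1-D arrays, one pair per curve (mean assumed zero).
    include_diagonal: if True, the j == k products are kept; these contain
    the noise variance and inflate the estimate near the diagonal.
    """
    num = np.zeros((grid.size, grid.size))
    den = np.zeros((grid.size, grid.size))
    for t, y in zip(times, values):
        w = epanechnikov((grid[:, None] - t[None, :]) / h)  # (grid, m_i) weights
        prod = np.outer(y, y)                                # raw products
        mask = np.ones_like(prod)
        if not include_diagonal:
            np.fill_diagonal(mask, 0.0)                      # drop j == k terms
        num += w @ (prod * mask) @ w.T
        den += w @ mask @ w.T
    return num / np.maximum(den, 1e-12)

def phi(t):
    """Hypothetical true eigenfunction, with unit L2 norm on [0, 1]."""
    return np.sqrt(2.0) * np.sin(np.pi * t)

# Simulate sparse, noisy curves X_i(t) = score_i * phi(t) (illustrative values).
rng = np.random.default_rng(0)
n, sigma = 1000, 1.0
times, values = [], []
for _ in range(n):
    m = rng.integers(3, 7)                  # 3 to 6 measurements per curve
    t = rng.uniform(0.0, 1.0, m)
    score = rng.normal()
    times.append(t)
    values.append(score * phi(t) + sigma * rng.normal(size=m))

grid = np.linspace(0.05, 0.95, 19)
h = 0.15
C_off = smooth_covariance(times, values, grid, h)
C_all = smooth_covariance(times, values, grid, h, include_diagonal=True)

# Leading eigenvector of the off-diagonal estimate, compared with phi on the grid.
C_sym = 0.5 * (C_off + C_off.T)
eigvals, eigvecs = np.linalg.eigh(C_sym)
phi_hat = eigvecs[:, -1]
phi_grid = phi(grid) / np.linalg.norm(phi(grid))
alignment = abs(phi_hat @ phi_grid)

print("trace with diagonal terms:   ", np.trace(C_all))
print("trace without diagonal terms:", np.trace(C_off))
print("alignment of leading eigenvector with phi:", alignment)
```

In this toy setting the estimate that keeps the j == k terms should show a larger trace, because each squared observation adds roughly `sigma**2` on top of the signal covariance. Recovering the diagonal itself requires a separate step, which this sketch does not attempt.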

Article information

Source
Electron. J. Statist., Volume 5 (2011), 1960-2003.

Dates
First available in Project Euclid: 30 December 2011

Permanent link to this document
https://projecteuclid.org/euclid.ejs/1325264854

Digital Object Identifier
doi:10.1214/11-EJS662

Mathematical Reviews number (MathSciNet)
MR2870154

Zentralblatt MATH identifier
1274.62412

Subjects
Primary: 62G20: Asymptotic properties
Secondary: 62H25: Factor analysis and principal components; correspondence analysis

Keywords
Functional data analysis; principal component analysis; kernel smoothing; consistency

Citation

Paul, Debashis; Peng, Jie. Principal components analysis for sparsely observed correlated functional data using a kernel smoothing approach. Electron. J. Statist. 5 (2011), 1960-2003. doi:10.1214/11-EJS662. https://projecteuclid.org/euclid.ejs/1325264854


