The Annals of Applied Statistics

Multilevel functional principal component analysis

Chong-Zhi Di, Ciprian M. Crainiceanu, Brian S. Caffo, and Naresh M. Punjabi

Full-text: Open access


The Sleep Heart Health Study (SHHS) is a comprehensive landmark study of sleep and its impacts on health outcomes. A primary metric of the SHHS is the in-home polysomnogram, which includes two electroencephalographic (EEG) channels for each subject, at two visits. The volume and importance of these data present enormous challenges for analysis. To address these challenges, we introduce multilevel functional principal component analysis (MFPCA), a novel statistical methodology designed to extract core intra- and inter-subject geometric components of multilevel functional data. Though motivated by the SHHS, the proposed methodology is generally applicable, with potential relevance to many modern scientific studies of hierarchical or longitudinal functional outcomes. Notably, using MFPCA, we identify and quantify associations between EEG activity during sleep and adverse cardiovascular outcomes.

Article information

Ann. Appl. Stat. Volume 3, Number 1 (2009), 458-488.

First available in Project Euclid: 16 April 2009


Keywords: Functional principal component analysis (FPCA), multilevel models


Di, Chong-Zhi; Crainiceanu, Ciprian M.; Caffo, Brian S.; Punjabi, Naresh M. Multilevel functional principal component analysis. Ann. Appl. Stat. 3 (2009), no. 1, 458–488. doi:10.1214/08-AOAS206.



