The Annals of Statistics

Generalized functional linear models

Hans-Georg Müller and Ulrich Stadtmüller

Abstract

We propose a generalized functional linear regression model for a regression situation where the response variable is a scalar and the predictor is a random function. A linear predictor is obtained by forming the scalar product of the predictor function with a smooth parameter function, and the expected value of the response is related to this linear predictor via a link function. If, in addition, a variance function is specified, this leads to a functional estimating equation which corresponds to maximizing a functional quasi-likelihood. This general approach includes the special cases of the functional linear model, as well as functional Poisson regression and functional binomial regression. The latter leads to procedures for classification and discrimination of stochastic processes and functional data. We also consider the situation where the link and variance functions are unknown and are estimated nonparametrically from the data, using a semiparametric quasi-likelihood procedure.
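
As a rough numerical illustration of the model structure described above, the following sketch forms the linear predictor as an integral of the predictor function against a parameter function and maps it to a mean response. The names `linear_predictor` and `logistic_link`, the intercept `alpha`, the trapezoidal quadrature and the choice of a logistic link (the functional binomial case) are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def linear_predictor(X, beta, t, alpha=0.0):
    """Approximate eta_i = alpha + integral of X_i(t) * beta(t) dt (trapezoidal rule).

    X    : (n, m) array, each row a predictor curve observed on the grid t
    beta : (m,) array, parameter function evaluated on the same grid
    t    : (m,) array, increasing grid of time points
    """
    integrand = X * beta                      # pointwise product, shape (n, m)
    dt = np.diff(t)                           # grid spacings, shape (m - 1,)
    return alpha + np.sum(0.5 * (integrand[:, 1:] + integrand[:, :-1]) * dt, axis=1)

def logistic_link(eta):
    """Map the linear predictor to a mean in (0, 1); one possible link choice."""
    return 1.0 / (1.0 + np.exp(-eta))

# Toy curves (hypothetical data): one aligned with beta, one flat, one opposed.
t = np.linspace(0.0, 1.0, 101)
beta = np.sin(2 * np.pi * t)
X = np.vstack([np.sin(2 * np.pi * t), np.zeros_like(t), -np.sin(2 * np.pi * t)])

eta = linear_predictor(X, beta, t)
mu = logistic_link(eta)
print(eta, mu)
```

A curve that is aligned with the parameter function yields a positive linear predictor and hence a mean above 1/2, while the flat curve yields a mean of exactly 1/2 under this link.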

An essential step in our proposal is dimension reduction by approximating the predictor processes with a truncated Karhunen–Loève expansion. We develop asymptotic inference for the proposed class of generalized regression models. In the proposed asymptotic approach, the truncation parameter increases with sample size, and a martingale central limit theorem is applied to establish the resulting increasing dimension asymptotics. We establish asymptotic normality for a properly scaled distance between estimated and true functions that corresponds to a suitable L² metric and is defined through a generalized covariance operator. As a consequence, we obtain asymptotic tests and simultaneous confidence bands for the parameter function that determines the model.
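
The dimension reduction step can be sketched numerically as follows, assuming curves observed without noise on a common, equally spaced grid. The function name `truncated_kl`, the Riemann-sum quadrature and the toy data are assumptions made for this illustration; the paper's own estimation procedure may differ.

```python
import numpy as np

def truncated_kl(X, t, p):
    """Empirical Karhunen-Loeve expansion truncated at p components.

    X : (n, m) array of curves on an equally spaced grid t (an assumption here).
    Returns eigenvalues, eigenfunctions (m, p), scores (n, p) and the
    p-term reconstruction of the curves.
    """
    dt = t[1] - t[0]
    mean = X.mean(axis=0)
    Xc = X - mean                               # centered curves
    cov = Xc.T @ Xc / X.shape[0]                # sample covariance surface on the grid
    evals, evecs = np.linalg.eigh(cov * dt)     # discretized covariance operator
    order = np.argsort(evals)[::-1][:p]         # indices of the p largest eigenvalues
    lam = evals[order]
    phi = evecs[:, order] / np.sqrt(dt)         # rescale so that sum(phi**2) * dt = 1
    scores = Xc @ phi * dt                      # approximates integral of Xc * phi
    X_hat = mean + scores @ phi.T               # truncated expansion
    return lam, phi, scores, X_hat

# Hypothetical curves built from two orthonormal basis functions.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)
basis = np.sqrt(2) * np.vstack([np.sin(2 * np.pi * t), np.cos(2 * np.pi * t)])
coefs = rng.normal(size=(200, 2)) * np.array([2.0, 1.0])
X = coefs @ basis

lam, phi, scores, X_hat = truncated_kl(X, t, p=2)
print(lam)
```

Because the toy curves span only two directions, two components reconstruct them up to rounding error; with real data the truncation point trades approximation error against the variability of the estimated scores.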

The proposed estimation, inference and classification procedures and variants with unknown link and variance functions are investigated in a simulation study. We find that the practical selection of the number of components works well with the AIC criterion, and this finding is supported by theoretical considerations. We include an application to the classification of medflies regarding their remaining longevity status, based on the observed initial egg-laying curve for each of 534 female medflies.
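
Selecting the number of components by AIC might look like the sketch below, which fits a logistic regression on the first p scores for each candidate p and keeps the p with the smallest criterion value. The Newton-type fitting routine `fit_logistic`, the form AIC = -2 log-likelihood + 2(p + 1) and the simulated scores are assumptions made for this example, not a reproduction of the paper's simulation study.

```python
import numpy as np

def fit_logistic(Z, y, n_iter=50, tol=1e-10):
    """Logistic regression with intercept, fitted by Newton-Raphson iterations."""
    D = np.column_stack([np.ones(len(y)), Z])   # design matrix with intercept
    coef = np.zeros(D.shape[1])
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-(D @ coef)))
        w = mu * (1.0 - mu)                     # binomial variance function
        grad = D.T @ (y - mu)                   # score vector
        hess = D.T @ (D * w[:, None])           # Fisher information
        step = np.linalg.solve(hess, grad)
        coef += step
        if np.max(np.abs(step)) < tol:
            break
    eta = D @ coef
    loglik = np.sum(y * eta - np.logaddexp(0.0, eta))  # Bernoulli log-likelihood
    return coef, loglik

# Hypothetical scores: six components with decaying variance, two of them active.
rng = np.random.default_rng(1)
n, p_max = 400, 6
Z = rng.normal(size=(n, p_max)) * np.sqrt([4.0, 2.0, 1.0, 0.5, 0.25, 0.125])
eta_true = 0.8 * Z[:, 0] - 1.0 * Z[:, 1]
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta_true))).astype(float)

logliks = [fit_logistic(Z[:, :p], y)[1] for p in range(1, p_max + 1)]
aic = [-2.0 * ll + 2.0 * (p + 1) for p, ll in enumerate(logliks, start=1)]
best_p = int(np.argmin(aic)) + 1
print(best_p, aic)
```

The log-likelihood cannot decrease as components are added, so the penalty term is what stops the selected p from always being the largest candidate.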

Article information

Source
Ann. Statist. Volume 33, Number 2 (2005), 774-805.

Dates
First available in Project Euclid: 26 May 2005

Permanent link to this document
https://projecteuclid.org/euclid.aos/1117114336

Digital Object Identifier
doi:10.1214/009053604000001156

Mathematical Reviews number (MathSciNet)
MR2163159

Zentralblatt MATH identifier
1068.62048

Subjects
Primary: 62G05: Estimation; 62G20: Asymptotic properties
Secondary: 62M09: Non-Markovian processes: estimation; 62H30: Classification and discrimination; cluster analysis [See also 68T10, 91C20]

Keywords
Classification of stochastic processes; covariance operator; eigenfunctions; functional regression; generalized linear model; increasing dimension asymptotics; Karhunen–Loève expansion; martingale central limit theorem; order selection; parameter function; quasi-likelihood; simultaneous confidence bands

Citation

Müller, Hans-Georg; Stadtmüller, Ulrich. Generalized functional linear models. Ann. Statist. 33 (2005), no. 2, 774--805. doi:10.1214/009053604000001156. https://projecteuclid.org/euclid.aos/1117114336.
