The Annals of Statistics

Semiparametric GEE analysis in partially linear single-index models for longitudinal data

Jia Chen, Degui Li, Hua Liang, and Suojin Wang

Abstract

In this article, we study a partially linear single-index model for longitudinal data under a general framework that includes both the sparse and dense longitudinal data cases. A semiparametric estimation method that combines local linear smoothing with generalized estimating equations (GEE) is introduced to estimate the two parameter vectors as well as the unknown link function. Under some mild conditions, we derive the asymptotic properties of the proposed parametric and nonparametric estimators in different scenarios, from which we find that the convergence rates and asymptotic variances of the proposed estimators for sparse longitudinal data differ substantially from those for dense longitudinal data. We also discuss the estimation of the covariance (or weight) matrices involved in the semiparametric GEE method. Furthermore, we provide numerical studies, including a Monte Carlo simulation and an empirical application, to illustrate our methodology and theory.
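
For orientation, a partially linear single-index model for longitudinal data is commonly written in a form like the one below. This is only a sketch in generic notation: the symbols $Y_{ij}$, $Z_{ij}$, $X_{ij}$, $e_{ij}$, $n$ and $m_i$ are assumptions introduced for illustration and do not appear on this page.

```latex
% Generic partially linear single-index model (illustrative notation, assumed):
%   i indexes subjects, j indexes the repeated measurements on subject i.
Y_{ij} = Z_{ij}^{\top}\beta + \eta\!\left(X_{ij}^{\top}\theta\right) + e_{ij},
\qquad i = 1,\dots,n, \quad j = 1,\dots,m_i .
```

In this notation, $\beta$ and $\theta$ would play the role of the two parameter vectors and $\eta(\cdot)$ that of the unknown link function mentioned in the abstract. An identifiability constraint on $\theta$, such as $\|\theta\| = 1$ with a positive first component, is typically imposed in models of this kind. Loosely speaking, the sparse case corresponds to a small number of observations $m_i$ per subject and the dense case to $m_i$ growing with the number of subjects $n$.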

Article information

Source
Ann. Statist., Volume 43, Number 4 (2015), 1682–1715.

Dates
Received: May 2014
Revised: February 2015
First available in Project Euclid: 17 June 2015

Permanent link to this document
https://projecteuclid.org/euclid.aos/1434546219

Digital Object Identifier
doi:10.1214/15-AOS1320

Mathematical Reviews number (MathSciNet)
MR3357875

Zentralblatt MATH identifier
1317.62036

Subjects
Primary: 62G09: Resampling methods; 62H99: None of the above, but in this section; 62G99: None of the above, but in this section

Keywords
Efficiency; GEE; local linear smoothing; longitudinal data; semiparametric estimation; single-index models

Citation

Chen, Jia; Li, Degui; Liang, Hua; Wang, Suojin. Semiparametric GEE analysis in partially linear single-index models for longitudinal data. Ann. Statist. 43 (2015), no. 4, 1682–1715. doi:10.1214/15-AOS1320. https://projecteuclid.org/euclid.aos/1434546219


Supplemental materials

  • Supplement to “Semiparametric GEE analysis in partially linear single-index models for longitudinal data”. The supplement gives the proof of Theorem 3 and some technical lemmas that were used to prove the main results in Appendix B. It also includes some additional results of our simulation studies described in Section 5.