The Annals of Statistics

A semiparametric model for cluster data

Wenyang Zhang, Jianqing Fan, and Yan Sun
Source: Ann. Statist. Volume 37, Number 5A (2009), 2377-2408.

Abstract

In the analysis of cluster data, the regression coefficients are frequently assumed to be the same across all clusters. This hampers the ability to study the varying impacts of factors on each cluster. In this paper, a semiparametric model is introduced to account for varying impacts of factors over clusters by using cluster-level covariates. It achieves parsimony of parametrization and allows the exploration of nonlinear interactions. The random effect in the semiparametric model also accounts for within-cluster correlation. A local linear-based estimation procedure is proposed for estimating the functional coefficients, the residual variance and the within-cluster correlation matrix. The asymptotic properties of the proposed estimators are established, and a method for constructing simultaneous confidence bands is proposed and studied. In addition, relevant hypothesis testing problems are addressed. Simulation studies are carried out to demonstrate the power of the proposed methods in finite samples. The proposed model and methods are used to analyse the second birth interval in Bangladesh, leading to some interesting findings.

Primary Subjects: 62G08
Secondary Subjects: 62G10, 62G15
Full-text: Open access
Links and Identifiers

Permanent link to this document: http://projecteuclid.org/euclid.aos/1247663759
Digital Object Identifier: doi:10.1214/08-AOS662
Zentralblatt MATH identifier: 05596905
Mathematical Reviews number (MathSciNet): MR2543696

2013 © Institute of Mathematical Statistics
