The Annals of Statistics

Estimation and model selection in generalized additive partial linear models for correlated data with diverging number of covariates

Li Wang, Lan Xue, Annie Qu, and Hua Liang

Abstract

We propose generalized additive partial linear models for complex data that capture nonlinear patterns in some covariates in the presence of linear components. The proposed method improves estimation efficiency and increases statistical power for correlated data by incorporating the correlation information. A unique feature of the proposed method is its capability of handling model selection in cases where it is difficult to specify the likelihood function. We derive the quadratic inference function-based estimators for the linear coefficients and the nonparametric functions when the dimension of covariates diverges, and establish asymptotic normality for the linear coefficient estimators and the rates of convergence for the nonparametric function estimators for both finite and high-dimensional cases. The method and its theoretical development are challenging because the numbers of linear covariates and nonlinear components both increase with the sample size. We also propose a doubly penalized procedure for variable selection which can simultaneously identify nonzero linear and nonparametric components, and which has an asymptotic oracle property. Extensive Monte Carlo studies show that the proposed procedure works effectively even with moderate sample sizes. The proposed method is illustrated with a pharmacokinetics study on renal cancer data.
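
The abstract does not spell out the penalty used in the doubly penalized procedure, but the keyword list for this article names SCAD. As a rough illustration only, the sketch below evaluates the SCAD penalty in its commonly stated piecewise form, assuming the conventional default a = 3.7. The function name `scad_penalty` and its arguments are hypothetical and are not taken from the paper; how the penalty is applied to linear coefficients and to groups of spline coefficients is not described here and is left out.

```python
def scad_penalty(theta, lam, a=3.7):
    """Illustrative SCAD penalty for a scalar theta (assumed standard form).

    lam is the tuning parameter (> 0); a > 2 controls where the penalty
    levels off. The value a = 3.7 is a conventional default, assumed here.
    """
    t = abs(theta)
    if t <= lam:
        # Lasso-like linear region near zero
        return lam * t
    if t <= a * lam:
        # Quadratic transition region that joins the two outer pieces
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    # Constant region: large coefficients receive no additional penalty
    return lam * lam * (a + 1) / 2


if __name__ == "__main__":
    for value in (0.0, 0.5, 1.0, 2.0, 5.0):
        print(value, scad_penalty(value, lam=1.0))
```

The penalty grows linearly for small coefficients and is flat for large ones, so large effects are not shrunk further as they would be under a purely linear penalty; this behaviour is what an oracle-type property typically relies on.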

Article information

Source
Ann. Statist., Volume 42, Number 2 (2014), 592-624.

Dates
First available in Project Euclid: 20 May 2014

Permanent link to this document
https://projecteuclid.org/euclid.aos/1400592171

Digital Object Identifier
doi:10.1214/13-AOS1194

Mathematical Reviews number (MathSciNet)
MR3210980

Zentralblatt MATH identifier
1309.62077

Subjects
Primary: 62G08: Nonparametric regression
Secondary: 62G10: Hypothesis testing; 62G20: Asymptotic properties; 62J02: General nonlinear regression; 62F12: Asymptotic properties of estimators

Keywords
Additive model; group selection; model selection; oracle property; partial linear models; polynomial splines; quadratic inference function; SCAD; selection consistency

Citation

Wang, Li; Xue, Lan; Qu, Annie; Liang, Hua. Estimation and model selection in generalized additive partial linear models for correlated data with diverging number of covariates. Ann. Statist. 42 (2014), no. 2, 592–624. doi:10.1214/13-AOS1194. https://projecteuclid.org/euclid.aos/1400592171


Supplemental materials

  • Supplementary material: Supplement to “Estimation and model selection in generalized additive partial linear models for correlated data with diverging number of covariates”. The supplementary material provides a number of technical lemmas and their proofs. The technical lemmas are used in the proofs of Theorems 1–5 in the paper.