The Annals of Statistics

Estimation in additive models with highly or nonhighly correlated covariates

Jiancheng Jiang, Yingying Fan, and Jianqing Fan



Motivated by normalizing DNA microarray data and by predicting interest rates, we explore nonparametric estimation of additive models with highly correlated covariates. We introduce two novel approaches for estimating the additive components: integration estimation and pooled backfitting estimation. The former is designed for highly correlated covariates, and the latter is useful for nonhighly correlated covariates. Asymptotic normality of the proposed estimators is established. Simulations are conducted to demonstrate the finite-sample behavior of the proposed estimators, and real data examples are given to illustrate the value of the methodology.
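
For orientation, ordinary backfitting with a local linear smoother can be sketched as follows. This is a minimal illustration under stated assumptions, not the integration estimator or the pooled backfitting estimator proposed in the paper. It fits Y = α + m_1(X_1) + ... + m_d(X_d) + ε by repeatedly smoothing partial residuals against each covariate and centering each component so that α is identifiable. The function names local_linear_smooth and backfit, the Gaussian kernel, the fixed bandwidths, the tolerance and the simulated data are all hypothetical choices made for illustration.

```python
import numpy as np


def local_linear_smooth(x, y, h):
    """Local linear regression of y on x, evaluated at every observed x.

    Assumption (illustrative): Gaussian kernel with a fixed bandwidth h.
    """
    fitted = np.empty(len(x))
    for i in range(len(x)):
        d = x - x[i]                      # distances to the evaluation point
        w = np.exp(-0.5 * (d / h) ** 2)   # Gaussian kernel weights
        s0, s1, s2 = w.sum(), (w * d).sum(), (w * d * d).sum()
        # Intercept of the weighted least-squares line y ~ a + b*d, i.e. the fit at d = 0
        fitted[i] = ((s2 - s1 * d) * w) @ y / (s0 * s2 - s1 ** 2)
    return fitted


def backfit(X, y, h, n_iter=50, tol=1e-6):
    """Ordinary (hypothetical sketch) backfitting for y = alpha + sum_j m_j(X[:, j]) + noise.

    Returns alpha and an (n, d) array whose column j holds the centered
    estimate of m_j at the observed values X[:, j].
    """
    n, d = X.shape
    alpha = y.mean()
    m = np.zeros((n, d))
    for _ in range(n_iter):
        m_old = m.copy()
        for j in range(d):
            # Partial residual: remove the intercept and all other components
            partial = y - alpha - m.sum(axis=1) + m[:, j]
            fj = local_linear_smooth(X[:, j], partial, h[j])
            m[:, j] = fj - fj.mean()      # center for identifiability
        if np.max(np.abs(m - m_old)) < tol:
            break
    return alpha, m


# Illustrative usage on simulated data with independent covariates
rng = np.random.default_rng(0)
n = 200
X = rng.uniform(0.0, 1.0, size=(n, 2))
m1_true = np.sin(2 * np.pi * X[:, 0])
m2_true = np.cos(np.pi * X[:, 1])
y = 1.0 + m1_true + m2_true + 0.1 * rng.standard_normal(n)

alpha_hat, m_hat = backfit(X, y, h=[0.05, 0.1])
print(alpha_hat, np.corrcoef(m_hat[:, 0], m1_true)[0, 1])
```

With independent covariates, as in this simulated example, the partial-residual cycle settles after a few passes; the abstract reserves a different approach, integration estimation, for the highly correlated case.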

Article information

Ann. Statist., Volume 38, Number 3 (2010), 1403-1432.

First available in Project Euclid: 8 March 2010

Primary: 62G10 (Hypothesis testing); 60J60 (Diffusion processes) [See also 58J65]
Secondary: 62G20 (Asymptotic properties)

Keywords: additive model; backfitting; local linear smoothing; normalization; varying coefficient


Jiang, Jiancheng; Fan, Yingying; Fan, Jianqing. Estimation in additive models with highly or nonhighly correlated covariates. Ann. Statist. 38 (2010), no. 3, 1403–1432. doi:10.1214/09-AOS753.
