## The Annals of Statistics

### Estimation and inference in generalized additive coefficient models for nonlinear interactions with high-dimensional covariates

#### Abstract

In the low-dimensional case, the generalized additive coefficient model (GACM) proposed by Xue and Yang [Statist. Sinica 16 (2006) 1423–1446] has been demonstrated to be a powerful tool for studying nonlinear interaction effects of variables. In this paper, we propose estimation and inference procedures for the GACM when the dimension of the variables is high. Specifically, we propose a groupwise penalization-based procedure to distinguish significant covariates for the “large $p$, small $n$” setting. The procedure is shown to be consistent for model structure identification. Further, we construct simultaneous confidence bands for the coefficient functions in the selected model based on a refined two-step spline estimator. We also discuss how to choose the tuning parameters. To estimate the standard deviation of the functional estimator, we adopt the smoothed bootstrap method. We conduct simulation experiments to evaluate the numerical performance of the proposed methods and analyze an obesity data set from a genome-wide association study as an illustration.
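For orientation, the model class named in the abstract can be sketched as follows. This is a reading of the GACM as it is usually written in the cited literature, not a formula quoted from this page; the symbols $g$, $T_{\ell}$, $X_{k}$, $\alpha_{\ell k}$, $p$ and $d$ are notation assumed here for illustration.

$$
g\bigl(E[Y \mid X, T]\bigr) = \sum_{\ell=1}^{p} \alpha_{\ell}(X)\, T_{\ell},
\qquad
\alpha_{\ell}(X) = \alpha_{\ell 0} + \sum_{k=1}^{d} \alpha_{\ell k}(X_{k}).
$$

Here $g$ is a known link function, each covariate $T_{\ell}$ enters linearly, and its coefficient $\alpha_{\ell}(X)$ is an additive function of the covariates $X_{1}, \ldots, X_{d}$, which is how nonlinear interactions between $T$ and $X$ arise. In a spline-based approach, each $\alpha_{\ell k}$ would typically be approximated by a basis expansion, so its basis coefficients form a natural group for a groupwise penalty.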

#### Article information

Source
Ann. Statist., Volume 43, Number 5 (2015), 2102–2131.

Dates
Received: September 2014
Revised: May 2015
First available in Project Euclid: 3 August 2015

Permanent link to this document
https://projecteuclid.org/euclid.aos/1438606855

Digital Object Identifier
doi:10.1214/15-AOS1344

Mathematical Reviews number (MathSciNet)
MR3375878

Zentralblatt MATH identifier
1323.62033

#### Citation

Ma, Shujie; Carroll, Raymond J.; Liang, Hua; Xu, Shizhong. Estimation and inference in generalized additive coefficient models for nonlinear interactions with high-dimensional covariates. Ann. Statist. 43 (2015), no. 5, 2102–2131. doi:10.1214/15-AOS1344. https://projecteuclid.org/euclid.aos/1438606855

#### References

• Carroll, R. J., Fan, J., Gijbels, I. and Wand, M. P. (1997). Generalized partially linear single-index models. J. Amer. Statist. Assoc. 92 477–489.
• Chen, J. and Chen, Z. (2008). Extended Bayesian information criteria for model selection with large model spaces. Biometrika 95 759–771.
• Cheverud, J. M. (2001). A simple correction for multiple comparisons in interval mapping genome scans. Heredity (Edinb) 87 52–58.
• Claeskens, G. and Van Keilegom, I. (2003). Bootstrap confidence bands for regression curves and their derivatives. Ann. Statist. 31 1852–1884.
• Csörgő, M. and Révész, P. (1981). Strong Approximations in Probability and Statistics. Academic Press, New York.
• Dawber, T. R., Meadors, G. F. and Moore, F. E. (1951). Epidemiological approaches to heart disease: The Framingham study. American Journal of Public Health 41 279–286.
• de Boor, C. (2001). A Practical Guide to Splines, revised ed. Applied Mathematical Sciences 27. Springer, New York.
• DeVore, R. A. and Lorentz, G. G. (1993). Constructive Approximation. Grundlehren der Mathematischen Wissenschaften 303. Springer, Berlin.
• Efron, B. (2014). Estimation and accuracy after model selection. J. Amer. Statist. Assoc. 109 991–1007.
• Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. J. Amer. Statist. Assoc. 96 1348–1360.
• Fan, Y. and Tang, C. Y. (2013). Tuning parameter selection in high dimensional penalized likelihood. J. R. Stat. Soc. Ser. B. Stat. Methodol. 75 531–552.
• Hall, P. and Titterington, D. M. (1988). On confidence bands in nonparametric density estimation and regression. J. Multivariate Anal. 27 228–254.
• Härdle, W. and Marron, J. S. (1991). Bootstrap simultaneous error bars for nonparametric regression. Ann. Statist. 19 778–796.
• Horowitz, J., Klemelä, J. and Mammen, E. (2006). Optimal estimation in additive regression models. Bernoulli 12 271–298.
• Horowitz, J. L. and Mammen, E. (2004). Nonparametric estimation of an additive model with a link function. Ann. Statist. 32 2412–2443.
• Huang, J. Z. (2003). Local asymptotics for polynomial spline regression. Ann. Statist. 31 1600–1635.
• Huang, J., Horowitz, J. L. and Wei, F. (2010). Variable selection in nonparametric additive models. Ann. Statist. 38 2282–2313.
• Jiang, B. and Liu, J. S. (2014). Variable selection for general index models via sliced inverse regression. Ann. Statist. 42 1751–1786.
• Knutson, K. L. (2012). Does inadequate sleep play a role in vulnerability to obesity? Am. J. Hum. Biol. 24 361–371.
• Lam, C. and Fan, J. (2008). Profile-kernel likelihood inference with diverging number of parameters. Ann. Statist. 36 2232–2260.
• Lee, Y. K., Mammen, E. and Park, B. U. (2012). Flexible generalized varying coefficient regression models. Ann. Statist. 40 1906–1933.
• Lian, H. (2012). Variable selection for high-dimensional generalized varying-coefficient models. Statist. Sinica 22 1563–1588.
• Liu, R. and Yang, L. (2010). Spline-backfitted kernel smoothing of additive coefficient model. Econometric Theory 26 29–59.
• Liu, R., Yang, L. and Härdle, W. K. (2013). Oracally efficient two-step estimation of generalized additive model. J. Amer. Statist. Assoc. 108 619–631.
• Ma, S. and Yang, L. (2011a). A jump-detecting procedure based on spline estimation. J. Nonparametr. Stat. 23 67–81.
• Ma, S. and Yang, L. (2011b). Spline-backfitted kernel smoothing of partially linear additive model. J. Statist. Plann. Inference 141 204–219.
• Ma, S., Yang, L. and Carroll, R. J. (2012). A simultaneous confidence band for sparse longitudinal regression. Statist. Sinica 22 95–122.
• Ma, S., Carroll, R. J., Liang, H. and Xu, S. (2015). Supplement to “Estimation and inference in generalized additive coefficient models for nonlinear interactions with high-dimensional covariates.” DOI:10.1214/15-AOS1344SUPP.
• Meier, L. and Bühlmann, P. (2007). Smoothing $\ell_{1}$-penalized estimators for high-dimensional time-course data. Electron. J. Stat. 1 597–615.
• Meier, L., van de Geer, S. and Bühlmann, P. (2009). High-dimensional additive modeling. Ann. Statist. 37 3779–3821.
• Murcray, C. E., Lewinger, J. P. and Gauderman, W. J. (2009). Gene-environment interaction in genome-wide association studies. Am. J. Epidemiol. 169 219–226.
• Nyholt, D. R. (2004). A simple correction for multiple testing for single-nucleotide polymorphisms in linkage disequilibrium with each other. Am. J. Hum. Genet. 74 765–769.
• Randall, J. C., Winkler, T. M., Kutalik, Z., Berndt, S. I., Jackson, A. U. et al. (2013). Sex-stratified genome-wide association studies including 270,000 individuals show sexual dimorphism in genetic loci for anthropometric traits. PLOS Genetics 9 e1003500.
• Ravikumar, P., Lafferty, J., Liu, H. and Wasserman, L. (2009). Sparse additive models. J. R. Stat. Soc. Ser. B. Stat. Methodol. 71 1009–1030.
• Wang, H., Li, R. and Tsai, C.-L. (2007). Tuning parameter selectors for the smoothly clipped absolute deviation method. Biometrika 94 553–568.
• Wang, L., Xue, L., Qu, A. and Liang, H. (2014). Estimation and model selection in generalized additive partial linear models for correlated data with diverging number of covariates. Ann. Statist. 42 592–624.
• Wareham, N. J., van Sluijs, E. M. F. and Ekelund, U. (2005). Physical activity and obesity prevention: A review of the current evidence. Proc. Nutr. Soc. 64 229–247.
• Xue, L. and Liang, H. (2010). Polynomial spline estimation for a generalized additive coefficient model. Scand. J. Stat. 37 26–46.
• Xue, L. and Yang, L. (2006). Additive coefficient modeling via polynomial spline. Statist. Sinica 16 1423–1446.
• Zhou, S., Shen, X. and Wolfe, D. A. (1998). Local asymptotics for regression splines and confidence regions. Ann. Statist. 26 1760–1782.
• Zou, H. (2006). The adaptive lasso and its oracle properties. J. Amer. Statist. Assoc. 101 1418–1429.

#### Supplemental materials

• Supplemental materials for “Estimation and inference in generalized additive coefficient models for nonlinear interactions with high-dimensional covariates”. The supplementary material presents additional numerical results and the proofs of Lemmas A.1 and A.2.