Annals of Applied Statistics

Variable selection for a categorical varying-coefficient model with identifications for determinants of body mass index

Jiti Gao, Bin Peng, Zhao Ren, and Xiaohui Zhang



Obesity has become one of the major public health issues during the last three decades. A considerable number of determinants have been proposed for body mass index (BMI) by a wide range of studies from multiple disciplines. In addition, it is well documented that the impacts of these determinants vary across demographic groups. However, little is known about the relative importance of these potential determinants or about the varying impacts of the relatively important ones. Using the shrinkage estimation technique, we propose a variable selection procedure for the categorical varying-coefficient model. We present a simulation study to examine the performance of our method in different scenarios. We further apply the proposed method to examine the impacts of a large number of potential determinants on BMI using data from the 2013 National Health Interview Survey in the United States. With our method, the relevant determinants of BMI are identified through the variable selection procedure, and their varying impacts across demographic groups are quantified through the post-selection estimation.
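As a rough illustration of shrinkage-based selection in a categorical varying-coefficient model, the sketch below fits a model in which each covariate has one coefficient per category and a group-lasso penalty shrinks a covariate's whole coefficient vector to zero when it is irrelevant. This is not the authors' estimator: the group-lasso penalty, the proximal-gradient solver, the simulated data, the tuning value `lam=0.1`, and every name in the code are assumptions made only for this example.

```python
import math
import random


def fit_group_lasso(x, y, z, n_cat, lam, n_iter=500):
    """Proximal-gradient group lasso for y_i = sum_j x[i][j] * beta[j][z[i]] + noise.

    beta[j] holds the coefficients of covariate j, one per category, and is
    penalized as a group so that covariate j is either kept or dropped entirely.
    """
    n, p = len(y), len(x[0])
    beta = [[0.0] * n_cat for _ in range(p)]
    # Conservative step size: the trace of the Hessian bounds its largest eigenvalue.
    step = n / sum(v * v for row in x for v in row)
    for _ in range(n_iter):
        # Gradient of the least-squares loss (1 / 2n) * sum of squared residuals.
        grad = [[0.0] * n_cat for _ in range(p)]
        for xi, yi, zi in zip(x, y, z):
            resid = sum(xi[j] * beta[j][zi] for j in range(p)) - yi
            for j in range(p):
                grad[j][zi] += xi[j] * resid / n
        # Gradient step followed by group soft-thresholding of each covariate.
        for j in range(p):
            v = [beta[j][c] - step * grad[j][c] for c in range(n_cat)]
            norm = math.sqrt(sum(t * t for t in v))
            scale = max(0.0, 1.0 - step * lam / norm) if norm > 0 else 0.0
            beta[j] = [scale * t for t in v]
    return beta


# Simulated data (hypothetical): covariates 0 and 1 matter, with effects that
# differ across three categories; covariates 2 and 3 are irrelevant.
rng = random.Random(0)
true_beta = [[1.0, 2.0, -1.5], [0.5, -1.0, 1.5], [0.0] * 3, [0.0] * 3]
n, p, n_cat = 600, 4, 3
z = [i % n_cat for i in range(n)]
x = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [
    sum(x[i][j] * true_beta[j][z[i]] for j in range(p)) + rng.gauss(0, 0.5)
    for i in range(n)
]

beta_hat = fit_group_lasso(x, y, z, n_cat, lam=0.1)
selected = [j for j in range(p) if any(b != 0.0 for b in beta_hat[j])]
print("selected covariates:", selected)
```

The penalized estimates of the retained covariates are biased toward zero, which is one reason a post-selection step that refits the model on the selected covariates alone is commonly used to quantify the category-specific effects.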

Article information

Ann. Appl. Stat., Volume 11, Number 2 (2017), 1117-1145.

Received: November 2016
Revised: February 2017
First available in Project Euclid: 20 July 2017


Keywords: Body mass index; obesity; optimal variable selection; varying-coefficient regression


Gao, Jiti; Peng, Bin; Ren, Zhao; Zhang, Xiaohui. Variable selection for a categorical varying-coefficient model with identifications for determinants of body mass index. Ann. Appl. Stat. 11 (2017), no. 2, 1117–1145. doi:10.1214/17-AOAS1039.




Supplemental materials

  • Supplement to “Variable selection for a categorical varying-coefficient model with identifications for determinants of body mass index.” In this supplementary file, we provide a detailed presentation and discussion of (1) the mathematical proofs of the main results, (2) the estimation procedure for our method, (3) additional simulation results, and (4) other estimation results from the BMI study.