The Annals of Statistics

Variable selection in semiparametric regression modeling

Runze Li and Hua Liang


This paper addresses the selection of significant variables in semiparametric modeling. Variable selection for semiparametric regression models has two components: model selection for the nonparametric components and selection of significant variables for the parametric portion. Semiparametric variable selection is therefore much more challenging than parametric variable selection (e.g., in linear and generalized linear models): traditional procedures, including stepwise regression and best subset selection, require a separate model selection for the nonparametric components of each submodel, which leads to a very heavy computational burden. We propose a class of variable selection procedures for semiparametric regression models based on nonconcave penalized likelihood and establish the rate of convergence of the resulting estimate. With proper choices of penalty functions and regularization parameters, we show that the resulting estimate is asymptotically normal and that the proposed procedures perform as well as an oracle procedure. We also propose a semiparametric generalized likelihood ratio test to select significant variables in the nonparametric component. We investigate the asymptotic behavior of the test and show that its limiting null distribution is a chi-square distribution that does not depend on the nuisance parameters. Extensive Monte Carlo simulation studies examine the finite-sample performance of the proposed variable selection procedures.
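As a concrete illustration of a nonconcave penalty, the sketch below evaluates the SCAD penalty (named in this article's keywords) and its derivative in plain Python. It is a minimal sketch, not the authors' implementation: the function names `scad_penalty` and `scad_derivative` are hypothetical, and the default `a = 3.7` is the value commonly used in the SCAD literature, assumed here rather than taken from the abstract.

```python
def scad_penalty(theta, lam, a=3.7):
    """SCAD penalty p_lam(|theta|) for a single coefficient.

    Assumes lam > 0 and a > 2. The penalty is linear near zero (like the
    L1 penalty), quadratic in a transition zone, and constant for large
    coefficients, so large effects are not shrunk further.
    """
    t = abs(theta)
    if t <= lam:
        # Linear part: same as the L1 penalty.
        return lam * t
    if t <= a * lam:
        # Quadratic transition between the linear and constant parts.
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    # Constant part: no additional penalty for large coefficients.
    return (a + 1) * lam * lam / 2


def scad_derivative(theta, lam, a=3.7):
    """Derivative of the SCAD penalty with respect to |theta| (for |theta| > 0)."""
    t = abs(theta)
    if t <= lam:
        return lam
    # Decays linearly to zero on (lam, a*lam], then stays at zero.
    return max(a * lam - t, 0.0) / (a - 1)


if __name__ == "__main__":
    for value in (0.5, 2.0, 10.0):
        print(value, scad_penalty(value, lam=1.0), scad_derivative(value, lam=1.0))
```

Because the derivative vanishes once the absolute coefficient exceeds `a * lam`, large coefficients are left essentially unpenalized, which is the intuition behind the oracle-type behavior described in the abstract.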

Article information

Ann. Statist., Volume 36, Number 1 (2008), 261-286.

First available in Project Euclid: 1 February 2008

Primary: 62G08 (Nonparametric regression); 62G10 (Hypothesis testing)
Secondary: 62G20 (Asymptotic properties)

Local linear regression; nonconcave penalized likelihood; SCAD; varying coefficient models


Li, Runze; Liang, Hua. Variable selection in semiparametric regression modeling. Ann. Statist. 36 (2008), no. 1, 261–286. doi:10.1214/009053607000000604.

