The Annals of Statistics

Consistent covariate selection and post model selection inference in semiparametric regression

Florentina Bunea



This paper presents a model selection technique for estimation in semiparametric regression models of the form $Y_i = \beta'\underline{X}_i + f(T_i) + W_i$, $i = 1, \ldots, n$. The procedure estimates the parametric and nonparametric components simultaneously. Estimation is based on a collection of finite-dimensional models, with a penalized least squares criterion used to select among them. We show that by tailoring the penalty terms developed for nonparametric regression to semiparametric models, we can consistently estimate the subset of nonzero coefficients of the linear part. Moreover, the selected estimator of the linear component is asymptotically normal.
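The selection step described in the abstract can be illustrated with a minimal sketch in Python. It assumes a cosine sieve for $f$ on $[0, 1]$, a known noise variance `sigma2`, and a hypothetical BIC-type penalty $c\,\sigma^2(|I| + m)\log n / n$ for a model with covariate subset $I$ and sieve dimension $m$. These choices, the constant `c`, the simulated data, and all function names are illustrative assumptions, not the sieve or penalty used in the paper. The sketch searches every (subset, dimension) pair, fits the linear and nonparametric parts jointly by least squares, and returns the least squares estimate of $\beta$ on the selected subset; it does not compute the standard errors that the asymptotic normality result would support.

```python
import itertools
import math
import random


def solve_normal_equations(design, y):
    """Least squares via the normal equations, solved by Gaussian elimination
    with partial pivoting. Returns (coefficients, residual sum of squares)."""
    k = len(design[0])
    # Augmented matrix [X'X | X'y].
    a = [[sum(row[i] * row[j] for row in design) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(design, y))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, k):
            factor = a[r][col] / a[col][col]
            for c in range(col, k + 1):
                a[r][c] -= factor * a[col][c]
    # Back substitution on the upper-triangular system.
    coef = [0.0] * k
    for i in reversed(range(k)):
        tail = sum(a[i][j] * coef[j] for j in range(i + 1, k))
        coef[i] = (a[i][k] - tail) / a[i][i]
    rss = sum((yi - sum(b * v for b, v in zip(coef, row))) ** 2
              for row, yi in zip(design, y))
    return coef, rss


def cosine_basis(t, m):
    """First m functions of the cosine basis on [0, 1] (assumed sieve for f)."""
    return [1.0] + [math.sqrt(2.0) * math.cos(math.pi * j * t) for j in range(1, m)]


def select_model(xs, ts, y, sigma2, max_dim=8, c=3.0):
    """Exhaustive penalized least squares over (covariate subset I, sieve dim m).

    Hypothetical criterion (not the paper's penalty):
        RSS / n + c * sigma2 * (|I| + m) * log(n) / n
    Returns (selected subset, selected m, {j: beta_hat_j for j in subset}).
    """
    n, p = len(y), len(xs[0])
    best = None
    for size in range(p + 1):
        for subset in itertools.combinations(range(p), size):
            for m in range(1, max_dim + 1):
                # Linear covariates first, then the sieve terms for f.
                design = [[x[j] for j in subset] + cosine_basis(t, m)
                          for x, t in zip(xs, ts)]
                coef, rss = solve_normal_equations(design, y)
                crit = rss / n + c * sigma2 * (size + m) * math.log(n) / n
                if best is None or crit < best[0]:
                    best = (crit, subset, m, dict(zip(subset, coef[:size])))
    return best[1], best[2], best[3]


def simulate_data(n, beta, seed=0):
    """Hypothetical example: X ~ N(0, I), T ~ U(0, 1), f(t) = cos(2 pi t),
    W ~ N(0, 0.25)."""
    rng = random.Random(seed)
    xs = [[rng.gauss(0.0, 1.0) for _ in beta] for _ in range(n)]
    ts = [rng.random() for _ in range(n)]
    y = [sum(b * v for b, v in zip(beta, x)) + math.cos(2.0 * math.pi * t)
         + rng.gauss(0.0, 0.5) for x, t in zip(xs, ts)]
    return xs, ts, y
```

For example, `select_model(*simulate_data(200, [2.0, 0.0, -1.5]), sigma2=0.25)` is expected to keep covariates 0 and 2 and drop the zero coefficient. The exhaustive search is only practical for a handful of covariates; it is used here because it makes the "collection of finite-dimensional models" explicit.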

Article information

Ann. Statist., Volume 32, Number 3 (2004), 898-927.

First available in Project Euclid: 24 May 2004

Mathematics Subject Classification
Primary: 62G05 (Estimation); 62F99 (None of the above, but in this section)
Secondary: 62G08 (Nonparametric regression); 62J02 (General nonlinear regression)

Keywords: semiparametric regression; consistent covariate selection; post model selection inference; penalized least squares; oracle inequalities


Bunea, Florentina. Consistent covariate selection and post model selection inference in semiparametric regression. Ann. Statist. 32 (2004), no. 3, 898--927. doi:10.1214/009053604000000247.
