The Annals of Statistics

Model selection in nonparametric regression

Marten Wegkamp



Model selection using a penalized data-splitting device is studied in the context of nonparametric regression. Finite-sample bounds are obtained under mild conditions. The resulting estimates are adaptive for large classes of functions.
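The data-splitting scheme described in the abstract can be illustrated with a minimal sketch. This is not the paper's actual estimator or penalty: the regressogram candidates, the linear-in-dimension penalty term, and the constant `c` below are illustrative assumptions, but the overall shape (fit candidates on one half of the sample, select by penalized empirical risk on the other half) matches the general device being studied.

```python
import random
import math

random.seed(0)

# Illustrative data: y = f(x) + noise on [0, 1], with f a smooth function.
f = lambda x: math.sin(2 * math.pi * x)
n = 200
xs = [random.random() for _ in range(n)]
ys = [f(x) + random.gauss(0, 0.3) for x in xs]

def fit_regressogram(x_train, y_train, k):
    """Piecewise-constant least squares fit (regressogram) with k equal bins."""
    sums, counts = [0.0] * k, [0] * k
    for x, y in zip(x_train, y_train):
        b = min(int(x * k), k - 1)
        sums[b] += y
        counts[b] += 1
    means = [sums[b] / counts[b] if counts[b] else 0.0 for b in range(k)]
    return lambda x: means[min(int(x * k), k - 1)]

# Data splitting: fit each candidate on the first half of the sample,
# evaluate and select on the held-out second half.
half = n // 2
x1, y1 = xs[:half], ys[:half]
x2, y2 = xs[half:], ys[half:]

def penalized_test_risk(model, k, c=0.5):
    # Empirical squared-error risk on the held-out half ...
    empirical = sum((y - model(x)) ** 2 for x, y in zip(x2, y2)) / len(x2)
    # ... plus a complexity penalty (hypothetical form, not the paper's).
    return empirical + c * k / len(x2)

candidates = {k: fit_regressogram(x1, y1, k) for k in (1, 2, 4, 8, 16, 32, 64)}
risks = {k: penalized_test_risk(m, k) for k, m in candidates.items()}
best_k = min(risks, key=risks.get)
print("selected number of bins:", best_k)
```

The penalty term discourages the selection rule from overfitting the held-out half when many candidate models are compared; the paper's finite-sample bounds quantify how such a penalized split-sample selection trades off approximation error against estimation error.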

Article information

Ann. Statist., Volume 31, Number 1 (2003), 252-273.

First available in Project Euclid: 26 February 2003

Primary: 60F05: Central limit and other weak theorems 60F17: Functional limit theorems; invariance principles
Secondary: 60G15: Gaussian processes 62E20: Asymptotic distribution theory

Keywords: Adaptive estimation; classification; data-splitting; least squares estimation; model selection; penalized least squares; VC-major classes


Wegkamp, Marten. Model selection in nonparametric regression. Ann. Statist. 31 (2003), no. 1, 252--273. doi:10.1214/aos/1046294464.



References

  • ANTHONY, M. and BARTLETT, P. (1999). Neural Network Learning: Theoretical Foundations. Cambridge Univ. Press.
  • BARAUD, Y. (2000). Model selection for regression on a fixed design. Probab. Theory Related Fields 117 467-493.
  • BARRON, A. (1987). Are Bayes rules consistent in information? In Open Problems in Communication and Computation (T. Cover and B. Gopinath, eds.) 85-91. Springer, Berlin.
  • BARRON, A. (1991). Complexity regularization with applications to artificial neural networks. In Nonparametric Functional Estimation and Related Topics (G. Roussas, ed.) 561-576. Kluwer, Dordrecht.
  • BARRON, A., BIRGÉ, L. and MASSART, P. (1999). Risk bounds for model selection via penalization. Probab. Theory Related Fields 113 301-413.
  • BARTLETT, P., BOUCHERON, S. and LUGOSI, G. (2002). Model selection and error estimation. Machine Learning 48 85-113.
  • DEVROYE, L. and LUGOSI, G. (1996). A universally acceptable smoothing factor for kernel density estimates. Ann. Statist. 24 2499-2512.
  • DEVROYE, L. and LUGOSI, G. (1997). Nonasymptotic universal smoothing factors, kernel complexity and Yatracos classes. Ann. Statist. 25 2626-2635.
  • EINMAHL, U. and MASON, D. (1996). Some universal results on the behavior of increments of partial sums. Ann. Probab. 24 1388-1407.
  • HENGARTNER, N., WEGKAMP, M. and MATZNER-LØBER, E. (2002). Bandwidth selection for local linear regression smoothers. J. R. Stat. Soc. Ser. B Stat. Methodol. 64 791-804.
  • HENGARTNER, N. and WEGKAMP, M. (1999). A note on model selection procedures in nonparametric classification. Preprint, Dept. Statistics, Yale Univ.
  • HENGARTNER, N. and WEGKAMP, M. (2001). Estimation and selection procedures in regression: The L1-approach. Canad. J. Statist. 29 621-632.
  • IBRAGIMOV, R. and SHARAKHMETOV, SH. (1998). On an exact constant for the Rosenthal inequality. Theory Probab. Appl. 42 294-302.
  • LUGOSI, G. and NOBEL, A. (1999). Adaptive model selection using empirical complexities. Ann. Statist. 27 1830-1864.
  • SHAO, J. (1993). Linear model selection by cross-validation. J. Amer. Statist. Assoc. 88 486-494.
  • VAN DE GEER, S. (1990). Estimating a regression function. Ann. Statist. 18 907-924.
  • VAN DE GEER, S. (2000). Applications of Empirical Process Theory. Cambridge Univ. Press.
  • VAN DE GEER, S. and WEGKAMP, M. (1996). Consistency for the least squares estimator in nonparametric regression. Ann. Statist. 24 2513-2523.
  • VAPNIK, V. (1998). Statistical Learning Theory. Wiley, New York.
  • WEGKAMP, M. H. (1999). Quasi-universal bandwidth selection for kernel density estimators. Canad. J. Statist. 27 409-420.
  • YANG, Y. (2001). Adaptive regression by mixing. J. Amer. Statist. Assoc. 96 574-588.
  • YANG, Y. and BARRON, A. (1999). Information-theoretic determination of minimax rates of convergence. Ann. Statist. 27 1564-1599.