Institute of Mathematical Statistics Collections

Model selection error rates in nonparametric and parametric model comparisons

Yongsung Joo, Martin T. Wells, and George Casella


Since the introduction of Akaike’s information criterion (AIC) in 1973, numerous information criteria have been developed and widely used in model selection. Many papers concerning the justification of various model selection criteria followed, particularly with respect to model selection error rates (the probability of selecting a wrong model). A model selection criterion is called consistent if its model selection error rate decreases to zero as the sample size increases to infinity; otherwise, it is inconsistent. In this paper, we explore sufficient consistency conditions for information criteria in the nonparametric (logspline) and parametric model comparison setting, and discuss finite sample model selection error rates.
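The notion of a model selection error rate can be illustrated with a small simulation. The sketch below is not the paper's logspline setting: it assumes a toy comparison between two parametric models for i.i.d. data, N(0, 1) (true) against N(mu, 1) with mu free, and the function name `selection_error_rate` is hypothetical. In this toy case the larger model is wrongly selected whenever n * xbar^2 exceeds the per-parameter penalty (2 for AIC, log n for BIC).

```python
import math
import random


def selection_error_rate(n, penalty, n_rep=500, seed=0):
    """Estimate the probability of wrongly selecting the larger model.

    Assumed toy setup: data are drawn from N(0, 1), so the smaller model
    (mean fixed at 0) is the true one. The larger model (free mean, one
    extra parameter) has the lower penalized criterion exactly when
    n * xbar**2 > penalty.
    """
    rng = random.Random(seed)
    errors = 0
    for _ in range(n_rep):
        xbar = sum(rng.gauss(0.0, 1.0) for _ in range(n)) / n
        # Twice the log-likelihood gain of the larger model over the smaller.
        lr_stat = n * xbar * xbar
        if lr_stat > penalty:
            errors += 1
    return errors / n_rep


if __name__ == "__main__":
    for n in (10, 100, 1000):
        aic_rate = selection_error_rate(n, penalty=2.0)
        bic_rate = selection_error_rate(n, penalty=math.log(n))
        print(f"n={n:5d}  AIC error rate={aic_rate:.3f}  BIC error rate={bic_rate:.3f}")
```

Under these assumptions the statistic n * xbar^2 follows a chi-squared distribution with one degree of freedom for every n, so a fixed penalty such as AIC's keeps a roughly constant error rate, while a penalty that grows with n, such as BIC's, drives the error rate toward zero. This is the sense of "consistent" used in the abstract, shown here only for a simple parametric pair.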

Chapter information

James O. Berger, T. Tony Cai and Iain M. Johnstone, eds., Borrowing Strength: Theory Powering Applications – A Festschrift for Lawrence D. Brown (Beachwood, Ohio, USA: Institute of Mathematical Statistics, 2010), 166-183

First available in Project Euclid: 26 October 2010

Primary: 62G20: Asymptotic properties
Secondary: 62F99: None of the above, but in this section; 62G08: Nonparametric regression

Keywords: consistent model selection; log spline model; spline regression; nonparametric regression

Copyright © 2010, Institute of Mathematical Statistics


Joo, Yongsung; Wells, Martin T.; Casella, George. Model selection error rates in nonparametric and parametric model comparisons. Borrowing Strength: Theory Powering Applications – A Festschrift for Lawrence D. Brown, 166–183, Institute of Mathematical Statistics, Beachwood, Ohio, USA, 2010. doi:10.1214/10-IMSCOLL612.


  • [1] Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In Second International Symposium on Information Theory (B. N. Petrov and F. Csáki, eds.) 267–281. Akadémiai Kiadó, Budapest.
  • [2] Barron, A. R. and Sheu, C. (1991). Approximation of density functions by sequences of exponential families. Ann. Statist. 19 1347–1369.
  • [3] Barron, A. R., Birgé, L. and Massart, P. (1999). Risk bounds for model selection via penalization. Probab. Theory Related Fields 113 301–413.
  • [4] Berger, J. and Pericchi, L. (1996). The intrinsic Bayes factor for model selection and prediction. J. Amer. Statist. Assoc. 91 109–122.
  • [5] Bozdogan, H. (1987). Model selection and Akaike’s information criterion: General theory and its analytical extensions. Psychometrika 52 345–370.
  • [6] Casella, G., Giron, F. J. and Moreno, E. (2009). Consistent model selection in regression. Ann. Statist. 37 1207–1228.
  • [7] Crain, B. R. (1974). Estimation of distributions using orthogonal expansions. Ann. Statist. 2 454–463.
  • [8] Crain, B. R. (1976). Exponential models, maximum likelihood estimation and the Haar conditions. J. Amer. Statist. Assoc. 71 737–740.
  • [9] Crain, B. R. (1976). More on estimation of distributions using orthogonal expansions. J. Amer. Statist. Assoc. 71 741–745.
  • [10] Crain, B. R. (1977). An information theoretic approach to approximating a probability distribution. SIAM J. Appl. Math. 32 339–346.
  • [11] Efron, B. (1983). Estimating error rate of a prediction rule: Improvement on cross validation. J. Amer. Statist. Assoc. 78 316–331.
  • [12] Foster, D. and George, E. (1994). The risk inflation criterion for multiple regression. Ann. Statist. 22 1947–1975.
  • [13] Friedman, J. (1991). Multivariate adaptive regression splines. Ann. Statist. 19 1–67.
  • [14] Gelfand, A. E. and Dey, D. K. (1994). Bayesian model choice: Asymptotics and exact calculations. J. Roy. Statist. Soc. Ser. B 56 501–514.
  • [15] Haughton, D. (1988). On the choice of a model fit from an exponential family. Ann. Statist. 16 190–195.
  • [16] Haughton, D. (1994). Consistency of a class of information criteria for model selection in nonlinear regression. Theory Probab. Appl. 37 47–53.
  • [17] Hannan, E. J. and Quinn, B. G. (1979). The determination of the order of an autoregression. J. Roy. Statist. Soc. Ser. B 41 190–195.
  • [18] Hastie, T. (1989). Flexible parsimonious smoothing and additive modeling: Discussion. Technometrics 31 23–29.
  • [19] Hurvich, C., Shumway, R. and Tsai, C. (1990). Improved estimators of Kullback-Leibler information for autoregressive model selection in small samples. Biometrika 77 709–719.
  • [20] Kooperberg, C. and Stone, C. (1991). A study of logspline density estimation. Comput. Statist. Data Anal. 12 327–347.
  • [21] Kooperberg, C. and Stone, C. (1992). Logspline density estimation for censored data. J. Comput. Graph. Statist. 1 301–328.
  • [22] Kooperberg, C., Stone, C. and Truong, Y. K. (1995). Hazard regression. J. Amer. Statist. Assoc. 90 78–94.
  • [23] Leonard, T. (1978). Density estimation, stochastic processes and prior information (with discussion). J. Roy. Statist. Soc. Ser. B 40 113–146.
  • [24] Mallows, C. L. (1973). Some comments on Cp. Technometrics 15 661–675.
  • [25] Moreno, E., Giron, F. J. and Casella, G. (2010). Consistency of objective Bayes factors as the model dimension grows. Ann. Statist. To appear.
  • [26] Nishii, R. (1984). Asymptotic properties of criteria for selection of variables in multiple regression. Ann. Statist. 12 758–765.
  • [27] Pötscher, B. M. (1989). Model selection under nonstationarity: Autoregressive models and stochastic linear regression. Ann. Statist. 17 1257–1274.
  • [28] Ruppert, D., Wand, M. P. and Carroll, R. J. (2003). Semiparametric Regression. Cambridge Univ. Press, New York.
  • [29] Shao, J. (1993). Linear model selection by cross-validation. J. Amer. Statist. Assoc. 88 486–494.
  • [30] Shao, J. (1996). Bootstrap model selection. J. Amer. Statist. Assoc. 91 655–665.
  • [31] Shao, J. (1997). An asymptotic theory for linear model selection. Statist. Sinica 7 221–264.
  • [32] Shao, J. and Rao, S. (2000). The GIC for model selection: A hypothesis test approach. J. Statist. Plan. Inf. 88 215–231.
  • [33] Shibata, R. (1976). Selection of the order of an autoregressive model by Akaike’s information criterion. Biometrika 63 114–126.
  • [34] Shibata, R. (1981). An optimal selection of regression variables. Biometrika 68 45–54.
  • [35] Silverman, B. W. (1982). On the estimation of a probability density function by the maximum penalized likelihood method. Ann. Statist. 10 795–810.
  • [36] Strawderman, R. L. and Tsiatis, A. A. (1996). On the asymptotic properties of a flexible hazard estimator. Ann. Statist. 24 41–63.
  • [37] Stone, C. (1990). Large sample inference for log-spline models. Ann. Statist. 18 717–741.
  • [38] Stone, C. (1991). Asymptotics for doubly flexible logspline response models. Ann. Statist. 19 1832–1854.
  • [39] Schwarz, G. (1978). Estimating the dimension of a model. Ann. Statist. 6 461–464.
  • [40] Yang, Y. (2005). Can the strengths of AIC and BIC be shared? A conflict between model identification and regression estimation. Biometrika 92 937–950.
  • [41] Zheng, X. and Loh, W. (1995). Consistent variable selection in linear models. J. Amer. Statist. Assoc. 90 151–156.