Electronic Journal of Statistics

Model selection rates of information based criteria

Ashok Chaurasia and Ofer Harel


Abstract

Model selection criteria proposed over the years have become common procedures in applied research. This article examines the true model selection rates of model selection criteria, where the true model is the data-generating model. The rate at which a criterion selects the true model is important because the selected model affects both interpretation and prediction.

This article provides a general functional form for the mean function of the true model selection rate process of any model selection criterion. Until now, no other article has provided a general form for the mean function of true model selection rate processes. As an illustration of the general form, this article provides the mean function for the true model selection rates of two commonly used model selection criteria, Akaike’s Information Criterion (AIC) and the Bayesian Information Criterion (BIC). The simulations give deeper insight into the consistency and efficiency properties of AIC and BIC. Furthermore, the methodology proposed here for tracking the mean function of model selection procedures, which is based on accuracy of selection, lends itself to determining a sufficient sample size in linear models for reliable inference in model selection.
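
As a rough illustration of what a true model selection rate is, the sketch below estimates it by Monte Carlo simulation for AIC and BIC in a Gaussian linear model. It is not the functional form derived in the article; it only counts how often each criterion picks the data-generating model. The function names, the all-subsets candidate set, the four standard normal predictors, and the coefficient value 0.5 are assumptions chosen for this example.

```python
import itertools
import numpy as np


def neg2_loglik(y, X):
    """-2 * maximized Gaussian log-likelihood of an OLS fit of y on X."""
    n = len(y)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    return n * (np.log(2.0 * np.pi * rss / n) + 1.0)


def selection_rates(n, true_idx=(0, 1), p=4, beta=0.5, reps=200, seed=0):
    """Estimate how often AIC and BIC select the data-generating model.

    Hypothetical setup: p independent standard normal predictors, of which
    only those in true_idx have a nonzero coefficient (beta); every subset
    of predictors (always with an intercept) is a candidate model.
    """
    rng = np.random.default_rng(seed)
    candidates = [s for r in range(p + 1)
                  for s in itertools.combinations(range(p), r)]
    hits = {"AIC": 0, "BIC": 0}
    for _ in range(reps):
        Z = rng.standard_normal((n, p))
        y = 1.0 + beta * Z[:, list(true_idx)].sum(axis=1) + rng.standard_normal(n)
        fits = []
        for s in candidates:
            X = np.column_stack([np.ones(n), Z[:, list(s)]])
            k = X.shape[1] + 1  # regression coefficients plus error variance
            fits.append((neg2_loglik(y, X), k))
        # AIC penalizes each parameter by 2, BIC by log(n).
        for name, penalty in (("AIC", 2.0), ("BIC", np.log(n))):
            scores = [dev + penalty * k for dev, k in fits]
            if candidates[int(np.argmin(scores))] == tuple(true_idx):
                hits[name] += 1
    return {name: count / reps for name, count in hits.items()}


if __name__ == "__main__":
    # Track the estimated rates as the sample size grows.
    for n in (25, 50, 100, 400):
        print(n, selection_rates(n))
```

Running the loop over increasing sample sizes gives an empirical view of how each criterion's rate evolves with n. The same loop could, under these assumptions, be used to look for the smallest n at which a target selection rate is reached.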

Article information

Source
Electron. J. Statist., Volume 7 (2013), 2762-2793.

Dates
First available in Project Euclid: 26 November 2013

Permanent link to this document
https://projecteuclid.org/euclid.ejs/1385479509

Digital Object Identifier
doi:10.1214/13-EJS861

Mathematical Reviews number (MathSciNet)
MR3148367

Zentralblatt MATH identifier
1283.62083

Subjects
Keywords: Model selection; model selection rate; AIC; BIC; discrete process; discrete process mean function; multiple linear regression; linear models; generalized linear models

Citation

Chaurasia, Ashok; Harel, Ofer. Model selection rates of information based criteria. Electron. J. Statist. 7 (2013), 2762-2793. doi:10.1214/13-EJS861. https://projecteuclid.org/euclid.ejs/1385479509


