Statistical Science

Model Assessment Tools for a Model False World

Bruce Lindsay and Jiawei Liu



A standard goal of model evaluation and selection is to find a model that approximates the truth well while being as parsimonious as possible. In this paper we emphasize the point of view that the models under consideration are almost always false, if viewed realistically, and so we should analyze model adequacy from that point of view. We investigate this issue in large samples by looking at a model credibility index, which is designed to serve as a one-number summary measure of model adequacy. We define the index to be the maximum sample size at which samples from the model and those from the true data generating mechanism are nearly indistinguishable. We use standard notions from hypothesis testing to make this definition precise, and we use data subsampling to estimate the index. We show that the definition leads us to some new ways of viewing models as flawed but useful. The concept is an extension of the work of Davies [Statist. Neerlandica 49 (1995) 185–245].
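The idea of estimating the index by subsampling can be illustrated with a rough sketch. It is not the authors' estimator: the function names, the choice of the Shapiro–Wilk test as the goodness-of-fit test, the 5% test level, and the 50% rejection-rate threshold used to decide when samples stop being "nearly indistinguishable" are all assumptions made here for illustration.

```python
import numpy as np
from scipy import stats


def rejection_rate(data, n, n_subsamples=200, alpha=0.05, rng=None):
    """Estimate how often a normality test rejects on subsamples of size n.

    Subsamples are drawn without replacement from `data`. The test
    (Shapiro-Wilk) and the level `alpha` are illustrative assumptions.
    """
    rng = np.random.default_rng(rng)
    rejections = 0
    for _ in range(n_subsamples):
        sub = rng.choice(data, size=n, replace=False)
        _, p_value = stats.shapiro(sub)
        if p_value < alpha:
            rejections += 1
    return rejections / n_subsamples


def credibility_index(data, sizes, threshold=0.5, **kwargs):
    """Largest candidate size whose estimated rejection rate is <= threshold.

    Scans `sizes` in increasing order and stops at the first size where
    the test rejects more often than `threshold` (a hypothetical cutoff).
    Returns None if even the smallest size exceeds the threshold.
    """
    best = None
    for n in sorted(sizes):
        if rejection_rate(data, n, **kwargs) <= threshold:
            best = n
        else:
            break
    return best


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Heavy-tailed data: the normal model is false but may be adequate
    # for small samples.
    data = rng.standard_t(df=5, size=2000)
    print(credibility_index(data, [10, 20, 50, 100, 200, 500], rng=rng))
```

Under these assumptions, a larger returned size means the data stay compatible with the normal model up to larger samples, which is the sense in which the index serves as a one-number summary of adequacy.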

Article information

Statist. Sci., Volume 24, Number 3 (2009), 303–318.

First available in Project Euclid: 31 March 2010


Keywords: Model selection; statistical distance; bootstrap; model credibility index; normality


Lindsay, Bruce; Liu, Jiawei. Model Assessment Tools for a Model False World. Statist. Sci. 24 (2009), no. 3, 303–318. doi:10.1214/09-STS302.



  • Agresti, A. (2002). Categorical Data Analysis, 2nd ed. Wiley, New York.
  • Berkson, J. (1938). Some difficulties of interpretation encountered in the application of the chi-square test. J. Amer. Statist. Assoc. 33 526–536.
  • Blom, G. (1976). Some properties of incomplete U-statistics. Biometrika 63 573–580.
  • Box, G. E. P. (1976). Science and statistics. J. Amer. Statist. Assoc. 71 791–799.
  • Claeskens, G. and Hjort, N. L. (2003). The focused information criterion. With discussions and a rejoinder by the authors. J. Amer. Statist. Assoc. 98 900–945.
  • Cramér, H. (1946). Mathematical Methods of Statistics. Princeton Univ. Press, Princeton, NJ.
  • Davies, R. B. (2002). Hypothesis testing when a nuisance parameter is present only under the alternative: Linear model case. Biometrika 89 484–489.
  • Davies, P. L. (1995). Data features. Statist. Neerlandica 49 185–245.
  • Dette, H. and Munk, A. (2003). Some methodological aspects of validation of models in nonparametric regression. Statist. Neerlandica 57 207–244.
  • Diaconis, P. and Efron, B. (1985). Reply to comments on “Testing for independence in a two-way table: New interpretations of the chi-square statistic.” Ann. Statist. 13 905–913.
  • Donoho, D. L. (1988). One-sided inference about functionals of a density. Ann. Statist. 16 1390–1420.
  • Ferguson, T. S. (1996). A Course in Large Sample Theory. Texts in Statistical Science Series. Chapman and Hall/CRC Press, Boca Raton, FL.
  • Freitag, G. and Munk, A. (2005). On Hadamard differentiability in k-sample semiparametric models—with applications to the assessment of structural relationships. J. Multivariate Anal. 94 123–158.
  • Ghosh, J. K. and Samanta, T. (2001). Model selection—An overview. Current Sci. 80 1135–1144.
  • Goutis, C. and Robert, C. P. (1998). Model choice in generalised linear models: A Bayesian approach via Kullback–Leibler projections. Biometrika 85 29–37.
  • Hodges, J. L. and Lehmann, E. L. (1954). Testing the approximate validity of statistical hypotheses. J. Roy. Statist. Soc. Ser. B 16 261–268.
  • Lehmann, E. L. (1999). Elements of Large-Sample Theory. Springer, New York.
  • Liu, J. and Lindsay, B. G. (2009). Building and using semiparametric tolerance regions for parametric multinomial models. Ann. Statist. 37 3644–3659.
  • Politis, D. N., Romano, J. P. and Wolf, M. (1999). Subsampling. Springer, New York.
  • Serfling, R. J. (1980). Approximation Theorems of Mathematical Statistics. Wiley, New York.
  • Shapiro, S. S. and Wilk, M. B. (1965). An analysis of variance test for normality: Complete samples. Biometrika 52 591–611.
  • Shapiro, S. S., Wilk, M. B. and Chen, H. J. (1968). A comparative study of various tests for normality. J. Amer. Statist. Assoc. 63 1343–1373.
  • Snee, R. D. (1974). Graphical display of two-way contingency tables. Amer. Statist. 28 9–12.