Source: Ann. Statist. Volume 32, Number 6 (2004), 2361-2384.
This article describes an extension of classical χ² goodness-of-fit tests to Bayesian model assessment. The extension essentially involves evaluating Pearson's goodness-of-fit statistic at a parameter value drawn from its posterior distribution. It has an important property: independently of the dimension of the underlying parameter vector, it is asymptotically distributed as a χ² random variable on K − 1 degrees of freedom, where K is the number of cells used to form the statistic. Global goodness-of-fit diagnostics are obtained by examining the posterior distribution of this statistic. Advantages of these diagnostics include ease of interpretation, computational convenience and favorable power properties. The proposed diagnostics can be used to assess the adequacy of a broad class of Bayesian models, essentially requiring only a finite-dimensional parameter vector and conditionally independent observations.
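The computation described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the article's implementation. It assumes a normal model with known σ and a conjugate normal prior on μ. It forms the K cells by applying the CDF fitted at the drawn parameter to each observation and splitting (0, 1) into K equiprobable intervals; the abstract does not spell out this cell construction, so treat it as an assumption. The function names (`pearson_statistic`, `posterior_draw_mu`, `bayesian_chi2_draws`, `exceedance_fraction`) are hypothetical. The exceedance fraction is one simple, assumed way to summarize the posterior distribution of the statistic.

```python
import math
import random
from statistics import NormalDist


def pearson_statistic(u_values, k):
    """Pearson's X^2 for values u in [0, 1] binned into k equiprobable cells."""
    n = len(u_values)
    counts = [0] * k
    for u in u_values:
        # Cell j covers [j/k, (j+1)/k); u == 1.0 is clamped into the last cell.
        j = min(int(u * k), k - 1)
        counts[j] += 1
    expected = n / k  # n * p_k with p_k = 1/k for every cell
    return sum((m - expected) ** 2 / expected for m in counts)


def posterior_draw_mu(y, sigma, prior_mean, prior_sd, rng):
    """Draw mu from its normal posterior (normal likelihood, known sigma, normal prior).

    Assumed toy model for illustration only.
    """
    precision = 1.0 / prior_sd**2 + len(y) / sigma**2
    post_var = 1.0 / precision
    post_mean = post_var * (prior_mean / prior_sd**2 + sum(y) / sigma**2)
    return rng.gauss(post_mean, math.sqrt(post_var))


def bayesian_chi2_draws(y, sigma, prior_mean, prior_sd, k, draws, seed=0):
    """Evaluate Pearson's statistic at `draws` independent posterior draws of mu."""
    rng = random.Random(seed)
    stats = []
    for _ in range(draws):
        mu = posterior_draw_mu(y, sigma, prior_mean, prior_sd, rng)
        fitted = NormalDist(mu, sigma)
        # Probability integral transform of each observation under the drawn parameter.
        u = [fitted.cdf(v) for v in y]
        stats.append(pearson_statistic(u, k))
    return stats


def exceedance_fraction(stats, critical_value):
    """Share of posterior draws whose statistic exceeds a reference critical value."""
    return sum(s > critical_value for s in stats) / len(stats)


if __name__ == "__main__":
    data_rng = random.Random(1)
    y_good = [data_rng.gauss(1.0, 1.0) for _ in range(200)]  # matches the model
    y_bad = [data_rng.gauss(1.0, 3.0) for _ in range(200)]   # variance misspecified
    chi2_4_q95 = 9.488  # 0.95 quantile of chi-squared on K - 1 = 4 df
    for label, y in [("well specified", y_good), ("misspecified", y_bad)]:
        stats = bayesian_chi2_draws(y, sigma=1.0, prior_mean=0.0,
                                    prior_sd=10.0, k=5, draws=500)
        print(label, round(sum(stats) / len(stats), 2),
              exceedance_fraction(stats, chi2_4_q95))
```

With K = 5, draws from a well-specified model should look roughly like χ² values on 4 degrees of freedom. The misspecified variance instead pushes most observations into the outer cells and inflates every draw of the statistic. Note that the model's parameter dimension does not enter the reference degrees of freedom, which is the property highlighted above.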