Annals of Statistics

A Bayesian χ² test for goodness-of-fit

Valen E. Johnson



This article describes an extension of classical χ² goodness-of-fit tests to Bayesian model assessment. The extension, which essentially involves evaluating Pearson’s goodness-of-fit statistic at a parameter value drawn from its posterior distribution, has the important property that it is asymptotically distributed as a χ² random variable on K−1 degrees of freedom, independently of the dimension of the underlying parameter vector. By examining the posterior distribution of this statistic, global goodness-of-fit diagnostics are obtained. Advantages of these diagnostics include ease of interpretation, computational convenience and favorable power properties. The proposed diagnostics can be used to assess the adequacy of a broad class of Bayesian models, essentially requiring only a finite-dimensional parameter vector and conditionally independent observations.
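The idea in the abstract can be illustrated with a minimal sketch, which is not the article's own code. It assumes a normal model with known standard deviation and a conjugate normal prior on the mean, K = 5 equiprobable bins, and simulated data; the function names `bayes_chi2` and `posterior_draws_normal_mean`, the prior settings and the choice of summary (the share of posterior draws whose statistic exceeds the 0.95 quantile of a χ² distribution on K−1 degrees of freedom) are all illustrative assumptions.

```python
import random
from statistics import NormalDist


def bayes_chi2(y, model_cdf, K):
    """Pearson statistic on K equiprobable bins at one parameter draw.

    model_cdf maps an observation to its CDF value under the model
    evaluated at the sampled parameter, so each bin has probability 1/K.
    """
    n = len(y)
    counts = [0] * K
    for obs in y:
        u = model_cdf(obs)
        k = min(int(u * K), K - 1)  # bin index; guards against u == 1.0
        counts[k] += 1
    expected = n / K
    return sum((m - expected) ** 2 / expected for m in counts)


def posterior_draws_normal_mean(y, sigma, prior_mean, prior_sd, n_draws, rng):
    """Draws of the mean from the conjugate normal posterior (sigma known)."""
    n = len(y)
    precision = 1.0 / prior_sd**2 + n / sigma**2
    post_mean = (prior_mean / prior_sd**2 + sum(y) / sigma**2) / precision
    post_sd = precision ** -0.5
    return [rng.gauss(post_mean, post_sd) for _ in range(n_draws)]


# Hypothetical data: 200 simulated observations from the assumed model.
rng = random.Random(0)
sigma = 1.0
y = [rng.gauss(0.3, sigma) for _ in range(200)]

K = 5  # number of bins (an assumption for this sketch)
draws = posterior_draws_normal_mean(y, sigma, 0.0, 10.0, 500, rng)
stats = [bayes_chi2(y, NormalDist(mu, sigma).cdf, K) for mu in draws]

# 0.95 quantile of a chi-squared distribution on K - 1 = 4 degrees of freedom.
CRITICAL_95_DF4 = 9.488
frac_exceed = sum(s > CRITICAL_95_DF4 for s in stats) / len(stats)
print(f"share of posterior draws above the 0.95 quantile: {frac_exceed:.3f}")
```

Under an adequate model this share should sit near 0.05, while a share far above that value would suggest lack of fit. The reference distribution has K−1 degrees of freedom regardless of how many parameters were sampled, which is the property the abstract emphasizes.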

Article information

Ann. Statist., Volume 32, Number 6 (2004), 2361-2384.

First available in Project Euclid: 7 February 2005


Primary: 62C10: Bayesian problems; characterization of Bayes procedures
Secondary: 62E20: Asymptotic distribution theory

Keywords: Bayesian model assessment; Pearson’s chi-squared statistic; posterior-predictive diagnostics; p-value; Bayes factor; intrinsic Bayes factor; discrepancy functions


Johnson, Valen E. A Bayesian χ² test for goodness-of-fit. Ann. Statist. 32 (2004), no. 6, 2361–2384. doi:10.1214/009053604000000616.


