Posterior predictive $p$-values do not in general have uniform distributions under the null hypothesis (except in the special case of ancillary test variables) but instead tend to have distributions more concentrated near 0.5. From different perspectives, such nonuniform distributions have been portrayed as desirable (as reflecting an ability of vague prior distributions to nonetheless yield accurate posterior predictions) or undesirable (as making it more difficult to reject a false model). We explore this tension through two simple normal-distribution examples. In one example, we argue that the low power of the posterior predictive check is desirable from a statistical perspective; in the other, the posterior predictive check seems inappropriate. Our conclusion is that the relevance of the $p$-value depends on the applied context, a point which (ironically) can be seen even in these two toy examples.
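The concentration near 0.5 described above can be illustrated with a small simulation. The following is a minimal sketch under assumptions that are ours and are not taken from the abstract: a single observation $y \sim \mathrm{N}(\theta, 1)$, a prior $\theta \sim \mathrm{N}(0, A^2)$, the test statistic $T(y) = y$, and data drawn from the prior predictive distribution so that the model is true by construction. The function names and the values of `prior_sd` (playing the role of $A$) are hypothetical. In this setup the posterior predictive $p$-value has a closed form, so no inner simulation loop is needed.

```python
import math
import random
import statistics


def normal_cdf(x):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def posterior_predictive_pvalue(y, prior_sd):
    """Pr(y_rep > y | y) for y ~ N(theta, 1), theta ~ N(0, prior_sd**2), T(y) = y.

    Assumed conjugate-normal setup (ours, for illustration only).
    """
    shrink = prior_sd**2 / (1.0 + prior_sd**2)  # posterior mean of theta is shrink * y
    post_var = shrink                           # posterior variance of theta
    pred_mean = shrink * y                      # mean of the replicated observation
    pred_sd = math.sqrt(1.0 + post_var)         # sd of the replicated observation
    return 1.0 - normal_cdf((y - pred_mean) / pred_sd)


def simulate_pvalues(prior_sd, n_sims=20000, seed=0):
    """Draw (theta, y) from the assumed model and return the p-value for each draw."""
    rng = random.Random(seed)
    pvals = []
    for _ in range(n_sims):
        theta = rng.gauss(0.0, prior_sd)  # parameter drawn from the prior
        y = rng.gauss(theta, 1.0)         # one observation given theta
        pvals.append(posterior_predictive_pvalue(y, prior_sd))
    return pvals


if __name__ == "__main__":
    # A uniform distribution on [0, 1] has standard deviation sqrt(1/12), about 0.289.
    for prior_sd in (0.1, 1.0, 10.0):
        pvals = simulate_pvalues(prior_sd)
        print(prior_sd, statistics.mean(pvals), statistics.pstdev(pvals))
```

Under these assumptions the $p$-value equals $\Phi\bigl(-z/\sqrt{1 + 2A^2}\bigr)$ with $z \sim \mathrm{N}(0,1)$, so its spread around 0.5 shrinks as the prior becomes vaguer (larger $A$) and approaches that of a uniform distribution only as $A \to 0$.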
"Two simple examples for understanding posterior p-values whose distributions are far from uniform." Electron. J. Statist. 7 2595 - 2602, 2013. https://doi.org/10.1214/13-EJS854