International Statistical Review

A Bayesian Formulation of Exploratory Data Analysis and Goodness-of-fit Testing

Andrew Gelman


Exploratory data analysis (EDA) and Bayesian inference (or, more generally, complex statistical modeling), which are generally considered unrelated statistical paradigms, can be particularly effective in combination. In this paper, we present a Bayesian framework for EDA based on posterior predictive checks. We explain how posterior predictive simulations can be used to create reference distributions for EDA graphs, and how this approach resolves some theoretical problems in Bayesian data analysis. We show how the generalization of Bayesian inference to include replicated data $y^{\rm rep}$ and replicated parameters $\theta^{\rm rep}$ follows a long tradition of generalizations in Bayesian theory.

On the theoretical level, we present a predictive Bayesian formulation of goodness-of-fit testing, distinguishing between $p$-values (posterior probabilities that specified antisymmetric discrepancy measures will exceed 0) and $u$-values (data summaries with uniform sampling distributions). We explain that $p$-values, unlike $u$-values, are Bayesian probability statements in that they condition on observed data.
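As a rough illustration of the $p$-value described above, the following sketch estimates a posterior predictive $p$-value by simulation. It is not taken from the paper: the model (normal data with known unit variance and a flat prior on the mean), the choice of the sample maximum as test statistic, and the function name `posterior_predictive_pvalue` are all assumptions made for the example. The discrepancy $T(y^{\rm rep}) - T(y)$ is antisymmetric in $y$ and $y^{\rm rep}$, and the $p$-value is estimated as the fraction of posterior simulations in which it exceeds 0.

```python
import numpy as np


def posterior_predictive_pvalue(y, n_sims=4000, seed=0):
    """Estimate Pr(T(y_rep) - T(y) > 0 | y) with T = sample maximum.

    Assumed model (hypothetical, for illustration only):
    y_i ~ Normal(mu, 1) with a flat prior on mu, so that
    mu | y ~ Normal(mean(y), 1 / n).
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    n = y.size

    # Draw the parameter from its posterior distribution.
    mu = rng.normal(y.mean(), 1.0 / np.sqrt(n), size=n_sims)

    # Draw one replicated dataset per posterior draw of mu.
    y_rep = rng.normal(mu[:, None], 1.0, size=(n_sims, n))

    # Antisymmetric discrepancy: swapping y and y_rep flips its sign.
    discrepancy = y_rep.max(axis=1) - y.max()

    # The p-value conditions on the observed y; it is a posterior probability.
    return float(np.mean(discrepancy > 0))


# Data that look like the model, and the same data with one gross outlier.
y_ok = np.linspace(-2.0, 2.0, 50)
y_outlier = np.append(np.linspace(-2.0, 2.0, 49), 10.0)

p_ok = posterior_predictive_pvalue(y_ok)
p_outlier = posterior_predictive_pvalue(y_outlier)
print(p_ok, p_outlier)
```

Under these assumptions, a $p$-value near 0 or 1 indicates that the observed maximum is unlike the maxima of datasets replicated under the fitted model, as in the outlier case here.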

Having reviewed the general theoretical framework, we discuss the implications for statistical graphics and exploratory data analysis, with the goal of unifying exploratory data analysis with more formal statistical methods based on probability models. We interpret various graphical displays as posterior predictive checks and discuss how Bayesian inference can be used to determine reference distributions.

The goal of this work is not to downgrade descriptive statistics, or to suggest they be replaced by Bayesian modeling, but rather to suggest how exploratory data analysis fits into the probability-modeling paradigm.

We conclude with a discussion of the implications for practical Bayesian inference. In particular, we anticipate that Bayesian software can be generalized to draw simulations of replicated data and parameters from their posterior predictive distribution, and these can in turn be used to calibrate EDA graphs.
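To make the idea of calibrating an EDA graph concrete, here is a minimal sketch, not drawn from the paper, that builds a reference band for a histogram of count data from posterior predictive replications. The Poisson model, the Gamma(1, 1) prior, the 95% pointwise band, and the name `count_histogram_band` are assumptions chosen only for illustration.

```python
import numpy as np


def count_histogram_band(y, max_count=10, n_sims=2000, seed=0):
    """Observed histogram of counts plus a 95% posterior predictive band.

    Assumed model (hypothetical, for illustration only):
    y_i ~ Poisson(lam) with lam ~ Gamma(shape=1, rate=1), so that
    lam | y ~ Gamma(shape=1 + sum(y), rate=1 + n).
    Values above max_count are pooled into the last bin.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=int)
    n = y.size
    bins = np.arange(max_count + 1)

    # Posterior draws of the Poisson rate (NumPy uses scale = 1 / rate).
    lam = rng.gamma(1.0 + y.sum(), 1.0 / (1.0 + n), size=n_sims)

    # One replicated dataset per posterior draw.
    y_rep = rng.poisson(lam[:, None], size=(n_sims, n))

    # Bar heights of the histogram for the data and for each replication.
    observed = np.bincount(np.minimum(y, max_count), minlength=max_count + 1)
    clipped = np.minimum(y_rep, max_count)
    rep_heights = (clipped[:, :, None] == bins).sum(axis=1)

    # Pointwise 95% reference band for each bar.
    lower, upper = np.percentile(rep_heights, [2.5, 97.5], axis=0)
    return observed, lower, upper


# Counts with far more zeros than a single Poisson rate would produce.
y = np.array([0] * 30 + [5] * 20)
observed, lower, upper = count_histogram_band(y)
outside = (observed < lower) | (observed > upper)
print(np.flatnonzero(outside))
```

Bars that fall outside the band mark features of the data that the assumed model does not reproduce. The band could be overlaid on the plotted histogram so that the graph carries its own reference distribution.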

footnote: Based on a paper presented at the Seventh Valencia Meeting on Bayesian Statistics.

Article information

Internat. Statist. Rev., Volume 71, Number 2 (2003), 369-382.

First available in Project Euclid: 18 November 2003

Keywords: Bootstrap; Fisher's exact test; Graphics; Mixture model; Model checking; Multiple imputation; Prior predictive check; Posterior predictive check; $p$-value; $u$-value


Gelman, Andrew. A Bayesian Formulation of Exploratory Data Analysis and Goodness-of-fit Testing. Internat. Statist. Rev. 71 (2003), no. 2, 369--382.
