The Annals of Statistics

Bayes factors and the geometry of discrete hierarchical loglinear models

Gérard Letac and Hélène Massam



A standard tool for model selection in a Bayesian framework is the Bayes factor, which compares the marginal likelihoods of the data under two competing models. In this paper, we consider the class of hierarchical loglinear models for discrete data given in the form of a contingency table with multinomial sampling. We assume that the prior distribution on the loglinear parameters is the Diaconis–Ylvisaker conjugate prior and that the prior distribution on the space of models is uniform. Under these conditions, the Bayes factor between two models is a function of the normalizing constants of the prior and posterior distributions of the loglinear parameters. These constants are functions of the hyperparameters $(m,\alpha)$, which can be interpreted, respectively, as the marginal counts and the total count of a fictive contingency table.
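The role of the normalizing constants can be sketched numerically in the simplest case: on a 2×2 table, the Diaconis–Ylvisaker conjugate prior for the saturated model reduces to a Dirichlet, and the independence model factorizes over the margins. The table counts and the fictive proportions `m` below are illustrative assumptions, not data from the paper.

```python
import math

def log_dirichlet_norm(a):
    """log B(a) = sum_i log Gamma(a_i) - log Gamma(sum_i a_i)."""
    return sum(math.lgamma(x) for x in a) - math.lgamma(sum(a))

def log_evidence(counts, prior):
    """Log marginal likelihood of multinomial counts under a Dirichlet
    prior, up to the multinomial coefficient (which cancels in any
    Bayes factor): log B(n + a) - log B(a)."""
    post = [n + p for n, p in zip(counts, prior)]
    return log_dirichlet_norm(post) - log_dirichlet_norm(prior)

# Hypothetical 2x2 table, row-major order: n11, n12, n21, n22.
n = [18, 12, 7, 13]
alpha = 1.0
m = [0.25] * 4                          # fictive cell proportions summing to 1
sat = log_evidence(n, [alpha * mi for mi in m])

rows = [n[0] + n[1], n[2] + n[3]]       # row margins
cols = [n[0] + n[2], n[1] + n[3]]       # column margins
indep = (log_evidence(rows, [alpha / 2] * 2)
         + log_evidence(cols, [alpha / 2] * 2))

log_bf = sat - indep  # log Bayes factor, saturated vs. independence
print(f"log Bayes factor (saturated vs independence): {log_bf:.4f}")
```

Rerunning with smaller values of `alpha` illustrates, in this special case, the small-$\alpha$ behavior of the Bayes factor that the paper analyzes in general.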

We study the behavior of the Bayes factor when $\alpha$ tends to zero. In this study, the most important tool is the characteristic function $\mathbb{J}_{C}$ of the interior $C$ of the convex hull $\overline{C}$ of the support of the multinomial distribution for a given hierarchical loglinear model. If $h_{C}$ is the support function of $C$, the function $\mathbb{J}_{C}$ is the Laplace transform of $\exp(-h_{C})$. We show that, when $\alpha$ tends to $0$, if the data lies on a face $F_{i}$ of $\overline{C}_{i}$, $i=1,2$, of dimension $k_{i}$, the Bayes factor behaves like $\alpha^{k_{1}-k_{2}}$. This implies in particular that when the data is in $C_{1}$ and in $C_{2}$, that is, when $k_{i}$ equals the dimension of model $J_{i}$, the sparser model is favored, thus confirming the idea of Bayesian regularization.
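In symbols, with $h_C$ and $\mathbb{J}_C$ as just described (the notation $B_{12}(\alpha)$ for the Bayes factor between models $J_1$ and $J_2$ is ours, added for readability):

```latex
h_C(\theta) = \sup_{x \in \overline{C}} \langle \theta, x \rangle,
\qquad
\mathbb{J}_C(m) = \int_{\mathbb{R}^d} e^{\langle m,\theta\rangle - h_C(\theta)}\, d\theta,
\quad m \in C,
\qquad
B_{12}(\alpha) \asymp \alpha^{\,k_1 - k_2} \quad (\alpha \to 0).
```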

In order to find the faces of $\overline{C}$, we need to know its facets. We show that since here $C$ is a polytope, the denominator of the rational function $\mathbb{J}_{C}$ is the product of the equations of the facets. We also identify a category of facets common to all hierarchical models for discrete variables, not necessarily binary. Finally, we show that these facets are the only facets of $\overline{C}$ when the model is graphical with respect to a decomposable graph.
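A minimal worked instance of this denominator statement, in dimension one (our own illustration, not an example from the paper): for $C=(0,1)\subset\mathbb{R}$, the support function is $h_C(\theta)=\max(\theta,0)$, and the Laplace transform of $e^{-h_C}$ evaluates in closed form:

```latex
\mathbb{J}_C(x)
= \int_{-\infty}^{0} e^{x\theta}\, d\theta
+ \int_{0}^{\infty} e^{(x-1)\theta}\, d\theta
= \frac{1}{x} + \frac{1}{1-x}
= \frac{1}{x(1-x)},
\qquad 0 < x < 1.
```

The denominator $x(1-x)$ is exactly the product of the facet equations $x=0$ and $x=1$, as the general result predicts.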

Article information

Ann. Statist. Volume 40, Number 2 (2012), 861-890.

First available in Project Euclid: 1 June 2012


Primary: 62H17 (Contingency tables); 62C10 (Bayesian problems; characterization of Bayes procedures); 62J12 (Generalized linear models)

Keywords: Bayes; sparse contingency tables; conjugate priors; characteristic function of a convex set; Dirichlet distribution


Letac, Gérard; Massam, Hélène. Bayes factors and the geometry of discrete hierarchical loglinear models. Ann. Statist. 40 (2012), no. 2, 861--890. doi:10.1214/12-AOS974.




Supplemental materials

  • Supplementary material: Proofs. This section contains a characterization of the hierarchical loglinear model as well as the statement and proofs of Lemmas 3.1, 3.3 and Theorems 3.1, 4.1 and 5.1.