The Annals of Statistics

A conjugate prior for discrete hierarchical log-linear models

Hélène Massam, Jinnan Liu, and Adrian Dobra

Abstract

In Bayesian analysis of multi-way contingency tables, the selection of a prior distribution for either the log-linear parameters or the cell probability parameters is a major challenge. In this paper, we define a flexible family of conjugate priors for the wide class of discrete hierarchical log-linear models, which includes the class of graphical models. These priors are defined as the Diaconis–Ylvisaker conjugate priors on the log-linear parameters subject to “baseline constraints” under multinomial sampling. We also derive the induced prior on the cell probabilities and show that it is a generalization of the hyper Dirichlet prior. We show that this prior has several desirable properties and illustrate its usefulness by identifying the most probable decomposable, graphical and hierarchical log-linear models for a six-way contingency table.
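
As a rough illustration of the construction named in the abstract, the sketch below writes out the general form of a Diaconis–Ylvisaker conjugate prior and works through the simplest possible case, a single binary variable. The symbols $t(x)$, $k(\theta)$, $s$ and $\alpha$ are notation assumed here for illustration and are not taken from the abstract; the paper's own definitions for general hierarchical models are more involved.

```latex
% General form (assumed notation): a natural exponential family and its
% Diaconis--Ylvisaker (DY) conjugate prior with hyperparameters (s, \alpha).
\[
  f(x \mid \theta) = \exp\{\langle \theta, t(x) \rangle - k(\theta)\},
  \qquad
  \pi(\theta \mid s, \alpha) \propto \exp\{\langle \theta, s \rangle - \alpha\, k(\theta)\}.
\]

% Simplest case: one binary variable with cell 0 as the baseline, so that
% p = P(X = 1) = e^{\theta} / (1 + e^{\theta}) and k(\theta) = \log(1 + e^{\theta}).
\[
  \pi(\theta \mid s, \alpha) \propto \exp\{\theta s - \alpha \log(1 + e^{\theta})\},
  \qquad 0 < s < \alpha.
\]

% Induced prior on the cell probability p: use
% e^{\theta} = p / (1 - p), \; 1 + e^{\theta} = 1 / (1 - p), \; d\theta / dp = 1 / \{p (1 - p)\}.
\[
  \pi(p \mid s, \alpha)
  \propto \Bigl(\frac{p}{1 - p}\Bigr)^{s} (1 - p)^{\alpha} \cdot \frac{1}{p (1 - p)}
  = p^{\,s - 1} (1 - p)^{\alpha - s - 1},
\]
% that is, p \sim \mathrm{Beta}(s, \alpha - s).
```

In this one-variable case the induced prior on the cell probability is an ordinary Beta (two-cell Dirichlet) distribution, which is consistent with the abstract's statement that the induced prior generalizes the hyper Dirichlet.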

Article information

Source
Ann. Statist. Volume 37, Number 6A (2009), 3431–3467.

Dates
First available in Project Euclid: 17 August 2009

Permanent link to this document
https://projecteuclid.org/euclid.aos/1250515392

Digital Object Identifier
doi:10.1214/08-AOS669

Mathematical Reviews number (MathSciNet)
MR2549565

Zentralblatt MATH identifier
1369.62048

Subjects
Primary: 62F15: Bayesian inference; 62H17: Contingency tables; 62E15: Exact distribution theory

Keywords
Hierarchical log-linear models; conjugate prior; contingency tables; hyper Markov property; hyper Dirichlet; model selection

Citation

Massam, Hélène; Liu, Jinnan; Dobra, Adrian. A conjugate prior for discrete hierarchical log-linear models. Ann. Statist. 37 (2009), no. 6A, 3431–3467. doi:10.1214/08-AOS669. https://projecteuclid.org/euclid.aos/1250515392


References

  • [1] Agresti, A. (1990). Categorical Data Analysis. Wiley, New York.
  • [2] Bishop, Y. M. M., Fienberg, S. E. and Holland, P. W. (1975). Discrete Multivariate Analysis: Theory and Practice. MIT Press, Cambridge, MA.
  • [3] Clyde, M. and George, E. I. (2004). Model uncertainty. Statist. Sci. 19 81–94.
  • [4] Consonni, G. and Leucari, V. (2006). Reference priors for discrete graphical models. Biometrika 93 23–40.
  • [5] Consonni, G. and Veronese, P. (1992). Conjugate priors for exponential families having quadratic variance function. J. Amer. Statist. Assoc. 87 1123–1127.
  • [6] Darroch, J. N., Lauritzen, S. L. and Speed, T. P. (1980). Markov fields and log-linear models for contingency tables. Ann. Statist. 8 522–539.
  • [7] Darroch, J. N. and Speed, T. P. (1983). Additive and multiplicative models and interaction. Ann. Statist. 11 724–738.
  • [8] Dawid, A. P. and Lauritzen, S. L. (1993). Hyper Markov laws in the statistical analysis of decomposable graphical models. Ann. Statist. 21 1272–1317.
  • [9] Dellaportas, P. and Forster, J. J. (1999). Markov chain Monte Carlo model determination for hierarchical and graphical log-linear models. Biometrika 86 615–633.
  • [10] Diaconis, P. and Ylvisaker, D. (1979). Conjugate priors for exponential families. Ann. Statist. 7 269–281.
  • [11] Dobra, A., Briollais, L., Jarjanazi, H., Ozcelik, H. and Massam, H. (2009). Applications of the mode oriented stochastic search (MOSS) algorithm for discrete multi-way data to genomewide studies. In Bayesian Modeling in Bioinformatics (D. Dey, S. Ghosh and B. Mallick, eds.). Taylor and Francis. To appear.
  • [12] Dobra, A. and Fienberg, S. E. (2000). Bounds for cell entries in contingency tables given marginal totals and decomposable graphs. Proc. Natl. Acad. Sci. USA 97 11185–11192.
  • [13] Dobra, A. and Massam, H. (2009). The mode oriented stochastic search (MOSS) for log-linear models with conjugate priors. Statist. Methodol. To appear.
  • [14] Edwards, D. E. and Havranek, T. (1985). A fast procedure for model search in multidimensional contingency tables. Biometrika 72 339–351.
  • [15] Gutiérrez-Peña, E. and Smith, A. F. M. (1995). Conjugate parametrizations for natural exponential families. J. Amer. Statist. Assoc. 90 1347–1356.
  • [16] Hook, E. B., Albright, S. G. and Cross, P. K. (1980). Use of Bernoulli census and log-linear methods for estimating the prevalence of spina bifida in live births and the completeness of vital records reports in New York State. Amer. J. Epidemiology 112 750–758.
  • [17] King, R. and Brooks, S. P. (2001). Prior induction for log-linear models for general contingency table analysis. Ann. Statist. 29 715–747.
  • [18] Knuiman, M. W. and Speed, T. P. (1988). Incorporating prior information into the analysis of contingency tables. Biometrics 44 1061–1071.
  • [19] Lauritzen, S. L. (1996). Graphical Models. Clarendon Press, Oxford.
  • [20] Leucari, V. (2004). Prior distributions for parameters of discrete graphical models. Ph.D. thesis, Dept. Mathematics, Univ. Pavia.
  • [21] Madigan, D. and Raftery, A. E. (1994). Model selection and accounting for model uncertainty in graphical models using Occam’s window. J. Amer. Statist. Assoc. 89 1535–1546.
  • [22] Madigan, D. and York, J. (1997). Bayesian methods for estimation of the size of a closed population. Biometrika 84 19–31.
  • [23] Madigan, D., York, J. and Allard, D. (1995). Bayesian graphical models for discrete data. Int. Statist. Rev. 63 215–232.
  • [24] Mantel, N. (1966). Models for complex contingency tables and polychotomous dosage response curves. Biometrics 22 83–95.
  • [25] Perks, W. (1947). Some observations on inverse probability including a new indifference rule. J. Inst. Actuar. 73 285–334.
  • [26] Steck, H. and Jaakkola, T. (2002). On the Dirichlet prior and Bayesian regularization. In Advances in Neural Information Processing Systems (NIPS) (S. Becker, S. Thrun and K. Obermayer, eds.) 697–704. MIT Press, Cambridge, MA.
  • [27] Tarantola, C. (2004). MCMC model determination for discrete graphical models. Stat. Model. 4 39–61.
  • [28] Tierney, L. and Kadane, J. (1986). Accurate approximations for posterior moments and marginal densities. J. Amer. Statist. Assoc. 81 82–86.
  • [29] Wermuth, N. and Cox, D. R. (1992). On the relation between interactions obtained with alternative codings of discrete variables. Methodika 6 76–85.
  • [30] Whittaker, J. (1990). Graphical Models in Applied Multivariate Statistics. Wiley, Chichester.