Bayesian Analysis

Optimal Gaussian Approximations to the Posterior for Log-Linear Models with Diaconis–Ylvisaker Priors

James Johndrow and Anirban Bhattacharya

Full-text: Open access


In contingency table analysis, sparse data is frequently encountered for even modest numbers of variables, resulting in non-existence of maximum likelihood estimates. A common solution is to obtain regularized estimates of the parameters of a log-linear model. Bayesian methods provide a coherent approach to regularization, but are often computationally intensive. Conjugate priors ease computational demands, but the conjugate Diaconis–Ylvisaker priors for the parameters of log-linear models do not give rise to closed form credible regions, complicating posterior inference. Here we derive the optimal Gaussian approximation to the posterior for log-linear models with Diaconis–Ylvisaker priors, and provide convergence rate and finite-sample bounds for the Kullback–Leibler divergence between the exact posterior and the optimal Gaussian approximation. We demonstrate empirically in simulations and a real data application that the approximation is highly accurate, even for modest sample sizes. We also propose a method for model selection using the approximation. The proposed approximation provides a computationally scalable approach to regularized estimation and approximate Bayesian inference for log-linear models.

Article information

Bayesian Anal. Volume 13, Number 1 (2018), 201–223.

First available in Project Euclid: 21 February 2017

Digital Object Identifier: doi:10.1214/16-BA1046

Keywords: credible region; conjugate prior; contingency table; Dirichlet–Multinomial; Kullback–Leibler divergence; Laplace approximation

Creative Commons Attribution 4.0 International License.


Johndrow, James; Bhattacharya, Anirban. Optimal Gaussian Approximations to the Posterior for Log-Linear Models with Diaconis–Ylvisaker Priors. Bayesian Anal. 13 (2018), no. 1, 201–223. doi:10.1214/16-BA1046.


