Bayesian Analysis

Incorporating Marginal Prior Information in Latent Class Models

Tracy A. Schifeling and Jerome P. Reiter

Full-text: Open access


We present an approach to incorporating informative prior beliefs about marginal probabilities into Bayesian latent class models for categorical data. The basic idea is to append synthetic observations to the original data such that (i) the empirical distributions of the desired margins match those of the prior beliefs, and (ii) the values of the remaining variables are left missing. The degree of prior uncertainty is controlled by the number of augmented records. Posterior inferences can be obtained via typical MCMC algorithms for latent class models, tailored to deal efficiently with the missing values in the concatenated data. We illustrate the approach using a variety of simulations based on data from the American Community Survey, including an example of how augmented records can be used to fit latent class models to data from stratified samples.
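The augmentation step described above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code. The functions `margin_counts`, `augment`, and `class_probs`, the list-of-lists data layout with `None` marking a missing value, and the largest-remainder rounding that turns prior probabilities into whole record counts are all assumptions made here. `augment` appends `n_aug` synthetic records whose values of one variable reproduce the prior margin and leaves every other variable missing, so a larger `n_aug` expresses a more certain prior belief. `class_probs` shows why the missing values are cheap to handle in a latent class model whose variables are conditionally independent given the class. When a Gibbs sampler computes a record's full conditional class probabilities, the factor for a missing variable sums to one once that variable is integrated out, so it simply drops out of the product.

```python
import math
from typing import List, Optional, Sequence

# A record holds one category code per variable; None marks a missing value.
Record = List[Optional[int]]


def margin_counts(prior: Sequence[float], n_aug: int) -> List[int]:
    """Split n_aug synthetic records across the levels of one variable in
    proportion to the prior marginal probabilities.

    Largest-remainder rounding is an assumption of this sketch; it makes the
    counts sum exactly to n_aug.
    """
    total = sum(prior)
    raw = [n_aug * p / total for p in prior]
    counts = [math.floor(r) for r in raw]
    leftover = n_aug - sum(counts)
    # Give the remaining records to the levels with the largest fractional parts.
    order = sorted(range(len(raw)), key=lambda k: raw[k] - counts[k], reverse=True)
    for k in order[:leftover]:
        counts[k] += 1
    return counts


def augment(data: List[Record], n_vars: int, var_index: int,
            prior: Sequence[float], n_aug: int) -> List[Record]:
    """Append n_aug synthetic records whose values of variable var_index match
    the prior margin; every other variable in those records is left missing."""
    synthetic: List[Record] = []
    for level, count in enumerate(margin_counts(prior, n_aug)):
        for _ in range(count):
            rec: Record = [None] * n_vars
            rec[var_index] = level
            synthetic.append(rec)
    return data + synthetic


def class_probs(rec: Record, weights: Sequence[float],
                phi: Sequence[Sequence[Sequence[float]]]) -> List[float]:
    """Full conditional P(z = h | rec) in a latent class model with
    conditionally independent variables, where phi[h][j][c] = P(x_j = c | z = h).

    Missing entries are skipped, which is the same as integrating them out
    of the likelihood.
    """
    unnorm = []
    for h, w in enumerate(weights):
        lik = w
        for j, value in enumerate(rec):
            if value is not None:
                lik *= phi[h][j][value]
        unnorm.append(lik)
    s = sum(unnorm)
    return [u / s for u in unnorm]
```

For example, `augment([[0, 1], [1, 1], [0, 0]], n_vars=2, var_index=0, prior=[0.7, 0.3], n_aug=10)` returns the three original records followed by seven copies of `[0, None]` and three copies of `[1, None]`. A complete sampler would also update the class weights and the `phi` parameters and could impute the missing entries; those steps are omitted from this sketch.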

Article information

Bayesian Anal., Volume 11, Number 2 (2016), 499-518.

First available in Project Euclid: 18 June 2015

Digital Object Identifier: doi:10.1214/15-BA959

Keywords: categorical; Dirichlet process; missing; mixture; stratified; survey


Schifeling, Tracy A.; Reiter, Jerome P. Incorporating Marginal Prior Information in Latent Class Models. Bayesian Anal. 11 (2016), no. 2, 499–518. doi:10.1214/15-BA959.

