Open Access
March 2018
Dirichlet Process Mixture Models for Modeling and Generating Synthetic Versions of Nested Categorical Data
Jingchen Hu, Jerome P. Reiter, Quanli Wang
Bayesian Anal. 13(1): 183-200 (March 2018). DOI: 10.1214/16-BA1047

Abstract

We present a Bayesian model for estimating the joint distribution of multivariate categorical data when units are nested within groups. Such data arise frequently in social science settings, for example, people living in households. The model assumes that (i) each group is a member of a group-level latent class, and (ii) each unit is a member of a unit-level latent class nested within its group-level latent class. This structure allows the model to capture dependence among units in the same group. It also facilitates simultaneous modeling of variables at both group and unit levels. We develop a version of the model that assigns zero probability to groups and units with physically impossible combinations of variables. We apply the model to estimate multivariate relationships in a subset of the American Community Survey. Using the estimated model, we generate synthetic household data that could be disseminated as redacted public use files. Supplementary materials (Hu et al., 2017) for this article are available online.
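
The nested latent class structure described in the abstract can be illustrated with a minimal generative sketch. This is not the authors' implementation: the names `pi`, `omega`, `lam`, and `phi`, the fixed numbers of classes `F` and `S`, and the variable dimensions are assumptions chosen for illustration. It also omits the Dirichlet process priors and the zero-probability constraint on impossible combinations. Each household draws a group-level class, and each member then draws a unit-level class whose distribution depends on the household's class.

```python
import numpy as np

def sample_household(rng, pi, omega, lam, phi, size):
    """Draw one synthetic household from a two-level latent class model.

    pi    : (F,) probabilities of the group-level classes
    omega : (F, S) probabilities of unit-level classes within each group class
    lam   : list of (F, d) arrays, one per group-level categorical variable
    phi   : list of (F, S, d) arrays, one per unit-level categorical variable
    """
    g = rng.choice(len(pi), p=pi)  # group-level latent class
    group_vars = [rng.choice(len(p[g]), p=p[g]) for p in lam]
    members = []
    for _ in range(size):
        m = rng.choice(omega.shape[1], p=omega[g])  # unit class nested in g
        members.append([rng.choice(p.shape[2], p=p[g, m]) for p in phi])
    return g, group_vars, members

# Hypothetical parameters: 3 group classes, 2 unit classes per group class.
rng = np.random.default_rng(0)
F, S = 3, 2
pi = rng.dirichlet(np.ones(F))
omega = rng.dirichlet(np.ones(S), size=F)                       # shape (F, S)
lam = [rng.dirichlet(np.ones(2), size=F)]                       # one binary group variable
phi = [rng.dirichlet(np.ones(d), size=(F, S)) for d in (2, 4)]  # two unit variables

g, group_vars, members = sample_household(rng, pi, omega, lam, phi, size=4)
```

Because every member of a household shares the same group-level class `g`, their unit-level variables are dependent after marginalizing over the latent classes, which is the within-group dependence the abstract refers to.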

Citation

Jingchen Hu, Jerome P. Reiter, Quanli Wang. "Dirichlet Process Mixture Models for Modeling and Generating Synthetic Versions of Nested Categorical Data." Bayesian Anal. 13(1): 183-200, March 2018. https://doi.org/10.1214/16-BA1047

Information

Published: March 2018
First available in Project Euclid: 24 January 2017

zbMATH: 06873723
MathSciNet: MR3737948
Digital Object Identifier: 10.1214/16-BA1047

Keywords: Confidentiality, Disclosure, latent, Multinomial, synthetic

Vol.13 • No. 1 • March 2018