Open Access
Categorical data fusion using auxiliary information
Bailey K. Fosdick, Maria DeYoreo, Jerome P. Reiter
Ann. Appl. Stat. 10(4): 1907-1929 (December 2016). DOI: 10.1214/16-AOAS925


In data fusion, analysts seek to combine information from two databases comprising disjoint sets of individuals, in which some variables appear in both databases and other variables appear in only one database. Most data fusion techniques rely on variants of conditional independence assumptions. When inappropriate, these assumptions can result in unreliable inferences. We propose a data fusion technique that allows analysts to easily incorporate auxiliary information on the dependence structure of variables not observed jointly; we refer to this auxiliary information as glue. With this technique, we fuse two marketing surveys from the book publisher HarperCollins using glue from the online, rapid-response polling company CivicScience. The fused data enable estimation of associations between people’s preferences for authors and for learning about new books. The analysis also serves as a case study on the potential for using online surveys to aid data fusion.
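To make the conditional independence assumption mentioned in the abstract concrete, the following is a minimal sketch and not the authors' method. It assumes a toy setting with one binary variable X observed in both databases, a variable A observed only in the first, and a variable B observed only in the second. All names and probabilities are hypothetical and chosen purely for illustration.

```python
import numpy as np

# Hypothetical toy inputs (not from the paper): X is shared by both databases,
# A appears only in database 1, B appears only in database 2.
p_x = np.array([0.6, 0.4])                 # P(X = x)
p_a_given_x = np.array([[0.7, 0.3],        # P(A = a | X = 0)
                        [0.2, 0.8]])       # P(A = a | X = 1)
p_b_given_x = np.array([[0.5, 0.5],        # P(B = b | X = 0)
                        [0.1, 0.9]])       # P(B = b | X = 1)


def fuse_under_conditional_independence(p_x, p_a_given_x, p_b_given_x):
    """Return the joint table P(A, B), assuming A and B are independent given X."""
    # P(a, b) = sum over x of P(x) * P(a | x) * P(b | x)
    return np.einsum("x,xa,xb->ab", p_x, p_a_given_x, p_b_given_x)


p_ab = fuse_under_conditional_independence(p_x, p_a_given_x, p_b_given_x)
print(p_ab)
```

In this sketch, any association between A and B in the fused table arises only through X. If A and B are in fact dependent given X, the table is wrong in a way that neither database can reveal on its own, which is the gap that auxiliary information on the joint dependence is meant to fill.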



Bailey K. Fosdick, Maria DeYoreo, Jerome P. Reiter. "Categorical data fusion using auxiliary information." Ann. Appl. Stat. 10(4): 1907-1929, December 2016.


Received: 1 June 2015; Revised: 1 December 2015; Published: December 2016
First available in Project Euclid: 5 January 2017

zbMATH: 06688762
MathSciNet: MR3592042
Digital Object Identifier: 10.1214/16-AOAS925

Keywords: imputation, integration, latent class, matching

Rights: Copyright © 2016 Institute of Mathematical Statistics
