Open Access
March 2012
Multiple imputation for sharing precise geographies in public use data
Hao Wang, Jerome P. Reiter
Ann. Appl. Stat. 6(1): 229-252 (March 2012). DOI: 10.1214/11-AOAS506

Abstract

When releasing data to the public, data stewards are ethically and often legally obligated to protect the confidentiality of data subjects’ identities and sensitive attributes. They also strive to release data that are informative for a wide range of secondary analyses. Achieving both objectives is particularly challenging when data stewards seek to release highly resolved geographical information. We present an approach for protecting the confidentiality of data with geographic identifiers based on multiple imputation. The basic idea is to convert geography to latitude and longitude, estimate a bivariate response model conditional on attributes, and simulate new latitude and longitude values from these models. We illustrate the proposed methods using data describing causes of death in Durham, North Carolina. In the context of the application, we present a straightforward tool for generating simulated geographies and attributes based on regression trees, and we present methods for assessing disclosure risks with such simulated data.
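The basic idea in the abstract (treat location as a latitude/longitude response, model it conditional on attributes, then simulate new locations several times) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `synthesize_locations`, the record layout, and the toy coordinates are all hypothetical, and grouping records by identical attribute values is only a crude stand-in for the leaves of a fitted regression tree. Each synthetic location is drawn from the empirical distribution of observed locations within the record's group, so a real release would need further smoothing and the kind of disclosure risk assessment the paper describes.

```python
import random
from collections import defaultdict


def synthesize_locations(records, m=3, seed=0):
    """Return m synthetic copies of `records` with simulated (lat, lon).

    records: list of dicts with keys "attrs" (a hashable tuple of
    attribute values), "lat", and "lon".
    Records sharing the same "attrs" form one cell, used here as a
    simplified substitute for a regression-tree leaf (an assumption).
    """
    rng = random.Random(seed)

    # Pool the observed locations of all records in the same attribute cell.
    cells = defaultdict(list)
    for r in records:
        cells[r["attrs"]].append((r["lat"], r["lon"]))

    datasets = []
    for _ in range(m):
        synthetic = []
        for r in records:
            # Draw a location from the empirical distribution of the cell;
            # the attributes themselves are left unchanged in this sketch.
            lat, lon = rng.choice(cells[r["attrs"]])
            synthetic.append({"attrs": r["attrs"], "lat": lat, "lon": lon})
        datasets.append(synthetic)
    return datasets


# Toy input with made-up coordinates and attribute labels.
records = [
    {"attrs": ("65+", "heart"), "lat": 35.99, "lon": -78.90},
    {"attrs": ("65+", "heart"), "lat": 36.01, "lon": -78.92},
    {"attrs": ("65+", "heart"), "lat": 35.97, "lon": -78.88},
    {"attrs": ("<65", "cancer"), "lat": 36.03, "lon": -78.95},
    {"attrs": ("<65", "cancer"), "lat": 36.05, "lon": -78.93},
]
datasets = synthesize_locations(records, m=3, seed=1)
```

Releasing several simulated datasets rather than one is what makes this a multiple imputation approach: an analyst can fit the same model to each copy and combine the estimates, so that the uncertainty introduced by simulation is reflected in the final inference.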

Citation

Hao Wang, Jerome P. Reiter. "Multiple imputation for sharing precise geographies in public use data." Ann. Appl. Stat. 6(1): 229-252, March 2012. https://doi.org/10.1214/11-AOAS506

Information

Published: March 2012
First available in Project Euclid: 6 March 2012

zbMATH: 1236.86015
MathSciNet: MR2951536
Digital Object Identifier: 10.1214/11-AOAS506

Keywords: confidentiality, disclosure, dissemination, spatial, synthetic, tree

Rights: Copyright © 2012 Institute of Mathematical Statistics

Vol. 6 • No. 1 • March 2012