Statistical Science

Confidentiality and Differential Privacy in the Dissemination of Frequency Tables

Yosef Rinott, Christine M. O’Keefe, Natalie Shlomo, and Chris Skinner

Abstract

For decades, national statistical agencies and other data custodians have been publishing frequency tables based on census, survey and administrative data. In order to protect the confidentiality of individuals represented in the data, tables based on original data are modified before release. Recently, in response to user demand for more flexible and responsive table publication services, frequency table publication schemes have been augmented with on-line table generating servers such as the US Census Bureau FactFinder and the Australian Bureau of Statistics (ABS) TableBuilder. These systems allow users to build their own custom tables, and make use of automated perturbation routines to protect confidentiality. Motivated by the growing popularity of table generating servers, in this paper we study confidentiality protection for perturbed frequency tables, including the trade-off with analytical utility, focusing on a version of the ABS TableBuilder as a concrete example of a data release mechanism, and examining its properties. Confidentiality protection is assessed in terms of the differential privacy standard, and this paper can be used as a practical introduction to differential privacy, to calculations related to its application, to the relationship between confidentiality protection and utility and to confidentiality in general.
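As a rough illustration of what perturbing a frequency table under the differential privacy standard can look like, the sketch below adds independent two-sided geometric (discrete Laplace) noise to each cell count. It is not the ABS TableBuilder routine studied in the paper; the function names, the example table, and the choice of noise distribution are assumptions made for illustration only. Under the further assumption that neighbouring databases differ by adding or removing one individual, so that exactly one cell changes by 1, noise with probability mass proportional to exp(-epsilon * |z|) gives epsilon-differential privacy for the released table.

```python
import math
import random


def two_sided_geometric(epsilon, rng):
    """Draw integer noise Z with P(Z = z) proportional to exp(-epsilon * |z|)."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    alpha = math.exp(-epsilon)
    if alpha == 0.0:
        # epsilon is so large that the noise is zero up to floating-point precision.
        return 0

    def geometric():
        # Inverse-CDF draw of G >= 0 with P(G = k) = (1 - alpha) * alpha**k.
        u = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
        return int(math.floor(math.log(u) / math.log(alpha)))

    # The difference of two i.i.d. geometric draws is two-sided geometric.
    return geometric() - geometric()


def perturb_table(counts, epsilon, seed=None):
    """Return a copy of `counts` (cell label -> count) with noise added per cell.

    Illustrative only: noisy counts may be negative and need not sum to the
    original total.
    """
    rng = random.Random(seed)
    return {cell: n + two_sided_geometric(epsilon, rng) for cell, n in counts.items()}


# Hypothetical 2x2 frequency table, flattened to cell labels.
table = {
    ("employed", "urban"): 120,
    ("employed", "rural"): 45,
    ("unemployed", "urban"): 3,
    ("unemployed", "rural"): 0,
}
noisy = perturb_table(table, epsilon=1.0, seed=0)
```

Smaller values of epsilon give stronger protection but noisier counts, which is the trade-off between confidentiality and analytical utility that the abstract refers to.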

Article information

Source
Statist. Sci., Volume 33, Number 3 (2018), 358-385.

Dates
First available in Project Euclid: 13 August 2018

Permanent link to this document
https://projecteuclid.org/euclid.ss/1534147228

Digital Object Identifier
doi:10.1214/17-STS641

Mathematical Reviews number (MathSciNet)
MR3843381

Zentralblatt MATH identifier
06991125

Keywords
Differential privacy; statistical disclosure control; contingency tables; utility

Citation

Rinott, Yosef; O’Keefe, Christine M.; Shlomo, Natalie; Skinner, Chris. Confidentiality and Differential Privacy in the Dissemination of Frequency Tables. Statist. Sci. 33 (2018), no. 3, 358–385. doi:10.1214/17-STS641. https://projecteuclid.org/euclid.ss/1534147228

