The Annals of Statistics

Inference using noisy degrees: Differentially private $\beta$-model and synthetic graphs

Vishesh Karwa and Aleksandra Slavković

Full-text: Open access

Abstract

The $\beta$-model of random graphs is an exponential family model with the degree sequence as a sufficient statistic. In this paper, we contribute three key results. First, we characterize conditions that lead to a quadratic time algorithm to check for the existence of MLE of the $\beta$-model, and show that the MLE never exists for the degree partition $\beta$-model. Second, motivated by privacy problems with network data, we derive a differentially private estimator of the parameters of $\beta$-model, and show it is consistent and asymptotically normally distributed—it achieves the same rate of convergence as the nonprivate estimator. We present an efficient algorithm for the private estimator that can be used to release synthetic graphs. Our techniques can also be used to release degree distributions and degree partitions accurately and privately, and to perform inference from noisy degrees arising from contexts other than privacy. We evaluate the proposed estimator on real graphs and compare it with a current algorithm for releasing degree distributions and find that it does significantly better. Finally, our paper addresses shortcomings of current approaches to a fundamental problem of how to perform valid statistical inference from data released by privacy mechanisms, and lays a foundational groundwork on how to achieve optimal and private statistical inference in a principled manner by modeling the privacy mechanism; these principles should be applicable to a class of models beyond the $\beta$-model.
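As a rough illustration of the setting described in the abstract, the sketch below samples a graph from a $\beta$-model, in which an edge between nodes $i$ and $j$ appears independently with probability $e^{\beta_i+\beta_j}/(1+e^{\beta_i+\beta_j})$, and then perturbs its degree sequence with discrete Laplace noise. It is not the estimator from the paper. The abstract does not specify the privacy mechanism, so the choice of discrete Laplace noise, the edge-level privacy notion (changing one edge alters two degrees by one each, giving sensitivity 2), and all function names are assumptions made for illustration. The Erdős–Gallai check shows why noisy degrees need further processing: they are often not the degree sequence of any simple graph.

```python
import math
import random


def edge_prob(beta_i, beta_j):
    """Edge probability under the beta-model: logistic function of beta_i + beta_j."""
    return 1.0 / (1.0 + math.exp(-(beta_i + beta_j)))


def sample_degrees(beta, rng):
    """Draw each edge independently and return the resulting degree sequence."""
    n = len(beta)
    deg = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < edge_prob(beta[i], beta[j]):
                deg[i] += 1
                deg[j] += 1
    return deg


def discrete_laplace(epsilon, sensitivity, rng):
    """Integer noise with P(Z = z) proportional to exp(-epsilon * |z| / sensitivity).

    Generated as the difference of two i.i.d. geometric variables.
    """
    alpha = math.exp(-epsilon / sensitivity)

    def geometric():
        u = 1.0 - rng.random()  # uniform on (0, 1]
        return int(math.floor(math.log(u) / math.log(alpha)))

    return geometric() - geometric()


def noisy_degrees(deg, epsilon, rng):
    """Assumed mechanism: add discrete Laplace noise with sensitivity 2 to each degree."""
    return [d + discrete_laplace(epsilon, 2.0, rng) for d in deg]


def is_graphical(seq):
    """Erdos-Gallai test: is seq the degree sequence of some simple graph?"""
    n = len(seq)
    d = sorted(seq, reverse=True)
    if any(x < 0 or x > n - 1 for x in d) or sum(d) % 2 != 0:
        return False
    for k in range(1, n + 1):
        lhs = sum(d[:k])
        rhs = k * (k - 1) + sum(min(x, k) for x in d[k:])
        if lhs > rhs:
            return False
    return True


if __name__ == "__main__":
    rng = random.Random(0)
    beta = [0.0] * 10          # hypothetical parameter vector
    deg = sample_degrees(beta, rng)
    released = noisy_degrees(deg, epsilon=1.0, rng=rng)
    print("true degrees: ", deg, "graphical:", is_graphical(deg))
    print("noisy degrees:", released, "graphical:", is_graphical(released))
```

Running the script with different seeds or smaller values of `epsilon` gives a feel for how often the released sequence fails the graphicality check.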

Article information

Source
Ann. Statist. Volume 44, Number 1 (2016), 87-112.

Dates
Received: August 2014
Revised: June 2015
First available in Project Euclid: 10 December 2015

Permanent link to this document
https://projecteuclid.org/euclid.aos/1449755958

Digital Object Identifier
doi:10.1214/15-AOS1358

Mathematical Reviews number (MathSciNet)
MR3449763

Zentralblatt MATH identifier
1331.62114

Subjects
Primary: 62F12: Asymptotic properties of estimators; 91D30: Social networks
Secondary: 62F30: Inference under constraints

Keywords
Degree sequence; differential privacy; $\beta$-model; existence of MLE; measurement error

Citation

Karwa, Vishesh; Slavković, Aleksandra. Inference using noisy degrees: Differentially private $\beta$-model and synthetic graphs. Ann. Statist. 44 (2016), no. 1, 87–112. doi:10.1214/15-AOS1358. https://projecteuclid.org/euclid.aos/1449755958


Supplemental materials

  • Supplement to “Inference using noisy degrees: Differentially Private $\beta$-model and synthetic graphs”. This supplementary material contains the proofs of Theorems 2, 3 and 4 from the paper.