Open Access
Bayesian nonparametric disclosure risk assessment
Stefano Favaro, Francesca Panero, Tommaso Rigon
Electron. J. Statist. 15(2): 5626-5651 (2021). DOI: 10.1214/21-EJS1933


Any decision about the release of microdata for public use is supported by the estimation of measures of disclosure risk, the most popular being the number τ1 of sample uniques that are also population uniques. In this context, parametric and nonparametric partition-based models have been shown to have: i) the strength of leading to estimators of τ1 with desirable features, including ease of implementation, computational efficiency and scalability to massive data; ii) the weakness of underestimating τ1 in realistic scenarios, with the underestimation becoming worse as the tail of the empirical distribution of microdata gets heavier. To correct this underestimation, we propose a Bayesian nonparametric partition-based model that can be tuned to the tail behaviour of the empirical distribution of microdata. Our model relies on the Pitman–Yor process prior and leads to a novel estimator of τ1 that retains all the desirable features of partition-based estimators and, in addition, reduces underestimation through the tuning of a “discount” parameter. We show the effectiveness of our estimator by applying it to synthetic and real data.
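The quantity τ1 can be made concrete with a small simulation. The sketch below is an illustration only, not the estimator proposed in the paper. It assumes that the cells formed by cross-classifying the key variables can be represented by cluster labels drawn from the Pitman–Yor (two-parameter) Chinese restaurant process: record n+1 opens a new cell with probability (θ + σk)/(θ + n) and joins an existing cell j with probability (n_j − σ)/(θ + n), where k is the current number of cells and n_j the size of cell j. A sample is then drawn without replacement and τ1 is counted directly. The function names `pitman_yor_crp` and `sample_and_population_uniques`, the symbols θ and σ, and the values of N, n and θ are hypothetical choices for illustration; σ = 0 gives the Dirichlet process case, and larger σ yields heavier-tailed cell frequencies.

```python
import random
from collections import Counter


def pitman_yor_crp(N, theta, sigma, rng):
    """Assign N records to cells via a Pitman-Yor Chinese restaurant process.

    Returns cell labels 0, 1, 2, ... (hypothetical stand-ins for combinations
    of key variables). Requires 0 <= sigma < 1 and theta > -sigma;
    sigma = 0 gives the Dirichlet process special case.
    """
    if not (0 <= sigma < 1 and theta > -sigma):
        raise ValueError("need 0 <= sigma < 1 and theta > -sigma")
    labels, counts = [], []  # counts[j] = current size of cell j
    for n in range(N):
        k = len(counts)
        if n == 0:
            # The first record always opens a new cell.
            counts.append(1)
            labels.append(0)
            continue
        # Unnormalised weights: new cell -> theta + sigma * k,
        # existing cell j -> counts[j] - sigma; they sum to theta + n.
        u = rng.random() * (theta + n)
        new_weight = theta + sigma * k
        if u < new_weight:
            counts.append(1)
            labels.append(k)
            continue
        u -= new_weight
        chosen = k - 1  # fallback guards against floating-point round-off
        for j, c in enumerate(counts):
            u -= c - sigma
            if u < 0:
                chosen = j
                break
        counts[chosen] += 1
        labels.append(chosen)
    return labels


def sample_and_population_uniques(population_labels, n, rng):
    """Draw n records without replacement and return (m1, tau1).

    m1   = number of cells observed exactly once in the sample (sample uniques)
    tau1 = number of those sample uniques that are also population uniques,
           i.e. whose cell has frequency 1 in the whole population
    """
    pop_freq = Counter(population_labels)
    idx = rng.sample(range(len(population_labels)), n)
    sample_freq = Counter(population_labels[i] for i in idx)
    sample_uniques = [cell for cell, f in sample_freq.items() if f == 1]
    tau1 = sum(1 for cell in sample_uniques if pop_freq[cell] == 1)
    return len(sample_uniques), tau1


if __name__ == "__main__":
    rng = random.Random(2021)
    N, n, theta = 5000, 500, 10.0  # hypothetical sizes and concentration
    for sigma in (0.0, 0.5):
        population = pitman_yor_crp(N, theta, sigma, rng)
        m1, tau1 = sample_and_population_uniques(population, n, rng)
        print(f"sigma={sigma}: cells={len(set(population))}, "
              f"sample uniques={m1}, tau1={tau1}")
```

Comparing the printed counts for σ = 0 and σ = 0.5 gives a feel for how the discount parameter changes the number of small cells, and hence the gap between sample uniques and τ1; no particular values are implied here.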

Funding Statement

Stefano Favaro received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme under grant agreement No 817257. He also gratefully acknowledges the financial support of the Italian Ministry of Education, University and Research (MIUR), “Dipartimenti di Eccellenza” grant 2018-2022.


The authors are grateful to an Associate Editor and an anonymous Referee for their critical comments, corrections, and suggestions, which remarkably improved the present paper.


Citation

Stefano Favaro, Francesca Panero, Tommaso Rigon. "Bayesian nonparametric disclosure risk assessment." Electron. J. Statist. 15(2): 5626-5651, 2021.


Received: 1 May 2021; Published: 2021
First available in Project Euclid: 27 December 2021

Digital Object Identifier: 10.1214/21-EJS1933

Primary: 62F15, 62G05

Keywords: Bayesian nonparametrics, data confidentiality, Dirichlet process prior, Disclosure risk assessment, Empirical Bayes, Pitman–Yor process prior

Vol. 15 • No. 2 • 2021