Open Access
2024 Efficient and Scalable Bipartite Matching with Fast Beta Linkage (fabl)
Brian Kundinger, Jerome P. Reiter, Rebecca C. Steorts
Bayesian Anal. Advance Publication 1-24 (2024). DOI: 10.1214/24-BA1427

Abstract

Within the field of record linkage, Bayesian methods have the crucial advantage of quantifying uncertainty from imperfect linkages. However, current implementations of Bayesian Fellegi-Sunter models are computationally intensive, making them challenging to use on larger-scale record linkage tasks. To address these computational difficulties, we propose fast beta linkage (fabl), an extension to the Beta Record Linkage (BRL) method of Sadinle (2017). Specifically, we use independent prior distributions over the matching space, allowing us to use hashing techniques that reduce computational overhead. This independence also allows us to complete pairwise record comparisons over large data files through parallel computing and to reduce memory costs through a new technique called storage efficient indexing. Through simulations and two case studies, we show that fabl can be markedly faster than BRL with minimal loss of accuracy.
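
As a rough illustration of the hashing idea mentioned in the abstract, the sketch below maps each binary field-agreement vector for a record pair to an integer and counts how many pairs share each pattern, so that later computation can run over the distinct patterns rather than over every pair. This is not the authors' implementation: the function names (`compare`, `hash_pattern`, `pattern_counts`), the exact-match comparison rule, and the toy records are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product


def compare(rec_a, rec_b):
    """Binary agreement vector: 1 where two fields agree exactly, 0 otherwise."""
    return tuple(int(a == b) for a, b in zip(rec_a, rec_b))


def hash_pattern(gamma):
    """Map a binary agreement vector to an integer by reading it as base-2 digits."""
    h = 0
    for bit in gamma:
        h = 2 * h + bit
    return h


def pattern_counts(file_a, file_b):
    """Count how many record pairs fall into each hashed agreement pattern."""
    return Counter(
        hash_pattern(compare(a, b)) for a, b in product(file_a, file_b)
    )


# Hypothetical toy files with fields (first name, last name, birth year).
file_a = [("ann", "lee", 1980), ("bob", "ray", 1975)]
file_b = [("ann", "lee", 1980), ("bob", "roy", 1975), ("cat", "lee", 1990)]

counts = pattern_counts(file_a, file_b)
print(len(counts), "distinct patterns across", sum(counts.values()), "pairs")
```

With three binary fields there are at most 8 possible patterns no matter how many record pairs are compared, which is why working at the pattern level can reduce computational overhead.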

Acknowledgements

We thank the reviewers for extensive comments and suggestions that immensely improved the quality of the article.

Citation

Brian Kundinger, Jerome P. Reiter, Rebecca C. Steorts. "Efficient and Scalable Bipartite Matching with Fast Beta Linkage (fabl)." Bayesian Anal. Advance Publication 1-24, 2024. https://doi.org/10.1214/24-BA1427

Information

Published: 2024
First available in Project Euclid: 17 April 2024

Digital Object Identifier: 10.1214/24-BA1427

Keywords: data cleaning, data fusion, entity resolution, hashing, record linkage

Rights: © 2024 International Society for Bayesian Analysis
