Abstract
Within the field of record linkage, Bayesian methods have the crucial advantage of quantifying uncertainty from imperfect linkages. However, current implementations of Bayesian Fellegi-Sunter models are computationally intensive, making them challenging to use on larger-scale record linkage tasks. To address these computational difficulties, we propose fast beta linkage (fabl), an extension to the Beta Record Linkage (BRL) method of Sadinle (2017). Specifically, we use independent prior distributions over the matching space, allowing us to use hashing techniques that reduce computational overhead. This also allows us to complete pairwise record comparisons over large data files through parallel computing and to reduce memory costs through a new technique called storage efficient indexing. Through simulations and two case studies, we show that fabl can have markedly increased speed with minimal loss of accuracy when compared to BRL.
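As an illustration of the hashing idea mentioned above, the following is a minimal sketch, not the fabl implementation. It assumes binary agree/disagree comparisons on each field, and all names in it (`compare`, `hash_patterns`, the toy records) are hypothetical. The point it shows is that many record pairs share the same agreement pattern, so pairs can be summarized by a count for each unique pattern rather than stored one by one.

```python
from collections import Counter
from itertools import product


def compare(rec_a, rec_b):
    """Return a tuple of 0/1 agreement indicators, one per field."""
    return tuple(int(x == y) for x, y in zip(rec_a, rec_b))


def hash_patterns(file_a, file_b):
    """Count how many record pairs fall into each agreement pattern."""
    counts = Counter()
    for rec_a, rec_b in product(file_a, file_b):
        counts[compare(rec_a, rec_b)] += 1
    return counts


# Toy data (assumed for illustration): fields are name, birth year, state.
file_a = [("ann", "1990", "nc"), ("bob", "1985", "sc")]
file_b = [("ann", "1990", "nc"), ("bob", "1986", "sc"), ("cat", "1990", "va")]

pattern_counts = hash_patterns(file_a, file_b)
for pattern, count in sorted(pattern_counts.items()):
    print(pattern, count)
```

With F binary fields there are at most 2^F distinct patterns, so in this simplified setting the size of the summary is bounded by the number of patterns rather than by the number of record pairs.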
Acknowledgements
We thank the reviewers for extensive comments and suggestions that immensely improved the quality of the article.
Citation
Brian Kundinger, Jerome P. Reiter, Rebecca C. Steorts. "Efficient and Scalable Bipartite Matching with Fast Beta Linkage (fabl)." Bayesian Anal. Advance Publication 1-24, 2024. https://doi.org/10.1214/24-BA1427
Information