Open Access
Reproducible Model Selection Using Bagged Posteriors
Jonathan H. Huggins, Jeffrey W. Miller
Bayesian Anal. 18(1): 79-104 (March 2023). DOI: 10.1214/21-BA1301

Abstract

Bayesian model selection is premised on the assumption that the data are generated from one of the postulated models. However, in many applications, all of these models are incorrect (that is, there is misspecification). When the models are misspecified, two or more models can provide a nearly equally good fit to the data, in which case Bayesian model selection can be highly unstable, potentially leading to self-contradictory findings. To remedy this instability, we propose to use bagging on the posterior distribution (“BayesBag”) – that is, to average the posterior model probabilities over many bootstrapped datasets. We provide theoretical results characterizing the asymptotic behavior of the posterior and the bagged posterior in the (misspecified) model selection setting. We empirically assess the BayesBag approach on synthetic and real-world data in (i) feature selection for linear regression and (ii) phylogenetic tree reconstruction. Our theory and experiments show that, when all models are misspecified, BayesBag (a) provides greater reproducibility and (b) places posterior mass on optimal models more reliably, compared to the usual Bayesian posterior; on the other hand, under correct specification, BayesBag is slightly more conservative than the usual posterior, in the sense that BayesBag posterior probabilities tend to be slightly farther from the extremes of zero and one. Overall, our results demonstrate that BayesBag provides an easy-to-use and widely applicable approach that improves upon Bayesian model selection by making it more stable and reproducible.
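The averaging step described in the abstract can be sketched as follows. This is a minimal illustration and not the authors' implementation. It assumes a toy feature-selection setting for linear regression with a known noise variance, a zero-mean Gaussian prior on the coefficients, a uniform prior over models, and bootstrap datasets of the same size as the original. The function names, the hyperparameters `sigma2` and `tau2`, the number of bootstrap replicates, and the synthetic data are all hypothetical choices made for the example.

```python
import itertools
import numpy as np


def log_marginal_likelihood(X, y, sigma2=1.0, tau2=1.0):
    """Log evidence of y under y ~ N(0, sigma2 * I + tau2 * X X^T).

    Assumed toy model: coefficients ~ N(0, tau2 * I), known noise variance sigma2.
    """
    n = len(y)
    cov = sigma2 * np.eye(n) + tau2 * (X @ X.T)
    _, logdet = np.linalg.slogdet(cov)
    quad = y @ np.linalg.solve(cov, y)
    return -0.5 * (n * np.log(2.0 * np.pi) + logdet + quad)


def posterior_model_probs(X, y, models):
    """Standard posterior model probabilities under a uniform prior over models."""
    log_ev = np.array(
        [log_marginal_likelihood(X[:, list(m)], y) for m in models]
    )
    weights = np.exp(log_ev - log_ev.max())  # subtract the max for numerical stability
    return weights / weights.sum()


def bagged_posterior_model_probs(X, y, models, num_bootstrap=20, seed=0):
    """Average posterior model probabilities over bootstrapped datasets."""
    rng = np.random.default_rng(seed)
    n = len(y)
    total = np.zeros(len(models))
    for _ in range(num_bootstrap):
        idx = rng.integers(0, n, size=n)  # resample rows with replacement
        total += posterior_model_probs(X[idx], y[idx], models)
    return total / num_bootstrap


# Hypothetical synthetic data: 40 observations, 3 candidate features.
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 3))
y = X[:, 0] + 0.5 * rng.standard_t(df=3, size=40)  # heavy-tailed noise, so the Gaussian model is misspecified

# Candidate models: every non-empty subset of the 3 features.
models = [
    subset
    for k in range(1, 4)
    for subset in itertools.combinations(range(3), k)
]

standard = posterior_model_probs(X, y, models)
bagged = bagged_posterior_model_probs(X, y, models)
for m, p_std, p_bag in zip(models, standard, bagged):
    print(m, round(float(p_std), 3), round(float(p_bag), 3))
```

Each bootstrap replicate yields a full vector of posterior model probabilities, and the bagged posterior is their simple average, so the result is again a probability vector over the candidate models.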

Funding Statement

J.H.H. was supported in part by the National Institute of General Medical Sciences of the National Institutes of Health under grant number R01GM144963 as part of the Joint NSF/NIGMS Mathematical Biology Program. J.W.M. was supported by the National Cancer Institute of the National Institutes of Health under grant number R01CA240299. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Acknowledgments

We thank Pierre Jacob for bringing P. Bühlmann’s BayesBag paper to our attention, Ziheng Yang for sharing the whale dataset and his MrBayes scripts, Ryan Giordano and Pierre Jacob for helpful feedback on an earlier draft of this paper, Peter Grünwald, Natalia Bochkina, Mathieu Gerber, and Anthony Lee for helpful discussions, the AE and three reviewers for their constructive comments, and especially the third reviewer, who provided numerous insightful comments that substantially enhanced the scope and readability of the paper.

Citation

Jonathan H. Huggins, Jeffrey W. Miller. "Reproducible Model Selection Using Bagged Posteriors." Bayesian Anal. 18(1): 79-104, March 2023. https://doi.org/10.1214/21-BA1301

Information

Published: March 2023
First available in Project Euclid: 8 February 2022

MathSciNet: MR4515726
Digital Object Identifier: 10.1214/21-BA1301

Keywords: asymptotics, bagging, Bayesian model averaging, bootstrap, model misspecification, stability
