Open Access
December 2010
Reuse, recycle, reweigh: Combating influenza through efficient sequential Bayesian computation for massive data
Jennifer A. Tom, Janet S. Sinsheimer, Marc A. Suchard
Ann. Appl. Stat. 4(4): 1722-1748 (December 2010). DOI: 10.1214/10-AOAS349

Abstract

Massive datasets in the gigabyte and terabyte range, combined with the availability of increasingly sophisticated statistical tools, yield analyses at the boundary of what is computationally feasible. Compromising in the face of this computational burden by partitioning the dataset into more tractable sizes results in stratified analyses, removed from the context that justified the initial data collection. In a Bayesian framework, these stratified analyses generate intermediate realizations, often compared using point estimates that fail to account for the variability within and correlation between the distributions these realizations approximate. However, although the initial concession to stratify generally precludes the more sensible analysis using a single joint hierarchical model, we can circumvent this outcome and capitalize on the intermediate realizations by extending the dynamic iterative reweighting MCMC algorithm. In doing so, we reuse the available realizations by reweighting them with importance weights, recycling them into a now tractable joint hierarchical model. We apply this technique to intermediate realizations generated from stratified analyses of 687 influenza A genomes spanning 13 years, allowing us to revisit hypotheses regarding the evolutionary history of influenza within a hierarchical statistical framework.
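
The reweighting idea in the abstract can be illustrated with a generic self-normalized importance sampling sketch. This is not the dynamic iterative reweighting algorithm the paper extends. It only shows how stored draws from one stratified analysis might be reused under a different prior without rerunning MCMC. The normal priors, the synthetic draws standing in for stored MCMC output, and all function names are assumptions made for illustration.

```python
import math
import random


def normal_logpdf(x, mean, sd):
    """Log density of a normal distribution with the given mean and sd."""
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2.0 * math.pi)


def importance_weights(samples, log_old_prior, log_new_prior):
    """Self-normalized weights proportional to new_prior / old_prior.

    The likelihood cancels because the draws already come from the
    posterior under the old prior; only the prior ratio remains.
    """
    log_w = [log_new_prior(s) - log_old_prior(s) for s in samples]
    shift = max(log_w)  # subtract the max for numerical stability
    w = [math.exp(lw - shift) for lw in log_w]
    total = sum(w)
    return [wi / total for wi in w]


def weighted_mean(samples, weights):
    """Posterior mean estimate under the new prior."""
    return sum(w * s for s, w in zip(samples, weights))


def effective_sample_size(weights):
    """Kish effective sample size of normalized weights."""
    return 1.0 / sum(w * w for w in weights)


# Hypothetical stand-in for stored draws from one stratified analysis,
# assumed to have been sampled under a diffuse N(0, 10^2) prior.
rng = random.Random(0)
stratum_draws = [rng.gauss(1.0, 0.5) for _ in range(5000)]

# Reweight toward a tighter N(0, 1) prior, as a hierarchical model
# sharing information across strata might imply (an assumption here).
weights = importance_weights(
    stratum_draws,
    log_old_prior=lambda t: normal_logpdf(t, 0.0, 10.0),
    log_new_prior=lambda t: normal_logpdf(t, 0.0, 1.0),
)
reweighted_mean = weighted_mean(stratum_draws, weights)
ess = effective_sample_size(weights)
```

The effective sample size is a common diagnostic for this kind of reuse: when it falls far below the number of stored draws, the old and new targets overlap poorly and the reweighted estimates become unreliable.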

Citation

Jennifer A. Tom, Janet S. Sinsheimer, Marc A. Suchard. "Reuse, recycle, reweigh: Combating influenza through efficient sequential Bayesian computation for massive data." Ann. Appl. Stat. 4 (4) 1722-1748, December 2010. https://doi.org/10.1214/10-AOAS349

Information

Published: December 2010
First available in Project Euclid: 4 January 2011

zbMATH: 1220.62144
MathSciNet: MR2829934
Digital Object Identifier: 10.1214/10-AOAS349

Keywords: Gibbs variable selection, hierarchical Bayesian model, importance sampling, influenza A, Markov chain Monte Carlo, massive data

Rights: Copyright © 2010 Institute of Mathematical Statistics

Vol. 4 • No. 4 • December 2010