- Bayesian Anal. (2017), 25 pages.
Big Data Bayesian Linear Regression and Variable Selection by Normal-Inverse-Gamma Summation
We introduce the normal-inverse-gamma summation operator, which combines Bayesian regression results from different data sources and leads to a simple split-and-merge algorithm for big data regressions. The summation operator is also useful for computing the marginal likelihood and facilitates Bayesian model selection methods, including Bayesian LASSO, stochastic search variable selection, and Markov chain Monte Carlo model composition. Observations are scanned in one pass, and the sampler then iteratively combines normal-inverse-gamma distributions without reloading the data. Simulation studies demonstrate that our algorithms can efficiently handle highly correlated big data. A real-world data set on employment and wages is also analyzed.
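The split-and-merge idea can be illustrated with the textbook conjugate normal-inverse-gamma update, in which each data chunk contributes only its sufficient statistics. The sketch below is not the paper's operator or notation. It assumes the model y | beta, sigma^2 ~ N(X beta, sigma^2 I) with prior beta | sigma^2 ~ N(mu0, sigma^2 Lam0^{-1}) and sigma^2 ~ IG(a0, b0). The function names `chunk_stats` and `nig_posterior`, the simulated data, and the prior values are all hypothetical choices made for illustration.

```python
import numpy as np

def chunk_stats(X, y):
    """Sufficient statistics of one data chunk: X'X, X'y, y'y, and row count."""
    return X.T @ X, X.T @ y, float(y @ y), len(y)

def nig_posterior(mu0, Lam0, a0, b0, XtX, Xty, yty, n):
    """Standard conjugate normal-inverse-gamma update from sufficient statistics.

    Lam0 is the prior precision of beta (scaled by 1/sigma^2).
    """
    Lam_n = Lam0 + XtX
    mu_n = np.linalg.solve(Lam_n, Lam0 @ mu0 + Xty)
    a_n = a0 + n / 2.0
    b_n = b0 + 0.5 * (yty + mu0 @ Lam0 @ mu0 - mu_n @ Lam_n @ mu_n)
    return mu_n, Lam_n, a_n, b_n

# Simulated data (illustrative only, not from the paper).
rng = np.random.default_rng(0)
n, k = 10_000, 3
X = rng.normal(size=(n, k))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=n)

# Hypothetical prior hyperparameters.
mu0, Lam0, a0, b0 = np.zeros(k), np.eye(k), 2.0, 2.0

# Split: scan each chunk once and keep only its sufficient statistics.
parts = [chunk_stats(Xc, yc)
         for Xc, yc in zip(np.array_split(X, 4), np.array_split(y, 4))]

# Merge: add the statistics across chunks, then update the prior once.
XtX = sum(p[0] for p in parts)
Xty = sum(p[1] for p in parts)
yty = sum(p[2] for p in parts)
n_total = sum(p[3] for p in parts)
merged = nig_posterior(mu0, Lam0, a0, b0, XtX, Xty, yty, n_total)

# Reference: the same update computed from the full data in one piece.
full = nig_posterior(mu0, Lam0, a0, b0, *chunk_stats(X, y))
```

Under these assumptions, `merged` and `full` agree up to floating-point error, which is why the raw observations need not be reloaded once each chunk has been scanned.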
First available in Project Euclid: 8 November 2017
Qian, Hang. Big Data Bayesian Linear Regression and Variable Selection by Normal-Inverse-Gamma Summation. Bayesian Anal., advance publication, 8 November 2017. doi:10.1214/17-BA1083. https://projecteuclid.org/euclid.ba/1510110046
- Supplementary Material for Big Data Bayesian Linear Regression and Variable Selection by Normal-Inverse-Gamma Summation. Proofs of Propositions 1–9 and data cleaning procedures in Section 6 (in a separate document).