Open Access
Bayesian large-scale multiple regression with summary statistics from genome-wide association studies
Xiang Zhu, Matthew Stephens
Ann. Appl. Stat. 11(3): 1561-1592 (September 2017). DOI: 10.1214/17-AOAS1046


Bayesian methods for large-scale multiple regression provide attractive approaches to the analysis of genome-wide association studies (GWAS). For example, they can estimate heritability of complex traits, allowing for both polygenic and sparse models; and by incorporating external genomic data into the priors, they can increase power and yield new biological insights. However, these methods require access to individual genotypes and phenotypes, which are often not easily available. Here we provide a framework for performing these analyses without individual-level data. Specifically, we introduce a “Regression with Summary Statistics” (RSS) likelihood, which relates the multiple regression coefficients to univariate regression results that are often easily available. The RSS likelihood requires estimates of correlations among covariates (SNPs), which can also be obtained from public databases. We perform Bayesian multiple regression analysis by combining the RSS likelihood with previously proposed prior distributions, sampling posteriors by Markov chain Monte Carlo. In a wide range of simulations, RSS performs similarly to analyses using the individual data, both for estimating heritability and for detecting associations. We apply RSS to a GWAS of human height that contains 253,288 individuals typed at 1.06 million SNPs, for which analyses of individual-level data are practically impossible. Estimates of heritability (52%) are consistent with, but more precise than, previous results using subsets of these data. We also identify many previously unreported loci that show evidence for association with height in our analyses. Software is available at
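
To make the role of the summary statistics concrete, the following is a minimal sketch of how such a likelihood could be evaluated. The abstract does not state the formula. The sketch assumes the RSS likelihood takes the multivariate normal form N(β̂; S R S⁻¹ β, S R S), where β̂ holds the univariate effect estimates, S is the diagonal matrix of their standard errors, and R is the estimated SNP correlation matrix. The function name `rss_loglik` and its arguments are hypothetical and are not taken from the authors' software.

```python
import numpy as np

def rss_loglik(beta, betahat, se, R):
    """Log-density of betahat under the ASSUMED form N(S R S^{-1} beta, S R S).

    beta    : candidate multiple-regression coefficients, shape (p,)
    betahat : univariate (single-SNP) effect estimates, shape (p,)
    se      : standard errors of betahat (diagonal of S), shape (p,)
    R       : estimated SNP correlation matrix, shape (p, p)
    """
    beta = np.asarray(beta, dtype=float)
    betahat = np.asarray(betahat, dtype=float)
    se = np.asarray(se, dtype=float)
    R = np.asarray(R, dtype=float)

    # Mean S R S^{-1} beta, using that S is diagonal.
    mean = se * (R @ (beta / se))
    # Covariance S R S, again exploiting the diagonal S.
    cov = se[:, None] * R * se[None, :]

    resid = betahat - mean
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("S R S must be positive definite")

    # Quadratic form resid' cov^{-1} resid without forming the inverse.
    quad = resid @ np.linalg.solve(cov, resid)
    p = betahat.size
    return -0.5 * (p * np.log(2.0 * np.pi) + logdet + quad)

# Toy usage: two correlated SNPs, only the first has a nonzero effect.
R = np.array([[1.0, 0.5], [0.5, 1.0]])
se = np.array([1.0, 1.0])
beta = np.array([1.0, 0.0])
print(rss_loglik(beta, np.array([1.0, 0.5]), se, R))
```

In the toy usage, the second SNP has no effect of its own, yet under the assumed form its expected univariate estimate is 0.5 because of its correlation with the first SNP. This is the sense in which the likelihood relates multiple regression coefficients to univariate results. At the scale described in the abstract, a dense solve like this would not be practical, so the sketch is illustrative only.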


Xiang Zhu, Matthew Stephens. "Bayesian large-scale multiple regression with summary statistics from genome-wide association studies." Ann. Appl. Stat. 11 (3): 1561-1592, September 2017.


Received: 1 March 2016; Revised: 1 April 2017; Published: September 2017
First available in Project Euclid: 5 October 2017

zbMATH: 1380.62263
MathSciNet: MR3709570
Digital Object Identifier: 10.1214/17-AOAS1046

Keywords: association study, Bayesian regression, explained variation, genome wide, heritability, Markov chain Monte Carlo, multiple-SNP analysis, summary statistics, variable selection

Rights: Copyright © 2017 Institute of Mathematical Statistics
