The Annals of Applied Statistics
- Ann. Appl. Stat.
- Volume 11, Number 4 (2017), 2027-2051.
A unified framework for variance component estimation with summary statistics in genome-wide association studies
Linear mixed models (LMMs) are among the most commonly used tools for genetic association studies. However, the standard method for estimating variance components in LMMs, restricted maximum likelihood estimation (REML), suffers from several important drawbacks: REML requires individual-level genotypes and phenotypes from all samples in the study, is computationally slow, and produces downward-biased estimates in case-control studies. To remedy these drawbacks, we present an alternative framework for variance component estimation, which we refer to as MQS. MQS is based on the method of moments (MoM) and the minimal norm quadratic unbiased estimation (MINQUE) criterion, and brings two seemingly unrelated methods, the renowned Haseman–Elston (HE) regression and the recent LD score regression (LDSC), into the same unified statistical framework. With this new framework, we provide an alternative but mathematically equivalent form of HE that allows for the use of summary statistics. We also provide an exact estimation form of LDSC that yields unbiased and statistically more efficient estimates. A key feature of our method is its ability to pair marginal $z$-scores computed using all samples with SNP correlation information computed using a small random subset of individuals (or individuals from a proper reference panel), while still producing estimates that can be almost as accurate as if both quantities were computed using the full data. As a result, our method produces unbiased and statistically efficient estimates, makes use of summary statistics, and is computationally efficient for large data sets. Using simulations and applications to 37 phenotypes from 8 real data sets, we illustrate the benefits of our method for estimating and partitioning SNP heritability in population studies as well as for heritability estimation in family studies. Our method is implemented in the GEMMA software package, freely available at www.xzlab.org/software.html.
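To make the summary-statistics idea concrete, the sketch below matches two moments of a standardized phenotype under a simplified one-component model and computes the resulting SNP heritability estimate in two ways: from individual-level data through the relatedness matrix K = XX'/p, and from marginal $z$-scores plus the SNP correlation matrix. It is an illustration only, not the MQS estimator as implemented in GEMMA: it omits the mean-centering correction, standard errors, and covariates, and the simulated data, function names, and parameter values are all hypothetical. Plain Python lists are used so the example needs no external packages.

```python
import random

def standardize(values):
    """Center a list and scale it to unit variance (dividing by n)."""
    n = len(values)
    mean = sum(values) / n
    sd = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [(v - mean) / sd for v in values]

def simulate(n, p, h2, seed=1):
    """Hypothetical toy data: p standardized SNP columns and a standardized phenotype."""
    rng = random.Random(seed)
    snps = []
    while len(snps) < p:
        # Genotype = allele count (0, 1 or 2) at allele frequency 0.5.
        g = [(rng.random() < 0.5) + (rng.random() < 0.5) for _ in range(n)]
        if len(set(g)) > 1:  # skip monomorphic SNPs, which cannot be standardized
            snps.append(standardize(g))
    beta = [rng.gauss(0.0, (h2 / p) ** 0.5) for _ in range(p)]
    y = [sum(snps[k][i] * beta[k] for k in range(p))
         + rng.gauss(0.0, (1.0 - h2) ** 0.5) for i in range(n)]
    return snps, standardize(y)

def h2_individual(snps, y):
    """Moment estimate from individual-level data: (y'Ky - n) / (tr(K^2) - n)."""
    p, n = len(snps), len(y)
    K = [[sum(snps[k][i] * snps[k][j] for k in range(p)) / p for j in range(n)]
         for i in range(n)]
    yKy = sum(y[i] * K[i][j] * y[j] for i in range(n) for j in range(n))
    trK2 = sum(K[i][j] ** 2 for i in range(n) for j in range(n))
    return (yKy - n) / (trK2 - n)

def h2_summary(z, r, n):
    """Same estimate from marginal z-scores z and the SNP correlation matrix r."""
    p = len(z)
    mean_z2 = sum(v * v for v in z) / p
    sum_r2 = sum(r[k][l] ** 2 for k in range(p) for l in range(p))
    return (mean_z2 - 1.0) / (n * sum_r2 / p ** 2 - 1.0)

n, p = 120, 40
snps, y = simulate(n, p, h2=0.5)

# Summary statistics: one z-score per SNP and all pairwise SNP correlations.
z = [sum(x[i] * y[i] for i in range(n)) / n ** 0.5 for x in snps]
r = [[sum(a[i] * b[i] for i in range(n)) / n for b in snps] for a in snps]

est_individual = h2_individual(snps, y)
est_summary = h2_summary(z, r, n)
print(est_individual, est_summary)
```

The two estimates agree up to floating-point error because y'Ky = (n/p) Σ z_k² and tr(K²) = (n²/p²) Σ r_kl² whenever both are computed on the same samples. If `r` were instead computed from a random subset of individuals or a reference panel, as the abstract describes, the agreement would only be approximate.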
Received: November 2016
Revised: March 2017
First available in Project Euclid: 28 December 2017
Zhou, Xiang. A unified framework for variance component estimation with summary statistics in genome-wide association studies. Ann. Appl. Stat. 11 (2017), no. 4, 2027-2051. doi:10.1214/17-AOAS1052. https://projecteuclid.org/euclid.aoas/1514430276
- Supplementary Material. Supplementary figures, tables and text.