Electronic Journal of Statistics

Statistical properties of simple random-effects models for genetic heritability

David Steinsaltz, Andrew Dahl, and Kenneth W. Wachter

Full-text: Open access

Abstract

Random-effects models are a popular tool for analysing total narrow-sense heritability for quantitative phenotypes on the basis of large-scale SNP data. Recently, there have been disputes over the validity of conclusions that may be drawn from such analysis. We derive some of the fundamental statistical properties of heritability estimates arising from these models, showing that the bias will generally be small. We show that the score function can be rewritten in a form that makes the results easier to interpret. We go on to use this score function to explore the behavior of the model when certain key assumptions are not satisfied: shared environment, measurement error, and genetic effects that are confined to a small subset of sites.
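
As an illustration of the kind of model and score function described above, here is a minimal sketch in Python with NumPy. It assumes a toy GCTA-style model y = Zu + e with standardized genotypes Z (N individuals, M sites), u ~ N(0, σ_g²/M · I), e ~ N(0, σ_e² I), no fixed effects, and h² = σ_g²/(σ_g² + σ_e²). Rotating y into the eigenbasis of K = ZZᵀ/M makes the components ỹ_i independent with variance σ²d_i, where d_i = 1 + h²(λ_i − 1). The likelihood and its score in h² then become sums over eigenvalues: score = ½ Σ (λ_i − 1)/d_i · (ỹ_i²/(σ̂² d_i) − 1). This is a standard derivation for the toy setup, not necessarily the form the authors derive. The helper names, allele-frequency range, N, M and true h² are hypothetical choices.

```python
import numpy as np

# Toy GCTA-style setting (assumed, not the paper's code):
# y = Z u + e, u ~ N(0, h2/M * I), e ~ N(0, (1 - h2) * I), total variance 1.

def simulate_standardized_genotypes(n, m, rng):
    """Hypothetical helper: independent sites, unrelated individuals."""
    freqs = rng.uniform(0.05, 0.5, size=m)             # assumed allele frequencies
    g = rng.binomial(2, freqs, size=(n, m)).astype(float)
    g = g[:, g.std(axis=0) > 0]                         # drop monomorphic sites
    return (g - g.mean(axis=0)) / g.std(axis=0)         # each column: mean 0, variance 1

def neg_profile_loglik(h2, lam, yt):
    """-2 x log-likelihood with sigma^2 profiled out, up to an additive constant."""
    d = 1.0 + h2 * (lam - 1.0)                          # Var(yt_i) = sigma^2 * d_i
    s2 = np.mean(yt**2 / d)                             # MLE of sigma^2 for this h2
    return len(yt) * np.log(s2) + np.sum(np.log(d))

def score_h2(h2, lam, yt):
    """Derivative of the profile log-likelihood in h2."""
    d = 1.0 + h2 * (lam - 1.0)
    s2 = np.mean(yt**2 / d)
    # Weighted sum of (standardized squared residual - 1), weights (lam_i - 1) / d_i.
    return 0.5 * np.sum((lam - 1.0) / d * (yt**2 / (s2 * d) - 1.0))

rng = np.random.default_rng(1)
n, m, h2_true = 600, 1200, 0.5
z = simulate_standardized_genotypes(n, m, rng)
m = z.shape[1]
u = rng.normal(0.0, np.sqrt(h2_true / m), size=m)      # per-site effects
y = z @ u + rng.normal(0.0, np.sqrt(1.0 - h2_true), size=n)

k = z @ z.T / m                                         # GRM; its eigenvalues average 1
lam, vecs = np.linalg.eigh(k)
lam = np.clip(lam, 0.0, None)                           # remove round-off negatives
yt = vecs.T @ y                                         # independent rotated phenotypes

grid = np.linspace(0.0, 0.99, 991)                      # step 0.001
h2_hat = grid[np.argmin([neg_profile_loglik(h, lam, yt) for h in grid])]
print(f"h2_hat = {h2_hat:.3f}, score = {score_h2(h2_hat, lam, yt):.3f}")
```

In this toy form the score is a weighted sum of standardized squared residuals minus one, with weights (λ_i − 1)/d_i. Eigen-directions whose eigenvalues sit far from 1 therefore carry most of the information about h², so the spread of the spectrum of K governs the precision of the estimate. A grid search keeps the sketch dependency-free. It stops at 0.99 because column-centering Z leaves one zero eigenvalue, for which d_i = 1 − h² vanishes at h² = 1.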

The variance and bias depend crucially on the variance of certain functionals of the singular values of the genotype matrix. A useful baseline is the singular-value distribution associated with genotypes that are completely independent (that is, with no linkage and no relatedness) for a given number of individuals and sites. We calculate the corresponding variance and bias for this setting.
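
The independent-genotype baseline can be explored numerically. The following is a minimal sketch in Python with NumPy, under a hypothetical setup. It simulates genotypes at independent sites for unrelated individuals (no linkage, no relatedness) and standardizes each site. It then compares the eigenvalues of K = ZZᵀ/M, which are the squared singular values of Z divided by M, with the Marčenko–Pastur law for ratio γ = N/M. When γ ≤ 1, that law has mean 1, variance γ, and support [(1 − √γ)², (1 + √γ)²]. The allele-frequency range, sizes and helper name are assumptions. The agreement is asymptotic, so small departures at finite N and M are expected, including one zero eigenvalue caused by column-centering.

```python
import numpy as np

def standardized_independent_genotypes(n, m, rng):
    """Hypothetical baseline: independent sites, unrelated individuals."""
    freqs = rng.uniform(0.05, 0.5, size=m)             # assumed allele frequencies
    g = rng.binomial(2, freqs, size=(n, m)).astype(float)
    g = g[:, g.std(axis=0) > 0]                         # drop monomorphic sites
    return (g - g.mean(axis=0)) / g.std(axis=0)         # each column: mean 0, variance 1

rng = np.random.default_rng(7)
n, m = 600, 1200
z = standardized_independent_genotypes(n, m, rng)
m = z.shape[1]
gamma = n / m

# Eigenvalues of K = Z Z^T / M are the squared singular values of Z divided by M.
sv = np.linalg.svd(z, compute_uv=False)
lam = sv**2 / m

lo, hi = (1 - np.sqrt(gamma))**2, (1 + np.sqrt(gamma))**2   # MP support for gamma <= 1
inside = np.mean((lam >= lo) & (lam <= hi))

print(f"gamma = {gamma:.3f}")
print(f"mean eigenvalue     {lam.mean():.3f}  (MP: 1)")
print(f"variance            {lam.var():.3f}  (MP: {gamma:.3f})")
print(f"fraction in support {inside:.3f}")
```

The mean eigenvalue is exactly 1 here because every standardized column has squared norm N. The variance of the eigenvalues, close to γ, is the kind of spectral functional that enters the precision of the heritability estimate.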

Article information

Source
Electron. J. Statist., Volume 12, Number 1 (2018), 321-358.

Dates
Received: November 2016
First available in Project Euclid: 15 February 2018

Permanent link to this document
https://projecteuclid.org/euclid.ejs/1518663656

Digital Object Identifier
doi:10.1214/17-EJS1386

Mathematical Reviews number (MathSciNet)
MR3763909

Zentralblatt MATH identifier
1383.92051

Subjects
Primary: 92D10: Genetics {For genetic algebras, see 17D92}
Secondary: 62P10: Applications to biology and medical sciences; 62F10: Point estimation; 60B20: Random matrices (probabilistic aspects; for algebraic aspects see 15B52)

Keywords
Heritability; random-effects models; random matrices; Marčenko–Pastur distribution; GCTA

Rights
Creative Commons Attribution 4.0 International License.

Citation

Steinsaltz, David; Dahl, Andrew; Wachter, Kenneth W. Statistical properties of simple random-effects models for genetic heritability. Electron. J. Statist. 12 (2018), no. 1, 321–358. doi:10.1214/17-EJS1386. https://projecteuclid.org/euclid.ejs/1518663656

