Electronic Journal of Statistics

Heritability estimation in case-control studies

Anna Bonnet

Full-text: Open access


Abstract

In genetics, heritability refers to the proportion of the variation in a biological trait or disease that can be explained by genetic factors. Quantifying the heritability of a disease is a fundamental challenge in human genetics, especially when the causes are multiple and not clearly identified. Although the literature on heritability estimation is less rich for binary traits than for quantitative traits, several methods have been proposed to estimate the heritability of complex diseases. However, to the best of our knowledge, the existing methods lack theoretical support. Moreover, most of them do not take into account a major specificity of data coming from medical studies, namely the oversampling of patients relative to controls. In this paper, we investigate the theoretical properties of the Phenotype Correlation Genotype Correlation (PCGC) regression developed by Golan, Lander and Rosset (2014), which is one of the major techniques used in statistical genetics and which performs very well in practice despite the oversampling of patients. Our main result is a proof of the consistency of this estimator under several assumptions, which we state and discuss. We also provide a numerical study comparing two approximations that lead to two heritability estimators.
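
As a rough illustration of the kind of estimator discussed above, the following sketch implements a covariate-free version of PCGC regression in the spirit of Golan, Lander and Rosset (2014): products of standardized case-control phenotypes are regressed, without intercept, on the off-diagonal genetic correlations, and the slope is rescaled to the liability scale. The function names, the no-intercept choice, the input layout (`G` a genetic correlation matrix, `y` a 0/1 phenotype vector, `K` the population prevalence) and the exact form of the correction factor are assumptions of this sketch and are not taken from the paper.

```python
import math
from statistics import NormalDist


def pcgc_constant(K, P):
    """Factor linking the regression slope to liability-scale heritability.

    K: disease prevalence in the population (assumed known).
    P: proportion of cases in the study sample.
    Assumed form: P(1-P) * phi(t)^2 / (K^2 (1-K)^2), where t is the
    liability threshold and phi is the standard normal density.
    """
    t = NormalDist().inv_cdf(1.0 - K)   # liability threshold
    phi_t = NormalDist().pdf(t)         # normal density at the threshold
    return P * (1.0 - P) * phi_t ** 2 / (K ** 2 * (1.0 - K) ** 2)


def pcgc_heritability(G, y, K):
    """Sketch of a covariate-free PCGC-type heritability estimate.

    G: n x n genetic correlation matrix (list of lists).
    y: list of n binary phenotypes (1 = case, 0 = control).
    K: population prevalence of the disease.
    """
    n = len(y)
    P = sum(y) / n                      # in-sample case proportion
    scale = math.sqrt(P * (1.0 - P))
    z = [(yi - P) / scale for yi in y]  # standardized phenotypes

    # Least-squares slope through the origin of z_i * z_j on G_ij, i < j.
    num = 0.0
    den = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            num += G[i][j] * z[i] * z[j]
            den += G[i][j] ** 2
    slope = num / den

    # Rescale the observed-scale slope to the liability scale.
    return slope / pcgc_constant(K, P)
```

Because the correction factor depends on both `K` and `P`, the oversampling of patients enters the estimate explicitly rather than being ignored; on small samples the raw estimate is not constrained to lie in [0, 1].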

Article information

Electron. J. Statist., Volume 12, Number 1 (2018), 1662-1716.

Received: September 2017
First available in Project Euclid: 29 May 2018

Permanent link to this document
https://projecteuclid.org/euclid.ejs/1527559245

Digital Object Identifier
doi:10.1214/18-EJS1424

Keywords: Case-control studies; heritability; high dimension; mixed models

Creative Commons Attribution 4.0 International License.


Bonnet, Anna. Heritability estimation in case-control studies. Electron. J. Statist. 12 (2018), no. 1, 1662–1716. doi:10.1214/18-EJS1424. https://projecteuclid.org/euclid.ejs/1527559245


  • Bonnet, A., Gassiat, E. and Levy-Leduc, C. (2015). Heritability estimation in high-dimensional sparse linear mixed models. Electronic Journal of Statistics 9 2099–2129.
  • Breslow, N. E. and Clayton, D. G. (1993). Approximate Inference in Generalized Linear Mixed Models. Journal of the American Statistical Association 88 9–25.
  • de Villemereuil, P., Gimenez, O. and Doligez, B. (2013). Comparing parent–offspring regression with frequentist and Bayesian animal models to estimate heritability in wild populations: a simulation study for Gaussian and binary traits. Methods in Ecology and Evolution 4 260–275.
  • Falconer, D. S. (1965). The inheritance of liability to certain diseases, estimated from the incidence among relatives. Annals of Human Genetics 29 51–76.
  • Golan, D., Lander, E. S. and Rosset, S. (2014). Measuring missing heritability: Inferring the contribution of common variants. Proceedings of the National Academy of Sciences 111 E5272–E5281.
  • Hadfield, J. D. (2010). MCMC Methods for Multi-Response Generalized Linear Mixed Models: The MCMCglmm R Package. Journal of Statistical Software 33 1–22.
  • Jiang, J., Li, C., Paul, D., Yang, C. and Zhao, H. (2016). On high-dimensional misspecified mixed model analysis in genome-wide association study. The Annals of Statistics 44 2127–2160.
  • Lee, S. H., Wray, N. R., Goddard, M. E. and Visscher, P. M. (2011). Estimating missing heritability for disease from genome-wide association studies. The American Journal of Human Genetics 88 294–305.
  • Patterson, N., Price, A. L. and Reich, D. (2006). Population Structure and Eigenanalysis. PLoS Genetics 2 997–1004.
  • Patterson, H. D. and Thompson, R. (1971). Recovery of Inter-block Information when Block Sizes are Unequal. Biometrika 58 545–554.
  • Pirinen, M., Donnelly, P. and Spencer, C. C. A. (2013). Efficient computation with a linear mixed model on large-scale data sets with applications to genetic studies. The Annals of Applied Statistics 7 369–390.
  • Purcell, S., Wray, N., Stone, J., Visscher, P., O’Donovan, M., Sullivan, P., Sklar, P., International Schizophrenia Consortium and Pickard, B. (2009). Common polygenic variation contributes to risk of schizophrenia and bipolar disorder. Nature 460 748–752.
  • Searle, S. R., Casella, G. and McCulloch, C. E. (1992). Variance Components. Wiley Series in Probability and Statistics. Wiley, New Jersey.
  • Speed, D., Hemani, G., Johnson, M. and Balding, D. (2012). Improved Heritability Estimation from Genome-wide SNPs. The American Journal of Human Genetics 91 1011–1021.
  • Tenesa, A. and Haley, C. (2013). The heritability of human disease: estimation, uses and abuses. Nature Reviews Genetics 14 139–49.
  • Yang, J., Benyamin, B., McEvoy, B. P., Gordon, S., Henders, A. K., Nyholt, D. R., Madden, P. A., Heath, A. C., Martin, N. G., Montgomery, G. W., Goddard, M. E. and Visscher, P. M. (2010). Common SNPs explain a large proportion of the heritability for human height. Nature Genetics 42 565–569.
  • Yang, J., Lee, S. H., Goddard, M. E. and Visscher, P. M. (2011). GCTA: A Tool for Genome-wide Complex Trait Analysis. The American Journal of Human Genetics 88 76–82.
  • Zhou, X. and Stephens, M. (2012). Genome-wide efficient mixed-model analysis for association studies. Nature Genetics 44 821–824.