Statistical Science

Estimating Effects and Making Predictions from Genome-Wide Marker Data

Michael E. Goddard, Naomi R. Wray, Klara Verbyla, and Peter M. Visscher



In genome-wide association studies (GWAS), hundreds of thousands of genetic markers (SNPs) are tested for association with a trait or phenotype. Reported effects tend to be larger in magnitude than the true effects of these markers, the so-called “winner’s curse.” We argue that the classical definition of unbiasedness is not useful in this context and propose to use a different definition of unbiasedness that is a property of the estimator we advocate. We suggest an integrated approach to the estimation of the SNP effects and to the prediction of trait values, treating SNP effects as random instead of fixed effects. Statistical methods traditionally used in the prediction of trait values in the genetics of livestock, which predate the availability of SNP data, can be applied to the analysis of GWAS, giving better estimates of the SNP effects and better predictions of phenotypic and genetic values in individuals.
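The contrast between fixed and random SNP effects can be illustrated with a small simulation. The sketch below is not taken from the article: it assumes normally distributed true effects, independent markers, a known common standard error, and a simple z-score threshold for "reporting" a marker. All names and parameter values (`simulate_winners_curse`, `sd_effect`, `se`, `z_threshold`) are hypothetical. Under these assumptions, the random-effects estimate is the conditional mean of the true effect given the fixed-effects estimate, which amounts to shrinking the estimate by the factor sd_effect^2 / (sd_effect^2 + se^2).

```python
import numpy as np


def simulate_winners_curse(n_markers=200_000, sd_effect=0.02, se=0.05,
                           z_threshold=3.0, seed=0):
    """Compare fixed-effect and shrunken estimates among 'significant' markers.

    Assumptions (illustrative only): true effects ~ N(0, sd_effect^2),
    estimation error ~ N(0, se^2), markers independent.
    """
    rng = np.random.default_rng(seed)

    # True marker effects and their noisy fixed-effect estimates.
    true_effects = rng.normal(0.0, sd_effect, n_markers)
    estimates = true_effects + rng.normal(0.0, se, n_markers)

    # Random-effects estimate: conditional mean of the true effect given
    # the estimate, i.e. the estimate shrunk toward zero.
    shrinkage = sd_effect**2 / (sd_effect**2 + se**2)
    shrunk = shrinkage * estimates

    # "Reported" markers: those whose z-score passes the threshold.
    selected = np.abs(estimates / se) > z_threshold

    # Align signs so that effects are measured in the reported direction.
    sign = np.sign(estimates[selected])
    return {
        "n_selected": int(selected.sum()),
        "mean_true": float(np.mean(sign * true_effects[selected])),
        "mean_fixed": float(np.mean(sign * estimates[selected])),
        "mean_random": float(np.mean(sign * shrunk[selected])),
    }


if __name__ == "__main__":
    for key, value in simulate_winners_curse().items():
        print(f"{key}: {value}")
```

With these assumed settings, the mean fixed-effect estimate among the selected markers is several times larger than the mean true effect (the winner's curse), while the mean shrunken estimate stays close to it. The shrinkage factor is known exactly here only because the simulation fixes the variances; with real data they would have to be estimated.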

Article information

Statist. Sci. Volume 24, Number 4 (2009), 517-529.

First available in Project Euclid: 20 April 2010


Keywords: genome-wide association study, prediction, estimation


Goddard, Michael E.; Wray, Naomi R.; Verbyla, Klara; Visscher, Peter M. Estimating Effects and Making Predictions from Genome-Wide Marker Data. Statist. Sci. 24 (2009), no. 4, 517--529. doi:10.1214/09-STS306.


