Statistical Science

Replication in Genome-Wide Association Studies

Peter Kraft, Eleftheria Zeggini, and John P. A. Ioannidis

Abstract

Replication helps ensure that a genotype-phenotype association observed in a genome-wide association (GWA) study represents a credible association and is not a chance finding or an artifact due to uncontrolled biases. We discuss prerequisites for exact replication, issues of heterogeneity, advantages and disadvantages of different methods of data synthesis across multiple studies, frequentist vs. Bayesian inferences for replication, and challenges that arise from multi-team collaborations. While consistent replication can greatly improve the credibility of a genotype-phenotype association, it may not eliminate spurious associations due to biases shared by many studies. Conversely, lack of replication in well-powered follow-up studies usually invalidates the initially proposed association, although occasionally it may point to differences in linkage disequilibrium or effect modifiers across studies.

Article information

Source
Statist. Sci. Volume 24, Number 4 (2009), 561-573.

Dates
First available in Project Euclid: 20 April 2010

Permanent link to this document
http://projecteuclid.org/euclid.ss/1271770349

Digital Object Identifier
doi:10.1214/09-STS290

Mathematical Reviews number (MathSciNet)
MR2779344

Keywords
Genome-wide association study; replication; meta-analysis

Citation

Kraft, Peter; Zeggini, Eleftheria; Ioannidis, John P. A. Replication in Genome-Wide Association Studies. Statist. Sci. 24 (2009), no. 4, 561–573. doi:10.1214/09-STS290. http://projecteuclid.org/euclid.ss/1271770349.


