Statistical Science

On Combining Data From Genome-Wide Association Studies to Discover Disease-Associated SNPs

Ruth M. Pfeiffer, Mitchell H. Gail, and David Pee



Combining data from several case-control genome-wide association (GWA) studies can yield greater efficiency for detecting associations of disease with single nucleotide polymorphisms (SNPs) than separate analyses of the component studies. We compared several procedures to combine GWA study data both in terms of the power to detect a disease-associated SNP while controlling the genome-wide significance level, and in terms of the detection probability (DP). The DP is the probability that a particular disease-associated SNP will be among the T most promising SNPs selected on the basis of low p-values. We studied both fixed effects and random effects models in which associations varied across studies. In settings of practical relevance, meta-analytic approaches that focus on a single degree of freedom had higher power and DP than global tests such as summing chi-square test-statistics across studies, Fisher’s combination of p-values, and forming a combined list of the best SNPs from within each study.
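The contrast between a single-degree-of-freedom meta-analytic test and a global combination such as Fisher's method can be illustrated with a small sketch. This is not the authors' code or their simulation design. It assumes each study reports a per-SNP log odds ratio estimate and standard error (the names `betas` and `ses` are hypothetical), uses a standard inverse-variance fixed-effects Wald test for the single-degree-of-freedom approach, and uses the closed-form chi-square tail for even degrees of freedom in Fisher's method.

```python
import math


def two_sided_p(beta, se):
    """Two-sided normal p-value for a single study's Wald statistic."""
    return math.erfc(abs(beta / se) / math.sqrt(2.0))


def fixed_effects_p(betas, ses):
    """Inverse-variance weighted fixed-effects meta-analysis (1 df).

    Pools the per-study estimates into one effect and tests it with a
    single Wald statistic, so the direction of each effect matters.
    """
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return two_sided_p(pooled, pooled_se)


def fisher_p(pvalues):
    """Fisher's combination of k independent p-values.

    Under the null, -2 * sum(log p) is chi-square with 2k df. For even
    df the upper tail has the closed form
    exp(-x/2) * sum_{i<k} (x/2)^i / i!.
    """
    k = len(pvalues)
    half_stat = -sum(math.log(p) for p in pvalues)  # test statistic / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


# Illustrative (made-up) inputs: three studies with the same effect size.
betas = [0.2, 0.2, 0.2]
ses = [0.1, 0.1, 0.1]

p_meta = fixed_effects_p(betas, ses)
p_fisher = fisher_p([two_sided_p(b, s) for b, s in zip(betas, ses)])
print(f"fixed effects p = {p_meta:.2e}, Fisher p = {p_fisher:.2e}")
```

With these homogeneous, same-direction effects the pooled test gives the smaller p-value, which matches the direction of the comparison reported in the abstract. The ordering can change when effects differ strongly in size or sign across studies.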

Article information

Statist. Sci., Volume 24, Number 4 (2009), 547-560.

First available in Project Euclid: 20 April 2010


Keywords: Whole genome scans; hypothesis testing; random effects; Wald test; multiple comparison


Pfeiffer, Ruth M.; Gail, Mitchell H.; Pee, David. On Combining Data From Genome-Wide Association Studies to Discover Disease-Associated SNPs. Statist. Sci. 24 (2009), no. 4, 547–560. doi:10.1214/09-STS286.


