Statistical Science

Genome-Wide Significance Levels and Weighted Hypothesis Testing

Kathryn Roeder and Larry Wasserman



Genetic investigations often involve the testing of vast numbers of related hypotheses simultaneously. To control the overall error rate, a substantial penalty is required, making it difficult to detect signals of moderate strength. To improve the power in this setting, a number of authors have considered using weighted p-values, with the motivation often based upon the scientific plausibility of the hypotheses. We review this literature, derive optimal weights and show that the power is remarkably robust to misspecification of these weights. We consider two methods for choosing weights in practice. The first, external weighting, is based on prior information. The second, estimated weighting, uses the data to choose weights.
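As an illustration of the weighted p-value idea described above, the following is a minimal sketch of a weighted Bonferroni rule, not the authors' implementation. It assumes the common formulation in which hypothesis i is rejected when p_i <= w_i * alpha / m, with nonnegative weights rescaled to average 1. The function name `weighted_bonferroni` and the example numbers are hypothetical.

```python
def weighted_bonferroni(p_values, weights, alpha=0.05):
    """Return a list of booleans, True where the hypothesis is rejected.

    Assumed rule: reject hypothesis i when p_i <= w_i * alpha / m,
    where the weights are nonnegative and rescaled to average 1.
    """
    m = len(p_values)
    if len(weights) != m:
        raise ValueError("p_values and weights must have the same length")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be nonnegative")
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one weight must be positive")

    # Rescale so the weights average 1; this keeps the sum of the
    # per-hypothesis thresholds equal to alpha.
    scaled = [w * m / total for w in weights]
    threshold = alpha / m

    # A zero weight means the hypothesis is never rejected.
    return [w > 0 and p <= threshold * w for p, w in zip(p_values, scaled)]


# Hypothetical p-values; equal weights reduce to the ordinary Bonferroni rule.
p = [0.004, 0.02, 0.03, 0.5]
unweighted = weighted_bonferroni(p, [1, 1, 1, 1])
# Up-weighting the second hypothesis relaxes its threshold to 0.025.
weighted = weighted_bonferroni(p, [1, 2, 0.5, 0.5])
```

In this toy example the second hypothesis (p = 0.02) is missed by the unweighted rule, whose threshold is 0.0125, but is rejected once its weight is doubled. The cost is a stricter threshold for the down-weighted hypotheses.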

Article information

Statist. Sci. Volume 24, Number 4 (2009), 398-413.

First available in Project Euclid: 20 April 2010


Keywords: Bonferroni correction; multiple testing; weighted p-values


Roeder, Kathryn; Wasserman, Larry. Genome-Wide Significance Levels and Weighted Hypothesis Testing. Statist. Sci. 24 (2009), no. 4, 398-413. doi:10.1214/09-STS289.


