Institute of Mathematical Statistics Collections

A comparison of the Benjamini-Hochberg procedure with some Bayesian rules for multiple testing

Małgorzata Bogdan, Jayanta K. Ghosh, Surya T. Tokdar

Abstract

In the spirit of modeling inference for microarrays as multiple testing for sparse mixtures, we present a similar approach to a simplified version of quantitative trait loci (QTL) mapping. Unlike in the case of microarrays, where the number of tests usually reaches tens of thousands, the number of tests performed in scans for QTL usually does not exceed several hundred. However, in typical cases, the sparsity p of significant alternatives for QTL mapping is in the same range as for microarrays. For methodological interest, as well as for some related applications, we also consider non-sparse mixtures. Using simulations as well as theoretical observations, we study the false discovery rate (FDR), power and misclassification probability of the Benjamini-Hochberg (BH) procedure and its modifications, as well as of various parametric and nonparametric Bayes and parametric empirical Bayes procedures. Our results confirm the observation of Genovese and Wasserman (2002) that for small p the misclassification error of BH is close to optimal, in the sense of attaining the Bayes oracle. This property is shared by some of the considered Bayes testing rules, which in general perform better than BH for large or moderate values of p.
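To make the compared quantities concrete, the following Python sketch (standard library only) implements the BH step-up rule and a Bayes oracle, and estimates their misclassification rates on one simulated data set. The model is an assumption chosen for illustration, not necessarily the exact setting of the paper: each statistic is X_i ~ N(mu_i, 1), with mu_i = 0 under the null and, with probability p, mu_i ~ N(0, tau^2) under the alternative, so that marginally X_i ~ (1 - p) N(0, 1) + p N(0, 1 + tau^2). Assuming equal losses for false and missed discoveries, the oracle (which knows p and tau) rejects H_0i exactly when p f_1(x) > (1 - p) f_0(x). Solving this inequality gives the cutoff |x| > c with c^2 = ((1 + tau^2)/tau^2) [2 log((1 - p)/p) + log(1 + tau^2)]. The function names and default parameter values are hypothetical.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # N(0, 1): null density f0, also used for p-values


def benjamini_hochberg(pvalues, alpha=0.05):
    """Step-up BH rule: reject the k smallest p-values, where k is the
    largest rank with p_(k) <= k * alpha / m. Returns a list of booleans."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * alpha / m:
            k = rank  # keep the largest passing rank (step-up, not step-down)
    rejected = [False] * m
    for i in order[:k]:
        rejected[i] = True
    return rejected


def oracle_threshold(p, tau):
    """|x| cutoff of the Bayes oracle under 0-1 loss for the ASSUMED model
    X ~ (1 - p) N(0, 1) + p N(0, 1 + tau^2): reject H0 when
    p * f1(x) > (1 - p) * f0(x). Returns 0.0 when H1 is favoured everywhere."""
    s = 1.0 + tau ** 2
    c2 = s / tau ** 2 * (2.0 * math.log((1.0 - p) / p) + math.log(s))
    return math.sqrt(c2) if c2 > 0 else 0.0


def posterior_alt(x, p, tau):
    """Posterior probability that x came from the alternative component."""
    f0 = STD_NORMAL.pdf(x)
    f1 = NormalDist(0.0, math.sqrt(1.0 + tau ** 2)).pdf(x)
    return p * f1 / (p * f1 + (1.0 - p) * f0)


def simulate(m=500, p=0.01, tau=3.0, alpha=0.05, seed=1):
    """Draw one data set and return the misclassification rates (fraction
    of tests with a wrong decision) of BH and of the oracle."""
    rng = random.Random(seed)
    is_alt = [rng.random() < p for _ in range(m)]
    # Under H1 the mean is drawn from N(0, tau^2), then unit-variance noise is added.
    x = [rng.gauss(0.0, tau) + rng.gauss(0.0, 1.0) if a else rng.gauss(0.0, 1.0)
         for a in is_alt]
    pvals = [2.0 * (1.0 - STD_NORMAL.cdf(abs(xi))) for xi in x]  # two-sided
    bh = benjamini_hochberg(pvals, alpha)
    c = oracle_threshold(p, tau)
    oracle = [abs(xi) > c for xi in x]

    def error_rate(decisions):
        return sum(d != a for d, a in zip(decisions, is_alt)) / m

    return error_rate(bh), error_rate(oracle)


if __name__ == "__main__":
    bh_err, oracle_err = simulate()
    print(f"BH: {bh_err:.4f}  oracle: {oracle_err:.4f}")
```

A single run like this is only illustrative; averaging `simulate` over many seeds and over a grid of `p` values would give Monte Carlo estimates of the kind of comparison the abstract describes.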

Primary Subjects: 62C10
Secondary Subjects: 62C12
Keywords: Bayesian multiple testing; empirical Bayes; nonparametric Bayes
Full-text: Open access
Links and Identifiers

Permanent link to this document: http://projecteuclid.org/euclid.imsc/1207058275
Digital Object Identifier: doi:10.1214/193940307000000158

References

[1] Benjamini, Y. and Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. Roy. Statist. Soc. Ser. B 57 289–300.
Mathematical Reviews (MathSciNet): MR1325392
[2] Benjamini, Y. and Hochberg, Y. (2000). On the adaptive control of the false discovery rate in multiple testing with independent statistics. J. Educ. Behav. Stat. 25 60–83.
[3] Benjamini, Y. and Yekutieli, D. (2001). The control of the false discovery rate in multiple testing under dependency. Ann. Statist. 29 1165–1188.
Mathematical Reviews (MathSciNet): MR1869245
Zentralblatt MATH: 1041.62061
Digital Object Identifier: doi:10.1214/aos/1013699998
Project Euclid: euclid.aos/1013699998
[4] Bogdan, M., Ghosh, J. K. and Doerge, R. W. (2004). Modifying the Schwarz Bayesian Information Criterion to locate multiple interacting quantitative trait loci. Genetics 167 989–999.
[5] Chen, J. and Sarkar, S. K. (2004). Multiple testing of response rates with a control: A Bayesian stepwise approach. J. Statist. Plann. Inference 125 3–16.
Mathematical Reviews (MathSciNet): MR2086885
Zentralblatt MATH: 1096.62022
Digital Object Identifier: doi:10.1016/j.jspi.2003.05.001
[6] Donoho, D. and Jin, J. (2004). Higher criticism for detecting sparse heterogeneous mixtures. Ann. Statist. 32 962–994.
Mathematical Reviews (MathSciNet): MR2065195
Zentralblatt MATH: 1092.62051
Digital Object Identifier: doi:10.1214/009053604000000265
Project Euclid: euclid.aos/1085408492
[7] Efron, B. and Tibshirani, R. (2002). Empirical Bayes methods and false discovery rates for microarrays. Genetic Epidemiology 23 70–86.
[8] Efron, B., Tibshirani, R., Storey, J. D. and Tusher, V. (2001). Empirical Bayes analysis of a microarray experiment. J. Amer. Statist. Assoc. 96 1151–1160.
Mathematical Reviews (MathSciNet): MR1946571
Zentralblatt MATH: 1073.62511
Digital Object Identifier: doi:10.1198/016214501753382129
[9] Elmore, R., Hall, P. and Neeman, A. (2005). An application of classical invariant theory to identifiability in nonparametric mixtures. Ann. Inst. Fourier (Grenoble) 55 1–28.
Mathematical Reviews (MathSciNet): MR2141286
[10] Escobar, M. D. and West, M. (1995). Bayesian density estimation and inference using mixtures. J. Amer. Statist. Assoc. 90 577–588.
Mathematical Reviews (MathSciNet): MR1340510
Zentralblatt MATH: 0826.62021
Digital Object Identifier: doi:10.2307/2291069
[11] Ferguson, T. S. (1973). A Bayesian analysis of some nonparametric problems. Ann. Statist. 1 209–230.
Mathematical Reviews (MathSciNet): MR350949
Zentralblatt MATH: 0255.62037
Digital Object Identifier: doi:10.1214/aos/1176342360
Project Euclid: euclid.aos/1176342360
[12] Genovese, C. and Wasserman, L. (2002). Operating characteristics and extensions of the false discovery rate procedure. J. R. Stat. Soc. Ser. B Stat. Methodol. 64 499–517.
Mathematical Reviews (MathSciNet): MR1924303
Zentralblatt MATH: 1090.62072
Digital Object Identifier: doi:10.1111/1467-9868.00347
[13] Genovese, C. and Wasserman, L. (2004). A stochastic process approach to false discovery control. Ann. Statist. 32 1035–1061.
Mathematical Reviews (MathSciNet): MR2065197
Zentralblatt MATH: 1092.62065
Digital Object Identifier: doi:10.1214/009053604000000283
Project Euclid: euclid.aos/1085408494
[14] Ghosh, J. K. and Ramamoorthi, R. V. (2003). Bayesian Nonparametrics. Springer, New York.
Mathematical Reviews (MathSciNet): MR1992245
[15] Ghosh, J. K. and Sen, P. K. (1985). On the asymptotic performance of the log likelihood ratio statistic for the mixture model and related results. In Proceedings of the Berkeley Conference in Honor of Jerzy Neyman and Jack Kiefer II (Berkeley, Calif., 1983) 789–806. Wadsworth, Belmont, CA.
Mathematical Reviews (MathSciNet): MR822065
[16] Holm, S. (1979). A simple sequentially rejective multiple test procedure. Scand. J. Statist. 6 65–70.
Mathematical Reviews (MathSciNet): MR538597
[17] Lehmann, E. L. and Romano, J. P. (2005). Generalizations of the familywise error rate. Ann. Statist. 33 1138–1154.
Mathematical Reviews (MathSciNet): MR2195631
Zentralblatt MATH: 1072.62060
Digital Object Identifier: doi:10.1214/009053605000000084
Project Euclid: euclid.aos/1120224098
[18] Meinshausen, N. and Rice, J. (2006). Estimating the proportion of false null hypotheses among a large number of independently tested hypotheses. Ann. Statist. 34 373–393.
Mathematical Reviews (MathSciNet): MR2275246
Zentralblatt MATH: 1091.62059
Digital Object Identifier: doi:10.1214/009053605000000741
Project Euclid: euclid.aos/1146576267
[19] Müller, P., Parmigiani, G., Robert, C. and Rousseau, J. (2004). Optimal sample size for multiple testing: The case of gene expression microarrays. J. Amer. Statist. Assoc. 99 990–1001.
[20] Müller, P., Parmigiani, G. and Rice, K. (2007). FDR and Bayesian multiple comparison. In Bayesian Statistics 8 (J. M. Bernardo, M. J. Bayarri, J. O. Berger, A. P. Dawid, D. Heckerman, A. F. M. Smith and M. West, eds.). Oxford Univ. Press.
[21] Newton, M. A. (2002). On a nonparametric recursive estimator of the mixing distribution. Sankhyā Ser. A 64 306–322.
Mathematical Reviews (MathSciNet): MR1981761
[22] Otto, S. P. and Jones, C. D. (2000). Detecting the undetected: Estimating the total number of loci underlying a quantitative trait. Genetics 156 2093–2107.
[23] Sarkar, S. K. (2002). Some results on false discovery rate in stepwise multiple testing procedures. Ann. Statist. 30 239–257.
Mathematical Reviews (MathSciNet): MR1892663
Zentralblatt MATH: 1101.62349
Digital Object Identifier: doi:10.1214/aos/1015362192
Project Euclid: euclid.aos/1015362192
[24] Sarkar, S. K. (2006). False discovery and false nondiscovery rates in single-step multiple testing procedures. Ann. Statist. 34 394–415.
Mathematical Reviews (MathSciNet): MR2275247
Zentralblatt MATH: 1091.62060
Digital Object Identifier: doi:10.1214/009053605000000778
Project Euclid: euclid.aos/1146576268
[25] Scott, J. G. and Berger, J. O. (2006). An exploration of aspects of Bayesian multiple testing. J. Statist. Plann. Inference 136 2144–2162.
Mathematical Reviews (MathSciNet): MR2235051
Zentralblatt MATH: 1087.62039
Digital Object Identifier: doi:10.1016/j.jspi.2005.08.031
[26] Seeger, P. (1968). A note on a method for the analysis of significance en masse. Technometrics 10 586–593.
[27] Simes, R. J. (1986). An improved Bonferroni procedure for multiple tests of significance. Biometrika 73 751–754.
Mathematical Reviews (MathSciNet): MR897872
Zentralblatt MATH: 0613.62067
Digital Object Identifier: doi:10.1093/biomet/73.3.751
[28] Sorić, B. (1989). Statistical “discoveries” and effect-size estimation. J. Amer. Statist. Assoc. 84 608–610.
[29] Storey, J. D. (2002). A direct approach to false discovery rates. J. R. Stat. Soc. Ser. B Stat. Methodol. 64 479–498.
Mathematical Reviews (MathSciNet): MR1924302
Zentralblatt MATH: 1090.62073
Digital Object Identifier: doi:10.1111/1467-9868.00346
[30] Storey, J. D. (2003). The positive false discovery rate: A Bayesian interpretation and the q-value. Ann. Statist. 31 2013–2035.
Mathematical Reviews (MathSciNet): MR2036398
Zentralblatt MATH: 1042.62026
Digital Object Identifier: doi:10.1214/aos/1074290335
Project Euclid: euclid.aos/1074290335
[31] Storey, J. D. (2007). The optimal discovery procedure: A new approach to simultaneous significance testing. J. R. Stat. Soc. Ser. B Stat. Methodol. 69 347–368.
Mathematical Reviews (MathSciNet): MR2323757
Digital Object Identifier: doi:10.1111/j.1467-9868.2007.005592.x
[32] Storey, J. D., Taylor, J. E. and Siegmund, D. (2004). Strong control, conservative point estimation and simultaneous conservative consistency of false discovery rates: A unified approach. J. R. Stat. Soc. Ser. B Stat. Methodol. 66 187–205.
Mathematical Reviews (MathSciNet): MR2035766
Zentralblatt MATH: 1061.62110
Digital Object Identifier: doi:10.1111/j.1467-9868.2004.00439.x
[33] Yi, N. (2004). A unified Markov chain Monte Carlo framework for mapping multiple quantitative trait loci. Genetics 167 967–975.

2012 © Institute of Mathematical Statistics