Electronic Journal of Statistics

Some optimality properties of FDR controlling rules under sparsity

Florian Frommlet and Małgorzata Bogdan

Full-text: Open access

Abstract

False Discovery Rate (FDR) and the Bayes risk are two different statistical measures that can be used to evaluate and compare multiple testing procedures. Recent results show that, under sparsity, FDR controlling procedures such as the popular Benjamini-Hochberg (BH) procedure also perform very well in terms of the Bayes risk. In particular, asymptotic Bayes optimality under sparsity (ABOS) of BH was previously shown for location and scale models based on log-concave densities. This article extends previous work to a substantially larger set of distributions of effect sizes under the alternative, where the alternative distribution of true signals does not change with the number of tests $m$, while the sample size $n$ increases slowly. ABOS is proved for BH and for the corresponding step-down procedure, both based on FDR levels proportional to $n^{-1/2}$. A simulation study shows that these asymptotic results are already relevant for relatively small values of $m$ and $n$. Apart from showing asymptotic optimality of BH, our results on the optimal FDR level provide a natural extension of the well-known results on the significance levels of Bayesian tests.
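The two procedures named in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: it applies the BH step-up rule and a step-down rule to a vector of p-values, assuming the step-down rule uses the same BH critical values $iq/m$. The abstract states only that the FDR level is proportional to $n^{-1/2}$, so the helper `fdr_level(n, c)` and its constant `c` are hypothetical choices for illustration.

```python
import numpy as np


def fdr_level(n, c):
    """Hypothetical FDR level q_n = c / sqrt(n); the constant c is an assumption."""
    return c / np.sqrt(n)


def bh_step_up(pvals, q):
    """BH step-up: reject the hypotheses with the k smallest p-values, where k is
    the LARGEST i such that p_(i) <= i*q/m. Returns a mask in the input order."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)                      # indices that sort the p-values
    crit = q * np.arange(1, m + 1) / m         # BH critical values i*q/m
    below = p[order] <= crit
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()         # 0-based position of largest such i
        reject[order[: k + 1]] = True
    return reject


def bh_step_down(pvals, q):
    """Step-down with the same critical values (assumed): reject the k smallest
    p-values, where k is the largest i with p_(j) <= j*q/m for ALL j <= i."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    crit = q * np.arange(1, m + 1) / m
    above = p[order] > crit
    # Stop at the first sorted p-value exceeding its critical value.
    k = int(np.argmax(above)) if above.any() else m
    reject = np.zeros(m, dtype=bool)
    reject[order[:k]] = True
    return reject


if __name__ == "__main__":
    q = fdr_level(n=100, c=0.5)                # q = 0.05 under these assumptions
    p = [0.02, 0.03, 0.035, 0.04]
    print(bh_step_up(p, q))                    # step-up rejects all four
    print(bh_step_down(p, q))                  # step-down rejects none
```

The example p-values are chosen so that the two rules disagree: the smallest p-value (0.02) already exceeds its critical value 0.0125, which stops the step-down rule immediately, while the step-up rule looks for the largest index that falls below its critical value and so rejects all four hypotheses.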

Article information

Source
Electron. J. Statist., Volume 7 (2013), 1328-1368.

Dates
First available in Project Euclid: 10 May 2013

Permanent link to this document
https://projecteuclid.org/euclid.ejs/1368193534

Digital Object Identifier
doi:10.1214/13-EJS808

Mathematical Reviews number (MathSciNet)
MR3063610

Zentralblatt MATH identifier
1337.62184

Keywords
Asymptotic optimality, Bayes risk, false discovery rate, multiple testing, two groups model

Citation

Frommlet, Florian; Bogdan, Małgorzata. Some optimality properties of FDR controlling rules under sparsity. Electron. J. Statist. 7 (2013), 1328–1368. doi:10.1214/13-EJS808. https://projecteuclid.org/euclid.ejs/1368193534


