The Annals of Statistics

On false discovery rate thresholding for classification under sparsity

Pierre Neuvial and Etienne Roquain

Full-text: Open access

Abstract

We study the properties of false discovery rate (FDR) thresholding, viewed as a classification procedure. The “$0$”-class (null) is assumed to have a known density, while the “$1$”-class (alternative) is obtained from the “$0$”-class either by translation or by scaling. Furthermore, the “$1$”-class is assumed to contain a small number of elements relative to the “$0$”-class (sparsity). We focus on densities of the Subbotin family, which includes the Gaussian and Laplace models. Nonasymptotic oracle inequalities are derived for the excess risk of FDR thresholding. These inequalities lead to explicit rates of convergence of the excess risk to zero as the number $m$ of items to be classified tends to infinity, in a regime where the power of the Bayes rule is bounded away from $0$ and $1$. Moreover, these theoretical investigations suggest an explicit choice of the target level $\alpha_{m}$ of FDR thresholding as a function of $m$. Our oracle inequalities show that the resulting FDR thresholding adapts to the unknown sparsity regime of the data. This property is illustrated with numerical experiments.
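To make the classification view of FDR thresholding concrete, here is a minimal simulation sketch. It rests on several assumptions that are not taken from the paper: FDR thresholding is taken to be the Benjamini–Hochberg step-up rule applied to one-sided p-values; the null is N(0, 1) and the alternative its translation N(mu, 1) (the Subbotin family and the scaling model are not implemented); risk is the plain 0-1 misclassification rate; and alpha is a fixed, hypothetical level rather than the paper's choice of $\alpha_m$. All function names (`bh_reject`, `fdr_classify`, `bayes_threshold`, `simulate`) are hypothetical. The Bayes rule follows from comparing $\pi_1\phi(x-\mu)$ with $(1-\pi_1)\phi(x)$, which gives the cutoff $x > \mu/2 + \log((1-\pi_1)/\pi_1)/\mu$.

```python
import math
import random
from statistics import NormalDist

# Assumption for this sketch: the "0"-class (null) density is N(0, 1) and the
# "1"-class is its translation N(mu, 1). Scaling models and the wider Subbotin
# family are not implemented here.
STD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def bh_reject(pvalues, alpha):
    """Benjamini-Hochberg step-up rule: find the largest rank k with
    p_(k) <= alpha * k / m and label the k smallest p-values as class 1."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_hat = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= alpha * rank / m:
            k_hat = rank  # step-up: keep the largest rank that passes
    labels = [0] * m
    for i in order[:k_hat]:
        labels[i] = 1
    return labels


def fdr_classify(xs, alpha):
    """Hypothetical helper: one-sided p-values p_i = P(N(0,1) >= x_i), then BH."""
    pvalues = [STD_NORMAL.cdf(-x) for x in xs]  # upper tail, computed as Phi(-x)
    return bh_reject(pvalues, alpha)


def bayes_threshold(pi1, mu):
    """Bayes rule under 0-1 loss for (1 - pi1) N(0,1) + pi1 N(mu,1) with mu > 0:
    predict class 1 iff x > mu / 2 + log((1 - pi1) / pi1) / mu."""
    return mu / 2 + math.log((1 - pi1) / pi1) / mu


def simulate(m, pi1, mu, seed=0):
    """Draw m labels with P(class 1) = pi1, then one observation per label."""
    rng = random.Random(seed)
    truth = [1 if rng.random() < pi1 else 0 for _ in range(m)]
    xs = [rng.gauss(mu if label else 0.0, 1.0) for label in truth]
    return xs, truth


def misclassification_rate(pred, truth):
    """Fraction of items whose predicted class differs from the true class."""
    return sum(p != t for p, t in zip(pred, truth)) / len(truth)


if __name__ == "__main__":
    # Hypothetical settings: sparse alternative (1% of items), fixed FDR level.
    m, pi1, mu, alpha = 10_000, 0.01, 3.0, 0.2
    xs, truth = simulate(m, pi1, mu)
    fdr_pred = fdr_classify(xs, alpha)
    t_bayes = bayes_threshold(pi1, mu)
    bayes_pred = [1 if x > t_bayes else 0 for x in xs]
    print(f"FDR thresholding risk: {misclassification_rate(fdr_pred, truth):.4f}")
    print(f"Bayes rule risk:       {misclassification_rate(bayes_pred, truth):.4f}")
```

Note the asymmetry in what each rule needs: the Bayes threshold uses the true `pi1` and `mu`, whereas the FDR rule in this sketch uses only the null distribution and a level `alpha`. Varying `pi1` and `alpha` in the script is one informal way to explore how the gap between the two risks behaves.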

Article information

Source
Ann. Statist. Volume 40, Number 5 (2012), 2572–2600.

Dates
First available in Project Euclid: 4 February 2013

Permanent link to this document
https://projecteuclid.org/euclid.aos/1359987531

Digital Object Identifier
doi:10.1214/12-AOS1042

Mathematical Reviews number (MathSciNet)
MR3097613

Zentralblatt MATH identifier
06344386

Subjects
Primary: 62H30: Classification and discrimination; cluster analysis [See also 68T10, 91C20]
Secondary: 62H15: Hypothesis testing

Keywords
False discovery rate; sparsity; classification; multiple testing; Bayes’ rule; adaptive procedure; oracle inequality

Citation

Neuvial, Pierre; Roquain, Etienne. On false discovery rate thresholding for classification under sparsity. Ann. Statist. 40 (2012), no. 5, 2572–2600. doi:10.1214/12-AOS1042. https://projecteuclid.org/euclid.aos/1359987531




Supplemental materials

  • Supplementary material: Supplement to “On false discovery rate thresholding for classification under sparsity” (doi:10.1214/12-AOS1042SUPP). Proofs, additional experiments and supplementary notes for the present paper.