The Annals of Statistics

Phase transition and regularized bootstrap in large-scale $t$-tests with false discovery rate control

Weidong Liu and Qi-Man Shao

Abstract

Applying the Benjamini and Hochberg (B–H) method to multiple Student’s $t$ tests is a popular technique for gene selection in microarray data analysis. Because the population is typically nonnormal, the true $p$-values of the hypothesis tests are usually unknown, so it is common to estimate them using the standard normal distribution $N(0,1)$, Student’s $t$ distribution $t_{n-1}$ or the bootstrap method. In this paper, we prove that when the population has a finite fourth moment and the dimension $m$ and the sample size $n$ satisfy $\log m=o(n^{1/3})$, the B–H method asymptotically controls the false discovery rate (FDR) and the false discovery proportion (FDP) at a given level $\alpha$ with $p$-values estimated from the $N(0,1)$ or $t_{n-1}$ distribution. However, a phase transition occurs when $\log m\geq c_{0}n^{1/3}$: in this case, the FDR and the FDP of the B–H method may exceed $\alpha$ or even converge to one. In contrast, the bootstrap calibration is accurate for $\log m=o(n^{1/2})$ as long as the underlying distribution has sub-Gaussian tails, and this light-tailed condition cannot generally be weakened. The simulation study shows that the bootstrap calibration is very conservative for heavy-tailed distributions. To solve this problem, a regularized bootstrap correction is proposed and is shown to be robust to the tails of the distributions. The simulation study shows that the regularized bootstrap method performs better than its usual counterpart.
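
For readers unfamiliar with the procedure the abstract refers to, the following is a minimal sketch of the B–H step-up rule applied to one-sample Student's $t$ statistics, with two-sided $p$-values estimated from $N(0,1)$. It is not the authors' code. The function names (`t_statistic`, `normal_two_sided_p`, `benjamini_hochberg`) are hypothetical, the two-sided one-sample setting is an assumption, and neither the bootstrap nor the regularized bootstrap calibration is implemented here.

```python
import math


def t_statistic(sample):
    """One-sample Student's t statistic for H0: mean = 0 (assumed setting)."""
    n = len(sample)
    mean = sum(sample) / n
    # Unbiased sample variance with n - 1 in the denominator.
    var = sum((x - mean) ** 2 for x in sample) / (n - 1)
    return math.sqrt(n) * mean / math.sqrt(var)


def normal_two_sided_p(t):
    """Estimated p-value P(|Z| >= |t|) for Z ~ N(0,1), via erfc."""
    return math.erfc(abs(t) / math.sqrt(2.0))


def benjamini_hochberg(p_values, alpha):
    """Return the sorted indices of hypotheses rejected by the B-H rule.

    With ordered p-values p_(1) <= ... <= p_(m), find the largest k such
    that p_(k) <= alpha * k / m and reject the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= alpha * rank / m:
            k = rank  # keep the largest rank that passes its threshold
    return sorted(order[:k])


# Small fixed example: thresholds are 0.01, 0.02, 0.03, 0.04, 0.05.
rejected = benjamini_hochberg([0.001, 0.5, 0.012, 0.04, 0.9], alpha=0.05)
print(rejected)  # indices 0 and 2 are rejected
```

In use, one would compute `t_statistic` for each of the $m$ samples, map the statistics through `normal_two_sided_p`, and pass the resulting list to `benjamini_hochberg`. Replacing `normal_two_sided_p` with a $t_{n-1}$ or bootstrap tail estimate changes only the calibration step, not the step-up rule.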

Article information

Source
Ann. Statist., Volume 42, Number 5 (2014), 2003–2025.

Dates
First available in Project Euclid: 11 September 2014

Permanent link to this document
https://projecteuclid.org/euclid.aos/1410440632

Digital Object Identifier
doi:10.1214/14-AOS1249

Mathematical Reviews number (MathSciNet)
MR3262475

Zentralblatt MATH identifier
1288.53027

Subjects
Primary: 62H15: Hypothesis testing

Keywords
Bootstrap correction; false discovery rate; multiple $t$-tests; phase transition

Citation

Liu, Weidong; Shao, Qi-Man. Phase transition and regularized bootstrap in large-scale $t$-tests with false discovery rate control. Ann. Statist. 42 (2014), no. 5, 2003–2025. doi:10.1214/14-AOS1249. https://projecteuclid.org/euclid.aos/1410440632


References

  • Benjamini, Y. and Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. R. Stat. Soc. Ser. B Stat. Methodol. 57 289–300.
  • Benjamini, Y. and Yekutieli, D. (2001). The control of the false discovery rate in multiple testing under dependency. Ann. Statist. 29 1165–1188.
  • Cao, H. and Kosorok, M. R. (2011). Simultaneous critical values for $t$-tests in very high dimensions. Bernoulli 17 347–394.
  • Delaigle, A., Hall, P. and Jin, J. (2011). Robustness and accuracy of methods for high dimensional data analysis based on Student’s $t$-statistic. J. R. Stat. Soc. Ser. B Stat. Methodol. 73 283–301.
  • Efron, B. (2004). Large-scale simultaneous hypothesis testing: The choice of a null hypothesis. J. Amer. Statist. Assoc. 99 96–104.
  • Fan, J., Hall, P. and Yao, Q. (2007). To how many simultaneous hypothesis tests can normal, Student’s $t$ or bootstrap calibration be applied? J. Amer. Statist. Assoc. 102 1282–1288.
  • Ferreira, J. A. and Zwinderman, A. H. (2006). On the Benjamini–Hochberg method. Ann. Statist. 34 1827–1849.
  • Liu, W. and Shao, Q. (2014). Supplement to “Phase transition and regularized bootstrap in large-scale $t$-tests with false discovery rate control.” DOI:10.1214/14-AOS1249SUPP.
  • Shao, Q.-M. (1999). A Cramér type large deviation result for Student’s $t$-statistic. J. Theoret. Probab. 12 385–398.
  • Storey, J. D. (2003). The positive false discovery rate: A Bayesian interpretation and the $q$-value. Ann. Statist. 31 2013–2035.
  • Storey, J. D., Taylor, J. E. and Siegmund, D. (2004). Strong control, conservative point estimation and simultaneous conservative consistency of false discovery rates: A unified approach. J. R. Stat. Soc. Ser. B Stat. Methodol. 66 187–205.
  • Wang, Q. (2005). Limit theorems for self-normalized large deviation. Electron. J. Probab. 10 1260–1285 (electronic).
  • Wu, W. B. (2008). On false discovery control under dependence. Ann. Statist. 36 364–380.

Supplemental materials

  • Supplementary material: Supplement to “Phase transition and regularized bootstrap in large-scale $t$-tests with false discovery rate control”. The supplementary material includes part of the numerical results and the proofs of Lemma 6.1 and Propositions 2.1, 2.2, 2.3 and 3.1.