The Annals of Statistics

Single-index modulated multiple testing

Lilun Du and Chunming Zhang

Full-text: Open access

Abstract

In large-scale multiple testing, hypotheses are often accompanied by prior information. In this paper, we present a single-index modulated (SIM) multiple testing procedure, which maintains control of the false discovery rate while incorporating prior information. The procedure assumes that a bivariate $p$-value, $(p_{1},p_{2})$, is available for each hypothesis, where $p_{1}$ is a preliminary $p$-value from prior information and $p_{2}$ is the primary $p$-value for the ultimate analysis. To find the optimal rejection region for the bivariate $p$-value, we propose a criterion based on the ratio of the probability density functions of $(p_{1},p_{2})$ under the true null and the nonnull. In the bivariate normal setting, this criterion further motivates us to project the bivariate $p$-value onto a single index, $p(\theta)$, for a wide range of directions $\theta$. The true null distribution of $p(\theta)$ is estimated via parametric and nonparametric approaches, leading to two procedures for estimating and controlling the false discovery rate. To derive the optimal projection direction $\theta$, we propose a new approach based on power comparison, which is shown to be consistent under some mild conditions. Simulation evaluations indicate that the SIM multiple testing procedure improves the detection power significantly while controlling the false discovery rate. The procedure is illustrated with an analysis of a real dataset.
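The single-index idea can be illustrated with a minimal sketch. It assumes that the projection takes the form $p(\theta)=\Phi\{\cos(\theta)\Phi^{-1}(p_{1})+\sin(\theta)\Phi^{-1}(p_{2})\}$, and that $p_{1}$ and $p_{2}$ are independent and uniform under the true null, so that $p(\theta)$ is itself uniform and the Benjamini-Hochberg step-up rule can be applied directly. Both are simplifying assumptions made here for illustration: the abstract states that the true null distribution of $p(\theta)$ is estimated and that $\theta$ is chosen by power comparison, and neither step is reproduced below. The function names are hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def single_index_pvalue(p1, p2, theta):
    """Project (p1, p2) onto direction theta on the normal-quantile scale.

    Assumed form (not taken from the abstract):
        p(theta) = Phi(cos(theta) * Phi^{-1}(p1) + sin(theta) * Phi^{-1}(p2)).
    Both p-values must lie strictly between 0 and 1.
    """
    z1 = _STD_NORMAL.inv_cdf(p1)
    z2 = _STD_NORMAL.inv_cdf(p2)
    return _STD_NORMAL.cdf(math.cos(theta) * z1 + math.sin(theta) * z2)


def benjamini_hochberg(pvalues, alpha=0.05):
    """Return the sorted indices rejected by the BH step-up procedure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0  # largest rank whose p-value falls under its BH threshold
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= alpha * rank / m:
            k = rank
    return sorted(order[:k])


def sim_reject(pairs, theta, alpha=0.05):
    """Apply BH to the projected p-values, assuming a uniform null for p(theta)."""
    projected = [single_index_pvalue(p1, p2, theta) for p1, p2 in pairs]
    return benjamini_hochberg(projected, alpha)
```

With $\theta=\pi/2$ the sketch ignores the preliminary $p$-value and reduces to testing with $p_{2}$ alone, while intermediate directions let supporting prior evidence in $p_{1}$ shrink the projected $p$-value.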

Article information

Source
Ann. Statist., Volume 42, Number 4 (2014), 1262-1311.

Dates
First available in Project Euclid: 25 June 2014

Permanent link to this document
https://projecteuclid.org/euclid.aos/1403715201

Digital Object Identifier
doi:10.1214/14-AOS1222

Mathematical Reviews number (MathSciNet)
MR3226157

Zentralblatt MATH identifier
1297.62217

Subjects
Primary: 62P10: Applications to biology and medical sciences
Secondary: 62G10: Hypothesis testing; 62H15: Hypothesis testing

Keywords
Bivariate normality; local false discovery rate; multiple comparison; $p$-value; simultaneous inference; symmetry property

Citation

Du, Lilun; Zhang, Chunming. Single-index modulated multiple testing. Ann. Statist. 42 (2014), no. 4, 1262--1311. doi:10.1214/14-AOS1222. https://projecteuclid.org/euclid.aos/1403715201


