## The Annals of Applied Statistics

### Testing the disjunction hypothesis using Voronoi diagrams with applications to genetics

#### Abstract

Testing of the disjunction hypothesis is appropriate when each gene or location studied is associated with multiple $p$-values, each of which is of individual interest. This can occur when more than one aspect of an underlying process is measured. For example, cancer researchers may hope to detect genes that are both differentially expressed on a transcriptomic level and show evidence of copy number aberration. Currently used methods of $p$-value combination for this setting are overly conservative, resulting in very low power for detection. In this work, we introduce a method to test the disjunction hypothesis by using cumulative areas from the Voronoi diagram of two-dimensional vectors of $p$-values. Our method offers much improved power over existing methods, even in challenging situations, while maintaining appropriate error control. We apply the approach to data from two published studies: the first aims to detect periodic genes of the organism *Schizosaccharomyces pombe*, and the second aims to identify genes associated with prostate cancer.
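The abstract names the geometric ingredient (areas of Voronoi cells built on two-dimensional vectors of $p$-values) but does not spell out the construction. The following is only a rough sketch of that ingredient, not the authors' procedure: it approximates the area of each Voronoi cell inside the unit square by assigning lattice points to their nearest $p$-value vector. The function name `voronoi_cell_areas`, the `grid` parameter, and the simulated $p$-values are all assumptions made for illustration; how the paper accumulates and thresholds these areas is not reproduced here.

```python
import random


def voronoi_cell_areas(points, grid=200):
    """Approximate the area of each point's Voronoi cell within the unit square.

    The square is covered by a grid x grid lattice of cell centres. Each centre
    is assigned to its nearest point (squared Euclidean distance), and a point's
    area is the share of lattice centres assigned to it.
    """
    counts = [0] * len(points)
    for i in range(grid):
        x = (i + 0.5) / grid
        for j in range(grid):
            y = (j + 0.5) / grid
            # Index of the p-value vector closest to this lattice centre.
            nearest = min(
                range(len(points)),
                key=lambda k: (points[k][0] - x) ** 2 + (points[k][1] - y) ** 2,
            )
            counts[nearest] += 1
    total = grid * grid
    return [c / total for c in counts]


# Hypothetical data: 20 genes, each with a pair of p-values drawn uniformly,
# standing in for (expression p-value, copy-number p-value).
rng = random.Random(0)
p_pairs = [(rng.random(), rng.random()) for _ in range(20)]
areas = voronoi_cell_areas(p_pairs, grid=100)

# The cells tile the unit square, so the approximate areas sum to 1.
print(round(sum(areas), 6))
```

A lattice approximation is used here only to keep the sketch dependency-free. Supplementary material B provides the authors' own R code, and the reference list cites the `deldir` package for Delaunay triangulation and Voronoi tessellation.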

#### Article information

Source
Ann. Appl. Stat., Volume 8, Number 2 (2014), 801–823.

Dates
First available in Project Euclid: 1 July 2014

Permanent link to this document
https://projecteuclid.org/euclid.aoas/1404229515

Digital Object Identifier
doi:10.1214/13-AOAS707

Mathematical Reviews number (MathSciNet)
MR3262535

Zentralblatt MATH identifier
06333777

#### Citation

Phillips, Daisy; Ghosh, Debashis. Testing the disjunction hypothesis using Voronoi diagrams with applications to genetics. Ann. Appl. Stat. 8 (2014), no. 2, 801–823. doi:10.1214/13-AOAS707. https://projecteuclid.org/euclid.aoas/1404229515

#### Supplemental materials

• Supplementary material A: Summarized results of additional simulation studies. We present the summarized results of the proposed procedure’s performance in the challenging situations described in Section 7. Results include estimated FDR and 1-NDR for each of the two settings.
• Supplementary material B: Supplementary code and data. R code including the functions required to perform the procedure described in this paper, to replicate the described simulation studies and to perform the described data analysis. The relevant data sets are also included.