## Electronic Journal of Statistics

### Estimation of false discovery proportion in multiple testing: From normal to chi-squared test statistics

#### Abstract

Multiple testing based on chi-squared test statistics is common in many scientific fields such as genomics research and brain imaging studies. However, the challenges of designing a formal testing procedure when there exists a general dependence structure across the chi-squared test statistics have not been well addressed. To address this gap, we first adopt a latent factor structure ([14]) to construct a testing framework for approximating the false discovery proportion ($\mathrm{FDP}$) for a large number of highly correlated chi-squared test statistics with a finite number of degrees of freedom $k$. The testing framework is then used to simultaneously test $k$ linear constraints in a large dimensional linear factor model with some observable and unobservable common factors; the result is a consistent estimator of the $\mathrm{FDP}$ based on the associated factor-adjusted $p$-values. The practical utility of the method is investigated through extensive simulation studies and an analysis of batch effects in a gene expression study.
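
To make the quantity being estimated concrete, the following is a minimal illustrative sketch, not the authors' procedure. It assumes $k = 2$ degrees of freedom (so the chi-squared tail probability has the closed form $e^{-x/2}$), a single shared latent factor per coordinate, and hypothetical values for the loading, signal shift, and threshold $t$. It computes the realized $\mathrm{FDP}(t) = V(t)/\max\{R(t), 1\}$, where $R(t)$ counts rejections and $V(t)$ counts false rejections, from unadjusted $p$-values of correlated chi-squared statistics. The function names `chi2_sf_df2`, `realized_fdp`, and `simulate` are made up for this example.

```python
import math
import random


def chi2_sf_df2(x):
    """Tail probability P(X > x) for a chi-squared variable with 2 degrees of freedom."""
    return math.exp(-x / 2.0)


def realized_fdp(p_values, is_null, t):
    """Realized FDP at threshold t: false rejections over max(rejections, 1)."""
    rejected = [p <= t for p in p_values]
    n_rejected = sum(rejected)
    n_false = sum(1 for rej, null in zip(rejected, is_null) if rej and null)
    return n_false / max(n_rejected, 1)


def simulate(n_tests=2000, n_signal=100, loading=0.6, shift=3.0, seed=0):
    """Draw correlated chi-squared(2) statistics under an assumed one-factor model.

    All parameter values are hypothetical. Each statistic is z1^2 + z2^2, where
    every z shares a common latent factor with the other tests, so the
    statistics are dependent while each null z remains marginally N(0, 1).
    """
    rng = random.Random(seed)
    factors = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]  # one factor per coordinate
    noise_sd = math.sqrt(1.0 - loading ** 2)
    p_values, is_null = [], []
    for i in range(n_tests):
        null = i >= n_signal  # the first n_signal tests carry a true signal
        mean = 0.0 if null else shift
        z = [mean + loading * f + noise_sd * rng.gauss(0.0, 1.0) for f in factors]
        p_values.append(chi2_sf_df2(z[0] ** 2 + z[1] ** 2))
        is_null.append(null)
    return p_values, is_null


p_values, is_null = simulate()
print(realized_fdp(p_values, is_null, t=0.01))
```

Because the shared factor changes from one seed to the next, the realized FDP at a fixed threshold can vary noticeably across runs of this sketch, which is the kind of dependence-induced variability that motivates estimating the FDP rather than relying only on its expectation.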

#### Article information

Source
Electron. J. Statist., Volume 11, Number 1 (2017), 1048-1091.

Dates
First available in Project Euclid: 31 March 2017

https://projecteuclid.org/euclid.ejs/1490925658

Digital Object Identifier
doi:10.1214/17-EJS1256

Mathematical Reviews number (MathSciNet)
MR3630301

Zentralblatt MATH identifier
1359.62196

#### Citation

Du, Lilun; Zhang, Chunming. Estimation of false discovery proportion in multiple testing: From normal to chi-squared test statistics. Electron. J. Statist. 11 (2017), no. 1, 1048–1091. doi:10.1214/17-EJS1256. https://projecteuclid.org/euclid.ejs/1490925658

#### References

• [1] Ahn, S. C., and Horenstein, A. R. (2013). Eigenvalue ratio test for the number of factors. Econometrica, 81, 1203–1227.
• [2] Azriel, D., and Schwartzman, A. (2015). The empirical distribution of a large number of correlated normal variables. J. Am. Statist. Ass., 110, 1217–1228.
• [3] Bai, J. (2003). Inferential theory for factor models of large dimensions. Econometrica, 71, 135–171.
• [4] Benjamini, Y., and Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Statist. Soc. B, 57, 289–300.
• [5] Benjamini, Y., and Yekutieli, D. (2001). The control of false discovery rate in multiple testing under dependency. Ann. Statist., 29, 1165–1188.
• [6] Bourgon, R., Gentleman, R., and Huber, W. (2010). Independent filtering increases detection power for high-throughput experiments. Proc. Natl. Acad. Sci., 107, 9546–9551.
• [7] Clarke, S., and Hall, P. (2009). Robustness of multiple testing procedures against dependence. Ann. Statist., 37, 332–358.
• [8] Desai, K. H., and Storey, J. D. (2012). Cross-dimensional inference of dependent high-dimensional data. J. Am. Statist. Ass., 107, 135–151.
• [9] Edgar, R., Domrachev, M., and Lash, A. E. (2002). Gene Expression Omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Research, 30, 207–210.
• [10] Efron, B. (2007). Correlation and large-scale simultaneous significance testing. J. Am. Statist. Ass., 102, 93–103.
• [11] Efron, B. (2010). Correlated z-values and the accuracy of large-scale statistical estimates. J. Am. Statist. Ass., 105, 1042–1055.
• [12] Fan, J., and Han, X. (2017). Estimation of false discovery proportion with unknown dependence. J. R. Statist. Soc. B, to appear.
• [13] Fan, J., Han, X., and Gu, W. (2012). Estimating false discovery proportion under arbitrary covariance dependence. J. Am. Statist. Ass., 107, 1019–1035.
• [14] Fan, J., Liao, Y., and Mincheva, M. (2013). Large covariance estimation by thresholding principal orthogonal complements. J. R. Statist. Soc. B, 75, 603–680.
• [15] Fan, J., Zhang, C. M., and Zhang, J. (2001). Generalized likelihood ratio statistics and Wilks phenomenon. Ann. Statist., 29, 153–193.
• [16] Friguet, C., Kloareg, M., and Causeur, D. (2009). A factor model approach to multiple testing under dependence. J. Am. Statist. Ass., 104, 1406–1415.
• [17] Gagnon-Bartsch, J. A., and Speed, T. P. (2012). Using control genes to correct for unwanted variation in microarray data. Biostatistics, 13, 539–552.
• [18] Idaghdour, Y., Storey, J. D., Jadallah, S., and Gibson, G. (2008). A genome-wide gene expression signature of environmental geography in leukocytes of Moroccan Amazighs. PLoS Genetics, 4, e1000052.
• [19] Leek, J. T., Scharpf, R. B., Bravo, H. C., Simcha, D., Langmead, B., Johnson, W. E., Geman, D., Baggerly, K., and Irizarry, R. A. (2010). Tackling the widespread and critical impact of batch effects in high-throughput data. Nat. Rev. Genet., 11, 733–739.
• [20] Leek, J. T., and Storey, J. D. (2008). A general framework for multiple testing dependence. Proc. Natl. Acad. Sci., 105, 18718–18723.
• [21] Lyons, R. (1988). Strong laws of large numbers for weakly correlated random variables. Michigan Mathematical Journal, 35, 353–359.
• [22] Owen, A. B. (2005). Variance of the number of false discoveries. J. R. Statist. Soc. B, 67, 411–426.
• [23] Sarkar, S. K. (2002). Some results on false discovery rate in stepwise multiple testing procedures. Ann. Statist., 30, 239–257.
• [24] Schwartzman, A. (2012). Comment: FDP vs FDR and the effect of conditioning. J. Am. Statist. Ass., 107, 1039–1041.
• [25] Schwartzman, A., and Lin, X. (2011). The effect of correlation in false discovery rate estimation. Biometrika, 98, 199–214.
• [26] Spielman, R. S., Bastone, L. A., Burdick, J. T., Morley, M., Ewens, W. J., and Cheung, V. G. (2007). Common genetic variants account for differences in gene expression among ethnic groups. Nature Genetics, 39, 226–231.
• [27] Storey, J. D., Taylor, J. E., and Siegmund, D. (2004). Strong control, conservative point estimation and simultaneous conservative consistency of false discovery rates: a unified approach. J. R. Statist. Soc. B, 66, 187–205.
• [28] Sun, Y., Zhang, N. R., and Owen, A. B. (2012). Multiple hypothesis testing adjusted for latent variables, with an application to the AGEMAP gene expression data. Ann. Appl. Statist., 6, 1664–1688.
• [29] Wang, H. (2012). Factor profiled independence screening. Biometrika, 99, 15–28.
• [30] Wu, M. C., Kraft, P., Epstein, M. P., Taylor, D. M., Chanock, S. J., Hunter, D. J., and Lin, X. (2010). Powerful SNP-set analysis for case-control genome-wide association studies. Am. J. Hum. Genet., 86, 929–942.
• [31] Wu, W. (2008). On false discovery control under dependence. Ann. Statist., 36, 364–380.
• [32] Zeggini, E., Weedon, M. N., Lindgren, C. M., Frayling, T. M., et al. (2007). Replication of genome-wide association signals in UK samples reveals risk loci for type 2 diabetes. Science, 316, 1336–1341.
• [33] Zhang, C. M., and Yu, T. (2008). Semiparametric detection of significant activation for brain fMRI. Ann. Statist., 36, 1693–1725.