Annals of Statistics

Confounder adjustment in multiple hypothesis testing

Jingshu Wang, Qingyuan Zhao, Trevor Hastie, and Art B. Owen

Full-text: Open access


We consider large-scale studies in which thousands of significance tests are performed simultaneously. In some of these studies, the multiple testing procedure can be severely biased by latent confounding factors such as batch effects and unmeasured covariates that correlate with both primary variable(s) of interest (e.g., treatment variable, phenotype) and the outcome. Over the past decade, many statistical methods have been proposed to adjust for the confounders in hypothesis testing. We unify these methods in the same framework, generalize them to include multiple primary variables and multiple nuisance variables, and analyze their statistical properties. In particular, we provide theoretical guarantees for RUV-4 [Gagnon-Bartsch, Jacob and Speed (2013)] and LEAPP [Sun, Zhang and Owen (2012), Ann. Appl. Stat. 6 1664–1688], which correspond to two different identification conditions in the framework: the first requires a set of “negative controls” that are known a priori to follow the null distribution; the second requires the true nonnulls to be sparse. Two different estimators which are based on RUV-4 and LEAPP are then applied to these two scenarios. We show that if the confounding factors are strong, the resulting estimators can be asymptotically as powerful as the oracle estimator which observes the latent confounding factors. For hypothesis testing, we show the asymptotic $z$-tests based on the estimators can control the type I error. Numerical experiments show that the false discovery rate is also controlled by the Benjamini–Hochberg procedure when the sample size is reasonably large.
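As a rough illustration of the final testing step mentioned in the abstract, the sketch below converts $z$-statistics to two-sided p-values and applies the Benjamini–Hochberg step-up rule. It does not implement the confounder-adjusted estimators (RUV-4 or LEAPP) themselves; the function names, the example `z_stats` values, and the level `alpha=0.1` are all hypothetical choices made for illustration only.

```python
import math


def two_sided_p_value(z):
    """Two-sided p-value for a z-statistic under a standard normal null."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def benjamini_hochberg(p_values, alpha=0.1):
    """Return the indices rejected by the Benjamini-Hochberg step-up rule.

    Finds the largest rank k with p_(k) <= alpha * k / m and rejects the
    k hypotheses with the smallest p-values.
    """
    m = len(p_values)
    # Indices of the hypotheses, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= alpha * rank / m:
            k = rank
    return sorted(order[:k])


# Hypothetical z-statistics, assumed to be already adjusted for confounding.
z_stats = [4.1, -3.5, 0.3, -1.2, 2.9, 0.8, -0.1, 1.5]
p_vals = [two_sided_p_value(z) for z in z_stats]
rejected = benjamini_hochberg(p_vals, alpha=0.1)
print(rejected)
```

With these made-up inputs, the three largest statistics in absolute value (indices 0, 1 and 4) are rejected. In practice the input would be the asymptotic $z$-statistics produced by a confounder-adjusted estimator.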

Article information

Ann. Statist., Volume 45, Number 5 (2017), 1863-1894.

Received: August 2015
Revised: January 2016
First available in Project Euclid: 31 October 2017

Primary: 62J15: Paired and multiple comparisons
Secondary: 62H25: Factor analysis and principal components; correspondence analysis

Keywords: Empirical null; surrogate variable analysis; unwanted variation; batch effect; robust regression


Wang, Jingshu; Zhao, Qingyuan; Hastie, Trevor; Owen, Art B. Confounder adjustment in multiple hypothesis testing. Ann. Statist. 45 (2017), no. 5, 1863--1894. doi:10.1214/16-AOS1511.


  • [1] Alter, O., Brown, P. O. and Botstein, D. (2000). Singular value decomposition for genome-wide expression data processing and modeling. Proc. Natl. Acad. Sci. USA 97 10101–10106.
  • [2] Anderson, T. W. and Rubin, H. (1956). Statistical inference in factor analysis. In Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, 1954–1955, Vol. V 111–150. Univ. California Press, Berkeley and Los Angeles.
  • [3] Bai, J. and Li, K. (2012). Statistical analysis of factor models of high dimension. Ann. Statist. 40 436–465.
  • [4] Bai, J. and Li, K. (2014). Theory and methods of panel data models with interactive effects. Ann. Statist. 42 142–170.
  • [5] Bai, J. and Li, K. (2016). Maximum likelihood estimation and inference for approximate factor models of high dimension. Rev. Econ. Stat. 98 298–309.
  • [6] Bai, J. and Ng, S. (2002). Determining the number of factors in approximate factor models. Econometrica 70 191–221.
  • [7] Bai, J. and Ng, S. (2006). Confidence intervals for diffusion index forecasts and inference for factor-augmented regressions. Econometrica 74 1133–1150.
  • [8] Benjamini, Y. and Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. Roy. Statist. Soc. Ser. B 57 289–300.
  • [9] Benjamini, Y. and Yekutieli, D. (2001). The control of the false discovery rate in multiple testing under dependency. Ann. Statist. 29 1165–1188.
  • [10] Blalock, E. M., Geddes, J. W., Chen, K. C., Porter, N. M., Markesbery, W. R. and Landfield, P. W. (2004). Incipient Alzheimer’s disease: Microarray correlation analyses reveal major transcriptional and tumor suppressor responses. Proc. Natl. Acad. Sci. USA 101 2173–2178.
  • [11] Bollen, K. A. (1989). Structural Equations with Latent Variables. Wiley, New York.
  • [12] Brys, G., Hubert, M. and Struyf, A. (2004). A robust measure of skewness. J. Comput. Graph. Statist. 13 996–1017.
  • [13] Chandrasekaran, V., Parrilo, P. A. and Willsky, A. S. (2012). Latent variable graphical model selection via convex optimization. Ann. Statist. 40 1935–1967.
  • [14] Clarke, S. and Hall, P. (2009). Robustness of multiple testing procedures against dependence. Ann. Statist. 37 332–358.
  • [15] Craig, A., Cloarec, O., Holmes, E., Nicholson, J. K. and Lindon, J. C. (2006). Scaling and normalization effects in NMR spectroscopic metabonomic data sets. Anal. Chem. 78 2262–2267.
  • [16] Desai, K. H. and Storey, J. D. (2012). Cross-dimensional inference of dependent high-dimensional data. J. Amer. Statist. Assoc. 107 135–151.
  • [17] De La Fuente, A., Bing, N., Hoeschele, I. and Mendes, P. (2004). Discovery of meaningful associations in genomic data using partial correlation coefficients. Bioinformatics 20 3565–3574.
  • [18] Efron, B. (2007). Correlation and large-scale simultaneous significance testing. J. Amer. Statist. Assoc. 102 93–103.
  • [19] Efron, B. (2010). Correlated $z$-values and the accuracy of large-scale statistical estimates. J. Amer. Statist. Assoc. 105 1042–1055.
  • [20] Fan, J. and Han, X. (2013). Estimation of false discovery proportion with unknown dependence. Available at arXiv:1305.7007.
  • [21] Fan, J., Han, X. and Gu, W. (2012). Estimating false discovery proportion under arbitrary covariance dependence. J. Amer. Statist. Assoc. 107 1019–1035.
  • [22] Fare, T. L., Coffey, E. M., Dai, H., He, Y. D., Kessler, D. A., Kilian, K. A., Koch, J. E., LeProust, E., Marton, M. J., Meyer, M. R. et al. (2003). Effects of atmospheric ozone on microarray data quality. Anal. Chem. 75 4672–4675.
  • [23] Fisher, R. A. (1935). The Design of Experiments. Oliver & Boyd, Edinburgh.
  • [24] Friguet, C., Kloareg, M. and Causeur, D. (2009). A factor model approach to multiple testing under dependence. J. Amer. Statist. Assoc. 104 1406–1415.
  • [25] Gagnon-Bartsch, J., Jacob, L. and Speed, T. P. (2013). Removing unwanted variation from high dimensional data with negative controls. Technical Report 820, Dept. Statistics, Univ. California, Berkeley, Berkeley, CA.
  • [26] Gagnon-Bartsch, J. A. and Speed, T. P. (2012). Using control genes to correct for unwanted variation in microarray data. Biostatistics 13 539–552.
  • [27] Gasch, A. P., Spellman, P. T., Kao, C. M., Carmel-Harel, O., Eisen, M. B., Storz, G., Botstein, D. and Brown, P. O. (2000). Genomic expression programs in the response of yeast cells to environmental changes. Mol. Biol. Cell 11 4241–4257.
  • [28] Greenland, S., Robins, J. M. and Pearl, J. (1999). Confounding and collapsibility in causal inference. Statist. Sci. 14 29–46.
  • [29] Grzebyk, M., Wild, P. and Chouanière, D. (2004). On identification of multi-factor models with correlated residuals. Biometrika 91 141–151.
  • [30] Irizarry, R. A., Hobbs, B., Collin, F., Beazer-Barclay, Y. D., Antonellis, K. J., Scherf, U., Speed, T. P. et al. (2003). Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics 4 249–264.
  • [31] Jin, J. (2012). Comment: “Estimating false discovery proportion under arbitrary covariance dependence.” [MR3010887] J. Amer. Statist. Assoc. 107 1042–1045.
  • [32] Kish, L. (1959). Some statistical problems in research design. Am. Sociol. Rev. 24 328–338.
  • [33] Korn, E. L., Troendle, J. F., McShane, L. M. and Simon, R. (2004). Controlling the number of false discoveries: Application to high-dimensional genomic data. J. Statist. Plann. Inference 124 379–398.
  • [34] Kuroki, M. and Pearl, J. (2014). Measurement bias and effect restoration in causal inference. Biometrika 101 423–437.
  • [35] Lan, W. and Du, L. (2014). A factor-adjusted multiple testing procedure with application to mutual fund selection. Available at arXiv:1407.5515.
  • [36] Lazar, C., Meganck, S., Taminau, J., Steenhoff, D., Coletta, A., Molter, C., Weiss-Solís, D. Y., Duque, R., Bersini, H. and Nowé, A. (2013). Batch effect removal methods for microarray gene expression data integration: A survey. Brief. Bioinform. 14 469–490.
  • [37] Leek, J. T., Scharpf, R. B., Bravo, H. C., Simcha, D., Langmead, B., Johnson, W. E., Geman, D., Baggerly, K. and Irizarry, R. A. (2010). Tackling the widespread and critical impact of batch effects in high-throughput data. Nat. Rev. Genet. 11 733–739.
  • [38] Leek, J. T. and Storey, J. D. (2007). Capturing heterogeneity in gene expression studies by surrogate variable analysis. PLoS Genet. 3 1724–1735.
  • [39] Leek, J. T. and Storey, J. D. (2008). A general framework for multiple testing dependence. Proc. Natl. Acad. Sci. USA 105 18718–18723.
  • [40] Li, J. and Zhong, P.-S. (2016). A rate optimal procedure for recovering sparse differences between high-dimensional means under dependence. Ann. Statist. To appear.
  • [41] Lin, D. W., Coleman, I. M., Hawley, S., Huang, C. Y., Dumpit, R., Gifford, D., Kezele, P., Hung, H., Knudsen, B. S., Kristal, A. R. et al. (2006). Influence of surgical manipulation on prostate gene expression: Implications for molecular correlates of treatment effects and disease prognosis. J. Clin. Oncol. 24 3763–3770.
  • [42] Maronna, R. A., Martin, R. D. and Yohai, V. J. (2006). Robust Statistics: Theory and Methods. Wiley, Chichester.
  • [43] Onatski, A. (2010). Determining the number of factors from empirical distribution of eigenvalues. Rev. Econ. Stat. 92 1004–1016.
  • [44] Owen, A. B. (2005). Variance of the number of false discoveries. J. R. Stat. Soc. Ser. B Stat. Methodol. 67 411–426.
  • [45] Owen, A. B. and Wang, J. (2016). Bi-cross-validation for factor analysis. Statist. Sci. 31 119–139.
  • [46] Pearl, J. (2009). Causality: Models, Reasoning, and Inference, 2nd ed. Cambridge Univ. Press, Cambridge.
  • [47] Perry, P. O. and Pillai, N. S. (2013). Degrees of freedom for combining regression with factor analysis. Preprint. Available at arXiv:1310.7269.
  • [48] Pesaran, M. H. (2004). General diagnostic tests for cross section dependence in panels. Cambridge Working Papers in Economics No. 0435.
  • [49] Price, A. L., Patterson, N. J., Plenge, R. M., Weinblatt, M. E., Shadick, N. A. and Reich, D. (2006). Principal components analysis corrects for stratification in genome-wide association studies. Nat. Genet. 38 904–909.
  • [50] Ransohoff, D. F. (2005). Bias as a threat to the validity of cancer molecular-marker research. Nat. Rev. Cancer 5 142–149.
  • [51] Rhodes, D. R. and Chinnaiyan, A. M. (2005). Integrative analysis of the cancer transcriptome. Nat. Genet. 37 S31–S37.
  • [52] Schwartzman, A. (2010). Comment: “Correlated $z$-values and the accuracy of large-scale statistical estimates.” [MR2752597] J. Amer. Statist. Assoc. 105 1059–1063.
  • [53] Schwartzman, A., Dougherty, R. F. and Taylor, J. E. (2008). False discovery rate analysis of brain diffusion direction maps. Ann. Appl. Stat. 2 153–175.
  • [54] She, Y. and Owen, A. B. (2011). Outlier detection using nonconvex penalized regression. J. Amer. Statist. Assoc. 106 626–639.
  • [55] Singh, D., Fox, S. M., Tal-Singer, R., Plumb, J., Bates, S., Broad, P., Riley, J. H. and Celli, B. (2011). Induced sputum genes associated with spirometric and radiological disease severity in COPD ex-smokers. Thorax 66 489–495.
  • [56] Storey, J. D., Taylor, J. E. and Siegmund, D. (2004). Strong control, conservative point estimation and simultaneous conservative consistency of false discovery rates: A unified approach. J. R. Stat. Soc. Ser. B Stat. Methodol. 66 187–205.
  • [57] Sun, W. and Cai, T. T. (2009). Large-scale multiple testing under dependence. J. R. Stat. Soc. Ser. B Stat. Methodol. 71 393–424.
  • [58] Sun, Y. (2011). On latent systemic effects in multiple hypotheses. Ph.D. thesis, Stanford University.
  • [59] Sun, Y., Zhang, N. R. and Owen, A. B. (2012). Multiple hypothesis testing adjusted for latent variables, with an application to the AGEMAP gene expression data. Ann. Appl. Stat. 6 1664–1688.
  • [60] Tusher, V. G., Tibshirani, R. and Chu, G. (2001). Significance analysis of microarrays applied to the ionizing radiation response. Proc. Natl. Acad. Sci. USA 98 5116–5121.
  • [61] Vawter, M. P., Evans, S., Choudary, P., Tomita, H., Meador-Woodruff, J., Molnar, M., Li, J., Lopez, J. F., Myers, R., Cox, D. et al. (2004). Gender-specific gene expression in post-mortem human brain: Localization to sex chromosomes. Neuropsychopharmacology 29 373–384.
  • [62] Wang, J., Zhao, Q., Hastie, T. and Owen, A. B. (2017). Supplement to “Confounder adjustment in multiple hypothesis testing.” DOI:10.1214/16-AOS1511SUPP.
  • [63] Wang, S., Cui, G. and Li, K. (2015). Factor-augmented regression models with structural change. Econom. Lett. 130 124–127.
  • [64] Yohai, V. J. (1987). High breakdown-point and high efficiency robust estimates for regression. Ann. Statist. 15 642–656.