## Statistical Science

### Higher Criticism for Large-Scale Inference, Especially for Rare and Weak Effects

#### Abstract

In modern high-throughput data analysis, researchers perform a large number of statistical tests, expecting to find perhaps a small fraction of significant effects against a predominantly null background. Higher Criticism (HC) was introduced to determine whether there are any nonzero effects; more recently, it was applied to feature selection, where it provides a method for selecting useful predictive features from a large body of potentially useful features, among which only a rare few will prove truly useful.

In this article, we review the basics of HC in both the testing and feature selection settings. HC is a flexible idea that adapts easily to new situations; we point out simple adaptations to clique detection and bivariate outlier detection. HC, although still early in its development, is seeing increasing interest from practitioners; we illustrate this with worked examples. HC is computationally effective, which gives it useful leverage in the increasingly relevant “Big Data” settings we see today.
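To make the testing idea concrete, here is a minimal sketch of one commonly used form of the HC statistic, computed from a list of p-values. The function names, the `alpha0` search fraction, and the simulated data are illustrative assumptions, not taken from the article; other normalizations of HC exist and may behave differently at the smallest p-values.

```python
import math
import random
from statistics import NormalDist


def higher_criticism(pvalues, alpha0=0.5):
    """Return (HC statistic, 1-based index of the maximizing order statistic).

    Assumed form: sqrt(N) * (i/N - p_(i)) / sqrt(p_(i) * (1 - p_(i))),
    maximized over 1 <= i <= alpha0 * N, where p_(i) are the sorted p-values.
    """
    p = sorted(pvalues)
    n = len(p)
    if n == 0:
        raise ValueError("need at least one p-value")
    upper = max(1, int(alpha0 * n))
    best, best_i = -math.inf, 0
    for i in range(1, upper + 1):
        # Clamp away from 0 and 1 so the denominator stays positive.
        pi = min(max(p[i - 1], 1e-300), 1.0 - 1e-12)
        score = math.sqrt(n) * (i / n - pi) / math.sqrt(pi * (1.0 - pi))
        if score > best:
            best, best_i = score, i
    return best, best_i


def one_sided_pvalues(z_scores):
    """Upper-tail p-values P(Z > z) under a standard normal null."""
    std_normal = NormalDist()
    return [1.0 - std_normal.cdf(z) for z in z_scores]


# Hypothetical simulation: many tests, a small fraction carrying a modest shift.
rng = random.Random(0)
n_tests, n_signals, shift = 10_000, 50, 3.0
null_z = [rng.gauss(0.0, 1.0) for _ in range(n_tests)]
mixed_z = [z + shift if k < n_signals else z for k, z in enumerate(null_z)]

hc_null, _ = higher_criticism(one_sided_pvalues(null_z))
hc_mixed, _ = higher_criticism(one_sided_pvalues(mixed_z))
print(f"HC with no signals:   {hc_null:.2f}")
print(f"HC with rare signals: {hc_mixed:.2f}")
```

A large HC value suggests that the collection of p-values departs from the uniform null. In the feature selection setting, the index returned alongside the statistic can serve as a data-driven threshold: under this sketch's conventions, one would keep the features whose p-values are no larger than the maximizing order statistic.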

We also review the underlying theoretical “ideology” behind HC. The Rare/Weak (RW) model is a theoretical framework simultaneously controlling the size and prevalence of useful/significant items among the useless/null bulk. The RW model shows that HC has important advantages over better-known procedures such as False Discovery Rate (FDR) control and Family-wise Error Rate (FwER) control, in particular certain optimality properties. We discuss the rare/weak phase diagram, a way to visualize clearly the class of RW settings where the true signals are so rare or so weak that detection and feature selection are simply impossible, and a way to understand the known optimality properties of HC.
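As a worked illustration of the RW idea, one standard parametrization from the sparse normal mixture literature is sketched below. The symbols $\varepsilon_N$, $\tau_N$, $\beta$, $r$ and $\rho^{*}$ are notation assumed here for illustration; the abstract itself does not fix them. Each of $N$ test statistics is drawn from a mixture in which $\beta$ governs rarity and $r$ governs signal strength:

$$
X_i \overset{\text{iid}}{\sim} (1-\varepsilon_N)\,N(0,1) + \varepsilon_N\,N(\tau_N,1),
\qquad \varepsilon_N = N^{-\beta}, \quad \tau_N = \sqrt{2r\log N},
\qquad \tfrac12 < \beta < 1,\ 0 < r < 1.
$$

In this calibration, the phase diagram for the detection problem is split by the curve

$$
\rho^{*}(\beta) =
\begin{cases}
\beta - \tfrac12, & \tfrac12 < \beta \le \tfrac34,\\[4pt]
\bigl(1-\sqrt{1-\beta}\bigr)^{2}, & \tfrac34 < \beta < 1.
\end{cases}
$$

When $r < \rho^{*}(\beta)$ the signals are too rare and weak for any test to detect asymptotically; when $r > \rho^{*}(\beta)$ detection is possible. The two branches agree at $\beta = \tfrac34$, where both equal $\tfrac14$.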

#### Article information

Source
Statist. Sci., Volume 30, Number 1 (2015), 1-25.

Dates
First available in Project Euclid: 4 March 2015

https://projecteuclid.org/euclid.ss/1425492437

Digital Object Identifier
doi:10.1214/14-STS506

Mathematical Reviews number (MathSciNet)
MR3317751

Zentralblatt MATH identifier
1332.62019

#### Citation

Donoho, David; Jin, Jiashun. Higher Criticism for Large-Scale Inference, Especially for Rare and Weak Effects. Statist. Sci. 30 (2015), no. 1, 1–25. doi:10.1214/14-STS506. https://projecteuclid.org/euclid.ss/1425492437
