Statistical Science

Higher Criticism for Large-Scale Inference, Especially for Rare and Weak Effects

David Donoho and Jiashun Jin



In modern high-throughput data analysis, researchers perform a large number of statistical tests, expecting to find perhaps a small fraction of significant effects against a predominantly null background. Higher Criticism (HC) was introduced to determine whether there are any nonzero effects; more recently, it has been applied to feature selection, where it provides a method for selecting useful predictive features from a large body of potentially useful features, among which only a rare few will prove truly useful.

In this article, we review the basics of HC in both the testing and feature selection settings. HC is a flexible idea, which adapts easily to new situations; we point out simple adaptations to clique detection and bivariate outlier detection. HC, although still early in its development, is seeing increasing interest from practitioners; we illustrate this with worked examples. HC is computationally effective, which gives it leverage in the increasingly relevant “Big Data” settings we see today.
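As a rough illustration of the testing and feature selection uses of HC described above, the following Python sketch computes one common form of the HC statistic from a set of p-values. The formula is not stated in this abstract; the sketch assumes the form usually associated with the authors' earlier work, HC* = max over i <= alpha0 * n of sqrt(n) (i/n - p_(i)) / sqrt(p_(i) (1 - p_(i))), where p_(i) is the i-th smallest p-value. The function name higher_criticism, the alpha0 parameter, and the clipping constants are assumptions made for this example; published variants differ in the denominator and in the range of i searched.

```python
import numpy as np


def higher_criticism(pvalues, alpha0=0.5):
    """Return (HC statistic, p-value at the maximizing index).

    Illustrative sketch only: assumes independent p-values and the
    standardization sqrt(p * (1 - p)); other variants exist.
    """
    p = np.sort(np.asarray(pvalues, dtype=float))
    n = p.size
    # Clip to avoid division by zero when a p-value is exactly 0 or 1.
    p = np.clip(p, 1e-300, 1.0 - 1e-12)
    i = np.arange(1, n + 1)
    # Standardized gap between the empirical fraction i/n and p_(i).
    hc = np.sqrt(n) * (i / n - p) / np.sqrt(p * (1.0 - p))
    # Search only the smallest alpha0 fraction of the p-values.
    k = max(1, int(np.floor(alpha0 * n)))
    idx = int(np.argmax(hc[:k]))
    return float(hc[idx]), float(p[idx])


# Usage: an evenly spread (null-like) set of p-values, then the same set
# with its 20 smallest entries replaced by very small p-values.
n = 1000
p_null = (np.arange(1, n + 1) - 0.5) / n
p_signal = p_null.copy()
p_signal[:20] = 1e-6

hc_null, _ = higher_criticism(p_null)
hc_signal, p_threshold = higher_criticism(p_signal)
```

In the testing setting, a large value of the statistic is evidence that some effects are nonzero. In the feature selection setting, the p-value at the maximizing index can serve as a data-driven threshold: features with p-values at or below it are retained.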

We also review the underlying theoretical “ideology” behind HC. The Rare/Weak (RW) model is a theoretical framework simultaneously controlling the size and prevalence of useful/significant items among the useless/null bulk. The RW model shows that HC has important advantages over better known procedures such as False Discovery Rate (FDR) control and Family-wise Error Rate (FwER) control, in particular, certain optimality properties. We discuss the rare/weak phase diagram, a way to visualize clearly the class of RW settings where the true signals are so rare or so weak that detection and feature selection are simply impossible, and a way to understand the known optimality properties of HC.
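To make the phase diagram idea concrete for the detection (testing) problem only, here is a small Python sketch of the standard detection boundary in the sparse Gaussian mixture setting. The parametrization is not given in this abstract and is assumed here: a fraction n^(-beta) of the n tests carry a signal, with 1/2 < beta < 1, and each signal has mean sqrt(2 r log n), with 0 < r < 1. Under that assumption the boundary is rho*(beta) = beta - 1/2 for beta <= 3/4 and (1 - sqrt(1 - beta))^2 for beta > 3/4. The function names are hypothetical, and the phase diagrams for feature selection and classification have different boundaries that this sketch does not cover.

```python
import math


def detection_boundary(beta):
    """Detection boundary rho*(beta) for the assumed sparse Gaussian
    mixture calibration (signal fraction n**(-beta), 1/2 < beta < 1)."""
    if not 0.5 < beta < 1.0:
        raise ValueError("beta must lie strictly between 1/2 and 1")
    if beta <= 0.75:
        # Moderately sparse regime.
        return beta - 0.5
    # Very sparse regime.
    return (1.0 - math.sqrt(1.0 - beta)) ** 2


def is_detectable(beta, r):
    """True when the strength r lies above the boundary, that is, when
    the signals are asymptotically detectable under the assumed model."""
    return r > detection_boundary(beta)


# Usage: the same signal strength is detectable when signals are less
# rare (beta = 0.6) but not when they are very rare (beta = 0.9).
moderately_rare = is_detectable(0.6, 0.2)
very_rare = is_detectable(0.9, 0.2)
```

Points (beta, r) below the curve correspond to signals so rare or so weak that no test can reliably detect them; above the curve detection is possible, and this is the region in which the optimality properties of HC are stated.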

Article information

Statist. Sci., Volume 30, Number 1 (2015), 1-25.

First available in Project Euclid: 4 March 2015


Keywords: Classification; control of FDR; feature selection; Higher Criticism; large covariance matrix; large-scale inference; rare and weak effects; phase diagram; sparse signal detection


Donoho, David; Jin, Jiashun. Higher Criticism for Large-Scale Inference, Especially for Rare and Weak Effects. Statist. Sci. 30 (2015), no. 1, 1–25. doi:10.1214/14-STS506.


