Annals of Statistics

Higher criticism for detecting sparse heterogeneous mixtures

David Donoho and Jiashun Jin

Full-text: Open access


Higher criticism, or second-level significance testing, is a multiple-comparisons concept mentioned in passing by Tukey. It concerns a situation where there are many independent tests of significance and one is interested in rejecting the joint null hypothesis. Tukey suggested comparing the fraction of observed significances at a given α-level to the expected fraction under the joint null. In fact, he suggested standardizing the difference of the two quantities and forming a z-score; the resulting z-score tests the significance of the body of significance tests.
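In symbols, Tukey's proposal can be written as follows. The notation is an assumption introduced here for illustration and does not appear in the abstract: $p_1,\dots,p_n$ are the p-values of the $n$ independent tests and $F_n(\alpha)$ is the fraction of them at or below $\alpha$.

```latex
\[
  \mathrm{HC}_{n,\alpha}
  = \sqrt{n}\,\frac{F_n(\alpha) - \alpha}{\sqrt{\alpha(1-\alpha)}},
  \qquad
  F_n(\alpha) = \frac{1}{n}\,\#\{\, i : p_i \le \alpha \,\}.
\]
```

Under the joint null with continuous test statistics, the p-values are independent and uniform on $(0,1)$, so $nF_n(\alpha)$ is Binomial$(n,\alpha)$ and the statistic has mean 0 and variance 1, which is what makes it a z-score.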

We consider a generalization, where we maximize this z-score over a range of significance levels $0 < \alpha \le \alpha_0$. We are able to show that the resulting higher criticism statistic is effective at resolving a very subtle testing problem: testing whether n normal means are all zero versus the alternative that a small fraction is nonzero.
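The maximization can be sketched in a few lines of Python, assuming the input is a list of independent p-values. The function name `higher_criticism`, the default `alpha0=0.5`, and the choice to evaluate the z-score only at the sorted p-values (where the observed fraction of significances jumps) are illustrative assumptions, not details stated in the abstract.

```python
import math


def higher_criticism(p_values, alpha0=0.5):
    """Largest standardized excess of small p-values (illustrative sketch).

    The z-score is evaluated at the sorted p-values p_(1) <= ... <= p_(n);
    at p_(i) the observed fraction of significances is i / n. Only the
    alpha0 * n smallest p-values are scanned.
    """
    p = sorted(p_values)
    n = len(p)
    best = -math.inf
    for i, p_i in enumerate(p, start=1):
        if i > alpha0 * n:
            break  # past the alpha0 fraction of smallest p-values
        if not 0.0 < p_i < 1.0:
            continue  # the standardization is undefined at 0 or 1
        z = math.sqrt(n) * (i / n - p_i) / math.sqrt(p_i * (1.0 - p_i))
        best = max(best, z)
    return best
```

A large value indicates more small p-values than the joint null would produce. The threshold for rejection has to come from the null distribution of the statistic, for instance by simulation with uniform p-values.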

The subtlety of this “sparse normal means” testing problem can be seen from work of Ingster and Jin, who studied such problems in great detail. In their studies, they identified an interesting range of cases where the small fraction of nonzero means is so small that the alternative hypothesis exhibits little noticeable effect on the distribution of the p-values either for the bulk of the tests or for the few most highly significant tests. In this range, when the amplitude of nonzero means is calibrated with the fraction of nonzero means, the likelihood ratio test for a precisely specified alternative would still succeed in separating the two hypotheses.

We show that the higher criticism is successful throughout the same region of amplitude and sparsity where the likelihood ratio test would succeed. Since it does not require a specification of the alternative, this shows that higher criticism is in a sense optimally adaptive to unknown sparsity and size of the nonnull effects. While our theoretical work is largely asymptotic, we provide simulations in finite samples and suggest some possible applications. We also show that higher criticism works well over a range of non-Gaussian cases.
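The abstract does not spell out how amplitude is calibrated against sparsity. The sketch below assumes the parametrization used in the body of the paper: a fraction $\epsilon_n = n^{-\beta}$ of the means equals $\mu_n = \sqrt{2r\log n}$ with $1/2 < \beta < 1$ and $0 < r < 1$, and the two hypotheses can be separated asymptotically when $r$ exceeds the detection boundary $\rho^*(\beta)$. The function names, the one-sided p-values, and the use of Python's `random` module are illustrative choices.

```python
import math
import random


def detection_boundary(beta):
    """Detection boundary rho*(beta) for sparsity exponent 1/2 < beta < 1."""
    if not 0.5 < beta < 1.0:
        raise ValueError("beta must lie strictly between 1/2 and 1")
    if beta <= 0.75:
        return beta - 0.5
    return (1.0 - math.sqrt(1.0 - beta)) ** 2


def simulate_p_values(n, beta, r, rng):
    """One-sided p-values for X_i ~ (1 - eps) N(0, 1) + eps N(mu, 1)."""
    eps = n ** (-beta)                     # fraction of nonzero means
    mu = math.sqrt(2.0 * r * math.log(n))  # amplitude of nonzero means
    p_values = []
    for _ in range(n):
        mean = mu if rng.random() < eps else 0.0
        x = rng.gauss(mean, 1.0)
        # P(N(0, 1) > x), computed with the complementary error function
        p_values.append(0.5 * math.erfc(x / math.sqrt(2.0)))
    return p_values
```

For example, `simulate_p_values(10**5, 0.6, 0.3, random.Random(0))` draws from an alternative with $r = 0.3$ above `detection_boundary(0.6)`, which is approximately 0.1; setting `r = 0` gives a draw from the joint null.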

Article information

Ann. Statist., Volume 32, Number 3 (2004), 962-994.

First available in Project Euclid: 24 May 2004

Primary: 62G10: Hypothesis testing
Secondary: 62G32: Statistics of extreme values; tail inference
Secondary: 62G20: Asymptotic properties

Keywords: Multiple comparisons; combining many p-values; sparse normal means; thresholding; normalized empirical process


Donoho, David; Jin, Jiashun. Higher criticism for detecting sparse heterogeneous mixtures. Ann. Statist. 32 (2004), no. 3, 962–994. doi:10.1214/009053604000000265.


  • Abramovich, F., Benjamini, Y., Donoho, D. and Johnstone, I. (2000). Adapting to unknown sparsity by controlling the false discovery rate. Technical report 2000-19, Dept. Statistics, Stanford Univ.
  • Anderson, T. W. and Darling, D. A. (1952). Asymptotic theory of certain “goodness of fit” criteria based on stochastic processes. Ann. Math. Statist. 23 193–212.
  • Becker, B. J. (1994). Combining significance levels. In The Handbook of Research Synthesis (H. Cooper and L. Hedges, eds.) Chap. 15. Russell Sage Foundation, New York.
  • Benjamini, Y. and Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. Roy. Statist. Soc. Ser. B 57 289–300.
  • Berk, R. H. and Jones, D. H. (1979). Goodness-of-fit test statistics that dominate the Kolmogorov statistic. Z. Wahrsch. Verw. Gebiete 47 47–59.
  • Bickel, P. J. and Chernoff, H. (1993). Asymptotic distribution of the likelihood ratio statistic in a prototypical nonregular problem. In Statistics and Probability: A Raghu Raj Bahadur Festschrift (J. K. Ghosh, S. K. Mitra, K. R. Parthasarathy and B. L. S. Prakasa Rao, eds.) 83–96. Wiley Eastern, New Delhi.
  • Borovkov, A. A. and Sycheva, N. M. (1968). On some asymptotically optimal nonparametric tests. Teor. Verojatnost. i Primenen. 13 385–418.
  • Borovkov, A. A. and Sycheva, N. M. (1970). On asymptotically optimal nonparametric criteria. In Nonparametric Techniques in Statistical Inference (M. L. Puri, ed.) 259–266. Cambridge Univ. Press.
  • Box, G. E. P. and Tiao, G. C. (1973). Bayesian Inference in Statistical Analysis. Addison–Wesley, Reading, MA.
  • Brožek, J. and Tiede, K. (1952). Reliable and questionable significance in a series of statistical tests. Psychological Bull. 49 339–341.
  • Darling, D. A. and Erdős, P. (1956). A limit theorem for the maximum of normalized sums of independent random variables. Duke Math. J. 23 143–155.
  • Eicker, F. (1979). The asymptotic distribution of the suprema of the standardized empirical processes. Ann. Statist. 7 116–138.
  • Fisher, R. A. (1932). Statistical Methods for Research Workers, 4th ed. Oliver and Boyd, Edinburgh.
  • Hartigan, J. A. (1985). A failure of likelihood asymptotics for normal mixtures. In Proc. Berkeley Conference in Honor of Jerzy Neyman and Jack Kiefer (L. M. Le Cam and R. A. Olshen, eds.) 2 807–810. Wadsworth, Monterey, CA.
  • Hochberg, Y. and Tamhane, A. C. (1987). Multiple Comparison Procedures. Wiley, New York.
  • Ingster, Y. I. (1997). Some problems of hypothesis testing leading to infinitely divisible distribution. Math. Methods Statist. 6 47–69.
  • Ingster, Y. I. (1999). Minimax detection of a signal for $l^p_n$-balls. Math. Methods Statist. 7 401–428.
  • Ingster, Y. I. (2002). Adaptive detection of a signal of growing dimension, I, II. Math. Methods Statist. 10 395–421; 11 37–68.
  • Ingster, Y. I. and Lepski, O. (2002). On multichannel signal detection. Preprint.
  • Ingster, Y. I. and Suslina, I. A. (2000). Minimax nonparametric hypothesis testing for ellipsoids and Besov bodies. ESAIM Probab. Statist. (electronic) 4 53–135.
  • Ingster, Y. I. and Suslina, I. A. (2004). On multichannel detection of a signal of known shape. Zap. Nauchn. Sem. S.-Peterburg. Otdel. Mat. Inst. Steklov (POMI). To appear.
  • Jaeschke, D. (1979). The asymptotic distribution of the supremum of the standardized empirical distribution function on subintervals. Ann. Statist. 7 108–115.
  • Jin, J. (2002). Detection boundary for sparse mixtures. Unpublished manuscript.
  • Johnson, N. L., Kotz, S. and Balakrishnan, N. (1995). Continuous Univariate Distributions, 2nd ed. 2. Wiley, New York.
  • Kendall, D. G. and Kendall, W. S. (1980). Alignments in two-dimensional random sets of points. Adv. in Appl. Probab. 12 380–424.
  • Miller, R. G., Jr. (1966). Simultaneous Statistical Inference. McGraw–Hill, New York.
  • Shorack, G. R. and Wellner, J. A. (1986). Empirical Processes with Applications to Statistics. Wiley, New York.
  • Simoncelli, E. P. (1999). Modeling the joint statistics of images in the wavelet domain. In Proc. SPIE 3813 188–195. SPIE, The International Society for Optical Engineering, Bellingham, WA.
  • Subbotin, M. T. (1923). On the law of frequency of errors. Mat. Sb. 31 296–301.
  • Tukey, J. W. (1965). Which part of the sample contains the information? Proc. Natl. Acad. Sci. U.S.A. 53 127–134.
  • Tukey, J. W. (1976). T13 N: The higher criticism. Course Notes, Statistics 411, Princeton Univ.
  • Tukey, J. W. (1989). Higher criticism for individual significances in several tables or parts of tables. Working Paper, Princeton Univ.
  • Tukey, J. W. (1953). The problem of multiple comparisons. In The Collected Works of John W. Tukey VIII. Multiple Comparisons: 1948–1983 (H. I. Braun, ed.) 1–300. Chapman and Hall, New York.
  • Wellner, J. A. (1978). Limit theorems for the ratio of the empirical distribution function to the true distribution function. Z. Wahrsch. Verw. Gebiete 45 73–88.
  • Wellner, J. A. and Koltchinskii, V. (2004). A note on the asymptotic distribution of Berk–Jones type statistics under the null hypothesis. In High Dimensional Probability III (T. Hoffmann-Jørgensen, M. B. Marcus and J. A. Wellner, eds.) 321–332. Birkhäuser, Basel.