The Annals of Statistics

Properties of higher criticism under strong dependence

Peter Hall and Jiashun Jin

Abstract

The problem of signal detection using sparse, faint information is closely related to a variety of contemporary statistical problems, including control of the false-discovery rate and classification using very high-dimensional data. Each problem can be solved by conducting a large number of simultaneous hypothesis tests, the properties of which are readily accessed under the assumption of independence. In this paper we address the case of dependent data, in the context of higher criticism methods for signal detection. Short-range dependence has no first-order impact on performance, but the situation changes dramatically under strong dependence. There, although higher criticism can continue to perform well, it can be bettered using methods based on differences of signal values or on the maximum of the data. The relatively inferior performance of higher criticism in such cases can be explained by the fact that, under strong dependence, the higher criticism statistic behaves as though the data were partitioned into very large blocks, with all but a single representative of each block being eliminated from the dataset.
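
For readers unfamiliar with the statistic, the following is a minimal sketch of one standard form of higher criticism, the ordered p-value formulation associated with Donoho and Jin. It is offered only as an illustration and is not necessarily the exact variant analysed in the paper. The function names `higher_criticism` and `simulate_p_values`, the tuning fraction `alpha0`, and the simulation settings are assumptions introduced here; in particular the simulated noise is independent, which is the baseline case rather than the strongly dependent setting the abstract is concerned with.

```python
import math
import random
from statistics import NormalDist


def higher_criticism(p_values, alpha0=0.5):
    """Ordered p-value form of the higher criticism statistic.

    Computes the maximum over i = 1..floor(alpha0 * n) of
        sqrt(n) * (i/n - p_(i)) / sqrt(p_(i) * (1 - p_(i))),
    where p_(1) <= ... <= p_(n) are the sorted p-values.
    alpha0 is an illustrative tuning choice, not taken from the paper.
    """
    n = len(p_values)
    p_sorted = sorted(p_values)
    k_max = max(1, int(alpha0 * n))
    best = -math.inf
    for i in range(1, k_max + 1):
        p = p_sorted[i - 1]
        if p <= 0.0 or p >= 1.0:
            continue  # skip degenerate p-values to avoid division by zero
        z = math.sqrt(n) * (i / n - p) / math.sqrt(p * (1.0 - p))
        best = max(best, z)
    return best


def simulate_p_values(n, n_signals, mu, seed):
    """One-sided p-values for n independent N(0, 1) observations,
    the first n_signals of which are shifted by mu (hypothetical setup)."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for j in range(n_signals):
        x[j] += mu
    return [1.0 - std_normal.cdf(v) for v in x]


if __name__ == "__main__":
    null_hc = higher_criticism(simulate_p_values(1000, 0, 0.0, seed=1))
    signal_hc = higher_criticism(simulate_p_values(1000, 20, 3.0, seed=1))
    print("HC with no signal:      ", round(null_hc, 3))
    print("HC with sparse signals: ", round(signal_hc, 3))
```

Running the script prints the statistic for a pure-noise sample and for a sample with a small fraction of shifted means. Replacing the independent draws with a correlated Gaussian sequence would be the natural way to explore the dependent case, but that step is left out of this sketch.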

Article information

Source
Ann. Statist., Volume 36, Number 1 (2008), 381-402.

Dates
First available in Project Euclid: 1 February 2008

Permanent link to this document
https://projecteuclid.org/euclid.aos/1201877306

Digital Object Identifier
doi:10.1214/009053607000000767

Mathematical Reviews number (MathSciNet)
MR2387976

Zentralblatt MATH identifier
1139.62049

Subjects
Primary: 62G10: Hypothesis testing; 62M10: Time series, auto-correlation, regression, etc. [See also 91B84]
Secondary: 62G32: Statistics of extreme values; tail inference; 62G20: Asymptotic properties

Keywords
Correlation; dependent data; faint information; Gaussian process; signal detection; simultaneous hypothesis testing; sparsity

Citation

Hall, Peter; Jin, Jiashun. Properties of higher criticism under strong dependence. Ann. Statist. 36 (2008), no. 1, 381-402. doi:10.1214/009053607000000767. https://projecteuclid.org/euclid.aos/1201877306


