The Annals of Statistics

Global testing under sparse alternatives: ANOVA, multiple comparisons and the higher criticism

Ery Arias-Castro, Emmanuel J. Candès, and Yaniv Plan

Full-text: Open access

Abstract

Testing for the significance of a subset of regression coefficients in a linear model, a staple of statistical analysis, goes back at least to the work of Fisher who introduced the analysis of variance (ANOVA). We study this problem under the assumption that the coefficient vector is sparse, a common situation in modern high-dimensional settings. Suppose we have p covariates and that under the alternative, the response only depends upon the order of p^{1−α} of those, 0 ≤ α ≤ 1. Under moderate sparsity levels, that is, 0 ≤ α ≤ 1/2, we show that ANOVA is essentially optimal under some conditions on the design. This is no longer the case under strong sparsity constraints, that is, α > 1/2. In such settings, a multiple comparison procedure is often preferred and we establish its optimality when α ≥ 3/4. However, these two very popular methods are suboptimal, and sometimes powerless, under moderately strong sparsity where 1/2 < α < 3/4. We suggest a method based on the higher criticism that is powerful in the whole range α > 1/2. This optimality property is true for a variety of designs, including the classical (balanced) multi-way designs and more modern “p > n” designs arising in genetics and signal processing. In addition to the standard fixed effects model, we establish similar results for a random effects model where the nonzero coefficients of the regression vector are normally distributed.
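
The three procedures compared in the abstract can be illustrated with a minimal Python sketch. It rests on assumptions that are not taken from the paper: an identity design (the sparse normal means problem, so the z-scores are independent), unit noise variance, equal nonzero means, and the p-value form of the higher criticism from Donoho and Jin [9], which may differ from the variant analyzed in the paper. All function names, the `fraction` parameter, and the values p = 10000, α = 0.6 and amplitude 2.0 are hypothetical choices made for illustration.

```python
import math
import random


def two_sided_pvalue(z):
    """P(|N(0,1)| >= |z|), computed with the complementary error function."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def anova_statistic(z):
    """Sum of squares; chi-square with len(z) degrees of freedom under the null."""
    return sum(v * v for v in z)


def max_statistic(z):
    """Largest absolute z-score, the basis of a Bonferroni-type max test."""
    return max(abs(v) for v in z)


def higher_criticism(z, fraction=0.5):
    """Donoho-Jin style HC (an assumed form, not necessarily the paper's):
    largest standardized gap between the empirical and the uniform
    distribution of the p-values, over the smallest `fraction` of them."""
    p = len(z)
    pvals = sorted(two_sided_pvalue(v) for v in z)
    limit = max(1, int(fraction * p))
    best = -math.inf  # stays -inf only if every examined p-value is 0 or 1
    for i, pv in enumerate(pvals[:limit], start=1):
        if pv <= 0.0 or pv >= 1.0:
            continue  # the standardization is undefined at the endpoints
        hc = math.sqrt(p) * (i / p - pv) / math.sqrt(pv * (1.0 - pv))
        best = max(best, hc)
    return best


def sparse_means(p, alpha, amplitude):
    """Mean vector with about p^(1 - alpha) nonzero entries, all equal."""
    k = max(1, round(p ** (1.0 - alpha)))
    return [amplitude] * k + [0.0] * (p - k)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    p, alpha = 10_000, 0.6  # illustrative values inside 1/2 < alpha < 3/4
    mu = sparse_means(p, alpha, amplitude=2.0)
    null = [rng.gauss(0.0, 1.0) for _ in range(p)]
    alt = [m + rng.gauss(0.0, 1.0) for m in mu]
    for name, z in (("null", null), ("alternative", alt)):
        print(name,
              "ANOVA:", round(anova_statistic(z), 1),
              "max:", round(max_statistic(z), 2),
              "HC:", round(higher_criticism(z), 2))
```

Running the script prints the three statistics for one null and one sparse-alternative draw. A single draw only illustrates how the statistics are computed; comparing power would require repeating the simulation and calibrating each test's threshold under the null.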

Article information

Source
Ann. Statist., Volume 39, Number 5 (2011), 2533-2556.

Dates
First available in Project Euclid: 30 November 2011

Permanent link to this document
https://projecteuclid.org/euclid.aos/1322663467

Digital Object Identifier
doi:10.1214/11-AOS910

Mathematical Reviews number (MathSciNet)
MR2906877

Zentralblatt MATH identifier
1231.62136

Subjects
Primary: 62G10: Hypothesis testing; 94A13: Detection theory
Secondary: 62G20: Asymptotic properties

Keywords
Detecting a sparse signal; analysis of variance; higher criticism; minimax detection; incoherence; random matrices; suprema of Gaussian processes; compressive sensing

Citation

Arias-Castro, Ery; Candès, Emmanuel J.; Plan, Yaniv. Global testing under sparse alternatives: ANOVA, multiple comparisons and the higher criticism. Ann. Statist. 39 (2011), no. 5, 2533–2556. doi:10.1214/11-AOS910. https://projecteuclid.org/euclid.aos/1322663467



References

  • [1] Akritas, M. G. and Papadatos, N. (2004). Heteroscedastic one-way ANOVA and lack-of-fit tests. J. Amer. Statist. Assoc. 99 368–382.
  • [2] Arias-Castro, E., Candès, E. J. and Plan, Y. Supplement to “Global testing under sparse alternatives: ANOVA, multiple comparisons and the higher criticism.” DOI:10.1214/11-AOS910SUPP.
  • [3] Berman, S. M. (1964). Limit theorems for the maximum term in stationary sequences. Ann. Math. Statist. 35 502–516.
  • [4] Candès, E. J., Romberg, J. and Tao, T. (2006). Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Inform. Theory 52 489–509.
  • [5] Candès, E. J. and Tao, T. (2006). Near-optimal signal recovery from random projections: Universal encoding strategies? IEEE Trans. Inform. Theory 52 5406–5425.
  • [6] Castagna, J. P., Sun, S. and Siegfried, R. W. (2003). Instantaneous spectral analysis: Detection of low-frequency shadows associated with hydrocarbons. The Leading Edge 22 120–127.
  • [7] Churchill, G. (2002). Fundamentals of experimental design for cDNA microarrays. Nature Genetics 32 490–495.
  • [8] Deo, C. M. (1972). Some limit theorems for maxima of absolute values of Gaussian sequences. Sankhyā Ser. A 34 289–292.
  • [9] Donoho, D. and Jin, J. (2004). Higher criticism for detecting sparse heterogeneous mixtures. Ann. Statist. 32 962–994.
  • [10] Donoho, D. L. (2006). Compressed sensing. IEEE Trans. Inform. Theory 52 1289–1306.
  • [11] Donoho, D. L. and Huo, X. (2001). Uncertainty principles and ideal atomic decomposition. IEEE Trans. Inform. Theory 47 2845–2862.
  • [12] Duarte, M., Davenport, M., Wakin, M. and Baraniuk, R. (2006). Sparse signal detection from incoherent projections. In 2006 IEEE International Conference on Acoustics, Speech and Signal Processing, 2006. ICASSP 2006 Proceedings 3 III–III.
  • [13] Dudoit, S., Shaffer, J. P. and Boldrick, J. C. (2003). Multiple hypothesis testing in microarray experiments. Statist. Sci. 18 71–103.
  • [14] Efron, B., Tibshirani, R., Storey, J. D. and Tusher, V. (2001). Empirical Bayes analysis of a microarray experiment. J. Amer. Statist. Assoc. 96 1151–1160.
  • [15] Fisher, R. A. (1973). Statistical Methods for Research Workers, 14th ed.—revised and enlarged. Hafner, New York.
  • [16] Goeman, J. J., van de Geer, S. A. and van Houwelingen, H. C. (2006). Testing against a high dimensional alternative. J. R. Stat. Soc. Ser. B Stat. Methodol. 68 477–493.
  • [17] Gribonval, R. and Bacry, E. (2003). Harmonic decomposition of audio signals with matching pursuit. IEEE Trans. Signal Process. 51 101–111.
  • [18] Hall, P. and Jin, J. (2008). Properties of higher criticism under strong dependence. Ann. Statist. 36 381–402.
  • [19] Hall, P. and Jin, J. (2010). Innovated higher criticism for detecting sparse signals in correlated noise. Ann. Statist. 38 1686–1732.
  • [20] Haupt, J. and Nowak, R. (2007). Compressive sampling for signal detection. In IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP) 3 III-1509–III-1512.
  • [21] Honig, M. (2009). Advances in Multiuser Detection. Wiley, Hoboken, NJ.
  • [22] Ingster, Y. I. (1998). Minimax detection of a signal for ln-balls. Math. Methods Statist. 7 401–428 (1999).
  • [23] Ingster, Y. I., Tsybakov, A. B. and Verzelen, N. (2010). Detection boundary in sparse regression. Electron. J. Stat. 4 1476–1526.
  • [24] James, D., Clymer, B. D. and Schmalbrock, P. (2001). Texture detection of simulated microcalcification susceptibility effects in magnetic resonance imaging of breasts. Journal of Magnetic Resonance Imaging 13 876–881.
  • [25] Jin, J. (2003). Detecting and estimating sparse mixtures. Ph.D. thesis, Stanford Univ.
  • [26] Kerr, M., Martin, M. and Churchill, G. (2000). Analysis of variance for gene expression microarray data. J. Comput. Biol. 7 819–837.
  • [27] Ledoux, M. (2001). The Concentration of Measure Phenomenon. Mathematical Surveys and Monographs 89. Amer. Math. Soc., Providence, RI.
  • [28] Lehmann, E. L. and Romano, J. P. (2005). Testing Statistical Hypotheses, 3rd ed. Springer, New York.
  • [29] Lemmens, P. W. H. and Seidel, J. J. (1973). Equiangular lines. J. Algebra 24 494–512.
  • [30] Mallat, S. (2009). A Wavelet Tour of Signal Processing: The Sparse Way, 3rd ed. Academic Press, Amsterdam.
  • [31] Mallat, S. and Zhang, Z. (1993). Matching pursuits with time-frequency dictionaries. IEEE Trans. Signal Process. 41 3397–3415.
  • [32] Mathew, T. and Sinha, B. K. (1988). Optimum tests for fixed effects and variance components in balanced models. J. Amer. Statist. Assoc. 83 133–135.
  • [33] McCarthy, M., Abecasis, G., Cardon, L., Goldstein, D., Little, J., Ioannidis, J. and Hirschhorn, J. (2008). Genome-wide association studies for complex traits: Consensus, uncertainty and challenges. Nature Reviews Genetics 9 356–369.
  • [34] Meng, J., Li, H. and Han, Z. (2009). Sparse event detection in wireless sensor networks using compressive sensing. In 43rd Annual Conference on Information Sciences and Systems (CISS), 2009 181–185.
  • [35] Montgomery, D. C. (2009). Design and Analysis of Experiments, 7th ed. Wiley, Hoboken, NJ.
  • [36] Piantadosi, S. (2005). Clinical Trials: A Methodologic Perspective, 2nd ed. Wiley, Hoboken, NJ.
  • [37] Slonim, D. (2002). From patterns to pathways: Gene expression data analysis comes of age. Nature Genetics 32 502–508.
  • [38] Strohmer, T. and Heath, R. W., Jr. (2003). Grassmannian frames with applications to coding and communication. Appl. Comput. Harmon. Anal. 14 257–275.
  • [39] Willer, C., Sanna, S., Jackson, A., Scuteri, A., Bonnycastle, L., Clarke, R., Heath, S., Timpson, N., Najjar, S. and Stringham, H. et al. (2008). Newly identified loci that influence lipid concentrations and risk of coronary artery disease. Nature Genetics 40 161–169.
  • [40] Zhang, G., Zhang, S. and Wang, Y. (2000). Application of adaptive time-frequency decomposition in ultrasonic NDE of highly-scattering materials. Ultrasonics 38 961–964.

Supplemental materials

  • Supplementary material: Supplement to “Global testing under sparse alternatives: ANOVA, multiple comparisons and the higher criticism”. In the supplement, we prove the results stated in the paper. Though the method of proof has the same structure as the corresponding situation in the classical setting with identity design matrix, extra care is required to deal with dependencies.