## Electronic Journal of Statistics

### Detectability of nonparametric signals: higher criticism versus likelihood ratio

#### Abstract

We study the signal detection problem in high-dimensional noise data (possibly) containing rare and weak signals. Log-likelihood ratio (LLR) tests depend on unknown parameters, but they are needed to judge the quality of detection tests since they determine the detection regions. Tukey’s popular higher criticism (HC) test was shown to achieve the same completely detectable region as the LLR test for different (mainly) parametric models. We present a novel technique to prove this result for very general signal models, including even nonparametric $p$-value models. Moreover, we address the following questions, which have been pending since the initial paper of Donoho and Jin: What happens on the border of the completely detectable region, the so-called detection boundary? Does HC keep its optimality there? In particular, we give a complete answer for the heteroscedastic normal mixture model. As a byproduct, we give some new insights into the LLR test’s behaviour on the detection boundary by discussing, among others, Pitman’s asymptotic efficiency as an application of Le Cam’s theory.

#### Article information

Source
Electron. J. Statist., Volume 12, Number 2 (2018), 4094-4137.

Dates
First available in Project Euclid: 13 December 2018

Permanent link to this document
https://projecteuclid.org/euclid.ejs/1544670253

Digital Object Identifier
doi:10.1214/18-EJS1502

Mathematical Reviews number (MathSciNet)
MR3890763

Zentralblatt MATH identifier
07003238

Subjects
Primary: 62G10: Hypothesis testing; 62G20: Asymptotic properties
Secondary: 62G32: Statistics of extreme values; tail inference

#### Citation

Ditzhaus, Marc; Janssen, Arnold. Detectability of nonparametric signals: higher criticism versus likelihood ratio. Electron. J. Statist. 12 (2018), no. 2, 4094–4137. doi:10.1214/18-EJS1502. https://projecteuclid.org/euclid.ejs/1544670253

#### References

• [1] Arias-Castro, E. and Candès, E. J. and Plan, Y. (2015). Global testing under sparse alternatives: ANOVA, multiple comparisons and the higher criticism. Ann. Statist. 39, no. 5, 2533–2556.
• [2] Arias-Castro, E. and Wang, M. (2015). The sparse Poisson means model. Electron. J. Stat. 9, no. 2, 2170–2201.
• [3] Arias-Castro, E. and Wang, M. (2017). Distribution-free tests for sparse heterogeneous mixtures. TEST 26, no. 1, 71–94.
• [4] Billingsley, P. (1999). Convergence of Probability Measures, 2nd ed. Wiley, New York.
• [5] Cai, T., Jeng, J. and Jin, J. (2011). Optimal detection of heterogeneous and heteroscedastic mixtures. J. R. Stat. Soc. Ser. B Stat. Methodol. 73, no. 5, 629–662.
• [6] Cai, T. and Wu, Y. (2014). Optimal Detection of Sparse Mixtures Against a Given Null Distribution. IEEE Trans. Inform. Theory 60, no. 4, 2217–2232.
• [7] Cayon, L., Jin, J. and Treaster, A. (2004). Higher Criticism statistic: Detecting and identifying non-Gaussianity in the WMAP first year data. Mon. Not. Roy. Astron. Soc. 362, 826–832.
• [8] Dai, H., Charnigo, R., Srivastava, T., Talebizadeh, Z. and Qing, S. (2012). Integrating P-values for genetic and genomic data analysis. J. Biom. Biostat., 3–7.
• [9] Delaigle, A., Hall, J. and Jin, J. (2011). Robustness and accuracy of methods for high dimensional data analysis based on Student’s t statistic. J. R. Stat. Soc. Ser. B Stat. Methodol. 73, 283–301.
• [10] Ditzhaus, M. (2017). The power of tests for signal detection under high-dimensional data. PhD thesis, Heinrich-Heine-University Duesseldorf. https://docserv.uni-duesseldorf.de/servlets/DocumentServlet?id=42808
• [11] Ditzhaus, M. (2018). Signal detection via Phi-divergences for general mixtures. arXiv:1803.06519v1.
• [12] Donoho, D. and Jin, J. (2004). Higher criticism for detecting sparse heterogeneous mixtures. Ann. Statist. 32, no. 3, 962–994.
• [13] Donoho, D. and Jin, J. (2015). Higher Criticism for Large-Scale Inference, Especially for Rare and Weak Effects. Statist. Sci. 30, no. 1, 1–25.
• [14] Eicker, F. (1979). The asymptotic distribution of the suprema of the standardized empirical processes. Ann. Stat. 7, 116–138.
• [15] Goldstein, D.B. (2009). Common genetic variation and human traits. New England J. Med. 360, 1696–1698.
• [16] Gnedenko, B.V. and Kolmogorov, A.N. (1954). Limit distribution for sums of independent random variables. Addison–Wesley, Reading, MA. Translated and annotated by K. L. Chung.
• [17] Hájek, J., Šidák, Z. and Sen, P. K. (1999). Theory of rank tests. Probability and Mathematical Statistics, second edition. Academic Press, Inc., San Diego, CA.
• [18] Hall, P., Pittelkow, Y. and Ghosh, M. (2008). Theoretical measures of relative performance of classifiers for high dimensional data with small sample sizes. J. R. Stat. Soc. Ser. B Stat. Methodol. 70, 158–173.
• [19] Ingster, Y. (1997). Some problems of hypothesis testing leading to infinitely divisible distributions. Math. Methods Statist. 6, no. 1, 47–69.
• [20] Ingster, Y. I. and Tsybakov, A. B. and Verzelen, N. (2010). Detection boundary in sparse regression. Electron. J. Stat. 4, 1476–1526.
• [21] Iyengar, S. K. and Elston, R.C. (2007). The genetic basis of complex traits: Rare variant or "common gene, common disease"? Methods Mol. Biol. 376, 71–84.
• [22] Jaeschke, D. (1979). The asymptotic distribution of the suprema of the standardized empirical distribution function on subintervals. Ann. Stat. 7, no. 1, 108–115.
• [23] Jager, L. and Wellner, J. (2007). Goodness-of-fit tests via phi-divergences. Ann. Stat. 35, no. 5, 2008–2053.
• [24] Janssen, A., Milbrodt, H. and Strasser, H. (1985). Infinitely divisible statistical experiments. Lecture Notes in Statistics 27, Springer-Verlag, Berlin.
• [25] Janssen, A. (1990). Statistical experiments with non-regular densities. In: Janssen, A. and Mason, D. M., Non-Standard Rank Tests. Lecture Notes Stat., 65, 183–240.
• [26] Jin, J. (2004). Detecting a target in very noisy data from multiple looks. A festschrift for Herman Rubin, 255–286, IMS Lecture Notes Monogr. Ser., 45, Inst. Math. Statist., Beachwood, OH.
• [27] Jin, J., Stark, J.-L., Donoho, D., Aghanim, N. and Forni, O. (2005). Cosmological non-Gaussian signature detection: Comparing performance of different statistical tests. J. Appl. Signal Processing 15, 2470–2485.
• [28] Khmaladze, E.V. (1998). Goodness of fit tests for Chimeric alternatives. Statist. Neerlandica 52, no. 1, 90–111.
• [29] Khmaladze, E. and Shinjikashvili, E. (2001). Calculation of noncrossing probabilities for Poisson processes and its corollaries. Adv. in Appl. Probab. 33, 702–716.
• [30] Kulldorff, M., Heffernan, R., Hartman, J., Assuncao, R. and Mostashari, F. (2005). A space-time permutation scan statistic for disease outbreak detection. PLoS Med 2, no. 3, e59.
• [31] Le Cam, L. (1986). Asymptotic methods in statistical decision theory. Springer Series in Statistics. Springer-Verlag, New York.
• [32] Le Cam, L. and Yang, G. L. (2000). Asymptotics in statistics. Second edition. Springer Series in Statistics. Springer-Verlag, New York.
• [33] Mukherjee, R., Pillai, N. S. and Lin, X. (2015). Hypothesis testing for high-dimensional sparse binary regression. Ann. Statist. 43, no. 1, 352–381.
• [34] Neill, D. and Lingwall, J. (2007). A nonparametric scan statistic for multivariate disease surveillance. Advances in Disease Surveillance 4, 106–116.
• [35] Saligrama, V. and Zhao, M. (2012). Local anomaly detection. JMLR W&CP 22, 969–983.
• [36] Strasser, H. (1985). Mathematical Theory of Statistics. De Gruyter, Berlin/New York.
• [37] Tukey, J. W. (1976). T13 N: The higher Criticism. Course Notes. Stat 411. Princeton Univ.
• [38] Tukey, J. W. (1989). Higher Criticism for individual significances in several tables or parts of tables. Internal working paper, Princeton Univ.
• [39] Tukey, J. W. (1994). The Collected Works of John W. Tukey: Multiple Comparisons, Volume VIII. Chapman and Hall, London.