Electronic Journal of Statistics

Detection of sparse mixtures: higher criticism and scan statistic

Ery Arias-Castro and Andrew Ying


Abstract

We consider the problem of detecting a sparse mixture, as studied by Ingster (1997) and Donoho and Jin (2004), for a wide array of base distributions. In particular, we study the case where the base distribution has polynomial tails, which has received little attention in the literature. Perhaps surprisingly, we find that for such power-law distributions the higher criticism does not achieve the detection boundary, whereas the scan statistic does.
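
For readers unfamiliar with the higher criticism, the following is a minimal sketch of the statistic in the form popularized by Donoho and Jin (2004), applied to p-values from a simulated normal location mixture. It is illustrative only: it does not reproduce the article's polynomial-tail setting or its scan statistic, and the function names, the sample size, the number of contaminated observations, the shift, and the restriction of the maximum to the smaller half of the sorted p-values are all assumptions made for this example.

```python
import math
import random
from statistics import NormalDist


def higher_criticism(pvalues):
    """Higher criticism statistic computed from a list of p-values.

    Takes the maximum, over the smaller half of the sorted p-values, of the
    standardized gap between the empirical fraction i/n and the i-th
    smallest p-value. Restricting to the smaller half is an assumption of
    this sketch.
    """
    n = len(pvalues)
    p_sorted = sorted(pvalues)
    best = -math.inf
    for i, p in enumerate(p_sorted[: max(1, n // 2)], start=1):
        if p <= 0.0 or p >= 1.0:
            continue  # the standardization is undefined at the endpoints
        z = math.sqrt(n) * (i / n - p) / math.sqrt(p * (1.0 - p))
        best = max(best, z)
    return best


def simulate_pvalues(n, n_signal, shift, rng):
    """One-sided p-values for n standard normal observations, the first
    n_signal of which are shifted upward by `shift` (hypothetical setup)."""
    std_normal = NormalDist()
    xs = [rng.gauss(0.0, 1.0) + (shift if i < n_signal else 0.0) for i in range(n)]
    return [1.0 - std_normal.cdf(x) for x in xs]


if __name__ == "__main__":
    rng = random.Random(0)
    null_p = simulate_pvalues(2000, 0, 0.0, rng)    # no contamination
    mixed_p = simulate_pvalues(2000, 50, 4.0, rng)  # sparse contamination
    print("HC under the null:       ", higher_criticism(null_p))
    print("HC under the alternative:", higher_criticism(mixed_p))
```

A large value of the statistic is evidence against the null hypothesis that all observations come from the base distribution; in practice the rejection threshold would be calibrated by simulation under the null.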

Article information

Source
Electron. J. Statist., Volume 13, Number 1 (2019), 208-230.

Dates
Received: February 2018
First available in Project Euclid: 16 January 2019

Permanent link to this document
https://projecteuclid.org/euclid.ejs/1547607852

Digital Object Identifier
doi:10.1214/18-EJS1512

Keywords
Sparse mixtures; contamination model; rare effects; normal means model; higher criticism; scan statistic

Rights
Creative Commons Attribution 4.0 International License.

Citation

Arias-Castro, Ery; Ying, Andrew. Detection of sparse mixtures: higher criticism and scan statistic. Electron. J. Statist. 13 (2019), no. 1, 208–230. doi:10.1214/18-EJS1512. https://projecteuclid.org/euclid.ejs/1547607852



References

  • Anderson, T. W. and Darling, D. A. (1952). Asymptotic theory of certain “goodness of fit” criteria based on stochastic processes. The Annals of Mathematical Statistics 193–212.
  • Arias-Castro, E. and Chen, S. (2017). Distribution-free multiple testing. Electronic Journal of Statistics 11 1983–2001.
  • Arias-Castro, E., Donoho, D. L. and Huo, X. (2005). Near-optimal detection of geometric objects by fast multiscale methods. IEEE Transactions on Information Theory 51 2402–2425.
  • Arias-Castro, E. and Wang, M. (2017). Distribution-free tests for sparse heterogeneous mixtures. TEST 26 71–94.
  • Benjamini, Y. and Hochberg, Y. (1995). Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society. Series B (Methodological) 57 289–300.
  • Berk, R. H. and Jones, D. H. (1979). Goodness-of-fit test statistics that dominate the Kolmogorov statistics. Probability Theory and Related Fields 47 47–59.
  • Cai, T. T., Jeng, X. J. and Jin, J. (2011). Optimal detection of heterogeneous and heteroscedastic mixtures. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 73 629–662.
  • Cai, T. T., Jin, J. and Low, M. G. (2007). Estimation and confidence sets for sparse normal mixtures. The Annals of Statistics 35 2421–2449.
  • Cai, T. T. and Wu, Y. (2014). Optimal Detection of Sparse Mixtures Against a Given Null Distribution. IEEE Transactions on Information Theory 60 2217–2232.
  • Chen, S. and Arias-Castro, E. (2017). Sequential Multiple Testing. arXiv preprint arXiv:1705.10190.
  • Chen, S., Ying, A. and Arias-Castro, E. (2018). A Scan Procedure for Multiple Testing. arXiv preprint arXiv:1808.00631.
  • Donoho, D. and Jin, J. (2004). Higher criticism for detecting sparse heterogeneous mixtures. The Annals of Statistics 32 962–994.
  • Genovese, C. and Wasserman, L. (2002). Operating characteristics and extensions of the false discovery rate procedure. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 64 499–517.
  • Genovese, C. and Wasserman, L. (2004). A stochastic process approach to false discovery control. The Annals of Statistics 1035–1061.
  • Gibbons, J. D. and Chakraborti, S. (2011). Nonparametric Statistical Inference. Springer.
  • Huber, P. J. and Ronchetti, E. M. (2009). Robust Statistics. John Wiley & Sons.
  • Ingster, Y. I. (1997). Some problems of hypothesis testing leading to infinitely divisible distributions. Mathematical Methods of Statistics 6 47–69.
  • Jaeschke, D. (1979). The Asymptotic Distribution of the Supremum of the Standardized Empirical Distribution Function on Subintervals. The Annals of Statistics 7 108–115.
  • Jin, J., Starck, J.-L., Donoho, D. L., Aghanim, N. and Forni, O. (2005). Cosmological non-Gaussian signature detection: Comparing performance of different statistical tests. EURASIP Journal on Advances in Signal Processing 2005 297184.
  • Kabluchko, Z. (2011). Extremes of the standardized Gaussian noise. Stochastic Processes and their Applications 121 515–533.
  • Kulldorff, M. (1997). A spatial scan statistic. Communications in Statistics: Theory and Methods 26 1481–1496.
  • Moscovich, A., Nadler, B. and Spiegelman, C. (2016). On the exact Berk-Jones statistics and their p-value calculation. Electronic Journal of Statistics 10 2329–2354.
  • Naus, J. I. (1965). The distribution of the size of the maximum cluster of points on a line. Journal of the American Statistical Association 60 532–538.
  • Sharpnack, J. and Arias-Castro, E. (2016). Exact asymptotics for the scan statistic and fast alternatives. Electronic Journal of Statistics 10 2641–2684.
  • Stouffer, S. A., Suchman, E. A., DeVinney, L. C., Star, S. A. and Williams Jr, R. M. (1949). The American Soldier, Vol 1: Adjustment During Army Life. Princeton University Press.
  • Tippett, L. H. C. (1931). Methods of Statistics. Williams & Norgate: London.