Electronic Journal of Statistics

Detection of sparse mixtures: higher criticism and scan statistic

Ery Arias-Castro and Andrew Ying

Abstract

We consider the problem of detecting a sparse mixture as studied by Ingster (1997) and Donoho and Jin (2004). We consider a wide array of base distributions. In particular, we study the situation when the base distribution has polynomial tails, a situation that has not received much attention in the literature. Perhaps surprisingly, we find that in the context of such a power-law distribution, the higher criticism does not achieve the detection boundary. However, the scan statistic does.

Article information

Source
Electron. J. Statist., Volume 13, Number 1 (2019), 208-230.

Dates
Received: February 2018
First available in Project Euclid: 16 January 2019

Permanent link to this document
https://projecteuclid.org/euclid.ejs/1547607852

Digital Object Identifier
doi:10.1214/18-EJS1512

Mathematical Reviews number (MathSciNet)
MR3899951

Zentralblatt MATH identifier
1411.62161

Keywords
Sparse mixtures; contamination model; rare effects; normal means model; higher criticism; scan statistic

Rights
Creative Commons Attribution 4.0 International License.

Citation

Arias-Castro, Ery; Ying, Andrew. Detection of sparse mixtures: higher criticism and scan statistic. Electron. J. Statist. 13 (2019), no. 1, 208–230. doi:10.1214/18-EJS1512. https://projecteuclid.org/euclid.ejs/1547607852


References

  • Anderson, T. W. and Darling, D. A. (1952). Asymptotic theory of certain “goodness of fit” criteria based on stochastic processes. The Annals of Mathematical Statistics 193–212.
  • Arias-Castro, E. and Chen, S. (2017). Distribution-free multiple testing. Electronic Journal of Statistics 11 1983–2001.
  • Arias-Castro, E., Donoho, D. L. and Huo, X. (2005). Near-optimal detection of geometric objects by fast multiscale methods. IEEE Transactions on Information Theory 51 2402–2425.
  • Arias-Castro, E. and Wang, M. (2017). Distribution-free tests for sparse heterogeneous mixtures. TEST 26 71–94.
  • Benjamini, Y. and Hochberg, Y. (1995). Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society. Series B (Methodological) 57 289–300.
  • Berk, R. H. and Jones, D. H. (1979). Goodness-of-fit test statistics that dominate the Kolmogorov statistics. Probability Theory and Related Fields 47 47–59.
  • Cai, T. T., Jeng, X. J. and Jin, J. (2011). Optimal detection of heterogeneous and heteroscedastic mixtures. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 73 629–662.
  • Cai, T. T., Jin, J. and Low, M. G. (2007). Estimation and confidence sets for sparse normal mixtures. The Annals of Statistics 35 2421–2449.
  • Cai, T. T. and Wu, Y. (2014). Optimal Detection of Sparse Mixtures Against a Given Null Distribution. IEEE Transactions on Information Theory 60 2217–2232.
  • Chen, S. and Arias-Castro, E. (2017). Sequential Multiple Testing. arXiv preprint arXiv:1705.10190.
  • Chen, S., Ying, A. and Arias-Castro, E. (2018). A Scan Procedure for Multiple Testing. arXiv preprint arXiv:1808.00631.
  • Donoho, D. and Jin, J. (2004). Higher criticism for detecting sparse heterogeneous mixtures. The Annals of Statistics 32 962–994.
  • Genovese, C. and Wasserman, L. (2002). Operating characteristics and extensions of the false discovery rate procedure. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 64 499–517.
  • Genovese, C. and Wasserman, L. (2004). A stochastic process approach to false discovery control. The Annals of Statistics 1035–1061.
  • Gibbons, J. D. and Chakraborti, S. (2011). Nonparametric Statistical Inference. Springer.
  • Huber, P. J. and Ronchetti, E. M. (2009). Robust Statistics. John Wiley & Sons.
  • Ingster, Y. I. (1997). Some problems of hypothesis testing leading to infinitely divisible distributions. Mathematical Methods of Statistics 6 47–69.
  • Jaeschke, D. (1979). The Asymptotic Distribution of the Supremum of the Standardized Empirical Distribution Function on Subintervals. The Annals of Statistics 7 108–115.
  • Jin, J., Starck, J.-L., Donoho, D. L., Aghanim, N. and Forni, O. (2005). Cosmological non-Gaussian signature detection: Comparing performance of different statistical tests. EURASIP Journal on Advances in Signal Processing 2005 297184.
  • Kabluchko, Z. (2011). Extremes of the standardized Gaussian noise. Stochastic Processes and their Applications 121 515–533.
  • Kulldorff, M. (1997). A spatial scan statistic. Communications in Statistics: Theory and Methods 26 1481–1496.
  • Moscovich, A., Nadler, B. and Spiegelman, C. (2016). On the exact Berk-Jones statistics and their $p$-value calculation. Electronic Journal of Statistics 10 2329–2354.
  • Naus, J. I. (1965). The distribution of the size of the maximum cluster of points on a line. Journal of the American Statistical Association 60 532–538.
  • Sharpnack, J. and Arias-Castro, E. (2016). Exact asymptotics for the scan statistic and fast alternatives. Electronic Journal of Statistics 10 2641–2684.
  • Stouffer, S. A., Suchman, E. A., DeVinney, L. C., Star, S. A. and Williams Jr, R. M. (1949). The American Soldier, Vol 1: Adjustment During Army Life. Princeton University Press.
  • Tippett, L. H. C. (1931). Methods of Statistics. Williams & Norgate: London.