The Annals of Statistics

Optimal detection of multi-sample aligned sparse signals

Hock Peng Chan and Guenther Walther


Abstract

For the detection of multi-sample aligned sparse signals, we describe the critical boundary separating detectable from nondetectable signals, and we construct tests that achieve optimal detectability: penalized versions of the Berk–Jones and higher-criticism test statistics evaluated over pooled scans, and an average likelihood ratio over the critical boundary. Our results show an interplay between the scale of the ratio of sequence length to signal length and the sparseness of the signals. In particular, the difficulty of the detection problem is not noticeably affected unless this ratio grows exponentially with the number of sequences. We also recover the multiscale and sparse mixture testing problems as illustrative special cases.
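
For orientation, the two goodness-of-fit statistics named in the abstract can be sketched in their classical, single-sample forms computed from p-values. This is only a minimal illustration: it is not the penalized, pooled-scan construction of the paper, and the function names, the `alpha0` cutoff, and the one-sided restriction are assumptions chosen for the sketch.

```python
import math


def higher_criticism(pvalues, alpha0=0.5):
    """Classical higher-criticism statistic over the smallest alpha0 fraction of p-values."""
    n = len(pvalues)
    p = sorted(pvalues)
    best = -math.inf
    for i in range(1, max(1, int(alpha0 * n)) + 1):
        pi = p[i - 1]
        if 0.0 < pi < 1.0:  # skip degenerate p-values to avoid division by zero
            z = math.sqrt(n) * (i / n - pi) / math.sqrt(pi * (1.0 - pi))
            best = max(best, z)
    return best


def _bernoulli_kl(x, t):
    """Kullback-Leibler divergence between Bernoulli(x) and Bernoulli(t), with 0 < t < 1."""
    out = 0.0
    if x > 0.0:
        out += x * math.log(x / t)
    if x < 1.0:
        out += (1.0 - x) * math.log((1.0 - x) / (1.0 - t))
    return out


def berk_jones(pvalues):
    """One-sided Berk-Jones statistic: max of n * K(i/n, p_(i)) over p_(i) < i/n."""
    n = len(pvalues)
    p = sorted(pvalues)
    best = 0.0
    for i in range(1, n + 1):
        pi = p[i - 1]
        if 0.0 < pi < i / n:  # only count an excess of small p-values
            best = max(best, n * _bernoulli_kl(i / n, pi))
    return best
```

Both functions return small values when the p-values look uniform and large values when a few p-values are unusually small, which is the sparse-mixture setting the abstract mentions as a special case.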

Article information

Source
Ann. Statist., Volume 43, Number 5 (2015), 1865-1895.

Dates
Received: December 2014
Revised: February 2015
First available in Project Euclid: 3 August 2015

Permanent link to this document
https://projecteuclid.org/euclid.aos/1438606847

Digital Object Identifier
doi:10.1214/15-AOS1328

Mathematical Reviews number (MathSciNet)
MR3375870

Zentralblatt MATH identifier
1327.62250

Subjects
Primary: 62G08: Nonparametric regression; 62G10: Hypothesis testing

Keywords
Average likelihood ratio; Berk–Jones; higher criticism; optimal detection; scan statistic; sparse mixture

Citation

Chan, Hock Peng; Walther, Guenther. Optimal detection of multi-sample aligned sparse signals. Ann. Statist. 43 (2015), no. 5, 1865–1895. doi:10.1214/15-AOS1328. https://projecteuclid.org/euclid.aos/1438606847



References

  • [1] Arias-Castro, E., Donoho, D. L. and Huo, X. (2005). Near-optimal detection of geometric objects by fast multiscale methods. IEEE Trans. Inform. Theory 51 2402–2425.
  • [2] Arias-Castro, E., Donoho, D. L. and Huo, X. (2006). Adaptive multiscale detection of filamentary structures in a background of uniform random points. Ann. Statist. 34 326–349.
  • [3] Arias-Castro, E. and Wang, M. (2013). Distribution-free tests for sparse heteroscedastic mixtures. Preprint.
  • [4] Berk, R. H. and Jones, D. H. (1979). Goodness-of-fit test statistics that dominate the Kolmogorov statistics. Z. Wahrsch. Verw. Gebiete 47 47–59.
  • [5] Cai, T. T., Jeng, X. J. and Jin, J. (2011). Optimal detection of heterogeneous and heteroscedastic mixtures. J. R. Stat. Soc. Ser. B. Stat. Methodol. 73 629–662.
  • [6] Cai, T. T. and Wu, Y. (2014). Optimal detection of sparse mixtures against a given null distribution. IEEE Trans. Inform. Theory 60 2217–2232.
  • [7] The Cancer Genome Atlas (2008). Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature 455 1061–1068.
  • [8] Chan, H. P. and Walther, G. (2013). Detection with the scan and the average likelihood ratio. Statist. Sinica 23 409–428.
  • [9] Donoho, D. and Jin, J. (2004). Higher criticism for detecting sparse heterogeneous mixtures. Ann. Statist. 32 962–994.
  • [10] Dümbgen, L. and Spokoiny, V. G. (2001). Multiscale testing of qualitative hypotheses. Ann. Statist. 29 124–152.
  • [11] Efron, B. and Zhang, N. R. (2011). False discovery rates and copy number variation. Biometrika 98 251–271.
  • [12] Glaz, J., Pozdnyakov, V. and Wallenstein, S., eds. (2009). Scan Statistics: Methods and Applications. Birkhäuser, Boston, MA.
  • [13] Hall, P. and Jin, J. (2010). Innovated higher criticism for detecting sparse signals in correlated noise. Ann. Statist. 38 1686–1732.
  • [14] Ingster, Y. I. (1997). Some problems of hypothesis testing leading to infinitely divisible distributions. Math. Methods Statist. 6 47–69.
  • [15] Ingster, Y. I. (1998). Minimax detection of a signal for $l^{n}$-balls. Math. Methods Statist. 7 401–428.
  • [16] Jager, L. and Wellner, J. A. (2007). Goodness-of-fit tests via phi-divergences. Ann. Statist. 35 2018–2053.
  • [17] Jeng, X. J., Cai, T. T. and Li, H. (2013). Simultaneous discovery of rare and common segment variants. Biometrika 100 157–172.
  • [18] Lai, W. R., Johnson, M. D., Kucherlapati, R. and Park, P. J. (2005). Comparative analysis of algorithms for identifying amplifications and deletions in array CGH data. Bioinformatics 21 3763–3770.
  • [19] Lepski, O. V. and Tsybakov, A. B. (2000). Asymptotically exact nonparametric hypothesis testing in sup-norm and at a fixed point. Probab. Theory Related Fields 117 17–48.
  • [20] Mei, Y. (2010). Efficient scalable schemes for monitoring a large number of data streams. Biometrika 97 419–433.
  • [21] Owen, A. B. (1995). Nonparametric likelihood confidence bands for a distribution function. J. Amer. Statist. Assoc. 90 516–521.
  • [22] Rivera, C. and Walther, G. (2013). Optimal detection of a jump in the intensity of a Poisson process or in a density with likelihood ratio statistics. Scand. J. Stat. 40 752–769.
  • [23] Rohde, A. (2008). Adaptive goodness-of-fit tests based on signed ranks. Ann. Statist. 36 1346–1374.
  • [24] Siegmund, D., Yakir, B. and Zhang, N. R. (2011). Detecting simultaneous variant intervals in aligned sequences. Ann. Appl. Stat. 5 645–668.
  • [25] Tartakovsky, A. G. and Veeravalli, V. V. (2008). Asymptotically optimal quickest change detection in distributed sensor systems. Sequential Anal. 27 441–475.
  • [26] Walther, G. (2010). Optimal and fast detection of spatial clusters with scan statistics. Ann. Statist. 38 1010–1033.
  • [27] Walther, G. (2013). The average likelihood ratio for large-scale multiple testing and detecting sparse mixtures. In From Probability to Statistics and Back: High-Dimensional Models and Processes. Inst. Math. Stat. (IMS) Collect. 9 317–326. IMS, Beachwood, OH.
  • [28] Xie, Y. and Siegmund, D. (2013). Sequential multi-sensor change-point detection. Ann. Statist. 41 670–692.
  • [29] Zhang, N. R., Siegmund, D. O., Ji, H. and Li, J. Z. (2010). Detecting simultaneous changepoints in multiple sequences. Biometrika 97 631–645.