Electronic Journal of Statistics

Exact asymptotics for the scan statistic and fast alternatives

James Sharpnack and Ery Arias-Castro

Full-text: Open access

Abstract

We consider the problem of detecting a rectangle of activation in a grid of sensors in $d$ dimensions with noisy measurements. This has applications to massive surveillance projects and to anomaly detection in large datasets, where one seeks anomalously high measurements over rectangular regions or, more generally, blobs. Recently, the asymptotic distribution of a multiscale scan statistic under the null hypothesis was established in [18], using non-constant boundary crossing probabilities for locally stationary Gaussian random fields derived in [8]. Using a similar approach, we derive the exact asymptotic level and power of four variants of the scan statistic: an oracle scan that knows the dimensions of the activation rectangle; the multiscale scan statistic just mentioned; the adaptive variant; and an $\epsilon$-net approximation to the latter, in the spirit of [3]. This approximate scan runs in time near-linear in the size of the grid, admits a parallel implementation running in poly-logarithmic time, and achieves the same asymptotic level and power as the adaptive scan. We complement our theory with numerical experiments and make practical recommendations.
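The oracle scan and the idea of scanning a coarse net of rectangles can be illustrated with a small sketch. It assumes $d = 2$, a grid of standardized (unit-variance) measurements, and scores a rectangle $R$ by the standardized sum $\sum_{v \in R} x_v / \sqrt{|R|}$. A summed-area table turns each rectangle sum into an $O(1)$ lookup. The functions `oracle_scan`, `geometric_sizes` and `approx_scan` are hypothetical names, and the size and position grid used in `approx_scan` is an illustrative assumption. It is not the authors' $\epsilon$-net construction, nor their calibration of the multiscale or adaptive statistics.

```python
import math
from typing import List, Optional, Tuple

Grid = List[List[float]]
Rect = Tuple[int, int, int, int]  # (top row, left column, height, width)


def summed_area_table(x: Grid) -> Grid:
    """S[i][j] = sum of x[a][b] over a < i and b < j (one row/column of zero padding)."""
    n, m = len(x), len(x[0])
    s = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(n):
        row_sum = 0.0
        for j in range(m):
            row_sum += x[i][j]
            s[i + 1][j + 1] = s[i][j + 1] + row_sum
    return s


def rect_sum(s: Grid, i: int, j: int, h: int, w: int) -> float:
    """Sum of x over rows i..i+h-1 and columns j..j+w-1, in O(1) by inclusion-exclusion."""
    return s[i + h][j + w] - s[i][j + w] - s[i + h][j] + s[i][j]


def oracle_scan(x: Grid, h: int, w: int) -> Tuple[float, Optional[Rect]]:
    """Max standardized sum over every h-by-w rectangle (rectangle dimensions known)."""
    s = summed_area_table(x)
    n, m = len(x), len(x[0])
    best, arg = -math.inf, None
    for i in range(n - h + 1):
        for j in range(m - w + 1):
            z = rect_sum(s, i, j, h, w) / math.sqrt(h * w)
            if z > best:
                best, arg = z, (i, j, h, w)
    return best, arg


def geometric_sizes(n: int, eps: float) -> List[int]:
    """Hypothetical size grid: side lengths near 1, (1+eps), (1+eps)^2, ..., up to n."""
    if eps <= 0:
        raise ValueError("eps must be positive")
    sizes: List[int] = []
    k = 1.0
    while k <= n:
        size = int(round(k))
        if not sizes or size != sizes[-1]:
            sizes.append(size)
        k *= 1.0 + eps
    return sizes


def approx_scan(x: Grid, eps: float) -> Tuple[float, Optional[Rect]]:
    """Hypothetical coarse scan over all sizes: geometric side lengths, with top-left
    corners on a lattice whose step grows with the side length (about eps * side)."""
    s = summed_area_table(x)
    n, m = len(x), len(x[0])
    best, arg = -math.inf, None
    for h in geometric_sizes(n, eps):
        di = max(1, int(eps * h))  # row step for this height
        for w in geometric_sizes(m, eps):
            dj = max(1, int(eps * w))  # column step for this width
            for i in range(0, n - h + 1, di):
                for j in range(0, m - w + 1, dj):
                    z = rect_sum(s, i, j, h, w) / math.sqrt(h * w)
                    if z > best:
                        best, arg = z, (i, j, h, w)
    return best, arg
```

For example, on a 10 × 10 grid of zeros with a 3 × 4 block of ones, `oracle_scan(x, 3, 4)` recovers the block exactly. With `eps = 0.5`, `approx_scan` never tries width 4 and returns a somewhat smaller score from a nearby 3 × 5 rectangle. This illustrates the speed-versus-fidelity trade-off that the net's resolution controls.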

Article information

Source
Electron. J. Statist., Volume 10, Number 2 (2016), 2641-2684.

Dates
Received: December 2015
First available in Project Euclid: 12 September 2016

Permanent link to this document
https://projecteuclid.org/euclid.ejs/1473685450

Digital Object Identifier
doi:10.1214/16-EJS1188

Mathematical Reviews number (MathSciNet)
MR3546971

Zentralblatt MATH identifier
1345.62078

Subjects
Primary: 62G10 (Hypothesis testing), 62M40 (Random fields; image analysis), 60G32

Keywords
Sensor networks; image processing; multiscale detection; scan statistic; suprema of Gaussian random fields

Citation

Sharpnack, James; Arias-Castro, Ery. Exact asymptotics for the scan statistic and fast alternatives. Electron. J. Statist. 10 (2016), no. 2, 2641--2684. doi:10.1214/16-EJS1188. https://projecteuclid.org/euclid.ejs/1473685450


References

  • [1] Amărioarei, A. and C. Preda (2015). Approximation for the distribution of three-dimensional discrete scan statistic. Methodology and Computing in Applied Probability 17(3), 565–578.
  • [2] Arias-Castro, E., E. J. Candès, and A. Durand (2011). Detection of an anomalous cluster in a network. Ann. Statist. 39(1), 278–304.
  • [3] Arias-Castro, E., D. Donoho, and X. Huo (2005). Near-optimal detection of geometric objects by fast multiscale methods. IEEE Trans. Inform. Theory 51(7), 2402–2425.
  • [4] Arratia, R., L. Goldstein, and L. Gordon (1989). Two moments suffice for Poisson approximations: the Chen–Stein method. The Annals of Probability 17(1), 9–25.
  • [5] Boutsikas, M. V. and M. V. Koutras (2006). On the asymptotic distribution of the discrete scan statistic. J. Appl. Probab. 43(4), 1137–1154.
  • [6] Brennan, S. M., A. M. Mielke, D. C. Torney, and A. B. Maccabe (2004). Radiation detection with distributed sensor networks. Computer 37(8), 57–59.
  • [7] Caron, Y., P. Makris, and N. Vincent (2002). A method for detecting artificial objects in natural environments. In Proceedings 16th International Conference on Pattern Recognition, Volume 1, pp. 600–603. IEEE Comput. Soc.
  • [8] Chan, H. P. and T. L. Lai (2006). Maxima of asymptotically Gaussian random fields and moderate deviation approximations to boundary crossing probabilities of sums of random variables with multidimensional indices. The Annals of Probability 34(1), 80–121.
  • [9] Culler, D., D. Estrin, and M. Srivastava (2004). Overview of sensor networks. IEEE Computer 37(8), 41–49.
  • [10] Desolneux, A., L. Moisan, and J.-M. Morel (2003). Maximal meaningful events and applications to image analysis. Ann. Statist. 31(6), 1822–1851.
  • [11] Duczmal, L., M. Kulldorff, and L. Huang (2006). Evaluation of spatial scan statistics for irregularly shaped clusters. Journal of Computational & Graphical Statistics 15(2), 428–442.
  • [12] Glaz, J., J. Naus, and S. Wallenstein (2001). Scan Statistics. Springer Series in Statistics. New York: Springer-Verlag.
  • [13] Glaz, J. and Z. Zhang (2004). Multiple window discrete scan statistics. Journal of Applied Statistics 31(8), 967–980.
  • [14] Haiman, G. and C. Preda (2006). Estimation for the distribution of two-dimensional discrete scan statistics. Methodology and Computing in Applied Probability 8(3), 373–382.
  • [15] Heffernan, R., F. Mostashari, D. Das, A. Karpati, M. Kulldorff, and D. Weiss (2004). Syndromic surveillance in public health practice, New York City. Emerging Infectious Diseases 10(5), 858–864.
  • [16] James, D., B. D. Clymer, and P. Schmalbrock (2001). Texture detection of simulated microcalcification susceptibility effects in magnetic resonance imaging of breasts. Journal of Magnetic Resonance Imaging 13(6), 876–881.
  • [17] Jiang, T. (2002). Maxima of partial sums indexed by geometrical structures. Ann. Probab. 30(4), 1854–1892.
  • [18] Kabluchko, Z. (2011). Extremes of the standardized Gaussian noise. Stochastic Processes and their Applications 121(3), 515–533.
  • [19] Kulldorff, M. (1997). A spatial scan statistic. Comm. Statist. Theory Methods 26(6), 1481–1496.
  • [20] Kulldorff, M., L. Huang, L. Pickle, and L. Duczmal (2006). An elliptic spatial scan statistic. Stat. Med. 25(22), 3929–3943.
  • [21] Kulldorff, M. and Information Management Services, Inc. SaTScan™ v9.4: Software for the spatial and space-time scan statistics. http://www.satscan.org/.
  • [22] Marcus, M. B. and J. Rosen (2006). Markov Processes, Gaussian Processes, and Local Times. Number 100. Cambridge University Press.
  • [23] McInerney, T. and D. Terzopoulos (1996). Deformable models in medical image analysis: a survey. Medical Image Analysis 1(2), 91–108.
  • [24] Moon, N., E. Bullitt, K. van Leemput, and G. Gerig (2002). Automatic brain and tumor segmentation. In MICCAI ’02: Proceedings of the 5th International Conference on Medical Image Computing and Computer-Assisted Intervention, Part I, London, UK, pp. 372–379. Springer-Verlag.
  • [25] Naus, J. I. (1965). The distribution of the size of the maximum cluster of points on a line. J. Amer. Statist. Assoc. 60, 532–538.
  • [26] Naus, J. I. and S. Wallenstein (2004). Multiple window and cluster size scan procedures. Methodology and Computing in Applied Probability 6(4), 389–400.
  • [27] Pickands, J. (1969). Upcrossing probabilities for stationary Gaussian processes. Transactions of the American Mathematical Society 145, 51–73.
  • [28] Pozdnyakov, V., J. Glaz, M. Kulldorff, and J. M. Steele (2005). A martingale approach to scan statistics. Annals of the Institute of Statistical Mathematics 57(1), 21–37.
  • [29] Pozo, D., F. Olmo, and L. Alados-Arboledas (1997). Fire detection and growth monitoring using a multitemporal technique on AVHRR mid-infrared and thermal channels. Remote Sensing of Environment 60(2), 111–120.
  • [30] Qualls, C. and H. Watanabe (1973). Asymptotic properties of Gaussian random fields. Transactions of the American Mathematical Society 177, 155–171.
  • [31] Rotz, L. and J. Hughes (2004). Advances in detecting and responding to threats from bioterrorism and emerging infectious disease. Nature Medicine, S130–S136.
  • [32] Siegmund, D. and E. S. Venkatraman (1995). Using the generalized likelihood ratio statistic for sequential detection of a change-point. Ann. Statist. 23(1), 255–271.
  • [33] Wagner, M., F. Tsui, J. Espino, V. Dato, D. Sittig, R. Caruana, L. McGinnis, D. Deerfield, M. Druzdzel, and D. Fridsma (2001). The emerging science of very early detection of disease outbreaks. Journal of Public Health Management and Practice 7(6), 51–59.
  • [34] Walther, G. (2010). Optimal and fast detection of spatial clusters with scan statistics. Ann. Statist. 38(2), 1010–1033.
  • [35] Wang, X. and J. Glaz (2014). Variable window scan statistics for normal data. Communications in Statistics - Theory and Methods 43(10–12), 2489–2504.