The Annals of Statistics

Detection of an anomalous cluster in a network

Ery Arias-Castro, Emmanuel J. Candès, and Arnaud Durand

Full-text: Open access

Abstract

We consider the problem of detecting whether or not, in a given sensor network, there is a cluster of sensors which exhibit an “unusual behavior.” Formally, suppose we are given a set of nodes and attach a random variable to each node. We observe a realization of this process and want to decide between the following two hypotheses: under the null, the variables are i.i.d. standard normal; under the alternative, there is a cluster of variables that are i.i.d. normal with positive mean and unit variance, while the rest are i.i.d. standard normal. We also address surveillance settings where each sensor in the network collects information over time. The resulting model is similar, now with a time series attached to each node. We again observe the process over time and want to decide between the null, where all the variables are i.i.d. standard normal, and the alternative, where there is an emerging cluster of i.i.d. normal variables with positive mean and unit variance. The growth models used to represent the emerging cluster are quite general and, in particular, include cellular automata used in modeling epidemics. In both settings, we consider classes of clusters that are quite general, for which we obtain a lower bound on their respective minimax detection rate and show that some form of scan statistic, by far the most popular method in practice, achieves that same rate to within a logarithmic factor. Our results are not limited to the normal location model, but generalize to any one-parameter exponential family when the anomalous clusters are large enough.

Article information

Source
Ann. Statist., Volume 39, Number 1 (2011), 278-304.

Dates
First available in Project Euclid: 3 December 2010

Permanent link to this document
https://projecteuclid.org/euclid.aos/1291388376

Digital Object Identifier
doi:10.1214/10-AOS839

Mathematical Reviews number (MathSciNet)
MR2797847

Zentralblatt MATH identifier
1209.62097

Subjects
Primary: 62C20: Minimax procedures 62G10: Hypothesis testing
Secondary: 82B20: Lattice systems (Ising, dimer, Potts, etc.) and systems on graphs

Keywords
Detecting a cluster of nodes in a network minimax detection Bayesian detection scan statistic generalized likelihood ratio test disease outbreak detection sensor networks Richardson’s model cellular automata

Citation

Arias-Castro, Ery; Candès, Emmanuel J.; Durand, Arnaud. Detection of an anomalous cluster in a network. Ann. Statist. 39 (2011), no. 1, 278--304. doi:10.1214/10-AOS839. https://projecteuclid.org/euclid.aos/1291388376


Export citation

References

  • [1] Addario-Berry, L., Broutin, N., Devroye, L. and Lugosi, G. (2010). On combinatorial testing problems. Ann. Statist. 38 3063–3092.
  • [2] Agur, S., Diekmann, O., Heesterbeek, H., Cushing, J., Gyllenberg, M., Kimmel, M., Milner, F., Jagers, P. and Kostova, T., eds. (1999). Epidemiology, Cellular Automata and Evolution. Elsevier, Oxford.
  • [3] Akyildiz, I., Su, W., Sankarasubramaniam, Y. and Cayirci, E. (2002). A survey on sensor networks. IEEE Communications Magazine 40 102–114.
  • [4] Aldosari, S. and Moura, J. (2004). Detection in decentralized sensor networks. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing 2004 (ICASSP’04) 2 277–280.
  • [5] Arias-Castro, E., Candès, E. J. and Durand, A. (2009). Detection of an abnormal cluster in a network. In Proc. 57th Session of the International Statistical Institute, Durban, South Africa.
  • [6] Arias-Castro, E., Candès, E. J. and Durand, A. (2010). Supplement to “Detection of an anomalous cluster in a network.” DOI: 10.1214/10-AOS839SUPP.
  • [7] Arias-Castro, E., Candès, E. J., Helgason, H. and Zeitouni, O. (2008). Searching for a trail of evidence in a maze. Ann. Statist. 36 1726–1757.
  • [8] Arias-Castro, E., Donoho, D. and Huo, X. (2005). Near-optimal detection of geometric objects by fast multiscale methods. IEEE Trans. Inform. Theory 51 2402–2425.
  • [9] Arias-Castro, E., Donoho, D. and Huo, X. (2006). Adaptive multiscale detection of filamentary structures in a background of uniform random points. Ann. Statist. 34 326–349.
  • [10] Arias-Castro, E., Efros, B. and Levi, O. (2010). Networks of polynomial pieces with application to the analysis of point clouds and images. J. Approx. Theory 162 94–130.
  • [11] Arora, A., Dutta, P., Bapat, S., Kulathumani, V., Zhang, H., Naik, V., Mittal, V., Cao, H., Demirbas, M., Gouda, M., Choi, Y., Herman, T., Kulkarni, S., Arumugam, U., Nesterenko, M., Vora, A. and Miyashita, M. (2004). A line in the sand: A wireless sensor network for target detection, classification and tracking. Comput. Networks 46 605–634.
  • [12] Bohman, T. and Gravner, J. (1999). Random threshold growth dynamics. Random Structures Algorithms 15 93–111.
  • [13] Boutsikas, M. V. and Koutras, M. V. (2006). On the asymptotic distribution of the discrete scan statistic. J. Appl. Probab. 43 1137–1154.
  • [14] Braams, J., Pruim, J., Freling, N., Nikkels, P., Roodenburg, J., Boering, G., Vaalburg, W. and Vermey, A. (1995). Detection of lymph node metastases of squamous-cell cancer of the head and neck with FDG-PET and MRI. Journal of Nuclear Medicine 36 211.
  • [15] Brennan, S. M., Mielke, A. M., Torney, D. C. and Maccabe, A. B. (2004). Radiation detection with distributed sensor networks. IEEE Computer 37 57–59.
  • [16] Brodsky, B. and Darkhovsky, B. (1993). Nonparametric Methods in Change-Point Problems. Mathematics and Its Applications 243. Kluwer Academic, Dordrecht.
  • [17] Candès, E. J., Charlton, P. R. and Helgason, H. (2008). Detecting highly oscillatory signals by chirplet path pursuit. Appl. Comput. Harmon. Anal. 24 14–40.
  • [18] Caron, Y., Makris, P. and Vincent, N. (2002). A method for detecting artificial objects in natural environments. In Proceedings 16th International Conference on Pattern Recognition 1 10600. IEEE Comput. Soc., New York.
  • [19] Cui, Y., Wei, Q., Park, H. and Lieber, C. (2001). Nanowire nanosensors for highly sensitive and selective detection of biological and chemical species. Science 293 1289–1292.
  • [20] Culler, D., Estrin, D. and Srivastava, M. (2004). Overview of sensor networks. Computer 37 41–49.
  • [21] DasGupta, B., Hespanha, J. P., Riehl, J. and Sontag, E. (2006). Honey-pot constrained searching with local sensory information. Nonlinear Anal. 65 1773–1793.
  • [22] Dembo, A., Gandolfi, A. and Kesten, H. (2001). Greedy lattice animals: Negative values and unconstrained maxima. Ann. Probab. 29 205–241.
  • [23] Demirbaş, K. (1987). Maneuvering target tracking with hypothesis testing. IEEE Trans. Aerospace Electron. Systems 23 757–766.
  • [24] Desolneux, A., Moisan, L. and Morel, J.-M. (2003). Maximal meaningful events and applications to image analysis. Ann. Statist. 31 1822–1851.
  • [25] Duczmal, L., Kulldorff, M. and Huang, L. (2006). Evaluation of spatial scan statistics for irregularly shaped clusters. J. Comput. Graph. Statist. 15 428–442.
  • [26] Dudley, R. M. (1967). The sizes of compact subsets of Hilbert space and continuity of Gaussian processes. J. Funct. Anal. 1 290–330.
  • [27] Fitch, J., Raber, E. and Imbro, D. (2003). Technology challenges in responding to biological or chemical attacks in the civilian sector. Science 302 1350–1354.
  • [28] Geelhood, B., Ely, J., Hansen, R., Kouzes, R., Schweppe, J. and Warner, R. (2003). Overview of portal monitoring at border crossings. 2003 IEEE Nuclear Science Symposium Conference Record 1 513–517.
  • [29] Geman, D. and Jedynak, B. (1996). An active testing model for tracking roads in satellite images. IEEE Trans. Pattern Anal. Mach. Intell. 18 1–14.
  • [30] Glaz, J., Naus, J. and Wallenstein, S. (2001). Scan Statistics. Springer, New York.
  • [31] Gravner, J. and Griffeath, D. (2006). Random growth models with polygonal shapes. Ann. Probab. 34 181–218.
  • [32] Hall, P. and Jin, J. (2008). Properties of higher criticism under strong dependence. Ann. Statist. 36 381–402.
  • [33] Hall, P. and Jin, J. (2010). Innovated higher criticism for detecting sparse signals in correlated noise. Ann. Statist. 38 1686–1732.
  • [34] Heffernan, R., Mostashari, F., Das, D., Karpati, A., Kulldorff, M. and Weiss, D. (2004). Syndromic surveillance in public health practice, New York City. Emerging Infectious Diseases 10 858–864.
  • [35] Hills, R. (2001). Sensing for danger. Sci. Technol. Rev. Available at https://www.llnl.gov/str/JulAug01/Hills.html.
  • [36] Husby, O. and Rue, H. (2004). Estimating blood vessel areas in ultrasound images using a deformable template model. Stat. Model. 4 211–226.
  • [37] Ilachinski, A. (2001). Cellular Automata: A Discrete Universe. World Scientific, River Edge, NJ.
  • [38] Jain, A., Zhong, Y. and Dubuisson-Jolly, M. (1998). Deformable template models: A review. Signal Processing 71 109–129.
  • [39] James, D., Clymer, B. D. and Schmalbrock, P. (2001). Texture detection of simulated microcalcification susceptibility effects in magnetic resonance imaging of breasts. Journal of Magnetic Resonance Imaging 13 876–881.
  • [40] Jiang, T. (2002). Maxima of partial sums indexed by geometrical structures. Ann. Probab. 30 1854–1892.
  • [41] Klarner, D. (1967). Cell growth problems. Canad. J. Math. 19 851–863.
  • [42] Kolmogorov, A. N. and Tihomirov, V. M. (1961). ɛ-entropy and ɛ-capacity of sets in functional space. Amer. Math. Soc. Transl. (2) 17 277–364.
  • [43] Kulldorff, M. (1997). A spatial scan statistic. Comm. Statist. Theory Methods 26 1481–1496.
  • [44] Kulldorff, M. (2001). Prospective time periodic geographical disease surveillance using a scan statistic. J. Roy. Statist. Soc. Ser. A 164 61–72.
  • [45] Kulldorff, M., Fang, Z. and Walsh, S. J. (2003). A tree-based scan statistic for database disease surveillance. Biometrics 59 323–331.
  • [46] Kulldorff, M., Heffernan, R., Hartman, J., Assuncao, R. and Mostashari, F. (2005). A space–time permutation scan statistic for disease outbreak detection. PLOS Medicine 2 216.
  • [47] Kulldorff, M., Huang, L., Pickle, L. and Duczmal, L. (2006). An elliptic spatial scan statistic. Stat. Med. 25 3929–43.
  • [48] Lawler, G., Bramson, M. and Griffeath, D. (1992). Internal diffusion limited aggregation. Ann. Probab. 20 2117–2140.
  • [49] Lexa, M., Rozell, C., Sinanovic, S. and Johnson, D. (2004). To cooperate or not to cooperate: Detection strategies in sensor networks. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing 2004 (ICASSP’04) 3 841–844.
  • [50] Li, D., Wong, K., Hu, Y. H. and Sayeed, A. (2002). Detection, classification and tracking of targets. IEEE Signal Processing Magazine 19 17–29.
  • [51] McInerney, T. and Terzopoulos, D. (1996). Deformable models in medical image analysis: A survey. Medical Image Analysis 1 91–108.
  • [52] Mei, Y. (2008). Asymptotic optimality theory for decentralized sequential hypothesis testing in sensor networks. IEEE Trans. Inform. Theory 54 2072–2089.
  • [53] Moon, N., Bullitt, E., van Leemput, K. and Gerig, G. (2002). Automatic brain and tumor segmentation. In MICCAI’02: Proceedings of the 5th International Conference on Medical Image Computing and Computer-Assisted Intervention—Part I 372–379. Springer, London.
  • [54] Patil, G. P., Balbus, J., Biging, G., Jaja, J., Myers, W. L. and Taillie, C. (2004). Multiscale advanced raster map analysis system: Definition, design and development. Environ. Ecol. Stat. 11 113–138.
  • [55] Patwari, N. and Hero, A. (2003). Hierarchical censoring for distributed detection in wireless sensor networks. In Proceedings of 2003 IEEE International Conference on Acoustics, Speech and Signal Processing 2003 (ICASSP’03) 4 848–851.
  • [56] Penrose, M. (2003). Random Geometric Graphs. Oxford Studies in Probability 5. Oxford Univ. Press, Oxford.
  • [57] Perone Pacifico, M., Genovese, C., Verdinelli, I. and Wasserman, L. (2004). False discovery control for random fields. J. Amer. Statist. Assoc. 99 1002–1014.
  • [58] Pozo, D., Olmo, F. and Alados-Arboledas, L. (1997). Fire detection and growth monitoring using a multitemporal technique on AVHRR mid-infrared and thermal channels. Remote Sensing of Environment 60 111–120.
  • [59] Richardson, D. (1973). Random growth in a tessellation. Proc. Cambridge Philos. Soc. 74 515–528.
  • [60] Rotz, L. and Hughes, J. (2004). Advances in detecting and responding to threats from bioterrorism and emerging infectious disease. Nature Medicine S130–S136.
  • [61] Schiff, J. L. (2008). Cellular Automata. Wiley, Hoboken, NJ.
  • [62] Şendur, L., Maxim, V., Whitcher, B. and Bullmore, E. (2005). Multiple hypothesis mapping of functional MRI data in orthogonal and complex wavelet domains. IEEE Trans. Signal Process. 53 3413–3426.
  • [63] Shen, X., Huang, H.-C. and Cressie, N. (2002). Nonparametric hypothesis testing for a spatial signal. J. Amer. Statist. Assoc. 97 1122–1140.
  • [64] Siegmund, D. (1985). Sequential Analysis: Tests and Confidence Intervals. Springer, New York.
  • [65] Strickland, R. and Hahn, H. Wavelet transform methods for object detection and recovery. IEEE Trans. Image Process. 6 724–735.
  • [66] Szor, P. (2005). The Art of Computer Virus Research and Defense. Addison-Wesley Professional.
  • [67] Talagrand, M. (2005). The Generic Chaining. Springer, Berlin.
  • [68] Tan, H. and Zhang, Y. (2006). An energy minimization process for extracting eye feature based on deformable template. Lecture Notes in Computer Science 3852 663. Springer, Berlin.
  • [69] Thomopoulos, S., Viswanathan, R. and Bougoulias, D. (1989). Optimal distributed decision fusion. IEEE Transactions on Aerospace and Electronic Systems 25 761–765.
  • [70] Wagner, M., Tsui, F., Espino, J., Dato, V., Sittig, D., Caruana, R., Mcginnis, L., Deerfield, D., Druzdzel, M. and Fridsma, D. (2001). The emerging science of very early detection of disease outbreaks. Journal of Public Health Management and Practice 7 51–59.
  • [71] Walther, G. (2010). Optimal and fast detection of spatial clusters with scan statistics. Ann. Statist. 38 1010–1033.
  • [72] Xu, C. and Prince, J. (1998). Snakes, shapes and gradient vector flow. IEEE Trans. Image Process. 7 359–369.
  • [73] Yu, L., Yuan, L., Qu, G. and Ephremides, A. (2006). Energy-driven detection scheme with guaranteed accuracy. Processing of the Fifth International Conference on Information in Sensor Networks 2006 (IPSN’2006) 284–291.
  • [74] Zhao, F. and Guibas, L. (2004). Wireless Sensor Networks: An Information Processing Approach. Morgan Kaufmann, San Francisco.
  • [75] Zhong, Y., Jain, A. and Dubuisson-Jolly, M.-P. (2000). Object tracking using deformable templates. IEEE Trans. Pattern Anal. Mach. Intell. 2 544–549.

Supplemental materials

  • Supplementary material: Technical Arguments. In the supplementary file [6], we prove the results stated here. It is divided into three sections. In the first section, we state and prove general lower bounds on the minimax rate and upper bounds on the detection rate achieved by an ɛ-scan statistic. We do this for the normal location model first and extend these results to a general one-parameter exponential family. In the second section, we gather a number of results on volumes and node counts. In the third and last section, we prove the main results.