The Annals of Statistics

Detection of an anomalous cluster in a network

Ery Arias-Castro, Emmanuel J. Candès, and Arnaud Durand
Source: Ann. Statist. Volume 39, Number 1 (2011), 278-304.

Abstract

We consider the problem of detecting whether or not, in a given sensor network, there is a cluster of sensors which exhibit an “unusual behavior.” Formally, suppose we are given a set of nodes and attach a random variable to each node. We observe a realization of this process and want to decide between the following two hypotheses: under the null, the variables are i.i.d. standard normal; under the alternative, there is a cluster of variables that are i.i.d. normal with positive mean and unit variance, while the rest are i.i.d. standard normal. We also address surveillance settings where each sensor in the network collects information over time. The resulting model is similar, now with a time series attached to each node. We again observe the process over time and want to decide between the null, where all the variables are i.i.d. standard normal, and the alternative, where there is an emerging cluster of i.i.d. normal variables with positive mean and unit variance. The growth models used to represent the emerging cluster are quite general and, in particular, include cellular automata used in modeling epidemics. In both settings, we consider classes of clusters that are quite general, for which we obtain a lower bound on their respective minimax detection rate and show that some form of scan statistic, by far the most popular method in practice, achieves that same rate to within a logarithmic factor. Our results are not limited to the normal location model, but generalize to any one-parameter exponential family when the anomalous clusters are large enough.

First Page: Show Hide

Related Works:

Primary Subjects: 62C20, 62G10
Secondary Subjects: 82B20
Full-text: Access denied (no subscription detected)
We're sorry, but we are unable to provide you with the full text of this article because we are not able to identify you as a subscriber.
If you have a personal subscription to this journal, then please login. If you are already logged in, then you may need to update your profile to register your subscription. Read more about accessing full-text
Links and Identifiers

Permanent link to this document: http://projecteuclid.org/euclid.aos/1291388376
Digital Object Identifier: doi:10.1214/10-AOS839
Zentralblatt MATH identifier: 05874496
Mathematical Reviews number (MathSciNet): MR2797847

References

[1] Addario-Berry, L., Broutin, N., Devroye, L. and Lugosi, G. (2010). On combinatorial testing problems. Ann. Statist. 38 3063–3092.
Mathematical Reviews (MathSciNet): MR2722464
Zentralblatt MATH: 1200.62059
Digital Object Identifier: doi:10.1214/10-AOS817
Project Euclid: euclid.aos/1283175989
[2] Agur, S., Diekmann, O., Heesterbeek, H., Cushing, J., Gyllenberg, M., Kimmel, M., Milner, F., Jagers, P. and Kostova, T., eds. (1999). Epidemiology, Cellular Automata and Evolution. Elsevier, Oxford.
Mathematical Reviews (MathSciNet): MR1683061
[3] Akyildiz, I., Su, W., Sankarasubramaniam, Y. and Cayirci, E. (2002). A survey on sensor networks. IEEE Communications Magazine 40 102–114.
[4] Aldosari, S. and Moura, J. (2004). Detection in decentralized sensor networks. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing 2004 (ICASSP’04) 2 277–280.
[5] Arias-Castro, E., Candès, E. J. and Durand, A. (2009). Detection of an abnormal cluster in a network. In Proc. 57th Session of the International Statistical Institute, Durban, South Africa.
[6] Arias-Castro, E., Candès, E. J. and Durand, A. (2010). Supplement to “Detection of an anomalous cluster in a network.” DOI: 10.1214/10-AOS839SUPP.
[7] Arias-Castro, E., Candès, E. J., Helgason, H. and Zeitouni, O. (2008). Searching for a trail of evidence in a maze. Ann. Statist. 36 1726–1757.
Mathematical Reviews (MathSciNet): MR2435454
Zentralblatt MATH: 1143.62006
Digital Object Identifier: doi:10.1214/07-AOS526
Project Euclid: euclid.aos/1216237298
[8] Arias-Castro, E., Donoho, D. and Huo, X. (2005). Near-optimal detection of geometric objects by fast multiscale methods. IEEE Trans. Inform. Theory 51 2402–2425.
Mathematical Reviews (MathSciNet): MR2246369
Digital Object Identifier: doi:10.1109/TIT.2005.850056
[9] Arias-Castro, E., Donoho, D. and Huo, X. (2006). Adaptive multiscale detection of filamentary structures in a background of uniform random points. Ann. Statist. 34 326–349.
Mathematical Reviews (MathSciNet): MR2275244
Zentralblatt MATH: 1091.62095
Digital Object Identifier: doi:10.1214/009053605000000787
Project Euclid: euclid.aos/1146576265
[10] Arias-Castro, E., Efros, B. and Levi, O. (2010). Networks of polynomial pieces with application to the analysis of point clouds and images. J. Approx. Theory 162 94–130.
Mathematical Reviews (MathSciNet): MR2565828
Zentralblatt MATH: 1190.42015
Digital Object Identifier: doi:10.1016/j.jat.2009.03.007
[11] Arora, A., Dutta, P., Bapat, S., Kulathumani, V., Zhang, H., Naik, V., Mittal, V., Cao, H., Demirbas, M., Gouda, M., Choi, Y., Herman, T., Kulkarni, S., Arumugam, U., Nesterenko, M., Vora, A. and Miyashita, M. (2004). A line in the sand: A wireless sensor network for target detection, classification and tracking. Comput. Networks 46 605–634.
[12] Bohman, T. and Gravner, J. (1999). Random threshold growth dynamics. Random Structures Algorithms 15 93–111.
Mathematical Reviews (MathSciNet): MR1698409
[13] Boutsikas, M. V. and Koutras, M. V. (2006). On the asymptotic distribution of the discrete scan statistic. J. Appl. Probab. 43 1137–1154.
Mathematical Reviews (MathSciNet): MR2274642
Zentralblatt MATH: 1130.62017
Digital Object Identifier: doi:10.1239/jap/1165505213
Project Euclid: euclid.jap/1165505213
[14] Braams, J., Pruim, J., Freling, N., Nikkels, P., Roodenburg, J., Boering, G., Vaalburg, W. and Vermey, A. (1995). Detection of lymph node metastases of squamous-cell cancer of the head and neck with FDG-PET and MRI. Journal of Nuclear Medicine 36 211.
[15] Brennan, S. M., Mielke, A. M., Torney, D. C. and Maccabe, A. B. (2004). Radiation detection with distributed sensor networks. IEEE Computer 37 57–59.
[16] Brodsky, B. and Darkhovsky, B. (1993). Nonparametric Methods in Change-Point Problems. Mathematics and Its Applications 243. Kluwer Academic, Dordrecht.
Mathematical Reviews (MathSciNet): MR1228205
[17] Candès, E. J., Charlton, P. R. and Helgason, H. (2008). Detecting highly oscillatory signals by chirplet path pursuit. Appl. Comput. Harmon. Anal. 24 14–40.
Mathematical Reviews (MathSciNet): MR2379113
Digital Object Identifier: doi:10.1016/j.acha.2007.04.003
[18] Caron, Y., Makris, P. and Vincent, N. (2002). A method for detecting artificial objects in natural environments. In Proceedings 16th International Conference on Pattern Recognition 1 10600. IEEE Comput. Soc., New York.
[19] Cui, Y., Wei, Q., Park, H. and Lieber, C. (2001). Nanowire nanosensors for highly sensitive and selective detection of biological and chemical species. Science 293 1289–1292.
[20] Culler, D., Estrin, D. and Srivastava, M. (2004). Overview of sensor networks. Computer 37 41–49.
[21] DasGupta, B., Hespanha, J. P., Riehl, J. and Sontag, E. (2006). Honey-pot constrained searching with local sensory information. Nonlinear Anal. 65 1773–1793.
Mathematical Reviews (MathSciNet): MR2252129
[22] Dembo, A., Gandolfi, A. and Kesten, H. (2001). Greedy lattice animals: Negative values and unconstrained maxima. Ann. Probab. 29 205–241.
Mathematical Reviews (MathSciNet): MR1825148
Zentralblatt MATH: 1016.60048
Digital Object Identifier: doi:10.1214/aop/1008956328
Project Euclid: euclid.aop/1008956328
[23] Demirbaş, K. (1987). Maneuvering target tracking with hypothesis testing. IEEE Trans. Aerospace Electron. Systems 23 757–766.
Mathematical Reviews (MathSciNet): MR926001
Digital Object Identifier: doi:10.1109/TAES.1987.310912
[24] Desolneux, A., Moisan, L. and Morel, J.-M. (2003). Maximal meaningful events and applications to image analysis. Ann. Statist. 31 1822–1851.
Mathematical Reviews (MathSciNet): MR2036391
Zentralblatt MATH: 1046.62104
Digital Object Identifier: doi:10.1214/aos/1074290328
Project Euclid: euclid.aos/1074290328
[25] Duczmal, L., Kulldorff, M. and Huang, L. (2006). Evaluation of spatial scan statistics for irregularly shaped clusters. J. Comput. Graph. Statist. 15 428–442.
Mathematical Reviews (MathSciNet): MR2256152
Digital Object Identifier: doi:10.1198/106186006X112396
[26] Dudley, R. M. (1967). The sizes of compact subsets of Hilbert space and continuity of Gaussian processes. J. Funct. Anal. 1 290–330.
Mathematical Reviews (MathSciNet): MR220340
Digital Object Identifier: doi:10.1016/0022-1236(67)90017-1
[27] Fitch, J., Raber, E. and Imbro, D. (2003). Technology challenges in responding to biological or chemical attacks in the civilian sector. Science 302 1350–1354.
[28] Geelhood, B., Ely, J., Hansen, R., Kouzes, R., Schweppe, J. and Warner, R. (2003). Overview of portal monitoring at border crossings. 2003 IEEE Nuclear Science Symposium Conference Record 1 513–517.
[29] Geman, D. and Jedynak, B. (1996). An active testing model for tracking roads in satellite images. IEEE Trans. Pattern Anal. Mach. Intell. 18 1–14.
[30] Glaz, J., Naus, J. and Wallenstein, S. (2001). Scan Statistics. Springer, New York.
Mathematical Reviews (MathSciNet): MR1869112
[31] Gravner, J. and Griffeath, D. (2006). Random growth models with polygonal shapes. Ann. Probab. 34 181–218.
Mathematical Reviews (MathSciNet): MR2206346
Zentralblatt MATH: 1090.60077
Digital Object Identifier: doi:10.1214/009117905000000512
Project Euclid: euclid.aop/1140191536
[32] Hall, P. and Jin, J. (2008). Properties of higher criticism under strong dependence. Ann. Statist. 36 381–402.
Mathematical Reviews (MathSciNet): MR2387976
Zentralblatt MATH: 1139.62049
Digital Object Identifier: doi:10.1214/009053607000000767
Project Euclid: euclid.aos/1201877306
[33] Hall, P. and Jin, J. (2010). Innovated higher criticism for detecting sparse signals in correlated noise. Ann. Statist. 38 1686–1732.
Mathematical Reviews (MathSciNet): MR2662357
Zentralblatt MATH: 1189.62080
Digital Object Identifier: doi:10.1214/09-AOS764
Project Euclid: euclid.aos/1269452652
[34] Heffernan, R., Mostashari, F., Das, D., Karpati, A., Kulldorff, M. and Weiss, D. (2004). Syndromic surveillance in public health practice, New York City. Emerging Infectious Diseases 10 858–864.
[35] Hills, R. (2001). Sensing for danger. Sci. Technol. Rev. Available at https://www.llnl.gov/str/JulAug01/Hills.html.
[36] Husby, O. and Rue, H. (2004). Estimating blood vessel areas in ultrasound images using a deformable template model. Stat. Model. 4 211–226.
Mathematical Reviews (MathSciNet): MR2062101
Digital Object Identifier: doi:10.1191/1471082X04st074oa
[37] Ilachinski, A. (2001). Cellular Automata: A Discrete Universe. World Scientific, River Edge, NJ.
Mathematical Reviews (MathSciNet): MR1849342
[38] Jain, A., Zhong, Y. and Dubuisson-Jolly, M. (1998). Deformable template models: A review. Signal Processing 71 109–129.
[39] James, D., Clymer, B. D. and Schmalbrock, P. (2001). Texture detection of simulated microcalcification susceptibility effects in magnetic resonance imaging of breasts. Journal of Magnetic Resonance Imaging 13 876–881.
[40] Jiang, T. (2002). Maxima of partial sums indexed by geometrical structures. Ann. Probab. 30 1854–1892.
Mathematical Reviews (MathSciNet): MR1944008
Zentralblatt MATH: 1014.60024
Digital Object Identifier: doi:10.1214/aop/1039548374
Project Euclid: euclid.aop/1039548374
[41] Klarner, D. (1967). Cell growth problems. Canad. J. Math. 19 851–863.
Mathematical Reviews (MathSciNet): MR214489
Digital Object Identifier: doi:10.4153/CJM-1967-080-4
[42] Kolmogorov, A. N. and Tihomirov, V. M. (1961). ɛ-entropy and ɛ-capacity of sets in functional space. Amer. Math. Soc. Transl. (2) 17 277–364.
Mathematical Reviews (MathSciNet): MR124720
[43] Kulldorff, M. (1997). A spatial scan statistic. Comm. Statist. Theory Methods 26 1481–1496.
Mathematical Reviews (MathSciNet): MR1456844
Zentralblatt MATH: 0920.62116
Digital Object Identifier: doi:10.1080/03610929708831995
[44] Kulldorff, M. (2001). Prospective time periodic geographical disease surveillance using a scan statistic. J. Roy. Statist. Soc. Ser. A 164 61–72.
Mathematical Reviews (MathSciNet): MR1819022
Zentralblatt MATH: 1002.62517
Digital Object Identifier: doi:10.1111/1467-985X.00186
[45] Kulldorff, M., Fang, Z. and Walsh, S. J. (2003). A tree-based scan statistic for database disease surveillance. Biometrics 59 323–331.
Mathematical Reviews (MathSciNet): MR1987399
Digital Object Identifier: doi:10.1111/1541-0420.00039
[46] Kulldorff, M., Heffernan, R., Hartman, J., Assuncao, R. and Mostashari, F. (2005). A space–time permutation scan statistic for disease outbreak detection. PLOS Medicine 2 216.
[47] Kulldorff, M., Huang, L., Pickle, L. and Duczmal, L. (2006). An elliptic spatial scan statistic. Stat. Med. 25 3929–43.
Mathematical Reviews (MathSciNet): MR2297401
Digital Object Identifier: doi:10.1002/sim.2490
[48] Lawler, G., Bramson, M. and Griffeath, D. (1992). Internal diffusion limited aggregation. Ann. Probab. 20 2117–2140.
Mathematical Reviews (MathSciNet): MR1188055
Digital Object Identifier: doi:10.1214/aop/1176989542
Project Euclid: euclid.aop/1176989542
[49] Lexa, M., Rozell, C., Sinanovic, S. and Johnson, D. (2004). To cooperate or not to cooperate: Detection strategies in sensor networks. In Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing 2004 (ICASSP’04) 3 841–844.
[50] Li, D., Wong, K., Hu, Y. H. and Sayeed, A. (2002). Detection, classification and tracking of targets. IEEE Signal Processing Magazine 19 17–29.
[51] McInerney, T. and Terzopoulos, D. (1996). Deformable models in medical image analysis: A survey. Medical Image Analysis 1 91–108.
[52] Mei, Y. (2008). Asymptotic optimality theory for decentralized sequential hypothesis testing in sensor networks. IEEE Trans. Inform. Theory 54 2072–2089.
Mathematical Reviews (MathSciNet): MR2450850
Digital Object Identifier: doi:10.1109/TIT.2008.920217
[53] Moon, N., Bullitt, E., van Leemput, K. and Gerig, G. (2002). Automatic brain and tumor segmentation. In MICCAI’02: Proceedings of the 5th International Conference on Medical Image Computing and Computer-Assisted Intervention—Part I 372–379. Springer, London.
[54] Patil, G. P., Balbus, J., Biging, G., Jaja, J., Myers, W. L. and Taillie, C. (2004). Multiscale advanced raster map analysis system: Definition, design and development. Environ. Ecol. Stat. 11 113–138.
Mathematical Reviews (MathSciNet): MR2086391
Digital Object Identifier: doi:10.1023/B:EEST.0000027205.77490.8c
[55] Patwari, N. and Hero, A. (2003). Hierarchical censoring for distributed detection in wireless sensor networks. In Proceedings of 2003 IEEE International Conference on Acoustics, Speech and Signal Processing 2003 (ICASSP’03) 4 848–851.
[56] Penrose, M. (2003). Random Geometric Graphs. Oxford Studies in Probability 5. Oxford Univ. Press, Oxford.
Mathematical Reviews (MathSciNet): MR1986198
[57] Perone Pacifico, M., Genovese, C., Verdinelli, I. and Wasserman, L. (2004). False discovery control for random fields. J. Amer. Statist. Assoc. 99 1002–1014.
Mathematical Reviews (MathSciNet): MR2109490
Zentralblatt MATH: 1055.62105
Digital Object Identifier: doi:10.1198/0162145000001655
[58] Pozo, D., Olmo, F. and Alados-Arboledas, L. (1997). Fire detection and growth monitoring using a multitemporal technique on AVHRR mid-infrared and thermal channels. Remote Sensing of Environment 60 111–120.
[59] Richardson, D. (1973). Random growth in a tessellation. Proc. Cambridge Philos. Soc. 74 515–528.
Mathematical Reviews (MathSciNet): MR329079
Digital Object Identifier: doi:10.1017/S0305004100077288
[60] Rotz, L. and Hughes, J. (2004). Advances in detecting and responding to threats from bioterrorism and emerging infectious disease. Nature Medicine S130–S136.
[61] Schiff, J. L. (2008). Cellular Automata. Wiley, Hoboken, NJ.
Mathematical Reviews (MathSciNet): MR2367301
Zentralblatt MATH: 1142.68052
[62] Şendur, L., Maxim, V., Whitcher, B. and Bullmore, E. (2005). Multiple hypothesis mapping of functional MRI data in orthogonal and complex wavelet domains. IEEE Trans. Signal Process. 53 3413–3426.
Mathematical Reviews (MathSciNet): MR2213561
Digital Object Identifier: doi:10.1109/TSP.2005.853098
[63] Shen, X., Huang, H.-C. and Cressie, N. (2002). Nonparametric hypothesis testing for a spatial signal. J. Amer. Statist. Assoc. 97 1122–1140.
Mathematical Reviews (MathSciNet): MR1951265
Zentralblatt MATH: 1041.62038
Digital Object Identifier: doi:10.1198/016214502388618933
[64] Siegmund, D. (1985). Sequential Analysis: Tests and Confidence Intervals. Springer, New York.
Mathematical Reviews (MathSciNet): MR799155
[65] Strickland, R. and Hahn, H. Wavelet transform methods for object detection and recovery. IEEE Trans. Image Process. 6 724–735.
[66] Szor, P. (2005). The Art of Computer Virus Research and Defense. Addison-Wesley Professional.
[67] Talagrand, M. (2005). The Generic Chaining. Springer, Berlin.
Mathematical Reviews (MathSciNet): MR2133757
[68] Tan, H. and Zhang, Y. (2006). An energy minimization process for extracting eye feature based on deformable template. Lecture Notes in Computer Science 3852 663. Springer, Berlin.
[69] Thomopoulos, S., Viswanathan, R. and Bougoulias, D. (1989). Optimal distributed decision fusion. IEEE Transactions on Aerospace and Electronic Systems 25 761–765.
[70] Wagner, M., Tsui, F., Espino, J., Dato, V., Sittig, D., Caruana, R., Mcginnis, L., Deerfield, D., Druzdzel, M. and Fridsma, D. (2001). The emerging science of very early detection of disease outbreaks. Journal of Public Health Management and Practice 7 51–59.
[71] Walther, G. (2010). Optimal and fast detection of spatial clusters with scan statistics. Ann. Statist. 38 1010–1033.
Mathematical Reviews (MathSciNet): MR2604703
Zentralblatt MATH: 1183.62076
Digital Object Identifier: doi:10.1214/09-AOS732
Project Euclid: euclid.aos/1266586621
[72] Xu, C. and Prince, J. (1998). Snakes, shapes and gradient vector flow. IEEE Trans. Image Process. 7 359–369.
Mathematical Reviews (MathSciNet): MR1669528
Zentralblatt MATH: 0973.94003
Digital Object Identifier: doi:10.1109/83.661186
[73] Yu, L., Yuan, L., Qu, G. and Ephremides, A. (2006). Energy-driven detection scheme with guaranteed accuracy. Processing of the Fifth International Conference on Information in Sensor Networks 2006 (IPSN’2006) 284–291.
[74] Zhao, F. and Guibas, L. (2004). Wireless Sensor Networks: An Information Processing Approach. Morgan Kaufmann, San Francisco.
[75] Zhong, Y., Jain, A. and Dubuisson-Jolly, M.-P. (2000). Object tracking using deformable templates. IEEE Trans. Pattern Anal. Mach. Intell. 2 544–549.

2013 © Institute of Mathematical Statistics

The Annals of Statistics

The Annals of Statistics

Turn MathJax Off
What is MathJax?