## Bernoulli

• Volume 19, Number 2 (2013), 676-719.

### Cluster detection in networks using percolation

#### Abstract

We consider the task of detecting a salient cluster in a sensor network, that is, an undirected graph with a random variable attached to each node. Motivated by recent research in environmental statistics and the drive to compete with the reigning scan statistic, we explore alternatives based on the percolative properties of the network. The first method is based on the size of the largest connected component after removing the nodes in the network with a value below a given threshold. The second method is the upper level set scan test introduced by Patil and Taillie [Statist. Sci. 18 (2003) 457–465]. We establish the performance of these methods in an asymptotic decision-theoretic framework in which the network size increases. These tests have two advantages over the more conventional scan statistic: they do not require previous information about cluster shape, and they are computationally more feasible. We make abundant use of percolation theory to derive our theoretical results, and complement our theory with some numerical experiments.
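The two statistics described in the abstract can be illustrated with a short Python sketch. This is not the authors' code. It assumes, for illustration only, that the network is an n × n square lattice with 4-nearest-neighbour edges, that the upper level set at threshold t keeps the nodes with value at least t, and that the upper level set (ULS) scan scores each surviving component K by the standardized sum Σ_{v∈K} X_v / √|K|, a common Gaussian scan score. The helper names `grid_neighbors`, `open_components`, `largest_open_cluster` and `uls_scan`, the 5 × 5 anomaly and its mean shift of 1.5 are all hypothetical.

```python
from collections import deque
import math
import random


def grid_neighbors(i, j, n):
    """Yield the 4-nearest neighbours of (i, j) inside an n x n square lattice (assumed network)."""
    for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        a, b = i + di, j + dj
        if 0 <= a < n and 0 <= b < n:
            yield a, b


def open_components(x, t):
    """Connected components of the upper level set {v : x[v] >= t}.

    Nodes with a value below t are removed; the remaining nodes are grouped by BFS.
    """
    n = len(x)
    seen = [[False] * n for _ in range(n)]
    comps = []
    for i in range(n):
        for j in range(n):
            if x[i][j] < t or seen[i][j]:
                continue
            seen[i][j] = True
            queue, comp = deque([(i, j)]), []
            while queue:
                a, b = queue.popleft()
                comp.append((a, b))
                for c, d in grid_neighbors(a, b, n):
                    if x[c][d] >= t and not seen[c][d]:
                        seen[c][d] = True
                        queue.append((c, d))
            comps.append(comp)
    return comps


def largest_open_cluster(x, t):
    """First test: size of the largest component that survives thresholding at t."""
    return max((len(c) for c in open_components(x, t)), default=0)


def uls_scan(x, thresholds=None):
    """Second test (sketch): scan only over components of upper level sets.

    Each component K is scored by sum(x over K) / sqrt(|K|), an assumed
    Gaussian-style scan score. By default every observed value is used as a
    threshold, so each distinct upper level set is visited once.
    """
    if thresholds is None:
        thresholds = sorted({v for row in x for v in row})
    best = -math.inf
    for t in thresholds:
        for comp in open_components(x, t):
            total = sum(x[a][b] for a, b in comp)
            best = max(best, total / math.sqrt(len(comp)))
    return best


if __name__ == "__main__":
    random.seed(0)
    n = 20
    # Null model (assumed): i.i.d. standard normal values on every node.
    x = [[random.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n)]
    # Hypothetical anomaly: raise the mean by 1.5 on a 5 x 5 block.
    for i in range(8, 13):
        for j in range(8, 13):
            x[i][j] += 1.5
    print("largest open cluster at t = 1:", largest_open_cluster(x, 1.0))
    print("ULS scan statistic:", uls_scan(x))
```

In this sketch each call to `open_components` is a breadth-first search that visits every node and edge a bounded number of times, so neither statistic requires enumerating a family of candidate cluster shapes; the ULS version simply repeats that search once per threshold. Rejection thresholds for either statistic would still have to be calibrated under the null model, for example by Monte Carlo simulation.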

#### Article information

Source
Bernoulli, Volume 19, Number 2 (2013), 676-719.

Dates
First available in Project Euclid: 13 March 2013

Permanent link: https://projecteuclid.org/euclid.bj/1363192043

Digital Object Identifier
doi:10.3150/11-BEJ412

Mathematical Reviews number (MathSciNet)
MR3037169

Zentralblatt MATH identifier
06168768

#### Citation

Arias-Castro, Ery; Grimmett, Geoffrey R. Cluster detection in networks using percolation. Bernoulli 19 (2013), no. 2, 676–719. doi:10.3150/11-BEJ412. https://projecteuclid.org/euclid.bj/1363192043

#### References

• [1] Arias-Castro, E., Candès, E.J. and Durand, A. (2011). Detection of an anomalous cluster in a network. Ann. Statist. 39 278–304.
• [2] Arias-Castro, E., Candès, E.J., Helgason, H. and Zeitouni, O. (2008). Searching for a trail of evidence in a maze. Ann. Statist. 36 1726–1757.
• [3] Arias-Castro, E., Donoho, D.L. and Huo, X. (2005). Near-optimal detection of geometric objects by fast multiscale methods. IEEE Trans. Inform. Theory 51 2402–2425.
• [4] Balakrishnan, N. and Koutras, M.V. (2002). Runs and Scans with Applications. Wiley Series in Probability and Statistics. New York: Wiley-Interscience.
• [5] Bodineau, T., Ioffe, D. and Velenik, Y. (2001). Winterbottom construction for finite range ferromagnetic models: An $\mathbb{L}_{1}$-approach. J. Stat. Phys. 105 93–131.
• [6] Borgs, C., Chayes, J.T., Kesten, H. and Spencer, J. (2001). The birth of the infinite cluster: Finite-size scaling in percolation. Comm. Math. Phys. 224 153–204. Dedicated to Joel L. Lebowitz.
• [7] Brown, L.D. (1986). Fundamentals of Statistical Exponential Families with Applications in Statistical Decision Theory. Institute of Mathematical Statistics Lecture Notes—Monograph Series 9. Hayward, CA: IMS.
• [8] Cerf, R. (2006). The Wulff Crystal in Ising and Percolation Models. Lecture Notes in Math. 1878. Berlin: Springer. Lectures from the 34th Summer School on Probability Theory held in Saint-Flour, July 6–24, 2004, with a foreword by Jean Picard.
• [9] Chen, J. and Huo, X. (2006). Distribution of the length of the longest significance run on a Bernoulli net and its applications. J. Amer. Statist. Assoc. 101 321–331.
• [10] Cormen, T.H., Leiserson, C.E., Rivest, R.L. and Stein, C. (2009). Introduction to Algorithms, 3rd ed. Cambridge, MA: MIT Press.
• [11] Csardi, G. The igraph library. Available at http://igraph.sourceforge.net.
• [12] Culler, D., Estrin, D. and Srivastava, M. (2004). Overview of sensor networks. IEEE Computer 37 41–49.
• [13] DasGupta, B., Hespanha, J.P., Riehl, J. and Sontag, E. (2006). Honey-pot constrained searching with local sensory information. Nonlinear Anal. 65 1773–1793.
• [14] Davies, P.L., Langovoy, M. and Wittich, O. (2010). Detection of objects in noisy images based on percolation theory. Unpublished manuscript.
• [15] Dembo, A. and Zeitouni, O. (2010). Large Deviations Techniques and Applications. Stochastic Modelling and Applied Probability 38. Berlin: Springer. Corrected reprint of the second (1998) edition.
• [16] Duczmal, L. and Assunção, R. (2004). A simulated annealing strategy for the detection of arbitrarily shaped spatial clusters. Comput. Statist. Data Anal. 45 269–286.
• [17] Erdős, P. and Rényi, A. (1970). On a new law of large numbers. J. Analyse Math. 23 103–111.
• [18] Falconer, K.J. and Grimmett, G.R. (1992). On the geometry of random Cantor sets and fractal percolation. J. Theoret. Probab. 5 465–485.
• [19] Feng, X., Deng, Y. and Blöte, H.W.J. (2008). Percolation transitions in two dimensions. Phys. Rev. E 78 031136.
• [20] Geman, D. and Jedynak, B. (1996). An active testing model for tracking roads in satellite images. IEEE Trans. Pattern Anal. Mach. Intell. 18 1–14.
• [21] Grimmett, G. (1999). Percolation, 2nd ed. Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences] 321. Berlin: Springer.
• [22] Grimmett, G.R. (1985). The largest components in a random lattice. Studia Sci. Math. Hungar. 20 325–331.
• [23] Hara, T., van der Hofstad, R. and Slade, G. (2003). Critical two-point functions and the lace expansion for spread-out high-dimensional percolation and related models. Ann. Probab. 31 349–408.
• [24] Heffernan, R., Mostashari, F., Das, D., Karpati, A., Kulldorff, M. and Weiss, D. (2004). Syndromic surveillance in public health practice, New York City. Emerging Infect. Dis. 10 858–864.
• [25] Hills, R. (2001). Sensing for danger. Science & Technology Review 11–17.
• [26] Hobolth, A., Pedersen, J. and Jensen, E.B.V. (2002). A deformable template model, with special reference to elliptical templates. J. Math. Imaging Vision 17 131–137. Special issue on statistics of shapes and textures.
• [27] Jain, A.K., Zhong, Y. and Dubuisson-Jolly, M.P. (1998). Deformable template models: A review. Signal Processing 71 109–129.
• [28] Kesten, H. and Zhang, Y. (1990). The probability of a large finite cluster in supercritical Bernoulli percolation. Ann. Probab. 18 537–555.
• [29] Kulldorff, M. (1997). A spatial scan statistic. Comm. Statist. Theory Methods 26 1481–1496.
• [30] Kulldorff, M. (2001). Prospective time periodic geographical disease surveillance using a scan statistic. J. Roy. Statist. Soc. Ser. A 164 61–72.
• [31] Kulldorff, M., Fang, Z. and Walsh, S.J. (2003). A tree-based scan statistic for database disease surveillance. Biometrics 59 323–331.
• [32] Kulldorff, M., Huang, L., Pickle, L. and Duczmal, L. (2006). An elliptic spatial scan statistic. Stat. Med. 25 3929–3943.
• [33] Kulldorff, M. and Nagarwalla, N. (1995). Spatial disease clusters: Detection and inference. Stat. Med. 14 799–810.
• [34] Langovoy, M. and Wittich, O. (2011). Multiple testing, uncertainty and realistic pictures. Technical report, EURANDOM.
• [35] Li, D., Wong, K.D., Hu, Y.H. and Sayeed, A.M. (2002). Detection, classification, and tracking of targets. IEEE Signal Processing Magazine 19 17–29.
• [36] McInerney, T. and Terzopoulos, D. (1996). Deformable models in medical image analysis: A survey. Med. Image Anal. 1 91–108.
• [37] Patil, G.P., Balbus, J., Biging, G., Jaja, J., Myers, W.L. and Taillie, C. (2004). Multiscale advanced raster map analysis system: Definition, design and development. Environ. Ecol. Stat. 11 113–138.
• [38] Patil, G.P., Joshi, S.W. and Koli, R.E. (2010). PULSE, progressive upper level set scan statistic for geospatial hotspot detection. Environ. Ecol. Stat. 17 149–182.
• [39] Patil, G.P., Modarres, R., Myers, W.L. and Patankar, P. (2006). Spatially constrained clustering and upper level set scan hotspot detection in surveillance geoinformatics. Environ. Ecol. Stat. 13 365–377.
• [40] Patil, G.P., Modarres, R. and Patankar, P. (2005). The ULS software, version 1.0. Center for Statistical Ecology and Environmental Statistics, Dept. Statistics, Pennsylvania State Univ.
• [41] Patil, G.P. and Taillie, C. (2003). Geographic and network surveillance via scan statistics for critical area detection. Statist. Sci. 18 457–465.
• [42] Patil, G.P. and Taillie, C. (2004). Upper level set scan statistic for detecting arbitrarily shaped hotspots. Environ. Ecol. Stat. 11 183–197.
• [43] Penrose, M.D. (2001). A central limit theorem with applications to percolation, epidemics and Boolean models. Ann. Probab. 29 1515–1546.
• [44] Penrose, M.D. and Pisztora, A. (1996). Large deviations for discrete and continuous percolation. Adv. in Appl. Probab. 28 29–52.
• [45] Perone Pacifico, M., Genovese, C., Verdinelli, I. and Wasserman, L. (2004). False discovery control for random fields. J. Amer. Statist. Assoc. 99 1002–1014.
• [46] Pisztora, A. (1996). Surface order large deviations for Ising, Potts and percolation models. Probab. Theory Related Fields 104 427–466.
• [47] Pozo, D., Olmo, F.J. and Alados-Arboledas, L. (1997). Fire detection and growth monitoring using a multitemporal technique on AVHRR mid-infrared and thermal channels. Remote Sensing of Environment 60 111–120.
• [48] R Core Team. The R project for statistical computing. Available at http://www.r-project.org.
• [49] Rotz, L.D. and Hughes, J.M. (2004). Advances in detecting and responding to threats from bioterrorism and emerging infectious disease. Nat. Med. 10 S130–S136.
• [50] Smirnov, S. and Werner, W. (2001). Critical exponents for two-dimensional percolation. Math. Res. Lett. 8 729–744.
• [51] Tango, T. and Takahashi, K. (2005). A flexibly shaped spatial scan statistic for detecting clusters. Int. J. Health Geogr. 4 11.
• [52] van der Hofstad, R. and Redig, F. (2006). Maximal clusters in non-critical percolation and related models. J. Stat. Phys. 122 671–703.
• [53] Wagner, M.M., Tsui, F.C., Espino, J.U., Dato, V.M., Sittig, D.F., Caruana, R.A., McGinnis, L.F., Deerfield, D.W., Druzdzel, M.J. and Fridsma, D.B. (2001). The emerging science of very early detection of disease outbreaks. J. Public Health Manag. Pract. 7 51–59.
• [54] Walther, G. (2010). Optimal and fast detection of spatial clusters with scan statistics. Ann. Statist. 38 1010–1033.