The Annals of Applied Probability

Community detection in sparse random networks

Nicolas Verzelen and Ery Arias-Castro

Full-text: Open access

Abstract

We consider the problem of detecting a tight community in a sparse random network. This is formalized as testing for the existence of a dense random subgraph in a random graph. Under the null hypothesis, the graph is a realization of an Erdős–Rényi graph on $N$ vertices and with connection probability $p_{0}$; under the alternative, there is an unknown subgraph on $n$ vertices where the connection probability is $p_{1}>p_{0}$. In Arias-Castro and Verzelen [Ann. Statist. 42 (2014) 940–969], we focused on the asymptotically dense regime where $p_{0}$ is large enough that $np_{0}>(n/N)^{o(1)}$. We consider here the asymptotically sparse regime where $p_{0}$ is small enough that $np_{0}<(n/N)^{c_{0}}$ for some $c_{0}>0$. As before, we derive information-theoretic lower bounds, and also establish the performance of various tests. Compared to our previous work [Ann. Statist. 42 (2014) 940–969], the arguments for the lower bounds are based on the same technology, but are substantially more technical in the details; also, the methods we study are different: besides a variant of the scan statistic, we study other test statistics such as the size of the largest connected component, the number of triangles, and the number of subtrees of a given size. Our detection bounds are sharp, except in the Poisson regime where we were not able to fully characterize the constant arising in the bound.
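To make the testing setup concrete, the following is a minimal Python sketch, not the authors' implementation. It samples a graph under either hypothesis and computes two of the statistics named in the abstract: the size of the largest connected component and the number of triangles. The function names (`sample_graph`, `largest_component_size`, `triangle_count`) and the choice of a uniformly random community are illustrative assumptions; no rejection thresholds are calibrated here.

```python
import random
from itertools import combinations


def sample_graph(N, p0, n=0, p1=0.0, seed=None):
    """Sample a graph on N vertices as a list of adjacency sets.

    With n == 0 this is the null model: Erdos-Renyi with edge probability p0.
    With n > 0, pairs inside a randomly chosen n-subset (an assumption of this
    sketch) are connected with probability p1 instead of p0.
    """
    rng = random.Random(seed)
    community = set(rng.sample(range(N), n))
    adj = [set() for _ in range(N)]
    for i, j in combinations(range(N), 2):
        p = p1 if (i in community and j in community) else p0
        if rng.random() < p:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def largest_component_size(adj):
    """Size of the largest connected component, via iterative depth-first search."""
    seen = set()
    best = 0
    for start in range(len(adj)):
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        size = 0
        while stack:
            v = stack.pop()
            size += 1
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        best = max(best, size)
    return best


def triangle_count(adj):
    """Number of triangles, counting each vertex triple i < j < k once."""
    count = 0
    for i in range(len(adj)):
        for j in adj[i]:
            if j > i:
                count += sum(1 for k in adj[i] & adj[j] if k > j)
    return count
```

One way to use the sketch is to draw many graphs with `n=0` to approximate the null distribution of each statistic, then compare against draws with a planted community; how well each statistic separates the two depends on the regime of $p_{0}$, $p_{1}$, $n$ and $N$.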

Article information

Source
Ann. Appl. Probab., Volume 25, Number 6 (2015), 3465-3510.

Dates
Received: August 2013
Revised: September 2014
First available in Project Euclid: 1 October 2015

Permanent link to this document
https://projecteuclid.org/euclid.aoap/1443703780

Digital Object Identifier
doi:10.1214/14-AAP1080

Mathematical Reviews number (MathSciNet)
MR3404642

Zentralblatt MATH identifier
1326.05145

Subjects
Primary: 05C80: Random graphs [See also 60B20]; 62C20: Minimax procedures

Keywords
Community detection; detecting a dense subgraph; minimax hypothesis testing; Erdős–Rényi random graph; planted subgraph problem; scan statistic; largest connected component

Citation

Verzelen, Nicolas; Arias-Castro, Ery. Community detection in sparse random networks. Ann. Appl. Probab. 25 (2015), no. 6, 3465–3510. doi:10.1214/14-AAP1080. https://projecteuclid.org/euclid.aoap/1443703780


References

  • Alon, N., Krivelevich, M. and Sudakov, B. (1998). Finding a large hidden clique in a random graph. In Proceedings of the Eighth International Conference “Random Structures and Algorithms” (Poznan, 1997) 13 457–466.
  • Arias-Castro, E. and Verzelen, N. (2014). Community detection in dense random networks. Ann. Statist. 42 940–969.
  • Berthet, Q. and Rigollet, P. (2013). Optimal detection of sparse principal components in high dimension. Ann. Statist. 41 1780–1815.
  • Bickel, P. J. and Chen, A. (2009). A nonparametric view of network models and Newman–Girvan and other modularities. Proc. Natl. Acad. Sci. USA 106 21068–21073.
  • Boucheron, S., Bousquet, O., Lugosi, G. and Massart, P. (2005). Moment inequalities for functions of independent random variables. Ann. Probab. 33 514–560.
  • Butucea, C. and Ingster, Y. I. (2013). Detection of a sparse submatrix of a high-dimensional noisy matrix. Bernoulli 19 2652–2688.
  • Dekel, Y., Gurel-Gurevich, O. and Peres, Y. (2011). Finding hidden cliques in linear time with high probability. In ANALCO11—Workshop on Analytic Algorithmics and Combinatorics 67–75. SIAM, Philadelphia, PA.
  • Feige, U. and Ron, D. (2010). Finding hidden cliques in linear time. In 21st International Meeting on Probabilistic, Combinatorial, and Asymptotic Methods in the Analysis of Algorithms (AofA’10) 189–203. Assoc. Discrete Math. Theor. Comput. Sci., Nancy.
  • Fortunato, S. (2010). Community detection in graphs. Phys. Rep. 486 75–174.
  • Girvan, M. and Newman, M. E. J. (2002). Community structure in social and biological networks. Proc. Natl. Acad. Sci. USA 99 7821–7826 (electronic).
  • Heard, N. A., Weston, D. J., Platanioti, K. and Hand, D. J. (2010). Bayesian anomaly detection methods for social networks. Ann. Appl. Stat. 4 645–662.
  • Lancichinetti, A. and Fortunato, S. (2009). Community detection algorithms: A comparative analysis. Phys. Rev. E 80 056117.
  • Lehmann, E. L. and Romano, J. P. (2005). Testing Statistical Hypotheses, 3rd ed. Springer, New York.
  • Maslov, S., Sneppen, K. and Zaliznyak, A. (2004). Detection of topological patterns in complex networks: Correlation profile of the internet. Physica A: Statistical Mechanics and Its Applications 333 529–540.
  • Mongiovì, M., Bogdanov, P., Ranca, R., Papalexakis, E. E., Faloutsos, C. and Singh, A. K. (2013). NetSpot: Spotting significant anomalous regions on dynamic networks. In SIAM International Conference on Data Mining, Austin, TX 28–36. SIAM, Philadelphia, PA.
  • Mossel, E., Neeman, J. and Sly, A. (2012). Stochastic block models and reconstruction. Available at arXiv:1202.1499.
  • Newman, M. E. J. (2006). Modularity and community structure in networks. Proc. Natl. Acad. Sci. USA 103 8577–8582.
  • Newman, M. E. J. and Girvan, M. (2004). Finding and evaluating community structure in networks. Phys. Rev. E 69 026113.
  • Park, Y., Priebe, C. E. and Youssef, A. (2013). Anomaly detection in time series of graphs using fusion of graph invariants. IEEE J. Sel. Top. Signal Process. 7 67–75.
  • Pittel, B. and Wormald, N. C. (2005). Counting connected graphs inside-out. J. Combin. Theory Ser. B 93 127–172.
  • Reichardt, J. and Bornholdt, S. (2006). Statistical mechanics of community detection. Phys. Rev. E (3) 74 016110, 14.
  • Robinson, R. W. and Wormald, N. C. (1992). Almost all cubic graphs are Hamiltonian. Random Structures Algorithms 3 117–125.
  • Robinson, R. W. and Wormald, N. C. (1994). Almost all regular graphs are Hamiltonian. Random Structures Algorithms 5 363–374.
  • Rukhin, A. and Priebe, C. E. (2012). On the limiting distribution of a graph scan statistic. Comm. Statist. Theory Methods 41 1151–1170.
  • Sun, X. and Nobel, A. B. (2008). On the size and recovery of submatrices of ones in a random binary matrix. J. Mach. Learn. Res. 9 2431–2453.
  • Takács, L. (1988). On the limit distribution of the number of cycles in a random graph. J. Appl. Probab. 25A 359–376.
  • Van der Hofstad, R. (2012). Random Graphs and Complex Networks. Available at http://www.win.tue.nl/~rhofstad/NotesRGCN.pdf.
  • Verzelen, N. and Arias-Castro, E. (2013). Community detection in sparse random networks. Available at arXiv:1308.2955.
  • Wang, B., Phillips, J. M., Schreiber, R., Wilkinson, D. M., Mishra, N. and Tarjan, R. (2008). Spatial scan statistics for graph clustering. In SIAM International Conference on Data Mining, Atlanta, GA 727–738. SIAM, Philadelphia, PA.
  • Wormald, N. C. (1999). Models of random regular graphs. In Surveys in Combinatorics, 1999 (Canterbury). London Mathematical Society Lecture Note Series 267 239–298. Cambridge Univ. Press, Cambridge.