The Annals of Statistics

Minimax rates of community detection in stochastic block models

Abstract

Network analysis has recently attracted growing attention in statistics, as well as in computer science, probability and applied mathematics. Community detection for the stochastic block model (SBM) is probably the most studied topic in network analysis. Many methodologies have been proposed, and some beautiful and significant phase transition results have been obtained in various settings. In this paper, we provide a general minimax theory for community detection. It gives minimax rates of the mis-match ratio for a wide range of settings, including homogeneous and inhomogeneous SBMs, dense and sparse networks, and finite and growing numbers of communities. The minimax rates are exponential, in contrast to the polynomial rates often seen in the statistical literature. An immediate consequence of the result is a threshold phenomenon for strong consistency (exact recovery) as well as weak consistency (partial recovery). We obtain the upper bound by a range of penalized likelihood-type approaches. The lower bound is achieved by a novel reduction from a global mis-match ratio to a local clustering problem for one node through an exchangeability property.
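To make the loss function in the abstract concrete, the following is a minimal Python sketch, not the authors' code. It assumes the simplest homogeneous setting, with a single within-community edge probability `p` and a single between-community probability `q`, and it reads the mis-match ratio as the fraction of misclassified nodes after the best relabeling of the communities. The function names `sample_sbm` and `mismatch_ratio` are hypothetical, and the brute-force search over label permutations is only practical for a small number of communities.

```python
import itertools
import random


def sample_sbm(labels, p, q, rng):
    """Sample a symmetric 0/1 adjacency matrix from a homogeneous SBM.

    Assumption for illustration: nodes in the same community connect with
    probability p, nodes in different communities with probability q.
    """
    n = len(labels)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            prob = p if labels[i] == labels[j] else q
            if rng.random() < prob:
                adj[i][j] = adj[j][i] = 1
    return adj


def mismatch_ratio(truth, estimate, k):
    """Fraction of misclassified nodes, minimized over relabelings of k communities."""
    n = len(truth)
    best = n
    # Community labels are only identifiable up to permutation,
    # so try every relabeling of the estimate and keep the best one.
    for perm in itertools.permutations(range(k)):
        errors = sum(1 for t, e in zip(truth, estimate) if t != perm[e])
        best = min(best, errors)
    return best / n


if __name__ == "__main__":
    truth = [0, 0, 0, 1, 1, 1]
    adj = sample_sbm(truth, p=0.8, q=0.1, rng=random.Random(0))
    # An estimate with swapped labels and one genuine error.
    estimate = [1, 1, 1, 0, 0, 1]
    print(mismatch_ratio(truth, estimate, k=2))
```

In this reading, strong consistency (exact recovery) corresponds to the mis-match ratio being exactly zero with high probability, and weak consistency (partial recovery) to it tending to zero.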

Article information

Source
Ann. Statist. Volume 44, Number 5 (2016), 2252-2280.

Dates
Revised: December 2015
First available in Project Euclid: 12 September 2016

Permanent link to this document
https://projecteuclid.org/euclid.aos/1473685275

Digital Object Identifier
doi:10.1214/15-AOS1428

Mathematical Reviews number (MathSciNet)
MR3546450

Zentralblatt MATH identifier
06654468

Subjects
Primary: 60G05: Foundations of stochastic processes

Citation

Zhang, Anderson Y.; Zhou, Harrison H. Minimax rates of community detection in stochastic block models. Ann. Statist. 44 (2016), no. 5, 2252--2280. doi:10.1214/15-AOS1428. https://projecteuclid.org/euclid.aos/1473685275.

References

• [1] Abbe, E. and Sandon, C. (2015). Community detection in general stochastic block models: Fundamental limits and efficient recovery algorithms. Preprint. Available at arXiv:1503.00609.
• [2] Albert, R., Jeong, H. and Barabási, A.-L. (1999). Internet: Diameter of the world-wide web. Nature 401 130–131.
• [3] Amini, A. A., Chen, A., Bickel, P. J. and Levina, E. (2013). Pseudo-likelihood methods for community detection in large sparse networks. Ann. Statist. 41 2097–2122.
• [4] Barabási, A. and Oltvai, Z. N. (2004). Network biology: Understanding the cell’s functional organization. Nat. Rev. Genet. 5 101–113.
• [5] Bickel, P., Choi, D., Chang, X. and Zhang, H. (2013). Asymptotic normality of maximum likelihood and its variational approximation for stochastic blockmodels. Ann. Statist. 41 1922–1943.
• [6] Bickel, P. J. and Chen, A. (2009). A nonparametric view of network models and Newman–Girvan and other modularities. Proc. Natl. Acad. Sci. USA 106 21068–21073.
• [7] Cai, T. T. and Li, X. (2015). Robust and computationally feasible community detection in the presence of arbitrary outlier nodes. Ann. Statist. 43 1027–1059.
• [8] Chen, Y. and Xu, J. (2014). Statistical-computational tradeoffs in planted problems and submatrix localization with a growing number of clusters and submatrices. Preprint. Available at arXiv:1402.1267.
• [9] Chin, P., Rao, A. and Vu, V. (2015). Stochastic block model and community detection in the sparse graphs: A spectral algorithm with optimal rate of recovery. Preprint. Available at arXiv:1501.05021.
• [10] Easley, D. and Kleinberg, J. (2010). Networks, Crowds, and Markets: Reasoning About a Highly Connected World. Cambridge Univ. Press, Cambridge.
• [11] Gao, C., Ma, Z., Zhang, A. Y. and Zhou, H. H. (2015). Achieving optimal misclassification proportion in stochastic block model. Preprint. Available at arXiv:1505.03772.
• [12] Gao, C., Ma, Z., Zhang, A. Y. and Zhou, H. H. (2016). Optimal community detection in degree-corrected block model. Manuscript.
• [13] Girvan, M. and Newman, M. E. J. (2002). Community structure in social and biological networks. Proc. Natl. Acad. Sci. USA 99 7821–7826 (electronic).
• [14] Hagen, L. and Kahng, A. B. (1992). New spectral methods for ratio cut partitioning and clustering. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 11 1074–1085.
• [15] Hajek, B., Wu, Y. and Xu, J. (2015). Achieving exact cluster recovery threshold via semidefinite programming: Extensions. Preprint. Available at arXiv:1502.07738.
• [16] Holland, P. W., Laskey, K. B. and Leinhardt, S. (1983). Stochastic blockmodels: First steps. Social Networks 5 109–137.
• [17] Lei, J. and Rinaldo, A. (2015). Consistency of spectral clustering in stochastic block models. Ann. Statist. 43 215–237.
• [18] Lovász, L. (2012). Large Networks and Graph Limits. American Mathematical Society Colloquium Publications 60. Amer. Math. Soc., Providence, RI.
• [19] Massoulié, L. (2014). Community detection thresholds and the weak Ramanujan property. In STOC’14—Proceedings of the 2014 ACM Symposium on Theory of Computing 694–703. ACM, New York.
• [20] McSherry, F. (2001). Spectral partitioning of random graphs. In 42nd IEEE Symposium on Foundations of Computer Science (Las Vegas, NV, 2001) 529–537. IEEE Computer Soc., Los Alamitos, CA.
• [21] Mossel, E., Neeman, J. and Sly, A. (2012). Stochastic block models and reconstruction. Preprint. Available at arXiv:1202.1499.
• [22] Mossel, E., Neeman, J. and Sly, A. (2013). A proof of the block model threshold conjecture. Preprint. Available at arXiv:1311.4115.
• [23] Mossel, E., Neeman, J. and Sly, A. (2014). Consistency thresholds for binary symmetric block models. Preprint. Available at arXiv:1407.1591.
• [24] Newman, M. E. J. (2003). The structure and function of complex networks. SIAM Rev. 45 167–256 (electronic).
• [25] Newman, M. E. J. (2010). Networks: An Introduction. Oxford Univ. Press, Oxford.
• [26] Rohe, K., Chatterjee, S. and Yu, B. (2011). Spectral clustering and the high-dimensional stochastic blockmodel. Ann. Statist. 39 1878–1915.
• [27] Shi, J. and Malik, J. (2000). Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence 22 888–905.
• [28] van der Vaart, A. W. (1998). Asymptotic Statistics. Cambridge Series in Statistical and Probabilistic Mathematics 3. Cambridge Univ. Press, Cambridge.
• [29] Van Mieghem, P. (2006). Performance Analysis of Communications Networks and Systems. Cambridge Univ. Press, Cambridge.
• [30] Wasserman, S. (1994). Social Network Analysis: Methods and Applications 8. Cambridge Univ. Press, Cambridge.
• [31] Zhang, A. Y. and Zhou, H. H. (2016). Supplement to “Minimax rates of community detection in stochastic block models.” DOI:10.1214/15-AOS1428SUPP.
• [32] Zhao, Y., Levina, E. and Zhu, J. (2012). Consistency of community detection in networks under degree-corrected stochastic block models. Ann. Statist. 40 2266–2292.

Supplemental materials

• Supplement to “Minimax rates of community detection in stochastic block models”. In the supplement [31], we provide proofs of Lemma 5.2 and Propositions 5.1 and 5.2. We also provide proofs for Theorems 2.1 and 3.1, which extend the minimax results of Theorems 2.2 and 3.2 to a larger parameter space $\Theta$. In addition, we state and prove the asymptotic equivalence of $I$.