The Annals of Statistics

Consistency of spectral clustering in stochastic block models

Jing Lei and Alessandro Rinaldo


Abstract

We analyze the performance of spectral clustering for community extraction in stochastic block models. We show that, under mild conditions, spectral clustering applied to the adjacency matrix of the network can consistently recover hidden communities even when the order of the maximum expected degree is as small as $\log n$, with $n$ the number of nodes. This result applies to some popular polynomial-time spectral clustering algorithms and is further extended to degree-corrected stochastic block models using a spherical $k$-median spectral clustering method. A key component of our analysis is a combinatorial bound on the spectrum of binary random matrices, which is sharper than the conventional matrix Bernstein inequality and may be of independent interest.
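As a rough illustration of the adjacency-based procedure described in the abstract, the Python sketch below simulates a small stochastic block model and embeds each node using the eigenvectors of the adjacency matrix for the $K$ eigenvalues of largest absolute value. It then clusters the rows of that embedding. This is a minimal sketch that assumes NumPy. The helper names (`sample_sbm`, `kmeans`, `adjacency_spectral_clustering`, `misclustering_rate`), the model size and the edge probabilities are hypothetical choices made for illustration and are not taken from the paper. The consistency guarantees concern (approximate) solutions of the $k$-means problem. Here, plain Lloyd iterations with k-means++ seeding and several restarts stand in as a heuristic substitute for such a solver. The spherical $k$-median variant for degree-corrected models is not shown.

```python
from itertools import permutations

import numpy as np


def sample_sbm(labels, B, rng):
    """Sample a symmetric 0/1 adjacency matrix (no self-loops) from an SBM.

    labels: length-n integer array with entries in {0, ..., K-1}.
    B: K x K symmetric matrix of block edge probabilities.
    """
    P = B[labels][:, labels]                       # n x n edge probabilities
    n = len(labels)
    upper = np.triu(rng.random((n, n)) < P, k=1)   # independent edges for i < j
    return (upper | upper.T).astype(float)         # symmetrize; diagonal stays 0


def kmeans(X, K, rng, n_init=10, n_iter=100):
    """Heuristic k-means (assumption: stands in for an approximate solver).

    Lloyd iterations with k-means++ seeding; returns the best of n_init runs.
    """
    best_assign, best_cost = None, np.inf
    for _run in range(n_init):
        # k-means++ seeding: new centers drawn proportionally to squared distance.
        centers = [X[rng.integers(len(X))]]
        for _k in range(1, K):
            d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
            centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
        centers = np.array(centers)
        for _it in range(n_iter):
            assign = ((X[:, None, :] - centers[None]) ** 2).sum(axis=2).argmin(axis=1)
            new = np.array([X[assign == k].mean(axis=0) if np.any(assign == k)
                            else centers[k] for k in range(K)])
            if np.allclose(new, centers):
                break
            centers = new
        # Final assignment and within-cluster sum of squares for this run.
        dist = ((X[:, None, :] - centers[None]) ** 2).sum(axis=2)
        assign, cost = dist.argmin(axis=1), dist.min(axis=1).sum()
        if cost < best_cost:
            best_assign, best_cost = assign, cost
    return best_assign


def adjacency_spectral_clustering(A, K, rng):
    """Cluster the rows of the n x K matrix of leading eigenvectors of A."""
    vals, vecs = np.linalg.eigh(A)                 # A is symmetric
    top = np.argsort(np.abs(vals))[::-1][:K]       # K largest |eigenvalues|
    return kmeans(vecs[:, top], K, rng)


def misclustering_rate(true, est, K):
    """Fraction of misassigned nodes, minimized over relabelings (small K only)."""
    return min(np.mean(np.array(perm)[est] != true) for perm in permutations(range(K)))


# Illustrative simulation: 3 balanced blocks, within-block p = 0.3, between q = 0.05.
rng = np.random.default_rng(0)
n, K = 600, 3
labels = np.repeat(np.arange(K), n // K)
B = np.full((K, K), 0.05) + 0.25 * np.eye(K)
A = sample_sbm(labels, B, rng)
est = adjacency_spectral_clustering(A, K, rng)
print("misclustering rate:", misclustering_rate(labels, est, K))
```

The simulated graph is deliberately dense, with expected degrees around 80, so that recovery is easy to see. The result stated in the abstract concerns the much sparser regime, where the maximum expected degree can be of order $\log n$.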

Article information

Source
Ann. Statist., Volume 43, Number 1 (2015), 215-237.

Dates
First available in Project Euclid: 9 December 2014

Permanent link to this document
https://projecteuclid.org/euclid.aos/1418135620

Digital Object Identifier
doi:10.1214/14-AOS1274

Mathematical Reviews number (MathSciNet)
MR3285605

Zentralblatt MATH identifier
1308.62041

Subjects
Primary: 62F12: Asymptotic properties of estimators

Keywords
Network data; stochastic block model; spectral clustering; sparsity

Citation

Lei, Jing; Rinaldo, Alessandro. Consistency of spectral clustering in stochastic block models. Ann. Statist. 43 (2015), no. 1, 215–237. doi:10.1214/14-AOS1274. https://projecteuclid.org/euclid.aos/1418135620
