The Annals of Statistics
Consistency of spectral clustering in stochastic block models
Jing Lei and Alessandro Rinaldo
Full-text: Open access
Abstract
We analyze the performance of spectral clustering for community extraction in stochastic block models. We show that, under mild conditions, spectral clustering applied to the adjacency matrix of the network can consistently recover hidden communities even when the order of the maximum expected degree is as small as $\log n$, with $n$ the number of nodes. This result applies to some popular polynomial time spectral clustering algorithms and is further extended to degree corrected stochastic block models using a spherical $k$-median spectral clustering method. A key component of our analysis is a combinatorial bound on the spectrum of binary random matrices, which is sharper than the conventional matrix Bernstein inequality and may be of independent interest.
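As an illustration of the procedure the abstract describes, the following is a minimal sketch, not the authors' implementation. It samples a two-block stochastic block model, takes the $K$ leading eigenvectors (by absolute eigenvalue) of the adjacency matrix, and clusters their rows. The function names, the farthest-first initialization, the use of plain Lloyd iterations as a stand-in for an approximate $k$-means solver, and the parameter values (a dense setting, well above the $\log n$ degree regime) are all assumptions made for this example.

```python
import numpy as np


def sample_sbm(labels, B, rng):
    """Draw a symmetric 0/1 adjacency matrix with P(A_ij = 1) = B[g_i, g_j]."""
    P = B[np.ix_(labels, labels)]
    # Independent edges strictly above the diagonal, then mirror (no self-loops).
    upper = np.triu(rng.random(P.shape) < P, k=1)
    return (upper | upper.T).astype(float)


def spectral_clustering(A, K, n_iter=50):
    """Cluster the rows of the K leading eigenvectors of A with Lloyd's k-means."""
    vals, vecs = np.linalg.eigh(A)
    lead = np.argsort(-np.abs(vals))[:K]  # K largest eigenvalues in absolute value
    U = vecs[:, lead]

    # Farthest-first initialization (an assumption of this sketch).
    centers = [U[0]]
    for _ in range(1, K):
        d = np.min([np.sum((U - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(d)])
    centers = np.array(centers)

    # Plain Lloyd iterations: assign each row to its nearest center, then update.
    for _ in range(n_iter):
        dist = ((U[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = dist.argmin(axis=1)
        for k in range(K):
            if np.any(assign == k):
                centers[k] = U[assign == k].mean(axis=0)
    return assign


rng = np.random.default_rng(0)
truth = np.repeat([0, 1], 150)               # two communities of 150 nodes each
B = np.array([[0.6, 0.05], [0.05, 0.6]])     # illustrative edge probabilities
A = sample_sbm(truth, B, rng)
est = spectral_clustering(A, K=2)

# Misclassification rate up to relabeling of the two communities.
err = min(np.mean(est != truth), np.mean(est != 1 - truth))
print(f"misclassification rate: {err:.3f}")
```

With communities this well separated, the estimated labels should agree with the true ones for nearly every node. The sparse regime analyzed in the paper is harder, and this sketch makes no claim about it.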
Article information
Source
Ann. Statist., Volume 43, Number 1 (2015), 215-237.
Dates
First available in Project Euclid: 9 December 2014
Permanent link to this document
https://projecteuclid.org/euclid.aos/1418135620
Digital Object Identifier
doi:10.1214/14-AOS1274
Mathematical Reviews number (MathSciNet)
MR3285605
Zentralblatt MATH identifier
1308.62041
Subjects
Primary: 62F12: Asymptotic properties of estimators
Keywords
Network data; stochastic block model; spectral clustering; sparsity
Citation
Lei, Jing; Rinaldo, Alessandro. Consistency of spectral clustering in stochastic block models. Ann. Statist. 43 (2015), no. 1, 215--237. doi:10.1214/14-AOS1274. https://projecteuclid.org/euclid.aos/1418135620
Supplemental materials
- Supplementary material: Supplement to “Consistency of spectral clustering in sparse stochastic block models”. The supplementary file contains a proof of Theorem 5.2. Digital Object Identifier: doi:10.1214/14-AOS1274SUPP
