## The Annals of Statistics

### Convexified modularity maximization for degree-corrected stochastic block models

#### Abstract

The stochastic block model (SBM), a popular framework for studying community detection in networks, is limited by the assumption that all nodes in the same community are statistically equivalent and have equal expected degrees. The degree-corrected stochastic block model (DCSBM) is a natural extension of SBM that allows for degree heterogeneity within communities. To find the communities under DCSBM, this paper proposes a convexified modularity maximization approach, which is based on a convex programming relaxation of the classical (generalized) modularity maximization formulation, followed by a novel doubly-weighted $\ell_{1}$-norm $k$-medoids procedure. We establish nonasymptotic theoretical guarantees for approximate and perfect clustering, both of which build on a new degree-corrected density gap condition. Our approximate clustering results are insensitive to the minimum degree, and hold even in the sparse regime with bounded average degrees. In the special case of SBM, our theoretical guarantees match the best-known results for computationally feasible algorithms. Numerically, we provide an efficient implementation of our algorithm, which is applied to both synthetic and real-world networks. Experimental results show that our method performs competitively with the state of the art in the literature.
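For readers unfamiliar with the objective being relaxed, the following is a minimal sketch of the classical generalized modularity score for a fixed partition, not the paper's convex relaxation or its $k$-medoids step. It assumes the common form $\sum_{i,j}(A_{ij}-\lambda d_i d_j)\,\mathbf{1}\{c_i=c_j\}$ with a tuning parameter $\lambda$; the function name `generalized_modularity`, the toy graph, and the choice $\lambda = 1/(2m)$ are illustrative assumptions rather than details taken from the article.

```python
def generalized_modularity(adj, labels, lam):
    """Sum of (A_ij - lam * d_i * d_j) over ordered pairs (i, j), including
    i == j, whose nodes share a community label. Hypothetical helper."""
    n = len(adj)
    deg = [sum(row) for row in adj]  # node degrees d_i
    total = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                total += adj[i][j] - lam * deg[i] * deg[j]
    return total


# Toy graph (an assumption for illustration): two triangles joined by one edge.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
n = 6
adj = [[0] * n for _ in range(n)]
for u, v in edges:
    adj[u][v] = 1
    adj[v][u] = 1

m = len(edges)         # number of edges
lam = 1.0 / (2 * m)    # Newman's modularity corresponds to this choice of lam

planted = [0, 0, 0, 1, 1, 1]   # one community per triangle
trivial = [0, 0, 0, 0, 0, 0]   # everything in a single community

score_planted = generalized_modularity(adj, planted, lam)
score_trivial = generalized_modularity(adj, trivial, lam)
print(score_planted, score_trivial)
```

On this toy graph the two-triangle partition scores higher than the single-community partition. Maximizing the score over all partitions is a combinatorial problem, which is the motivation for replacing the partition constraint with a convex one.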

#### Article information

Source
Ann. Statist., Volume 46, Number 4 (2018), 1573–1602.

Dates
Revised: April 2017
First available in Project Euclid: 27 June 2018

https://projecteuclid.org/euclid.aos/1530086426

Digital Object Identifier
doi:10.1214/17-AOS1595

Mathematical Reviews number (MathSciNet)
MR3819110

Zentralblatt MATH identifier
06936471

#### Citation

Chen, Yudong; Li, Xiaodong; Xu, Jiaming. Convexified modularity maximization for degree-corrected stochastic block models. Ann. Statist. 46 (2018), no. 4, 1573–1602. doi:10.1214/17-AOS1595. https://projecteuclid.org/euclid.aos/1530086426

#### References

• [1] Abbe, E., Bandeira, A. S. and Hall, G. (2016). Exact recovery in the stochastic block model. IEEE Trans. Inform. Theory 62 471–487.
• [2] Adamic, L. A. and Glance, N. (2005). The political blogosphere and the 2004 US election: Divided they blog. In Proceedings of the 3rd International Workshop on Link Discovery 36–43. ACM, New York.
• [3] Agarwal, N., Bandeira, A. S., Koiliaris, K. and Kolla, A. (2015). Multisection in the stochastic block model using semidefinite programming. Preprint. Available at arXiv:1507.02323.
• [4] Ames, B. P. W. and Vavasis, S. A. (2014). Convex optimization for the planted $k$-disjoint-clique problem. Math. Program. 143 299–337.
• [5] Amini, A. A., Chen, A., Bickel, P. J. and Levina, E. (2013). Pseudo-likelihood methods for community detection in large sparse networks. Ann. Statist. 41 2097–2122.
• [6] Amini, A. A. and Levina, E. (2014). On semidefinite relaxations for the block model. Preprint. Available at arXiv:1406.5647.
• [7] Bandeira, A. S. (2015). Random Laplacian matrices and convex relaxations. Preprint. Available at arXiv:1504.03987.
• [8] Bordenave, C., Lelarge, M. and Massoulié, L. (2015). Non-backtracking spectrum of random graphs: Community detection and non-regular Ramanujan graphs. Preprint. Available at arXiv:1501.06087.
• [9] Cai, T. T. and Li, X. (2015). Robust and computationally feasible community detection in the presence of arbitrary outlier nodes. Ann. Statist. 43 1027–1059.
• [10] Charikar, M., Guha, S., Tardos, É. and Shmoys, D. B. (1999). A constant-factor approximation algorithm for the $k$-median problem (extended abstract). In Annual ACM Symposium on Theory of Computing (Atlanta, GA, 1999) 1–10. ACM, New York.
• [11] Chaudhuri, K., Chung, F. and Tsiatas, A. (2012). Spectral clustering of graphs with general degrees in the extended planted partition model. In Proceedings of the 25th Annual Conference on Learning Theory (COLT) 35.1–35.23.
• [12] Chen, Y., Li, X. and Xu, J. (2018). Supplement to “Convexified modularity maximization for degree-corrected stochastic block models.” DOI:10.1214/17-AOS1595SUPP.
• [13] Chen, Y., Sanghavi, S. and Xu, H. (2012). Clustering sparse graphs. Adv. Neural Inf. Process. Syst. 2213–2221.
• [14] Chen, Y. and Xu, J. (2014). Statistical-computational phase transitions in planted models: The high-dimensional setting. In Proceedings of the 31st International Conference on Machine Learning 244–252.
• [15] Coja-Oghlan, A. and Lanka, A. (2009). Finding planted partitions in random graphs with general degree distributions. SIAM J. Discrete Math. 23 1682–1714.
• [16] Condon, A. and Karp, R. M. (2001). Algorithms for graph partitioning on the planted partition model. Random Structures Algorithms 18 116–140.
• [17] Dasgupta, A., Hopcroft, J. and McSherry, F. (2004). Spectral analysis of random graphs with skewed degree distributions. In The 45th IEEE FOCS 602–610.
• [18] Fortunato, S. (2010). Community detection in graphs. Phys. Rep. 486 75–174.
• [19] Fortunato, S. and Barthelemy, M. (2007). Resolution limit in community detection. Proc. Natl. Acad. Sci. USA 104 36–41.
• [20] Gao, C., Ma, Z., Zhang, A. Y. and Zhou, H. H. (2015). Achieving optimal misclassification proportion in stochastic block model. Preprint. Available at arXiv:1505.03772.
• [21] Grothendieck, A. (1996). Résumé de la théorie métrique des produits tensoriels topologiques. Resen. Inst. Mat. Estat. Univ. Sao Paulo 2 401–480.
• [22] Guédon, O. and Vershynin, R. (2016). Community detection in sparse networks via Grothendieck’s inequality. Probab. Theory Related Fields 165 1025–1049.
• [23] Gulikers, L., Lelarge, M. and Massoulié, L. (2017). A spectral method for community detection in moderately sparse degree-corrected stochastic block models. Adv. in Appl. Probab. 49 686–721.
• [24] Hajek, B., Wu, Y. and Xu, J. (2016). Achieving exact cluster recovery threshold via semidefinite programming: Extensions. IEEE Trans. Inform. Theory 62 5918–5937.
• [25] Hajek, B., Wu, Y. and Xu, J. (2016). Achieving exact cluster recovery threshold via semidefinite programming. IEEE Trans. Inform. Theory 62 2788–2797.
• [26] Hajek, B., Wu, Y. and Xu, J. (2016). Semidefinite programs for exact recovery of a hidden community. In Proceedings of Conference on Learning Theory (COLT). Available at arXiv:1602.06410.
• [27] Holland, P. W., Laskey, K. B. and Leinhardt, S. (1983). Stochastic blockmodels: First steps. Soc. Netw. 5 109–137.
• [28] Jin, J. (2015). Fast community detection by SCORE. Ann. Statist. 43 57–89.
• [29] Karrer, B. and Newman, M. E. J. (2011). Stochastic blockmodels and community structure in networks. Phys. Rev. E (3) 83 016107, 10.
• [30] Krzakala, F., Moore, C., Mossel, E., Neeman, J., Sly, A., Zdeborová, L. and Zhang, P. (2013). Spectral redemption in clustering sparse networks. Proc. Natl. Acad. Sci. USA 110 20935–20940.
• [31] Lancichinetti, A. and Fortunato, S. (2011). Limits of modularity maximization in community detection. Phys. Rev. E 84.
• [32] Le, C. M., Levina, E. and Vershynin, R. (2016). Optimization via low-rank approximation for community detection in networks. Ann. Statist. 44 373–400.
• [33] Le, C. M. and Vershynin, R. (2015). Concentration and regularization of random graphs. Preprint. Available at arXiv:1506.00669.
• [34] Lei, J. and Rinaldo, A. (2015). Consistency of spectral clustering in stochastic block models. Ann. Statist. 43 215–237.
• [35] Lindenstrauss, J. and Pełczyński, A. (1968). Absolutely summing operators in $L_{p}$-spaces and their applications. Studia Math. 29 275–326.
• [36] McSherry, F. (2001). Spectral partitioning of random graphs. In 42nd IEEE Symposium on Foundations of Computer Science (Las Vegas, NV, 2001) 529–537. IEEE Computer Soc., Los Alamitos, CA.
• [37] Montanari, A. and Sen, S. (2015). Semidefinite programs on sparse random graphs. Preprint. Available at arXiv:1504.05910.
• [38] Newman, M. E. J. (2006). Modularity and community structure in networks. Proc. Natl. Acad. Sci. USA 103 8577–8582.
• [39] Oymak, S. and Hassibi, B. (2011). Finding dense clusters via low rank $+$ sparse decomposition. Preprint. Available at arXiv:1104.5186.
• [40] Perry, W. and Wein, A. S. (2015). A semidefinite program for unbalanced multisection in the stochastic block model. Preprint. Available at arXiv:1507.05605.
• [41] Qin, T. and Rohe, K. (2013). Regularized spectral clustering under the degree-corrected stochastic blockmodel. In Advances in Neural Information Processing Systems 3120–3128.
• [42] Reichardt, J. and Bornholdt, S. (2006). Statistical mechanics of community detection. Phys. Rev. E (3) 74 016110, 14.
• [43] Rohe, K., Chatterjee, S. and Yu, B. (2011). Spectral clustering and the high-dimensional stochastic blockmodel. Ann. Statist. 39 1878–1915.
• [44] Traud, A. L., Kelsic, E. D., Mucha, P. J. and Porter, M. A. (2011). Comparing community structure to characteristics in online collegiate social networks. SIAM Rev. 53 526–543.
• [45] Traud, A. L., Mucha, P. J. and Porter, M. A. (2012). Social structure of Facebook networks. Phys. A 391 4165–4180.
• [46] Zhang, A. Y. and Zhou, H. H. (2015). Minimax rates of community detection in stochastic block models. Preprint. Available at arXiv:1507.05313.
• [47] Zhang, Y., Levina, E. and Zhu, J. (2014). Detecting overlapping communities in networks using spectral methods. Preprint. Available at arXiv:1412.3432.
• [48] Zhao, Y., Levina, E. and Zhu, J. (2012). Consistency of community detection in networks under degree-corrected stochastic block models. Ann. Statist. 40 2266–2292.

#### Supplemental materials

• Additional experiments and remaining proofs. In this supplement [12], we provide additional numerical results and the remaining proofs of the theoretical results.