Abstract
Identifying the number of communities is a fundamental problem in community detection and has received increasing attention recently. However, rapid advances in technology have led to the emergence of large-scale networks in various disciplines, making existing methods computationally infeasible. To address this challenge, we propose a novel subsampling-based modified Bayesian information criterion (SM-BIC) for identifying the number of communities in a network generated by either the stochastic block model or the degree-corrected stochastic block model. We first propose a node-pair subsampling method to extract an informative subnetwork from the entire network, and we then derive a purely data-driven criterion to identify the number of communities in that subnetwork. In this way, the SM-BIC identifies the number of communities from the subsampled network rather than the entire dataset, which gives it important computational advantages over existing methods. We theoretically investigate the computational complexity and identification consistency of the SM-BIC. Furthermore, the advantages of the SM-BIC are demonstrated by extensive numerical studies.
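To make the two-step idea concrete (subsample node pairs, then score candidate numbers of communities on the subnetwork), here is a minimal sketch under stated assumptions. It is not the SM-BIC from the paper. Every function name is hypothetical. Each unordered node pair is assumed to be kept independently with probability `rate`, and labels are assumed to come from spectral clustering of the zero-filled subsampled adjacency matrix. The score is a textbook BIC for a plain stochastic block model, -2 log L + K(K+1)/2 log(number of observed pairs), rather than the paper's modified penalty. Degree correction is not handled.

```python
import numpy as np


def subsample_pairs(n, rate, rng):
    """Hypothetical helper: keep each unordered pair (i < j) independently with prob. `rate`."""
    upper = np.triu(rng.random((n, n)) < rate, k=1)
    return upper | upper.T  # symmetric boolean mask, False on the diagonal


def spectral_labels(A_obs, K, rng, n_iter=50, n_init=5):
    """Cluster rows of the K leading eigenvectors (by |eigenvalue|) with Lloyd's k-means."""
    vals, vecs = np.linalg.eigh(A_obs)
    X = vecs[:, np.argsort(np.abs(vals))[::-1][:K]]
    best_z, best_cost = None, np.inf
    for _ in range(n_init):  # random restarts; keep the run with the lowest assignment cost
        centers = X[rng.choice(len(X), K, replace=False)].copy()
        for _ in range(n_iter):
            dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            z = dist.argmin(axis=1)
            for k in range(K):
                if np.any(z == k):
                    centers[k] = X[z == k].mean(axis=0)
        cost = dist[np.arange(len(X)), z].sum()
        if cost < best_cost:
            best_z, best_cost = z, cost
    return best_z


def bic_on_subsample(A, mask, z, K):
    """Textbook BIC of a K-block SBM fitted on the observed pairs only (smaller is better)."""
    iu = np.triu_indices(len(A), k=1)
    obs = mask[iu]
    a, zi, zj = A[iu][obs], z[iu[0]][obs], z[iu[1]][obs]
    lo, hi = np.minimum(zi, zj), np.maximum(zi, zj)  # unordered block pair (k, l), k <= l
    loglik = 0.0
    for k in range(K):
        for l in range(k, K):
            sel = (lo == k) & (hi == l)
            n_kl, m_kl = sel.sum(), a[sel].sum()
            if 0 < m_kl < n_kl:  # blocks whose MLE is 0 or 1 contribute zero log-likelihood
                theta = m_kl / n_kl
                loglik += m_kl * np.log(theta) + (n_kl - m_kl) * np.log(1 - theta)
    n_params = K * (K + 1) / 2  # one edge probability per unordered block pair
    return -2 * loglik + n_params * np.log(obs.sum())


def select_K(A, rate, K_max, seed=0):
    """Hypothetical driver: subsample once, then score K = 1..K_max on the subnetwork."""
    rng = np.random.default_rng(seed)
    mask = subsample_pairs(len(A), rate, rng)
    A_obs = np.where(mask, A, 0.0)  # unobserved pairs are zero-filled for the spectral step
    scores = {K: bic_on_subsample(A, mask, spectral_labels(A_obs, K, rng), K)
              for K in range(1, K_max + 1)}
    return min(scores, key=scores.get), scores


# Demo on a simulated 3-block SBM (illustrative parameters, not from the paper).
rng = np.random.default_rng(1)
n, K_true = 300, 3
labels = rng.integers(K_true, size=n)
P = np.where(labels[:, None] == labels[None, :], 0.3, 0.05)
upper = np.triu(rng.random((n, n)) < P, k=1)
A = (upper | upper.T).astype(float)
K_hat, scores = select_K(A, rate=0.3, K_max=5)
print("selected K:", K_hat)
```

The only point of the sketch is that the likelihood and the penalty are computed over the roughly `rate * n(n-1)/2` retained pairs rather than all pairs, which is where a subsampling approach saves computation. The paper's choice of which pairs to keep and its modified penalty should be taken from the full text.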
Funding Statement
This research is supported by the MOE Project of Key Research Institute of Humanities and Social Sciences (grant 22JJD110001) and the Public Computing Cloud, Renmin University of China. Danyang Huang’s research is partially supported by the National Natural Science Foundation of China (grants 72471230 and 12071477) and the fund for building world-class universities (disciplines) at Renmin University of China. Xiangyu Chang’s research is partially supported by the National Natural Science Foundation for Outstanding Young Scholars of China (grant 72122018). Bo Zhang’s research is partially supported by the National Natural Science Foundation of China (grant 72271232) and the fund for building world-class universities (disciplines) at Renmin University of China.
Citation
Jiayi Deng. Danyang Huang. Xiangyu Chang. Bo Zhang. "Subsampling-based modified Bayesian information criterion for large-scale stochastic block models." Electron. J. Statist. 18 (2) 4724 - 4766, 2024. https://doi.org/10.1214/24-EJS2309