## Electronic Journal of Statistics

### Community detection by $L_{0}$-penalized graph Laplacian

#### Abstract

Community detection in network analysis aims to partition nodes into disjoint communities. Real networks often contain outlier nodes that do not belong to any community, and the number of communities is often unknown. However, most current algorithms assume that the number of communities is known, and even fewer algorithms can handle networks with outliers. In this paper, we propose detecting communities by maximizing a novel model-free tightness criterion. We show that this tightness criterion is closely related to the $L_{0}$-penalized graph Laplacian, and we develop an efficient algorithm that extracts communities based on the criterion. Unlike many other community detection methods, this method does not assume that the number of communities is known, and it can properly detect communities in networks with outliers. Under the degree-corrected stochastic block model, we show that, even for networks with outliers, maximizing the tightness criterion can extract communities with small misclassification rates when the number of communities grows to infinity as the network size grows. Simulations and real data analyses also show that the proposed method performs significantly better than existing methods.
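The theoretical results above are stated under the degree-corrected stochastic block model (DCSBM) and measured by misclassification rate. The following sketch illustrates only that evaluation setting. It is not the authors' tightness criterion or their extraction algorithm, neither of which is defined in the abstract. It assumes a common Bernoulli parameterization, $P(A_{ij}=1)=\theta_i\theta_j B_{z_i z_j}$, with probabilities clipped at 1. It also omits outlier nodes, and `sample_dcsbm` and `misclassification_rate` are hypothetical helper names. The error rate is minimized over relabelings of the estimated communities, because community labels are only identifiable up to permutation.

```python
import itertools
import random


def sample_dcsbm(labels, theta, B, seed=0):
    """Sample an undirected adjacency matrix (no self-loops) from a DCSBM.

    Assumed parameterization: each pair i < j is an edge independently with
    probability min(1, theta[i] * theta[j] * B[labels[i]][labels[j]]).
    No outlier nodes are modeled in this sketch.
    """
    rng = random.Random(seed)
    n = len(labels)
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            p = min(1.0, theta[i] * theta[j] * B[labels[i]][labels[j]])
            if rng.random() < p:
                A[i][j] = A[j][i] = 1  # keep the matrix symmetric
    return A


def misclassification_rate(true_labels, est_labels):
    """Fraction of mislabeled nodes, minimized over relabelings of est_labels.

    Brute force over all K! label permutations, so only suitable for small K.
    """
    K = max(max(true_labels), max(est_labels)) + 1
    best = len(true_labels)
    for perm in itertools.permutations(range(K)):
        errors = sum(t != perm[e] for t, e in zip(true_labels, est_labels))
        best = min(best, errors)
    return best / len(true_labels)


if __name__ == "__main__":
    # Two communities of 50 nodes with heterogeneous degree parameters.
    labels = [0] * 50 + [1] * 50
    rng = random.Random(1)
    theta = [rng.uniform(0.5, 1.5) for _ in labels]
    B = [[0.3, 0.05], [0.05, 0.3]]
    A = sample_dcsbm(labels, theta, B, seed=2)
    print("edges:", sum(map(sum, A)) // 2)
    # An estimate that is perfect up to swapping labels has zero error.
    print(misclassification_rate(labels, [1 - z for z in labels]))  # prints 0.0
```

The permutation search costs $K!$ evaluations. That is fine for a handful of communities. For a growing number of communities, as in the asymptotic regime described above, one would typically solve the label matching as an assignment problem instead.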

#### Article information

Source
Electron. J. Statist., Volume 12, Number 1 (2018), 1842–1866.

Dates
First available in Project Euclid: 12 June 2018

Permanent link to this document
https://projecteuclid.org/euclid.ejs/1528769122

Digital Object Identifier
doi:10.1214/18-EJS1445

Mathematical Reviews number (MathSciNet)
MR3813599

Zentralblatt MATH identifier
06886387

Subjects
Primary: 62-09: Graphical methods
Secondary: 62P10: Applications to biology and medical sciences

#### Citation

Chen, Chong; Xi, Ruibin; Lin, Nan. Community detection by $L_{0}$-penalized graph Laplacian. Electron. J. Statist. 12 (2018), no. 1, 1842–1866. doi:10.1214/18-EJS1445. https://projecteuclid.org/euclid.ejs/1528769122
