Electronic Journal of Statistics

Consistent community detection in multi-relational data through restricted multi-layer stochastic blockmodel

Subhadeep Paul and Yuguo Chen

Full-text: Open access

Abstract

In recent years there has been an increased interest in statistical analysis of data with multiple types of relations among a set of entities. Such multi-relational data can be represented as multi-layer graphs where the set of vertices represents the entities and multiple types of edges represent the different relations among them. For community detection in multi-layer graphs, we consider two random graph models, the multi-layer stochastic blockmodel (MLSBM) and a model with a restricted parameter space, the restricted multi-layer stochastic blockmodel (RMLSBM). We derive consistency results for community assignments from the maximum likelihood estimators (MLEs) in both models where MLSBM is assumed to be the true model, and either the number of nodes or the number of types of edges or both grow. We compare MLEs in the two models with other baseline approaches, such as separate modeling of layers, aggregating the layers and majority voting. In simulations RMLSBM is shown to have advantage over MLSBM when either the growth rate of the number of communities is high or the growth rate of the average degree of the component graphs in the multi-graph is low. We also derive minimax rates of error and thresholds for achieving consistency of community detection in both models, which are then used to compare the multi-layer models with a baseline model, the aggregate stochastic block model. The simulation studies and real data applications confirm the superior performance of the multi-layer approaches in comparison to the baseline procedures.

Article information

Source
Electron. J. Statist., Volume 10, Number 2 (2016), 3807-3870.

Dates
Received: March 2016
First available in Project Euclid: 6 December 2016

Permanent link to this document
https://projecteuclid.org/euclid.ejs/1480993455

Digital Object Identifier
doi:10.1214/16-EJS1211

Mathematical Reviews number (MathSciNet)
MR3579677

Zentralblatt MATH identifier
1357.62226

Subjects
Primary: 62G20: Asymptotic properties

Keywords
Community detection consistency minimax rates multi-layer networks consistency thresholds stochastic blockmodel

Citation

Paul, Subhadeep; Chen, Yuguo. Consistent community detection in multi-relational data through restricted multi-layer stochastic blockmodel. Electron. J. Statist. 10 (2016), no. 2, 3807--3870. doi:10.1214/16-EJS1211. https://projecteuclid.org/euclid.ejs/1480993455


Export citation

References

  • [1] Abbe, E. and Sandon, C. (2015). Community detection in general stochastic block models: Fundamental limits and efficient algorithms for recovery. In, 2015 IEEE 56th Annual Symposium on Foundations of Computer Science (FOCS) 670–688. IEEE.
  • [2] Airoldi, E. M., Blei, D. M., Fienberg, S. E. and Xing, E. P. (2008). Mixed membership stochastic blockmodels., J. Mach. Learn. Res. 9 1981–2014.
  • [3] Bickel, P. J. and Chen, A. (2009). A nonparametric view of network models and Newman–Girvan and other modularities., Proceedings of the National Academy of Sciences 106 21068–21073.
  • [4] Bickel, P. J., Choi, D., Chang, X. and Zhang, H. (2013). Asymptotic normality of maximum likelihood and its variational approximation for stochastic blockmodels., Ann. Statist 41 1922–1943.
  • [5] Binkiewicz, N. M. (2015). Contextualized Network Analysis: Theory and Methods for Networks with Node Covariates PhD thesis, University of, Wisconsin-Madison.
  • [6] Budimir, I., Dragomir, S. S. and Pecaric, J. (2001). Further reverse results for Jensen’s discrete inequality and applications in information theory., J. Inequal. Pure Appl. Math. 2 5.
  • [7] Celisse, A., Daudin, J. J. and Pierre, L. (2012). Consistency of maximum-likelihood and variational estimators in the stochastic block model., Electronic Journal of Statistics 6 1847–1899.
  • [8] Choi, D. S., Wolfe, P. J. and Airoldi, E. M. (2012). Stochastic blockmodels with a growing number of classes., Biometrika 99 273–284.
  • [9] Chung, F. and Lu, L. (2006)., Complex Graphs and Networks. American Mathematical Society.
  • [10] Daudin, J. J., Picard, F. and Robin, S. (2008). A mixture model for random graphs., Stat Comput 18 173–183.
  • [11] Dong, X., Frossard, P., Vandergheynst, P. and Nefedov, N. (2012). Clustering with multi-layer graphs: A spectral perspective., IEEE Transactions on Signal Processing 60 5820–5831.
  • [12] Gao, C., Lu, Y. and Zhou, H. H. (2015). Rate-optimal graphon estimation., The Annals of Statistics 43 2624–2652.
  • [13] Gao, C., Ma, Z., Zhang, A. Y. and Zhou, H. H. (2015). Achieving Optimal Misclassification Proportion in Stochastic Block Model., arXiv preprint arXiv:1505.03772.
  • [14] Goldenberg, A., Zheng, A. X., Fienberg, S. E. and Airoldi, E. M. (2010). A survey of statistical network models., Foundations and Trends in Machine Learning 2 129–233.
  • [15] Greene, D. and Cunningham, P. (2013). Producing a unified graph representation from multiple social network views., ACM Web Science 2 129–233.
  • [16] Hajek, B., Wu, Y. and Xu, J. (2016). Achieving exact cluster recovery threshold via semidefinite programming., IEEE Transactions on Information Theory 62 2788–2797.
  • [17] Han, Q., Xu, K. and Airoldi, E. (2015). Consistent estimation of dynamic and multi-layer block models. In, Proceedings of The 32nd International Conference on Machine Learning 1511–1520.
  • [18] Handcock, M. S., Raftery, A. E. and Tantrum, J. M. (2007). Model-based clustering for social networks., J. Roy. Statist. Soc. Ser. A 170 301–354.
  • [19] Hoff, P. D., Raftery, A. E. and Handcock, M. S. (2002). Latent space approaches to social network analysis., J. Amer. Statist. Assoc. 97 1090–1098.
  • [20] Holland, P., Laskey, K. and Leinhardt, S. (1983). Stochastic blockmodels: some first steps., Social Networks 5 109–137.
  • [21] Jenatton, R., Le Roux, N., Bordes, A. and Obozinski, G. (2012). A latent factor model for highly multi-relational data., Advances in Neural Information Processing Systems 3167–3175.
  • [22] Karrer, B. and Newman, M. E. J. (2011). Stochastic blockmodels and community structure in networks., Phys. Rev. E. 83 016107.
  • [23] Kemp, C., Tenenbaum, J. B., Griffiths, T. L., Yamada, T. and Ueda, N. (2006). Learning systems of concepts with an infinite relational model. In, Proceedings of the 21st National Conference on Artificial Intelligence 1 381–388.
  • [24] Latouche, P., Birmele, E. and Ambroise, C. (2011). Overlapping stochastic block models with application to the French political blogosphere., Ann. Appl. Stat. 5 309–336.
  • [25] Leskovec, J., Lang, K. J., Dasgupta, A. and Mahoney, M. W. (2008). Statistical properties of community structure in large social and information networks. In, Proceedings of the 17th international conference on World Wide Web 695–704. ACM.
  • [26] Matias, C. and Robin, S. (2014). Modeling heterogeneity in random graphs through latent space models: a selective review., ESAIM: Proceedings and Surveys 47 55–74.
  • [27] Mossel, E., Neeman, J. and Sly, A. (2012). Stochastic block models and reconstruction., arXiv preprint arXiv:1202.1499.
  • [28] Mossel, E., Neeman, J. and Sly, A. (2013). A proof of the block model threshold conjecture., arXiv preprint arXiv:1311.4115.
  • [29] Mossel, E., Neeman, J. and Sly, A. (2014). Consistency thresholds for binary symmetric block models., arXiv preprint arXiv:1407.1591.
  • [30] Mucha, P. J., Richardson, T., Macon, K., Porter, M. A. and Onnela, J. P. (2010). Community structure in time-dependent, multiscale, and multiplex networks., Science 328 876–878.
  • [31] Narayanan, M., Vetta, A., Schadt, E. and Zhu, J. (2010). Simultaneous clustering of multiple gene expression and physical interaction datasets., PLoS. Comp. Bio. 6 e1000742.
  • [32] Newman, M. E. J. and Girvan, M. (2004). Finding and evaluating community structure in networks., Phys. Rev. E 69 026113.
  • [33] Nickel, M., Tresp, V. and Kriegel, H. P. (2011). A three-way model for collective learning on multi-relational data., International Conference on Machine Learning 809–816.
  • [34] Nowicki, K. and Snijders, T. (2001). Estimation and prediction for stochastic block structures., J. Am. Stat. Assoc. 96 1077–1087.
  • [35] Papalexakis, E. E., Akoglu, L. and Ience, D. (2013). Do more views of a graph help? Community detection and clustering in multi-graphs. In, Proceedings of the 16th International Conference on Information Fusion 899–905.
  • [36] Rocklin, M. and Pinar, A. (2011). Latent clustering on graphs with multiple edge types. In, Algorithms and Models for the Web Graph 38–49. Springer.
  • [37] Rohe, K., Qin, T. and Fan, H. (2012). The highest dimensional stochastic blockmodel with a regularized estimator., Statistica Sinica 39 1878–1915.
  • [38] Simic, S. (2009). On an upper bound for Jensen’s inequality., Journal of Inequalities in Pure and Applied Mathematics 10 60.
  • [39] Snijders, T. A. B. and Nowicki, K. (1997). Estimation and prediction for stochastic blockmodels for graphs with latent block structure., Journal of Classificaion 14 75–100.
  • [40] Tang, W., Lu, Z. and Dhillon, I. S. (2009). Clustering with multiple graphs. In, Proceedings of the 9th IEEE International Conference on Data Mining 1016–1021.
  • [41] Taskar, B., Segal, E. and Koller, D. (2001). Probabilistic classification and clustering in relational data. In, Proceedings of the 17th International Joint Conference on Artificial Intelligence 870–876.
  • [42] Van Erven, T. and Harremoës, P. (2014). Rényi divergence and Kullback-Leibler divergence., IEEE Transactions on Information Theory 60 3797–3820.
  • [43] Zhang, A. Y. and Zhou, H. H. (2015). Minimax rates of community detection in stochastic block model., arXiv preprint arXiv:1507.05313.
  • [44] Zhao, Y., Levina, E. and Zhu, J. (2012). Consistency of community detection in networks under degree-corrected stochastic block models., Ann. Statist 40 2266–2292.