The Annals of Statistics

Optimization via low-rank approximation for community detection in networks

Can M. Le, Elizaveta Levina, and Roman Vershynin

Full-text: Open access

Abstract

Community detection is one of the fundamental problems of network analysis, for which a number of methods have been proposed. Most model-based or criteria-based methods have to solve an optimization problem over a discrete set of labels to find communities, which is computationally infeasible. Some fast spectral algorithms have been proposed for specific methods or models, but only on a case-by-case basis. Here, we propose a general approach for maximizing a function of a network adjacency matrix over discrete labels by projecting the set of labels onto a subspace approximating the leading eigenvectors of the expected adjacency matrix. This projection onto a low-dimensional space makes the feasible set of labels much smaller and the optimization problem much easier. We prove a general result about this method and show how to apply it to several previously proposed community detection criteria, establishing its consistency for label estimation in each case and demonstrating the fundamental connection between spectral properties of the network and various model-based approaches to community detection. Simulations and applications to real-world data are included to demonstrate that our method performs well for multiple problems over a wide range of parameters.
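The idea of shrinking the feasible set by projection can be illustrated with a small sketch. It assumes two communities, uses the two leading eigenvectors of the observed adjacency matrix A as a stand-in for those of the expected adjacency matrix, and takes the two-group Newman–Girvan modularity as an example criterion; none of these choices is claimed to be the paper's exact procedure. Projecting the label cube {-1, 1}^n with U^T gives a polygon (a zonotope), and a convex function on a polytope attains its maximum at a vertex. Each vertex corresponds to a label vector sign(U w) for some direction w, so sweeping w around the circle yields at most 2n candidate labelings instead of 2^n. The helper names (`simulate_sbm`, `modularity_score`, `candidate_labels`, `low_rank_maximize`) and the simulation parameters are hypothetical.

```python
import numpy as np

def simulate_sbm(n, p_in, p_out, rng):
    """Hypothetical helper: two equal-size blocks, returns adjacency and +/-1 labels."""
    labels = np.repeat([1, -1], n // 2)
    probs = np.where(np.equal.outer(labels, labels), p_in, p_out)
    upper = np.triu(rng.random((n, n)) < probs, k=1)  # sample each pair once
    A = (upper | upper.T).astype(float)               # symmetric, zero diagonal
    return A, labels

def modularity_score(A, e):
    """Two-group Newman-Girvan modularity: (1/4m) e^T B e, B = A - d d^T / 2m."""
    d = A.sum(axis=1)
    two_m = d.sum()
    B = A - np.outer(d, d) / two_m
    return e @ B @ e / (2 * two_m)

def candidate_labels(U):
    """Yield sign(U w) for one w in each arc between critical angles.

    Row u_i flips sign where w is orthogonal to u_i, so these sign vectors
    are exactly the labels whose projections U^T e are vertices of the
    projected cube; there are at most 2n of them.
    """
    phi = np.arctan2(U[:, 1], U[:, 0])
    crit = np.unique(np.mod(np.concatenate([phi + np.pi / 2, phi - np.pi / 2]),
                            2 * np.pi))
    nxt = np.append(crit[1:], crit[0] + 2 * np.pi)  # wrap around the circle
    for theta in (crit + nxt) / 2:                  # midpoints avoid ties
        w = np.array([np.cos(theta), np.sin(theta)])
        yield np.where(U @ w >= 0, 1, -1)

def low_rank_maximize(A, objective):
    """Evaluate the criterion only at vertex labels of the projected cube."""
    vals, vecs = np.linalg.eigh(A)
    top = np.argsort(np.abs(vals))[-2:]  # two leading eigenvectors by |eigenvalue|
    U = vecs[:, top]
    return max(candidate_labels(U), key=lambda e: objective(A, e))

# Illustrative usage on a simulated network (parameters are assumptions).
rng = np.random.default_rng(0)
A, truth = simulate_sbm(200, 0.3, 0.05, rng)
e_hat = low_rank_maximize(A, modularity_score)
acc = max(np.mean(e_hat == truth), np.mean(e_hat == -truth))  # labels up to a swap
print(f"fraction correctly labeled: {acc:.3f}")
```

Here the search covers at most 2n = 400 labelings rather than 2^200, which is the sense in which the projection makes the optimization tractable; the accuracy is computed up to swapping the two community names.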

Article information

Source
Ann. Statist., Volume 44, Number 1 (2016), 373-400.

Dates
Received: May 2015
Revised: July 2015
First available in Project Euclid: 5 January 2016

Permanent link to this document
https://projecteuclid.org/euclid.aos/1452004790

Digital Object Identifier
doi:10.1214/15-AOS1360

Mathematical Reviews number (MathSciNet)
MR3449772

Zentralblatt MATH identifier
1331.62312

Subjects
Primary: 62H30: Classification and discrimination; cluster analysis [See also 68T10, 91C20]
Secondary: 62H25: Factor analysis and principal components; correspondence analysis
           62G20: Asymptotic properties

Keywords
Community detection, spectral clustering, stochastic block model, social networks

Citation

Le, Can M.; Levina, Elizaveta; Vershynin, Roman. Optimization via low-rank approximation for community detection in networks. Ann. Statist. 44 (2016), no. 1, 373–400. doi:10.1214/15-AOS1360. https://projecteuclid.org/euclid.aos/1452004790


