The Annals of Statistics

Role of normalization in spectral clustering for stochastic blockmodels

Purnamrita Sarkar and Peter J. Bickel

Full-text: Open access

Abstract

Spectral clustering is a technique that clusters elements using the top few eigenvectors of their (possibly normalized) similarity matrix. The quality of spectral clustering is closely tied to the convergence properties of these principal eigenvectors. Recent random matrix theory literature has shown that this rate of convergence is the same for the normalized and unnormalized variants. However, normalization is commonly believed to be beneficial for spectral clustering [Stat. Comput. 17 (2007) 395–416]. Indeed, our experiments show that normalization improves prediction accuracy. In this paper, for the popular stochastic blockmodel, we theoretically show that normalization shrinks the spread of points in a class by a constant fraction under a broad parameter regime. As a byproduct of our work, we also obtain sharp deviation bounds of empirical principal eigenvalues of graphs generated from a stochastic blockmodel.
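The comparison in the abstract can be illustrated with a small simulation. The sketch below is not the authors' code; it assumes the common symmetric normalization $D^{-1/2} A D^{-1/2}$ (with $D$ the diagonal degree matrix), keeps the top-$k$ eigenvectors by eigenvalue magnitude, and omits the final k-means step by measuring spread against the true labels. The block sizes, the probability matrix `B`, the seed, and the `spread_ratio` statistic (mean distance of rows to their class centroid, divided by the smallest distance between centroids) are all hypothetical choices for illustration. The statistic is a crude proxy, not the quantity analyzed in the paper.

```python
import numpy as np


def sample_sbm(sizes, B, rng):
    """Sample a symmetric 0/1 adjacency matrix (no self-loops) from a stochastic blockmodel."""
    labels = np.repeat(np.arange(len(sizes)), sizes)
    P = B[labels][:, labels]                        # edge probability for every pair (i, j)
    upper = np.triu(rng.random(P.shape) < P, k=1)   # independent edges above the diagonal
    A = (upper | upper.T).astype(float)             # symmetrize; diagonal stays zero
    return A, labels


def spectral_embedding(A, k, normalize):
    """Top-k eigenvectors (by |eigenvalue|) of A, or of D^{-1/2} A D^{-1/2} if normalize=True."""
    M = A
    if normalize:
        d = A.sum(axis=1)
        d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))  # guard against isolated nodes
        M = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    vals, vecs = np.linalg.eigh(M)                  # M is symmetric
    top = np.argsort(np.abs(vals))[::-1][:k]
    return vecs[:, top]                             # row i is the embedding of node i


def spread_ratio(X, labels):
    """Hypothetical proxy: mean within-class distance to centroid / smallest centroid gap.

    Invariant to sign flips and rotations of the eigenvector columns.
    """
    classes = np.unique(labels)
    centroids = np.array([X[labels == c].mean(axis=0) for c in classes])
    within = np.mean([
        np.linalg.norm(X[labels == c] - centroids[i], axis=1).mean()
        for i, c in enumerate(classes)
    ])
    gaps = [
        np.linalg.norm(centroids[i] - centroids[j])
        for i in range(len(classes)) for j in range(i + 1, len(classes))
    ]
    return within / min(gaps)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    B = np.array([[0.30, 0.05],
                  [0.05, 0.30]])                    # hypothetical block probabilities
    A, labels = sample_sbm([200, 200], B, rng)
    for normalize in (False, True):
        X = spectral_embedding(A, k=2, normalize=normalize)
        print(f"normalize={normalize}: spread ratio = {spread_ratio(X, labels):.3f}")
```

Because both embeddings use unit-norm eigenvectors and the ratio divides by the centroid gap, the two printed numbers are on a comparable scale. The paper's result suggests that the normalized ratio should be the smaller one in regimes like this, although a single small sample is only an illustration and not evidence for the theorem.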

Article information

Source
Ann. Statist., Volume 43, Number 3 (2015), 962–990.

Dates
Received: October 2013
Revised: November 2014
First available in Project Euclid: 15 May 2015

Permanent link to this document
https://projecteuclid.org/euclid.aos/1431695635

Digital Object Identifier
doi:10.1214/14-AOS1285

Mathematical Reviews number (MathSciNet)
MR3346694

Zentralblatt MATH identifier
1320.62150

Subjects
Primary: 62H30: Classification and discrimination; cluster analysis [See also 68T10, 91C20]
Secondary: 60B20: Random matrices (probabilistic aspects; for algebraic aspects see 15B52)

Keywords
Stochastic blockmodel; spectral clustering; networks; normalization; asymptotic analysis

Citation

Sarkar, Purnamrita; Bickel, Peter J. Role of normalization in spectral clustering for stochastic blockmodels. Ann. Statist. 43 (2015), no. 3, 962–990. doi:10.1214/14-AOS1285. https://projecteuclid.org/euclid.aos/1431695635


References

  • [1] Adamic, L. A. and Glance, N. (2005). The political blogosphere and the 2004 U.S. election: Divided they blog. In Proceedings of the 3rd Intl. Workshop on Link Discovery. ACM, New York.
  • [2] Amini, A. A., Chen, A., Bickel, P. J. and Levina, E. (2013). Pseudo-likelihood methods for community detection in large sparse networks. Ann. Statist. 41 2097–2122.
  • [3] Bickel, P. J. and Chen, A. (2009). A nonparametric view of network models and Newman Girvan and other modularities. Proc. Natl. Acad. Sci. USA 106 21068–21073.
  • [4] Bollobás, B. (1998). Modern Graph Theory. Graduate Texts in Mathematics 184. Springer, New York.
  • [5] Chaudhuri, K., Graham, F. C. and Tsiatas, A. (2012). Spectral clustering of graphs with general degrees in the extended planted partition model. Journal of Machine Learning Research—Proceedings Track 23 35.1–35.23.
  • [6] Chung, F. and Radcliffe, M. (2011). On the spectra of general random graphs. Electron. J. Combin. 18 Paper 215, 14.
  • [7] Donath, W. E. and Hoffman, A. J. (1973). Lower bounds for the partitioning of graphs. IBM J. Res. Develop. 17 420–425.
  • [8] Feige, U. and Ofek, E. (2005). Spectral techniques applied to sparse random graphs. Random Structures Algorithms 27 251–275.
  • [9] Fiedler, M. (1973). Algebraic connectivity of graphs. Czechoslovak Math. J. 23 298–305.
  • [10] Füredi, Z. and Komlós, J. (1981). The eigenvalues of random symmetric matrices. Combinatorica 1 233–241.
  • [11] Hagen, L. W. and Kahng, A. B. (1992). New spectral methods for ratio cut partitioning and clustering. IEEE Trans. on CAD of Integrated Circuits and Systems 11 1074–1085.
  • [12] Hendrickson, B. and Leland, R. (1995). An improved spectral graph partitioning algorithm for mapping parallel computations. SIAM J. Sci. Comput. 16 452–469.
  • [13] Holland, P. W., Laskey, K. B. and Leinhardt, S. (1983). Stochastic blockmodels: First steps. Social Networks 5 109–137.
  • [14] Joseph, A. and Yu, B. (2013). Impact of regularization on spectral clustering. CoRR.
  • [15] Katz, L. (1953). A new status index derived from sociometric analysis. Psychometrika 18 39–43.
  • [16] Liben-Nowell, D. and Kleinberg, J. (2003). The link prediction problem for social networks. In Conference on Information and Knowledge Management. ACM, New York.
  • [17] Ng, A. Y., Jordan, M. I. and Weiss, Y. (2001). On spectral clustering: Analysis and an algorithm. In Advances in Neural Information Processing Systems, Vancouver, British Columbia, Canada. MIT Press, Cambridge, MA.
  • [18] Oliveira, R. I. (2009). Concentration of the adjacency matrix and of the Laplacian in random graphs with independent edges. Preprint.
  • [19] Pothen, A., Simon, H. D. and Liou, K.-P. (1990). Partitioning sparse matrices with eigenvectors of graphs. SIAM J. Matrix Anal. Appl. 11 430–452.
  • [20] Qin, T. and Rohe, K. (2013). Regularized spectral clustering under the degree-corrected stochastic blockmodel. In Advances in Neural Information Processing Systems, Lake Tahoe, Nevada, USA. MIT Press, Cambridge, MA.
  • [21] Rohe, K., Chatterjee, S. and Yu, B. (2011). Spectral clustering and the high-dimensional stochastic blockmodel. Ann. Statist. 39 1878–1915.
  • [22] Sarkar, P. and Bickel, P. J. (2015). Supplement to “Role of normalization in spectral clustering for stochastic blockmodels.” DOI:10.1214/14-AOS1285SUPP.
  • [23] Shi, J. and Malik, J. (2000). Normalized cuts and image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 22 888–905.
  • [24] Sussman, D. L., Tang, M., Fishkind, D. E. and Priebe, C. E. (2012). A consistent adjacency spectral embedding for stochastic blockmodel graphs. J. Amer. Statist. Assoc. 107 1119–1128.
  • [25] von Luxburg, U. (2007). A tutorial on spectral clustering. Stat. Comput. 17 395–416.
  • [26] von Luxburg, U., Belkin, M. and Bousquet, O. (2008). Consistency of spectral clustering. Ann. Statist. 36 555–586.

Supplemental materials

  • Supplement to “Role of normalization in spectral clustering for stochastic blockmodels”. Because of space constraints, we have moved some of the technical details to the supplementary material [22].