The Annals of Statistics

Consistency of spectral clustering

Ulrike von Luxburg, Mikhail Belkin, and Olivier Bousquet

Full-text: Open access

Abstract

Consistency is a key property of all statistical procedures analyzing randomly sampled data. Surprisingly, despite decades of work, little is known about consistency of most clustering algorithms. In this paper we investigate consistency of the popular family of spectral clustering algorithms, which clusters the data with the help of eigenvectors of graph Laplacian matrices. We develop new methods to establish that, for increasing sample size, those eigenvectors converge to the eigenvectors of certain limit operators. As a result, we can prove that one of the two major classes of spectral clustering (normalized clustering) converges under very general conditions, while the other (unnormalized clustering) is only consistent under strong additional assumptions, which are not always satisfied in real data. We conclude that our analysis provides strong evidence for the superiority of normalized spectral clustering.
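The normalized variant of spectral clustering discussed in the abstract can be illustrated with a minimal sketch: build a Gaussian similarity graph, form the symmetric normalized Laplacian L = I − D^{−1/2} W D^{−1/2}, and split the points by the sign of the eigenvector belonging to the second-smallest eigenvalue. This is an illustrative simplification of the algorithm family (two clusters, sign split instead of k-means on the embedding), not the exact procedure analyzed in the paper; the function name and the bandwidth parameter `sigma` are hypothetical choices for this example.

```python
import numpy as np

def normalized_spectral_bipartition(X, sigma=1.0):
    """Two-way normalized spectral clustering (minimal sketch).

    Builds a Gaussian similarity graph, forms the symmetric normalized
    Laplacian L = I - D^{-1/2} W D^{-1/2}, and splits the points by the
    sign of the eigenvector for the second-smallest eigenvalue.
    """
    # Pairwise squared distances and Gaussian similarities.
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)

    # Symmetric normalized Laplacian.
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L = np.eye(len(X)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]

    # np.linalg.eigh returns eigenvalues in ascending order, so column 1
    # is the eigenvector of the second-smallest eigenvalue.
    vals, vecs = np.linalg.eigh(L)
    fiedler = vecs[:, 1]
    return (fiedler > 0).astype(int)

# Two well-separated Gaussian blobs.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.2, (20, 2)),
               rng.normal(5, 0.2, (20, 2))])
labels = normalized_spectral_bipartition(X)
```

For data with more than two clusters one would instead take the eigenvectors of the k smallest eigenvalues, row-normalize them, and run k-means on the resulting embedding, as in the standard normalized algorithms the paper studies.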

Article information

Source
Ann. Statist. Volume 36, Number 2 (2008), 555-586.

Dates
First available in Project Euclid: 13 March 2008

Permanent link to this document
http://projecteuclid.org/euclid.aos/1205420511

Digital Object Identifier
doi:10.1214/009053607000000640

Mathematical Reviews number (MathSciNet)
MR2396807

Zentralblatt MATH identifier
1133.62045

Subjects
Primary: 62G20: Asymptotic properties
Secondary: 05C50: Graphs and linear algebra (matrices, eigenvalues, etc.)

Keywords
Spectral clustering; graph Laplacian; consistency; convergence of eigenvectors

Citation

von Luxburg, Ulrike; Belkin, Mikhail; Bousquet, Olivier. Consistency of spectral clustering. Ann. Statist. 36 (2008), no. 2, 555–586. doi:10.1214/009053607000000640. http://projecteuclid.org/euclid.aos/1205420511.

