This paper focuses on obtaining clustering information about a distribution from its i.i.d. samples. We develop theoretical results to understand and use clustering information contained in the eigenvectors of data adjacency matrices based on a radial kernel function with a sufficiently fast tail decay. In particular, we provide population analyses to gain insights into which eigenvectors should be used and when the clustering information for the distribution can be recovered from the sample. We learn that a fixed number of top eigenvectors might at the same time contain redundant clustering information and miss relevant clustering information. We use this insight to design the data spectroscopic clustering (DaSpec) algorithm, which uses properly selected eigenvectors to determine the number of clusters automatically and to group the data accordingly. Our findings extend the intuitions underlying existing spectral techniques such as spectral clustering and Kernel Principal Components Analysis, and offer new insight into their usability and modes of failure. Simulation studies and experiments on real-world data are conducted to show the potential of our algorithm. In particular, DaSpec is found to handle unbalanced groups and recover clusters of different shapes better than the competing methods.
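To make the idea of selecting eigenvectors rather than taking a fixed number of top ones concrete, the following is a minimal sketch and not the DaSpec algorithm as specified in the paper. It assumes a Gaussian kernel as the radial kernel, and it assumes, purely for illustration, that an eigenvector is "properly selected" when its entries show no sign change above a small tolerance. The function names, the tolerance `tol`, the number of leading eigenvectors `num`, and the toy data are all hypothetical.

```python
import numpy as np

def gaussian_kernel_matrix(X, bandwidth):
    """Pairwise Gaussian kernel values exp(-||x - y||^2 / (2 * bandwidth^2))."""
    diff = X[:, None, :] - X[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    return np.exp(-sq_dist / (2.0 * bandwidth ** 2))

def top_eigenvectors(K, num):
    """Return the `num` leading eigenpairs of the symmetric matrix K / n."""
    n = K.shape[0]
    eigvals, eigvecs = np.linalg.eigh(K / n)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:num]
    return eigvals[order], eigvecs[:, order]

def has_no_sign_change(v, tol):
    """Assumed selection rule: all entries larger than tol in magnitude share one sign."""
    big = v[np.abs(v) > tol]
    return big.size > 0 and bool(np.all(big > 0) or np.all(big < 0))

def cluster_by_selected_eigenvectors(X, bandwidth, num=6, tol=1e-6):
    """Keep leading eigenvectors without a sign change; one cluster per kept vector."""
    K = gaussian_kernel_matrix(X, bandwidth)
    _, vecs = top_eigenvectors(K, num)
    keep = [j for j in range(vecs.shape[1]) if has_no_sign_change(vecs[:, j], tol)]
    if not keep:
        raise ValueError("no eigenvector passed the sign-change check")
    # Assign each point to the kept eigenvector with the largest absolute entry.
    labels = np.argmax(np.abs(vecs[:, keep]), axis=1)
    return labels, len(keep)

# Hypothetical toy data: two well-separated, unbalanced groups on the real line.
X = np.concatenate([np.linspace(-1.0, 1.0, 40), np.linspace(9.0, 11.0, 20)])[:, None]
labels, num_clusters = cluster_by_selected_eigenvectors(X, bandwidth=1.0)
```

In this toy setting the second eigenvector of the larger group can rank above the leading eigenvector of the smaller group, which is why the sketch inspects several leading eigenvectors and filters them instead of taking the top two.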