The Annals of Statistics

Data spectroscopy: Eigenspaces of convolution operators and clustering

Tao Shi, Mikhail Belkin, and Bin Yu

Source: Ann. Statist. Volume 37, Number 6B (2009), 3960-3984.

Abstract

This paper focuses on obtaining clustering information about a distribution from its i.i.d. samples. We develop theoretical results to understand and use clustering information contained in the eigenvectors of data adjacency matrices based on a radial kernel function with a sufficiently fast tail decay. In particular, we provide population analyses to gain insights into which eigenvectors should be used and when the clustering information for the distribution can be recovered from the sample. We learn that a fixed number of top eigenvectors might at the same time contain redundant clustering information and miss relevant clustering information. We use this insight to design the data spectroscopic clustering (DaSpec) algorithm that utilizes properly selected eigenvectors to determine the number of clusters automatically and to group the data accordingly. Our findings extend the intuitions underlying existing spectral techniques such as spectral clustering and Kernel Principal Components Analysis, and provide new understanding into their usability and modes of failure. Simulation studies and experiments on real-world data are conducted to show the potential of our algorithm. In particular, DaSpec is found to handle unbalanced groups and recover clusters of different shapes better than the competing methods.
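The eigenvector-selection idea behind DaSpec can be illustrated with a minimal Python sketch. It is not the authors' implementation, and the abstract does not spell out the rules it uses. The sketch assumes that an eigenvector of the Gaussian kernel matrix is "properly selected" when its entries show no sign change once entries below a small tolerance are ignored, and that each point is assigned to the selected eigenvector with the largest absolute entry at that point. The names `daspec_sketch`, `bandwidth`, and `tol` are hypothetical.

```python
import numpy as np


def daspec_sketch(X, bandwidth, tol=1e-3):
    """Illustrative sketch only: cluster the rows of X using kernel eigenvectors.

    Assumed selection rule: keep eigenvectors whose entries with magnitude
    above `tol` all share one sign. Assumed assignment rule: label each point
    by the kept eigenvector with the largest absolute entry at that point.
    """
    # Pairwise squared Euclidean distances (clipped at 0 against round-off).
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = np.maximum(
        sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T), 0.0
    )

    # Gaussian (radial) kernel matrix with the given bandwidth.
    K = np.exp(-sq_dists / (2.0 * bandwidth ** 2))

    # eigh returns eigenvalues in ascending order; reorder to descending.
    eigvals, eigvecs = np.linalg.eigh(K)
    eigvecs = eigvecs[:, np.argsort(eigvals)[::-1]]

    # Keep eigenvectors with no sign change among their non-negligible entries.
    selected = []
    for j in range(eigvecs.shape[1]):
        v = eigvecs[:, j]
        significant = v[np.abs(v) > tol]
        if significant.size > 0 and (
            np.all(significant > 0) or np.all(significant < 0)
        ):
            selected.append(j)

    # The number of kept eigenvectors is the estimated number of clusters.
    n_clusters = len(selected)

    # Assign each point to the kept eigenvector that is largest in magnitude there.
    labels = np.argmax(np.abs(eigvecs[:, selected]), axis=1)
    return labels, n_clusters
```

Under these assumptions the number of clusters is not supplied by the user; it is the count of eigenvectors that pass the sign test, so the result depends on the choice of `bandwidth` and `tol`.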

Primary Subjects: 62H30
Secondary Subjects: 68T10
Keywords: Gaussian kernel; spectral clustering; kernel principal component analysis; support vector machines; unsupervised learning

Links and Identifiers

Permanent link to this document: http://projecteuclid.org/euclid.aos/1256303533
Digital Object Identifier: doi:10.1214/09-AOS700

2009 © Institute of Mathematical Statistics