Electronic Journal of Statistics

Spectral clustering based on local linear approximations

Ery Arias-Castro, Guangliang Chen, and Gilad Lerman

Full-text: Open access

Abstract

In the context of clustering, we assume a generative model where each cluster is the result of sampling points in the neighborhood of an embedded smooth surface; the sample may be contaminated with outliers, which are modeled as points sampled in space away from the clusters. We consider a prototype for a higher-order spectral clustering method based on the residual from a local linear approximation. We obtain theoretical guarantees for this algorithm and show that, in terms of both separation and robustness to outliers, it outperforms the standard spectral clustering algorithm (based on pairwise distances) of Ng, Jordan and Weiss (NIPS ’01). The optimal choice for some of the tuning parameters depends on the dimension and thickness of the clusters. We provide estimators that come close enough for our theoretical purposes. We also discuss the cases of clusters of mixed dimensions and of clusters that are generated from smoother surfaces. In our experiments, this algorithm is shown to outperform pairwise spectral clustering on both simulated and real data.
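To illustrate the kind of higher-order affinity the abstract describes, here is a minimal sketch (not the authors' actual HOSC implementation): a group of nearby points is fit with a d-dimensional affine subspace via PCA, and the affinity decays with the root-mean-square residual of that fit, so groups that lie near a common local linear approximation receive high affinity. The function names and the Gaussian form of the kernel are illustrative assumptions.

```python
import numpy as np

def linear_fit_residual(points, d):
    """Root-mean-square residual of the best d-dimensional affine fit
    (via PCA/SVD) to a small group of points."""
    X = points - points.mean(axis=0)
    # Trailing singular values of the centered data measure the
    # deviation from the best-fitting d-dimensional subspace.
    s = np.linalg.svd(X, compute_uv=False)
    return np.sqrt(np.sum(s[d:] ** 2) / len(points))

def higher_order_affinity(points, d, eps):
    """Affinity of a point group: near 1 when the group is well
    approximated by a d-dimensional affine subspace, near 0 otherwise.
    eps plays the role of a scale (thickness) parameter."""
    return np.exp(-((linear_fit_residual(points, d) / eps) ** 2))

# Three nearly collinear points (d = 1) get affinity close to 1 ...
line = np.array([[0.0, 0.0], [1.0, 0.01], [2.0, -0.01]])
# ... while three spread-out points get affinity close to 0.
tri = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
```

Such group affinities would then be folded into a pairwise affinity matrix before running a standard spectral step; the choice of d and eps corresponds to the dimension and thickness parameters discussed in the abstract.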

Article information

Source
Electron. J. Statist. Volume 5 (2011), 1537–1587.

Dates
First available in Project Euclid: 23 November 2011

Permanent link to this document
http://projecteuclid.org/euclid.ejs/1322057436

Digital Object Identifier
doi:10.1214/11-EJS651

Mathematical Reviews number (MathSciNet)
MR2861697

Zentralblatt MATH identifier
06166866

Subjects
Primary: 62H30: Classification and discrimination; cluster analysis [See also 68T10, 91C20]
62G20: Asymptotic properties
68T10: Pattern recognition, speech recognition {For cluster analysis, see 62H30}

Keywords
Spectral clustering; higher-order affinities; local linear approximation; local polynomial approximation; detection of clusters in point clouds; dimension estimation; nearest-neighbor search

Citation

Arias-Castro, Ery; Chen, Guangliang; Lerman, Gilad. Spectral clustering based on local linear approximations. Electron. J. Statist. 5 (2011), 1537–1587. doi:10.1214/11-EJS651. http://projecteuclid.org/euclid.ejs/1322057436.



References

  • [1] S. Agarwal, K. Branson, and S. Belongie. Higher order learning with graphs. In Proceedings of the 23rd International Conference on Machine Learning (ICML ’06), volume 148, pages 17–24, 2006.
  • [2] S. Agarwal, J. Lim, L. Zelnik-Manor, P. Perona, D. Kriegman, and S. Belongie. Beyond pairwise clustering. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR ’05), volume 2, pages 838–845, 2005.
  • [3] E. Arias-Castro. Clustering based on pairwise distances when the data is of mixed dimensions. IEEE Trans. Inform. Theory, 57(3):1692–1706, 2011.
  • [4] E. Arias-Castro, D. L. Donoho, X. Huo, and C. A. Tovey. Connect the dots: how many random points can a regular curve pass through? Adv. in Appl. Probab., 37(3):571–603, 2005.
  • [5] E. Arias-Castro, B. Efros, and O. Levi. Networks of polynomial pieces with application to the analysis of point clouds and images. J. Approx. Theory, 162(1):94–130, 2010.
  • [6] R. Basri and D. Jacobs. Lambertian reflectance and linear subspaces. IEEE Trans. Pattern Anal. Mach. Intell., 25(2):218–233, 2003.
  • [7] M. Belkin and P. Niyogi. Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation, 15(6):1373–1396, 2003.
  • [8] A. Beygelzimer, S. Kakade, and J. Langford. Cover trees for nearest neighbor. In Proceedings of the 23rd International Conference on Machine Learning (ICML ’06), pages 97–104, 2006.
  • [9] M. R. Brito, E. L. Chávez, A. J. Quiroz, and J. E. Yukich. Connectivity of the mutual k-nearest-neighbor graph in clustering and outlier detection. Statist. Probab. Lett., 35(1):33–42, 1997.
  • [10] G. Chen, S. Atev, and G. Lerman. Kernel spectral curvature clustering (KSCC). In Dynamical Vision Workshop, IEEE 12th International Conference on Computer Vision, pages 765–772, Kyoto, Japan, 2009.
  • [11] G. Chen and G. Lerman. Foundations of a multi-way spectral clustering framework for hybrid linear modeling. Found. Comput. Math., 9(5):517–558, 2009.
  • [12] G. Chen and G. Lerman. Spectral curvature clustering (SCC). Int. J. Comput. Vision, 81(3):317–330, 2009.
  • [13] G. Chen, G. Lerman, and E. Arias-Castro. Higher order spectral clustering (HOSC) algorithm. Matlab code. Current version available at http://www.math.duke.edu/~glchen/hosc.html.
  • [14] F. R. K. Chung. Spectral Graph Theory, volume 92 of CBMS Regional Conference Series in Mathematics. Published for the Conference Board of the Mathematical Sciences, Washington, DC, 1997.
  • [15] J. K. Cullum and R. A. Willoughby. Lanczos Algorithms for Large Symmetric Eigenvalue Computations, Vol. 1: Theory. Classics in Applied Mathematics. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA, 2002.
  • [16] L. Devroye and G. L. Wise. Detection of abnormal behavior via nonparametric estimation of the support. SIAM J. Appl. Math., 38(3):480–488, 1980.
  • [17] D. L. Donoho and C. Grimes. Image manifolds which are isometric to Euclidean space. J. Math. Imaging Vis., 23(1):5–24, 2005.
  • [18] R. M. Dudley. Metric entropy of some classes of sets with differentiable boundaries. J. Approx. Theory, 10:227–236, 1974.
  • [19] R. Epstein, P. Hallinan, and A. Yuille. 5±2 eigenimages suffice: An empirical investigation of low-dimensional lighting models. In IEEE Workshop on Physics-based Modeling in Computer Vision, pages 108–116, June 1995.
  • [20] H. Federer. Curvature measures. Trans. Amer. Math. Soc., 93:418–491, 1959.
  • [21] D. J. Field, A. Hayes, and R. F. Hess. Contour integration by the human visual system: Evidence for a local ‘association field’. Vision Research, 33(2):173–193, 1993.
  • [22] M. Filippone, F. Camastra, F. Masulli, and S. Rovetta. A survey of kernel and spectral methods for clustering. Pattern Recogn., 41(1):176–190, 2008.
  • [23] Z. Fu, W. Hu, and T. Tan. Similarity based vehicle trajectory clustering and anomaly detection. In Proceedings of the IEEE International Conference on Image Processing (ICIP ’05), volume 2, pages 602–605, 2005.
  • [24] A. Gionis, A. Hinneburg, S. Papadimitriou, and P. Tsaparas. Dimension induced clustering. In Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining (KDD ’05), pages 51–60, New York, NY, USA, 2005.
  • [25] A. Goldberg, X. Zhu, A. Singh, Z. Xu, and R. Nowak. Multi-manifold semi-supervised learning. In Twelfth International Conference on Artificial Intelligence and Statistics (AISTATS), 2009.
  • [26] G. H. Golub and C. F. Van Loan. Matrix Computations. Johns Hopkins Studies in the Mathematical Sciences. Johns Hopkins University Press, Baltimore, MD, third edition, 1996.
  • [27] V. Govindu. A tensor decomposition for geometric grouping and segmentation. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR ’05), volume 1, pages 1150–1157, June 2005.
  • [28] P. Grassberger and I. Procaccia. Measuring the strangeness of strange attractors. Physica D, 9:189–208, 1983.
  • [29] Q. Guo, H. Li, W. Chen, I.-F. Shen, and J. Parkkinen. Manifold clustering via energy minimization. In ICMLA ’07: Proceedings of the Sixth International Conference on Machine Learning and Applications, pages 375–380, Washington, DC, USA, 2007. IEEE Computer Society.
  • [30] G. Haro, G. Randall, and G. Sapiro. Stratification learning: Detecting mixed density and dimensionality in high dimensional point clouds. Advances in Neural Information Processing Systems (NIPS), 19:553, 2007.
  • [31] J. Ho, M. Yang, J. Lim, K. Lee, and D. Kriegman. Clustering appearances of objects under varying illumination conditions. In Proceedings of International Conference on Computer Vision and Pattern Recognition (CVPR ’03), volume 1, pages 11–18, 2003.
  • [32] D. Kushnir, M. Galun, and A. Brandt. Fast multiscale clustering and manifold identification. Pattern Recogn., 39(10):1876–1891, 2006.
  • [33] E. Levina and P. Bickel. Maximum likelihood estimation of intrinsic dimension. In Advances in Neural Information Processing Systems (NIPS), volume 17, pages 777–784. MIT Press, Cambridge, Massachusetts, 2005.
  • [34] U. von Luxburg. A tutorial on spectral clustering. Statistics and Computing, 17(4):395–416, 2007.
  • [35] Y. Ma, A. Y. Yang, H. Derksen, and R. Fossum. Estimation of subspace arrangements with applications in modeling and segmenting mixed data. SIAM Review, 50(3):413–458, 2008.
  • [36] M. Maier, M. Hein, and U. von Luxburg. Cluster identification in nearest-neighbor graphs. In Algorithmic Learning Theory, pages 196–210. Springer, 2007.
  • [37] M. Maier, M. Hein, and U. von Luxburg. Optimal construction of k-nearest-neighbor graphs for identifying noisy clusters. Theor. Comput. Sci., 410(19):1749–1764, 2009.
  • [38] E. Mammen and A. B. Tsybakov. Asymptotical minimax recovery of sets with smooth boundaries. Ann. Statist., 23(2):502–524, 1995.
  • [39] V. Martínez and E. Saar. Statistics of the Galaxy Distribution. Chapman and Hall/CRC Press, Boca Raton, 2002.
  • [40] H. Narayanan, M. Belkin, and P. Niyogi. On the relation between low density separation, spectral clustering and graph cuts. In B. Schölkopf, J. Platt, and T. Hoffman, editors, Advances in Neural Information Processing Systems (NIPS), volume 19. MIT Press, Cambridge, MA, 2007.
  • [41] H. Neumann, A. Yazdanbakhsh, and E. Mingolla. Seeing surfaces: The brain’s vision of the world. Physics of Life Reviews, 4(3):189–222, 2007.
  • [42] A. Y. Ng, M. I. Jordan, and Y. Weiss. On spectral clustering: Analysis and an algorithm. In Advances in Neural Information Processing Systems (NIPS), volume 14, pages 849–856, 2002.
  • [43] P. Niyogi, S. Smale, and S. Weinberger. Finding the homology of submanifolds with high confidence from random samples. Discrete Comput. Geom., 39(1):419–441, 2008.
  • [44] B. Pelletier and P. Pudlo. Operator norm convergence of spectral clustering on level sets. Journal of Machine Learning Research, 12:385–416, 2011.
  • [45] M. Penrose. Random Geometric Graphs, volume 5 of Oxford Studies in Probability. Oxford University Press, Oxford, 2003.
  • [46] S. Rao, A. Yang, S. Sastry, and Y. Ma. Robust algebraic segmentation of mixed rigid-body and planar motions from two views. International Journal of Computer Vision, 88(3):425–446, 2010.
  • [47] S. Roweis and L. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500):2323–2326, 2000.
  • [48] A. Shashua, R. Zass, and T. Hazan. Multi-way clustering using super-symmetric non-negative tensor factorization. In Proceedings of the European Conference on Computer Vision (ECCV ’06), volume 4, pages 595–608, 2006.
  • [49] R. Souvenir and R. Pless. Manifold clustering. In IEEE International Conference on Computer Vision (ICCV ’05), volume 1, pages 648–653, 2005.
  • [50] M. Talagrand. The Generic Chaining. Springer Monographs in Mathematics. Springer-Verlag, Berlin, 2005.
  • [51] J. B. Tenenbaum, V. de Silva, and J. C. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290(5500):2319–2323, 2000.
  • [52] R. Valdarnini. Detection of non-random patterns in cosmological gravitational clustering. Astronomy & Astrophysics, 366:376–386, 2001.
  • [53] R. Vidal and Y. Ma. A unified algebraic approach to 2-D and 3-D motion segmentation and estimation. Journal of Mathematical Imaging and Vision, 25(3):403–421, 2006.
  • [54] U. von Luxburg, M. Belkin, and O. Bousquet. Consistency of spectral clustering. Ann. Statist., 36(2):555–586, 2008.
  • [55] H. Weyl. On the volume of tubes. Amer. J. Math., 61(2):461–472, 1939.
  • [56] L. Zelnik-Manor and P. Perona. Self-tuning spectral clustering. In Advances in Neural Information Processing Systems (NIPS), volume 17, pages 1601–1608, 2004.