Open Access
April 2015
The geometry of kernelized spectral clustering
Geoffrey Schiebinger, Martin J. Wainwright, Bin Yu
Ann. Statist. 43(2): 819-846 (April 2015). DOI: 10.1214/14-AOS1283

Abstract

Clustering of data sets is a standard problem in many areas of science and engineering. The method of spectral clustering is based on embedding the data set using a kernel function, and using the top eigenvectors of the normalized Laplacian to recover the connected components. We study the performance of spectral clustering in recovering the latent labels of i.i.d. samples from a finite mixture of nonparametric distributions. The difficulty of this label recovery problem depends on the overlap between mixture components and how easily a mixture component is divided into two nonoverlapping components. When the overlap is small compared to the indivisibility of the mixture components, the principal eigenspace of the population-level normalized Laplacian operator is approximately spanned by the square-root kernelized component densities. In the finite sample setting, and under the same assumption, embedded samples from different components are approximately orthogonal with high probability when the sample size is large. As a corollary we control the fraction of samples mislabeled by spectral clustering under finite mixtures with nonparametric components.
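
The embedding step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The Gaussian kernel, the bandwidth value, the function name `spectral_embedding`, the matrix form D^{-1/2} K D^{-1/2}, the row normalization, and the two synthetic Gaussian components are all assumptions chosen for illustration. The sketch builds a kernel matrix, normalizes it by the degrees, keeps the top eigenvectors, and then checks that embedded samples from different components are close to orthogonal when the components barely overlap.

```python
import numpy as np


def spectral_embedding(X, n_components, bandwidth=1.0):
    """Embed the rows of X with the top eigenvectors of D^{-1/2} K D^{-1/2}.

    The Gaussian kernel and the bandwidth are illustrative assumptions.
    """
    # Pairwise squared Euclidean distances between all samples.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    # Gaussian kernel matrix K.
    K = np.exp(-sq_dists / (2.0 * bandwidth ** 2))
    # Degrees d_i = sum_j K_ij, then symmetric normalization of K.
    inv_sqrt_d = 1.0 / np.sqrt(K.sum(axis=1))
    L = K * inv_sqrt_d[:, None] * inv_sqrt_d[None, :]
    # eigh returns eigenvalues in ascending order, so the top
    # eigenvectors are the last n_components columns.
    _, vecs = np.linalg.eigh(L)
    V = vecs[:, -n_components:]
    # Normalize each embedded sample to unit length (an assumed convention).
    return V / np.linalg.norm(V, axis=1, keepdims=True)


# Synthetic two-component mixture with small overlap (hypothetical data).
rng = np.random.default_rng(0)
n = 100
X = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(n, 2)),
    rng.normal(loc=6.0, scale=0.5, size=(n, 2)),
])

Z = spectral_embedding(X, n_components=2)

# Largest absolute inner product between embedded samples from
# different components, and smallest inner product within the first one.
cross = np.abs(Z[:n] @ Z[n:].T).max()
within = (Z[:n] @ Z[:n].T).min()
print("max |cross-component inner product|:", cross)
print("min within-component inner product:", within)
```

In this well-separated example the kernel matrix is nearly block diagonal, so the check does not depend on which orthonormal basis the eigensolver returns for the top eigenspace. A clustering step such as k-means on the rows of `Z` would then assign labels; it is omitted here to keep the sketch short.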

Citation

Geoffrey Schiebinger, Martin J. Wainwright, Bin Yu. "The geometry of kernelized spectral clustering." Ann. Statist. 43 (2) 819-846, April 2015. https://doi.org/10.1214/14-AOS1283

Information

Published: April 2015
First available in Project Euclid: 23 March 2015

zbMATH: 1312.62082
MathSciNet: MR3325711
Digital Object Identifier: 10.1214/14-AOS1283

Subjects:
Primary: 62G20

Keywords: kernel function, mixture model, normalized Laplacian, spectral clustering

Rights: Copyright © 2015 Institute of Mathematical Statistics

Vol. 43 • No. 2 • April 2015