Open Access
Kernel spectral clustering of large dimensional data
Romain Couillet, Florent Benaych-Georges
Electron. J. Statist. 10(1): 1393-1454 (2016). DOI: 10.1214/16-EJS1144

Abstract

This article proposes a first analysis of kernel spectral clustering methods in the regime where the dimension $p$ of the data vectors to be clustered and their number $n$ grow large at the same rate. We demonstrate, under a $k$-class Gaussian mixture model, that the normalized Laplacian matrix associated with the kernel matrix behaves asymptotically like a so-called spiked random matrix. Some of the isolated eigenvalue-eigenvector pairs in this model are shown to carry the clustering information under a separability condition that is classical in spiked matrix models. We precisely evaluate the positions of these eigenvalues and the content of the eigenvectors, which reveal important (sometimes quite disruptive) aspects of kernel spectral clustering from both theoretical and practical standpoints. Our results are then compared to the actual clustering performance on images from the MNIST database, thereby revealing a close match between theory and practice.
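The pipeline the abstract refers to (kernel matrix, normalized Laplacian, leading eigenvectors) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes a two-class Gaussian mixture $x_i = \pm\mu + w_i$ with identity covariances, a Gaussian kernel $f(t) = e^{-t/2}$ applied to $\|x_i - x_j\|^2/p$, and the normalized matrix $D^{-1/2} K D^{-1/2}$ with $D = \mathrm{diag}(K 1_n)$ (any constant rescaling such as a factor $n$ leaves the eigenvectors unchanged). The helper names `sample_two_class_mixture`, `normalized_kernel_matrix` and `two_class_spectral_labels` are hypothetical, and the sign of the second eigenvector stands in for the k-means step usually applied to the informative eigenvectors.

```python
import numpy as np


def sample_two_class_mixture(n_per_class, p, mean_norm, rng):
    """Hypothetical helper: draw x_i = +/- mu + w_i with w_i ~ N(0, I_p)
    and mu = mean_norm * e_1 (identity covariances are an assumption)."""
    mu = np.zeros(p)
    mu[0] = mean_norm
    w = rng.standard_normal((2 * n_per_class, p))
    labels = np.repeat([0, 1], n_per_class)
    signs = np.where(labels == 0, 1.0, -1.0)
    return signs[:, None] * mu + w, labels


def normalized_kernel_matrix(X, f=lambda t: np.exp(-t / 2.0)):
    """Return D^{-1/2} K D^{-1/2}, where K_ij = f(||x_i - x_j||^2 / p)
    and D = diag(K 1_n). The Gaussian choice of f is an assumption."""
    n, p = X.shape
    sq_norms = np.sum(X ** 2, axis=1)
    dist2 = (sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T) / p
    K = f(np.maximum(dist2, 0.0))  # clip tiny negative round-off errors
    d = K.sum(axis=1)
    return K / np.sqrt(np.outer(d, d))


def two_class_spectral_labels(X):
    """Split the data in two from the sign of the second-largest eigenvector.
    The largest eigenvector is proportional to D^{1/2} 1_n (eigenvalue 1)
    and carries no class information, so it is skipped."""
    L = normalized_kernel_matrix(X)
    eigvals, eigvecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    return (eigvecs[:, -2] > 0).astype(int), eigvals


rng = np.random.default_rng(0)
X, labels = sample_two_class_mixture(n_per_class=100, p=100, mean_norm=8.0, rng=rng)
pred, eigvals = two_class_spectral_labels(X)
# Cluster indices are only defined up to a swap of the two labels.
accuracy = max(np.mean(pred == labels), np.mean(pred != labels))
print(f"three largest eigenvalues: {eigvals[-3:]}, accuracy: {accuracy:.3f}")
```

The separation `mean_norm=8.0` is deliberately generous so that the class-carrying eigenvalue sits well away from the bulk; shrinking it toward the separability threshold discussed in the article is one way to watch the informative eigenvector degrade.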

Citation


Romain Couillet. Florent Benaych-Georges. "Kernel spectral clustering of large dimensional data." Electron. J. Statist. 10 (1) 1393 - 1454, 2016. https://doi.org/10.1214/16-EJS1144

Information

Received: 1 November 2015; Published: 2016
First available in Project Euclid: 31 May 2016

zbMATH: 06600843
MathSciNet: MR3507369
Digital Object Identifier: 10.1214/16-EJS1144

Subjects:
Primary: 60B20 , 62H30
Secondary: 15B52

Keywords: kernel methods , Random matrix theory , spectral clustering

Rights: Copyright © 2016 The Institute of Mathematical Statistics and the Bernoulli Society
