Open Access
August 2016
Impact of regularization on spectral clustering
Antony Joseph, Bin Yu
Ann. Statist. 44(4): 1765-1791 (August 2016). DOI: 10.1214/16-AOS1447

Abstract

The performance of spectral clustering can be considerably improved via regularization, as demonstrated empirically in Amini et al. [Ann. Statist. 41 (2013) 2097–2122]. Here, we provide an attempt at quantifying this improvement through theoretical analysis. Under the stochastic block model (SBM), and its extensions, previous results on spectral clustering relied on the minimum degree of the graph being sufficiently large for its good performance. By examining the scenario where the regularization parameter $\tau$ is large, we show that the minimum degree assumption can potentially be removed. As a special case, for an SBM with two blocks, the results require the maximum degree to be large (grow faster than $\log n$) as opposed to the minimum degree. More importantly, we show the usefulness of regularization in situations where not all nodes belong to well-defined clusters. Our results rely on a ‘bias-variance’-like trade-off that arises from understanding the concentration of the sample Laplacian and the eigengap as a function of the regularization parameter. As a byproduct of our bounds, we propose a data-driven technique DKest (standing for estimated Davis–Kahan bounds) for choosing the regularization parameter. This technique is shown to work well through simulations and on a real data set.
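To make the role of $\tau$ concrete, the following is a minimal sketch of regularized spectral clustering for two blocks. It is an illustration under stated assumptions, not the authors' implementation. The abstract does not spell out the form of the regularization, so the sketch assumes the one usually attributed to Amini et al.: $A_\tau = A + (\tau/n)\mathbf{1}\mathbf{1}^\top$, with $L_\tau = D_\tau^{-1/2} A_\tau D_\tau^{-1/2}$. The function names `sample_two_block_sbm` and `regularized_spectral_bipartition` are hypothetical, the second eigenvector is found by a simple deflated power iteration in place of a full eigensolver, and the DKest procedure for choosing $\tau$ is not implemented here.

```python
import math
import random


def sample_two_block_sbm(n, p, q, seed=0):
    """Sample a symmetric 0/1 adjacency matrix from a two-block SBM.

    Nodes 0..n//2-1 form one block and the rest form the other; edges appear
    with probability p within a block and q across blocks (hypothetical helper).
    """
    rng = random.Random(seed)
    half = n // 2
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            same_block = (i < half) == (j < half)
            if rng.random() < (p if same_block else q):
                adj[i][j] = adj[j][i] = 1.0
    return adj


def regularized_spectral_bipartition(adj, tau, iters=300):
    """Split nodes in two by the sign of the second eigenvector of L_tau.

    Assumes tau > 0, so every regularized degree is positive even when the
    graph has isolated nodes.
    """
    n = len(adj)
    # Assumed regularization: A_tau = A + (tau / n) * 1 1^T.
    a_tau = [[adj[i][j] + tau / n for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_tau]
    inv_sqrt = [1.0 / math.sqrt(d) for d in deg]
    # L_tau = D_tau^{-1/2} A_tau D_tau^{-1/2}, with eigenvalues in [-1, 1].
    lap = [[inv_sqrt[i] * a_tau[i][j] * inv_sqrt[j] for j in range(n)]
           for i in range(n)]

    # The leading eigenvector of L_tau is D_tau^{1/2} 1 (eigenvalue 1).
    top = [math.sqrt(d) for d in deg]
    top_norm = math.sqrt(sum(x * x for x in top))
    top = [x / top_norm for x in top]

    # Power iteration on L_tau + I (spectrum shifted into [0, 2]); removing
    # the known leading eigenvector each step leaves the second one.
    v = [float(i + 1) for i in range(n)]
    for _ in range(iters):
        w = [sum(lap[i][j] * v[j] for j in range(n)) + v[i] for i in range(n)]
        dot = sum(a * b for a, b in zip(w, top))
        w = [a - dot * b for a, b in zip(w, top)]
        w_norm = math.sqrt(sum(x * x for x in w))
        v = [x / w_norm for x in w]

    # Cluster labels from the sign pattern; which block gets 0 is arbitrary.
    return [0 if x >= 0 else 1 for x in v]


if __name__ == "__main__":
    graph = sample_two_block_sbm(n=60, p=0.5, q=0.05, seed=0)
    print(regularized_spectral_bipartition(graph, tau=5.0))
```

Rerunning the sketch with different values of `tau` on a sparse graph gives a rough feel for the trade-off the abstract describes: a larger $\tau$ stabilizes the degrees and hence the sample Laplacian, but it also shrinks the eigengap that separates the blocks.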

Citation

Antony Joseph, Bin Yu. "Impact of regularization on spectral clustering." Ann. Statist. 44 (4) 1765-1791, August 2016. https://doi.org/10.1214/16-AOS1447

Information

Received: 1 July 2014; Revised: 1 January 2016; Published: August 2016
First available in Project Euclid: 7 July 2016

zbMATH: 1357.62229
MathSciNet: MR3519940
Digital Object Identifier: 10.1214/16-AOS1447

Subjects:
Primary: 62F12
Secondary: 62H99

Keywords: community detection, network analysis, regularization, spectral clustering, stochastic block model

Rights: Copyright © 2016 Institute of Mathematical Statistics

Vol. 44 • No. 4 • August 2016