Separating populations with wide data: A spectral analysis (2009)
Avrim Blum, Amin Coja-Oghlan, Alan Frieze, Shuheng Zhou
Electron. J. Statist. 3: 76-113 (2009). DOI: 10.1214/08-EJS289


In this paper, we consider the problem of partitioning a small data sample drawn from a mixture of k product distributions. We are interested in the case where individual features are of low average quality γ, and we want to use as few of them as possible to correctly partition the sample. We analyze a spectral technique that approximately optimizes the total data size needed to perform this partitioning correctly, that is, the product of the number of data points n and the number of features K, as a function of 1/γ for K > n. Our goal is motivated by an application in clustering individuals according to their population of origin using markers, when the divergence between any two of the populations is small.
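To make the idea of a spectral partition of wide data (K > n) concrete, here is a minimal sketch for the two-population case. It is a generic "sign of the top eigenvector" split, not the paper's algorithm, and it carries none of the paper's guarantees. Assumptions, all hypothetical: binary features; every feature has frequency p in one population and q in the other (the paper instead allows per-feature frequencies and measures their average quality γ); and the helper names `spectral_bipartition` and `sample_mixture` are illustrative only. Because K > n, the sketch works with the small n × n Gram matrix of the centered samples rather than the K × K feature covariance.

```python
import random


def spectral_bipartition(data, iters=200, seed=0):
    """Split the rows of `data` (n samples x K features) into two groups.

    Centers each feature, forms the n x n Gram matrix of the centered
    samples, finds its top eigenvector by power iteration, and labels
    each sample by the sign of its entry in that eigenvector.
    """
    n, k = len(data), len(data[0])
    # Center each feature (column) so the shared mean does not dominate.
    means = [sum(row[j] for row in data) / n for j in range(k)]
    centered = [[row[j] - means[j] for j in range(k)] for row in data]
    # Gram matrix G[a][b] = <x_a, x_b>; it is only n x n, cheap when K > n.
    gram = [[sum(s * t for s, t in zip(centered[a], centered[b]))
             for b in range(n)] for a in range(n)]
    # Power iteration from a random start. After centering, the all-ones
    # vector lies in the null space of G, so it would be a bad start.
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        w = [sum(gram[a][b] * v[b] for b in range(n)) for a in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return [1 if x > 0 else 0 for x in v]


def sample_mixture(n, k, p, q, seed=1):
    """Hypothetical toy data: alternate samples from Bernoulli(p)^k and Bernoulli(q)^k."""
    rng = random.Random(seed)
    labels = [i % 2 for i in range(n)]
    data = [[1 if rng.random() < (p if lab == 0 else q) else 0 for _ in range(k)]
            for lab in labels]
    return data, labels


# Usage: 40 samples, 1000 weak features (wide data, K > n).
data, truth = sample_mixture(n=40, k=1000, p=0.65, q=0.35)
pred = spectral_bipartition(data)
agree = sum(a == b for a, b in zip(pred, truth)) / len(truth)
accuracy = max(agree, 1 - agree)  # cluster labels are only defined up to a swap
print(accuracy)
```

Each feature here is only weakly informative, but summing evidence over many features through the Gram matrix separates the two groups; shrinking the gap between p and q or reducing k makes the split less reliable, which mirrors the trade-off between n, K, and feature quality that the abstract describes.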



Avrim Blum. Amin Coja-Oghlan. Alan Frieze. Shuheng Zhou. "Separating populations with wide data: A spectral analysis." Electron. J. Statist. 3 76 - 113, 2009.


Published: 2009
First available in Project Euclid: 28 January 2009

zbMATH: 1326.62136
MathSciNet: MR2471587
Digital Object Identifier: 10.1214/08-EJS289

Primary: 60K35
Secondary: 60K35

Keywords: clustering, mixture of product distributions, small sample, spectral analysis

Rights: Copyright © 2009 The Institute of Mathematical Statistics and the Bernoulli Society
