## Electronic Journal of Statistics

### Separating populations with wide data: A spectral analysis

Avrim Blum, Amin Coja-Oghlan, Alan Frieze, and Shuheng Zhou

#### Abstract

In this paper, we consider the problem of partitioning a small data sample drawn from a mixture of *k* product distributions. We are interested in the case that individual features are of low average quality *γ*, and we want to use as few of them as possible to correctly partition the sample. We analyze a spectral technique that is able to approximately optimize the total data size (the product of the number of data points *n* and the number of features *K*) needed to correctly perform this partitioning, as a function of 1/*γ*, for *K* > *n*. Our goal is motivated by an application in clustering individuals according to their population of origin using markers, when the divergence between any two of the populations is small.
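To make the setting concrete, the following is a minimal sketch of a generic spectral partition on wide binary data (*K* > *n*), assuming only two populations. It is not the algorithm analyzed in the paper. The function names, the per-feature gap `delta` (an illustrative stand-in, not the paper's *γ*), and the sizes `n=40`, `K=400` are all assumptions chosen so that the toy example separates easily.

```python
import random


def sample_mixture(n, K, delta, rng):
    """Draw n points with K independent binary features from two product
    distributions. Population 0 has feature probability 0.5 + delta and
    population 1 has 0.5 - delta (hypothetical parameters)."""
    labels = [i % 2 for i in range(n)]
    rows = []
    for lab in labels:
        p = 0.5 + delta if lab == 0 else 0.5 - delta
        rows.append([1.0 if rng.random() < p else 0.0 for _ in range(K)])
    return rows, labels


def center_columns(X):
    """Subtract each feature's sample mean."""
    n, K = len(X), len(X[0])
    means = [sum(X[i][j] for i in range(n)) / n for j in range(K)]
    return [[X[i][j] - means[j] for j in range(K)] for i in range(n)]


def top_left_singular_vector(X, iters, rng):
    """Power iteration on the n x n Gram matrix X X^T, which is cheap
    when the data are wide (K > n)."""
    n = len(X)
    G = [[sum(a * b for a, b in zip(X[i], X[j])) for j in range(n)]
         for i in range(n)]
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        w = [sum(G[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v


def spectral_partition(X, rng, iters=100):
    """Split the sample by the sign of the leading singular vector."""
    v = top_left_singular_vector(center_columns(X), iters, rng)
    return [0 if x >= 0 else 1 for x in v]


def accuracy_up_to_swap(pred, truth):
    """Cluster labels are arbitrary, so score the better of the two matchings."""
    agree = sum(p == t for p, t in zip(pred, truth)) / len(truth)
    return max(agree, 1.0 - agree)


rng = random.Random(0)
X, labels = sample_mixture(n=40, K=400, delta=0.2, rng=rng)
pred = spectral_partition(X, rng)
acc = accuracy_up_to_swap(pred, labels)
print(f"accuracy up to label swap: {acc:.2f}")
```

The gap used here is deliberately large. The regime the abstract targets, where individual features are of low quality and the total data size *nK* must be traded off against 1/*γ*, is considerably harder than this toy case.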

#### Article information

**Source**

Electron. J. Statist. Volume 3 (2009), 76-113.

**Dates**

First available: 28 January 2009

**Permanent link to this document**

http://projecteuclid.org/euclid.ejs/1233176791

**Digital Object Identifier**

doi:10.1214/08-EJS289

**Mathematical Reviews number (MathSciNet)**

MR2471587

**Zentralblatt MATH identifier**

05279470

**Subjects**

Primary: 60K35: Interacting random processes; statistical mechanics type models; percolation theory [See also 82B43, 82C43]

Secondary: 60K35: Interacting random processes; statistical mechanics type models; percolation theory [See also 82B43, 82C43]

**Keywords**

mixture of product distributions; clustering; small sample; spectral analysis

#### Citation

Blum, Avrim; Coja-Oghlan, Amin; Frieze, Alan; Zhou, Shuheng. Separating populations with wide data: A spectral analysis. Electronic Journal of Statistics 3 (2009), 76-113. doi:10.1214/08-EJS289. http://projecteuclid.org/euclid.ejs/1233176791.