## The Annals of Statistics

- Ann. Statist.
- Volume 45, Number 5 (2017), 2218-2247.

### Spectrum estimation from samples

Weihao Kong and Gregory Valiant

#### Abstract

We consider the problem of approximating the set of eigenvalues of the covariance matrix of a multivariate distribution (equivalently, the problem of approximating the “population spectrum”), given access to samples drawn from the distribution. We consider this recovery problem in the regime where the sample size is comparable to, or even sublinear in, the dimensionality of the distribution. First, we propose a theoretically optimal and computationally efficient algorithm for recovering the moments of the eigenvalues of the population covariance matrix. We then leverage this accurate moment recovery, via a Wasserstein distance argument, to accurately reconstruct the vector of eigenvalues. Together, this yields an eigenvalue reconstruction algorithm that is asymptotically consistent as the dimensionality of the distribution and sample size tend toward infinity, even in the sublinear sample regime where the ratio of the sample size to the dimensionality tends to zero. In addition to our theoretical results, we show that our approach performs well in practice for a broad range of distributions and sample sizes.
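The abstract names moment recovery as the first step but does not spell out an estimator. As a rough illustration only, the Python sketch below uses one simple unbiased construction: for zero-mean samples with Gram matrix $A = XX^\top$, a product of Gram entries around a cycle of $k$ distinct, increasing sample indices has expectation $\operatorname{tr}(\Sigma^k)$, and summing over all such cycles reduces to $\operatorname{tr}(G^{k-1}A)$, where $G$ is the strictly upper-triangular part of $A$. The function name `spectral_moment_estimates`, the zero-mean assumption, and the row-per-sample layout are assumptions made here, not details taken from this page, and the sketch should not be read as the authors' exact algorithm. The subsequent step of reconstructing eigenvalues from the moments is not sketched.

```python
import numpy as np
from math import comb


def spectral_moment_estimates(X, k_max):
    """Estimate (1/d) * tr(Sigma^k) for k = 1..k_max.

    X is an (n, d) array whose rows are samples, assumed to have zero mean.
    Requires 1 <= k_max <= n, because each term uses k distinct samples.
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    if not 1 <= k_max <= n:
        raise ValueError("k_max must satisfy 1 <= k_max <= n")

    A = X @ X.T          # n x n Gram matrix: A[i, j] = <x_i, x_j>
    G = np.triu(A, k=1)  # strictly upper-triangular part of A
    P = np.eye(n)        # holds G^(k-1), starting from G^0 = I

    estimates = []
    for k in range(1, k_max + 1):
        # tr(G^(k-1) A) sums cycle products over increasing index tuples
        # i_1 < ... < i_k; there are comb(n, k) such tuples.
        estimates.append(float(np.trace(P @ A)) / (d * comb(n, k)))
        P = P @ G
    return estimates
```

For example, calling `spectral_moment_estimates(X, 4)` on an `(n, d)` sample matrix returns estimates of the first four moments of the population spectrum. Because only the `n x n` Gram matrix is formed, the cost does not depend on forming or diagonalizing a `d x d` covariance matrix, which is the relevant design point when `n` is smaller than `d`.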

#### Article information

**Source**

Ann. Statist., Volume 45, Number 5 (2017), 2218-2247.

**Dates**

Received: February 2016

Revised: October 2016

First available in Project Euclid: 31 October 2017

**Permanent link to this document**

https://projecteuclid.org/euclid.aos/1509436833

**Digital Object Identifier**

doi:10.1214/16-AOS1525

**Mathematical Reviews number (MathSciNet)**

MR3718167

**Zentralblatt MATH identifier**

06821124

**Subjects**

Primary: 62H12: Estimation; 62H10: Distribution of statistics

**Keywords**

Spectrum estimation; eigenvalues of covariance matrices; sublinear sample size; method of moments; random matrix theory; high-dimensional inference

#### Citation

Kong, Weihao; Valiant, Gregory. Spectrum estimation from samples. Ann. Statist. 45 (2017), no. 5, 2218-2247. doi:10.1214/16-AOS1525. https://projecteuclid.org/euclid.aos/1509436833

#### Supplemental materials

- Supplement to “Spectrum estimation from samples”. The supplement contains the technical details of the proofs of Propositions 1 and 4. Digital Object Identifier: doi:10.1214/16-AOS1525SUPP