## The Annals of Statistics

### Spectrum estimation from samples

#### Abstract

We consider the problem of approximating the set of eigenvalues of the covariance matrix of a multivariate distribution (equivalently, the problem of approximating the “population spectrum”), given access to samples drawn from the distribution. We consider this recovery problem in the regime where the sample size is comparable to, or even sublinear in, the dimensionality of the distribution. First, we propose a theoretically optimal and computationally efficient algorithm for recovering the moments of the eigenvalues of the population covariance matrix. We then leverage this accurate moment recovery, via a Wasserstein distance argument, to accurately reconstruct the vector of eigenvalues. Together, these yield an eigenvalue reconstruction algorithm that is asymptotically consistent as the dimensionality of the distribution and the sample size tend toward infinity, even in the sublinear sample regime where the ratio of the sample size to the dimensionality tends to zero. In addition to our theoretical results, we show that our approach performs well in practice for a broad range of distributions and sample sizes.
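To make the idea of recovering spectral moments from few samples concrete, the sketch below estimates only the first two moments of the population spectrum. It relies on two standard identities for independent zero-mean samples $x_i, x_j$ with covariance $\Sigma$: $\mathbb{E}\,\|x_i\|^2 = \operatorname{tr}(\Sigma)$ and, for $i \neq j$, $\mathbb{E}\,(x_i^\top x_j)^2 = \operatorname{tr}(\Sigma^2)$. This is an illustration under those assumptions, not a reproduction of the article's algorithm; the function names, the Gaussian test data, and the chosen spectrum are all hypothetical.

```python
import numpy as np

def spectral_moments_1_2(X):
    """Estimate (1/d) tr(Sigma) and (1/d) tr(Sigma^2).

    X is an (n, d) array whose rows are assumed to be independent,
    zero-mean samples with covariance Sigma.
    """
    n, d = X.shape
    A = X @ X.T                      # n x n Gram matrix of the samples
    m1 = np.trace(A) / (n * d)       # averages ||x_i||^2 / d over samples
    iu = np.triu_indices(n, k=1)     # index pairs with i < j only
    m2 = np.mean(A[iu] ** 2) / d     # averages (x_i . x_j)^2 / d over pairs
    return m1, m2

def naive_second_moment(X):
    """(1/d) tr(S^2) for the sample covariance S = X^T X / n (biased when n << d)."""
    n, d = X.shape
    A = X @ X.T
    # tr(S^2) = ||X X^T||_F^2 / n^2, which avoids forming the d x d matrix S.
    return np.sum(A ** 2) / (n ** 2 * d)

# Hypothetical test case: d = 400, n = 100, half the eigenvalues 1, half 2.
rng = np.random.default_rng(0)
n, d = 100, 400
eigs = np.concatenate([np.ones(d // 2), 2.0 * np.ones(d // 2)])
X = rng.standard_normal((n, d)) * np.sqrt(eigs)   # rows ~ N(0, diag(eigs))

true_m1, true_m2 = eigs.mean(), (eigs ** 2).mean()
m1, m2 = spectral_moments_1_2(X)
naive_m2 = naive_second_moment(X)
print("true moments:     ", true_m1, true_m2)
print("pairwise estimate:", m1, m2)
print("naive second moment from sample covariance:", naive_m2)
```

Using only off-diagonal Gram entries removes the bias term of order $(\operatorname{tr}\Sigma)^2 / n$ that inflates the second moment of the sample covariance when $n$ is much smaller than $d$.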

#### Article information

Source
Ann. Statist., Volume 45, Number 5 (2017), 2218-2247.

Dates
Revised: October 2016
First available in Project Euclid: 31 October 2017

https://projecteuclid.org/euclid.aos/1509436833

Digital Object Identifier
doi:10.1214/16-AOS1525

Mathematical Reviews number (MathSciNet)
MR3718167

Zentralblatt MATH identifier
06821124

Subjects
Primary: 62H12: Estimation; 62H10: Distribution of statistics

#### Citation

