The Annals of Statistics
- Ann. Statist.
- Volume 45, Number 5 (2017), 2218-2247.
Spectrum estimation from samples
We consider the problem of approximating the set of eigenvalues of the covariance matrix of a multivariate distribution (equivalently, the problem of approximating the “population spectrum”), given access to samples drawn from the distribution. We consider this recovery problem in the regime where the sample size is comparable to, or even sublinear in the dimensionality of the distribution. First, we propose a theoretically optimal and computationally efficient algorithm for recovering the moments of the eigenvalues of the population covariance matrix. We then leverage this accurate moment recovery, via a Wasserstein distance argument, to accurately reconstruct the vector of eigenvalues. Together, this yields an eigenvalue reconstruction algorithm that is asymptotically consistent as the dimensionality of the distribution and sample size tend toward infinity, even in the sublinear sample regime where the ratio of the sample size to the dimensionality tends to zero. In addition to our theoretical results, we show that our approach performs well in practice for a broad range of distributions and sample sizes.
Ann. Statist., Volume 45, Number 5 (2017), 2218-2247.
Received: February 2016
Revised: October 2016
First available in Project Euclid: 31 October 2017
Permanent link to this document
Digital Object Identifier
Mathematical Reviews number (MathSciNet)
Zentralblatt MATH identifier
Kong, Weihao; Valiant, Gregory. Spectrum estimation from samples. Ann. Statist. 45 (2017), no. 5, 2218--2247. doi:10.1214/16-AOS1525. https://projecteuclid.org/euclid.aos/1509436833
- Supplement to “Spectrum estimation from samples”. The supplement contains the technical details of the proofs of Propositions 1 and 4.