Open Access
Chebyshev polynomials, moment matching, and optimal estimation of the unseen
Yihong Wu, Pengkun Yang
Ann. Statist. 47(2): 857-883 (April 2019). DOI: 10.1214/17-AOS1665

Abstract

We consider the problem of estimating the support size of a discrete distribution whose minimum nonzero mass is at least $\frac{1}{k}$. Under the independent sampling model, we show that the sample complexity, that is, the minimal sample size to achieve an additive error of $\varepsilon k$ with probability at least 0.1, is within universal constant factors of $\frac{k}{\log k}\log^{2}\frac{1}{\varepsilon }$, which improves the state-of-the-art result of $\frac{k}{\varepsilon^{2}\log k}$ in [In Advances in Neural Information Processing Systems (2013) 2157–2165]. A similar characterization of the minimax risk is also obtained. Our procedure is a linear estimator based on the Chebyshev polynomial and its approximation-theoretic properties, which can be evaluated in $O(n+\log^{2}k)$ time and attains the sample complexity within constant factors. The superiority of the proposed estimator in terms of accuracy, computational efficiency and scalability is demonstrated on a variety of synthetic and real datasets.
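To make the abstract's construction concrete, here is a minimal Python sketch of the ingredients it names: the fingerprint of the sample (how many symbols appear exactly $j$ times), a linear estimator over that fingerprint, and Chebyshev polynomial evaluation via the standard three-term recurrence. The weight choices below are illustrative placeholders only; the paper's estimator uses carefully calibrated coefficients derived from a shifted Chebyshev polynomial, which are not reproduced here.

```python
from collections import Counter

def fingerprint(samples):
    """Phi[j] = number of distinct symbols appearing exactly j times."""
    counts = Counter(samples)          # symbol -> multiplicity
    return Counter(counts.values())    # multiplicity -> how many symbols

def chebyshev(L, x):
    """Evaluate the degree-L Chebyshev polynomial T_L(x) with the
    three-term recurrence T_{j+1}(x) = 2x*T_j(x) - T_{j-1}(x)."""
    t_prev, t = 1.0, x
    if L == 0:
        return t_prev
    for _ in range(L - 1):
        t_prev, t = t, 2 * x * t - t_prev
    return t

def linear_support_estimator(samples, weights):
    """A linear estimator of the support size: S_hat = sum_j g(j) * Phi_j.
    `weights` maps a count j to its coefficient g(j); counts missing from
    `weights` (e.g., counts beyond the polynomial degree, for which the
    symbol is certainly in the support) default to weight 1."""
    fp = fingerprint(samples)
    return sum(weights.get(j, 1.0) * phi for j, phi in fp.items())
```

With all weights equal to 1 (an empty `weights` dict), the estimator reduces to the naive plug-in count of distinct observed symbols; the paper's Chebyshev-derived weights on small counts are what correct for the unseen portion of the support.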

Citation


Yihong Wu, Pengkun Yang. "Chebyshev polynomials, moment matching, and optimal estimation of the unseen." Ann. Statist. 47(2): 857-883, April 2019. https://doi.org/10.1214/17-AOS1665

Information

Received: 1 June 2016; Revised: 1 November 2017; Published: April 2019
First available in Project Euclid: 11 January 2019

zbMATH: 07033154
MathSciNet: MR3909953
Digital Object Identifier: 10.1214/17-AOS1665

Subjects:
Primary: 62G05
Secondary: 62C20

Keywords: high-dimensional statistics, large domain, nonparametric inference, polynomial approximation, support size estimation

Rights: Copyright © 2019 Institute of Mathematical Statistics
