Abstract
We construct an algorithm for estimating the mean $\mu$ of a heavy-tailed random variable when given an adversarially corrupted sample of N independent observations. The only assumption we make on the distribution of the noncorrupted (or informative) data is the existence of a covariance matrix Σ, unknown to the statistician. Our algorithm outputs $\hat{\mu}$, which is robust to the presence of adversarial outliers and satisfies
$$\|\hat{\mu} - \mu\|_2 \lesssim \sqrt{\frac{\operatorname{Tr}(\Sigma)}{N}} + \sqrt{\frac{\|\Sigma\|_{op} K}{N}} \tag{1}$$
with probability at least $1 - \exp(-c_0 K) - \exp(-c_1 u)$, and runtime $\tilde{O}(Nd + uKd)$, where $K$ and $u$ are two parameters of the algorithm. The algorithm is fully data-dependent and does not use (1) in its construction, which combines recently developed tools for median-of-means estimators and covering semidefinite programming. We also show that this algorithm can automatically adapt to the number of outliers (adaptive choice of $K$) and that it satisfies the same bound in expectation.
Funding Statement
Guillaume Lecué is supported by a grant overseen by the French National Research Agency (ANR) as part of the “Investissements d’Avenir” Program (LabEx ECODEC; ANR-11-LABX-0047), by the Médiamétrie chair on “Statistical models and analysis of high-dimensional data” and by the French ANR PRC grant ADDS (ANR-19-CE48-0005).
Acknowledgments
We would like to thank Yeshwanth Cherapanamjeri, Ilias Diakonikolas, Yihe Dong, Nicolas Flammarion, Sam Hopkins and Jerry Li for helpful comments on our work.
Citation
Jules Depersin. Guillaume Lecué. "Robust sub-Gaussian estimation of a mean vector in nearly linear time." Ann. Statist. 50 (1) 511 - 536, February 2022. https://doi.org/10.1214/21-AOS2118
Information