February 2022 Robust sub-Gaussian estimation of a mean vector in nearly linear time
Jules Depersin, Guillaume Lecué
Ann. Statist. 50(1): 511-536 (February 2022). DOI: 10.1214/21-AOS2118

Abstract

We construct an algorithm for estimating the mean of a heavy-tailed random variable when given an adversarially corrupted sample of $N$ independent observations. The only assumption we make on the distribution of the noncorrupted (or informative) data is the existence of a covariance matrix $\Sigma$, unknown to the statistician. Our algorithm outputs $\hat{\mu}$, which is robust to the presence of $|\mathcal{O}|$ adversarial outliers and satisfies

$$\|\hat{\mu} - \mu\|_2 \lesssim \sqrt{\frac{\mathrm{Tr}(\Sigma)}{N}} + \sqrt{\frac{\|\Sigma\|_{op} K}{N}} \tag{1}$$

with probability at least $1 - \exp(-c_0 K) - \exp(-c_1 u)$, and runtime $\tilde{O}(Nd + uKd)$, where $K \in \{600|\mathcal{O}|, \ldots, N\}$ and $u \in \mathbb{N}$ are two parameters of the algorithm. The algorithm is fully data-dependent and does not use (1) in its construction, which combines recently developed tools for median-of-means estimators and covering semidefinite programming. We also show that this algorithm can automatically adapt to the number of outliers (adaptive choice of $K$) and that it satisfies the same bound in expectation.
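
For intuition only, the following is a minimal sketch of the median-of-means idea mentioned in the abstract. It is not the authors' algorithm: the covering semidefinite programming step is replaced here by a plain coordinate-wise median of block means, which is a much cruder aggregation and is not claimed to achieve bound (1). The function names (`block_means`, `coordinatewise_mom`), the choice of 30 blocks, the Gaussian noise and the simulated outliers are all illustrative assumptions.

```python
import random
import statistics

def block_means(samples, k):
    """Split samples into k equal-size blocks and return each block's mean vector."""
    size = len(samples) // k          # any remainder is dropped
    d = len(samples[0])
    means = []
    for b in range(k):
        block = samples[b * size:(b + 1) * size]
        means.append([sum(x[j] for x in block) / size for j in range(d)])
    return means

def coordinatewise_mom(samples, k):
    """Coordinate-wise median of the k block means (a simplification, not the SDP step)."""
    means = block_means(samples, k)
    return [statistics.median(m[j] for m in means) for j in range(len(means[0]))]

random.seed(0)
mu = [1.0, -2.0, 0.5]                 # hypothetical true mean
N, K, n_outliers = 3000, 30, 5        # illustrative values only
data = [[m + random.gauss(0.0, 1.0) for m in mu] for _ in range(N)]
for i in random.sample(range(N), n_outliers):
    data[i] = [1e6, 1e6, 1e6]         # adversarial-style corruption

naive = [sum(x[j] for x in data) / N for j in range(3)]
estimate = coordinatewise_mom(data, K)
print("empirical mean:", naive)
print("median-of-means:", estimate)
```

In this toy setting at most 5 of the 30 blocks contain an outlier, so the median of the block means stays close to the true mean, while the empirical mean is pulled far away by the corrupted points.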

Funding Statement

Guillaume Lecué is supported by a grant overseen by the French National Research Agency (ANR) as part of the “Investissements d’Avenir” Program (LabEx ECODEC; ANR-11-LABX-0047), by the Médiamétrie chair on “Statistical models and analysis of high-dimensional data” and by the French ANR PRC grant ADDS (ANR-19-CE48-0005).

Acknowledgments

We would like to thank Yeshwanth Cherapanamjeri, Ilias Diakonikolas, Yihe Dong, Nicolas Flammarion, Sam Hopkins and Jerry Li for helpful comments on our work.

Citation

Jules Depersin. Guillaume Lecué. "Robust sub-Gaussian estimation of a mean vector in nearly linear time." Ann. Statist. 50 (1) 511 - 536, February 2022. https://doi.org/10.1214/21-AOS2118

Information

Received: 1 April 2020; Revised: 1 November 2020; Published: February 2022
First available in Project Euclid: 16 February 2022

MathSciNet: MR4382026
zbMATH: 1486.62077
Digital Object Identifier: 10.1214/21-AOS2118

Subjects:
Primary: 62F35 , 62G08
Secondary: 62C20 , 62G05 , 62G20

Keywords: algorithms , Empirical processes , heavy-tailed data , robust statistics

Rights: Copyright © 2022 Institute of Mathematical Statistics
