Abstract
Robust statistics traditionally focuses on outliers, or perturbations in total variation distance. However, a dataset could be maliciously corrupted in many other ways, such as systematic measurement errors and missing covariates. We consider corruption in either total variation (TV) or Wasserstein distance, and show that robust estimation is possible whenever the true population distribution satisfies a property called generalized resilience, which holds under moment or hypercontractive conditions. For the TV corruption model, our finite-sample analysis improves over previous results for mean estimation with bounded kth moment, linear regression, and joint mean and covariance estimation. For Wasserstein corruption, we provide the first finite-sample guarantees for second moment estimation and linear regression.
Technically, our robust estimators are a generalization of minimum distance (MD) functionals, which project the corrupted distribution onto a given set of well-behaved distributions. The error of these MD functionals is bounded by a certain modulus of continuity, and we provide a systematic method for upper bounding this modulus for the class of generalized resilient distributions, which usually gives sharp population-level results and good finite-sample guarantees.
Funding Statement
This work was partially supported by NSF Grants no. 1804794, 1901252 and 1909499.
Acknowledgments
The authors are grateful to Roman Vershynin for discussions that inspired Lemma H.8 of the Supplementary Material (Zhu, Jiao and Steinhardt (2022)), Peter Bartlett for pointing out the VC dimension bound needed for Theorem 3.3 and Adarsh Prasad for comments on results of mean estimation with bounded covariance assumption.
Citation
Banghua Zhu, Jiantao Jiao, Jacob Steinhardt. "Generalized resilience and robust statistics." Ann. Statist. 50 (4) 2256-2283, August 2022. https://doi.org/10.1214/22-AOS2186
Information