August 2022 Generalized resilience and robust statistics
Banghua Zhu, Jiantao Jiao, Jacob Steinhardt
Ann. Statist. 50(4): 2256-2283 (August 2022). DOI: 10.1214/22-AOS2186

Abstract

Robust statistics traditionally focuses on outliers, or perturbations in total variation (TV) distance. However, a dataset could be maliciously corrupted in many other ways, such as systematic measurement errors and missing covariates. We consider corruption in either TV or Wasserstein distance, and show that robust estimation is possible whenever the true population distribution satisfies a property called generalized resilience, which holds under moment or hypercontractive conditions. For the TV corruption model, our finite-sample analysis improves over previous results for mean estimation with bounded kth moment, linear regression, and joint mean and covariance estimation. For W1 corruption, we provide the first finite-sample guarantees for second moment estimation and linear regression.
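As a rough illustration of the resilience idea, the sketch below computes the basic (non-generalized) mean-resilience parameter of a one-dimensional sample. It assumes the subset-based definition: a sample is (rho, eps)-resilient if every sub-sample keeping at least a 1 - eps fraction of the points has a mean within rho of the full mean. The function name `mean_resilience_1d` is hypothetical, and this is not the generalized notion or the estimator analyzed in the paper.

```python
import math


def mean_resilience_1d(xs, eps):
    """Smallest rho for which the sample xs is (rho, eps)-resilient for the mean.

    Assumed definition (illustrative): every sub-sample that keeps at least
    a (1 - eps) fraction of the points has mean within rho of the full mean.
    In one dimension the worst sub-sample drops either the largest or the
    smallest points, so only those two cases need to be checked.
    """
    n = len(xs)
    if n == 0:
        raise ValueError("xs must be non-empty")
    if not 0 <= eps < 1:
        raise ValueError("eps must lie in [0, 1)")
    k = math.floor(eps * n)  # maximum number of points that may be dropped
    if k == 0:
        return 0.0
    s = sorted(xs)
    mu = sum(s) / n
    mean_without_top = sum(s[: n - k]) / (n - k)   # drop the k largest points
    mean_without_bottom = sum(s[k:]) / (n - k)     # drop the k smallest points
    return max(mu - mean_without_top, mean_without_bottom - mu)


# A well-spread sample versus one dominated by a single heavy point.
spread = [0, 1, 2, 3, 4, 5, 6, 7]
heavy = [0] * 7 + [80]
rho_spread = mean_resilience_1d(spread, 0.25)
rho_heavy = mean_resilience_1d(heavy, 0.25)
```

A small value of rho means that no eps fraction of the sample can move the mean by much, which is the kind of property that makes robust mean estimation possible under TV corruption.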

Technically, our robust estimators are a generalization of minimum distance (MD) functionals, which project the corrupted distribution onto a given set of well-behaved distributions. The error of these MD functionals is bounded by a certain modulus of continuity, and we provide a systematic method for upper bounding this modulus for the class of generalized resilient distributions, which usually gives sharp population-level results and good finite-sample guarantees.
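The classical minimum distance argument behind this paragraph can be sketched as follows. The notation is assumed here for illustration and does not come from the abstract: $D$ is a pseudometric on distributions (for example TV), $\mathcal{G}$ is the set of well-behaved distributions, $p^\star \in \mathcal{G}$ is the true distribution, $\tilde{p}$ is the corrupted distribution with $D(p^\star, \tilde{p}) \le \epsilon$, $L$ is the loss, and the minimum is assumed to be attained. The paper's estimators generalize this basic form.

```latex
% Projection of the corrupted distribution onto the well-behaved set.
\hat{q} \in \operatorname*{arg\,min}_{q \in \mathcal{G}} D(q, \tilde{p})

% Since p^\star \in \mathcal{G}, the projection is at least as close as p^\star:
D(\hat{q}, \tilde{p}) \le D(p^\star, \tilde{p}) \le \epsilon
\quad\Longrightarrow\quad
D(\hat{q}, p^\star) \le D(\hat{q}, \tilde{p}) + D(\tilde{p}, p^\star) \le 2\epsilon

% Hence the error is bounded by a modulus of continuity over \mathcal{G}:
L(p^\star, \hat{q})
\le \sup_{\substack{p_1, p_2 \in \mathcal{G} \\ D(p_1, p_2) \le 2\epsilon}} L(p_1, p_2)
=: \mathfrak{m}(\mathcal{G}, 2\epsilon)
```

Under these assumptions, bounding the estimation error reduces to bounding the modulus $\mathfrak{m}(\mathcal{G}, 2\epsilon)$ for the chosen set $\mathcal{G}$.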

Funding Statement

This work was partially supported by NSF Grants no. 1804794, 1901252 and 1909499.

Acknowledgments

The authors are grateful to Roman Vershynin for discussions that inspired Lemma H.8 of the Supplementary Material (Zhu, Jiao and Steinhardt (2022)), Peter Bartlett for pointing out the VC dimension bound needed for Theorem 3.3, and Adarsh Prasad for comments on the results for mean estimation under a bounded covariance assumption.

Citation


Banghua Zhu, Jiantao Jiao, Jacob Steinhardt. "Generalized resilience and robust statistics." Ann. Statist. 50 (4) 2256-2283, August 2022. https://doi.org/10.1214/22-AOS2186

Information

Received: 1 November 2020; Revised: 1 February 2022; Published: August 2022
First available in Project Euclid: 25 August 2022

MathSciNet: MR4474490
zbMATH: 07610770
Digital Object Identifier: 10.1214/22-AOS2186

Subjects:
Primary: 62F35
Secondary: 62G35

Keywords: minimum distance functional, robust statistics, total variation distance perturbation, Wasserstein distance perturbation

Rights: Copyright © 2022 Institute of Mathematical Statistics

JOURNAL ARTICLE
28 PAGES
