Open Access
2024 Renewable Huber estimation method for streaming datasets
Rong Jiang, Lei Liang, Keming Yu
Electron. J. Statist. 18(1): 674-705 (2024). DOI: 10.1214/24-EJS2223

Abstract

Streaming data refers to a data collection scheme in which observations arrive sequentially and perpetually over time, making it difficult to fit the full dataset into computer memory for statistical analysis. The ordinary least squares estimate for linear regression is sensitive to heavy-tailed errors and outliers, which are commonly encountered in applications. In such cases, the Huber loss function is a useful criterion for robust regression. In this paper, we propose methods for robust regression estimation and variable selection for streaming datasets. Unlike the setting of renewable estimation for generalized linear regression with streaming datasets, however, the Huber loss function is only first-order differentiable, which poses challenges to renewable estimation in both computation and theoretical development. To address this challenge, we introduce a new smoothed version of the first derivative of the Huber loss, which admits a fast and scalable optimization algorithm for streaming datasets and achieves the best fit to the Huber function among different smoothed versions. Theoretically, the proposed statistics are shown to have the same asymptotic properties as the standard version computed on the entire data stream with the data batches pooled into one dataset, without additional conditions. The proposed methods are computed using the current data and summary statistics of the historical data. Both simulations and real data analysis are conducted to illustrate the finite sample performance of the proposed methods.
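
As a reader aid, the following minimal sketch shows the standard Huber loss and its first two derivatives, to make concrete why the loss is only first-order differentiable. It is not the smoothed version proposed in the paper, which the abstract does not specify. The function names and the default threshold `tau = 1.345` (a conventional choice for standard normal errors) are assumptions made for illustration.

```python
import numpy as np


def huber_loss(r, tau=1.345):
    """Standard Huber loss: quadratic for |r| <= tau, linear beyond.

    tau is an illustrative default, not a value taken from the paper.
    """
    r = np.asarray(r, dtype=float)
    a = np.abs(r)
    return np.where(a <= tau, 0.5 * r**2, tau * a - 0.5 * tau**2)


def huber_psi(r, tau=1.345):
    """First derivative of the Huber loss: the residual clipped to [-tau, tau].

    This function is continuous everywhere, so the loss is once differentiable.
    """
    return np.clip(np.asarray(r, dtype=float), -tau, tau)


def huber_psi_prime(r, tau=1.345):
    """Second derivative where it exists: 1 inside (-tau, tau), 0 outside.

    It jumps at |r| = tau, so the loss is not twice differentiable there.
    """
    r = np.asarray(r, dtype=float)
    return (np.abs(r) < tau).astype(float)
```

The jump in `huber_psi_prime` at `|r| = tau` is the kind of non-smoothness that Hessian-based online updates have to work around, which is the motivation the abstract gives for smoothing the first derivative.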

Citation

Rong Jiang, Lei Liang, Keming Yu. "Renewable Huber estimation method for streaming datasets." Electron. J. Statist. 18(1): 674-705, 2024. https://doi.org/10.1214/24-EJS2223

Information

Received: 1 February 2023; Published: 2024
First available in Project Euclid: 23 February 2024

Digital Object Identifier: 10.1214/24-EJS2223

Subjects:
Primary: 60G08
Secondary: 62G20

Keywords: high-dimensional estimation, Huber loss, online updating, streaming data
