Open Access
Online inference in high-dimensional generalized linear models with streaming data
Lan Luo, Ruijian Han, Yuanyuan Lin, Jian Huang
Electron. J. Statist. 17(2): 3443-3471 (2023). DOI: 10.1214/23-EJS2182

Abstract

In this paper, we develop an online statistical inference approach for high-dimensional generalized linear models with streaming data for real-time estimation and inference. We propose an online debiased lasso method that aligns with the data collection scheme of streaming data. Online debiased lasso differs from offline debiased lasso in two important aspects. First, it updates component-wise confidence intervals of regression coefficients using only summary statistics of the historical data. Second, online debiased lasso adds a term to correct the approximation errors accumulated throughout the online updating procedure. We show that our proposed online debiased estimators in generalized linear models are asymptotically normal. This result provides a theoretical basis for carrying out real-time interim statistical inference with streaming data. Extensive numerical experiments are conducted to evaluate the performance of our proposed online debiased lasso method. These experiments demonstrate the effectiveness of our algorithm and support the theoretical results. Furthermore, we illustrate the application of our method with a high-dimensional text dataset.
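
The idea of updating confidence intervals from summary statistics alone can be illustrated in a much simpler setting than the one treated in the paper. The sketch below is not the online debiased lasso: it assumes a low-dimensional linear model fitted by ordinary least squares, where the summaries X'X, X'y, y'y and the sample size are sufficient to reproduce the full-data estimate exactly, so no error-correction term is needed. The function names `init_summary`, `update_summary` and `estimate_with_ci` are hypothetical and chosen for illustration only.

```python
import numpy as np


def init_summary(p):
    """Empty summary statistics for p predictors: (X'X, X'y, y'y, n)."""
    return (np.zeros((p, p)), np.zeros(p), 0.0, 0)


def update_summary(stats, X_batch, y_batch):
    """Fold one incoming batch into the summaries; raw data can then be discarded."""
    gram, moment, yty, n = stats
    return (
        gram + X_batch.T @ X_batch,
        moment + X_batch.T @ y_batch,
        yty + float(y_batch @ y_batch),
        n + len(y_batch),
    )


def estimate_with_ci(stats, z=1.96):
    """OLS estimate and component-wise normal-approximation intervals.

    Assumption: low-dimensional setting (n > p, X'X invertible), not the
    high-dimensional regime studied in the paper.
    """
    gram, moment, yty, n = stats
    p = gram.shape[0]
    beta = np.linalg.solve(gram, moment)
    # Residual sum of squares recovered from summaries: y'y - beta' X'y.
    rss = yty - beta @ moment
    sigma2 = rss / (n - p)
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(gram)))
    return beta, beta - z * se, beta + z * se
```

Calling `update_summary` once per arriving batch and `estimate_with_ci` whenever an interim result is wanted gives the same point estimate as refitting on all data seen so far. In the high-dimensional generalized linear model setting of the paper this exact equivalence no longer holds, which is why the authors introduce a term to correct accumulated approximation errors.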

Funding Statement

Luo is supported by the National Institute on Aging of the National Institutes of Health (R21AG083364) and the Startup Funds from Rutgers School of Public Health. Han is supported by the Hong Kong Research Grants Council, University Grants Committee (14301821) and The Hong Kong Polytechnic University (P0044617, P0045351). Lin is supported by the Hong Kong Research Grants Council (14306219, 14306620), the National Natural Science Foundation of China (11961028) and Direct Grants for Research from the Chinese University of Hong Kong. Huang is supported by The Hong Kong Polytechnic University (P0042888, A0045417, A0045931).

Acknowledgments

We are grateful to the editor, the associate editor, and the reviewers for their comments, which led to a substantial improvement in the manuscript.

Citation

Lan Luo, Ruijian Han, Yuanyuan Lin, Jian Huang. "Online inference in high-dimensional generalized linear models with streaming data." Electron. J. Statist. 17(2): 3443-3471, 2023. https://doi.org/10.1214/23-EJS2182

Information

Received: 1 September 2022; Published: 2023
First available in Project Euclid: 28 November 2023

Digital Object Identifier: 10.1214/23-EJS2182

Subjects:
Primary: 62J07
Secondary: 62F25, 62J12

Keywords: confidence interval, generalized linear models, high-dimensional data, online debiased lasso
