Open Access
On variance estimation of random forests with Infinite-order U-statistics (2024)
Tianning Xu, Ruoqing Zhu, Xiaofeng Shao
Electron. J. Statist. 18(1): 2135-2207 (2024). DOI: 10.1214/24-EJS2247

Abstract

Infinite-order U-statistics (IOUS) have been used extensively in subbagging ensemble learning algorithms such as random forests to quantify their uncertainty. While normality results of IOUS have been studied extensively, their variance estimation and the associated theoretical properties remain mostly unexplored. Existing approaches mainly utilize the leading term dominance property in the Hoeffding decomposition. However, such a view usually leads to biased estimation when the kernel size is large relative to the sample size. On the other hand, while several unbiased estimators exist in the literature, their relationships and theoretical properties (e.g., ratio consistency) have never been studied. These limitations mean that the asymptotic coverage of the constructed confidence intervals is not guaranteed. To bridge these gaps in the literature, we propose a new view of the Hoeffding decomposition for variance estimation that leads to an unbiased estimator. Instead of leading term dominance, our view utilizes the dominance of the peak region. Moreover, we establish the connection and equivalence of our estimator with several existing unbiased variance estimators. Theoretically, we are the first to establish the ratio consistency of such a variance estimator, which justifies the coverage rate of confidence intervals constructed from random forests. Numerically, we further propose a local smoothing procedure to improve the estimator’s finite-sample performance. Extensive simulation studies show that our estimators enjoy lower bias and achieve targeted coverage rates.
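
To make the contrast between leading term dominance and the peak region more concrete, the standard Hoeffding variance decomposition of a U-statistic can be written out. The notation below ($U_n$, a symmetric kernel $h$ of size $k$, sample size $n$, and $\xi_{d,k}^2$) is the usual textbook convention and is an assumption here; the abstract itself does not fix any symbols.

```latex
% U-statistic with symmetric kernel h of size k on an i.i.d. sample of size n
U_n = \binom{n}{k}^{-1} \sum_{S \subseteq \{1,\dots,n\},\ |S| = k} h(X_S)

% Hoeffding variance decomposition; S_1, S_2 are size-k subsets sharing d indices
\operatorname{Var}(U_n)
  = \binom{n}{k}^{-1} \sum_{d=1}^{k} \binom{k}{d} \binom{n-k}{k-d}\, \xi_{d,k}^2,
\qquad
\xi_{d,k}^2 = \operatorname{Cov}\bigl(h(X_{S_1}),\, h(X_{S_2})\bigr),
\quad |S_1 \cap S_2| = d

% Leading-term view (k fixed, n \to \infty): only the d = 1 term is kept
\operatorname{Var}(U_n) \approx \frac{k^2}{n}\, \xi_{1,k}^2
```

The weights $\binom{k}{d}\binom{n-k}{k-d}/\binom{n}{k}$ are hypergeometric probabilities with mean $k^2/n$, so when $k$ grows with $n$ most of the weight sits away from $d = 1$ and the leading-term approximation can be poor. Reading the "peak region" as the range of $d$ that carries most of this weight is an interpretation, not something the abstract states.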

Funding Statement

Ruoqing Zhu was supported by NSF grant 2210657.

Citation

Tianning Xu, Ruoqing Zhu, Xiaofeng Shao. "On variance estimation of random forests with Infinite-order U-statistics." Electron. J. Statist. 18(1): 2135-2207, 2024. https://doi.org/10.1214/24-EJS2247

Information

Received: 1 April 2023; Published: 2024
First available in Project Euclid: 14 May 2024

Digital Object Identifier: 10.1214/24-EJS2247

Subjects:
Primary: 62G99
Secondary: 62G05

Keywords: ensemble learning, Hoeffding decomposition, Infinite-order U-statistics, random forests, ratio consistency, variance estimation

Vol.18 • No. 1 • 2024