Distributed statistical inference for massive data
Song Xi Chen, Liuhua Peng
Ann. Statist. 49(5): 2851-2869 (October 2021). DOI: 10.1214/21-AOS2062


This paper considers distributed statistical inference for general symmetric statistics in the context of massive data with efficient computation. Estimation efficiency and asymptotic distributions of the distributed statistics are provided, which reveal different results between the nondegenerate and degenerate cases and show that the number of data subsets plays an important role. Two distributed bootstrap methods are proposed and analyzed to approximate the underlying distribution of the distributed statistics with improved computational efficiency over existing methods. The accuracy of the distributional approximation by the bootstrap is studied theoretically. One of the methods, the pseudo-distributed bootstrap, is particularly attractive when the number of datasets is large, as it directly resamples the subset-based statistics, requires less stringent conditions, and can be improved by studentization.
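
As a rough illustration of the idea of resampling subset-based statistics, the following Python sketch splits a sample into blocks, averages a symmetric statistic over the blocks, and then bootstraps the block-level values. It is a minimal sketch based only on the description in the abstract, not the authors' algorithm: the function names, the choice of the sample variance as the statistic, the equal-size blocking, and the simulated Gaussian data are all assumptions made for illustration, and studentization is not shown.

```python
import random
import statistics


def distributed_statistic(data, num_subsets, statistic):
    """Split data into equal-size blocks, apply `statistic` to each block,
    and return the average together with the block-level values.
    (Hypothetical helper; any remainder beyond the last full block is dropped.)"""
    block_size = len(data) // num_subsets
    subset_stats = [
        statistic(data[k * block_size:(k + 1) * block_size])
        for k in range(num_subsets)
    ]
    return statistics.fmean(subset_stats), subset_stats


def resample_subset_statistics(subset_stats, num_resamples, rng):
    """Draw the block-level statistics with replacement and average each draw.
    (Hypothetical helper illustrating resampling at the subset level.)"""
    k = len(subset_stats)
    return [
        statistics.fmean(rng.choices(subset_stats, k=k))
        for _ in range(num_resamples)
    ]


# Simulated data (an assumption): 20,000 standard normal draws in 100 blocks.
rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(20_000)]

# The sample variance is used here as an example of a symmetric statistic.
estimate, subset_stats = distributed_statistic(
    data, num_subsets=100, statistic=statistics.variance
)
replicates = resample_subset_statistics(subset_stats, num_resamples=500, rng=rng)
std_error = statistics.stdev(replicates)

print(f"distributed estimate: {estimate:.4f}")
print(f"bootstrap standard error: {std_error:.4f}")
```

Because each resample touches only the 100 block-level values rather than the 20,000 raw observations, the resampling step in this sketch is cheap regardless of the full sample size, which is in the spirit of the computational advantage the abstract describes.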

Funding Statement

Chen’s research is partially supported by National Natural Science Foundation of China grants 92046021, 12026607, 12071013 and 71973005 and LMEQF at Peking University.


Citation

Song Xi Chen, Liuhua Peng. "Distributed statistical inference for massive data." Ann. Statist. 49(5): 2851-2869, October 2021.


Received: 1 August 2020; Revised: 1 January 2021; Published: October 2021
First available in Project Euclid: 12 November 2021

Digital Object Identifier: 10.1214/21-AOS2062

Primary: 62G09
Secondary: 62G20

Keywords: distributed bootstrap, distributed statistics, massive data, pseudo-distributed bootstrap

Rights: Copyright © 2021 Institute of Mathematical Statistics


