Open Access
April 2021
Distributed linear regression by averaging
Edgar Dobriban, Yue Sheng
Ann. Statist. 49(2): 918-943 (April 2021). DOI: 10.1214/20-AOS1984

Abstract

Distributed statistical learning problems arise commonly when dealing with large datasets. In this setup, the data are partitioned over machines, which compute locally and communicate short messages; communication is often the bottleneck. In this paper, we study one-step and iterative weighted parameter averaging in statistical linear models under data parallelism. We fit a linear regression on each machine, send the estimates to a central server, and take a weighted average of the parameters. Optionally, we iterate, sending the weighted average back to the machines, which perform local ridge regressions centered at it. How does this compare to linear regression on the full data? Here, we study the performance loss in estimation error, test error, and confidence interval length in high dimensions, where the number of parameters is comparable to the training sample size.
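As a rough illustration of the scheme described in the abstract, the sketch below (written for this summary, not the authors' code) shows one-step weighted averaging and an iterative variant in which each machine solves a ridge problem centered at the current average. The weight vector, the penalty `lam`, and the number of rounds `n_iter` are placeholder assumptions; the paper studies how to weight the local estimates optimally, and the ridge-recentering step is one plausible reading of the description above.

```python
# Minimal sketch of one-step and iterative weighted parameter averaging.
# Not the authors' implementation; weights, lam and n_iter are illustrative.
import numpy as np

def local_ols(X, y):
    # Ordinary least squares on one machine's local data.
    return np.linalg.lstsq(X, y, rcond=None)[0]

def one_step_average(X_parts, y_parts, weights=None):
    # Server-side weighted average of the local OLS estimates.
    betas = [local_ols(X, y) for X, y in zip(X_parts, y_parts)]
    if weights is None:
        weights = np.full(len(betas), 1.0 / len(betas))  # plain average
    return sum(w * b for w, b in zip(weights, betas))

def iterative_average(X_parts, y_parts, lam=1.0, n_iter=5, weights=None):
    # Each round, machine i solves a ridge problem centered at the current
    # average:  min_b ||y_i - X_i b||^2 + lam * ||b - beta_bar||^2,
    # i.e. b_i = (X_i'X_i + lam I)^{-1} (X_i'y_i + lam * beta_bar),
    # and the server re-averages.
    p = X_parts[0].shape[1]
    if weights is None:
        weights = np.full(len(X_parts), 1.0 / len(X_parts))
    beta_bar = one_step_average(X_parts, y_parts, weights)
    for _ in range(n_iter):
        local = [np.linalg.solve(X.T @ X + lam * np.eye(p),
                                 X.T @ y + lam * beta_bar)
                 for X, y in zip(X_parts, y_parts)]
        beta_bar = sum(w * b for w, b in zip(weights, local))
    return beta_bar
```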

We quantify the performance loss of one-step weighted averaging and also give results for iterative averaging. We find that different problems are affected differently by the distributed framework: estimation error and confidence interval length increase considerably, while prediction error increases much less. Our analysis relies on recent results from random matrix theory, and we develop a new calculus of deterministic equivalents as a tool of broader interest.
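For intuition only, the toy simulation below (again written for this summary, not taken from the paper) compares full-data OLS with the plain average of per-machine OLS estimates on synthetic Gaussian data, reporting both estimation error and test error. The sample size, dimension, number of machines, and noise model are all illustrative assumptions.

```python
# Toy comparison of full-data OLS vs. one-step averaging; illustrative only.
import numpy as np

rng = np.random.default_rng(0)
n, p, k = 2000, 200, 4                      # samples, dimension, machines
beta_true = rng.normal(size=p) / np.sqrt(p)
X = rng.normal(size=(n, p))
y = X @ beta_true + rng.normal(size=n)

# Full-data OLS versus the plain average of per-machine OLS estimates.
beta_full = np.linalg.lstsq(X, y, rcond=None)[0]
beta_dist = np.mean(
    [np.linalg.lstsq(Xi, yi, rcond=None)[0]
     for Xi, yi in zip(np.array_split(X, k), np.array_split(y, k))],
    axis=0,
)

# Estimation error and test (prediction) error on fresh data.
X_test = rng.normal(size=(1000, p))
y_test = X_test @ beta_true + rng.normal(size=1000)
for name, b in [("full data", beta_full), ("one-step average", beta_dist)]:
    est_err = float(np.sum((b - beta_true) ** 2))
    test_err = float(np.mean((X_test @ b - y_test) ** 2))
    print(f"{name}: estimation error {est_err:.3f}, test error {test_err:.3f}")
```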

Citation

Edgar Dobriban, Yue Sheng. "Distributed linear regression by averaging." Ann. Statist. 49(2): 918-943, April 2021. https://doi.org/10.1214/20-AOS1984

Information

Received: 1 October 2019; Revised: 1 June 2020; Published: April 2021
First available in Project Euclid: 2 April 2021

Digital Object Identifier: 10.1214/20-AOS1984

Subjects:
Primary: 62J05
Secondary: 65Y05, 68W10, 68W15

Keywords: distributed learning, high dimensional, linear regression, parallel computation, random matrix theory

Rights: Copyright © 2021 Institute of Mathematical Statistics
