September 2022 Joint integrative analysis of multiple data sources with correlated vector outcomes
Emily C. Hector, Peter X.-K. Song
Ann. Appl. Stat. 16(3): 1700-1717 (September 2022). DOI: 10.1214/21-AOAS1563


We propose a distributed quadratic inference function framework to jointly estimate regression parameters from multiple potentially heterogeneous data sources with correlated vector outcomes. The primary goal of this joint integrative analysis is to estimate covariate effects on all outcomes through a marginal regression model in a statistically and computationally efficient way. We develop a data integration procedure for statistical estimation and inference of regression parameters that is implemented in a fully distributed and parallelized computational scheme. To overcome computational and modeling challenges arising from the high-dimensional likelihood of the correlated vector outcomes, we propose to analyze each data source using Qu, Lindsay and Li’s (Biometrika 87 (2000) 823–836) quadratic inference functions and then to jointly reestimate parameters from each data source by accounting for correlation between data sources using a combined meta-estimator in a similar spirit to the generalized method of moments put forward by Hansen (Econometrica 50 (1982) 1029–1054). We show both theoretically and numerically that the proposed method yields efficiency improvements and is computationally fast. We illustrate the proposed methodology with the joint integrative analysis of the association between smoking and metabolites in a large multicohort study and provide an R package for ease of implementation.
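The combination step described above can be illustrated with a generic minimum-distance (generalized least squares) meta-estimator. This is only a minimal sketch under stated assumptions, not the authors' procedure and not their R package: it assumes each data source has already produced an estimate of a common parameter vector, and that a joint covariance matrix of the stacked estimates, including cross-source blocks, is available. The function name `combine_estimates` and its arguments are hypothetical.

```python
import numpy as np

def combine_estimates(beta_blocks, joint_cov):
    """Combine K estimates of a common p-vector by generalized least squares.

    beta_blocks: array-like of shape (K, p), one row per data source.
    joint_cov:   array-like of shape (K*p, K*p), covariance of the stacked
                 estimates; off-diagonal blocks carry correlation between
                 data sources (assumed known or estimated elsewhere).
    Returns (beta_combined, cov_combined).
    """
    beta_blocks = np.asarray(beta_blocks, dtype=float)
    K, p = beta_blocks.shape
    stacked = beta_blocks.reshape(K * p)        # (beta_1, ..., beta_K) stacked
    A = np.kron(np.ones((K, 1)), np.eye(p))     # maps the common beta to the stack
    W = np.linalg.inv(np.asarray(joint_cov, dtype=float))  # weight matrix
    cov_combined = np.linalg.inv(A.T @ W @ A)
    beta_combined = cov_combined @ A.T @ W @ stacked
    return beta_combined, cov_combined

# Two sources, scalar parameter, no cross-source correlation:
# the result reduces to inverse-variance weighting.
beta, cov = combine_estimates([[1.0], [3.0]], np.diag([1.0, 3.0]))
```

When `joint_cov` is block diagonal, this reduces to ordinary inverse-variance weighting; nonzero off-diagonal blocks are what let the combined estimate account for correlation between data sources.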

Funding Statement

This work was supported by grants R01ES024732, NSF1811734 and NSF2113564.


We are very grateful to Drs. Michael Boehnke and Markku Laakso for their generosity in letting us use their data. We are grateful to Dr. Lan Luo for sharing her code to estimate parameters using quadratic inference functions. We thank the Associate Editor and the reviewers for their comments, which led to a significant improvement of the article.



Emily C. Hector, Peter X.-K. Song. "Joint integrative analysis of multiple data sources with correlated vector outcomes." Ann. Appl. Stat. 16 (3) 1700-1717, September 2022.


Received: 1 August 2020; Revised: 1 October 2021; Published: September 2022
First available in Project Euclid: 19 July 2022

Digital Object Identifier: 10.1214/21-AOAS1563

Keywords: data integration, generalized method of moments, parallel computing, quadratic inference function, scalable computing, seemingly unrelated regression

Rights: Copyright © 2022 Institute of Mathematical Statistics


