B-scaling: A novel nonparametric data fusion method
Yiwen Liu, Xiaoxiao Sun, Wenxuan Zhong, Bing Li
Ann. Appl. Stat. 16(3): 1292-1312 (September 2022). DOI: 10.1214/21-AOAS1537


For a given scientific question, there often exist several techniques or experiments that measure the same numerical quantity. Historically, various methods have been developed to exploit the information within each type of data independently. However, statistical data fusion methods that can effectively integrate multisource data under a unified framework are lacking. In this paper we propose a novel data fusion method, called B-scaling, for integrating multisource data. Consider K measurements that are generated from different sources but measure the same latent variable through linear or nonlinear mappings. We seek a representation of the latent variable, named the B-mean, which captures the common information contained in the K measurements while accounting for the nonlinear mappings between them and the latent variable. We also establish the asymptotic properties of the B-mean and apply the proposed method to integrate multiple histone modifications and DNA methylation levels to characterize the epigenomic landscape. Both numerical and empirical studies show that B-scaling is a powerful data fusion method with broad applications.
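The abstract does not spell out the B-scaling estimator, so the sketch below only illustrates the "generalized eigenvalue problem" keyword in the simplest possible setting. It is a hedged stand-in, not the authors' B-mean: it assumes purely linear readouts of the latent variable and finds weights w that maximize Var(sum_k w_k X_k) relative to sum_k w_k^2 Var(X_k). Setting the gradient of that ratio to zero gives Sigma w = lambda D w, with Sigma the covariance matrix of the K measurements and D its diagonal. All function names (`simulate`, `fuse`, `leading_generalized_eigvec`) and the toy data are hypothetical.

```python
import math
import random


def mean(xs):
    return sum(xs) / len(xs)


def covariance_matrix(columns):
    """Sample covariance matrix of K equally long measurement columns."""
    n = len(columns[0])
    centered = []
    for col in columns:
        m = mean(col)
        centered.append([x - m for x in col])
    k = len(columns)
    return [
        [sum(a * b for a, b in zip(centered[i], centered[j])) / (n - 1) for j in range(k)]
        for i in range(k)
    ]


def leading_generalized_eigvec(sigma, iters=1000, tol=1e-12):
    """Solve sigma w = lam D w, D = diag(sigma), for the largest lam.

    Substituting w = D^{-1/2} v gives R v = lam v, where R is the correlation
    matrix. R is positive semidefinite, so power iteration converges to the
    eigenvector of its largest eigenvalue.
    """
    k = len(sigma)
    s = [math.sqrt(sigma[i][i]) for i in range(k)]
    r = [[sigma[i][j] / (s[i] * s[j]) for j in range(k)] for i in range(k)]
    v = [1.0 / (i + 1) for i in range(k)]  # arbitrary nonzero starting vector
    lam = 0.0
    for _ in range(iters):
        rv = [sum(r[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in rv))
        new_v = [x / norm for x in rv]
        done = max(abs(a - b) for a, b in zip(new_v, v)) < tol
        v, lam = new_v, norm
        if done:
            break
    return [v[i] / s[i] for i in range(k)], lam  # map back: w = D^{-1/2} v


def fuse(columns):
    """Linear fused score sum_k w_k (X_k - mean_k), weights from the eigenproblem."""
    w, _ = leading_generalized_eigvec(covariance_matrix(columns))
    means = [mean(col) for col in columns]
    n = len(columns[0])
    fused = [
        sum(w[k] * (columns[k][i] - means[k]) for k in range(len(columns)))
        for i in range(n)
    ]
    return fused, w


def correlation(a, b):
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def simulate(n, seed=0):
    """Hypothetical toy data: three noisy linear readouts of one latent z."""
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    cols = [
        [zi + 0.3 * rng.gauss(0.0, 1.0) for zi in z],
        [-2.0 * zi + 0.5 * rng.gauss(0.0, 1.0) for zi in z],
        [3.0 * zi + 1.0 + 0.4 * rng.gauss(0.0, 1.0) for zi in z],
    ]
    return z, cols
```

Because each weight is rescaled by the inverse standard deviation, the measurements are placed on a common scale before they are combined, and a readout that decreases with the latent variable simply receives a negative weight. A nonlinear readout such as X = exp(Z) is exactly the case this linear sketch does not handle and that the B-mean is described as accounting for.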

Funding Statement

Zhong’s research was supported by the U.S. National Science Foundation under grants DMS-1440037, DMS-1440038, and DMS-1438957 and by the U.S. National Institutes of Health under grants R01GM122080 and R01GM113242. Li’s research was supported by the U.S. National Science Foundation under grant DMS-1713078.


The authors would like to thank the anonymous referees, an Associate Editor and the Editor for their constructive comments that improved the quality of this paper.



Yiwen Liu, Xiaoxiao Sun, Wenxuan Zhong, Bing Li. "B-scaling: A novel nonparametric data fusion method." Ann. Appl. Stat. 16(3): 1292-1312, September 2022.


Received: 1 May 2020; Revised: 1 August 2021; Published: September 2022
First available in Project Euclid: 19 July 2022

MathSciNet: MR4455881
zbMATH: 1498.62011
Digital Object Identifier: 10.1214/21-AOAS1537

Keywords: data fusion, epigenetics, generalized eigenvalue problem, multisource data

Rights: Copyright © 2022 Institute of Mathematical Statistics

