Open Access
June 2019
Large sample theory for merged data from multiple sources
Takumi Saegusa
Ann. Statist. 47(3): 1585-1615 (June 2019). DOI: 10.1214/18-AOS1727

Abstract

We develop large sample theory for merged data from multiple sources. The main statistical issues treated in this paper are that (1) the same unit may appear in multiple datasets from overlapping data sources, (2) duplicated items are not identified, and (3) a sample from the same data source is dependent due to sampling without replacement. We propose and study a new weighted empirical process and extend empirical process theory to a dependent and biased sample with duplication. Specifically, we establish the uniform law of large numbers and the uniform central limit theorem over a class of functions, along with several empirical process results, under conditions identical to those in the i.i.d. setting. As applications, we study infinite-dimensional $M$-estimation and establish its consistency, rates of convergence, and asymptotic normality. Our theoretical results are illustrated with simulation studies and a real data example.

Citation

Takumi Saegusa. "Large sample theory for merged data from multiple sources." Ann. Statist. 47(3): 1585-1615, June 2019. https://doi.org/10.1214/18-AOS1727

Information

Received: 1 February 2017; Revised: 1 May 2018; Published: June 2019
First available in Project Euclid: 13 February 2019

zbMATH: 07053519
MathSciNet: MR3911123
Digital Object Identifier: 10.1214/18-AOS1727

Subjects:
Primary: 62E20
Secondary: 62D99, 62G20, 62N01

Keywords: calibration, data integration, empirical process, nonregular, sampling without replacement, semiparametric model

Rights: Copyright © 2019 Institute of Mathematical Statistics

Vol. 47 • No. 3 • June 2019