## The Annals of Statistics

### Large sample theory for merged data from multiple sources

Takumi Saegusa

#### Abstract

We develop large sample theory for merged data from multiple sources. The main statistical issues treated in this paper are that (1) the same unit may appear in multiple datasets from overlapping data sources, (2) duplicated items are not identified and (3) a sample from the same data source is dependent due to sampling without replacement. We propose and study a new weighted empirical process and extend empirical process theory to a dependent and biased sample with duplication. Specifically, we establish the uniform law of large numbers and the uniform central limit theorem over a class of functions, along with several empirical process results, under conditions identical to those in the i.i.d. setting. As applications, we study infinite-dimensional $M$-estimation and establish its consistency, rates of convergence and asymptotic normality. Our theoretical results are illustrated with simulation studies and a real data example.

#### Article information

Source
Ann. Statist., Volume 47, Number 3 (2019), 1585-1615.

Dates
Revised: May 2018
First available in Project Euclid: 13 February 2019

Permanent link to this document
https://projecteuclid.org/euclid.aos/1550026850

Digital Object Identifier
doi:10.1214/18-AOS1727

Mathematical Reviews number (MathSciNet)
MR3911123

Zentralblatt MATH identifier
07053519

#### Citation

Saegusa, Takumi. Large sample theory for merged data from multiple sources. Ann. Statist. 47 (2019), no. 3, 1585-1615. doi:10.1214/18-AOS1727. https://projecteuclid.org/euclid.aos/1550026850

#### Supplemental materials

• Supplement to “Large sample theory for merged data from multiple sources.” The proofs and additional simulations are given in the Supplement [51].