Open Access
Asymptotic seed bias in respondent-driven sampling (2020)
Yuling Yan, Bret Hanlon, Sebastien Roch, Karl Rohe
Electron. J. Statist. 14(1): 1577-1610 (2020). DOI: 10.1214/20-EJS1698


Respondent-driven sampling (RDS) collects a sample of individuals in a networked population by incentivizing the sampled individuals to refer their contacts into the sample. This iterative process is initialized from some seed node(s). Sometimes this selection creates a large amount of seed bias; other times the seed bias is small. This paper gains a deeper understanding of this bias by characterizing its effect on the limiting distribution of various RDS estimators. Using classical tools and results from multi-type branching processes [12], we show that the seed bias is negligible for the Generalized Least Squares (GLS) estimator and non-negligible for both the inverse probability weighted and Volz-Heckathorn (VH) estimators. In particular, we show that (i) above a critical threshold, VH converges to a non-trivial mixture distribution, where the mixture component depends on the seed node and the mixture distribution is possibly multi-modal, and (ii) GLS converges to a Gaussian distribution independent of the seed node, under a certain condition on the Markov process. Numerical experiments with both simulated data and empirical social networks suggest that these results hold beyond the Markov conditions of the theorems.
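The referral process and the VH estimator described above can be illustrated with a small simulation. The sketch below is not the authors' code: the two-block toy graph, the function names (`make_two_block_graph`, `rds_sample`, `vh_estimate`) and all parameter values are assumptions chosen for illustration. Referrals are drawn uniformly from a participant's contacts with replacement, which is a simplifying Markov-style assumption, and the VH estimate is taken to be the usual inverse-degree-weighted mean.

```python
import random


def make_two_block_graph(n_per_block=50, p_in=0.3, p_out=0.02, rng=None):
    """Toy undirected graph with two communities (hypothetical test bed)."""
    rng = rng or random.Random(0)
    n = 2 * n_per_block
    neighbors = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            same_block = (i < n_per_block) == (j < n_per_block)
            if rng.random() < (p_in if same_block else p_out):
                neighbors[i].add(j)
                neighbors[j].add(i)
    # Add a ring so that every node has at least two contacts.
    for i in range(n):
        j = (i + 1) % n
        neighbors[i].add(j)
        neighbors[j].add(i)
    return {i: sorted(nb) for i, nb in neighbors.items()}


def rds_sample(neighbors, seed_node, waves=4, referrals=2, rng=None):
    """Referral tree: each participant refers `referrals` contacts,
    drawn uniformly at random with replacement (simplifying assumption)."""
    rng = rng or random.Random(0)
    sample = [seed_node]
    current_wave = [seed_node]
    for _ in range(waves):
        next_wave = []
        for node in current_wave:
            for _ in range(referrals):
                next_wave.append(rng.choice(neighbors[node]))
        sample.extend(next_wave)
        current_wave = next_wave
    return sample


def vh_estimate(sample, neighbors, y):
    """Inverse-degree-weighted mean: sum(y_i / d_i) / sum(1 / d_i)."""
    weights = [1.0 / len(neighbors[i]) for i in sample]
    weighted_sum = sum(w * y[i] for w, i in zip(weights, sample))
    return weighted_sum / sum(weights)


if __name__ == "__main__":
    graph = make_two_block_graph(rng=random.Random(1))
    # Binary trait equal to block membership, so the population mean is 0.5.
    trait = {i: 1.0 if i < 50 else 0.0 for i in graph}
    for seed_node in (0, 99):  # one seed in each block
        estimates = [
            vh_estimate(
                rds_sample(graph, seed_node, rng=random.Random(rep)),
                graph,
                trait,
            )
            for rep in range(200)
        ]
        mean_estimate = sum(estimates) / len(estimates)
        print(f"seed {seed_node}: mean VH estimate {mean_estimate:.3f}")
```

Running the script with a seed in each block gives a rough sense of how the starting node can pull the estimate toward the seed's own community when cross-block ties are rare. It is only a toy experiment and does not reproduce the paper's theorems or numerical results.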


Received: 1 August 2019; Published: 2020
First available in Project Euclid: 9 April 2020

zbMATH: 07200237
MathSciNet: MR4082477

Primary: 62D05
Secondary: 60J20

Keywords: Galton-Watson process, limit distribution, Volz-Heckathorn estimator
