Asymptotic seed bias in respondent-driven sampling

Yuling Yan; Bret Hanlon; Sebastien Roch; Karl Rohe

doi:10.1214/20-EJS1698

Electron. J. Statist. 14(1): 1577-1610 (2020). DOI: 10.1214/20-EJS1698

Abstract

Respondent-driven sampling (RDS) collects a sample of individuals in a networked population by incentivizing the sampled individuals to refer their contacts into the sample. This iterative process is initialized from one or more seed nodes. Sometimes the choice of seeds creates substantial seed bias; other times the seed bias is small. This paper develops a deeper understanding of this bias by characterizing its effect on the limiting distribution of various RDS estimators. Using classical tools and results from multi-type branching processes [12], we show that the seed bias is negligible for the Generalized Least Squares (GLS) estimator and non-negligible for both the inverse probability weighted (IPW) and Volz-Heckathorn (VH) estimators. In particular, we show that (i) above a critical threshold, the VH estimator converges to a non-trivial mixture distribution, where the mixture component depends on the seed node and the mixture distribution is possibly multi-modal. Moreover, (ii) the GLS estimator converges to a Gaussian distribution independent of the seed node, under a certain condition on the Markov process. Numerical experiments with both simulated data and empirical social networks suggest that these results hold beyond the Markov conditions of the theorems.
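The seed dependence of the VH estimator can be illustrated with a small simulation. What follows is a minimal sketch under stated assumptions, not the authors' code or exact model. It assumes a toy two-type population in which every participant refers exactly m = 3 recruits, and each recruit's type is drawn from a hypothetical symmetric transition matrix P (second eigenvalue 0.8), giving a tree-indexed Markov chain. The per-type degrees are also hypothetical. The parameters are intended to sit above the critical threshold, assuming that threshold takes the form m·λ₂² > 1 used in the related tree-indexed Markov literature (here 3 × 0.64 = 1.92). The VH estimator is the 1/degree-weighted mean, sum(y_i/d_i) / sum(1/d_i). The helper names `simulate_tree_markov` and `vh_from_seed` are illustrative only.

```python
import numpy as np


def vh_estimator(y, degree):
    """Volz-Heckathorn estimator: the 1/degree-weighted mean of the outcomes."""
    y = np.asarray(y, dtype=float)
    w = 1.0 / np.asarray(degree, dtype=float)
    return float(np.sum(w * y) / np.sum(w))


def simulate_tree_markov(P, seed_type, m, generations, rng):
    """Return the types of all nodes in an m-ary referral tree (hypothetical helper).

    Assumption: each participant recruits exactly m others, and a recruit's
    type is drawn from row P[parent_type], i.e. a tree-indexed Markov chain.
    """
    P = np.asarray(P, dtype=float)
    n_types = P.shape[0]
    current = np.array([seed_type])
    all_types = [current]
    for _ in range(generations):
        parents = np.repeat(current, m)            # each parent appears m times
        cdf = np.cumsum(P[parents], axis=1)        # row-wise CDF of child type
        u = rng.random(len(parents))[:, None]      # one uniform draw per child
        # Inverse-CDF sampling; clip guards against rounding in the last column.
        current = np.minimum((u > cdf).sum(axis=1), n_types - 1)
        all_types.append(current)
    return np.concatenate(all_types)


def vh_from_seed(seed_type, reps=200, m=3, generations=6, rng_seed=0):
    """VH estimates over repeated RDS trees started from one seed type (hypothetical)."""
    rng = np.random.default_rng(rng_seed)
    P = [[0.9, 0.1], [0.1, 0.9]]               # hypothetical; second eigenvalue 0.8
    degree_by_type = np.array([3.0, 6.0])      # hypothetical degree of each type
    estimates = []
    for _ in range(reps):
        types = simulate_tree_markov(P, seed_type, m, generations, rng)
        y = (types == 0).astype(float)         # outcome: membership in type 0
        estimates.append(vh_estimator(y, degree_by_type[types]))
    return np.array(estimates)


if __name__ == "__main__":
    for s in (0, 1):
        est = vh_from_seed(s)
        print(f"seed type {s}: mean VH = {est.mean():.3f}, sd = {est.std():.3f}")
```

Under these assumptions, the average VH estimate started from a type-0 seed should sit noticeably above the one started from a type-1 seed, because early generations inherit the seed's type. That is the kind of seed dependence the abstract describes. The sketch does not implement the IPW or GLS estimators.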

Citation

Yuling Yan. Bret Hanlon. Sebastien Roch. Karl Rohe. "Asymptotic seed bias in respondent-driven sampling." Electron. J. Statist. 14 (1) 1577 - 1610, 2020. https://doi.org/10.1214/20-EJS1698