The Annals of Applied Statistics

Inference for respondent-driven sampling with misclassification

Isabelle S. Beaudry, Krista J. Gile, and Shruti H. Mehta

Respondent-driven sampling (RDS) is a sampling method designed to study hard-to-reach human populations. Beginning with a convenience sample, each participant receives a small number of coupons to distribute to their contacts, who thereby become eligible to participate. RDS participants are asked to report the number of contacts they have in the target population. A set of characteristics is also observed for each participant. Current prevalence estimators assume that these attributes are measured accurately. However, ignoring misclassification may lead to biased estimates.

The main contribution of this paper is to discuss two approaches for correcting the bias that misclassification of nodal attributes introduces into existing RDS estimators. Both approaches leverage misclassification rates assumed to be available from external validation studies. Most importantly, our analysis identifies circumstances under which the performance of the correction methods is impaired in the specific context of RDS. The two methods discussed are an analytical correction for Hájek-style estimators and the Simulation Extrapolation Misclassification (SIMEX MC) approach. We also present extended methodology for estimating the uncertainty of the corrected estimators. The performance of the proposed methods is assessed under varying levels of known or uncertain misclassification error across simulated social networks with varying features. Finally, the methods are used to estimate HIV prevalence among people who inject drugs (PWID) and men who have sex with men (MSM) in India.
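To make the idea of an analytical correction concrete, the following is a minimal sketch and not the authors' implementation. It assumes a binary attribute, an inverse-degree weighted (Hájek-style) proportion, and known sensitivity and specificity that do not depend on a participant's degree. Under those assumptions the observed proportion satisfies p_obs = sens * p + (1 - spec) * (1 - p), which can be solved for p. The function names and the toy data are hypothetical.

```python
def hajek_proportion(y, degrees):
    """Inverse-degree weighted (Hajek-style) proportion of y == 1.

    y       : observed 0/1 attribute for each participant
    degrees : self-reported number of contacts for each participant
    """
    weights = [1.0 / d for d in degrees]
    return sum(w * yi for w, yi in zip(weights, y)) / sum(weights)


def correct_misclassification(p_obs, sensitivity, specificity):
    """Solve p_obs = sens * p + (1 - spec) * (1 - p) for p.

    Assumes the error rates are known (e.g. from a validation study)
    and are the same for every participant.
    """
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        raise ValueError("sensitivity + specificity must exceed 1")
    p = (p_obs + specificity - 1.0) / denom
    # Truncate to the unit interval, since sampling noise can push p outside it.
    return min(1.0, max(0.0, p))


# Hypothetical toy sample: observed status and reported degree.
y_obs = [1, 0, 0, 1, 0, 0]
degrees = [2, 4, 5, 10, 4, 5]

p_obs = hajek_proportion(y_obs, degrees)
p_adj = correct_misclassification(p_obs, sensitivity=0.90, specificity=0.95)
```

With perfect measurement (sensitivity and specificity both equal to 1) the correction leaves the estimate unchanged. The sketch does not address the RDS-specific circumstances in which such a correction may perform poorly.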

Article information

Ann. Appl. Stat. Volume 11, Number 4 (2017), 2111-2141.

Received: August 2016
Revised: April 2017
First available in Project Euclid: 28 December 2017

Digital Object Identifier

doi:10.1214/17-AOAS1063

Keywords

Hard-to-reach population sampling; misclassification; SIMEX MC; network sampling; social networks


Beaudry, Isabelle S.; Gile, Krista J.; Mehta, Shruti H. Inference for respondent-driven sampling with misclassification. Ann. Appl. Stat. 11 (2017), no. 4, 2111-2141. doi:10.1214/17-AOAS1063.

Supplemental materials

  • Supplement to “Inference for respondent-driven sampling with misclassification”. Supplement A: Performance of the Analytical Adjustment with the Salganik–Heckathorn Estimator. The performance of the Salganik–Heckathorn estimator depends on whether it is close enough to a Hájek-style estimator. In this supplement, we discuss why the $c$-factor and its observed version $c^{*}$ both play a role in whether the analytical adjustment suits the Salganik–Heckathorn estimator. Supplement B: Additional Results From Simulation Study. In this supplement, we present additional results from the simulation study, including (1) the calculation of the root mean squared error (RMSE), (2) the RMSE at various misclassification rates, and (3) the sensitivity to erroneous error rates.