The Annals of Applied Statistics

Inference for respondent-driven sampling with misclassification

Isabelle S. Beaudry, Krista J. Gile, and Shruti H. Mehta

Respondent-driven sampling (RDS) is a sampling method designed to study hard-to-reach human populations. Sampling begins with a convenience sample; each participant receives a small number of coupons, which they distribute to their contacts, who then become eligible to participate. RDS participants are asked to report their number of contacts in the target population (their degree), and a set of characteristics is also observed for each participant. Current prevalence estimators assume that these attributes are measured accurately. However, ignoring misclassification may lead to biased estimates.
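To make the bias concrete, the sketch below uses the Volz–Heckathorn (RDS-II) estimator, a Hájek-style ratio that weights each respondent by the inverse of their reported degree. It assumes inclusion probability proportional to degree and non-differential misclassification. The function names, degrees, true statuses, and error rates (sensitivity 0.9, specificity 0.95) are hypothetical illustrations, not data or code from the paper.

```python
import numpy as np


def rds_ii_prevalence(y, degree):
    """Hájek-style (Volz-Heckathorn, RDS-II) prevalence estimate.

    Assumes inclusion probability proportional to reported degree,
    so each respondent gets design weight 1 / degree.
    """
    y = np.asarray(y, dtype=float)
    w = 1.0 / np.asarray(degree, dtype=float)
    # Dividing by the sum of the weights (not the population size) is what
    # makes this a Hájek-style ratio estimator.
    return float(np.sum(w * y) / np.sum(w))


def misclassify(y, sensitivity, specificity, rng):
    """Flip binary labels with hypothetical, non-differential error rates."""
    y = np.asarray(y, dtype=int)
    u = rng.random(y.shape)
    # A true positive is reported positive with prob. `sensitivity`;
    # a true negative is reported positive with prob. 1 - `specificity`.
    return np.where(y == 1, u < sensitivity, u < 1.0 - specificity).astype(int)


# Hypothetical data: degrees, true status and error rates are made up.
rng = np.random.default_rng(1)
n = 5000
degree = rng.integers(1, 21, size=n)
y_true = (rng.random(n) < 0.1).astype(int)
y_obs = misclassify(y_true, sensitivity=0.9, specificity=0.95, rng=rng)

print("true labels:   ", rds_ii_prevalence(y_true, degree))
print("misclassified: ", rds_ii_prevalence(y_obs, degree))
```

Because the weights do not depend on the reported attribute, the expected estimate from misclassified labels is 0.9p + 0.05(1 − p) given a true-label estimate p, which equals 0.135 when p = 0.1. A low prevalence is therefore pushed noticeably upward.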

The main contribution of this paper is to discuss two approaches for correcting existing RDS estimators for the bias introduced by misclassification of nodal attributes. Both approaches leverage misclassification rates assumed to be available from external validation studies. Most importantly, our analysis identifies circumstances under which the performance of the correction methods is impaired in the specific context of RDS. The two methods discussed are an analytical correction for Hájek-style estimators and the Simulation Extrapolation Misclassification (SIMEX MC) approach. We also present extended methodology to estimate the uncertainty of the corrected estimators. The performance of the proposed methods is assessed under varying levels of known or uncertain misclassification error across simulated social networks with varying features. Finally, the methods are used to estimate HIV prevalence among people who inject drugs (PWID) and men who have sex with men (MSM) in India.
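Both corrections can be sketched for a binary attribute, assuming non-differential misclassification with known sensitivity Se and specificity Sp standing in for the external validation rates. Because a Hájek-style estimate is a weighted mean whose weights do not depend on the attribute, its expectation given the true labels is Se·p + (1 − Sp)(1 − p); the analytical correction inverts that linear map. The SIMEX MC part follows the general recipe of Küchenhoff, Mwalili and Lesaffre (2006): add extra misclassification with matrix Π^λ for several λ > 0, average the re-estimated prevalence over replications, fit an extrapolant in λ, and evaluate it at λ = −1. The function names (`hajek`, `analytic_correction`, `mc_matrix`, `fractional_power`, `simex_mc`), the λ grid, the 50 replications, the quadratic extrapolant, and all data are hypothetical choices, not the paper's implementation.

```python
import numpy as np


def hajek(y, w):
    """Hájek-style weighted prevalence, sum(w * y) / sum(w)."""
    return float(np.sum(w * y) / np.sum(w))


def analytic_correction(p_obs, sensitivity, specificity):
    """Solve p_obs = Se * p + (1 - Sp) * (1 - p) for p (requires Se + Sp > 1).

    Not truncated: in small samples the result can fall outside [0, 1].
    """
    return (p_obs + specificity - 1.0) / (sensitivity + specificity - 1.0)


def mc_matrix(sensitivity, specificity):
    """Pi[i, j] = P(observed = i | true = j) for a binary attribute."""
    return np.array([[specificity, 1.0 - sensitivity],
                     [1.0 - specificity, sensitivity]])


def fractional_power(pi, lam):
    """Pi**lam via eigendecomposition; assumes positive real eigenvalues."""
    vals, vecs = np.linalg.eig(pi)
    return vecs @ np.diag(vals ** lam) @ np.linalg.inv(vecs)


def simex_mc(y_obs, w, sensitivity, specificity,
             lambdas=(0.5, 1.0, 1.5, 2.0), n_sim=50, seed=0):
    """Sketch of SIMEX MC for a Hájek-style prevalence estimate."""
    rng = np.random.default_rng(seed)
    pi = mc_matrix(sensitivity, specificity)
    y_obs = np.asarray(y_obs, dtype=int)
    xs, means = [0.0], [hajek(y_obs, w)]  # lambda = 0: the naive estimate
    for lam in lambdas:
        # P(relabelled 1 | current label) under extra misclassification Pi**lam.
        p_one = fractional_power(pi, lam)[1, y_obs]
        sims = [hajek((rng.random(y_obs.size) < p_one).astype(int), w)
                for _ in range(n_sim)]
        xs.append(lam)
        means.append(float(np.mean(sims)))
    # Quadratic extrapolant in lambda, evaluated at lambda = -1.
    coef = np.polyfit(xs, means, deg=2)
    return float(np.polyval(coef, -1.0))


# Hypothetical data: weights, true status and error rates are made up.
rng = np.random.default_rng(2)
n = 20000
w = 1.0 / rng.integers(1, 21, size=n)
y_true = (rng.random(n) < 0.1).astype(int)
u = rng.random(n)
y_obs = np.where(y_true == 1, u < 0.9, u < 0.05).astype(int)

truth = hajek(y_true, w)
naive = hajek(y_obs, w)
corrected = analytic_correction(naive, 0.9, 0.95)
simex = simex_mc(y_obs, w, 0.9, 0.95)
print(truth, naive, corrected, simex)
```

The analytical correction needs only the naive estimate and the two rates, which is why it relies on the estimator being Hájek-style. SIMEX MC re-runs the estimator itself, so in principle it applies to other estimators, at the cost of simulation time and a choice of extrapolant. Note that Π^λ has real fractional powers here only because Se + Sp > 1.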

Article information

Ann. Appl. Stat., Volume 11, Number 4 (2017), 2111-2141.

Received: August 2016
Revised: April 2017
First available in Project Euclid: 28 December 2017

Keywords: Hard-to-reach population sampling; misclassification; SIMEX MC; network sampling; social networks


Beaudry, Isabelle S.; Gile, Krista J.; Mehta, Shruti H. Inference for respondent-driven sampling with misclassification. Ann. Appl. Stat. 11 (2017), no. 4, 2111–2141. doi:10.1214/17-AOAS1063.

  • Barron, B. A. (1977). The effects of misclassification on the estimation of relative risk. Biometrics 33 414–418.
  • Beaudry, I. S., Gile, K. J. and Mehta, S. H. (2017). Supplement to “Inference for respondent-driven sampling with misclassification.” DOI:10.1214/17-AOAS1063SUPP.
  • Biernacki, P. and Waldorf, D. (1981). Snowball sampling: Problems and techniques of chain referral sampling. Sociol. Methods Res. 10 141–163.
  • Buonaccorsi, J. P. (2010). Measurement Error: Models, Methods, and Applications. CRC Press, Boca Raton, FL.
  • Cook, J. R. and Stefanski, L. A. (1994). Simulation extrapolation estimation in parametric measurement error models. J. Amer. Statist. Assoc. 89 1314–1328.
  • Frank, O. and Strauss, D. (1986). Markov graphs. J. Amer. Statist. Assoc. 81 832–842.
  • Frost, S. D. W., Brouwer, K. C., Cruz, M. A. F., Ramos, R., Ramos, M. E., Lozada, R. M., Magis-Rodriguez, C. and Strathdee, S. A. (2006). Respondent-driven sampling of injection drug users in two U.S.–Mexico border cities: Recruitment dynamics and impact on estimates of HIV and syphilis prevalence. J. Urban Health 83 83–97.
  • Gile, K. J. (2011). Improved inference for respondent-driven sampling data with application to HIV prevalence estimation. J. Amer. Statist. Assoc. 106 135–146.
  • Gile, K. J. and Handcock, M. S. (2010). Respondent-driven sampling: An assessment of current methodology. Sociol. Method. 40 285–327.
  • Gile, K. J., Johnston, L. G. and Salganik, M. J. (2015). Diagnostics for respondent-driven sampling. J. Roy. Statist. Soc. Ser. A 178 241–269.
  • Goodman, L. A. (1961). Snowball sampling. Ann. Math. Stat. 32 148–170.
  • Handcock, M. S., Fellows, I. E. and Gile, K. J. (2015). RDS: respondent-driven sampling. R package version 0.7-2, Los Angeles, CA.
  • Handcock, M. S. and Gile, K. J. (2011). Comment: On the concept of snowball sampling. Sociol. Method. 41 367–371.
  • Handcock, M. S., Hunter, D. R., Butts, C. T., Goodreau, S. M., Krivitsky, P. N., Bender-deMoll, S. and Morris, M. (2015). statnet: Software tools for the statistical analysis of network data. R package version 2015.6.2, The Statnet Project (
  • Heckathorn, D. D. (1997). Respondent-driven sampling: A new approach to the study of hidden populations. Soc. Probl. 44 174–199.
  • Hunter, D. R., Goodreau, S. M. and Handcock, M. S. (2008). Goodness of fit of social network models. J. Amer. Statist. Assoc. 103 248–258.
  • Hunter, D. R. and Handcock, M. S. (2006). Inference in curved exponential family models for networks. J. Comput. Graph. Statist. 15 565–583.
  • Johnston, L. G., Malekinejad, M., Kendall, C., Iuppa, I. M. and Rutherford, G. W. (2008). Implementation challenges to using respondent-driven sampling methodology for HIV biological and behavioral surveillance: Field experiences in international settings. AIDS Behav. 12 131–141.
  • Küchenhoff, H., Lederer, W. and Lesaffre, E. (2007). Asymptotic variance estimation for the misclassification SIMEX. Comput. Statist. Data Anal. 51 6197–6211.
  • Küchenhoff, H., Mwalili, S. M. and Lesaffre, E. (2006). A general method for dealing with misclassification in regression: The misclassification SIMEX. Biometrics 62 85–96.
  • Liu, H., Li, J., Ha, T. and Li, J. (2012). Assessment of random recruitment assumption in respondent-driven sampling in egocentric network data. Soc. Netw. 1 13–21.
  • Lu, X. (2013). Linked ego networks: Improving estimate reliability and validity with respondent-driven sampling. Soc. Netw. 35 669–685.
  • Lu, X., Bengtsson, L., Britton, T., Camitz, M., Kim, B. J., Thorson, A. and Liljeros, F. (2012). The sensitivity of respondent-driven sampling. J. Roy. Statist. Soc. Ser. A 175 191–216.
  • Lu, X., Malmros, J., Liljeros, F. and Britton, T. (2013). Respondent-driven sampling on directed networks. Electron. J. Stat. 7 292–322.
  • Lucas, G. M., Solomon, S. S., Srikrishnan, A. K., Agrawal, A., Iqbal, S., Laeyendecker, O., McFall, A. M., Kumar, M. S., Ogburn, E. L., Celentano, D. D., Solomon, S. and Mehta, S. H. (2015). High HIV burden among people who inject drugs in 15 Indian cities. AIDS 29 619–628.
  • Malekinejad, M., Johnston, L., Kendall, C., Kerr, L., Rifkin, M. and Rutherford, G. (2008). Using respondent-driven sampling methodology for HIV biological and behavioral surveillance in international settings: A systematic review. AIDS Behav. 12 105–130.
  • Marks, G., Crepaz, N., Senterfitt, J. W. and Janssen, R. S. (2005). Meta-analysis of high-risk sexual behavior in persons aware and unaware they are infected with HIV in the United States: Implications for HIV prevention programs. J. Acquir. Immune Defic. Syndr. 39 446–453.
  • McCreesh, N., Frost, S. D. W., Seeley, J., Katongole, J., Tarsh, M. N., Ndunguse, R., Jichi, F., Lunel, N. L., Maher, D., Johnston, L. G., Sonnenberg, P., Copas, A. J., Hayes, R. J. and White, R. G. (2012). Evaluation of respondent-driven sampling. Epidemiology 23 138–147.
  • Mills, H. L., Johnson, S., Hickman, M., Jones, N. S. and Colijn, C. (2014). Errors in reported degrees and respondent driven sampling: Implications for bias. Drug Alcohol Depend. 142 120–126.
  • Montealegre, J. R., Johnston, L. G., Murrill, C. and Monterroso, E. (2013). Respondent driven sampling for HIV biological and behavioral surveillance in Latin America and the Caribbean. AIDS Behav. 17 2313–2340.
  • World Health Organization (2015). Consolidated guidelines on HIV testing services 2015. Technical report, World Health Organization, Geneva.
  • Rudolph, A., Fuller, C. and Latkin, C. (2013). The importance of measuring and accounting for potential biases in respondent-driven samples. AIDS Behav. 17 2244–2252.
  • Salganik, M. J. (2006). Variance estimation, design effects, and sample size calculations for respondent-driven sampling. J. Urban Health 83 i98–i112.
  • Salganik, M. J. and Heckathorn, D. D. (2004). Sampling and estimation in hidden populations using respondent-driven sampling. Sociol. Method. 34 193–239.
  • Shanks, L., Klarkowski, D. and O’Brien, D. P. (2013). False positive HIV diagnoses in resource limited settings: Operational lessons learned for HIV programmes. PLoS ONE 8 8–13.
  • Smith, R., Rossetto, K. and Peterson, B. (2008). A meta-analysis of disclosure of one’s HIV-positive status, stigma and social support. AIDS Care 20 1266–1275.
  • Solomon, S. S., Mehta, S. H., Srikrishnan, A. K., Vasudevan, C. K., McFall, A. M., Balakrishnan, P., Anand, S., Nandagopal, P., Ogburn, E. L., Laeyendecker, O., Lucas, G. M., Solomon, S. and Celentano, D. D. (2015). High HIV prevalence and incidence among MSM across 12 cities in India. AIDS 29 723–731.
  • Tomas, A. and Gile, K. J. (2011). The effect of differential recruitment, non-response and non-recruitment on estimators for respondent-driven sampling. Electron. J. Stat. 5 899–934.
  • Trow, M. (1957). Right-Wing Radicalism and Political Intolerance. Arno Press, New York. Reprinted 1980.
  • UNAIDS (2014). The gap report.
  • Verdery, A. M., Merli, M. G., Moody, J., Smith, J. A. and Fisher, J. C. (2015). Respondent-driven sampling estimators under real and theoretical recruitment conditions of female sex workers in China. Epidemiology 26 661–665.
  • Volz, E. and Heckathorn, D. D. (2008). Probability based estimation theory for respondent driven sampling. J. Off. Stat. 24 79–97.
  • Wejnert, C. and Heckathorn, D. D. (2008). Web-based network sampling: Efficiency and efficacy of respondent-driven sampling for online research. Sociol. Methods Res. 37 105–134.
  • Yamanis, T. J., Merli, M. G., Neely, W. W., Tian, F. F., Moody, J., Tu, X. and Gao, E. (2013). An empirical analysis of the impact of recruitment patterns on RDS estimates among a socially ordered population of female sex workers in China. Sociol. Methods Res. 42 392–425.
  • Yates, F. and Grundy, P. M. (1953). Selection without replacement from within strata with probability proportional to size. J. Roy. Statist. Soc. Ser. B 15 253–261.

Supplemental materials

  • Supplement to “Inference for respondent-driven sampling with misclassification”. Supplement A, Performance of the Analytical Adjustment with the Salganik–Heckathorn Estimator: how well the analytical adjustment works with the Salganik–Heckathorn estimator depends on whether that estimator is close enough to a Hájek-style estimator. In this supplement, we discuss why the $c$-factor and its observed version $c^{*}$ both play a role in whether the analytical adjustment suits the Salganik–Heckathorn estimator. Supplement B, Additional Results From Simulation Study: in this supplement, we present additional results from the simulation study, including (1) the root mean squared error (RMSE) calculations; (2) the RMSE at various levels of misclassification rates; and (3) the sensitivity to erroneous error rates.