Electronic Journal of Statistics

Respondent-driven sampling on directed networks

Xin Lu, Jens Malmros, Fredrik Liljeros, and Tom Britton

Full-text: Open access


Respondent-driven sampling (RDS) is a widely used method for generating chain-referral samples from hidden populations. It extends the snowball sampling method and can, provided certain assumptions are met, generate unbiased population estimates. One key assumption, unlikely to hold in practice, is that the acquaintance network in which recruitment takes place is undirected, meaning that every recruiter could in turn be recruited by the person they recruit. Using a mean-field approach, we develop an estimator based on prior information about the average indegrees of the estimated variables. When the indegree is known, as in RDS studies over internet social networks, the estimator can greatly reduce estimation error and bias compared with current methods. When the indegree is not known, which is most common in interview-based RDS studies, the estimator can be used, through sensitivity analysis, as a tool to account for uncertainty about network directedness and for error in self-reported degree data. The performance of the new estimator, together with that of previous RDS estimators, is investigated thoroughly by simulations on networks with varying structures. We have applied the new estimator to an empirical RDS study of injecting drug users in New York City.
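
As a rough illustration of why degree information matters here, the following minimal sketch shows the generic inverse-degree weighting idea that degree-based RDS estimators build on: respondents who are easier to reach (higher degree) are down-weighted. It is not the estimator proposed in this paper. The function name `inverse_degree_estimate` and the toy data are hypothetical, and treating the supplied degrees as indegrees on a directed network is an assumption made only for illustration.

```python
def inverse_degree_estimate(has_trait, degrees):
    """Estimate the population proportion with a trait from a sample.

    Each respondent is weighted by 1/degree, so that respondents who are
    more likely to be reached by a referral chain count for less.
    Hypothetical helper: a generic sketch, not the paper's estimator.
    """
    if len(has_trait) != len(degrees):
        raise ValueError("has_trait and degrees must have the same length")
    if any(d <= 0 for d in degrees):
        raise ValueError("all degrees must be positive")

    weights = [1.0 / d for d in degrees]
    total_weight = sum(weights)
    trait_weight = sum(w for w, t in zip(weights, has_trait) if t)
    return trait_weight / total_weight


# Toy sample (made-up numbers): trait carriers report higher degree.
sample_trait = [True, True, False, False]
sample_degree = [10, 10, 5, 5]

naive = sum(sample_trait) / len(sample_trait)
estimate = inverse_degree_estimate(sample_trait, sample_degree)
print(f"naive sample proportion: {naive:.3f}")
print(f"inverse-degree weighted: {estimate:.3f}")
```

In this toy sample the weighted estimate falls below the naive sample proportion because the trait carriers have higher degree and are therefore assumed to be over-represented among recruits.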

Article information

Electron. J. Statist., Volume 7 (2013), 292-322.

First available in Project Euclid: 24 January 2013


Primary: 62P25: Applications to social sciences; 62-07: Data analysis

Keywords: Respondent-driven sampling; directed networks; degree correlation; attractivity ratio; HIV


Lu, Xin; Malmros, Jens; Liljeros, Fredrik; Britton, Tom. Respondent-driven sampling on directed networks. Electron. J. Statist. 7 (2013), 292-322. doi:10.1214/13-EJS772. https://projecteuclid.org/euclid.ejs/1359041593


