The Annals of Statistics

A critical threshold for design effects in network sampling

Karl Rohe


Web crawling, snowball sampling, and respondent-driven sampling (RDS) are three types of network sampling techniques used to contact individuals in hard-to-reach populations. This paper studies these procedures as a Markov process on the social network that is indexed by a tree. Each node in this tree corresponds to an observation and each edge in the tree corresponds to a referral. Indexing with a tree (instead of a chain) allows for the sampled units to refer multiple future units into the sample.
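The tree-indexed Markov process described above can be sketched in a few lines. This is only an illustrative sketch under simplifying assumptions that are not taken from the paper: every sampled unit makes exactly `m` referrals (a complete `m`-ary tree), the toy transition matrix `P` is a random walk on a 4-node cycle, and the function name `sample_referral_tree` is hypothetical.

```python
import random


def sample_referral_tree(P, m, depth, seed_state, rng):
    """Simulate a Markov process indexed by a complete m-ary tree.

    P          : row-stochastic transition matrix (list of lists), assumed given
    m          : referrals made by every sampled unit (assumed fixed here)
    depth      : number of referral waves after the seed
    seed_state : network node of the seed participant
    Returns (states, parent); parent[i] is the tree index of the unit that
    referred unit i, and None for the seed.
    """
    states = [seed_state]
    parent = [None]
    frontier = [0]  # tree indices of the most recent wave
    for _ in range(depth):
        next_frontier = []
        for i in frontier:
            for _ in range(m):
                # One referral = one Markov transition from the referrer's state.
                child = rng.choices(range(len(P)), weights=P[states[i]])[0]
                states.append(child)
                parent.append(i)
                next_frontier.append(len(states) - 1)
        frontier = next_frontier
    return states, parent


# Toy network (an assumption for illustration): random walk on the cycle 0-1-2-3-0.
P = [
    [0.0, 0.5, 0.0, 0.5],
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 0.5],
    [0.5, 0.0, 0.5, 0.0],
]

rng = random.Random(0)
states, parent = sample_referral_tree(P, m=2, depth=3, seed_state=0, rng=rng)
print(len(states), "sampled units over 3 waves")
```

Setting `m=1` reduces the tree to a chain, which is the ordinary Markov chain case; larger `m` lets each unit refer several future units, and the same network node may appear more than once because this model samples with replacement.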

In survey sampling, the design effect characterizes the additional variance induced by a novel sampling strategy. If the design effect is some value $\operatorname{DE}$, then constructing an estimator from the novel design makes the variance of the estimator $\operatorname{DE}$ times greater than it would be under a simple random sample with the same sample size $n$. Under certain assumptions on the referral tree, the design effect of network sampling has a critical threshold that is a function of the referral rate $m$ and the clustering structure in the social network, represented by the second eigenvalue of the Markov transition matrix, $\lambda_{2}$. If $m<1/\lambda_{2}^{2}$, then the design effect is finite (i.e., the standard estimator is $\sqrt{n}$-consistent). However, if $m>1/\lambda_{2}^{2}$, then the design effect grows with $n$ (i.e., the standard estimator is no longer $\sqrt{n}$-consistent). Past this critical threshold, the standard error of the estimator converges at the slower rate of $n^{\log_{m}\lambda_{2}}$. The Markov model allows for nodes to be resampled; computational results show that the findings hold in without-replacement sampling. To estimate confidence intervals that adapt to the correct level of uncertainty, a novel resampling procedure is proposed. Computational experiments compare this procedure to previous techniques.
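As a small worked illustration of the threshold stated above, the following sketch classifies a pair $(m,\lambda_{2})$ and reports the exponent $a$ in the standard-error rate $n^{a}$. The function names are hypothetical, the code assumes $0<\lambda_{2}<1$, and the behavior exactly at $m=1/\lambda_{2}^{2}$ is not described in the abstract, so it is flagged rather than guessed.

```python
import math


def design_effect_regime(m, lambda2):
    """Classify (m, lambda2) relative to the threshold m = 1 / lambda2**2.

    Assumes 0 < lambda2 < 1 (an assumption of this sketch).
    """
    if not 0.0 < lambda2 < 1.0:
        raise ValueError("this sketch assumes 0 < lambda2 < 1")
    threshold = 1.0 / lambda2 ** 2
    if m < threshold:
        return "finite"      # design effect stays bounded in n
    if m > threshold:
        return "growing"     # design effect grows with n
    return "boundary"        # not characterized in the abstract


def se_rate_exponent(m, lambda2):
    """Exponent a such that the standard error scales like n**a."""
    regime = design_effect_regime(m, lambda2)
    if regime == "finite":
        return -0.5                      # usual sqrt(n) rate
    if regime == "growing":
        return math.log(lambda2, m)      # log base m of lambda2, in (-1/2, 0)
    raise ValueError("rate at the exact threshold is not given in the abstract")


for m, lam in [(2, 0.5), (3, 0.8)]:
    print(m, lam, design_effect_regime(m, lam), se_rate_exponent(m, lam))
```

With $m=2$ and $\lambda_{2}=0.5$ the threshold is $1/\lambda_{2}^{2}=4$, so the design effect is finite. With $m=3$ and $\lambda_{2}=0.8$ the threshold is about $1.56$, so the design effect grows with $n$ and the exponent $\log_{3}0.8$ is closer to zero than $-1/2$, meaning slower convergence.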

Article information

Ann. Statist., Volume 47, Number 1 (2019), 556-582.

Received: May 2017
Revised: February 2018
First available in Project Euclid: 30 November 2018

Primary: 62D99: None of the above, but in this section
Secondary: 60J20: Applications of Markov chains and discrete-time Markov processes on general state spaces (social mobility, learning theory, industrial processes, etc.) [See also 90B30, 91D10, 91D35, 91E40]

Keywords: Stochastic blockmodel; social network; link-tracing; Galton–Watson


Rohe, Karl. A critical threshold for design effects in network sampling. Ann. Statist. 47 (2019), no. 1, 556–582. doi:10.1214/18-AOS1700.


Supplemental materials

  • Supplement: Proofs for Sections 3 and 4. Due to space constraints, this supplement contains the proofs for the results in Sections 3 and 4. It also contains an additional computational experiment that studies the widths of the bootstrap confidence intervals.