## The Annals of Statistics

- Ann. Statist.
- Volume 47, Number 1 (2019), 556-582.

### A critical threshold for design effects in network sampling

#### Abstract

Web crawling, snowball sampling, and respondent-driven sampling (RDS) are three types of network sampling techniques used to contact individuals in hard-to-reach populations. This paper studies these procedures as a Markov process on the social network that is indexed by a tree. Each node in this tree corresponds to an observation and each edge in the tree corresponds to a referral. Indexing with a tree (instead of a chain) allows for the sampled units to refer multiple future units into the sample.
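The tree-indexed process described above can be illustrated with a small simulation. This is only a minimal sketch under stated assumptions, not the paper's model: it assumes every sampled unit refers exactly `m` contacts (a deterministic `m`-ary tree, where the paper allows more general referral trees), that each referral is a uniformly random neighbor (a simple random walk step), and that the network is supplied as a hypothetical `adjacency` dictionary. The function name `sample_referral_tree` is illustrative.

```python
import random
from collections import deque


def sample_referral_tree(adjacency, seed_node, m, n, rng=None):
    """Simulate a Markov process on a network, indexed by an m-ary tree.

    adjacency: dict mapping each node to a list of its neighbors (assumed input).
    Returns a list of (tree_index, parent_tree_index, network_node) triples.
    Nodes can appear more than once, i.e. sampling is with replacement.
    """
    if rng is None:
        rng = random.Random(0)
    sample = [(0, None, seed_node)]  # root of the referral tree
    queue = deque([0])               # tree indices still waiting to refer
    while queue and len(sample) < n:
        parent = queue.popleft()
        parent_node = sample[parent][2]
        for _ in range(m):
            if len(sample) >= n:
                break
            # One referral is one Markov transition from the parent's node.
            child_node = rng.choice(adjacency[parent_node])
            child = len(sample)
            sample.append((child, parent, child_node))
            queue.append(child)
    return sample
```

Each triple records one observation (a tree node) together with the referral that produced it (the edge to its parent). Setting `m = 1` reduces the tree to a chain, which is the ordinary Markov chain case.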

In survey sampling, the design effect characterizes the additional variance induced by a novel sampling strategy. If the design effect is some value $\operatorname{DE}$, then constructing an estimator from the novel design makes the variance of the estimator $\operatorname{DE}$ times greater than it would be under a simple random sample with the same sample size $n$. Under certain assumptions on the referral tree, the design effect of network sampling has a critical threshold that is a function of the referral rate $m$ and the clustering structure in the social network, represented by the second eigenvalue of the Markov transition matrix, $\lambda_{2}$. If $m<1/\lambda_{2}^{2}$, then the design effect is finite (i.e., the standard estimator is $\sqrt{n}$-consistent). However, if $m>1/\lambda_{2}^{2}$, then the design effect grows with $n$ (i.e., the standard estimator is no longer $\sqrt{n}$-consistent). Past this critical threshold, the standard error of the estimator converges at the slower rate of $n^{\log_{m}\lambda_{2}}$. The Markov model allows for nodes to be resampled; computational results show that the findings hold in without-replacement sampling. To estimate confidence intervals that adapt to the correct level of uncertainty, a novel resampling procedure is proposed. Computational experiments compare this procedure to previous techniques.
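As a worked illustration of the threshold stated in the abstract, the following sketch classifies a pair $(m, \lambda_{2})$ and reports the exponent of $n$ in the standard-error rate. It is a hedged helper, not code from the paper: the function name `design_effect_regime`, the restriction to $0<\lambda_{2}<1$ and $m\ge 1$, and the handling of the exact boundary $m = 1/\lambda_{2}^{2}$ (which the abstract does not address) are all assumptions made here.

```python
import math


def design_effect_regime(m, lambda2):
    """Compare the referral rate m with the critical threshold 1 / lambda2**2.

    Assumes 0 < lambda2 < 1 and m >= 1 (assumptions of this sketch).
    Returns the threshold, the regime, and the exponent a such that the
    standard error scales like n**a according to the abstract.
    """
    if not (0 < lambda2 < 1) or m < 1:
        raise ValueError("this sketch assumes 0 < lambda2 < 1 and m >= 1")
    threshold = 1.0 / lambda2 ** 2
    if m < threshold:
        # Finite design effect: the usual sqrt(n) rate applies.
        return {"threshold": threshold, "regime": "finite", "exponent": -0.5}
    if m > threshold:
        # Design effect grows with n: slower rate n**(log_m lambda2).
        return {"threshold": threshold, "regime": "growing",
                "exponent": math.log(lambda2, m)}
    # Exact boundary: not covered by the abstract, so no rate is reported.
    return {"threshold": threshold, "regime": "boundary", "exponent": None}
```

For example, with $m = 2$ and $\lambda_{2} = 0.8$ the threshold is $1/0.64 = 1.5625$, so $m$ exceeds it and the exponent is $\log_{2} 0.8 \approx -0.32$, a slower rate than the $-0.5$ of a $\sqrt{n}$-consistent estimator. With $m = 2$ and $\lambda_{2} = 0.5$ the threshold is $4$ and the design effect stays finite.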

#### Article information

**Source**

Ann. Statist., Volume 47, Number 1 (2019), 556-582.

**Dates**

Received: May 2017

Revised: February 2018

First available in Project Euclid: 30 November 2018

**Permanent link to this document**

https://projecteuclid.org/euclid.aos/1543568598

**Digital Object Identifier**

doi:10.1214/18-AOS1700

**Mathematical Reviews number (MathSciNet)**

MR3909942

**Zentralblatt MATH identifier**

07036211

**Subjects**

Primary: 62D99: None of the above, but in this section

Secondary: 60J20: Applications of Markov chains and discrete-time Markov processes on general state spaces (social mobility, learning theory, industrial processes, etc.) [See also 90B30, 91D10, 91D35, 91E40]

**Keywords**

Stochastic blockmodel; social network; link-tracing; Galton–Watson

#### Citation

Rohe, Karl. A critical threshold for design effects in network sampling. Ann. Statist. 47 (2019), no. 1, 556-582. doi:10.1214/18-AOS1700. https://projecteuclid.org/euclid.aos/1543568598

#### Supplemental materials

- Supplement: Proofs for Sections 3 and 4. Due to space constraints, this supplement contains the proofs for the results in Sections 3 and 4. It also contains an additional computational experiment that studies the widths of the bootstrap confidence intervals. Digital Object Identifier: doi:10.1214/18-AOS1700SUPP. Supplemental files are immediately available to subscribers. Non-subscribers gain access to supplemental files with the purchase of the article.