Statistical Science

Inference for Nonprobability Samples

Michael R. Elliott and Richard Valliant

Although selecting a probability sample has been the standard for decades when making inferences from a sample to a finite population, incentives are increasing to use nonprobability samples. In a world of “big data”, large amounts of data are available that are faster and easier to collect than probability samples. Design-based inference, in which the distribution for inference is generated by the random mechanism used by the sampler, cannot be used for nonprobability samples. One alternative is quasi-randomization, in which pseudo-inclusion probabilities are estimated based on covariates available for sample and nonsample units. Another is superpopulation modeling for the analytic variables collected on the sample units, in which the model is used to predict values for the nonsample units. We discuss the pros and cons of each approach.
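The two alternatives named in the abstract can be illustrated with a small simulation. The sketch below is not the authors' implementation. It assumes a synthetic population, a single covariate `x` known for every population unit (in practice the covariates would more often come from a reference sample), a logistic model for the pseudo-inclusion probabilities, and a linear superpopulation model for the analytic variable `y`. All names (`fit_logistic`, `p_select`, `qr_mean`, `pred_mean`) are hypothetical.

```python
import numpy as np


def fit_logistic(X, s, n_iter=25):
    """Logistic regression by Newton-Raphson; X must include an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (s - p)                         # score vector
        hess = X.T @ (X * (p * (1.0 - p))[:, None])  # observed information
        beta = beta + np.linalg.solve(hess, grad)
    return beta


rng = np.random.default_rng(0)

# Synthetic finite population (assumption): y depends linearly on x.
N = 20_000
x = rng.normal(size=N)
y = 2.0 + 3.0 * x + rng.normal(size=N)
true_mean = y.mean()

# Nonprobability selection: units with large x are more likely to volunteer.
# The analyst does not know p_select; it is used only to generate the sample.
p_select = 1.0 / (1.0 + np.exp(-(-3.0 + x)))
in_sample = rng.random(N) < p_select

X = np.column_stack([np.ones(N), x])

# Naive estimate: treats the nonprobability sample as if it were a simple random sample.
naive_mean = y[in_sample].mean()

# Quasi-randomization: estimate pseudo-inclusion probabilities from the covariate,
# then weight each sampled unit by the inverse of its estimated probability.
beta = fit_logistic(X, in_sample.astype(float))
pi_hat = 1.0 / (1.0 + np.exp(-X[in_sample] @ beta))
w = 1.0 / pi_hat
qr_mean = np.sum(w * y[in_sample]) / np.sum(w)

# Superpopulation modeling: fit y ~ x on the sample, predict y for nonsample units,
# and combine observed and predicted values into a population mean.
coef, *_ = np.linalg.lstsq(X[in_sample], y[in_sample], rcond=None)
y_pred_nonsample = X[~in_sample] @ coef
pred_mean = (y[in_sample].sum() + y_pred_nonsample.sum()) / N

print(f"true mean          : {true_mean:.3f}")
print(f"naive sample mean  : {naive_mean:.3f}")
print(f"quasi-randomization: {qr_mean:.3f}")
print(f"model prediction   : {pred_mean:.3f}")
```

In this setup both working models are correctly specified, so both adjusted estimates should land much closer to the population mean than the naive sample mean. With a misspecified selection model or outcome model, either adjustment can retain bias.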

Article information

Statist. Sci., Volume 32, Number 2 (2017), 249-264.

First available in Project Euclid: 11 May 2017

Keywords: Coverage error; hierarchical regression; quasi-randomization; reference sample; selection bias; superpopulation model


Elliott, Michael R.; Valliant, Richard. Inference for Nonprobability Samples. Statist. Sci. 32 (2017), no. 2, 249-264. doi:10.1214/16-STS598.

