Open Access
December 2019
A general theory for preferential sampling in environmental networks
Joe Watson, James V. Zidek, Gavin Shaddick
Ann. Appl. Stat. 13(4): 2662-2700 (December 2019). DOI: 10.1214/19-AOAS1288


This paper presents a general model framework for detecting the preferential sampling of environmental monitors recording an environmental process across space and/or time. This is achieved by considering the joint distribution of an environmental process with a site-selection process that considers where and when sites are placed to measure the process. The environmental process may be spatial, temporal or spatio-temporal in nature. By sharing random effects between the two processes, the joint model is able to establish whether site placement was stochastically dependent on the environmental process under study. Furthermore, if stochastic dependence is identified between the two processes, then inferences about the probability distribution of the spatio-temporal process will change, as will predictions made of the process across space and time. The embedding into a spatio-temporal framework also allows for the modelling of the dynamic site-selection process itself. Real-world factors affecting both the size and location of the network can be easily modelled and quantified. Depending upon the choice of the population of locations considered for selection across space and time under the site-selection process, different insights about the precise nature of preferential sampling can be obtained. The general framework developed in the paper is designed to be easily and quickly fit using the R-INLA package. We apply this framework to a case study involving particulate air pollution over the UK where a major reduction in the size of a monitoring network through time occurred. It is demonstrated that a significant response-biased reduction in the air quality monitoring network occurred, namely the relocation of monitoring sites to locations with the highest pollution levels and the routine removal of sites at locations with the lowest. We also show that the network was consistently unrepresentative of the levels of particulate matter seen across much of GB throughout the operating life of the network. Finally, we show that this may have led to a severe overreporting of the population-average exposure levels experienced across GB. This could have great impacts on estimates of the health effects of black smoke levels.
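
The idea of a site-selection process that shares a random effect with the environmental process can be illustrated with a toy simulation. The sketch below is not the paper's model, which is spatio-temporal and fitted with R-INLA. It assumes a one-dimensional latent field along a transect and a Bernoulli site-selection rule whose log-odds depend linearly on that field. The names `simulate_network`, `network_bias`, `d`, `alpha` and `rho` are hypothetical and introduced only for illustration.

```python
import math
import random


def simulate_network(d, n_locations=5000, alpha=-2.0, rho=0.9, seed=1):
    """Toy latent field plus a site-selection process that shares it.

    d is a hypothetical preferential-sampling coefficient: d = 0 places sites
    independently of the field, d > 0 favours locations with high field values.
    """
    rng = random.Random(seed)

    # Latent field Z: stationary AR(1) along a transect, unit marginal variance.
    z = [rng.gauss(0.0, 1.0)]
    innovation_sd = math.sqrt(1.0 - rho ** 2)
    for _ in range(n_locations - 1):
        z.append(rho * z[-1] + rng.gauss(0.0, innovation_sd))

    # Site selection: Bernoulli with logit(p_i) = alpha + d * Z_i.
    # The shared term d * Z_i is what makes sampling preferential.
    selected = [
        i for i, zi in enumerate(z)
        if rng.random() < 1.0 / (1.0 + math.exp(-(alpha + d * zi)))
    ]

    field_mean = sum(z) / n_locations
    network_mean = sum(z[i] for i in selected) / len(selected)
    return field_mean, network_mean, len(selected)


def network_bias(d, **kwargs):
    """Difference between the mean at monitored sites and the true field mean."""
    field_mean, network_mean, _ = simulate_network(d, **kwargs)
    return network_mean - field_mean


if __name__ == "__main__":
    for d in (0.0, 1.5):
        print(f"d = {d}: network mean minus field mean = {network_bias(d):+.3f}")
```

Under these assumptions, a network placed with `d = 0` tracks the field average closely, while a positive `d` makes the monitored sites overstate it. This is the kind of overreporting the abstract describes for the UK network.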



Joe Watson, James V. Zidek, Gavin Shaddick. "A general theory for preferential sampling in environmental networks." Ann. Appl. Stat. 13 (4): 2662-2700, December 2019.


Received: 1 September 2018; Revised: 1 July 2019; Published: December 2019
First available in Project Euclid: 28 November 2019

zbMATH: 07160954
MathSciNet: MR4037445
Digital Object Identifier: 10.1214/19-AOAS1288

Keywords: air pollution, big data, health effects, INLA, mobile monitors, preferential sampling, random fields

Rights: Copyright © 2019 Institute of Mathematical Statistics
