In this article we discuss statistical techniques for modeling data from cohort studies that examine the long-term effects of air pollution on children’s health by comparing data from multiple communities with diverse pollution profiles. Under a general multilevel modeling paradigm, we discuss models for different outcome types along with their connections to generalized mixed effects models methodology. The model specifications include linear and flexible models for continuous lung function data; logistic and/or time-to-event models for symptoms data that account for misspecifications via hidden Markov models; and Poisson models for school absence counts. The main aim of the modeling scheme is to estimate effects at various levels (e.g., within subjects across time, within communities across subjects, and between communities). We also discuss in detail various recurring issues such as ecologic bias, exposure measurement error, multicollinearity in multipollutant models, interrelationships between major endpoints, and the choice of appropriate exposure metrics. The key conceptual issues and recent methodologic advances are reviewed, with illustrative results from the Southern California Children’s Health Study, a 10-year study of the effects of air pollution on children’s respiratory health.
"Statistical Issues in Studies of the Long-Term Effects of Air Pollution: The Southern California Children’s Health Study." Statist. Sci. 19 (3) 414 - 449, August 2004. https://doi.org/10.1214/088342304000000413