Abstract
Small area population counts are necessary for many epidemiological studies, yet their quality and accuracy are often not assessed. In the United States, small area population counts are published by the United States Census Bureau (USCB) in the form of the decennial census counts, intercensal population projections (PEP), and American Community Survey (ACS) estimates. Although there are significant relationships between these three data sources, there are important contrasts in data collection, data availability, and processing methodologies such that each set of reported population counts may be subject to different sources and magnitudes of error. Additionally, these data sources do not report identical small area population counts due to post-survey adjustments specific to each data source. Consequently, in public health studies, small area disease/mortality rates may differ depending on which data source is used for denominator data. To accurately estimate annual small area population counts and their associated uncertainties, we present a Bayesian population (BPop) model, which fuses information from all three USCB sources, accounting for data source specific methodologies and associated errors. We produce comprehensive small area race-stratified estimates of the true population, and associated uncertainties, given the observed trends in all three USCB population estimates. The main features of our framework are: (1) a single model integrating multiple data sources, (2) accounting for data source specific data generating mechanisms and specifically accounting for data source specific errors, and (3) prediction of population counts for years without USCB reported data. We focus our study on the Black and White only populations for 159 counties of Georgia and produce estimates for years 2006–2023. We compare BPop population estimates to decennial census counts, PEP annual counts, and ACS multi-year estimates. 
Additionally, we illustrate and explain the different types of data source specific errors. Lastly, we compare model performance using simulations and validation exercises. Our Bayesian population model can be extended to other applications at smaller spatial granularity and for demographic subpopulations defined further by race, age, and sex, and/or for other geographical regions.
Funding Statement
This work, led by Lance A. Waller, was funded by the US National Institutes of Health (NIH) (#RO1HD092580). FBP acknowledges support from Health Data Research UK (HDR UK) and the UK National Institute for Health Research (NIHR) Imperial Biomedical Research Centre. FBP is a member of the NIHR Health Protection Research Units in Chemical and Radiation Threats and Hazards, and in Environmental Exposures and Health, which are partnerships between the UK Health Security Agency and Imperial College London funded by the UK NIHR.
Acknowledgments
The authors are very grateful to all the members of the Spatial Uncertainty Research Team, from the Emory Rollins School of Public Health, Harvard T.H. Chan School of Public Health, and the U.K. Small Area Health Statistics Unit (SAHSU) at Imperial College London’s School of Public Health, for their support of this work.
Citation
Emily N. Peterson, Rachel C. Nethery, Tullia Padellini, Jarvis T. Chen, Brent A. Coull, Frédéric B. Piel, Jon Wakefield, Marta Blangiardo, Lance A. Waller. "A Bayesian hierarchical small area population model accounting for data source specific methodologies from American Community Survey, Population Estimates Program, and Decennial census data." Ann. Appl. Stat. 18 (2) 1565-1595, June 2024. https://doi.org/10.1214/23-AOAS1849
Information