## The Annals of Applied Statistics

- Ann. Appl. Stat.
- Volume 11, Number 1 (2017), 93-113.

### Covariate-adaptive clustering of exposures for air pollution epidemiology cohorts

Joshua P. Keller, Mathias Drton, Timothy Larson, Joel D. Kaufman, Dale P. Sandler, and Adam A. Szpiro

#### Abstract

Cohort studies in air pollution epidemiology aim to establish associations between health outcomes and air pollution exposures. Statistical analysis of such associations is complicated by the multivariate nature of the pollutant exposure data as well as the spatial misalignment that arises from the fact that exposure data are collected at regulatory monitoring network locations distinct from cohort locations. We present a novel clustering approach for addressing this challenge. Specifically, we present a method that uses geographic covariate information to cluster multi-pollutant observations and predict cluster membership at cohort locations. Our predictive $k$-means procedure identifies centers using a mixture model and is followed by multiclass spatial prediction. In simulations, we demonstrate that predictive $k$-means can reduce misclassification error by over 50% compared to ordinary $k$-means, with minimal loss in cluster representativeness. The improved prediction accuracy results in large gains of 30% or more in power for detecting effect modification by cluster in a simulated health analysis. In an analysis of the NIEHS Sister Study cohort using predictive $k$-means, we find that the association between systolic blood pressure (SBP) and long-term fine particulate matter (PM$_{2.5}$) exposure varies significantly between different clusters of PM$_{2.5}$ component profiles. Our cluster-based analysis shows that, for subjects assigned to a cluster located in the Midwestern U.S., a 10 $\mu$g/m$^{3}$ difference in exposure is associated with 4.37 mmHg (95% CI, 2.38, 6.35) higher SBP.
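To make the two-stage idea in the abstract concrete, the following is a minimal sketch, not the authors' predictive $k$-means. It uses ordinary $k$-means (the baseline the abstract compares against) to cluster multi-pollutant observations at monitor locations. It then fits a plain multinomial logistic regression on geographic covariates to predict cluster membership at cohort locations. All data are synthetic, and every name here (`kmeans`, `fit_softmax`, `predict_cluster`, `R_mon`, `X_mon`, `R_cohort`) is hypothetical. The mixture-model step that the paper uses to choose centers is not reproduced.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Ordinary Lloyd's k-means. Returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Keep the old center if a cluster happens to be empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Final assignment against the final centers.
    dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return centers, dist.argmin(axis=1)

def fit_softmax(R, labels, k, lr=0.1, n_iter=2000):
    """Multinomial logistic regression fitted by plain gradient descent."""
    R1 = np.column_stack([np.ones(len(R)), R])   # add intercept column
    Y = np.eye(k)[labels]                        # one-hot cluster labels
    W = np.zeros((R1.shape[1], k))
    for _ in range(n_iter):
        Z = R1 @ W
        Z = Z - Z.max(axis=1, keepdims=True)     # numerical stability
        P = np.exp(Z)
        P = P / P.sum(axis=1, keepdims=True)
        W -= lr * R1.T @ (P - Y) / len(R)
    return W

def predict_cluster(W, R):
    """Most probable cluster label for each row of covariates R."""
    R1 = np.column_stack([np.ones(len(R)), R])
    return (R1 @ W).argmax(axis=1)

# Synthetic example (assumed data, for illustration only).
rng = np.random.default_rng(1)
k = 2
R_mon = rng.normal(size=(200, 2))                 # covariates at monitors
region = (R_mon[:, 0] > 0).astype(int)            # covariate-driven regime
profiles = np.array([[0.0, 0.0, 0.0], [5.0, 5.0, 5.0]])
X_mon = profiles[region] + rng.normal(size=(200, 3))  # pollutant profiles

centers, labels_mon = kmeans(X_mon, k)            # stage 1: cluster exposures
W = fit_softmax(R_mon, labels_mon, k)             # stage 2: learn membership

R_cohort = rng.normal(size=(50, 2))               # covariates at cohort sites
pred_cohort = predict_cluster(W, R_cohort)        # predicted cluster labels
```

In this baseline the centers are chosen without regard to how well membership can be predicted from the covariates. According to the abstract, predictive $k$-means addresses this limitation by identifying centers with a mixture model.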

#### Article information

**Source**

Ann. Appl. Stat., Volume 11, Number 1 (2017), 93-113.

**Dates**

Received: December 2015

Revised: August 2016

First available in Project Euclid: 8 April 2017

**Permanent link to this document**

https://projecteuclid.org/euclid.aoas/1491616873

**Digital Object Identifier**

doi:10.1214/16-AOAS992

**Mathematical Reviews number (MathSciNet)**

MR3634316

**Zentralblatt MATH identifier**

1366.62256

**Keywords**

Air pollution; clustering; dimension reduction; particulate matter

#### Citation

Keller, Joshua P.; Drton, Mathias; Larson, Timothy; Kaufman, Joel D.; Sandler, Dale P.; Szpiro, Adam A. Covariate-adaptive clustering of exposures for air pollution epidemiology cohorts. Ann. Appl. Stat. 11 (2017), no. 1, 93-113. doi:10.1214/16-AOAS992. https://projecteuclid.org/euclid.aoas/1491616873

#### Supplemental materials

- Supplemental material for “Covariate-adaptive clustering of exposures for air pollution epidemiology cohorts”. The Supplemental Material document contains details of the algorithm for selecting predictive $k$-means cluster centers, additional results from the simulations, sensitivity results from the PM$_{2.5}$ analysis that use different numbers of clusters, and the results from applying $k$-means clustering to the PM$_{2.5}$ data.
  Digital Object Identifier: doi:10.1214/16-AOAS992SUPP
  Supplemental files are immediately available to subscribers. Non-subscribers gain access to supplemental files with the purchase of the article.