The Annals of Applied Statistics
- Ann. Appl. Stat.
- Volume 2, Number 3 (2008), 1103-1122.
A general formulation for standardization of rates as a method to control confounding by measured and unmeasured disease risk factors
Standardization, a common approach for controlling confounding in population studies or data from disease registries, is defined to be a weighted average of stratum-specific rates. Typically, discussions of the construction of a particular standardized rate regard the strata as fixed and focus on the considerations that affect the specification of weights. Each year the data from the SEER (Surveillance, Epidemiology, and End Results) cancer registries are analyzed using a weighting procedure referred to as “direct standardization for age.” To evaluate the performance of direct standardization, we define a general class of standardization operators. We regard a particular standardized rate as the output of an operator applied to a given data set. Based on the functional form of the operators, we define a subclass of standardization operators that controls for confounding by measured risk factors. Using the fundamental disease probability paradigm for inference, we establish the conclusions that can be drawn from year-to-year contrasts of standardized rates produced by these operators in the presence of unmeasured cancer risk factors. These conclusions take the form of falsifying specific assumptions about the conditional probabilities of disease given all the risk factors (both measured and unmeasured), and the conditional probabilities of the unmeasured risk factors given the measured risk factors. We show the one-to-one correspondence between these falsifications and the inferences made from the contrasts of directly standardized rates reported each year in the Annual Report to the Nation on the Status of Cancer. We further show that the “direct standardization for age” procedure is not a member of the class of unconfounded standardization operators. Consequently, it can, and usually will, introduce confounding when confounding is not present in the data. We propose a particular standardization operator, the SCC operator, that is in the class of unconfounded operators.
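The abstract defines a standardized rate as a weighted average of stratum-specific rates, with direct standardization for age as the special case in which the weights come from a fixed standard population. The Python sketch below illustrates only that textbook definition; it is not the SEER (SCA) operator or the SCC operator as constructed in the paper. The helper names and every number (three age strata, the standard population, case counts, person-years) are hypothetical.

```python
from typing import List, Sequence


def standardized_rate(rates: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted average of stratum-specific rates (weights are normalized to sum to 1)."""
    if len(rates) != len(weights):
        raise ValueError("rates and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return sum(r * w for r, w in zip(rates, weights)) / total


def stratum_rates(cases: Sequence[int], person_years: Sequence[float]) -> List[float]:
    """Stratum-specific rates: cases divided by person-years in each stratum."""
    return [c / py for c, py in zip(cases, person_years)]


def crude_rate(cases: Sequence[int], person_years: Sequence[float]) -> float:
    """Unstandardized rate: implicitly weights strata by the data's own person-years."""
    return sum(cases) / sum(person_years)


def direct_standardized_rate(cases, person_years, standard_pop) -> float:
    """Direct standardization: rates from the study data, weights from a fixed standard population."""
    return standardized_rate(stratum_rates(cases, person_years), standard_pop)


# Hypothetical data for three age strata (illustrative only, not SEER figures).
standard_pop = [500, 300, 200]                        # fixed standard-population weights
year_1 = ([10, 40, 150], [100_000, 80_000, 50_000])   # (cases, person-years)
year_2 = ([8, 40, 210], [80_000, 80_000, 70_000])     # same stratum rates, older population

for cases, py in (year_1, year_2):
    # Rates per 100,000 person-years: crude, then directly standardized.
    print(round(crude_rate(cases, py) * 1e5, 1),
          round(direct_standardized_rate(cases, py, standard_pop) * 1e5, 1))
# prints "87.0 80.0" then "112.2 80.0"
```

Because the two hypothetical years share identical stratum-specific rates, the directly standardized rates agree, while the crude rates, which weight each stratum by that year's own age distribution, differ.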
We contrast the mathematical properties of the SCC and the SEER operator (SCA), and present an analysis of SEER cancer registry data that demonstrates the consequences of these differences. We further prove that the SCC operator is a projection operator. We discuss how this property can enable the SCC operator to be developed as a method for comparing nested conditional expectations in the same manner as is currently done with regression methods that control for confounding.
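A projection operator is idempotent: applying it a second time changes nothing, $P(P(y)) = P(y)$. The SCC operator itself is defined in the paper and is not reproduced here. As a generic, hypothetical illustration of the property and of the regression analogy, the sketch below uses the empirical conditional mean within strata, which is what least-squares regression on stratum indicator variables computes. The function name and the data are invented for illustration.

```python
from collections import defaultdict
from typing import Hashable, List, Sequence


def stratum_mean_projection(values: Sequence[float],
                            strata: Sequence[Hashable]) -> List[float]:
    """Replace each value by the mean of its stratum (empirical E[Y | stratum])."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for v, s in zip(values, strata):
        sums[s] += v
        counts[s] += 1
    return [sums[s] / counts[s] for s in strata]


y = [1.0, 3.0, 2.0, 6.0, 4.0]
fine = ["a", "a", "b", "b", "c"]        # finer strata
coarse = ["ab", "ab", "ab", "ab", "c"]  # each coarse stratum is a union of fine strata

once = stratum_mean_projection(y, fine)
print(once)                                              # [2.0, 2.0, 4.0, 4.0, 4.0]
print(stratum_mean_projection(once, fine) == once)       # True: idempotent
print(stratum_mean_projection(once, coarse)
      == stratum_mean_projection(y, coarse))             # True: nested projections
```

The last check reflects the nesting mentioned above: projecting onto fine strata and then onto coarser strata built from them gives the same result as projecting onto the coarse strata directly.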
First available in Project Euclid: 13 October 2008
Mark, Steven D. A general formulation for standardization of rates as a method to control confounding by measured and unmeasured disease risk factors. Ann. Appl. Stat. 2 (2008), no. 3, 1103-1122. doi:10.1214/08-AOAS170. https://projecteuclid.org/euclid.aoas/1223908054
- Supplementary material: Fundamental disease probability inference: A new paradigm for causal inference in the biological sciences.