The Annals of Applied Statistics

A general formulation for standardization of rates as a method to control confounding by measured and unmeasured disease risk factors

Steven D. Mark


Abstract

Standardization, a common approach for controlling confounding in population studies or data from disease registries, is defined to be a weighted average of stratum-specific rates. Typically, discussions on the construction of a particular standardized rate regard the strata as fixed, and focus on the considerations that affect the specification of weights. Each year the data from the SEER cancer registries are analyzed using a weighting procedure referred to as “direct standardization for age.” To evaluate the performance of direct standardization, we define a general class of standardization operators. We regard a particular standardized rate as the output of an operator applied to a given data set. Based on the functional form of the operators, we define a subclass of standardization operators that controls for confounding by measured risk factors. Using the fundamental disease probability paradigm for inference, we establish the conclusions that can be drawn from year-to-year contrasts of standardized rates produced by these operators in the presence of unmeasured cancer risk factors. These conclusions take the form of falsifying specific assumptions about the conditional probabilities of disease given all the risk factors (both measured and unmeasured), and the conditional probabilities of the unmeasured risk factors given the measured risk factors. We show the one-to-one correspondence between these falsifications and the inferences made from the contrasts of directly standardized rates reported each year in the Annual Report to the Nation on the Status of Cancer. We further show that the “direct standardization for age” procedure is not a member of the class of unconfounded standardization operators. Consequently, it can, and usually will, introduce confounding when confounding is not present in the data. We propose a particular standardization operator, the SCC operator, that is in the class of unconfounded operators. We contrast the mathematical properties of the SCC operator and the SEER operator (SCA), and present an analysis of SEER cancer registry data that demonstrates the consequences of these differences. We further prove that the SCC operator is a projection operator. We discuss how this property can enable the SCC operator to be developed as a method for comparing nested conditional expectations, in the same manner as is currently done with regression methods that control for confounding.
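The abstract's definition of standardization as a weighted average of stratum-specific rates can be illustrated with a minimal sketch. The function name, the three age strata, the rates, and the standard population counts below are all hypothetical values chosen for illustration; they are not SEER data. The sketch shows ordinary direct standardization only and does not implement the SCC or SCA operators, which the abstract names but does not define.

```python
def directly_standardized_rate(stratum_rates, standard_counts):
    """Return the weighted average of stratum-specific rates.

    stratum_rates   -- rate observed in each stratum (e.g. per 100,000)
    standard_counts -- size of each stratum in the chosen standard population;
                       the weights are these counts divided by their total
    """
    if len(stratum_rates) != len(standard_counts):
        raise ValueError("need exactly one weight per stratum")
    total = sum(standard_counts)
    if total <= 0:
        raise ValueError("standard population must have positive size")
    weighted_sum = sum(r * n for r, n in zip(stratum_rates, standard_counts))
    return weighted_sum / total


# Hypothetical rates per 100,000 in three age strata for two calendar years.
rates_year_a = [10.0, 50.0, 200.0]
rates_year_b = [12.0, 50.0, 180.0]

# Hypothetical standard population, held fixed across both years.
standard_counts = [500, 300, 200]

rate_a = directly_standardized_rate(rates_year_a, standard_counts)
rate_b = directly_standardized_rate(rates_year_b, standard_counts)

# Year-to-year contrast of the two standardized rates.
contrast = rate_b - rate_a
print(rate_a, rate_b, contrast)
```

Because the same weights are applied in both years, the contrast depends only on differences in the stratum-specific rates, not on shifts in the age distribution of the observed populations. Whether such a contrast is free of confounding by other risk factors is the question the paper addresses.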

Article information

Source
Ann. Appl. Stat. Volume 2, Number 3 (2008), 1103-1122.

Dates
First available in Project Euclid: 13 October 2008

Permanent link to this document
https://projecteuclid.org/euclid.aoas/1223908054

Digital Object Identifier
doi:10.1214/08-AOAS170

Mathematical Reviews number (MathSciNet)
MR2522173

Zentralblatt MATH identifier
05377261

Keywords
Cancer registry; cancer trends; causal inference; confounding; direct standardization; fundamental disease probability; SEER; standardization

Citation

Mark, Steven D. A general formulation for standardization of rates as a method to control confounding by measured and unmeasured disease risk factors. Ann. Appl. Stat. 2 (2008), no. 3, 1103--1122. doi:10.1214/08-AOAS170. https://projecteuclid.org/euclid.aoas/1223908054


References

  • Anderson, R. N. and Rosenberg, H. M. (1998). Age standardization of death rates: Implementation of the year 2000 standard. National Vital Statistics Reports 47.
  • Dudley, R. M. (1989). Real Analysis and Probability, 1st ed. Wadsworth, Pacific Grove, CA.
  • Edwards, B. K., Howe, H. L., Ries, L. A. G. et al. (2002). Annual report to the nation on the status of cancer, 1973–1999, featuring implications of age and aging on U.S. cancer burden. Cancer 94 2766–2792.
  • Edwards, B. K., Brown, M. L., Wingo, P. A. et al. (2005). Annual report to the nation on the status of cancer, 1975–2002, featuring population-based trends in cancer treatment. J. National Cancer Institute 97 1407–1427.
  • Giovannucci, E. (2002). Epidemiologic studies of folate and colorectal neoplasia: A review. J. Nutrition 132 2350S–2355S.
  • Howe, H. L., Wingo, P. A., Thun, M. J. et al. (2001). Annual report to the nation on the status of cancer (1973 through 1998), featuring cancers with recent increasing trends. J. National Cancer Institute 93 824–842.
  • Howe, H. L., Wu, X., Ries, L. A. G. et al. (2006). Annual report to the nation on the status of cancer, 1975–2003, featuring cancer among U.S. Hispanic/Latino populations. Cancer 107 1711–1742.
  • Jemal, A., Clegg, L. X., Ward, E. et al. (2004). Annual report to the nation on the status of cancer, 1975–2001, with a special feature regarding survival. Cancer 101 3–27.
  • Klein, R. J. and Schoenborn, C. A. (2001). Age adjustment procedures using the 2000 projected U.S. population. Healthy People Statistical Notes 20. National Center for Health Statistics, Hyattsville, MD.
  • Mark, S. D. (2004). A formal approach for defining and identifying the fundamental effects of exposures on disease from a series of experiments conducted on populations of non-identical subjects. In Proc. Amer. Statist. Assoc. 3120–3142. American Statistical Association, Alexandria, VA.
  • Mark, S. D. (2005). Using V-range maps to locate exposure regions where observable contrasts identify the effects of exposure on contrasts of fundamental disease probabilities. In Proc. Amer. Statist. Assoc. 299–305. American Statistical Association, Alexandria, VA.
  • Mark, S. D. (2006). Fundamental disease probability inference: A new paradigm for causal inference in the biological sciences. In Proc. Amer. Statist. Assoc. 283–290. American Statistical Association, Alexandria, VA.
  • Mark, S. D. (2008). Supplement to “A general formulation for standardization of rates as a method to control confounding by measured and unmeasured disease risk factors.” DOI: 10.1214/08-AOAS170SUPP.
  • Quinlivan, E. P. and Gregory, J. F. III. (2003). Effect of food fortification on folic acid intake in the United States. American J. Clinical Nutrition 77 221–225.
  • R Development Core Team. (2007). R: A language and environment for statistical computing. R Foundation for Statistical Computing. Vienna, Austria.
  • Ries, L. A. G., Eisner, M. P. and Kosary, C. L. (2005). SEER Cancer Statistics Review, 1975–2002. National Cancer Institute, Bethesda, MD.
  • Rothman, K. J. (1986). Modern Epidemiology, 1st ed. Little, Brown, and Company, New York.
  • SEER, Surveillance, Epidemiology and End Results Program (2005a). NIH Publication No. 05-4772, National Cancer Institute.
  • SEER, Surveillance, Epidemiology and End Results Program (2005b). SEER 13 Regs Limited_USE, Nov. 2005 Sub (1992–2003). National Cancer Institute, DCCPS, Surveillance Research Program, Cancer Statistics Branch, Washington, D.C.
  • The Surveillance Research Program of the Division of Cancer Control and Population Sciences (2007). SEER*Stat 6.3.3. National Cancer Institute.
  • Ward, E. M., Thun, M. J., Hanna, L. M. et al. (2006). Interpreting cancer trends. Ann. New York Acad. Sci. 1076 29–53.
  • Weir, H. K., Thun, M. J., Hankey, B. F. et al. (2003). Annual report to the nation on the status of cancer, 1975–2000, featuring the uses of surveillance data for cancer prevention and control. J. National Cancer Institute 95 1276–1299.
