Statistical Science

Statistical Issues in Studies of the Long-Term Effects of Air Pollution: The Southern California Children’s Health Study

Kiros Berhane, W. James Gauderman, Daniel O. Stram, and Duncan C. Thomas



In this article we discuss statistical techniques for modeling data from cohort studies that examine long-term effects of air pollution on children’s health by comparing data from multiple communities with diverse pollution profiles. Under a general multilevel modeling paradigm, we discuss models for different outcome types along with their connections to the generalized mixed effects models methodology. The model specifications include linear and flexible models for continuous lung function data; logistic and/or time-to-event models for symptoms data, which account for misspecifications via hidden Markov models; and Poisson models for school absence counts. The main aim of the modeling scheme is to estimate effects at various levels (e.g., within subjects across time, within communities across subjects and between communities). We also discuss in detail various recurring issues such as ecologic bias, exposure measurement error, multicollinearity in multipollutant models, interrelationships between major endpoints and choice of appropriate exposure metrics. The key conceptual issues and recent methodologic advances are reviewed, with illustrative results from the Southern California Children’s Health Study, a 10-year study of the effects of air pollution on children’s respiratory health.
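
As a rough illustration of the three levels named above, a generic three-level linear mixed model for a continuous outcome such as lung function could be written as follows. This is a minimal sketch, not necessarily the specification used in the article: the indices (community $c$, subject $i$, visit $j$), the covariate vector $z_{ci}$, the community-level exposure $X_c$ and all coefficient names are assumptions introduced here for illustration only.

```latex
% Hypothetical notation: c = community, i = subject, j = visit (all assumed).
\begin{align*}
  % Level 1: within subjects across time (subject-specific growth line)
  y_{cij} &= a_{ci} + b_{ci}\, t_{cij} + e_{cij},
      & e_{cij} &\sim N(0, \sigma^2), \\
  % Level 2: within communities across subjects (subject slopes vs. covariates)
  b_{ci}  &= B_c + \gamma^{\top} z_{ci} + e_{ci},
      & e_{ci} &\sim N(0, \tau^2), \\
  % Level 3: between communities (community slopes vs. ambient exposure)
  B_c     &= \beta_0 + \beta_1 X_c + e_c,
      & e_c &\sim N(0, \omega^2).
\end{align*}
```

In a formulation of this kind, $\beta_1$ would be read as the between-community effect of long-term exposure on the average rate of change in the outcome, while $\gamma$ captures subject-level effects within communities.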

Article information

Statist. Sci. Volume 19, Number 3 (2004), 414-449.

First available: 16 March 2005


Berhane, Kiros; Gauderman, W. James; Stram, Daniel O.; Thomas, Duncan C. Statistical Issues in Studies of the Long-Term Effects of Air Pollution: The Southern California Children’s Health Study. Statistical Science 19 (2004), no. 3, 414--449. doi:10.1214/088342304000000413.


