Statistical Science

Relaxation Penalties and Priors for Plausible Modeling of Nonidentified Bias Sources

Sander Greenland


In designed experiments and surveys, known laws or design features provide checks on the most relevant aspects of a model and identify the target parameters. In contrast, in most observational studies in the health and social sciences, the primary study data do not identify and may not even bound target parameters. Discrepancies between target and analogous identified parameters (biases) are then of paramount concern, which forces a major shift in modeling strategies. Conventional approaches are based on conditional testing of equality constraints, which correspond to implausible point-mass priors. When these constraints are not identified by available data, however, no such testing is possible. In response, implausible constraints can be relaxed into penalty functions derived from plausible prior distributions. The resulting models can be fit within familiar full or partial likelihood frameworks.

The absence of identification renders all analyses part of a sensitivity analysis. In this view, results from single models are merely examples of what might be plausibly inferred. Nonetheless, just one plausible inference may suffice to demonstrate inherent limitations of the data. Points are illustrated with misclassified data from a study of sudden infant death syndrome. Extensions to confounding, selection bias and more complex data structures are outlined.
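As a minimal sketch of the relaxation idea, consider a single group in which x of n subjects report an exposure that may be misclassified. The data identify only the probability of reported exposure, not the true prevalence, sensitivity, or specificity. The code below replaces fixed values of sensitivity and specificity with normal priors on their logits, which adds a quadratic penalty to the binomial log-likelihood, and then profiles the penalized log-likelihood over the prevalence. Everything here is an illustrative assumption rather than the article's analysis: the counts, the prior centers (0.90 and 0.95), the prior standard deviations, the grid search, and the function names are all hypothetical, and the 1.92-unit drop is used only as a rough display convention, not as a claim about coverage.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def penalized_loglik(pi, se, sp, x, n, prior):
    """Binomial log-likelihood of the reported count minus a prior penalty.

    pi is the true exposure prevalence (the target parameter); se and sp are
    sensitivity and specificity, bias parameters the data cannot identify.
    prior maps 'se' and 'sp' to (mean, sd) on the logit scale (hypothetical).
    """
    # Probability that a subject is *reported* as exposed.
    p_obs = se * pi + (1.0 - sp) * (1.0 - pi)
    loglik = x * math.log(p_obs) + (n - x) * math.log(1.0 - p_obs)
    # A normal prior on each logit contributes a quadratic penalty.
    penalty = 0.0
    for name, value in (("se", se), ("sp", sp)):
        mean, sd = prior[name]
        penalty += (logit(value) - mean) ** 2 / (2.0 * sd ** 2)
    return loglik - penalty


def profile(pi, x, n, prior, grid):
    """Maximize the penalized log-likelihood over (se, sp) on a coarse grid."""
    return max(penalized_loglik(pi, se, sp, x, n, prior)
               for se in grid for sp in grid)


def interval(x, n, prior, drop=1.92):
    """Range of pi whose profile value lies within `drop` of the maximum."""
    grid = [i / 100 for i in range(60, 100)]   # candidate se and sp values
    pis = [i / 100 for i in range(1, 100)]     # candidate prevalences
    prof = [profile(pi, x, n, prior, grid) for pi in pis]
    best = max(prof)
    kept = [pi for pi, value in zip(pis, prof) if value >= best - drop]
    return min(kept), max(kept)


# Near point-mass priors: se and sp effectively fixed at 0.90 and 0.95.
tight = {"se": (logit(0.90), 0.01), "sp": (logit(0.95), 0.01)}
# Relaxed priors: same centers, but real uncertainty about se and sp.
relaxed = {"se": (logit(0.90), 0.5), "sp": (logit(0.95), 0.5)}
```

Calling `interval(40, 100, tight)` and `interval(40, 100, relaxed)` on these made-up counts gives two ranges for the prevalence. The relaxed range is the wider of the two, because the penalty lets sensitivity and specificity move away from their prior centers at a cost instead of holding them fixed. Each choice of prior is one scenario in a sensitivity analysis, not a definitive answer.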

Article information

Statist. Sci. Volume 24, Number 2 (2009), 195-210.

First available in Project Euclid: 14 January 2010

Keywords: Bias; biostatistics; causality; epidemiology; measurement error; misclassification; observational studies; odds ratio; relative risk; risk analysis; risk assessment; selection bias; validation


Greenland, Sander. Relaxation Penalties and Priors for Plausible Modeling of Nonidentified Bias Sources. Statist. Sci. 24 (2009), no. 2, 195–210. doi:10.1214/09-STS291.


