Statistical Science

Publication bias in meta-analysis: a Bayesian data-augmentation approach to account for issues exemplified in the passive smoking debate

Geof H. Givens, D. D. Smith, and R. L. Tweedie

Full-text: Open access


"Publication bias" is a relatively new statistical phenomenon that only arises when one attempts through a meta-analysis to review all studies, significant or insignificant, in order to provide a total perspective on a particular issue. This has recently received some notoriety as an issue in the evaluation of the relative risk of lung cancer associated with passive smoking, following legal challenges to a 1992 Environmental Protection Agency analysis which concluded that such exposure is associated with significant excess risk of lung cancer.

We introduce a Bayesian approach which estimates and adjusts for publication bias. Estimation is based on a data-augmentation principle within a hierarchical model, and the number and outcomes of unobserved studies are simulated using Gibbs sampling methods. This technique yields a quantitative adjustment for the passive smoking meta-analysis. We estimate that there may be both negative and positive but insignificant studies omitted, and that failing to allow for these would mean that the estimated excess risk may be overstated by around 30%, both in U.S. studies and in the global collection of studies.

Article information

Statist. Sci., Volume 12, Number 4 (1997), 221-250.

First available in Project Euclid: 22 August 2002

Keywords: meta-analysis; publication bias; missing data; data augmentation; Markov chain Monte Carlo (MCMC); Gibbs sampling; environmental tobacco smoke (ETS); passive smoking; lung cancer; file-drawer problem


Givens, Geof H.; Smith, D. D.; Tweedie, R. L. Publication bias in meta-analysis: a Bayesian data-augmentation approach to account for issues exemplified in the passive smoking debate. Statist. Sci. 12 (1997), no. 4, 221-250. doi:10.1214/ss/1030037958.



  • (1992). Pre-existing lung disease and lung cancer among nonsmoking women. American Journal of Epidemiology 6 623-632.
  • Berlin, J. A., Begg, C. B. and Louis, T. A. (1989). An assessment of publication bias using a sample of published clinical trials. J. Amer. Statist. Assoc. 84 381-392.
  • Bero, L. A., Glantz, S. A. and Rennie, D. (1994). Publication bias and public health policy on environmental tobacco smoke. Journal of the American Medical Association 272 133-136.
  • Besag, J. E. and Green, P. J. (1993). Spatial statistics and Bayesian computation (with discussion). J. Roy. Statist. Soc. Ser. B 55 25-37.
  • Biggerstaff, B. J., Mengersen, K. L. and Tweedie, R. L. (1994). Passive smoking in the workplace: a classical and Bayesian meta-analysis. International Archives of Occupational and Environmental Health 66 269-277.
  • Biggerstaff, B. J. and Tweedie, R. L. (1996). Incorporating variability in estimates of heterogeneity in the random effects model in meta-analysis. Statistics in Medicine 16 753-768.
  • Carlin, J. B. (1992). Meta-analysis for 2 × 2 tables: a Bayesian approach. Statistics in Medicine 11 141-158.
  • Chalmers, T. C. (1991). Problems induced by meta-analysis. Statistics in Medicine 10 971-980.
  • Cooper, H. and Hedges, L. V., eds. (1994). The Handbook of Research Synthesis. Russell Sage Foundation, New York.
  • Crossen, C. (1994). Tainted Truth: The Manipulation of Fact in America. Simon and Schuster, New York.
  • Dear, K. B. G. (1995). Personal communication.
  • Dear, K. B. G. and Begg, C. B. (1992). An approach for assessing publication bias prior to performing a meta-analysis. Statist. Sci. 7 237-245.
  • Dempster, A. P., Laird, N. M. and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Statist. Soc. Ser. B 39 1-38.
  • Doll, R. (1986). The aetiology of the Spanish toxic shock syndrome: interpretation of the epidemiological evidence. Report to the WHO Regional Office for Europe.
  • DuMouchel, W. (1990). Bayesian meta-analysis. In Statistical Methods for Pharmacology (D. Berry, ed.) 509-529. Dekker, New York.
  • Eberly, L. E. and Casella, G. (1996). Estimating the number of unseen studies. Technical Report BUM 1308-MA, Biometrics Unit, Cornell Univ.
  • EPA (1990). Health effects of passive smoking: assessment of lung cancer in adults and respiratory disorders in children. Draft report, United States EPA, Washington, D.C.
  • EPA (1992). Health Effects of Passive Smoking: Assessment of Lung Cancer in Adults and Respiratory Disorders in Children. United States EPA, National Academy Press, Washington, D.C.
  • Felson, D. T. (1992). Bias in meta-analytic research. Journal of Clinical Epidemiology 45 885-892.
  • Geman, S. and Geman, D. (1984). Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence 6 721-741.
  • Gleser, L. J. and Olkin, I. (1996). Models for estimating the number of unpublished studies. Statistics in Medicine 15 2493-2507.
  • Hedges, L. V. (1992). Modeling publication selection effects in meta-analysis. Statist. Sci. 7 246-255.
  • Hedges, L. V. and Olkin, I. (1985). Statistical Methods for Meta-Analysis. Academic Press, New York.
  • Hirayama, T. (1981). Nonsmoking wives of heavy smokers have a higher risk of lung cancer: a study from Japan. British Medical Journal 282 183-185.
  • Hirayama, T. (1984). Lung cancer in Japan: effects of nutrition and exposure to ETS. In Lung Cancer: Causes and Preventions 175-195. Verlag Chemie, Weinheim.
  • Iyengar, S. and Greenhouse, J. B. (1988). Selection models and the file drawer problem (with discussion). Statist. Sci. 3 109-135.
  • Janerich, D. T., Thompson, W. D., Varela, L. R., Greenwald, P., Chorost, S., Tucci, C., Zaman, M. B., Melamed, M. R., Kiely, M. and McKneally, M. F. (1990). Lung cancer and exposure to tobacco smoke in the household. New England Journal of Medicine 323 632-636.
  • Kawachi, I. and Colditz, G. C. (1996). Invited commentary: confounding, measurement error, and publication bias in studies of passive smoking. American Journal of Epidemiology 144 909-915.
  • LaFleur, B., Taylor, S. J., Smith, D. D. and Tweedie, R. L. (1996). Bayesian assessment of publication bias in meta-analyses of cervical cancer and oral contraceptives. In Proceedings of the 1996 Epidemiology Section of the Joint Statistical Meetings 32-37. Amer. Statist. Assoc., Alexandria, VA.
  • Larose, D. T. and Dey, D. K. (1995). Modeling publication bias using weighted distributions in a Bayesian framework. Technical Report 95-02, Dept. Statistics, Univ. Connecticut.
  • Lee, P. N. (1992). Environmental Tobacco Smoke and Mortality. Karger, Basel.
  • Light, R. J. and Pillemer, D. B. (1984). Summing Up: The Science of Reviewing Research. Harvard Univ. Press.
  • Mausner, J. S. and Kramer, S. (1985). Mausner & Bahn Epidemiology: An Introductory Text, 2nd ed. Saunders, Philadelphia.
  • Mengersen, K. L., Tweedie, R. L. and Biggerstaff, B. J. (1995). The impact of method choice in meta-analysis. Austral. J. Statist. 7 19-44.
  • Mosteller, F. and Chalmers, T. C. (1992). Some progress and problems in meta-analysis of clinical trials. Statist. Sci. 7 227-236.
  • NH&MRC (1995). The Health Effects of Passive Smoking: Draft Report. NH&MRC Working Party, Canberra, Australia.
  • NRC (1992). Combining Information: Statistical Issues and Opportunities for Research. National Academy Press, Washington, D.C. (Report of the National Research Council Committee on Applied and Theoretical Statistics.)
  • OEHHA (1996). Carcinogenic effects of exposure to environmental tobacco smoke. Excerpt: ETS and lung cancer. Review draft report, Reproductive and Cancer Hazard Assessment Section, Office of Environmental Health Hazard Assessment, CA.
  • OSHA (1994). Proposed rule on indoor air quality. Federal Register 59(65) 15968-16039. (Draft regulation.)
  • Olkin, I. (1992). Meta-analysis: methods for combining independent studies. Statist. Sci. 7 226.
  • Patil, G. P. and Taillie, C. (1989). Probing encountered data, meta-analysis and weighted distribution methods. In Statistical Data Analysis and Inference (Y. Dodge, ed.). North-Holland, Amsterdam.
  • Paul, N. L. (1995). Non-parametric classes of weight functions to model publication bias. Technical Report 622, Dept. Statistics, Carnegie Mellon Univ.
  • Press, W. H., Flannery, B. P., Teukolsky, S. A. and Vetterling, W. T. (1986). Numerical Recipes: The Art of Scientific Computing. Cambridge Univ. Press.
  • Raftery, A. E. and Lewis, S. M. (1992a). How many iterations in the Gibbs sampler? In Bayesian Statistics 4 (J. M. Bernardo, J. O. Berger, A. P. Dawid and A. F. M. Smith, eds.) 763-773. Oxford Univ. Press.
  • Raftery, A. E. and Lewis, S. M. (1992b). One long run with diagnostics: implementation strategies for Markov chain Monte Carlo. Statist. Sci. 7 493-497.
  • Raftery, A. E. and Lewis, S. M. (1995). The number of iterations, convergence diagnostics and generic Metropolis algorithms. In Practical Markov Chain Monte Carlo (W. R. Gilks, D. J. Spiegelhalter and S. Richardson, eds.). Chapman and Hall, London.
  • Schneiderman, M. A., Davis, D. L. and Wagener, D. K. (1989). Lung cancer that is not attributable to smoking: letter to the editor. Journal of the American Medical Association 261 2635-2636.
  • Simes, R. J. (1996). Strategies for minimising bias in systematic reviews of randomised trials. Presented at Sydney International Statistical Congress, Sydney, Australia.
  • Smith, D. D., Givens, G. H. and Tweedie, R. L. (1997). Adjustment for publication and quality bias in Bayesian metaanalysis. Unpublished manuscript.
  • Smith, A. F. M. and Roberts, G. O. (1993). Bayesian computation via the Gibbs sampler and related Markov chain Monte Carlo methods (with discussion). J. Roy. Statist. Soc. Ser. B 55 3-23.
  • Spiegelhalter, D., Thomas, A., Best, N. and Gilks, W. (1996). BUGS: Bayesian Inference Using Gibbs Sampling, v. 0.50. MRC Biostatistics Unit, Institute of Public Health, Cambridge.
  • Sterling, T. D., Rosenbaum, W. L. and Weinkam, J. J. (1995). Publication decisions revisited: the effect of the outcome of statistical tests on the decision to publish and vice versa. Amer. Statist. 49 108-112.
  • Tanner, M. A. and Wong, W. H. (1987). The calculation of posterior distributions by data augmentation (with discussion). J. Amer. Statist. Assoc. 82 528-550.
  • Taylor, S. J. and Tweedie, R. L. (1997). Assessing sensitivity to multiple factors in calculating attributable risks. Environmetrics 8 351-372.
  • Terrell, G. R. (1990). The maximal smoothing principle in density estimation. J. Amer. Statist. Assoc. 85 470-477.
  • Thompson, S. G. (1993). Controversies in meta-analysis: the case of the trials of serum cholesterol reduction. Statistical Methods in Medical Research 2 173-192.
  • Thompson, S. G. and Pocock, S. J. (1991). Can meta-analyses be trusted? The Lancet 338 1127-1130.
  • Trichopoulos, D., Kalandidi, A. and Sparros, L. (1983). Lung cancer and exposure to ETS. Conclusion of Greek study. The Lancet ii 677-678.
  • Trichopoulos, D., Kalandidi, A., Sparros, L. and MacMahon, B. (1981). Lung cancer and exposure to ETS. International Journal of Cancer 27 1-4.
  • Tweedie, R. L., Mengersen, K. L. and Eccleston, J. A. (1994). Garbage in, garbage out: can statisticians quantify the effects of poor data? Chance 7(2) 20-27.
  • Tweedie, R. L., Scott, D. J., Biggerstaff, B. J. and Mengersen, K. L. (1996). Bayesian meta-analysis, with application to studies of environmental tobacco smoke and lung cancer. Lung Cancer 14 Suppl. 1 S171-S194.
  • Vandenbroucke, J. P. (1988). Passive smoking and lung cancer: a publication bias? British Medical Journal 296 391-392.
  • Varela, L. R. (1987). Assessment of the association between passive smoking and lung cancer. Ph.D. dissertation, Yale Univ.
  • Wynder, E. L. (1987). Workshop on guidelines to the epidemiology of weak associations. Preventive Medicine 16 139-141.
Comment by Colin B. Begg (excerpt)

[…] presence (for a review, see Begg and Berlin, 1988). This plot of effect size versus sample size, or more properly the variance of the effect size, should be symmetric in the absence of bias. If there is a systematic preference for publishing results favoring (or opposing) the hypothesis of interest, this will have the effect of skewing the graph. A relatively simple significance test can thus be constructed based on the rank correlation between the effect sizes and their variances, suitably standardized to ensure that the studies are i.i.d. (Begg and Mazumdar, 1994). I have performed this test using the data reported in Table 2 of Tweedie et al. (1996), including the study by Butler although it appears to be omitted from the analysis by Givens, Smith and Tweedie. This results in an adjusted rank correlation of 0.18 and a corresponding two-sided p-value of 0.13. The sample size for this test is the number of component studies in the meta-analysis, namely 36, so its power is limited. Nonetheless the results show a nonsignificant trend supporting the concept of bias; that is, the studies with the smallest p-values tend to be the ones with the smaller sample sizes.

Bias of this nature has a differential impact on the type of analysis performed. Givens, Smith and Tweedie, like many other commentators, favor a random effects approach to the analysis. In fact the traditional random effects method (DerSimonian and Laird, 1986) is much more susceptible to publication bias, in the absence of adjustment for bias, than the fixed effects approach. In both of these methods, the summary effect size is a weighted average of the individual effect sizes; only the weights differ. In the fixed effects approach the […]
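The rank correlation test Begg applies can be sketched in a few lines. The construction below follows Begg and Mazumdar (1994) as commonly described (standardized deviates from the pooled fixed-effect estimate, correlated with the variances); the data are illustrative placeholders, not the 36 ETS studies.

```python
import math
from itertools import combinations

def kendall_tau(x, y):
    """Kendall's tau-a (no tie correction) between equal-length sequences."""
    n = len(x)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

def begg_mazumdar(effects, variances):
    """Rank-correlation test for funnel-plot asymmetry (Begg and
    Mazumdar, 1994), with a normal approximation for the p-value."""
    n = len(effects)
    w = [1.0 / v for v in variances]
    pooled = sum(wi * t for wi, t in zip(w, effects)) / sum(w)  # fixed effect
    # standardized deviates; v_i - 1/sum(w) removes the correlation
    # induced by subtracting the pooled mean
    t_star = [(t - pooled) / math.sqrt(v - 1.0 / sum(w))
              for t, v in zip(effects, variances)]
    tau = kendall_tau(t_star, variances)
    z = 3 * tau * math.sqrt(n * (n - 1)) / math.sqrt(2 * (2 * n + 5))
    p_two_sided = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return tau, p_two_sided

# invented log-relative-risks and variances, chosen so that the small
# (high-variance) studies report the larger effects
effects = [0.30, 0.25, 0.10, 0.05, 0.40, -0.05, 0.15, 0.20]
variances = [0.10, 0.08, 0.02, 0.01, 0.15, 0.03, 0.05, 0.09]
tau, p = begg_mazumdar(effects, variances)
```

On this deliberately asymmetric toy data the correlation is strongly positive and significant; on the real ETS data Begg reports 0.18 with p = 0.13.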
[…] al. (1996). I obtain a fixed effects estimate of 1.17 (1.08, 1.26) and a random effects estimate of 1.21 (1.09, 1.35), and so the additional bias in the random effects estimator caused by this phenomenon would appear to be about 0.04. Although the relative impact of bias on these methods, if it exists, is fairly small in this example, it can be profound if the range of variances is large and the effect of selective publication is strong, as was the case in the important meta-analysis of the risks of cancer due to the chlorination of the water supply (Morris et al., 1992). In general, uncritical use of the random effects method is hazardous, in my opinion.

The use of the funnel graph and the analogous rank correlation test is not the only contrast available for detecting publication bias. In fact this approach is largely dependent for its power on the existence of a broad range of variances among the individual component studies, and this has been examined quantitatively by simulation (Begg and Mazumdar, 1994). A completely different structure for tackling the problem, and one which does not rely in any fundamental way on the variances differing from study to study, is to use "selection modeling," and this is the general framework employed by Givens, Smith and Tweedie in the spirit of earlier work by Iyengar and Greenhouse (1988), Hedges (1992) and Dear and Begg (1992). All of these authors have elected to assume that the selection probability is a function of the p-value. Conceptually, what happens in these models is that the pattern of the distribution of p-values is examined to see if it is consistent with what would be expected in the absence of bias. If there are gaps in the anticipated pattern, then their presence is attributed to missing unpublished studies, the impact of which is imputed to make the bias adjustments. It is easiest to conceptualize this in the context of the null hypothesis of no effect. In this setting the p-values should follow a uniform distribution on (0, 1). Selective publication of statistically significant studies will lead to a concentration of p-values at the lower end of the sample space. However, this pattern could also be due to a true nonzero effect. Thus, the effects of a true signal and of publication bias are hard to disentangle, and the leverage for doing so is entirely bound up in the modeling assumptions, notably the assumption of normal distributions for the observed effect sizes, the assumption of known variances and the nature of the random effects distribution. My own experiences with this kind of approach lead me to believe that it is not a sound basis for making inferences about the true effect size, and that these models are useful only as part of a set of semiformal tools for identifying bias, rather than for correcting it (Dear and Begg, 1992). Indeed the "simulation" studies presented by Givens, Smith and Tweedie, which appear to be simply two applications of the method using data generated from a known model, do not inspire confidence that the model will be reliable in making accurate bias corrections in general.

A final concern I have is with the selection of prior distributions. As so often occurs in the application of fully Bayesian methods, the priors appear to be picked out of thin air without any substantive justification. In the primary analysis, the use of a N(0, 0.15²) prior is essentially akin to adding a new study to the meta-analysis with effect size zero. That is, this imaginary study has a relative risk of 1 and a confidence interval ranging from 0.75 to 1.34. A glance at Figure 1 of Givens, Smith and Tweedie shows that such a study would be among the larger of the existing studies. Moreover, since it is centered on the null hypothesis, its inclusion clearly tilts the analysis in favor of the null. The sensitivity analyses of this issue are unconvincing to me, since even though the posterior means only range from 1.12 to 1.15 (Table 2), this is quite a large difference in the context of the analysis, especially regarding the conclusion in the Abstract about the 30% overstatement of risk. The only prior that would make any sense to me in this context is the noninformative prior, and the implied advocacy of a highly informative prior centered on the null would seem to me to be very poor advice for any future users of this methodology. The other priors are similarly unappealing to me. The restricted uniform priors on the weights seem contrived, and also tilted in favor of publication bias with no clear rationale. The need to generate study variances via a prior distribution also seems contrived, and tangential to the fundamental goals of the analysis.
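Begg's earlier point that the fixed and random effects summaries are both weighted means, differing only in their weights, can be sketched with the standard DerSimonian-Laird moment estimator. The data below are invented for illustration, with the small (high-variance) studies reporting the larger effects, as under publication bias.

```python
def fixed_and_random_effects(effects, variances):
    """Fixed-effect and DerSimonian-Laird random-effects pooled
    estimates on the log-relative-risk scale."""
    k = len(effects)
    w = [1.0 / v for v in variances]
    fixed = sum(wi * y for wi, y in zip(w, effects)) / sum(w)
    # Cochran's Q and the DL moment estimate of between-study variance
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, effects))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)
    # random-effects weights flatten toward equality as tau2 grows,
    # which is why small (often selectively published) studies gain influence
    w_re = [1.0 / (v + tau2) for v in variances]
    random_eff = sum(wi * y for wi, y in zip(w_re, effects)) / sum(w_re)
    return fixed, random_eff

# invented log relative risks; larger effects paired with larger variances
effects = [0.30, 0.25, 0.10, 0.05, 0.40, -0.05, 0.15, 0.20]
variances = [0.010, 0.008, 0.002, 0.001, 0.015, 0.003, 0.005, 0.009]
fixed, random_eff = fixed_and_random_effects(effects, variances)
```

Because the up-weighted small studies here show the larger effects, the random effects summary exceeds the fixed one, which is the mechanism behind the 1.17 versus 1.21 contrast Begg reports.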
In summary, I suspect that the overall conclusions of Givens, Smith and Tweedie may not be too far from the truth, but I am concerned about how the authors got there. There is some suggestive […]
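Begg's reading of the N(0, 0.15²) prior on the log relative risk as an imaginary null study can be checked directly: the implied 95% interval on the relative-risk scale is exp(±1.96 × 0.15).

```python
import math

# 95% interval implied by a N(0, 0.15^2) prior on the log relative risk
sd = 0.15
lo = math.exp(-1.96 * sd)
hi = math.exp(1.96 * sd)
print(round(lo, 2), round(hi, 2))  # -> 0.75 1.34
```

This reproduces the (0.75, 1.34) interval for the "imaginary study" quoted in the comment.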
Comment by Annette Dobson and Keith Dear (excerpt)

[…] (1983). A key assumption made by GST is that the publication selection criterion is based solely on each study's one-sided p-value for rejecting the null hypothesis. Why should this be based on the one-sided p-value? Are the authors assuming that studies showing a significant protective effect of ETS would be discriminated against?

[…] (EPA, 1992). What were the values of the j for these new studies? We guess that they are larger than those for most of the first-reported studies. To summarize our discussion, in spite of what may seem like critical comments, we do assume that publication bias is a real phenomenon and that the paper under discussion is a nice contribution to the methodology of detecting and correcting for such bias. Our most serious concern is with the form of the assumed publication bias criterion, and we would like to see whether adding a factor for dependence on the j, as we suggested above, would modify the results of the ETS analysis.
Rejoinder by Givens, Smith and Tweedie (excerpt)

[…]effects, as developed in Smith and Tweedie (1997). Figure 1 shows the characteristic shape of a funnel based on a density estimate of the 50 "observed" studies in simulation example (a) [Section 3.5(a)]. Figure 2 shows a density estimate for the location of the imputed studies in simulation example (a). Note that the imputation typically misses the extreme suppressed studies, although in general it does coincide with the locations of the missing studies; and the augmenting studies are indeed smaller than Begg fears. All discussants consider questions raised by the use of prior distributions. DuMouchel and Harris, […]

[Figure 1 caption: Funnel plot and smoothed density of the total data set of 50 studies from simulation example (a) [Section 3.5(a)] before suppression.]
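The suppression mechanism at issue in the simulation examples can be illustrated with a toy model. This is not GST's hierarchical model: the sample size, true effect, and significance cutoff below are invented. Studies that are not one-sided significant at the 5% level are dropped, and the naive mean of the "published" studies is compared with the mean of all studies.

```python
import math
import random

def simulate_suppression(n_studies=50, true_effect=0.3, sd=0.2, seed=1):
    """Toy publication-bias mechanism: draw study effects around a true
    log relative risk, publish only the one-sided significant ones, and
    compare naive means with and without suppression."""
    rng = random.Random(seed)
    z_crit = 1.645  # one-sided 5% critical value
    all_effects = [rng.gauss(true_effect, sd) for _ in range(n_studies)]
    # "significant" studies have z = effect / se above the cutoff
    published = [y for y in all_effects if y / sd > z_crit]
    est_all = sum(all_effects) / len(all_effects)
    est_pub = sum(published) / len(published)
    return est_all, est_pub

est_all, est_pub = simulate_suppression()
# suppressing nonsignificant studies removes the smaller estimates,
# so the published-only mean overstates the effect
```

The published-only mean always exceeds the all-studies mean here because suppression removes exactly the low estimates, which is the pattern the data-augmentation step is designed to impute and correct.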
  • Begg, C. B. and Berlin, J. A. (1988). Publication bias: a problem in interpreting medical data (with discussion). J. Roy. Statist. Soc. Ser. A 151 419-463.
  • Begg, C. B. and Mazumdar, M. (1994). Operating characteristics of a rank correlation test for publication bias. Biometrics 50 1088-1101.
  • DerSimonian, R. and Laird, N. (1986). Meta-analysis in clinical trials. Controlled Clinical Trials 7 177-188.
  • Garfinkel, L., Auerbach, O. and Joubert, L. (1985). Involuntary smoking and lung cancer: a case-control study. Journal of the National Cancer Institute 75 463-469.
  • Hung, H. M. J., O'Neill, R. T., Bauer, P. and Köhne, K. (1997). The behavior of the P-value when the alternative hypothesis is true. Biometrics 53 11-22.
  • Morris, R. D., Audet, A. M., Angelillo, I. F., Chalmers, T. C. and Mosteller, F. (1992). Chlorination, chlorination by-products and cancer: a meta-analysis. American Journal of Public Health 82 955-963.
  • Rosenthal, R. (1979). The "file drawer problem" and tolerance for null results. Psychological Bulletin 86 638-641.
  • Smith, D. D. (1997). Accounting for publication bias and quality differences in Bayesian random effects meta-analytic models. Ph.D. dissertation, Colorado State Univ.
  • Smith, D. D. and Tweedie, R. L. (1997). A density estimation diagnostic for publication bias adjusted meta-analysis. In preparation.

See also

  • Comment by Colin B. Begg.
  • Comment by William DuMouchel and Jeffrey Harris.
  • Comment by Annette Dobson and Keith Dear.
  • Rejoinder by Geof H. Givens, D. D. Smith and R. L. Tweedie.