Open Access
Pitfalls of significance testing and $p$-value variability: An econometrics perspective
Norbert Hirschauer, Sven Grüner, Oliver Mußhoff, Claudia Becker
Statist. Surv. 12: 136-172 (2018). DOI: 10.1214/18-SS122

Abstract

Data on the reproducibility of scientific findings are generally bleak, and in recent years a wealth of papers have warned against misuses of the $p$-value and the false findings that result. This paper discusses the question of what we can(not) learn from the $p$-value, which is still widely regarded as the gold standard of statistical validity. We aim to provide a non-technical and easily accessible resource for statistical practitioners who wish to spot and avoid misinterpretations and misuses of statistical significance tests. For this purpose, we first classify and describe the most widely discussed (“classical”) pitfalls of significance testing, and review published work on these misuses with a focus on regression-based “confirmatory” studies. This includes a description of the single-study bias and a simulation-based illustration of how proper meta-analysis compares to misleading significance counts (“vote counting”). Going beyond the classical pitfalls, we also use simulation to provide intuition that relying on the statistical estimate “$p$-value” as a measure of evidence, without considering its sample-to-sample variability, falls short of the mark even within an otherwise appropriate interpretation. We conclude with a discussion of the exigencies of informed approaches to statistical inference and corresponding institutional reforms.
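The sample-to-sample variability of the $p$-value mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' simulation design: the effect size, sample size, number of replications, seed, and the use of a two-sided z-test with known standard deviation are all assumptions chosen for illustration, and the function name `simulate_p_values` is hypothetical.

```python
import random
from statistics import NormalDist


def simulate_p_values(effect=0.2, n=50, reps=1000, seed=1):
    """Return two-sided z-test p-values for H0: mean = 0.

    Each replication draws n observations from N(effect, 1).
    All default values are illustrative assumptions, not taken from the paper.
    """
    rng = random.Random(seed)
    std_normal = NormalDist()
    p_values = []
    for _ in range(reps):
        sample = [rng.gauss(effect, 1.0) for _ in range(n)]
        # z statistic with known standard deviation 1: mean / (1 / sqrt(n))
        z = (sum(sample) / n) * n ** 0.5
        p_values.append(2.0 * (1.0 - std_normal.cdf(abs(z))))
    return p_values


p_values = simulate_p_values()
ranked = sorted(p_values)
share_significant = sum(p < 0.05 for p in p_values) / len(p_values)

# Same true effect in every replication, yet the p-values differ widely.
print("10th percentile of p:", round(ranked[len(ranked) // 10], 4))
print("median p:            ", round(ranked[len(ranked) // 2], 4))
print("90th percentile of p:", round(ranked[(9 * len(ranked)) // 10], 4))
print("share with p < 0.05: ", round(share_significant, 3))
```

Under these assumed settings the true effect is identical in every replication, so any spread in the printed percentiles comes from sampling variation alone. The share of replications with $p < 0.05$ also hints at why counting significant results across studies can mislead when power is low.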

Citation

Norbert Hirschauer, Sven Grüner, Oliver Mußhoff, Claudia Becker. "Pitfalls of significance testing and $p$-value variability: An econometrics perspective." Statist. Surv. 12: 136-172, 2018. https://doi.org/10.1214/18-SS122

Information

Received: 1 November 2017; Published: 2018
First available in Project Euclid: 4 October 2018

zbMATH: 06976331
MathSciNet: MR3860867
Digital Object Identifier: 10.1214/18-SS122

Keywords: $p$-hacking, $p$-value misinterpretations, $p$-value sample-to-sample variability, meta-analysis, multiple testing, publication bias, statistical inference, statistical significance

Vol.12 • 2018