Our purpose is to recommend a change in the paradigm of testing by generalizing a very natural idea, originating perhaps with Jeffreys [Proceedings of the Cambridge Philosophical Society 31 (1935) 203–222; Theory of Probability (1961) Oxford Univ. Press] and clearly expounded by DeGroot [Probability and Statistics (1975) Addison-Wesley], with the aim of developing an approach that is attractive to all schools of statistics and results in a procedure better suited to the needs of science. The essential idea is to base the testing of statistical hypotheses on minimizing a weighted sum of the type I and type II error probabilities, instead of the prevailing paradigm of fixing the type I error probability and minimizing the type II error probability. For simple vs. simple hypotheses, the optimal criterion is to reject the null using the likelihood ratio as the evidence (ordering) statistic, with a fixed threshold value instead of a fixed tail probability. By defining expected type I and type II error probabilities, we generalize the weighting approach and find that the optimal region is defined by an evidence ratio, that is, a ratio of likelihoods averaged with respect to a prior measure, together with a fixed threshold. This approach yields an optimal theory in complete generality, which the classical theory of testing does not. It can be seen as a Bayesian/non-Bayesian compromise: using a weighted sum of type I and type II error probabilities is Frequentist, but basing the test criterion on a ratio of marginalized likelihoods is Bayesian. We give arguments to push the theory still further, so that the weighting measures (priors) of the likelihoods need not be proper and highly informative, but merely "well calibrated," that is, priors that give rise to the same evidence (marginal likelihoods) when minimal (smallest) training samples are used.
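For intuition, here is a minimal numerical sketch (our own illustration, not taken from the paper) of the simple vs. simple case: minimizing the weighted risk a·α + b·β leads to rejecting the null when the likelihood ratio exceeds the fixed threshold a/b. For two unit-variance normal hypotheses the likelihood ratio is monotone in x, so the optimal rule is a cutoff on x, which we verify against a grid search. The weights a and b are arbitrary illustrative choices.

```python
import math

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def weighted_risk(c, a, b, mu0=0.0, mu1=1.0):
    """a*alpha + b*beta for the rule 'reject H0 when x > c', where
    X ~ N(mu0, 1) under H0 and X ~ N(mu1, 1) under H1."""
    alpha = 1.0 - phi(c - mu0)  # type I error probability
    beta = phi(c - mu1)         # type II error probability
    return a * alpha + b * beta

a, b = 1.0, 2.0  # illustrative error weights (an assumption, not from the paper)

# For unit-variance normals the likelihood ratio f1(x)/f0(x) is monotone in x,
# so 'reject when f1/f0 > a/b' becomes 'reject when x > c*' with
# c* = (mu0 + mu1)/2 + log(a/b)/(mu1 - mu0); here mu0 = 0, mu1 = 1.
c_star = 0.5 + math.log(a / b)

# Numerical check: no cutoff on a fine grid beats the likelihood-ratio cutoff.
grid = [i / 1000.0 for i in range(-3000, 3001)]
best = min(grid, key=lambda c: weighted_risk(c, a, b))
print(round(c_star, 3), round(best, 3))  # both approximately -0.193
```

Note that the threshold a/b is fixed in advance; the implied type I error probability α = 1 − Φ(c*) adapts to the problem, rather than being pinned at a conventional level.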
The theory that emerges, similar to theories based on objective Bayesian approaches, is a powerful response to criticisms of the prevailing approach to hypothesis testing; see, for example, Ioannidis [PLoS Medicine 2 (2005) e124] and Siegfried [Science News 177 (2010) 26–29], among many others.
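The evidence-ratio generalization described above can be sketched in the same spirit (again our own illustration, with hypothetical discrete priors, not the paper's examples): each composite hypothesis contributes a likelihood averaged over its prior, and the null is rejected when the ratio of these marginal likelihoods exceeds a fixed threshold.

```python
import math

def norm_pdf(x, mu, sigma=1.0):
    """Normal density with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def marginal_likelihood(x, mus, weights):
    """Likelihood averaged over a discrete prior on the mean."""
    return sum(w * norm_pdf(x, m) for m, w in zip(mus, weights))

# Hypothetical composite hypotheses (illustrative choices only):
# H0: mu in {-0.5, 0.0} with equal prior mass; H1: mu in {1.0, 2.0} likewise.
mus0, w0 = [-0.5, 0.0], [0.5, 0.5]
mus1, w1 = [1.0, 2.0], [0.5, 0.5]

def evidence_ratio(x):
    """Ratio of prior-averaged likelihoods (the Bayes factor for H1 vs H0)."""
    return marginal_likelihood(x, mus1, w1) / marginal_likelihood(x, mus0, w0)

threshold = 1.0  # fixed cutoff a/b; equal weights on the two error types

def reject_h0(x):
    return evidence_ratio(x) > threshold

print(reject_h0(-0.2), reject_h0(1.4))  # data near H0 vs data near H1
```

The point of the sketch is the structure of the rule: the evidence ratio is compared with a fixed threshold, so the effective significance level varies with the prior measures and the sample rather than being held at a conventional value.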
"Adaptative significance levels using optimal decision rules: Balancing by weighting the error probabilities." Braz. J. Probab. Stat. 30 (1) 70–90, February 2016. https://doi.org/10.1214/14-BJPS257