## Brazilian Journal of Probability and Statistics

### Adaptative significance levels using optimal decision rules: Balancing by weighting the error probabilities

#### Abstract

Our purpose is to recommend a change in the paradigm of testing by generalizing a very natural idea, perhaps originating with Jeffreys [Proceedings of the Cambridge Philosophical Society 31 (1935) 203–222; The Theory of Probability (1961) Oxford Univ. Press] and clearly set out by DeGroot [Probability and Statistics (1975) Addison-Wesley], with the aim of developing an approach that is attractive to all schools of statistics, resulting in a procedure better suited to the needs of science. The essential idea is to base the testing of statistical hypotheses on minimizing a weighted sum of type I and type II error probabilities instead of the prevailing paradigm, which fixes the type I error probability and minimizes the type II error probability. For simple vs. simple hypotheses, the optimal criterion is to reject the null using the likelihood ratio as the evidence (ordering) statistic, with a fixed threshold value instead of a fixed tail probability. By defining expected type I and type II error probabilities, we generalize the weighting approach and find that the optimal region is defined by the evidence ratio, that is, a ratio of averaged likelihoods (with respect to a prior measure) and a fixed threshold. This approach yields an optimal theory in complete generality, which the classical theory of testing does not. This can be seen as a Bayesian/non-Bayesian compromise: using a weighted sum of type I and type II error probabilities is Frequentist, but basing the test criterion on a ratio of marginalized likelihoods is Bayesian. We give arguments to push the theory still further, so that the weighting measures (priors) of the likelihoods do not have to be proper and highly informative, but just “well calibrated,” that is, priors that give rise to the same evidence (marginal likelihoods) using minimal (smallest) training samples.

The theory that emerges, similar to the theories based on objective Bayesian approaches, is a powerful response to criticisms of the prevailing approach of hypothesis testing. For criticisms see, for example, Ioannidis [PLoS Medicine 2 (2005) e124] and Siegfried [Science News 177 (2010) 26–29], among many others.
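The weighted-error criterion described in the abstract can be illustrated with a small numerical sketch. The setting below is an assumption chosen for illustration and is not taken from the paper: a simple vs. simple test of two normal means with known variance, where the function name `weighted_error_test`, the weights `a` and `b`, and all parameter values are hypothetical. Minimizing `a * alpha + b * beta` leads to rejecting the null when the likelihood ratio of the alternative to the null exceeds `a / b`, which in this model reduces to a cutoff on the sample mean.

```python
from math import log, sqrt
from statistics import NormalDist


def weighted_error_test(n, mu0=0.0, mu1=1.0, sigma=1.0, a=1.0, b=1.0):
    """Cutoff and error probabilities for the rule minimizing a*alpha + b*beta.

    Illustrative setting (an assumption, not from the paper):
    H0: mean = mu0 vs. H1: mean = mu1 > mu0, known sigma, n i.i.d. normal draws.
    """
    if mu1 <= mu0:
        raise ValueError("this sketch assumes mu1 > mu0")
    # Reject H0 when f1(x) / f0(x) > a / b. For normal means the log
    # likelihood ratio is linear in the sample mean, so the rule becomes
    # "sample mean > cutoff".
    cutoff = (mu0 + mu1) / 2 + sigma**2 * log(a / b) / (n * (mu1 - mu0))
    se = sigma / sqrt(n)          # standard error of the sample mean
    std = NormalDist()            # standard normal distribution
    alpha = 1.0 - std.cdf((cutoff - mu0) / se)  # P(reject H0 | H0 true)
    beta = std.cdf((cutoff - mu1) / se)         # P(accept H0 | H1 true)
    return cutoff, alpha, beta


if __name__ == "__main__":
    for n in (4, 16, 64):
        cutoff, alpha, beta = weighted_error_test(n)
        print(f"n={n:3d}  cutoff={cutoff:.3f}  alpha={alpha:.5f}  beta={beta:.5f}")
```

In this sketch the likelihood-ratio threshold `a / b` stays fixed while the implied type I error probability shrinks as `n` grows, which is the sense in which the significance level adapts to the amount of information, in contrast to holding `alpha` fixed at a conventional value.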

#### Article information

Source
Braz. J. Probab. Stat., Volume 30, Number 1 (2016), 70-90.

Dates
Accepted: August 2014
First available in Project Euclid: 19 January 2016

https://projecteuclid.org/euclid.bjps/1453211803

Digital Object Identifier
doi:10.1214/14-BJPS257

Mathematical Reviews number (MathSciNet)
MR3453515

Zentralblatt MATH identifier
1381.62015

#### Citation

Pericchi, Luis; Pereira, Carlos. Adaptative significance levels using optimal decision rules: Balancing by weighting the error probabilities. Braz. J. Probab. Stat. 30 (2016), no. 1, 70–90. doi:10.1214/14-BJPS257. https://projecteuclid.org/euclid.bjps/1453211803

#### References

• Bayarri, M. J. and Berger, J. O. (2004). The interplay between Bayesian and frequentist analysis. Statist. Sci. 19, 58–80.
• Berger, J. O. (2008). A comparison of testing methodologies. In The Proceedings of PHYSTAT-LHC Workshop on Statistical Issues for LHC Physics, June 2007, CERN 2008-001, 8–19. Geneve: CERN.
• Berger, J. O. and Pericchi, L. R. (1996). The intrinsic Bayes factor for model selection and prediction. J. Amer. Statist. Assoc. 91, 109–122.
• Berger, J. O., Pericchi, L. R. and Varshavsky, J. A. (1998). Bayes factors and marginal distributions in invariant situations. Sankhyā Ser. A 60, 307–321.
• Berger, J. O. and Pericchi, L. R. (2001). Objective Bayesian model selection. Introduction and comparisons. In Model Selection (P. Lahiri, ed.). Lecture Notes Monogr. Ser. 68, 135–207. Beachwood, OH: IMS.
• Bickel, P. J. and Doksum, K. A. (1977). Mathematical Statistics: Basic Ideas and Selected Topics. San Francisco, CA: Holden-Day, Inc.
• Birnbaum, A. (1969). Concepts of statistical evidence. In Philosophy, Science, and Method: Essays in Honor of Ernest Nagel (S. Morgenbesser, P. Suppes and M. White, eds.). New York: St. Martin’s Press.
• Cox, D. R. and Hinkley, D. V. (1974). Theoretical Statistics. London: Chapman and Hall.
• DeGroot, M. (1975). Probability and Statistics, 2nd ed. Reading, MA: Addison-Wesley.
• Dempster, A. P. (1997). The direct use of likelihood for significance testing. Stat. Comput. 7, 247–252.
• Freeman, P. (1993). The role of $p$-values in analyzing trial results. Stat. Med. 12, 1443–1452.
• Good, I. J. (1992). The Bayes/non-Bayes compromise: A brief review. J. Amer. Statist. Assoc. 87, 597–606.
• Ioannidis, J. P. A. (2005). Why most published research findings are false. PLoS Medicine 2, e124.
• Jeffreys, H. (1935). Some tests of significance, treated by the theory of probability. Math. Proc. Cambridge Philos. Soc. 31, 203–222.
• Jeffreys, H. (1961). The Theory of Probability, 3rd ed. Oxford: Clarendon Press.
• Kass, R. E. and Raftery, A. E. (1995). Bayes factors. J. Amer. Statist. Assoc. 90, 773–795.
• Lindley, D. V. (1957). A statistical paradox. Biometrika 44, 187–192.
• Lindley, D. V. and Phillips, L. D. (1976). Inference for a Bernoulli process (a Bayesian view). Amer. Statist. 30, 112–119.
• Pereira, C. A. B. (1985). Testing hypotheses of different dimensions: Bayesian view and classical interpretation (in Portuguese). Professor thesis, The Institute of Mathematics and Statistics, Univ. de São Paulo.
• Pereira, C. A. B. and Wechsler, S. (1993). On the concept of $p$-value. Braz. J. Probab. Stat. 7, 159–177.
• Perez, M. E. and Pericchi, L. R. (2014). Changing statistical significance as the amount of information changes: the adaptive $\alpha$ significance level. Statist. Probab. Lett. 85, 20–24.
• Pericchi, L. R. (2005). Model selection and hypothesis testing based on objective probabilities and Bayes factors. Handbook of Statist. 25, 115–149.
• Siegfried, T. (2010). Odds are, it’s wrong: Science fails to face the shortcomings of statistics. Science News 177, 26–29.
• Varuzza, L. and Pereira, C. A. B. (2010). Significance test for comparing digital gene expression profiles: Partial likelihood application. Chil. J. Stat. 1, 91–102.