The Neyman–Fisher controversy considered here originated with the 1935 presentation of Jerzy Neyman’s “Statistical Problems in Agricultural Experimentation” to the Royal Statistical Society. Neyman asserted that the standard ANOVA F-test for randomized complete block designs is valid, whereas the analogous test for Latin squares is invalid in the sense that it detects differentiation among the treatments, when none exists on average, more often than desired (i.e., it has a higher Type I error than advertised). However, Neyman’s expressions for the expected mean residual sum of squares, for both designs, are generally incorrect. Furthermore, Neyman’s belief that the Type I error (when testing the null hypothesis of zero average treatment effects) is higher than desired whenever the expected mean treatment sum of squares is greater than the expected mean residual sum of squares is also generally incorrect. Simple examples show that, without further assumptions on the potential outcomes, one cannot determine the Type I error of the F-test from expected sums of squares. Ultimately, we believe that the Neyman–Fisher controversy had a deleterious impact on the development of statistics, with a major consequence being that potential outcomes were ignored in favor of linear models and classical statistical procedures that are imprecise without applied contexts.
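To make the role of the potential outcomes concrete, the following is a minimal sketch, not an example taken from the paper, of how one might compute the exact randomization rejection rate of the ANOVA F-test in a small randomized complete block design. The table `PO`, the helper names `f_statistic` and `rejection_rate`, and the hard-coded 5% critical value for F(2, 4) are all illustrative assumptions. The hypothetical table has zero average treatment effects but unit-level effects that are not all zero, and the rate is obtained by enumerating every within-block assignment.

```python
from itertools import permutations, product

# Assumed upper 5% point of the F distribution with (2, 4) degrees of freedom,
# matching 3 treatments and 3 blocks (hard-coded to avoid extra dependencies).
F_CRIT_2_4 = 6.944

# Hypothetical potential outcomes: PO[block][plot] = outcomes under treatments 0, 1, 2.
# Every treatment has the same average outcome over all plots (zero average
# treatment effects), but individual plots respond differently to treatments.
PO = [
    [(0, 3, 0), (3, 0, 0), (0, 0, 3)],
    [(2, 2, 2), (4, 4, 4), (6, 6, 6)],
    [(5, 1, 3), (1, 3, 5), (3, 5, 1)],
]


def f_statistic(obs):
    """ANOVA F statistic for a blocks-by-treatments table of observed outcomes."""
    b, t = len(obs), len(obs[0])
    grand = sum(map(sum, obs)) / (b * t)
    row = [sum(r) / t for r in obs]
    col = [sum(obs[i][j] for i in range(b)) / b for j in range(t)]
    ss_treat = b * sum((c - grand) ** 2 for c in col)
    ss_resid = sum(
        (obs[i][j] - row[i] - col[j] + grand) ** 2
        for i in range(b)
        for j in range(t)
    )
    ms_treat = ss_treat / (t - 1)
    ms_resid = ss_resid / ((b - 1) * (t - 1))
    if ms_resid < 1e-12:  # degenerate table: avoid dividing by zero
        return float("inf") if ms_treat > 1e-12 else 0.0
    return ms_treat / ms_resid


def rejection_rate(po, f_crit):
    """Fraction of all within-block assignments for which F exceeds f_crit."""
    t = len(po[0])
    perms = list(permutations(range(t)))
    total = rejected = 0
    for assignment in product(perms, repeat=len(po)):
        obs = []
        for block, perm in zip(po, assignment):
            observed = [None] * t
            for plot, trt in enumerate(perm):
                # Plot `plot` receives treatment `trt`; only that outcome is seen.
                observed[trt] = block[plot][trt]
            obs.append(observed)
        total += 1
        rejected += f_statistic(obs) > f_crit
    return rejected / total


print(rejection_rate(PO, F_CRIT_2_4))
```

Editing `PO` while keeping the treatment averages equal lets one see how the rejection rate responds to the full table of potential outcomes. The enumeration visits (3!)^3 = 216 assignments here, so this approach is only practical for very small designs.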
"Comments on the Neyman–Fisher Controversy and Its Consequences." Statist. Sci. 29 (2) 267–284, May 2014. https://doi.org/10.1214/13-STS454