Statistical Science

Comments on the Neyman–Fisher Controversy and Its Consequences

Arman Sabbaghi and Donald B. Rubin

Full-text: Open access


The Neyman–Fisher controversy considered here originated with the 1935 presentation of Jerzy Neyman’s Statistical Problems in Agricultural Experimentation to the Royal Statistical Society. Neyman asserted that the standard ANOVA F-test for randomized complete block designs is valid, whereas the analogous test for Latin squares is invalid in the sense of detecting differentiation among the treatments, when none existed on average, more often than desired (i.e., having a higher Type I error than advertised). However, Neyman’s expressions for the expected mean residual sum of squares, for both designs, are generally incorrect. Furthermore, Neyman’s belief that the Type I error (when testing the null hypothesis of zero average treatment effects) is higher than desired, whenever the expected mean treatment sum of squares is greater than the expected mean residual sum of squares, is generally incorrect. Simple examples show that, without further assumptions on the potential outcomes, one cannot determine the Type I error of the F-test from expected sums of squares. Ultimately, we believe that the Neyman–Fisher controversy had a deleterious impact on the development of statistics, with a major consequence being that potential outcomes were ignored in favor of linear models and classical statistical procedures that are imprecise without applied contexts.
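The claim that expected sums of squares do not pin down the Type I error can be explored numerically. The following is a minimal sketch, not the authors' calculation: it enumerates every within-block randomization of a small randomized complete block design and reports how often the ANOVA F-test rejects at the 5% level. All function names and numeric potential outcomes are hypothetical, chosen only for illustration. It assumes three treatments, so that the F critical value with numerator degrees of freedom 2 has a closed form, and one unit per treatment in each block.

```python
from itertools import permutations, product


def f_statistic(y):
    """ANOVA F-statistic for an RCB table y[b][t] (block b, treatment t)."""
    n_blocks, n_treat = len(y), len(y[0])
    grand = sum(map(sum, y)) / (n_blocks * n_treat)
    treat_means = [sum(row[t] for row in y) / n_blocks for t in range(n_treat)]
    block_means = [sum(row) / n_treat for row in y]
    ss_treat = n_blocks * sum((m - grand) ** 2 for m in treat_means)
    ss_block = n_treat * sum((m - grand) ** 2 for m in block_means)
    ss_total = sum((v - grand) ** 2 for row in y for v in row)
    ss_resid = ss_total - ss_treat - ss_block
    if ss_resid <= 1e-12:  # degenerate table: no residual variation
        return float("inf") if ss_treat > 1e-12 else 0.0
    df_treat = n_treat - 1
    df_resid = (n_treat - 1) * (n_blocks - 1)
    return (ss_treat / df_treat) / (ss_resid / df_resid)


def f_critical_df1_2(df_resid, alpha=0.05):
    """Upper-alpha point of F(2, df_resid).

    Uses the closed-form survival function (1 + 2f/d)^(-d/2), which holds
    only when the numerator degrees of freedom equal 2.
    """
    return (df_resid / 2) * (alpha ** (-2 / df_resid) - 1)


def rejection_rate(potential, alpha=0.05):
    """Exact rejection rate of the F-test over all RCB randomizations.

    potential[b][u][t] is the (hypothetical) potential outcome of unit u
    in block b under treatment t.
    """
    n_blocks, n_treat = len(potential), len(potential[0])
    assert n_treat == 3, "closed-form critical value assumes 3 treatments"
    crit = f_critical_df1_2((n_treat - 1) * (n_blocks - 1), alpha)
    per_block = list(permutations(range(n_treat)))
    rejections = total = 0
    for assignment in product(per_block, repeat=n_blocks):
        # assignment[b][t] is the unit in block b that receives treatment t
        y = [[potential[b][assignment[b][t]][t] for t in range(n_treat)]
             for b in range(n_blocks)]
        rejections += f_statistic(y) > crit
        total += 1
    return rejections / total


if __name__ == "__main__":
    base = [10.0, 12.0, 15.0, 11.0]            # illustrative block levels
    noise = [[0.0, 1.0, -1.0], [0.5, -0.5, 0.0],
             [2.0, -1.0, -1.0], [0.0, 0.3, -0.3]]  # illustrative unit effects
    # No unit-level treatment effects at all (additive case).
    additive = [[[base[b] + noise[b][u]] * 3 for u in range(3)]
                for b in range(4)]
    # Unit-level effects that cancel within each block, so the average
    # treatment effects are zero while additivity fails.
    nonadditive = [[[base[b] + noise[b][u] + (2.0 if t == u else -1.0)
                     for t in range(3)] for u in range(3)] for b in range(4)]
    print("additive:   ", rejection_rate(additive))
    print("nonadditive:", rejection_rate(nonadditive))
```

Both tables satisfy the null hypothesis of zero average treatment effects, so any difference between the two printed rates comes from the unit-level structure of the potential outcomes rather than from the averages. Exhaustive enumeration is used instead of Monte Carlo sampling because four blocks of three units give only 6^4 = 1296 assignments.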

Article information

Statist. Sci., Volume 29, Number 2 (2014), 267-284.

First available in Project Euclid: 18 August 2014

Keywords

Analysis of variance; Latin squares; nonadditivity; randomization tests; randomized complete blocks


Sabbaghi, Arman; Rubin, Donald B. Comments on the Neyman–Fisher Controversy and Its Consequences. Statist. Sci. 29 (2014), no. 2, 267–284. doi:10.1214/13-STS454.


  • Bartlett, M. S. (1947). The use of transformations. Biometrics 3 39–52.
  • Box, J. F. (1978). R. A. Fisher: The Life of a Scientist. Wiley, New York.
  • Box, G. E. P. (1984). Discussion of paper by D. R. Cox. Internat. Statist. Rev. 52 26.
  • Box, G. E. P. and Cox, D. R. (1964). An analysis of transformations (with discussion). J. R. Stat. Soc. Ser. B Stat. Methodol. 26 211–252.
  • Cochran, W. G. (1947). Some consequences when the assumptions for the analysis of variance are not satisfied. Biometrics 3 22–38.
  • Cox, D. R. (1958). The interpretation of the effects of non-additivity in the Latin square. Biometrika 45 69–73.
  • Cox, D. R. (1984). Interaction. Internat. Statist. Rev. 52 1–31.
  • Cox, D. R. (1958). Planning of Experiments, 1st ed. Wiley, New York.
  • Cox, D. R. (2012). Statistical causality: Some historical remarks. In Causality: Statistical Perspectives and Applications (C. Berzuini, P. Dawid and L. Bernardinelli, eds.) 1–5. Wiley, New York.
  • Eisenhart, C. (1947). The assumptions underlying the analysis of variance. Biometrics 3 1–21.
  • Fienberg, S. E. and Tanur, J. M. (1996). Reconsidering the fundamental contributions of Fisher and Neyman on experimentation and sampling. Internat. Statist. Rev. 64 237–253.
  • Fisher, R. A. (1935). Comment on “Statistical problems in agricultural experimentation (with discussion).” Suppl. J. Roy. Statist. Soc. Ser. B 2 154–157, 173.
  • Fisher, R. A. (1971). The Design of Experiments, 9th ed. Hafner Publishing Company, New York.
  • Gourlay, N. (1955a). F-test bias for experimental designs in educational research. Psychometrika 20 227–258.
  • Gourlay, N. (1955b). F-test bias for experimental designs of the latin square type. Psychometrika 20 273–287.
  • Hinkelmann, K. and Kempthorne, O. (2008). Design and Analysis of Experiments. Vol. 1: Introduction to Experimental Design, 2nd ed. Wiley, Hoboken, NJ.
  • Kempthorne, O. (1952). The Design and Analysis of Experiments. Wiley, New York.
  • Kempthorne, O. (1955). The randomization theory of experimental inference. J. Amer. Statist. Assoc. 50 946–967.
  • Lehmann, E. L. (2011). Fisher, Neyman, and the Creation of Classical Statistics. Springer, New York.
  • Mandel, J. (1961). Non-additivity in two-way analysis of variance. J. Amer. Statist. Assoc. 56 878–888.
  • Neyman, J. (1935). Statistical problems in agricultural experimentation (with discussion). Suppl. J. Roy. Statist. Soc. Ser. B 2 107–180.
  • Neyman, J. (1976). Emergence of mathematical statistics. In On the History of Statistics and Probability: Proceedings of a Symposium on the American Mathematical Heritage, to Celebrate the Bicentennial of the United States of America, Held at Southern Methodist University, May 27–29, 1974 (D. B. Owen, W. G. Cochran, H. O. Hartley and J. Neyman, eds.) 149–185. Dekker, New York.
  • Pitman, E. (1938). Significance tests which may be applied to samples from any populations: III. The Analysis of Variance Test. Biometrika 29 322–335.
  • Reid, C. (1982). Neyman—from Life. Springer, New York.
  • Rojas, B. (1973). On Tukey’s test of additivity. Biometrics 29 45–52.
  • Rubin, D. B. (1978). Bayesian inference for causal effects: The role of randomization. Ann. Statist. 6 34–58.
  • Rubin, D. B. (1984). Bayesianly justifiable and relevant frequency calculations for the applied statistician. Ann. Statist. 12 1151–1172.
  • Rubin, D. B. (1990). Comment on J. Neyman and causal inference in experiments and observational studies: “On the application of probability theory to agricultural experiments. Essay on principles. Section 9” [Ann. Agric. Sci. 10 (1923) 1–51]. Statist. Sci. 5 472–480.
  • Rubin, D. B. (2005). Causal inference using potential outcomes: Design, modeling, decisions. J. Amer. Statist. Assoc. 100 322–331.
  • Sabbaghi, A. and Rubin, D. B. (2014). Supplement to “Comments on the Neyman–Fisher controversy and its consequences.” DOI:10.1214/13-STS454SUPP.
  • Splawa-Neyman, J. (1990). On the application of probability theory to agricultural experiments. Essay on principles. Section 9. Statist. Sci. 5 465–472.
  • Sukhatme, P. (1935). Comment on “Statistical problems in agricultural experimentation (with discussion).” Suppl. J. Roy. Statist. Soc. Ser. B 2 166–169.
  • Tukey, J. (1949). One degree of freedom for nonadditivity. Biometrics 5 232–242.
  • Tukey, J. (1955). Query 113. Biometrics 11 111–113.
  • Welch, B. (1937). On the z-test in randomized blocks and latin squares. Biometrika 29 21–52.
  • Wilk, M. B. (1955). The randomization analysis of a generalized randomized block design. Biometrika 42 70–79.
  • Wilk, M. B. and Kempthorne, O. (1957). Non-additivities in a Latin square design. J. Amer. Statist. Assoc. 52 218–236.
  • Wu, C. F. J. and Hamada, M. S. (2009). Experiments: Planning, Analysis, and Optimization, 2nd ed. Wiley, Hoboken, NJ.
  • Yates, F. (1935). Complex experiments. J. R. Stat. Soc. Ser. B Stat. Methodol. 2 181–247.
  • Yates, F. (1939). The comparative advantages of systematic and randomized arrangements in the design of agricultural and biological experiments. Biometrika 30 440–466.

Supplemental materials

  • Supplementary material: Supplementary materials for “Comments on the Neyman–Fisher Controversy and its Consequences”. The supplementary material contains our reworking of Neyman’s calculations, specifically expectations and variances of sample averages, and expectations of sums of squares for randomized complete block (RCB) and Latin square (LS) designs. These calculations form the basis of all results presented in this article. The supplementary material can be accessed via the following link: