It is now coming to be generally agreed that, in testing for shift in the two-sample problem, certain tests based on ranks have considerable advantages over the classical $t$-test. From the beginning, rank tests were recognized to have one important advantage: their significance levels are exact under the sole assumption that the samples are randomly drawn (or that the assignment of treatments to subjects is performed at random), whereas the $t$-test is, in effect, exact only when we are dealing with random samples from normal distributions. On the other hand, it was felt that this advantage had to be balanced against the various optimum properties possessed by the $t$-test under the assumption of normality. It is now being recognized that these optimum properties are somewhat illusory and that, under realistic assumptions about extreme observations or gross errors, the $t$-test may in practice be less efficient than such rank tests as the Wilcoxon or normal scores tests.
Rank tests were naturally developed first for the simple two-sample problem, but in practice experiments for the evaluation of population or treatment differences are seldom of this form. To secure the advantages of increased homogeneity and the resulting increase in precision, it is more customary to stratify the populations or to divide the experimental subjects into blocks, using a (generalized) randomized block design. In other experiments, where additivity of certain effects is assumed, a design of the Latin square type may be appropriate. The purpose of the present paper is to provide a method for constructing rank tests for such designs. The basic idea of the method is described in Section 3. The main body of the paper is concerned with applying the method to the comparison of two treatments by means of a Wilcoxon-type test statistic. The exact distribution of this statistic is discussed in Section 4, and its asymptotic distribution in Sections 5 and 6.
Some remarks concerning the efficiency of the test are given in Section 7. Finally, Section 8 illustrates the application of the method to the comparison of more than two treatments, for such designs as incomplete blocks and Latin squares.
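The method itself is described in Section 3 and is not reproduced here. As a rough illustration only, the sketch below assumes one alignment-based reading of a Wilcoxon-type test for a randomized block design: within each block, subtract the block mean (an assumed choice of location estimate; the paper may use a different one), rank all aligned observations jointly, and take as the statistic $W$ the sum of the aligned ranks of the observations receiving treatment A. If treatments are assigned at random within blocks, every within-block allocation of labels is equally likely, so the exact conditional null distribution of $W$ can be enumerated block by block, which mirrors the exactness property claimed for rank tests above. The function names `aligned_ranks`, `exact_null_distribution` and `aligned_rank_test` are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations


def aligned_ranks(blocks):
    """Align each block by subtracting its mean, then rank all aligned values jointly.

    Tied aligned values receive the average (mid) rank. Fractions keep the
    block means, and hence tie detection, exact (floats are converted exactly).
    """
    aligned = []
    for b, block in enumerate(blocks):
        values = [Fraction(y) for y in block]
        centre = sum(values) / len(values)  # assumed location estimate: block mean
        for j, v in enumerate(values):
            aligned.append((v - centre, b, j))
    aligned.sort()

    ranks = [[None] * len(block) for block in blocks]
    i = 0
    while i < len(aligned):
        k = i
        while k + 1 < len(aligned) and aligned[k + 1][0] == aligned[i][0]:
            k += 1
        midrank = Fraction(i + k + 2, 2)  # mean of joint ranks i+1, ..., k+1
        for _, b, j in aligned[i:k + 1]:
            ranks[b][j] = midrank
        i = k + 1
    return ranks


def exact_null_distribution(ranks, treated):
    """Map each attainable W to the number of equally likely within-block allocations."""
    dist = Counter({Fraction(0): 1})
    for block_ranks, flags in zip(ranks, treated):
        m = sum(flags)  # number of treatment-A observations in this block
        block_sums = Counter(sum(c, Fraction(0)) for c in combinations(block_ranks, m))
        convolved = Counter()
        for w, n1 in dist.items():
            for s, n2 in block_sums.items():
                convolved[w + s] += n1 * n2
        dist = convolved
    return dist


def aligned_rank_test(blocks, treated):
    """Return (W, exact one-sided p-value P(W >= observed W)) for treatment A.

    treated[b][j] is True when observation j of block b received treatment A.
    """
    ranks = aligned_ranks(blocks)
    w_obs = sum((r for rs, fs in zip(ranks, treated) for r, f in zip(rs, fs) if f), Fraction(0))
    dist = exact_null_distribution(ranks, treated)
    total = sum(dist.values())
    upper = sum(count for w, count in dist.items() if w >= w_obs)
    return w_obs, Fraction(upper, total)


# Three blocks of two subjects (a paired design); treatment A is listed first in each block.
w, p = aligned_rank_test([[5, 3], [8, 4], [6, 5]], [[True, False]] * 3)
print(w, p)  # 15 1/8
```

Full enumeration grows with the product of the within-block binomial coefficients, so it is practical only for small designs; for larger ones an asymptotic approximation of the kind treated in Sections 5 and 6 would be the natural substitute.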
"Rank Methods for Combination of Independent Experiments in Analysis of Variance." Ann. Math. Statist. 33(2), 482–497, June 1962. https://doi.org/10.1214/aoms/1177704575