Fisher (1943) claimed that the expected value of the sample variance of the number of species found in large samples, each of n specimens taken from the same population, is asymptotically θ log 2. This is at odds with the value θ log n obtained directly from the Ewens Sampling Formula (ESF), where θ specifies the rate at which new species are found. To resolve this apparent contradiction, we assume the species frequency spectrum in the population is determined by the ESF and that the samples are disjoint subsets drawn sequentially from this single population. We find an explicit formula for the required expected value for p samples of arbitrary size; in the limit of large equally sized samples, it indeed has the value θ log 2. We obtain limit theorems for the sample variance of p samples of size n under various limiting regimes as n or p or both tend to ∞. We discuss further the behavior of the number of species present in all samples, and revisit Fisher’s log-series distribution as the limiting distribution of the number of specimens observed in typical species in a future, large sample.
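As a rough illustration of the ESF side of this comparison, the sketch below computes the exact mean and variance of the number of species K_n in a single sample of n specimens. It relies on the standard representation of K_n under the ESF as a sum of independent Bernoulli indicators with success probabilities θ/(θ + j), j = 0, ..., n − 1; the function name `esf_species_moments` is hypothetical and is not taken from the paper. The variance grows like θ log n, although the ratio approaches 1 only slowly.

```python
import math


def esf_species_moments(n, theta):
    """Exact mean and variance of K_n, the number of species among n specimens.

    Assumes the ESF with parameter theta, under which K_n is a sum of
    independent Bernoulli indicators with probabilities theta / (theta + j).
    """
    mean = 0.0
    var = 0.0
    for j in range(n):
        p = theta / (theta + j)  # chance that specimen j + 1 is a new species
        mean += p
        var += p * (1.0 - p)
    return mean, var


if __name__ == "__main__":
    theta = 1.0  # illustrative value only, not from the paper
    for n in (10**2, 10**4, 10**6):
        mean, var = esf_species_moments(n, theta)
        ratio = var / (theta * math.log(n))
        print(f"n={n:>8}  E[K_n]={mean:8.3f}  Var(K_n)={var:8.3f}  "
              f"Var/(theta*log n)={ratio:.3f}")
```

This covers a single sample only. The sample variance across p disjoint samples from one population, which is the quantity the paper analyzes, is a different object and is not computed here.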
We thank Stephen Senn for bringing the Anscombe-Fisher correspondence to our attention. We thank two reviewers and the associate editor for comments that improved the paper. PHdS and ST were supported in part by NSF grant DMS-2030562.
Poly H. da Silva. Arash Jamshidpey. Peter McCullagh. Simon Tavaré. "Fisher’s measure of variability in repeated samples." Bernoulli 29 (2) 1166 - 1194, May 2023. https://doi.org/10.3150/22-BEJ1494