Abstract
Consider a statistical procedure (Method $A$) based on $n$ observations, and a less effective procedure (Method $B$) which requires a larger number $k_n$ of observations to give equally good performance. Comparison of the two methods involves comparing $k_n$ with $n$, and this can be carried out in various ways. Perhaps the most natural quantity to examine is the difference $k_n - n$, the number of additional observations required by the less effective method. Such difference comparisons have been performed from time to time. (See, for example, Fisher (1925), Walsh (1949) and Pearson (1950).) Historically, however, comparisons have been based mainly on the ratio $n/k_n$. Thus, Fisher (1920), in comparing the mean absolute deviation with the mean squared deviation as estimates of a normal scale, found this ratio to be 1/1.14. Similarly, in 1925 he found a large-sample ratio of $2/\pi$ for the median compared with the mean for estimating normal location, and the same value was found by Cochran (1937) for the sign test relative to the $t$-test in the normal case. The reason for using the ratio rather than the difference in these cases is of course that the ratio is stable in large samples, so that a single limit value, say $e = \lim_{n\rightarrow\infty} n/k_n$, known as the asymptotic relative efficiency or ARE of $B$ with respect to $A$, conveys a great deal of useful information in compact form. When $e < 1$, the difference $k_n - n$ is not a useful measure in large samples because it tends to infinity with $n$. The situation is, however, different, and in a sense even reversed, in the many important statistical problems in which $e = 1$. It is then also possible for the difference to be stable, and the main purpose of this paper is to point out a number of problems in which this is the case. For the additional number $k_n - n$ of observations needed by Method $B$ we suggest the term deficiency.
If it exists, the limit value $d = \lim_{n\rightarrow\infty}(k_n - n)$ will be called the asymptotic deficiency. In these cases the number $d$ summarizes the comparison much more revealingly than does the fact that $e$ equals 1. (If it is not known a priori, the latter does not even tell us which of the two procedures is better.) Of course, $d$ is less easy to compute than $e$, since in effect it requires computing an additional term in the asymptotic expansions.
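The contrast between the two regimes can be sketched numerically. The snippet below is a minimal illustration, not drawn from the paper itself: it uses the standard large-sample variance formulas for unit-variance normal samples. For the median-versus-mean comparison, equating asymptotic variances $1/n$ and $\pi/(2k)$ gives $k_n = (\pi/2)n$, so the ratio $n/k_n$ stabilizes at $2/\pi$ while the difference $k_n - n$ diverges. As an assumed example of the $e = 1$ case, estimating a normal variance with the mean known (Method $A$, variance $2/n$) versus estimated (Method $B$, variance $2/(k-1)$) gives $k_n = n + 1$, so the ratio tends to 1 and the deficiency is exactly 1.

```python
import math

# Illustrative sketch (unit-variance normal samples assumed throughout).

# Regime 1: e < 1 -- sample median vs sample mean for normal location.
# Asymptotic variances: mean 1/n, median pi/(2k). Equating them:
def k_median(n):
    """Observations the median needs to match the mean based on n."""
    return (math.pi / 2) * n  # so n/k_n -> 2/pi, k_n - n -> infinity

# Regime 2: e = 1 with finite deficiency -- normal variance estimation
# with known mean (variance 2/n) vs estimated mean (variance 2/(k-1)):
def k_variance(n):
    """Observations Method B needs to match Method A based on n."""
    return n + 1  # so n/k_n -> 1 and the deficiency k_n - n = 1

for n in (10, 100, 1000, 10000):
    print(f"n={n:6d}  median: ratio={n / k_median(n):.4f} "
          f"diff={k_median(n) - n:8.1f}   "
          f"variance: ratio={n / k_variance(n):.4f} "
          f"diff={k_variance(n) - n}")
```

Running the loop shows the point of the abstract: in the first comparison only the ratio carries stable information, while in the second the ratio degenerates to 1 and it is the difference, the deficiency, that summarizes the comparison.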
Citation
J. L. Hodges Jr. and E. L. Lehmann. "Deficiency." Ann. Math. Statist. 41 (3): 783–801, June 1970. https://doi.org/10.1214/aoms/1177696959