On the Analysis of Samples from $k$ Lists

Leo A. Goodman

doi:10.1214/aoms/1177729345

Ann. Math. Statist. 23(4): 632-634 (December, 1952). DOI: 10.1214/aoms/1177729345

Abstract

Suppose we have $k$ lists of names, no name appearing more than once in each list. We are interested in estimating the following parameters: (a) the number of names occurring in common in pairs, triples, $\cdots$, of lists; (b) the number of names occurring in $1, 2, \cdots, k$ lists. This note presents unbiased estimators for these parameters when a random sample is drawn from each list. It is also observed that the estimators presented are the only real-valued statistics which are unbiased estimators of the parameters, and hence must be the minimum variance unbiased estimators. This yields another example in which "insufficient" statistics have been used to obtain minimum variance unbiased estimators. These unbiased estimators may at times give unreasonable estimates. In such cases, it is suggested that the statistics be modified so that the nearest reasonable estimate is used. Although this procedure introduces some bias, it usually reduces the mean square error.

This problem arises when we are interested in tracing the interrelations of agencies through the individual members. The problem also arises in the work of H. H. Fussler and J. M. Dawson of the University Library, University of Chicago, who are interested in comparing the acquisitions of various libraries.

For special problems other sampling schemes may be more economical or more efficient than taking a sample from each list. Professor F. F. Stephan of Princeton University pointed out to the author that, in the special case of the "library problem," the Book Catalog and author cards used by many libraries provide a convenient means of drawing matched samples. (There is a brief discussion of this kind of sampling problem on page 571 of [1].) A sampling scheme based on the last digit or two of the serial number of the cards could be used to search each library reference file for the same list of books. Special provision must be made for accessions made outside the sampling period and for books not covered by the Library of Congress cards.

The analysis presented herein deals with the case in which (either for good, bad, or no reasons) a random sample has been drawn from each list. The restriction that no name appear more than once in each list may be weakened to obtain somewhat more general results.

The problem discussed in this paper was brought to the author's attention by Professor W. Allen Wallis of the University of Chicago.
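The abstract does not state the estimators themselves, so the following is only an illustrative sketch of parameter (a), not the paper's notation or derivation. It assumes simple random samples drawn without replacement and independently from each list, in which case a name common to all the lists falls in every sample with probability $\prod_i n_i / N_i$, and scaling the observed overlap by the inverse of that probability gives an unbiased estimate. The function names `estimate_common` and `clamp_to_feasible` are hypothetical, and the clamping bounds are one plausible reading of "nearest reasonable estimate."

```python
from math import prod


def estimate_common(list_sizes, samples):
    """Inverse-probability estimate of the number of names common to all lists.

    list_sizes: the full list sizes N_i (assumed known).
    samples: one collection of distinct sampled names per list.
    """
    # Names that were observed in every one of the samples.
    common_in_samples = set(samples[0]).intersection(*samples[1:])
    # Under the stated assumption, a name common to all lists lands in every
    # sample with probability prod(n_i / N_i); scale by the inverse.
    scale = prod(N / len(s) for N, s in zip(list_sizes, samples))
    return len(common_in_samples) * scale


def clamp_to_feasible(estimate, list_sizes, observed):
    """Move an estimate to the nearest feasible value (an assumed reading).

    The true count is at least the overlap actually observed and at most the
    size of the shortest list.
    """
    return max(observed, min(estimate, min(list_sizes)))


if __name__ == "__main__":
    sizes = [10, 20]
    samples = [["a", "b"], ["b", "c", "d", "e"]]
    raw = estimate_common(sizes, samples)
    print(raw, clamp_to_feasible(raw, sizes, observed=1))
```

In this toy input the raw estimate exceeds the length of the shorter list, which is the kind of unreasonable value the abstract mentions; clamping trades a little bias for a feasible answer.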
