Abstract
A new approach is presented to the problem of determining whether an individual (the target) appears in a large file where individuals are identified by measurements subject to error. This approach attaches costs to searching and to missing the individual. It corresponds to testing a simple hypothesis, that the measurements on the target and an element in the library have a given joint distribution, against the alternative that they are independent. Certain measures of information from large deviation theory are relevant. There is a surprising reduction in the effectiveness of information in the presence of error. Data compression issues are studied. Attention is paid to a two-stage search procedure in which the file is subdivided into piles, which are in turn subdivided into bins. Each pile is examined and either discarded or searched. If it is searched, each bin in it is examined and either discarded or searched. If a bin is searched, each element of the bin is compared with the target.
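The control flow of the two-stage search procedure can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's method: the abstract does not say how a pile or bin is examined, so here each one carries a hypothetical scalar `summary` and is searched only if that summary lies within a tolerance (`pile_tol`, `bin_tol`) of the target. The `matches` rule and all names (`Pile`, `Bin`, `SearchLog`, `two_stage_search`) are likewise illustrative. The counters in `SearchLog` tally search effort as a stand-in for the cost of searching. The cost of missing the target, which the paper also weighs, is not modeled.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Bin:
    summary: float          # hypothetical coarse summary of the bin's contents
    elements: list[float]   # individual (noisy) measurements stored in the bin


@dataclass
class Pile:
    summary: float          # hypothetical coarse summary of the whole pile
    bins: list[Bin]


@dataclass
class SearchLog:
    piles_examined: int = 0
    bins_examined: int = 0
    comparisons: int = 0    # element-by-element comparisons with the target


def two_stage_search(
    piles: list[Pile],
    target: float,
    pile_tol: float,
    bin_tol: float,
    matches: Callable[[float, float], bool],
) -> tuple[list[float], SearchLog]:
    """Screen piles, then bins, then compare surviving elements with the target."""
    log = SearchLog()
    found: list[float] = []
    for pile in piles:
        log.piles_examined += 1
        if abs(pile.summary - target) > pile_tol:
            continue  # pile examined and discarded
        for bin_ in pile.bins:
            log.bins_examined += 1
            if abs(bin_.summary - target) > bin_tol:
                continue  # bin examined and discarded
            for element in bin_.elements:  # bin searched: compare every element
                log.comparisons += 1
                if matches(element, target):
                    found.append(element)
    return found, log


# Usage with made-up numbers: two piles of two bins each, eight elements in all.
piles = [
    Pile(10.0, [Bin(5.0, [4.8, 5.2]), Bin(15.0, [14.9])]),
    Pile(40.0, [Bin(35.0, [35.3, 34.7]), Bin(45.0, [42.0, 44.6, 47.1])]),
]
found, log = two_stage_search(
    piles, target=42.1, pile_tol=10.0, bin_tol=5.0,
    matches=lambda x, t: abs(x - t) < 0.5,
)
print(found, log)
# [42.0] SearchLog(piles_examined=2, bins_examined=2, comparisons=3)
```

In this toy run only 3 of the 8 elements are compared with the target. The tolerances express the trade-off the abstract describes: tighter tolerances mean fewer comparisons, but because measurements are subject to error, they also raise the chance that the pile or bin holding the target is discarded and the target is missed.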
Citation
Herman Chernoff. "The Identification of an Element of a Large Population in the Presence of Noise." Ann. Statist. 8 (6) 1179 - 1197, November, 1980. https://doi.org/10.1214/aos/1176345193