Genotypes of infectious organisms are becoming the foundation for epidemiologic studies of infectious disease. Central to the use of such data is a means for comparing genotypes. We develop methods for this purpose in the context of DNA fingerprint genotyping of tuberculosis, but our approach is applicable to many fingerprint-based genotyping systems and/or organisms. Data available on replicate (laboratory) strains reveal that (i) error in fingerprint band size is proportional to band size and (ii) errors are positively correlated within a fingerprint. Comparison (or matching) scores computed to account for this error structure need to be “standardized” in order to properly rank the comparisons. We demonstrate the utility of using extreme value distributions to effect such standardization. Several estimation issues for the extreme value parameters are discussed, including a lack of robustness of (approximate) maximum likelihood estimates. Interesting findings to emerge from examination of quantiles of standardized matching scores include (i) formal significance is not attainable when querying a database for a given fingerprint pattern and (ii) maximal matching probabilities are not necessarily monotonically decreasing with increasing numbers of fingerprint bands.
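As a rough illustration of the standardization idea, and not the estimation procedure developed in the paper, the following Python sketch fits a Gumbel (type I extreme value) distribution to a set of maximal matching scores and converts a raw score into an upper-tail probability. The function names, the choice of the Gumbel family, and the use of a method-of-moments fit (swapped in here for simplicity in place of maximum likelihood) are all assumptions made for this example.

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def fit_gumbel_moments(maxima):
    """Method-of-moments fit of a Gumbel (max) distribution.

    Assumption for illustration: this simple estimator stands in for the
    (approximate) maximum likelihood estimates discussed in the abstract.
    """
    mean = statistics.fmean(maxima)
    sd = statistics.stdev(maxima)
    scale = sd * math.sqrt(6) / math.pi   # from Var = pi^2 * scale^2 / 6
    loc = mean - EULER_GAMMA * scale      # from Mean = loc + gamma * scale
    return loc, scale


def gumbel_upper_tail(score, loc, scale):
    """P(X >= score) under Gumbel(loc, scale): the standardized score."""
    z = (score - loc) / scale
    if z < -700:                          # avoid overflow in exp(-z)
        return 1.0
    return -math.expm1(-math.exp(-z))     # 1 - exp(-exp(-z))


def sample_gumbel(loc, scale, n, rng):
    """Draw n Gumbel variates by inverting the CDF (hypothetical test data)."""
    return [loc - scale * math.log(-math.log(rng.uniform(1e-12, 1 - 1e-12)))
            for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical maximal matching scores for one fingerprint band count.
    maxima = sample_gumbel(loc=2.0, scale=0.5, n=5000, rng=rng)
    loc, scale = fit_gumbel_moments(maxima)
    print("fitted loc, scale:", loc, scale)
    print("tail probability of score 4.0:", gumbel_upper_tail(4.0, loc, scale))
```

Fitting separate parameters for each number of fingerprint bands would place raw scores from fingerprints with different band counts on a common probability scale, which is the sense in which such scores can then be ranked.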
"Comparing DNA Fingerprints of Infectious Organisms." Statist. Sci. 15 (1) 27 - 45, 1 February 2000. https://doi.org/10.1214/ss/1009212672