Genotypes of infectious organisms are becoming the foundation for
epidemiologic studies of infectious disease. Central to the use of such data is
a means for comparing genotypes. We develop methods for this purpose in the
context of DNA fingerprint genotyping of tuberculosis, but our approach is
applicable to many fingerprint-based genotyping systems and/or
organisms. Data available on replicate (laboratory) strains here reveal that
(i) error in fingerprint band size is proportional to band size and (ii) errors
are positively correlated within a fingerprint. Comparison (or matching) scores
computed to account for this error structure need to be
“standardized” in order to properly rank the comparisons. We
demonstrate the utility of using extreme value distributions to effect such
standardization. Several estimation issues for the extreme value parameters are
discussed, including a lack of robustness of (approximate) maximum likelihood
estimates. Interesting findings to emerge from examination of quantiles of
standardized matching scores include (i) formal significance is not attainable
when querying a database for a given fingerprint pattern and (ii) maximal
matching probabilities are not necessarily monotonically decreasing with
increasing numbers of fingerprint bands.
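
The standardization idea can be illustrated with a minimal sketch. It assumes a reference set of matching scores from unrelated fingerprints is available for a given number of bands, and that their upper tail is adequately described by a Gumbel (type I extreme value) distribution. The abstract does not specify the matching score or the estimator used in the paper, so the function names, the example scores, and the choice of method-of-moments estimation (used here as a simple alternative to maximum likelihood) are all assumptions made for illustration.

```python
import math
import statistics

# Euler-Mascheroni constant: the mean of a standard Gumbel distribution.
EULER_GAMMA = 0.5772156649015329


def fit_gumbel_moments(scores):
    """Method-of-moments estimates (location, scale) for a Gumbel (maximum)
    distribution, using mean = location + gamma * scale and
    variance = (pi * scale)**2 / 6."""
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)
    scale = sd * math.sqrt(6) / math.pi
    location = mean - EULER_GAMMA * scale
    return location, scale


def gumbel_upper_tail(score, location, scale):
    """P(S >= score) under the fitted Gumbel distribution.
    Smaller values indicate a more extreme (better) match."""
    z = (score - location) / scale
    if z < -700:  # exp(-z) would overflow; the tail probability is 1 here
        return 1.0
    return -math.expm1(-math.exp(-z))


# Hypothetical reference scores for fingerprints with some fixed band count.
reference_scores = [4.1, 5.3, 3.8, 6.0, 4.7, 5.1, 4.4, 5.8, 4.9, 5.5]
location, scale = fit_gumbel_moments(reference_scores)

# Standardize an observed score so that comparisons made under different
# reference distributions (e.g. different band counts) share one scale.
observed = 7.2
standardized = gumbel_upper_tail(observed, location, scale)
```

Fitting separate parameters for each band count and ranking comparisons by the resulting tail probability, rather than by the raw score, is one way to put scores with different null distributions on a common footing.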