Comparing DNA Fingerprints of Infectious Organisms



Statistical Science

Hugh Salamon, Mark R. Segal, and Peter M. Small

Source: Statist. Sci. Volume 15, Number 1 (2000), 27-45.

Abstract

Genotypes of infectious organisms are becoming the foundation for epidemiologic studies of infectious disease. Central to the use of such data is a means for comparing genotypes. We develop methods for this purpose in the context of DNA fingerprint genotyping of tuberculosis, but our approach is applicable to many fingerprint-based genotyping systems and/or organisms. Data available on replicate (laboratory) strains here reveal that (i) error in fingerprint band size is proportional to band size and (ii) errors are positively correlated within a fingerprint. Comparison (or matching) scores computed to account for this error structure need to be “standardized” in order to properly rank the comparisons. We demonstrate the utility of using extreme value distributions to effect such standardization. Several estimation issues for the extreme value parameters are discussed, including a lack of robustness of (approximate) maximum likelihood estimates. Interesting findings to emerge from examination of quantiles of standardized matching scores include (i) formal significance is not attainable when querying a database for a given fingerprint pattern and (ii) maximal matching probabilities are not necessarily monotonically decreasing with increasing numbers of fingerprint bands.

Keywords: Extreme value distribution; genotyping; maximum likelihood estimation; moment estimation; tuberculosis
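
The standardization idea in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the abstract does not define the matching score or say which extreme value family is used, so the sketch assumes a Gumbel (type I, maximum) distribution and fits it by the method of moments. The function names gumbel_moment_estimates and gumbel_upper_tail are hypothetical.

```python
import math
import statistics

# Euler-Mascheroni constant, which appears in the mean of a Gumbel distribution.
EULER_GAMMA = 0.5772156649015329


def gumbel_moment_estimates(scores):
    """Method-of-moments fit of a Gumbel (maximum) distribution.

    Assumption: the best matching scores behave like Gumbel draws, with
    mean = location + EULER_GAMMA * scale and variance = (pi * scale)**2 / 6.
    """
    mean = statistics.fmean(scores)
    sd = statistics.stdev(scores)  # sample standard deviation, needs >= 2 scores
    scale = sd * math.sqrt(6.0) / math.pi
    location = mean - EULER_GAMMA * scale
    return location, scale


def gumbel_upper_tail(score, location, scale):
    """P(S >= score) under the fitted Gumbel: 1 - exp(-exp(-z))."""
    z = (score - location) / scale
    # expm1 keeps precision when the tail probability is very small.
    return -math.expm1(-math.exp(-z))


if __name__ == "__main__":
    # Hypothetical matching scores, for illustration only.
    null_scores = [1.8, 2.1, 2.4, 1.9, 2.7, 2.2, 3.1, 2.0, 2.5, 2.3]
    loc, scale = gumbel_moment_estimates(null_scores)
    print(loc, scale, gumbel_upper_tail(3.5, loc, scale))
```

Converting each raw score to an upper-tail probability under a fitted distribution puts comparisons on a common scale for ranking. The moment fit is shown only because it is short; the abstract notes that estimation of the extreme value parameters raises several issues, so the choice of estimator deserves care in practice.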

Full-text: Open access

Links and Identifiers

Permanent link to this document: http://projecteuclid.org/euclid.ss/1009212672
Digital Object Identifier: doi:10.1214/ss/1009212672

2008 © Institute of Mathematical Statistics