The Annals of Applied Statistics

Maximum likelihood estimates under k-allele models with selection can be numerically unstable

Erkan Ozge Buzbas and Paul Joyce

Source: Ann. Appl. Stat. Volume 3, Number 3 (2009), 1147-1162.

Abstract

The stationary distribution of allele frequencies under a variety of Wright–Fisher k-allele models with selection and parent independent mutation is well studied. However, the statistical properties of maximum likelihood estimates of parameters under these models are not well understood. Under each of these models there is a point in data space which carries the strongest possible signal for selection, yet, at this point, the likelihood is unbounded. This result remains valid even if all of the mutation parameters are assumed to be known. Therefore, standard simulation approaches used to approximate the sampling distribution of the maximum likelihood estimate produce numerically unstable results in the presence of substantial selection. We describe the Bayesian alternative where the posterior distribution tends to produce more accurate and reliable interval estimates for the selection intensity at a locus.
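The unboundedness described above can be illustrated with a minimal numerical sketch. It is not the authors' code, and the abstract does not state the density. The sketch assumes the simplest case: k = 2 alleles, symmetric overdominance with selection intensity sigma, and a known symmetric mutation parameter a, with stationary density proportional to exp(-sigma (x^2 + (1 - x)^2)) (x (1 - x))^(a - 1). Under that assumed form, the data point carrying the strongest signal for selection is x = 1/2, and the log-likelihood evaluated there keeps increasing as sigma grows. The function name `log_likelihood` and its parameters are hypothetical.

```python
import math

def log_likelihood(sigma, x=0.5, a=1.0, n=20000):
    """Log stationary density at allele frequency x (assumed k = 2 model).

    Assumed density, not taken from the abstract:
        f(x) proportional to exp(-sigma*(x^2 + (1-x)^2)) * (x*(1-x))^(a-1)
    """
    # x^2 + (1-x)^2 = 1/2 + 2*(x - 1/2)^2, so the factor exp(-sigma/2)
    # cancels between the numerator and the normalizing constant.
    def kernel(u):
        return math.exp(-2.0 * sigma * (u - 0.5) ** 2) * (u * (1.0 - u)) ** (a - 1.0)

    # Midpoint rule for the normalizing constant on (0, 1).
    h = 1.0 / n
    norm = h * sum(kernel((i + 0.5) * h) for i in range(n))
    return math.log(kernel(x)) - math.log(norm)

sigmas = [0, 1, 10, 100, 1000, 10000]
values = [log_likelihood(s) for s in sigmas]
for s, v in zip(sigmas, values):
    print(f"sigma = {s:>6}: log-likelihood at x = 1/2 is {v:.4f}")
```

For a = 1 the normalizing constant has a closed form, and the log-likelihood at x = 1/2 behaves like 0.5 log(2 sigma / pi) for large sigma. It therefore has no finite maximizer, which is consistent with the instability the abstract reports when the data sit near this point.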

Keywords: Selective overdominance; heterozygote advantage; multiple allele models; maximum likelihood; posterior analysis; instability

Links and Identifiers

Permanent link to this document: http://projecteuclid.org/euclid.aoas/1254773282
Digital Object Identifier: doi:10.1214/09-AOAS237

2009 © Institute of Mathematical Statistics