Abstract
Hardy-Weinberg equilibrium and linkage equilibrium are fundamental concepts in population genetics. In practice, testing linkage equilibrium in haplotype data is equivalent to testing independence in a large, sparse, multidimensional contingency table. Testing Hardy-Weinberg and linkage equilibrium simultaneously on multilocus genotype data introduces the additional complications of missing information and symmetry constraints on marginal probabilities. To avoid unreliable large-sample approximations for sparse contingency tables, one can use exact tests like Fisher's classical test that condition on observed marginal totals. Unfortunately, computing p-values for exact tests is often infeasible because of the large number of tables consistent with the marginal totals of an observed table. We develop here Markov chains for sampling from the appropriate conditional distributions for testing genetic equilibrium. These chains compare favorably with a parallel, independent-sampling method that we present. For $n$ haplotype observations on $J$ loci, the Markov chains converge to their stationary distributions in $[(J - 1) n \ln n]/2 + O(n)$ steps and can be an efficient tool for estimating p-values. Our theoretical treatment of these results involves strong stationary stopping times, order statistics, large deviations and the embedding of Poisson processes. We include some general results on the application of strong stationary times to bounding the precision and bias of sample average estimators.
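To make the sampling idea concrete, the following is a minimal sketch of a Monte Carlo test of linkage equilibrium on haplotype data. It is an illustration under stated assumptions, not necessarily the chain constructed in the paper: the move used here (exchanging the alleles of two randomly chosen haplotypes at one randomly chosen locus, with the first locus held fixed) is assumed because it preserves every single-locus allele count, that is, the marginal totals of the contingency table. The function names, the log-factorial test statistic, and the use of the abstract's leading term $[(J - 1) n \ln n]/2$ plus $n$ as a spacing between samples are all hypothetical choices made for this example.

```python
import math
import random
from collections import Counter


def log_factorial_stat(haplotypes):
    """Sum of log(count!) over distinct haplotypes.

    Given the margins, a table's conditional probability under independence
    is proportional to 1 / prod(count!), so a larger value of this statistic
    corresponds to a less probable table.
    """
    counts = Counter(tuple(h) for h in haplotypes)
    return sum(math.lgamma(c + 1) for c in counts.values())


def swap_step(haplotypes, rng):
    """One chain move: swap the alleles of two random haplotypes at one locus.

    Locus 0 is held fixed (an assumption of this sketch); permuting the
    remaining J - 1 loci is enough to reach every table with the same margins.
    """
    n, n_loci = len(haplotypes), len(haplotypes[0])
    i, k = rng.randrange(n), rng.randrange(n)
    j = rng.randrange(1, n_loci)
    haplotypes[i][j], haplotypes[k][j] = haplotypes[k][j], haplotypes[i][j]


def mcmc_p_value(data, n_samples=2000, thin=None, seed=0):
    """Estimate a p-value for independence of loci, conditional on margins."""
    rng = random.Random(seed)
    state = [list(h) for h in data]  # work on a copy of the observed data
    n, n_loci = len(state), len(state[0])
    if thin is None:
        # Heuristic spacing between samples, borrowed from the abstract's rate.
        thin = max(1, int((n_loci - 1) * n * math.log(n) / 2) + n)
    observed = log_factorial_stat(state)
    hits = 0
    for _ in range(n_samples):
        for _ in range(thin):
            swap_step(state, rng)
        # Small tolerance guards against floating-point summation order.
        if log_factorial_stat(state) >= observed - 1e-9:
            hits += 1
    # Count the observed table itself so the estimate is never zero.
    return (hits + 1) / (n_samples + 1)
```

For example, `mcmc_p_value([(0, 0)] * 10 + [(1, 1)] * 10)` passes twenty two-locus haplotypes in which the alleles are perfectly associated, while a balanced sample such as `[(0, 0), (0, 1), (1, 0), (1, 1)] * 5` gives the most probable table for its margins. Because each swap is symmetric, the chain is uniform over allele arrangements with the observed margins, which induces the conditional distribution used by an exact test.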
Citation
Laura C. Lazzeroni, Kenneth Lange. "Markov chains for Monte Carlo tests of genetic equilibrium in multidimensional contingency tables." Ann. Statist. 25 (1) 138-168, February 1997. https://doi.org/10.1214/aos/1034276624
Information