## The Annals of Statistics

### Markov chains for Monte Carlo tests of genetic equilibrium in multidimensional contingency tables

#### Abstract

Hardy-Weinberg equilibrium and linkage equilibrium are fundamental concepts in population genetics. In practice, testing linkage equilibrium in haplotype data is equivalent to testing independence in a large, sparse, multidimensional contingency table. Testing Hardy-Weinberg and linkage equilibrium simultaneously on multilocus genotype data introduces the additional complications of missing information and symmetry constraints on marginal probabilities. To avoid unreliable large-sample approximations for sparse contingency tables, one can use exact tests like Fisher's classical test that condition on observed marginal totals. Unfortunately, computing p-values for exact tests is often infeasible because of the large number of tables consistent with the marginal totals of an observed table. We develop here Markov chains for sampling from the appropriate conditional distributions for testing genetic equilibrium. These chains compare favorably with a parallel, independent-sampling method that we present. For $n$ haplotype observations on $J$ loci, the Markov chains converge to their stationary distributions in $[(J - 1)\,n \ln n]/2 + O(n)$ steps and can be an efficient tool for estimating p-values. Our theoretical treatment of these results involves strong stationary stopping times, order statistics, large deviations and the embedding of Poisson processes. We include some general results on the application of strong stationary times to bounding the precision and bias of sample average estimators.
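The linkage-equilibrium case can be sketched in a few lines. The sketch assumes phased haplotype data (one tuple of allele labels per haplotype, at least two loci). It ignores the genotype, missing-phase and Hardy-Weinberg complications mentioned above. It also uses the log-likelihood-ratio statistic $G$ as the test statistic. That statistic, the swap move, and the function names (`g_statistic`, `swap_chain_pvalue`, `independent_pvalue`, `leading_term_steps`) are illustrative assumptions, not the authors' algorithm.

The sketch relies on one fact about the null distribution. Conditional on the allele counts at each locus, the null distribution of the $J$-way table is the one induced by independently and uniformly permuting the alleles at loci 2 through $J$ across the $n$ haplotypes. A chain that repeatedly swaps the alleles of two random haplotypes at one random locus therefore preserves the margins and has that conditional distribution as its stationary distribution. The independent sampler instead draws each table exactly by shuffling every locus afresh.

```python
import math
import random
from collections import Counter


def g_statistic(haplotypes):
    """Log-likelihood-ratio (G) statistic for mutual independence of loci.

    Empty cells contribute 0 to G, so the sparse J-way table is never
    built explicitly. Expected counts come from the marginal allele
    frequencies, which both samplers below leave unchanged.
    """
    n = len(haplotypes)
    J = len(haplotypes[0])
    margins = [Counter(h[j] for h in haplotypes) for j in range(J)]
    g = 0.0
    for hap, obs in Counter(haplotypes).items():
        expected = float(n)
        for j, allele in enumerate(hap):
            expected *= margins[j][allele] / n
        g += obs * math.log(obs / expected)
    return 2.0 * g


def leading_term_steps(n, J):
    """Hypothetical helper: (J - 1) n ln n / 2, the leading term of the
    convergence bound quoted in the abstract, with the O(n) term dropped."""
    return math.ceil((J - 1) * n * math.log(n) / 2)


def swap_chain_pvalue(haplotypes, steps_per_sample, num_samples, seed=0):
    """Monte Carlo p-value from a margin-preserving swap chain (assumed move).

    Requires J >= 2 loci. Successive recorded tables are correlated.
    """
    rng = random.Random(seed)
    n, J = len(haplotypes), len(haplotypes[0])
    current = [list(h) for h in haplotypes]  # chain starts at the observed data
    observed = g_statistic(haplotypes)
    exceed = 0
    for _ in range(num_samples):
        for _ in range(steps_per_sample):
            # Locus 0 stays fixed; permuting loci 1..J-1 relative to it suffices.
            j = rng.randrange(1, J)
            a, b = rng.randrange(n), rng.randrange(n)  # a == b is a lazy no-op step
            current[a][j], current[b][j] = current[b][j], current[a][j]
        # Small tolerance so floating-point ties with the observed G count as ">=".
        if g_statistic([tuple(h) for h in current]) >= observed - 1e-12:
            exceed += 1
    return exceed / num_samples


def independent_pvalue(haplotypes, num_samples, seed=0):
    """Monte Carlo p-value from exact, independent draws: shuffle each locus."""
    rng = random.Random(seed)
    J = len(haplotypes[0])
    columns = [[h[j] for h in haplotypes] for j in range(J)]
    observed = g_statistic(haplotypes)
    exceed = 0
    for _ in range(num_samples):
        for j in range(1, J):
            rng.shuffle(columns[j])  # uniform random permutation of locus j
        if g_statistic(list(zip(*columns))) >= observed - 1e-12:
            exceed += 1
    return exceed / num_samples
```

For example, `swap_chain_pvalue(haps, leading_term_steps(len(haps), len(haps[0])), 1000)` records one table after roughly the leading-term number of steps each time. Because recorded tables from the chain stay correlated, its p-value estimate carries the bias and precision issues that the abstract's results on sample-average estimators address. The independent sampler avoids that correlation at the cost of a full shuffle of every locus per draw.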

#### Article information

Source
Ann. Statist. Volume 25, Number 1 (1997), 138-168.

Dates
First available in Project Euclid: 10 October 2002

http://projecteuclid.org/euclid.aos/1034276624

Digital Object Identifier
doi:10.1214/aos/1034276624

Mathematical Reviews number (MathSciNet)
MR1429920

Zentralblatt MATH identifier
0871.62094

#### Citation

Lazzeroni, Laura C.; Lange, Kenneth. Markov chains for Monte Carlo tests of genetic equilibrium in multidimensional contingency tables. Ann. Statist. 25 (1997), no. 1, 138--168. doi:10.1214/aos/1034276624. http://projecteuclid.org/euclid.aos/1034276624.


Stanford, California 94305. E-mail: laura@playfair.stanford.edu

Ann Arbor, Michigan 48109. E-mail: klange@sph.umich.edu