The Annals of Statistics

Algebraic algorithms for sampling from conditional distributions

Persi Diaconis and Bernd Sturmfels

Abstract

We construct Markov chain algorithms for sampling from discrete exponential families conditional on a sufficient statistic. Examples include contingency tables, logistic regression, and spectral analysis of permutation data. The algorithms involve computations in polynomial rings using Gröbner bases.
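One familiar instance of such a chain is a random walk on two-way contingency tables with fixed row and column sums. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: it uses only the elementary move that adds +1 and -1 on a 2x2 minor, it targets the uniform distribution on tables with the given margins (sampling from the hypergeometric distribution would need an additional accept/reject step), and the names `margins`, `step` and `run_walk` are hypothetical.

```python
import random

def margins(table):
    """Return (row sums, column sums) of a two-way table of counts."""
    return [sum(row) for row in table], [sum(col) for col in zip(*table)]

def step(table, rng):
    """Apply one random +1/-1 move on a 2x2 minor, in place.

    The move leaves every row and column sum unchanged. If it would make
    a cell negative, the table is left as it is (the walk holds).
    Assumes the table has at least two rows and two columns.
    """
    i, j = rng.sample(range(len(table)), 2)      # two distinct rows
    k, l = rng.sample(range(len(table[0])), 2)   # two distinct columns
    s = rng.choice((1, -1))                      # random sign of the move
    # Cells (i,k) and (j,l) gain s; cells (i,l) and (j,k) lose s.
    proposed = (table[i][k] + s, table[j][l] + s,
                table[i][l] - s, table[j][k] - s)
    if min(proposed) < 0:
        return False
    table[i][k], table[j][l], table[i][l], table[j][k] = proposed
    return True

def run_walk(table, n_steps, seed=0):
    """Run the walk for n_steps from a copy of `table`; return the final table."""
    rng = random.Random(seed)
    current = [list(row) for row in table]
    for _ in range(n_steps):
        step(current, rng)
    return current
```

Starting from an observed table, `run_walk(observed, 1000)` returns another nonnegative table with the same row and column sums, which are the sufficient statistic being conditioned on in this example.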

Article information

Source
Ann. Statist. Volume 26, Number 1 (1998), 363-397.

Dates
First available: 28 August 2002

Permanent link to this document
http://projecteuclid.org/euclid.aos/1030563990

Mathematical Reviews number (MathSciNet)
MR1608156

Digital Object Identifier
doi:10.1214/aos/1030563990

Zentralblatt MATH identifier
0952.62088

Subjects
Primary: 62E17; 13P10: Gröbner bases; other bases for ideals and modules (e.g., Janet and border bases)

Keywords
Conditional inference; Monte Carlo; Markov chain; exponential families; Gröbner bases

Citation

Diaconis, Persi; Sturmfels, Bernd. Algebraic algorithms for sampling from conditional distributions. The Annals of Statistics 26 (1998), no. 1, 363-397. doi:10.1214/aos/1030563990. http://projecteuclid.org/euclid.aos/1030563990.
