The Annals of Statistics

Exact sampling and counting for fixed-margin matrices

Jeffrey W. Miller and Matthew T. Harrison

Abstract

The uniform distribution on matrices with specified row and column sums is often a natural choice of null model when testing for structure in two-way tables (binary or nonnegative integer). Due to the difficulty of sampling from this distribution, many approximate methods have been developed. We show that by exploiting certain symmetries, exact sampling and counting is in fact possible in many nontrivial real-world cases. We illustrate with real datasets including ecological co-occurrence matrices and contingency tables.
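
To make the objects in the abstract concrete, the following is a minimal sketch of counting and uniformly sampling binary matrices with fixed margins. It is not the algorithm of the paper: it is a plain row-by-row recursion, and the only symmetry it uses is that the count depends on the remaining column sums only through their sorted order. The function names `count_binary` and `sample_binary` are hypothetical, and the approach is practical only for small matrices.

```python
from functools import lru_cache
from itertools import combinations
import random


@lru_cache(maxsize=None)
def _count(rows, cols):
    # rows: tuple of row sums still to be placed.
    # cols: sorted tuple of column sums still to be filled.
    if not rows:
        return 1 if not any(cols) else 0
    total = 0
    # Choose which columns receive a 1 in the current row.
    for chosen in combinations(range(len(cols)), rows[0]):
        if all(cols[j] > 0 for j in chosen):
            new = list(cols)
            for j in chosen:
                new[j] -= 1
            # Sorting merges states that differ only by a column permutation.
            total += _count(rows[1:], tuple(sorted(new)))
    return total


def count_binary(rows, cols):
    """Number of 0-1 matrices with the given row sums and column sums."""
    return _count(tuple(rows), tuple(sorted(cols)))


def sample_binary(rows, cols, rng=random):
    """Draw one 0-1 matrix uniformly among those with the given margins."""
    rows, remaining = tuple(rows), list(cols)
    matrix = []
    for i, r in enumerate(rows):
        options, weights = [], []
        for chosen in combinations(range(len(remaining)), r):
            if all(remaining[j] > 0 for j in chosen):
                new = remaining[:]
                for j in chosen:
                    new[j] -= 1
                # Weight each candidate row by its number of completions.
                w = count_binary(rows[i + 1:], new)
                if w:
                    options.append((chosen, new))
                    weights.append(w)
        if not options:
            raise ValueError("no binary matrix has these margins")
        chosen, remaining = rng.choices(options, weights=weights)[0]
        matrix.append([1 if j in chosen else 0 for j in range(len(cols))])
    return matrix
```

For example, `count_binary([1, 1, 1], [1, 1, 1])` returns 6, the number of 3 × 3 permutation matrices. Because each row is chosen with probability proportional to the number of ways to complete the matrix, the product of the row probabilities telescopes to one over the total count, so every valid matrix is equally likely.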

Article information

Source
Ann. Statist., Volume 41, Number 3 (2013), 1569-1592.

Dates
First available in Project Euclid: 1 August 2013

Permanent link to this document
https://projecteuclid.org/euclid.aos/1375362560

Digital Object Identifier
doi:10.1214/13-AOS1131

Mathematical Reviews number (MathSciNet)
MR3113822

Zentralblatt MATH identifier
1292.62083

Subjects
Primary: 62H15: Hypothesis testing
Secondary: 62H17: Contingency tables; 05A15: Exact enumeration problems, generating functions [See also 33Cxx, 33Dxx]

Keywords
Exact sampling; exact counting; binary matrix; contingency table; integer points in polyhedra

Citation

Miller, Jeffrey W.; Harrison, Matthew T. Exact sampling and counting for fixed-margin matrices. Ann. Statist. 41 (2013), no. 3, 1569–1592. doi:10.1214/13-AOS1131. https://projecteuclid.org/euclid.aos/1375362560

