The Annals of Applied Probability

Efficient importance sampling for binary contingency tables

Jose H. Blanchet

Source: Ann. Appl. Probab. Volume 19, Number 3 (2009), 949-982.

Abstract

Importance sampling has been reported to produce algorithms with excellent empirical performance in counting problems. However, the theoretical support for its efficiency in these applications has been very limited. In this paper, we propose a methodology that can be used to design efficient importance sampling algorithms for counting and to test their efficiency rigorously. We apply our techniques after transforming the problem into a rare-event simulation problem, thereby connecting complexity analysis of counting problems with efficiency in the context of rare-event simulation. As an illustration of our approach, we consider the problem of counting the number of binary tables with fixed column and row sums, $c_j$'s and $r_i$'s, respectively, and total marginal sums $d=\sum_j c_j$. Assuming that $\max_j c_j = o(d^{1/2})$, $\sum c_j^2 = O(d)$ and the $r_i$'s are bounded, we show that a suitable importance sampling algorithm, proposed by Chen et al. [J. Amer. Statist. Assoc. 100 (2005) 109–120], requires $O(d^3\varepsilon^{-2}\delta^{-1})$ operations to produce an estimate that has $\varepsilon$-relative error with probability $1-\delta$. In addition, if $\max_j c_j = o(d^{1/4-\delta_0})$ for some $\delta_0>0$, the same coverage can be guaranteed with $O(d^3\varepsilon^{-2}\log(\delta^{-1}))$ operations.
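
To make the counting setup concrete, the following is a minimal sketch of a sequential importance sampling estimator for the number of binary tables with given margins. It is an illustration only and not the algorithm analyzed in the paper: the proposal here fills each column by choosing uniformly among rows that still need ones, which is an assumption made for simplicity, whereas the algorithm of Chen et al. uses a different, row-sum-dependent proposal. The function name `sis_count_binary_tables` and its parameters are hypothetical.

```python
import math
import random


def sis_count_binary_tables(row_sums, col_sums, n_samples=10_000, seed=0):
    """Estimate the number of 0-1 tables with the given margins.

    Columns are filled one at a time. Each sampled table T receives the
    weight 1 / q(T), where q is the proposal probability, so the average
    weight is an unbiased estimate of the number of valid tables.
    """
    if sum(row_sums) != sum(col_sums):
        return 0.0  # inconsistent margins: no table exists
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        residual = list(row_sums)  # ones still owed to each row
        weight = 1.0               # running value of 1 / q(table)
        for c in col_sums:
            # Rows that can still accept a one in this column.
            open_rows = [i for i, r in enumerate(residual) if r > 0]
            if len(open_rows) < c:
                weight = 0.0       # dead end: the column cannot be filled
                break
            # Simplified uniform proposal over c-subsets of the open rows.
            for i in rng.sample(open_rows, c):
                residual[i] -= 1
            weight *= math.comb(len(open_rows), c)
        # Defensive check: with matching totals, a completed pass
        # already satisfies every row sum.
        if any(residual):
            weight = 0.0
        total += weight
    return total / n_samples
```

With all margins equal to one, every sampled table is a permutation matrix and every weight equals $n!$, so `sis_count_binary_tables([1, 1, 1], [1, 1, 1])` returns the exact count. For less regular margins the weights vary from sample to sample, and the number of samples needed for a given relative error depends on how variable they are, which is the kind of question the paper addresses for the Chen et al. proposal.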

Primary Subjects: 68W20, 60J20
Secondary Subjects: 05A16, 05C30, 62Q05
Keywords: Approximate counting; bipartite graphs; binary tables; importance sampling; Markov processes; Doob h-transform; changes-of-measure; rare-event simulation

Links and Identifiers

Permanent link to this document: http://projecteuclid.org/euclid.aoap/1245071015
Digital Object Identifier: doi:10.1214/08-AAP558
Zentralblatt MATH identifier: 05580229
Mathematical Reviews number (MathSciNet): MR2537195

References

Asmussen, S. and Glynn, P. W. (2007). Stochastic Simulation: Algorithms and Analysis. Stochastic Modelling and Applied Probability 57. Springer, New York.
Mathematical Reviews (MathSciNet): MR2331321
Bayati, M., Kim, J. and Saberi, A. (2007). A Sequential Algorithm for Generating Random Graphs. Lecture Notes in Computer Science 4627. 326–340. Springer, Berlin.
Békéssy, A., Békéssy, P. and Komlós, J. (1972). Asymptotic enumeration of regular matrices. Studia Sci. Math. Hungar. 7 343–353.
Bezáková, I., Bhatnagar, N. and Vigoda, E. (2006). Sampling binary contingency tables with a greedy start. In Proceedings of the Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithms 414–423. ACM, New York.
Bezáková, I., Sinclair, A., Štefankovič, D. and Vigoda, E. (2007). Negative examples for sequential importance sampling of binary contingency tables. In Algorithms—ESA 2006. Lecture Notes in Computer Science 4168 136–147. Springer, Berlin.
Blanchet, J. and Glynn, P. (2008). Efficient rare-event simulation for the maximum of heavy-tailed random walks. Ann. Appl. Probab. 18 1351–1378.
Mathematical Reviews (MathSciNet): MR2434174
Digital Object Identifier: doi:10.1214/07-AAP485
Project Euclid: euclid.aoap/1216677125
Blanchet, J. and Liu, J. C. (2008). State-dependent importance sampling for regularly varying random walks. Adv. in Appl. Probab. 40 1104–1128.
Mathematical Reviews (MathSciNet): MR2488534
Digital Object Identifier: doi:10.1239/aap/1231340166
Project Euclid: euclid.aap/1231340166
Blitzstein, J. and Diaconis, P. (2008). A sequential importance sampling algorithm for generating random graphs with prescribed degrees. Preprint.
Botev, Z. I. and Kroese, D. P. (2008). Non-asymptotic bandwidth selection for density estimation of discrete data. Methodol. Comput. Appl. Probab. 10 435–451.
Mathematical Reviews (MathSciNet): MR2415127
Digital Object Identifier: doi:10.1007/s11009-007-9057-z
Bucklew, J. A. (2004). Introduction to Rare Event Simulation. Springer, New York.
Mathematical Reviews (MathSciNet): MR2045385
Chen, S. X. and Liu, J. S. (1997). Statistical applications of the Poisson-binomial and conditional Bernoulli distributions. Statist. Sinica 7 875–892.
Mathematical Reviews (MathSciNet): MR1488647
Chen, X.-H., Dempster, A. P. and Liu, J. S. (1994). Weighted finite population sampling to maximize entropy. Biometrika 81 457–469.
Mathematical Reviews (MathSciNet): MR1311090
Digital Object Identifier: doi:10.1093/biomet/81.3.457
Chen, Y., Diaconis, P., Holmes, S. P. and Liu, J. S. (2005). Sequential Monte Carlo methods for statistical analysis of tables. J. Amer. Statist. Assoc. 100 109–120.
Mathematical Reviews (MathSciNet): MR2156822
Digital Object Identifier: doi:10.1198/016214504000001303
Doob, J. L. (1957). Conditional Brownian motion and the boundary limits of harmonic functions. Bull. Soc. Math. France 85 431–458.
Mathematical Reviews (MathSciNet): MR109961
Glynn, P. W. and Iglehart, D. L. (1989). Importance sampling for stochastic simulations. Management Sci. 35 1367–1392.
Mathematical Reviews (MathSciNet): MR1024494
Digital Object Identifier: doi:10.1287/mnsc.35.11.1367
Greenhill, C., McKay, B. D. and Wang, X. (2006). Asymptotic enumeration of sparse 0–1 matrices with irregular row and column sums. J. Combin. Theory Ser. A 113 291–324.
Mathematical Reviews (MathSciNet): MR2199276
Digital Object Identifier: doi:10.1016/j.jcta.2005.03.005
Jerrum, M. (2003). Counting, Sampling and Integrating: Algorithms and Complexity. Birkhäuser, Basel.
Mathematical Reviews (MathSciNet): MR1960003
Juneja, S. and Shahabuddin, P. (2006). Rare event simulation techniques: An introduction and recent advances. In Handbook on Simulation (S. Henderson and B. Nelson, eds.) 291–350. Elsevier, Amsterdam.
Kannan, R., Tetali, P. and Vempala, S. (1997). Simple Markov-chain algorithms for generating bipartite graphs and tournaments (extended abstract). In Proceedings of the Eighth Annual ACM-SIAM Symposium on Discrete Algorithms (New Orleans, LA, 1997) 193–200. ACM, New York.
Mathematical Reviews (MathSciNet): MR1447665
Kim, J. H. and Vu, V. H. (2003). Generating random regular graphs. In Proceedings of the Thirty-Fifth Annual ACM Symposium on Theory of Computing 213–222. ACM, New York.
Mathematical Reviews (MathSciNet): MR2121044
L’Ecuyer, P., Blanchet, J., Glynn, P. and Tuffin, B. (2008). Efficient rare-event simulation for the maximum of heavy-tailed random walks. Ann. Appl. Probab. 18 1351–1378.
Mathematical Reviews (MathSciNet): MR2434174
Digital Object Identifier: doi:10.1214/07-AAP485
Project Euclid: euclid.aoap/1216677125
Liu, J. S. (2001). Monte Carlo Strategies in Scientific Computing. Springer, New York.
Mathematical Reviews (MathSciNet): MR1842342
McKay, B. D. (1984). Asymptotics for 0–1 matrices with prescribed line sums. In Enumeration and Design (Waterloo, Ont., 1982) 225–238. Academic Press, Toronto, ON.
Mathematical Reviews (MathSciNet): MR782316
Meyn, S. P. and Tweedie, R. L. (1993). Markov Chains and Stochastic Stability. Springer, London.
Mathematical Reviews (MathSciNet): MR1287609
Mitzenmacher, M. and Upfal, E. (2005). Probability and Computing: Randomized Algorithms and Probabilistic Analysis. Cambridge Univ. Press, Cambridge.
Mathematical Reviews (MathSciNet): MR2144605
Rubinstein, R. Y. (2007). How many needles are in a hay stack or how to solve fast #P-complete counting problems. Methodol. Comput. Appl. Probab. 11 5–49.
Sinclair, A. (1993). Algorithms for Random Generation and Counting. Birkhäuser Boston, Boston, MA.
Mathematical Reviews (MathSciNet): MR1201590
Valiant, L. G. (1979). The complexity of computing the permanent. Theoret. Comput. Sci. 8 189–201.
Mathematical Reviews (MathSciNet): MR526203
Digital Object Identifier: doi:10.1016/0304-3975(79)90044-6

2009 © Institute of Mathematical Statistics