Importance sampling has been reported to produce algorithms with excellent empirical performance in counting problems. However, the theoretical support for its efficiency in these applications has been very limited. In this paper, we propose a methodology that can be used to design efficient importance sampling algorithms for counting and to test their efficiency rigorously. We apply our techniques after transforming the problem into a rare-event simulation problem, thereby connecting complexity analysis of counting problems with efficiency in the context of rare-event simulation. As an illustration of our approach, we consider the problem of counting the number of binary tables with fixed column and row sums, $c_j$'s and $r_i$'s, respectively, and total marginal sum $d=\sum_j c_j$. Assuming that $\max_j c_j=o(d^{1/2})$, $\sum_j c_j^2=O(d)$ and the $r_i$'s are bounded, we show that a suitable importance sampling algorithm, proposed by Chen et al. [J. Amer. Statist. Assoc. 100 (2005) 109–120], requires $O(d^3\varepsilon^{-2}\delta^{-1})$ operations to produce an estimate that has $\varepsilon$-relative error with probability $1-\delta$. In addition, if $\max_j c_j=o(d^{1/4-\delta_0})$ for some $\delta_0>0$, the same coverage can be guaranteed with $O(d^3\varepsilon^{-2}\log(\delta^{-1}))$ operations.
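To make the counting-by-importance-sampling idea concrete, the following minimal Python sketch fills a binary table one column at a time and averages the inverse proposal probabilities of the tables it produces. It is an illustration under stated assumptions, not the algorithm of Chen et al.: the proposal here is a plain uniform choice among rows that still need ones, which is simpler and generally less efficient than their proposal. The names `sis_count_binary_tables` and `chebyshev_sample_size` are hypothetical. The second function shows the standard Chebyshev argument that yields an $\varepsilon^{-2}\delta^{-1}$ factor in the number of samples; it assumes a bound `cv2` on the squared coefficient of variation of a single importance weight is available.

```python
import math
import random


def sis_count_binary_tables(row_sums, col_sums, n_samples, rng=None):
    """Estimate the number of 0-1 tables with the given row and column sums.

    Illustrative proposal (an assumption of this sketch): each column picks
    its rows uniformly among rows whose remaining row sum is positive.
    """
    rng = rng or random.Random()
    total = 0
    for _ in range(n_samples):
        residual = list(row_sums)  # ones still owed to each row
        weight = 1                 # running value of 1 / q(table)
        for c in col_sums:
            active = [i for i, r in enumerate(residual) if r > 0]
            if c > len(active):
                weight = 0  # dead end: this column cannot be filled
                break
            # Uniform proposal over c-subsets of the active rows.
            for i in rng.sample(active, c):
                residual[i] -= 1
            weight *= math.comb(len(active), c)
        if any(residual):
            weight = 0  # some row sum was not met, so no valid table
        total += weight
    # Every valid table has positive proposal probability, so the average
    # of the weights is an unbiased estimate of the number of tables.
    return total / n_samples


def chebyshev_sample_size(cv2, eps, delta):
    """Samples needed for eps-relative error with probability 1 - delta.

    By Chebyshev's inequality, P(|est - mean| > eps * mean) <= cv2 / (n * eps**2),
    which is at most delta once n >= cv2 / (eps**2 * delta).
    """
    return math.ceil(cv2 / (eps ** 2 * delta))


if __name__ == "__main__":
    # All margins equal to 1: every sample has weight 3 * 2 * 1.
    print(sis_count_binary_tables([1, 1, 1], [1, 1, 1], 100))  # 6.0
    # Margins (2, 1, 1) on both sides: the exact count is 5.
    print(sis_count_binary_tables([2, 1, 1], [2, 1, 1], 20000, random.Random(1)))
```

In this sketch the cost per sample grows with the table size, and the number of samples is governed by the variability of the weights, which is why a bound on the coefficient of variation translates directly into an operation count.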
References
Asmussen, S. and Glynn, P. W. (2007). Stochastic Simulation: Algorithms and Analysis. Stochastic Modelling and Applied Probability 57. Springer, New York.
Bayati, M., Kim, J. and Saberi, A. (2007). A Sequential Algorithm for Generating Random Graphs. Lecture Notes in Computer Science 4627 326–340. Springer, Berlin.
Békéssy, A., Békéssy, P. and Komlós, J. (1972). Asymptotic enumeration of regular matrices. Studia Sci. Math. Hungar. 7 343–353.
Bezáková, I., Bhatnagar, N. and Vigoda, E. (2006). Sampling binary contingency tables with a greedy start. In Proceedings of the Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithms 414–423. ACM, New York.
Bezáková, I., Sinclair, A., Štefankovič, D. and Vigoda, E. (2007). Negative examples for sequential importance sampling of binary contingency tables. In Algorithms—ESA 2006. Lecture Notes in Computer Science 4168 136–147. Springer, Berlin.
Blanchet, J. and Glynn, P. (2008). Efficient rare-event simulation for the maximum of heavy-tailed random walks. Ann. Appl. Probab. 18 1351–1378.
Blanchet, J. and Liu, J. C. (2008). State-dependent importance sampling for regularly varying random walks. Adv. in Appl. Probab. 40 1104–1128.
Blitzstein, J. and Diaconis, P. (2008). A sequential importance sampling algorithm for generating random graphs with prescribed degrees. Preprint.
Botev, Z. I. and Kroese, D. P. (2008). Non-asymptotic bandwidth selection for density estimation of discrete data. Methodol. Comput. Appl. Probab. 10 435–451.
Bucklew, J. A. (2004). Introduction to Rare Event Simulation. Springer, New York.
Chen, S. X. and Liu, J. S. (1997). Statistical applications of the Poisson-binomial and conditional Bernoulli distributions. Statist. Sinica 7 875–892.
Chen, X.-H., Dempster, A. P. and Liu, J. S. (1994). Weighted finite population sampling to maximize entropy. Biometrika 81 457–469.
Chen, Y., Diaconis, P., Holmes, S. P. and Liu, J. S. (2005). Sequential Monte Carlo methods for statistical analysis of tables. J. Amer. Statist. Assoc. 100 109–120.
Doob, J. L. (1957). Conditional Brownian motion and the boundary limits of harmonic functions. Bull. Soc. Math. France 85 431–458.
Glynn, P. W. and Iglehart, D. L. (1989). Importance sampling for stochastic simulations. Management Sci. 35 1367–1392.
Greenhill, C., McKay, B. D. and Wang, X. (2006). Asymptotic enumeration of sparse 0–1 matrices with irregular row and column sums. J. Combin. Theory Ser. A 113 291–324.
Jerrum, M. (2003). Counting, Sampling and Integrating: Algorithms and Complexity. Birkhäuser, Basel.
Juneja, S. and Shahabuddin, P. (2006). Rare event simulation techniques: An introduction and recent advances. In Handbook on Simulation (S. Henderson and B. Nelson, eds.) 291–350. Elsevier, Amsterdam.
Kannan, R., Tetali, P. and Vempala, S. (1997). Simple Markov-chain algorithms for generating bipartite graphs and tournaments (extended abstract). In Proceedings of the Eighth Annual ACM-SIAM Symposium on Discrete Algorithms (New Orleans, LA, 1997) 193–200. ACM, New York.
Kim, J. H. and Vu, V. H. (2003). Generating random regular graphs. In Proceedings of the Thirty-Fifth Annual ACM Symposium on Theory of Computing 213–222. ACM, New York.
L’Ecuyer, P., Blanchet, J., Glynn, P. and Tuffin, B. (2008). Efficient rare-event simulation for the maximum of heavy-tailed random walks. Ann. Appl. Probab. 18 1351–1378.
Liu, J. S. (2001). Monte Carlo Strategies in Scientific Computing. Springer, New York.
McKay, B. D. (1984). Asymptotics for 0–1 matrices with prescribed line sums. In Enumeration and Design (Waterloo, Ont., 1982) 225–238. Academic Press, Toronto, ON.
Meyn, S. P. and Tweedie, R. L. (1993). Markov Chains and Stochastic Stability. Springer, London.
Mitzenmacher, M. and Upfal, E. (2005). Probability and Computing: Randomized Algorithms and Probabilistic Analysis. Cambridge Univ. Press, Cambridge.
Rubinstein, R. Y. (2007). How many needles are in a hay stack or how to solve fast #P-complete counting problems. Methodol. Comput. Appl. Probab. 11 5–49.
Sinclair, A. (1993). Algorithms for Random Generation and Counting. Birkhäuser Boston, Boston, MA.
Valiant, L. G. (1979). The complexity of computing the permanent. Theoret. Comput. Sci. 8 189–201.