Includes: Ming-Hui Chen, Bruce W. Schmeiser. Discussion.
Includes: Oliver B. Downs. Discussion.
Includes: Antonietta Mira, Gareth O. Roberts. Discussion.
Includes: John Skilling, David J. C. MacKay. Discussion.
Includes: S. G. Walker. Discussion.
Includes: Radford M. Neal. Rejoinder.
References
ADLER, S. L. (1981). Over-relaxation method for the Monte Carlo evaluation of the partition function for multiquadratic actions. Phys. Rev. D 23 2901-2904.
BARONE, P. and FRIGESSI, A. (1990). Improving stochastic relaxation for Gaussian random fields. Probab. Engrg. Inform. Sci. 4 369-389.
BESAG, J. and GREEN, P. J. (1993). Spatial statistics and Bayesian computation (with discussion). J. Roy. Statist. Soc. Ser. B 55 25-37, 53-102.
CHEN, M.-H. and SCHMEISER, B. W. (1998). Toward black-box sampling: A random-direction interior-point Markov chain approach. J. Comput. Graph. Statist. 7 1-22.
DAMIEN, P., WAKEFIELD, J. C. and WALKER, S. G. (1999). Gibbs sampling for Bayesian nonconjugate and hierarchical models by using auxiliary variables. J. R. Stat. Soc. Ser. B Stat. Methodol. 61 331-344.
DIACONIS, P., HOLMES, S. and NEAL, R. M. (2000). Analysis of a non-reversible Markov chain sampler. Ann. Appl. Probab. 10 726-752.
DOWNS, O. B., MACKAY, D. J. C. and LEE, D. D. (2000). The nonnegative Boltzmann machine. In Advances in Neural Information Processing Systems (S. A. Solla, T. K. Leen and K.-R. Müller, eds.) 428-434. MIT Press, Cambridge, MA.
DUANE, S., KENNEDY, A. D., PENDLETON, B. J. and ROWETH, D. (1987). Hybrid Monte Carlo. Phys. Lett. B 195 216-222.
EDWARDS, R. G. and SOKAL, A. D. (1988). Generalization of the Fortuin-Kasteleyn-Swendsen-Wang representation and Monte Carlo algorithm. Phys. Rev. D 38 2009-2012.
FREY, B. J. (1997). Continuous sigmoidal belief networks trained using slice sampling. In Advances in Neural Information Processing Systems (M. C. Mozer, M. I. Jordan and T. Petsche, eds.). MIT Press, Cambridge, MA.
GELFAND, A. E. and SMITH, A. F. M. (1990). Sampling-based approaches to calculating marginal densities. J. Amer. Statist. Assoc. 85 398-409.
GEYER, C. J. and THOMPSON, E. A. (1995). Annealing Markov chain Monte Carlo with applications to ancestral inference. J. Amer. Statist. Assoc. 90 909-920.
GILKS, W. R. (1992). Derivative-free adaptive rejection sampling for Gibbs sampling. In Bayesian Statistics (J. M. Bernardo, J. O. Berger, A. P. Dawid and A. F. M. Smith, eds.) 641-649. Oxford Univ. Press.
GILKS, W. R., BEST, N. G. and TAN, K. K. C. (1995). Adaptive rejection Metropolis sampling within Gibbs sampling. Appl. Statist. 44 455-472.
GILKS, W. R., NEAL, R. M., BEST, N. G. and TAN, K. K. C. (1997). Corrigendum: Adaptive rejection Metropolis sampling. Appl. Statist. 46 541-542.
GILKS, W. R. and WILD, P. (1992). Adaptive rejection sampling for Gibbs sampling. Appl. Statist. 41 337-348.
GREEN, P. J. and HAN, X. (1992). Metropolis methods, Gaussian proposals and antithetic variables. Stochastic Models, Statistical Methods, and Algorithms in Image Analysis (P. Barone et al., eds.). Lecture Notes in Statist. 74 142-164. Springer, New York.
GREEN, P. J. and MIRA, A. (2001). Delayed rejection in reversible jump Metropolis-Hastings. Biometrika 88 1035-1053.
HASTINGS, W. K. (1970). Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57 97-109.
HIGDON, D. M. (1996). Auxiliary variable methods for Markov chain Monte Carlo with applications. ISDS discussion paper 96-17, Duke Univ.
HOROWITZ, A. M. (1991). A generalized guided Monte Carlo algorithm. Phys. Lett. B 268 247-252.
LUNN, D. J., THOMAS, A., BEST, N. and SPIEGELHALTER, D. (2000). WinBUGS - a Bayesian modelling framework: Concepts, structure, and extensibility. Statist. Comput. 10 325-337.
METROPOLIS, N., ROSENBLUTH, A. W., ROSENBLUTH, M. N., TELLER, A. H. and TELLER, E. (1953). Equation of state calculations by fast computing machines. J. Chem. Phys. 21 1087-1092.
MIRA, A. (1998). Ordering, slicing and splitting Monte Carlo Markov chains. Ph.D. dissertation, School of Statistics, Univ. Minnesota.
MIRA, A. and TIERNEY, L. (2002). Efficiency and convergence properties of slice samplers. Scand. J. Statist. 29 1-12.
NEAL, R. M. (1994). An improved acceptance procedure for the hybrid Monte Carlo algorithm. J. Comput. Phys. 111 194-203.
NEAL, R. M. (1996). Bayesian Learning for Neural Networks. Lecture Notes in Statist. 118. Springer, New York.
NEAL, R. M. (1998). Suppressing random walks in Markov chain Monte Carlo using ordered overrelaxation. In Learning in Graphical Models (M. I. Jordan, ed.) 205-228. Kluwer, Dordrecht.
NEAL, R. M. (2001). Annealed importance sampling. Statist. Comput. 11 125-139.
ROBERTS, G. O. and ROSENTHAL, J. S. (1999). Convergence of slice sampler Markov chains. J. R. Stat. Soc. Ser. B Stat. Methodol. 61 643-660.
THOMAS, A., SPIEGELHALTER, D. J. and GILKS, W. R. (1992). BUGS: A program to perform Bayesian inference using Gibbs sampling. In Bayesian Statistics (J. M. Bernardo, J. O. Berger, A. P. Dawid and A. F. M. Smith, eds.) 837-842. Oxford Univ. Press.
TIERNEY, L. and MIRA, A. (1999). Some adaptive Monte Carlo methods for Bayesian inference. Statistics in Medicine 18 2507-2515.
TORONTO, ONTARIO CANADA M5S 3G3 E-MAIL: radford@stat.utoronto.ca WEB: http://www.cs.utoronto.ca/ radford/
and Schmeiser (1998). For single-variable slice sampling, the variation of slice sampling proposed by Neal operates analogously to Gibbs sampling in the sense that to obtain the next point x1, y is generated from the conditional distribution [y|x0] given the current point x0 and then x1 is drawn from [x|y]. Both [y|x0] and [x|y] are uniform distributions. Since the closed form of the support of [x|y] is not available, sampling directly from [x|y] is not possible. A clever development is Neal's sophisticated (but relatively expensive) sampling procedure to generate x1 from the "slice" S = {x : y < f(x)}. In Chen and Schmeiser (1998), we proposed random-direction interior point (RDIP), a general sampler designed to be "black box" in the sense that the user need not tune the sampler to the problem. RDIP samples from the uniform distribution defined over the region U below the curve of the surface defined by f(x). Both slice sampling and RDIP require evaluations of f(x). Slice sampling, however, can be more expensive than RDIP because slice sampling requires evaluating f(x) more than once per iteration. The intention of RDIP's design is to use as much free information as possible. For the high-dimensional case, the hyperrectangle idea in slice sampling could be inefficient. For example, suppose f(x) is the bivariate normal density with a high correlation. Then, the hyperrectangle idea essentially mimics the Gibbs sampler, which suffers slow convergence; see Chen and Schmeiser (1993) for a detailed discussion. Aligning the hyperrectangle (or ellipses) to the shape of f(x), along the lines of Kaufman and Smith (1998), seems like a good idea. As Neal mentions, the computational efficiency of our "black-box" sampler RDIP depends on the normalization constant. Our goal was to be automatic and reasonably efficient, rather than to tune the sampler to the problem. If, however,
CHEN, M.-H. and SCHMEISER, B. W. (1993). Performance of the Gibbs, hit-and-run, and Metropolis samplers. J. Comput. Graph. Statist. 2 251-272.
CHEN, M.-H. and SCHMEISER, B. W. (1998). Toward black-box sampling: A random-direction interior-point Markov chain approach. J. Comput. Graph. Statist. 7 1-22.
NANDRAM, B. and CHEN, M.-H. (1996). Reparameterizing the generalized linear model to accelerate Gibbs sampler convergence. J. Statist. Comput. Simulation 54 129-144.
KAUFMAN, D. E. and SMITH, R. L. (1998). Direction choice for accelerated convergence in hit-and-run sampling. Oper. Res. 46 84-95.
STORRS, CONNECTICUT 06269-4120 E-MAIL: mhchen@stat.uconn.edu SCHOOL OF INDUSTRIAL ENGINEERING PURDUE UNIVERSITY 1287 GRISSOM HALL
WEST LAFAYETTE, INDIANA 47907-1287 E-MAIL: bruce@purdue.edu
DOWNS, O. B. (2001). High-temperature expansions for learning models of nonnegative data. In Advances in Neural Information Processing Systems 13 (T. K. Leen, T. G. Dietterich and V. Tresp, eds.) 465-471. MIT Press, Cambridge, MA.
DOWNS, O. B., MACKAY, D. J. C. and LEE, D. D. (2000). The nonnegative Boltzmann machine. In Advances in Neural Information Processing Systems 12 (S. A. Solla, T. K. Leen and K.-R. Müller, eds.) 428-434. MIT Press, Cambridge, MA.
HINTON, G. E. and SEJNOWSKI, T. J. (1983). Optimal perceptual learning. In IEEE Conference on Computer Vision and Pattern Recognition 448-453. Washington.
KAPPEN, H. J. and RODRIGUEZ, F. B. (1998). Efficient learning in Boltzmann machines using linear response theory. Neural Computation 10 1137-1156.
LEE, D. D. and SEUNG, H. S. (1999). Learning the parts of objects by nonnegative matrix factorization. Nature 401 788-791.
MACKAY, D. J. C. (1998). Introduction to Monte Carlo methods. In Learning in Graphical Models (M. I. Jordan, ed.) 175-204. Kluwer, Dordrecht.
NEAL, R. M. (1997). Markov chain Monte Carlo methods based on "slicing" the density function. Technical Report 9722, Dept. Statistics, Univ. Toronto.
REDMOND, WASHINGTON 98052 E-MAIL: t-odowns@microsoft.com
Rosenthal (1999). Generalizing just a little from the setting described in Section 4 of the paper, suppose that our target density can be written as
the methodology of Roberts and Tweedie (2000). As an illustration of these results, it can be shown that, for the case where f0 is constant (i.e., in the single-variable slice sampler), and f1 is a real-valued log-concave function, 525 iterations suffice for convergence from all starting points x with f1(x) ≥ 0.01 sup_y f1(y). Similar results can be deduced for multidimensional log-concave distributions, but the bounds worsen as dimension increases, reflecting a genuine curse of dimensionality in this problem (despite the fact that this is inherently a two-dimensional Gibbs sampler). To counteract this issue, Roberts and Rosenthal (2002) introduce the polar slice sampler in d dimensions, where f0(x) is chosen
can be constructed in the spirit of Kendall and Møller (2000). Reversibility of the slice sampler allows easy simulation of these processes backwards in time to identify the starting point of the maximal and minimal chains. The beauty of the perfect slice sampling construction relies on the possibility of coupling the maximal and minimal chains even on a continuous state space. This is achieved because, thanks to monotonicity, the minimal horizontal slice (i.e., the one defined by the minimal chain) is always a superset of the maximal horizontal slice. If, when sampling over the minimal horizontal slice, a point is selected that belongs to the intersection of the minimal and the maximal horizontal slices, instantaneous coupling happens. Examples of applications given in Mira, Møller and Roberts (2001) include the Ising model on a two-dimensional grid at the critical temperature and various other automodels. In Casella, Mengersen, Robert and Titterington (2002) a further application of the perfect slice sampler construction to mixtures of distributions is studied.
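The coupling step described above can be illustrated on a toy problem. The following is a minimal sketch under stated assumptions, not the algorithm of Mira, Møller and Roberts (2001): it assumes a hypothetical target f(x) = exp(-3x) on [0, 1], for which every horizontal slice is an interval [0, b), and it runs only forward coupled updates rather than a full coupling-from-the-past scheme. The names f, slice_upper, coupled_update and steps_to_couple are illustrative.

```python
import math
import random

RATE = 3.0  # assumed toy target: f(x) = exp(-RATE * x) on [0, 1]


def f(x):
    """Unnormalized toy density, decreasing on [0, 1]."""
    return math.exp(-RATE * x)


def slice_upper(y):
    """Right end b of the horizontal slice {x in [0, 1] : f(x) > y} = [0, b)."""
    return min(1.0, -math.log(y) / RATE)


def coupled_update(x_min, x_max, rng):
    """One coupled slice-sampler step for the minimal and maximal chains.

    Requires f(x_min) <= f(x_max). Both chains share the same vertical
    uniform u, so the minimal chain's horizontal slice contains the
    maximal chain's horizontal slice.
    """
    u = 1.0 - rng.random()             # shared uniform on (0, 1]
    b_min = slice_upper(u * f(x_min))  # larger slice [0, b_min)
    b_max = slice_upper(u * f(x_max))  # smaller slice [0, b_max)
    new_min = rng.random() * b_min     # uniform draw on the minimal slice
    if new_min < b_max:
        # The draw lies in both slices: the chains couple immediately.
        new_max = new_min
    else:
        # Otherwise the maximal chain draws from its own (smaller) slice.
        new_max = rng.random() * b_max
    return new_min, new_max


def steps_to_couple(seed, max_steps=1000):
    """Run both chains from the extreme states until they meet."""
    rng = random.Random(seed)
    x_min, x_max = 1.0, 0.0  # lowest- and highest-density states
    for step in range(1, max_steps + 1):
        x_min, x_max = coupled_update(x_min, x_max, rng)
        if x_min == x_max:
            return step, x_min
    return None, None
```

Calling steps_to_couple(0) returns the number of updates until the two chains meet, together with the common state. In this sketch a draw that misses the smaller slice necessarily has lower density than any point of that slice, so the ordering of the two chains is preserved at every step.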
to Fill (1998). A further improvement of the algorithm in Mira, Møller and Roberts (2001) allows the use of read-once random numbers as introduced in
CASELLA, G., MENGERSEN, K. L., ROBERT, C. P. and TITTERINGTON, D. M. (2002). Perfect samplers for mixtures of distributions. J. R. Stat. Soc. Ser. B Stat. Methodol. 64 777-790.
FILL, J. A. (1998). An interruptible algorithm for perfect sampling via Markov chains. Ann. Appl. Probab. 8 131-162.
GREEN, P. J. (1995). Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika 82 711-732.
GREEN, P. J. and MIRA, A. (2001). Delayed rejection in reversible jump Metropolis-Hastings. Biometrika 88 1035-1053.
KENDALL, W. S. and MØLLER, J. (2000). Perfect simulation using dominating processes on ordered spaces, with application to locally stable point processes. Adv. in Appl. Probab. 32 844-865.
MIRA, A. (1998). Ordering, slicing and splitting Monte Carlo Markov chains. Ph.D. dissertation, School of Statistics, Univ. Minnesota.
MIRA, A. (2002). On Metropolis-Hastings algorithms with delayed rejection. Metron 59 231-241.
MIRA, A., MØLLER, J. and ROBERTS, G. O. (2001). Perfect slice samplers. J. R. Stat. Soc. Ser. B Stat. Methodol. 63 593-606.
MIRA, A. and TIERNEY, L. (2002). Efficiency and convergence properties of slice samplers. Scand. J. Statist. 29 1-12.
PESKUN, P. H. (1973). Optimum Monte Carlo sampling using Markov chains. Biometrika 60 607-612.
PROPP, J. and WILSON, D. B. (1996). Exact sampling with coupled Markov chains and applications to statistical mechanics. Random Structures Algorithms 9 223-252.
ROBERTS, G. O. and ROSENTHAL, J. S. (1999). Convergence of slice sampler Markov chains. J. R. Stat. Soc. Ser. B Stat. Methodol. 61 643-660.
ROBERTS, G. O. and ROSENTHAL, J. S. (2001). Markov chains and deinitialising processes. Scand. J. Statist. 28 489-505.
ROBERTS, G. O. and ROSENTHAL, J. S. (2002). The polar slice sampler. Stoch. Models 18 257-280.
ROBERTS, G. O. and TWEEDIE, R. L. (2000). Rates of convergence of stochastically monotone and continuous time Markov models. J. Appl. Probab. 37 359-373.
TIERNEY, L. and MIRA, A. (1999). Some adaptive Monte Carlo methods for Bayesian inference. Statistics in Medicine 18 2507-2515.
WILSON, D. B. (2000). How to couple from the past using a read-once source of randomness. Random Structures Algorithms 16 85-113.
which appear in Damien, Wakefield and Walker (1999). I took Neal's example in Section 8 and used a many-variable slice sampler on it. In fact I took a latent variable for each data point, the idea criticized by Neal. The overall joint density is given by
DAMIEN, P., WAKEFIELD, J. C. and WALKER, S. G. (1999). Gibbs sampling for Bayesian nonconjugate and hierarchical models by using auxiliary variables. J. R. Stat. Soc. Ser. B Stat. Methodol. 61 331-344.
ROBERTS, G. O. and ROSENTHAL, J. S. (1999). Convergence of slice sampler Markov chains. J. R. Stat. Soc. Ser. B Stat. Methodol. 61 643-660.
BATH, BA2 7AY UNITED KINGDOM E-MAIL: S.G.Walker@bath.ac.uk
CARACCIOLO, S., PELISSETTO, A. and SOKAL, A. D. (1994). A general limitation on Monte Carlo algorithms of Metropolis type. Phys. Rev. Lett. 72 179-182.
CHEN, M.-H. and SCHMEISER, B. W. (1998). Toward black-box sampling: A random-direction interior-point Markov chain approach. J. Comput. Graph. Statist. 7 1-22.
DAMIEN, P., WAKEFIELD, J. C. and WALKER, S. G. (1999). Gibbs sampling for Bayesian nonconjugate and hierarchical models by using auxiliary variables. J. R. Stat. Soc. Ser. B Stat. Methodol. 61 331-344.
EDWARDS, R. G. and SOKAL, A. D. (1988). Generalization of the Fortuin-Kasteleyn-Swendsen-Wang representation and Monte Carlo algorithm. Phys. Rev. D 38 2009-2012.
GREEN, P. J. and MIRA, A. (2001). Delayed rejection in reversible jump Metropolis-Hastings. Biometrika 88 1035-1053.
MIRA, A. (1998). Ordering, slicing and splitting Monte Carlo Markov chains. Ph.D. dissertation, School of Statistics, Univ. Minnesota.
NEAL, R. M. (1997). Markov chain Monte Carlo methods based on "slicing" the density function. Technical Report 9722, Dept. Statistics, Univ. Toronto.
ROBERTS, G. O. and ROSENTHAL, J. S. (2002). The polar slice sampler. Stoch. Models 18 257-280.
TIERNEY, L. and MIRA, A. (1999). Some adaptive Monte Carlo methods for Bayesian inference. Statistics in Medicine 18 2507-2515.
TORONTO, ONTARIO CANADA M5S 3G3 E-MAIL: radford@stat.utoronto.ca