The Annals of Statistics

Slice sampling

Radford M. Neal
Source: Ann. Statist. Volume 31, Number 3 (2003), 705-767.

Abstract

Markov chain sampling methods that adapt to characteristics of the distribution being sampled can be constructed using the principle that one can sample from a distribution by sampling uniformly from the region under the plot of its density function. A Markov chain that converges to this uniform distribution can be constructed by alternating uniform sampling in the vertical direction with uniform sampling from the horizontal "slice" defined by the current vertical position, or more generally, with some update that leaves the uniform distribution over this slice invariant. Such "slice sampling" methods are easily implemented for univariate distributions, and can be used to sample from a multivariate distribution by updating each variable in turn. This approach is often easier to implement than Gibbs sampling and more efficient than simple Metropolis updates, due to the ability of slice sampling to adaptively choose the magnitude of changes made. It is therefore attractive for routine and automated use. Slice sampling methods that update all variables simultaneously are also possible. These methods can adaptively choose the magnitudes of changes made to each variable, based on the local properties of the density function. More ambitiously, such methods could potentially adapt to the dependencies between variables by constructing local quadratic approximations. Another approach is to improve sampling efficiency by suppressing random walks. This can be done for univariate slice sampling by "overrelaxation," and for multivariate slice sampling by "reflection" from the edges of the slice.
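The alternation described in the abstract can be illustrated with a minimal sketch of a univariate slice sampler. It assumes the commonly used "stepping-out" and "shrinkage" procedures for the horizontal update; the function name `slice_sample` and the parameters `log_f`, `w` and `max_steps` are illustrative choices, not names taken from the paper, and the code works with the log of an unnormalized density for numerical stability.

```python
import math
import random


def slice_sample(log_f, x0, n_samples, w=1.0, max_steps=50, rng=None):
    """Sketch of univariate slice sampling (stepping-out plus shrinkage).

    log_f     -- log of the (possibly unnormalized) target density
    x0        -- starting point, which must satisfy log_f(x0) > -inf
    w         -- initial interval width (an illustrative tuning parameter)
    max_steps -- cap on the number of stepping-out expansions
    """
    rng = rng or random.Random()
    x = x0
    samples = []
    for _ in range(n_samples):
        # Vertical update: draw y uniformly from (0, f(x)], done in log space.
        log_y = log_f(x) + math.log(1.0 - rng.random())

        def in_slice(z):
            # Closed slice {z : f(z) >= y}; it differs from the open slice
            # only on a set of measure zero and always contains the current x.
            return log_f(z) >= log_y

        # Stepping out: place an interval of width w at random around x,
        # then expand each end until it leaves the slice or the cap is hit.
        left = x - w * rng.random()
        right = left + w
        j = int(max_steps * rng.random())
        k = max_steps - 1 - j
        while j > 0 and in_slice(left):
            left -= w
            j -= 1
        while k > 0 and in_slice(right):
            right += w
            k -= 1

        # Shrinkage: draw uniformly from the interval; a rejected draw
        # becomes the new end point on its side of the current x.
        while True:
            x_new = left + (right - left) * rng.random()
            if in_slice(x_new):
                x = x_new
                break
            if x_new < x:
                left = x_new
            else:
                right = x_new
        samples.append(x)
    return samples
```

For example, `slice_sample(lambda x: -0.5 * x * x, 0.0, 1000)` draws from a standard normal distribution, since only the log-density up to an additive constant is needed. The interval width adapts to the slice on each iteration, which is the adaptive behaviour the abstract contrasts with fixed-scale Metropolis updates.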

Primary Subjects: 65C60, 65C05
Full-text: Open access
Links and Identifiers

Permanent link to this document: http://projecteuclid.org/euclid.aos/1056562461
Digital Object Identifier: doi:10.1214/aos/1056562461
Mathematical Reviews number (MathSciNet): MR1994729
Zentralblatt MATH identifier: 02002230

References

ADLER, S. L. (1981). Over-relaxation method for the Monte Carlo evaluation of the partition function for multiquadratic actions. Phys. Rev. D 23 2901-2904.
BARONE, P. and FRIGESSI, A. (1990). Improving stochastic relaxation for Gaussian random fields. Probab. Engrg. Inform. Sci. 4 369-389.
BESAG, J. and GREEN, P. J. (1993). Spatial statistics and Bayesian computation (with discussion). J. Roy. Statist. Soc. Ser. B 55 25-37, 53-102.
Mathematical Reviews (MathSciNet): MR1210422
CHEN, M.-H. and SCHMEISER, B. W. (1998). Toward black-box sampling: A random-direction interior-point Markov chain approach. J. Comput. Graph. Statist. 7 1-22.
Mathematical Reviews (MathSciNet): MR1628255
Digital Object Identifier: doi:10.2307/1390766
DAMIEN, P., WAKEFIELD, J. C. and WALKER, S. G. (1999). Gibbs sampling for Bayesian nonconjugate and hierarchical models by using auxiliary variables. J. R. Stat. Soc. Ser. B Stat. Methodol. 61 331-344.
Mathematical Reviews (MathSciNet): MR1680334
Zentralblatt MATH: 0913.62028
Digital Object Identifier: doi:10.1111/1467-9868.00179
DIACONIS, P., HOLMES, S. and NEAL, R. M. (2000). Analysis of a non-reversible Markov chain sampler. Ann. Appl. Probab. 10 726-752.
Mathematical Reviews (MathSciNet): MR2001i:60114
Zentralblatt MATH: 01906286
Digital Object Identifier: doi:10.1214/aoap/1019487508
Project Euclid: euclid.aoap/1019487508
DOWNS, O. B., MACKAY, D. J. C. and LEE, D. D. (2000). The nonnegative Boltzmann machine. In Advances in Neural Information Processing Systems (S. A. Solla, T. K. Leen and K.-R. Muller, eds.) 428-434. MIT Press, Cambridge, MA.
DUANE, S., KENNEDY, A. D., PENDLETON, B. J. and ROWETH, D. (1987). Hybrid Monte Carlo. Phys. Lett. B 195 216-222.
EDWARDS, R. G. and SOKAL, A. D. (1988). Generalization of the Fortuin-Kasteleyn-Swendsen-Wang representation and Monte Carlo algorithm. Phys. Rev. D 38 2009-2012.
Mathematical Reviews (MathSciNet): MR965465
Digital Object Identifier: doi:10.1103/PhysRevD.38.2009
FREY, B. J. (1997). Continuous sigmoidal belief networks trained using slice sampling. In Advances in Neural Information Processing Systems (M. C. Mozer, M. I. Jordan and T. Petsche, eds.). MIT Press, Cambridge, MA.
GELFAND, A. E. and SMITH, A. F. M. (1990). Sampling-based approaches to calculating marginal densities. J. Amer. Statist. Assoc. 85 398-409.
Zentralblatt MATH: 0702.62020
Mathematical Reviews (MathSciNet): MR1141740
Digital Object Identifier: doi:10.2307/2289776
GEYER, C. J. and THOMPSON, E. A. (1995). Annealing Markov chain Monte Carlo with applications to ancestral inference. J. Amer. Statist. Assoc. 90 909-920.
GILKS, W. R. (1992). Derivative-free adaptive rejection sampling for Gibbs sampling. In Bayesian Statistics (J. M. Bernardo, J. O. Berger, A. P. Dawid and A. F. M. Smith, eds.) 641-649. Oxford Univ. Press.
Mathematical Reviews (MathSciNet): MR1380266
GILKS, W. R., BEST, N. G. and TAN, K. K. C. (1995). Adaptive rejection Metropolis sampling within Gibbs sampling. Appl. Statist. 44 455-472.
Zentralblatt MATH: 0893.62110
GILKS, W. R., NEAL, R. M., BEST, N. G. and TAN, K. K. C. (1997). Corrigendum: Adaptive rejection Metropolis sampling. Appl. Statist. 46 541-542.
GILKS, W. R. and WILD, P. (1992). Adaptive rejection sampling for Gibbs sampling. Appl. Statist. 41 337-348.
Zentralblatt MATH: 0825.62407
GREEN, P. J. and HAN, X. (1992). Metropolis methods, Gaussian proposals and antithetic variables. Stochastic Models, Statistical Methods, and Algorithms in Image Analysis (P. Barone et al., eds.). Lecture Notes in Statist. 74 142-164. Springer, New York.
Mathematical Reviews (MathSciNet): MR1188477
GREEN, P. J. and MIRA, A. (2001). Delayed rejection in reversible jump Metropolis-Hastings. Biometrika 88 1035-1053.
Mathematical Reviews (MathSciNet): MR1872218
Zentralblatt MATH: 1099.60508
HASTINGS, W. K. (1970). Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57 97-109.
Zentralblatt MATH: 0219.65008
HIGDON, D. M. (1996). Auxiliary variable methods for Markov chain Monte Carlo with applications. ISDS discussion paper 96-17, Duke Univ.
HOROWITZ, A. M. (1991). A generalized guided Monte Carlo algorithm. Phys. Lett. B 268 247-252.
LUNN, D. J., THOMAS, A., BEST, N. and SPIEGELHALTER, D. (2000). WinBUGS - a Bayesian modelling framework: Concepts, structure, and extensibility. Statist. Comput. 10 325-337.
METROPOLIS, N., ROSENBLUTH, A. W., ROSENBLUTH, M. N., TELLER, A. H. and TELLER, E. (1953). Equation of state calculations by fast computing machines. J. Chem. Phys. 21 1087-1092.
MIRA, A. (1998). Ordering, slicing and splitting Monte Carlo Markov chains. Ph.D. dissertation, School of Statistics, Univ. Minnesota.
MIRA, A. and TIERNEY, L. (2002). Efficiency and convergence properties of slice samplers. Scand. J. Statist. 29 1-12.
Mathematical Reviews (MathSciNet): MR2003b:62154
Digital Object Identifier: doi:10.1111/1467-9469.00267
NEAL, R. M. (1994). An improved acceptance procedure for the hybrid Monte Carlo algorithm. J. Comput. Phys. 111 194-203.
Mathematical Reviews (MathSciNet): MR94k:65010
Zentralblatt MATH: 0797.65115
Digital Object Identifier: doi:10.1006/jcph.1994.1054
NEAL, R. M. (1996). Bayesian Learning for Neural Networks. Lecture Notes in Statist. 118. Springer, New York.
NEAL, R. M. (1998). Suppressing random walks in Markov chain Monte Carlo using ordered overrelaxation. In Learning in Graphical Models (M. I. Jordan, ed.) 205-228. Kluwer, Dordrecht.
NEAL, R. M. (2001). Annealed importance sampling. Statist. Comput. 11 125-139.
Mathematical Reviews (MathSciNet): MR1837132
Digital Object Identifier: doi:10.1023/A:1008923215028
ROBERTS, G. O. and ROSENTHAL, J. S. (1999). Convergence of slice sampler Markov chains. J. R. Stat. Soc. Ser. B Stat. Methodol. 61 643-660.
Zentralblatt MATH: 0929.62098
Mathematical Reviews (MathSciNet): MR1707866
Digital Object Identifier: doi:10.1111/1467-9868.00198
THOMAS, A., SPIEGELHALTER, D. J. and GILKS, W. R. (1992). BUGS: A program to perform Bayesian inference using Gibbs sampling. In Bayesian Statistics (J. M. Bernardo, J. O. Berger, A. P. Dawid and A. F. M. Smith, eds.) 837-842. Oxford Univ. Press.
TIERNEY, L. and MIRA, A. (1999). Some adaptive Monte Carlo methods for Bayesian inference. Statistics in Medicine 18 2507-2515.
TORONTO, ONTARIO CANADA M5S 3G3 E-MAIL: radford@stat.utoronto.ca WEB: http://www.cs.utoronto.ca/~radford/
and Schmeiser (1998). For single-variable slice sampling, the variation of slice sampling proposed by Neal operates analogously to Gibbs sampling in the sense that to obtain the next point x1, y is generated from the conditional distribution [y|x0] given the current point x0 and then x1 is drawn from [x|y]. Both [y|x0] and [x|y] are uniform distributions. Since the closed form of the support of [x|y] is not available, sampling directly from [x|y] is not possible. A clever development is Neal's sophisticated (but relatively expensive) sampling procedure to generate x1 from the "slice" S = {x : y < f(x)}. In Chen and Schmeiser (1998), we proposed random-direction interior point (RDIP), a general sampler designed to be "black box" in the sense that the user need not tune the sampler to the problem. RDIP samples from the uniform distribution defined over the region U below the curve of the surface defined by f(x). Both slice sampling and RDIP require evaluations of f(x). Slice sampling, however, can be more expensive than RDIP because slice sampling requires evaluating f(x) more than once per iteration. The intention of RDIP's design is to use as much free information as possible. For the high-dimensional case, the hyperrectangle idea in slice sampling could be inefficient. For example, suppose f(x) is the bivariate normal density with a high correlation. Then, the hyperrectangle idea essentially mimics the Gibbs sampler, which suffers slow convergence; see Chen and Schmeiser (1993) for a detailed discussion. Aligning the hyperrectangle (or ellipses) to the shape of f(x), along the lines of Kaufman and Smith (1998), seems like a good idea. As Neal mentions, the computational efficiency of our "black-box" sampler RDIP depends on the normalization constant. Our goal was to be automatic and reasonably efficient, rather than to tune the sampler to the problem. If, however,
Chen (1996).
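The Gibbs-like alternation described in this discussion can be written compactly. The following is a restatement in LaTeX, assuming f is the (possibly unnormalized) target density, x_0 the current point, and U the region under the density as in the text; the notation for the uniform distribution is an illustrative choice.

```latex
% Joint distribution: uniform over the region under the density
(x, y) \sim \mathrm{Uniform}(U), \qquad U = \{(x, y) : 0 < y < f(x)\}

% Vertical update, then horizontal update over the slice S
y \mid x_0 \sim \mathrm{Uniform}\bigl(0, f(x_0)\bigr), \qquad
x_1 \mid y \sim \mathrm{Uniform}(S), \qquad S = \{x : y < f(x)\}

% Marginal of x under the joint uniform distribution
p(x) = \frac{f(x)}{\int f(x')\,dx'}
```

Because the marginal of x under the uniform distribution on U is proportional to f(x), discarding y from the sampled pairs leaves draws from the target.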
CHEN, M.-H. and SCHMEISER, B. W. (1993). Performance of the Gibbs, hit-and-run, and Metropolis samplers. J. Comput. Graph. Statist. 2 251-272.
Mathematical Reviews (MathSciNet): MR1272394
Digital Object Identifier: doi:10.2307/1390645
CHEN, M.-H. and SCHMEISER, B. W. (1998). Toward black-box sampling: A random-direction interior-point Markov chain approach. J. Comput. Graph. Statist. 7 1-22.
Mathematical Reviews (MathSciNet): MR1628255
Digital Object Identifier: doi:10.2307/1390766
NANDRAM, B. and CHEN, M.-H. (1996). Reparameterizing the generalized linear model to accelerate Gibbs sampler convergence. J. Statist. Comput. Simulation 54 129-144.
Mathematical Reviews (MathSciNet): MR2000b:62142
Zentralblatt MATH: 0925.62309
Digital Object Identifier: doi:10.1080/00949659608811724
KAUFMAN, D. E. and SMITH, R. L. (1998). Direction choice for accelerated convergence in hit-and-run sampling. Oper. Res. 46 84-95.
Zentralblatt MATH: 1009.62597
STORRS, CONNECTICUT 06269-4120 E-MAIL: mhchen@stat.uconn.edu SCHOOL OF INDUSTRIAL ENGINEERING PURDUE UNIVERSITY 1287 GRISSOM HALL
WEST LAFAYETTE, INDIANA 47907-1287 E-MAIL: bruce@purdue.edu
DOWNS, O. B. (2001). High-temperature expansions for learning models of nonnegative data. In Advances in Neural Information Processing Systems 13 (T. K. Leen, T. G. Dietterich and V. Tresp, eds.) 465-471. MIT Press, Cambridge, MA.
DOWNS, O. B., MACKAY, D. J. C. and LEE, D. D. (2000). The nonnegative Boltzmann machine. In Advances in Neural Information Processing Systems 12 (S. A. Solla, T. K. Leen and K.-R. Muller, eds.) 428-434. MIT Press, Cambridge, MA.
HINTON, G. E. and SEJNOWSKI, T. J. (1983). Optimal perceptual learning. In IEEE Conference on Computer Vision and Pattern Recognition 448-453. Washington.
KAPPEN, H. J. and RODRIGUEZ, F. B. (1998). Efficient learning in Boltzmann machines using linear response theory. Neural Computation 10 1137-1156.
LEE, D. D. and SEUNG, H. S. (1999). Learning the parts of objects by nonnegative matrix factorization. Nature 401 788-791.
MACKAY, D. J. C. (1998). Introduction to Monte Carlo methods. In Learning in Graphical Models (M. I. Jordan, ed.) 175-204. Kluwer, Dordrecht.
NEAL, R. M. (1997). Markov chain Monte Carlo methods based on "slicing" the density function. Technical Report 9722, Dept. Statistics, Univ. Toronto.
REDMOND, WASHINGTON 98052 E-MAIL: t-odowns@microsoft.com
Rosenthal (1999). Generalizing just a little from the setting described in Section 4 of the paper, suppose that our target density can be written as
the methodology of Roberts and Tweedie (2000). As an illustration of these results, it can be shown that, for the case where f0 is constant (i.e., in the single-variable slice sampler), and f1 is a real-valued log-concave function, 525 iterations suffice for convergence from all starting points x with f1(x) ≥ 0.01 sup_y f1(y). Similar results can be deduced for multidimensional log-concave distributions, but the bounds worsen as dimension increases, reflecting a genuine curse of dimensionality in this problem (despite the fact that this is inherently a two-dimensional Gibbs sampler). To counteract this issue, Roberts and Rosenthal (2002) introduce the polar slice sampler in d dimensions, where f0(x) is chosen
can be constructed in the spirit of Kendall and Møller (2000). Reversibility of the slice sampler allows easy simulation of these processes backwards in time to identify the starting point of the maximal and minimal chains. The beauty of the perfect slice sampling construction relies on the possibility of coupling the maximal and minimal chains even on a continuous state space. This is achieved because, thanks to monotonicity, the minimal horizontal slice (i.e., the one defined by the minimal chain) is always a superset of the maximal horizontal slice. If, when sampling over the minimal horizontal slice, a point is selected that belongs to the intersection of the minimal and the maximal horizontal slices, instantaneous coupling happens. Examples of applications given in Mira, Møller and Roberts (2001) include the Ising model on a two-dimensional grid at the critical temperature and various other automodels. In Casella, Mengersen, Robert and Titterington (2002) a further application of the perfect slice sampler construction to mixtures of distributions is studied.
to Fill (1998). A further improvement of the algorithm in Mira, Møller and Roberts (2001) allows the use of read-once random numbers as introduced in Wilson (2000).
CASELLA, G., MENGERSEN, K. L., ROBERT, C. P. and TITTERINGTON, D. M. (2002). Perfect samplers for mixtures of distributions. J. R. Stat. Soc. Ser. B Stat. Methodol. 64 777-790.
Mathematical Reviews (MathSciNet): MR1979386
Zentralblatt MATH: 1067.62028
Digital Object Identifier: doi:10.1111/1467-9868.00360
FILL, J. A. (1998). An interruptible algorithm for perfect sampling via Markov chains. Ann. Appl. Probab. 8 131-162.
Mathematical Reviews (MathSciNet): MR99g:60113
Zentralblatt MATH: 0939.60084
Digital Object Identifier: doi:10.1214/aoap/1027961037
Project Euclid: euclid.aoap/1027961037
GREEN, P. J. (1995). Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika 82 711-732.
Zentralblatt MATH: 0861.62023
Mathematical Reviews (MathSciNet): MR1380810
Digital Object Identifier: doi:10.1093/biomet/82.4.711
GREEN, P. J. and MIRA, A. (2001). Delayed rejection in reversible jump Metropolis-Hastings. Biometrika 88 1035-1053.
Mathematical Reviews (MathSciNet): MR1872218
Zentralblatt MATH: 1099.60508
KENDALL, W. S. and MØLLER, J. (2000). Perfect simulation using dominating processes on ordered spaces, with application to locally stable point processes. Adv. in Appl. Probab. 32 844-865.
Mathematical Reviews (MathSciNet): MR2001h:62176
Zentralblatt MATH: 1123.60309
Digital Object Identifier: doi:10.1239/aap/1013540247
Project Euclid: euclid.aap/1013540247
MIRA, A. (1998). Ordering, slicing and splitting Monte Carlo Markov chains. Ph.D. dissertation, School of Statistics, Univ. of Minnesota.
MIRA, A. (2002). On Metropolis-Hastings algorithms with delayed rejection. Metron 59 231-241.
Mathematical Reviews (MathSciNet): MR1889712
MIRA, A., MØLLER, J. and ROBERTS, G. O. (2001). Perfect slice samplers. J. R. Stat. Soc. Ser. B Stat. Methodol. 63 593-606.
Mathematical Reviews (MathSciNet): MR2002j:62011
Zentralblatt MATH: 0993.65015
Digital Object Identifier: doi:10.1111/1467-9868.00301
MIRA, A. and TIERNEY, L. (2002). Efficiency and convergence properties of slice samplers. Scand. J. Statist. 29 1-12.
Mathematical Reviews (MathSciNet): MR2003b:62154
Digital Object Identifier: doi:10.1111/1467-9469.00267
PESKUN, P. H. (1973). Optimum Monte Carlo sampling using Markov chains. Biometrika 60 607-612.
Mathematical Reviews (MathSciNet): MR50:15261
Zentralblatt MATH: 0271.62041
Digital Object Identifier: doi:10.1093/biomet/60.3.607
PROPP, J. and WILSON, D. B. (1996). Exact sampling with coupled Markov chains and applications to statistical mechanics. Random Structures Algorithms 9 223-252.
Mathematical Reviews (MathSciNet): MR99k:60176
Zentralblatt MATH: 0859.60067
ROBERTS, G. O. and ROSENTHAL, J. S. (1999). Convergence of slice sampler Markov chains. J. R. Stat. Soc. Ser. B Stat. Methodol. 61 643-660.
Zentralblatt MATH: 0929.62098
Mathematical Reviews (MathSciNet): MR1707866
Digital Object Identifier: doi:10.1111/1467-9868.00198
ROBERTS, G. O. and ROSENTHAL, J. S. (2001). Markov chains and deinitialising processes. Scand. J. Statist. 28 489-505.
Mathematical Reviews (MathSciNet): MR1858413
Digital Object Identifier: doi:10.1111/1467-9469.00250
ROBERTS, G. O. and ROSENTHAL, J. S. (2002). The polar slice sampler. Stoch. Models 18 257-280.
Zentralblatt MATH: 1006.65004
Mathematical Reviews (MathSciNet): MR1904830
Digital Object Identifier: doi:10.1081/STM-120004467
ROBERTS, G. O. and TWEEDIE, R. L. (2000). Rates of convergence of stochastically monotone and continuous time Markov models. J. Appl. Probab. 37 359-373.
Zentralblatt MATH: 0979.60060
Mathematical Reviews (MathSciNet): MR1780996
Digital Object Identifier: doi:10.1239/jap/1014842542
Project Euclid: euclid.jap/1014842542
TIERNEY, L. and MIRA, A. (1999). Some adaptive Monte Carlo methods for Bayesian inference. Statistics in Medicine 18 2507-2515.
WILSON, D. B. (2000). How to couple from the past using a read-once source of randomness. Random Structures Algorithms 16 85-113.
Zentralblatt MATH: 0952.60072
Mathematical Reviews (MathSciNet): MR1728354
which appear in Damien, Wakefield and Walker (1999). I took Neal's example in Section 8 and used a many-variable slice sampler on it. In fact I took a latent variable for each data point, the idea criticized by Neal. The overall joint density is given by
DAMIEN, P., WAKEFIELD, J. C. and WALKER, S. G. (1999). Gibbs sampling for Bayesian nonconjugate and hierarchical models by using auxiliary variables. J. R. Stat. Soc. Ser. B Stat. Methodol. 61 331-344.
Mathematical Reviews (MathSciNet): MR1680334
Zentralblatt MATH: 0913.62028
Digital Object Identifier: doi:10.1111/1467-9868.00179
ROBERTS, G. O. and ROSENTHAL, J. S. (1999). Convergence of slice sampler Markov chains. J. R. Statist. Soc. Ser. B 61 643-660.
Zentralblatt MATH: 0929.62098
Mathematical Reviews (MathSciNet): MR1707866
Digital Object Identifier: doi:10.1111/1467-9868.00198
BATH, BA2 7AY UNITED KINGDOM E-MAIL: S.G.Walker@bath.ac.uk
CARACCIOLO, S., PELISSETTO, A. and SOKAL, A. D. (1994). A general limitation on Monte Carlo algorithms of Metropolis type. Phys. Rev. Lett. 72 179-182.
Mathematical Reviews (MathSciNet): MR94i:65008
Zentralblatt MATH: 0973.65500
Digital Object Identifier: doi:10.1103/PhysRevLett.72.179
CHEN, M.-H. and SCHMEISER, B. W. (1998). Toward black-box sampling: A random-direction interior-point Markov chain approach. J. Comput. Graph. Statist. 7 1-22.
Mathematical Reviews (MathSciNet): MR1628255
Digital Object Identifier: doi:10.2307/1390766
DAMIEN, P., WAKEFIELD, J. C. and WALKER, S. G. (1999). Gibbs sampling for Bayesian nonconjugate and hierarchical models by using auxiliary variables. J. R. Stat. Soc. Ser. B Stat. Methodol. 61 331-344.
Mathematical Reviews (MathSciNet): MR1680334
Zentralblatt MATH: 0913.62028
Digital Object Identifier: doi:10.1111/1467-9868.00179
EDWARDS, R. G. and SOKAL, A. D. (1988). Generalization of the Fortuin-Kasteleyn-Swendsen-Wang representation and Monte Carlo algorithm. Phys. Rev. D 38 2009-2012.
Mathematical Reviews (MathSciNet): MR965465
Digital Object Identifier: doi:10.1103/PhysRevD.38.2009
GREEN, P. J. and MIRA, A. (2001). Delayed rejection in reversible jump Metropolis-Hastings. Biometrika 88 1035-1053.
Mathematical Reviews (MathSciNet): MR1872218
Zentralblatt MATH: 1099.60508
MIRA, A. (1998). Ordering, slicing and splitting Monte Carlo Markov chains. Ph.D. dissertation, School of Statistics, Univ. Minnesota.
NEAL, R. M. (1997). Markov chain Monte Carlo methods based on "slicing" the density function. Technical Report 9722, Dept. Statistics, Univ. Toronto.
ROBERTS, G. O. and ROSENTHAL, J. S. (2002). The polar slice sampler. Stoch. Models 18 257-280.
Zentralblatt MATH: 1006.65004
Mathematical Reviews (MathSciNet): MR1904830
Digital Object Identifier: doi:10.1081/STM-120004467
TIERNEY, L. and MIRA, A. (1999). Some adaptive Monte Carlo methods for Bayesian inference. Statistics in Medicine 18 2507-2515.
TORONTO, ONTARIO CANADA M5S 3G3 E-MAIL: radford@stat.utoronto.ca

2013 © Institute of Mathematical Statistics
