The Annals of Statistics

Slice sampling

Radford M. Neal

Full-text: Open access

Abstract

Markov chain sampling methods that adapt to characteristics of the distribution being sampled can be constructed using the principle that one can sample from a distribution by sampling uniformly from the region under the plot of its density function. A Markov chain that converges to this uniform distribution can be constructed by alternating uniform sampling in the vertical direction with uniform sampling from the horizontal "slice" defined by the current vertical position, or more generally, with some update that leaves the uniform distribution over this slice invariant. Such "slice sampling" methods are easily implemented for univariate distributions, and can be used to sample from a multivariate distribution by updating each variable in turn. This approach is often easier to implement than Gibbs sampling and more efficient than simple Metropolis updates, due to the ability of slice sampling to adaptively choose the magnitude of changes made. It is therefore attractive for routine and automated use. Slice sampling methods that update all variables simultaneously are also possible. These methods can adaptively choose the magnitudes of changes made to each variable, based on the local properties of the density function. More ambitiously, such methods could potentially adapt to the dependencies between variables by constructing local quadratic approximations. Another approach is to improve sampling efficiency by suppressing random walks. This can be done for univariate slice sampling by "overrelaxation," and for multivariate slice sampling by "reflection" from the edges of the slice.
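
As a concrete illustration of the single-variable procedure summarized above, the following is a minimal Python sketch of one update using the stepping-out and shrinkage procedures described in the paper. The function name and the default settings w and m are illustrative rather than prescriptive, and the log of the (possibly unnormalized) density is used for numerical stability.

    import math
    import random

    def slice_sample_step(x0, log_f, w=1.0, m=50):
        # One single-variable slice-sampling update: stepping out, then shrinkage.
        #   x0    : current point
        #   log_f : log of a (possibly unnormalized) density
        #   w     : estimate of the typical slice width
        #   m     : limit on the total number of stepping-out expansions
        # Draw the auxiliary height defining the slice {x : f(x) > y}.
        log_y = log_f(x0) - random.expovariate(1.0)   # log of Uniform(0, f(x0))

        # Step out: place an interval of width w at random around x0, then expand
        # each end until it lies outside the slice or the expansion budget is used.
        u = random.random()
        left, right = x0 - w * u, x0 + w * (1.0 - u)
        j = math.floor(m * random.random())
        k = (m - 1) - j
        while j > 0 and log_f(left) > log_y:
            left -= w
            j -= 1
        while k > 0 and log_f(right) > log_y:
            right += w
            k -= 1

        # Shrinkage: sample uniformly from [left, right], shrinking the interval
        # toward x0 whenever the candidate falls outside the slice.
        while True:
            x1 = left + random.random() * (right - left)
            if log_f(x1) > log_y:
                return x1
            if x1 < x0:
                left = x1
            else:
                right = x1

Repeatedly applying this update with, for example, log_f = lambda x: -0.5 * x * x produces (correlated) draws from a standard normal density.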

Article information

Source
Ann. Statist. Volume 31, Number 3 (2003), 705-767.

Dates
First available in Project Euclid: 25 June 2003

Permanent link to this document
http://projecteuclid.org/euclid.aos/1056562461

Digital Object Identifier
doi:10.1214/aos/1056562461

Mathematical Reviews number (MathSciNet)
MR1994729

Zentralblatt MATH identifier
02002230

Subjects
Primary: 65C60: Computational problems in statistics; 65C05: Monte Carlo methods

Keywords
Markov chain Monte Carlo; auxiliary variables; adaptive methods; Gibbs sampling; Metropolis algorithm; overrelaxation; dynamical methods

Citation

Neal, Radford M. Slice sampling. The Annals of Statistics 31 (2003), no. 3, 705--767. doi:10.1214/aos/1056562461. http://projecteuclid.org/euclid.aos/1056562461.



References

  • ADLER, S. L. (1981). Over-relaxation method for the Monte Carlo evaluation of the partition function for multiquadratic actions. Phys. Rev. D 23 2901-2904.
  • BARONE, P. and FRIGESSI, A. (1990). Improving stochastic relaxation for Gaussian random fields. Probab. Engrg. Inform. Sci. 4 369-389.
  • BESAG, J. and GREEN, P. J. (1993). Spatial statistics and Bayesian computation (with discussion). J. Roy. Statist. Soc. Ser. B 55 25-37, 53-102.
  • CHEN, M.-H. and SCHMEISER, B. W. (1998). Toward black-box sampling: A random-direction interior-point Markov chain approach. J. Comput. Graph. Statist. 7 1-22.
  • DAMIEN, P., WAKEFIELD, J. C. and WALKER, S. G. (1999). Gibbs sampling for Bayesian nonconjugate and hierarchical models by using auxiliary variables. J. R. Stat. Soc. Ser. B Stat. Methodol. 61 331-344.
  • DIACONIS, P., HOLMES, S. and NEAL, R. M. (2000). Analysis of a non-reversible Markov chain sampler. Ann. Appl. Probab. 10 726-752.
  • DOWNS, O. B., MACKAY, D. J. C. and LEE, D. D. (2000). The nonnegative Boltzmann machine. In Advances in Neural Information Processing Systems 12 (S. A. Solla, T. K. Leen and K.-R. Muller, eds.) 428-434. MIT Press, Cambridge, MA.
  • DUANE, S., KENNEDY, A. D., PENDLETON, B. J. and ROWETH, D. (1987). Hybrid Monte Carlo. Phys. Lett. B 195 216-222.
  • EDWARDS, R. G. and SOKAL, A. D. (1988). Generalization of the Fortuin-Kasteleyn-Swendsen-Wang representation and Monte Carlo algorithm. Phys. Rev. D 38 2009-2012.
  • FREY, B. J. (1997). Continuous sigmoidal belief networks trained using slice sampling. In Advances in Neural Information Processing Systems (M. C. Mozer, M. I. Jordan and T. Petsche, eds.). MIT Press, Cambridge, MA.
  • GELFAND, A. E. and SMITH, A. F. M. (1990). Sampling-based approaches to calculating marginal densities. J. Amer. Statist. Assoc. 85 398-409.
  • GEYER, C. J. and THOMPSON, E. A. (1995). Annealing Markov chain Monte Carlo with applications to ancestral inference. J. Amer. Statist. Assoc. 90 909-920.
  • GILKS, W. R. (1992). Derivative-free adaptive rejection sampling for Gibbs sampling. In Bayesian Statistics (J. M. Bernardo, J. O. Berger, A. P. Dawid and A. F. M. Smith, eds.) 641-649. Oxford Univ. Press.
  • GILKS, W. R., BEST, N. G. and TAN, K. K. C. (1995). Adaptive rejection Metropolis sampling within Gibbs sampling. Appl. Statist. 44 455-472.
  • GILKS, W. R., NEAL, R. M., BEST, N. G. and TAN, K. K. C. (1997). Corrigendum: Adaptive rejection Metropolis sampling. Appl. Statist. 46 541-542.
  • GILKS, W. R. and WILD, P. (1992). Adaptive rejection sampling for Gibbs sampling. Appl. Statist. 41 337-348.
  • GREEN, P. J. and HAN, X. (1992). Metropolis methods, Gaussian proposals and antithetic variables. In Stochastic Models, Statistical Methods, and Algorithms in Image Analysis (P. Barone et al., eds.). Lecture Notes in Statist. 74 142-164. Springer, New York.
  • GREEN, P. J. and MIRA, A. (2001). Delayed rejection in reversible jump Metropolis-Hastings. Biometrika 88 1035-1053.
  • HASTINGS, W. K. (1970). Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57 97-109.
  • HIGDON, D. M. (1996). Auxiliary variable methods for Markov chain Monte Carlo with applications. ISDS discussion paper 96-17, Duke Univ.
  • HOROWITZ, A. M. (1991). A generalized guided Monte Carlo algorithm. Phys. Lett. B 268 247-252.
  • LUNN, D. J., THOMAS, A., BEST, N. and SPIEGELHALTER, D. (2000). WinBUGS - a Bayesian modelling framework: Concepts, structure, and extensibility. Statist. Comput. 10 325-337.
  • METROPOLIS, N., ROSENBLUTH, A. W., ROSENBLUTH, M. N., TELLER, A. H. and TELLER, E. (1953). Equation of state calculations by fast computing machines. J. Chem. Phys. 21 1087-1092.
  • MIRA, A. (1998). Ordering, slicing and splitting Monte Carlo Markov chains. Ph.D. dissertation, School of Statistics, Univ. Minnesota.
  • MIRA, A. and TIERNEY, L. (2002). Efficiency and convergence properties of slice samplers. Scand. J. Statist. 29 1-12.
  • NEAL, R. M. (1994). An improved acceptance procedure for the hybrid Monte Carlo algorithm. J. Comput. Phys. 111 194-203.
  • NEAL, R. M. (1996). Bayesian Learning for Neural Networks. Lecture Notes in Statist. 118. Springer, New York.
  • NEAL, R. M. (1998). Suppressing random walks in Markov chain Monte Carlo using ordered overrelaxation. In Learning in Graphical Models (M. I. Jordan, ed.) 205-228. Kluwer, Dordrecht.
  • NEAL, R. M. (2001). Annealed importance sampling. Statist. Comput. 11 125-139.
  • ROBERTS, G. O. and ROSENTHAL, J. S. (1999). Convergence of slice sampler Markov chains. J. R. Stat. Soc. Ser. B Stat. Methodol. 61 643-660.
  • THOMAS, A., SPIEGELHALTER, D. J. and GILKS, W. R. (1992). BUGS: A program to perform Bayesian inference using Gibbs sampling. In Bayesian Statistics (J. M. Bernardo, J. O. Berger, A. P. Dawid and A. F. M. Smith, eds.) 837-842. Oxford Univ. Press.
  • TIERNEY, L. and MIRA, A. (1999). Some adaptive Monte Carlo methods for Bayesian inference. Statistics in Medicine 18 2507-2515.

and Schmeiser (1998). For single-variable slice sampling, the variation of slice sampling proposed by Neal operates analogously to Gibbs sampling in the sense that to obtain the next point x1, y is generated from the conditional distribution [y|x0] given the current point x0, and then x1 is drawn from [x|y]. Both [y|x0] and [x|y] are uniform distributions. Since the support of [x|y] is not available in closed form, sampling directly from [x|y] is not possible. A clever development is Neal's sophisticated (but relatively expensive) sampling procedure to generate x1 from the "slice" S = {x : y < f(x)}. In Chen and Schmeiser (1998), we proposed random-direction interior point (RDIP), a general sampler designed to be "black box" in the sense that the user need not tune the sampler to the problem. RDIP samples from the uniform distribution defined over the region U below the surface defined by f(x). Both slice sampling and RDIP require evaluations of f(x). Slice sampling, however, can be more expensive than RDIP because slice sampling requires evaluating f(x) more than once per iteration. The intention of RDIP's design is to use as much free information as possible. For the high-dimensional case, the hyperrectangle idea in slice sampling could be inefficient. For example, suppose f(x) is the bivariate normal density with a high correlation. Then the hyperrectangle idea essentially mimics the Gibbs sampler, which suffers slow convergence; see Chen and Schmeiser (1993) for a detailed discussion. Aligning the hyperrectangle (or ellipses) to the shape of f(x), along the lines of Kaufman and Smith (1998), seems like a good idea. As Neal mentions, the computational efficiency of our "black-box" sampler RDIP depends on the normalization constant. Our goal was to be automatic and reasonably efficient, rather than to tune the sampler to the problem. If, however,

Chen (1996).
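
For concreteness, here is a minimal Python sketch of the kind of axis-aligned hyperrectangle update referred to above: a vertical level is drawn as in the single-variable case, a hyperrectangle is randomly positioned around the current point, and it is shrunk toward the current point after each rejected draw. The correlation value and widths in the example are hypothetical; the bivariate normal with high correlation is the case mentioned in the discussion.

    import random

    def hyperrect_slice_step(x0, log_f, w):
        # One multivariate slice update using an axis-aligned hyperrectangle
        # that is shrunk toward the current point on rejection.
        #   x0    : current point (list of floats)
        #   log_f : log of an unnormalized density on R^d
        #   w     : per-coordinate widths of the initial hyperrectangle
        d = len(x0)
        log_y = log_f(x0) - random.expovariate(1.0)    # slice level

        # Randomly position the hyperrectangle so that it contains x0.
        lower = [x0[i] - w[i] * random.random() for i in range(d)]
        upper = [lower[i] + w[i] for i in range(d)]

        while True:
            x1 = [lower[i] + random.random() * (upper[i] - lower[i]) for i in range(d)]
            if log_f(x1) > log_y:
                return x1
            for i in range(d):                         # shrink toward x0
                if x1[i] < x0[i]:
                    lower[i] = x1[i]
                else:
                    upper[i] = x1[i]

    # Hypothetical example: a bivariate normal with correlation 0.95.
    rho = 0.95
    def log_f(x):
        return -0.5 * (x[0] ** 2 - 2 * rho * x[0] * x[1] + x[1] ** 2) / (1 - rho ** 2)

    x = [0.0, 0.0]
    for _ in range(1000):
        x = hyperrect_slice_step(x, log_f, w=[1.0, 1.0])

Because the hyperrectangle is axis-aligned, moves along the narrow diagonal ridge of a highly correlated density are small, which is the Gibbs-like behaviour the discussion points to.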
  • CHEN, M.-H. and SCHMEISER, B. W. (1993). Performance of the Gibbs, hit-and-run, and Metropolis samplers. J. Comput. Graph. Statist. 2 251-272.
  • CHEN, M.-H. and SCHMEISER, B. W. (1998). Toward black-box sampling: A random-direction interior-point Markov chain approach. J. Comput. Graph. Statist. 7 1-22.
  • NANDRAM, B. and CHEN, M.-H. (1996). Reparameterizing the generalized linear model to accelerate Gibbs sampler convergence. J. Statist. Comput. Simulation 54 129-144.
  • KAUFMAN, D. E. and SMITH, R. L. (1998). Direction choice for accelerated convergence in hit-and-run sampling. Oper. Res. 46 84-95.
  • DOWNS, O. B. (2001). High-temperature expansions for learning models of nonnegative data. In Advances in Neural Information Processing Systems 13 (T. K. Leen, T. G. Dietterich and V. Tresp, eds.) 465-471. MIT Press, Cambridge, MA.
  • DOWNS, O. B., MACKAY, D. J. C. and LEE, D. D. (2000). The nonnegative Boltzmann machine. In Advances in Neural Information Processing Systems 12 (S. A. Solla, T. K. Leen and K.-R. Muller, eds.) 428-434. MIT Press, Cambridge, MA.
  • HINTON, G. E. and SEJNOWSKI, T. J. (1983). Optimal perceptual inference. In IEEE Conference on Computer Vision and Pattern Recognition 448-453. Washington.
  • KAPPEN, H. J. and RODRIGUEZ, F. B. (1998). Efficient learning in Boltzmann machines using linear response theory. Neural Computation 10 1137-1156.
  • LEE, D. D. and SEUNG, H. S. (1999). Learning the parts of objects by nonnegative matrix factorization. Nature 401 788-791.
  • MACKAY, D. J. C. (1998). Introduction to Monte Carlo methods. In Learning in Graphical Models (M. I. Jordan, ed.) 175-204. Kluwer, Dordrecht.
  • NEAL, R. M. (1997). Markov chain Monte Carlo methods based on "slicing" the density function. Technical Report 9722, Dept. Statistics, Univ. Toronto.

Rosenthal (1999). Generalizing just a little from the setting described in Section 4 of the paper, suppose that our target density can be written as f(x) ∝ f0(x) f1(x).

the methodology of Roberts and Tweedie (2000). As an illustration of these results, it can be shown that, for the case where f0 is constant (i.e., in the single-variable slice sampler) and f1 is a real-valued log-concave function, 525 iterations suffice for convergence from all starting points x with f1(x) ≥ 0.01 sup_y f1(y). Similar results can be deduced for multidimensional log-concave distributions, but the bounds worsen as dimension increases, reflecting a genuine curse of dimensionality in this problem (despite the fact that this is inherently a two-dimensional Gibbs sampler). To counteract this issue, Roberts and Rosenthal (2002) introduce the polar slice sampler in d dimensions, where f0(x) is chosen
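
The generalized update being discussed can be sketched as follows (a hedged reconstruction, assuming the factorization f(x) ∝ f0(x) f1(x) given above, in the spirit of Section 4 of the paper): each iteration alternates a vertical draw against the factor f1 with a draw from f0 restricted to the resulting slice,

    \[
      y \mid x_0 \sim \mathrm{Uniform}\bigl(0,\, f_1(x_0)\bigr),
      \qquad
      x_1 \mid y \sim \frac{f_0(x)\,\mathbf{1}\{f_1(x) > y\}}
                           {\int f_0(z)\,\mathbf{1}\{f_1(z) > y\}\,dz},
    \]

so the single-variable slice sampler corresponds to taking f0 constant, and other choices of the factorization give other samplers.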

can be constructed in the spirit of Kendall and Møller (2000). Reversibility of the slice sampler allows easy simulation of these processes backwards in time to identify the starting point of the maximal and minimal chains. The beauty of the perfect slice sampling construction relies on the possibility of coupling the maximal and minimal chains even on a continuous state space. This is achieved because, thanks to monotonicity, the minimal horizontal slice (i.e., the one defined by the minimal chain) is always a superset of the maximal horizontal slice. If, when sampling over the minimal horizontal slice, a point is selected that belongs to the intersection of the minimal and the maximal horizontal slices, instantaneous coupling happens. Examples of applications given in Mira, Møller and Roberts (2001) include the Ising model on a two-dimensional grid at the critical temperature and various other automodels. In Casella, Mengersen, Robert and Titterington (2002) a further application of the perfect slice sampler construction to mixtures of distributions is studied.

to Fill (1998). A further improvement of the algorithm in Mira, Møller and Roberts (2001) allows the use of read-once random numbers as introduced in Wilson (2000).

  • CASELLA, G., MENGERSEN, K. L., ROBERT, C. P. and TITTERINGTON, D. M. (2002). Perfect samplers for mixtures of distributions. J. R. Stat. Soc. Ser. B Stat. Methodol. 64 777-790.
  • FILL, J. A. (1998). An interruptible algorithm for perfect sampling via Markov chains. Ann. Appl. Probab. 8 131-162.
  • GREEN, P. J. (1995). Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika 82 711-732.
  • GREEN, P. J. and MIRA, A. (2001). Delayed rejection in reversible jump Metropolis-Hastings. Biometrika 88 1035-1053.
  • KENDALL, W. S. and MØLLER, J. (2000). Perfect simulation using dominating processes on ordered spaces, with application to locally stable point processes. Adv. in Appl. Probab. 32 844-865.
  • MIRA, A. (1998). Ordering, slicing and splitting Monte Carlo Markov chains. Ph.D. dissertation, School of Statistics, Univ. Minnesota.
  • MIRA, A. (2002). On Metropolis-Hastings algorithms with delayed rejection. Metron 59 231-241.
  • MIRA, A., MØLLER, J. and ROBERTS, G. O. (2001). Perfect slice samplers. J. R. Stat. Soc. Ser. B Stat. Methodol. 63 593-606.
  • MIRA, A. and TIERNEY, L. (2002). Efficiency and convergence properties of slice samplers. Scand. J. Statist. 29 1-12.
  • PESKUN, P. H. (1973). Optimum Monte Carlo sampling using Markov chains. Biometrika 60 607-612.
  • PROPP, J. and WILSON, D. B. (1996). Exact sampling with coupled Markov chains and applications to statistical mechanics. Random Structures Algorithms 9 223-252.
  • ROBERTS, G. O. and ROSENTHAL, J. S. (1999). Convergence of slice sampler Markov chains. J. R. Stat. Soc. Ser. B Stat. Methodol. 61 643-660.
  • ROBERTS, G. O. and ROSENTHAL, J. S. (2001). Markov chains and deinitialising processes. Scand. J. Statist. 28 489-505.
  • ROBERTS, G. O. and ROSENTHAL, J. S. (2002). The polar slice sampler. Stoch. Models 18 257-280.
  • ROBERTS, G. O. and TWEEDIE, R. L. (2000). Rates of convergence of stochastically monotone and continuous time Markov models. J. Appl. Probab. 37 359-373.
  • TIERNEY, L. and MIRA, A. (1999). Some adaptive Monte Carlo methods for Bayesian inference. Statistics in Medicine 18 2507-2515.
  • WILSON, D. B. (2000). How to couple from the past using a read-once source of randomness. Random Structures Algorithms 16 85-113.

which appear in Damien, Wakefield and Walker (1999). I took Neal's example in Section 8 and used a many-variable slice sampler on it. In fact I took a latent variable for each data point, the idea criticized by Neal. The overall joint density is given by
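
The joint density itself is not reproduced in this excerpt. As a loose, purely hypothetical illustration of the one-latent-variable-per-observation idea described above (an invented toy model with unnormalized Gaussian factors and a flat prior, not the model of Neal's Section 8 example), a Gibbs sampler of this form might look like:

    import math
    import random

    # Toy sketch: one auxiliary ("latent") variable u_i per data point, with
    # factors f_i(theta) = exp(-(theta - y_i)^2 / 2) and a flat prior on theta.
    # Gibbs sampling alternates u_i | theta and theta | u.

    y = [0.3, -0.9, 1.2, 0.5]        # hypothetical data values

    def gibbs_step(theta):
        # u_i | theta  ~  Uniform(0, f_i(theta))
        u = [random.random() * math.exp(-0.5 * (theta - yi) ** 2) for yi in y]
        # theta | u is uniform on the intersection of the intervals
        # {theta : f_i(theta) > u_i} = (y_i - r_i, y_i + r_i), r_i = sqrt(-2 log u_i).
        lo = max(yi - math.sqrt(-2.0 * math.log(ui)) for yi, ui in zip(y, u))
        hi = min(yi + math.sqrt(-2.0 * math.log(ui)) for yi, ui in zip(y, u))
        return lo + random.random() * (hi - lo)

    theta = 0.0
    for _ in range(1000):
        theta = gibbs_step(theta)

The potential inefficiency Neal points to is visible in this toy case: with many data points the intersection of the permitted intervals is narrow, so theta can move only a small amount per iteration.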
  • DAMIEN, P., WAKEFIELD, J. C. and WALKER, S. G. (1999). Gibbs sampling for Bayesian nonconjugate and hierarchical models by using auxiliary variables. J. R. Stat. Soc. Ser. B Stat. Methodol. 61 331-344.
  • ROBERTS, G. O. and ROSENTHAL, J. S. (1999). Convergence of slice sampler Markov chains. J. R. Statist. Soc. Ser. B 61 643-660.
  • CARACCIOLO, S., PELISSETTO, A. and SOKAL, A. D. (1994). A general limitation on Monte Carlo algorithms of Metropolis type. Phys. Rev. Lett. 72 179-182.
  • CHEN, M.-H. and SCHMEISER, B. W. (1998). Toward black-box sampling: A random-direction interior-point Markov chain approach. J. Comput. Graph. Statist. 7 1-22.
  • DAMIEN, P., WAKEFIELD, J. C. and WALKER, S. G. (1999). Gibbs sampling for Bayesian nonconjugate and hierarchical models by using auxiliary variables. J. R. Stat. Soc. Ser. B Stat. Methodol. 61 331-344.
  • EDWARDS, R. G. and SOKAL, A. D. (1988). Generalization of the Fortuin-Kasteleyn-Swendsen-Wang representation and Monte Carlo algorithm. Phys. Rev. D 38 2009-2012.
  • GREEN, P. J. and MIRA, A. (2001). Delayed rejection in reversible jump Metropolis-Hastings. Biometrika 88 1035-1053.
  • MIRA, A. (1998). Ordering, slicing and splitting Monte Carlo Markov chains. Ph.D. dissertation, School of Statistics, Univ. Minnesota.
  • NEAL, R. M. (1997). Markov chain Monte Carlo methods based on "slicing" the density function. Technical Report 9722, Dept. Statistics, Univ. Toronto.
  • ROBERTS, G. O. and ROSENTHAL, J. S. (2002). The polar slice sampler. Stoch. Models 18 257-280.
  • TIERNEY, L. and MIRA, A. (1999). Some adaptive Monte Carlo methods for Bayesian inference. Statistics in Medicine 18 2507-2515.

See also

  • Includes: Ming-Hui Chen, Bruce W. Schmeiser. Discussion.
  • Includes: Oliver B. Downs. Discussion.
  • Includes: Antonietta Mira, Gareth O. Roberts. Discussion.
  • Includes: John Skilling, David J. C. MacKay. Discussion.
  • Includes: S. G. Walker. Discussion.
  • Includes: Radford M. Neal. Rejoinder.