Statistical Science

A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data

Christian Robert and George Casella


We attempt to trace the history and development of Markov chain Monte Carlo (MCMC) from its early inception in the late 1940s through its use today. We see how the earlier stages of Monte Carlo (MC, not MCMC) research have led to the algorithms currently in use. More importantly, we see how the development of this methodology has not only changed our solutions to problems, but has changed the way we think about problems.

Article information

Statist. Sci., Volume 26, Number 1 (2011), 102-115.

First available in Project Euclid: 9 June 2011

Keywords: Gibbs sampling; Metropolis–Hastings algorithm; hierarchical models; Bayesian methods


Robert, Christian; Casella, George. A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data. Statist. Sci. 26 (2011), no. 1, 102–115. doi:10.1214/10-STS351.
