Statistical Science

From EM to Data Augmentation: The Emergence of MCMC Bayesian Computation in the 1980s

Martin A. Tanner and Wing H. Wong

Abstract

It was known from Metropolis et al. [J. Chem. Phys. 21 (1953) 1087–1092] that one can sample from a distribution by performing Monte Carlo simulation from a Markov chain whose equilibrium distribution is equal to the target distribution. However, it took several decades before the statistical community embraced Markov chain Monte Carlo (MCMC) as a general computational tool in Bayesian inference. The reasons usually advanced to explain why statisticians were slow to catch on to the method include lack of computing power and unfamiliarity with the early dynamic Monte Carlo papers in the statistical physics literature. We argue that there was a deeper reason, namely, that the structure of problems in statistical mechanics differs from that of problems in the standard statistical literature. To make the methods usable in standard Bayesian problems, one had to exploit the power that comes from the introduction of judiciously chosen auxiliary variables and collective moves. This paper examines the development in the critical period 1980–1990, when the ideas of Markov chain simulation from the statistical physics literature and the latent variable formulation in maximum likelihood computation (i.e., the EM algorithm) came together to spark the widespread application of MCMC methods in Bayesian computation.

Article information

Source
Statist. Sci. Volume 25, Number 4 (2010), 506–516.

Dates
First available in Project Euclid: 14 March 2011

Permanent link to this document
http://projecteuclid.org/euclid.ss/1300108234

Digital Object Identifier
doi:10.1214/10-STS341

Mathematical Reviews number (MathSciNet)
MR2807767

Citation

Tanner, Martin A.; Wong, Wing H. From EM to Data Augmentation: The Emergence of MCMC Bayesian Computation in the 1980s. Statist. Sci. 25 (2010), no. 4, 506–516. doi:10.1214/10-STS341. http://projecteuclid.org/euclid.ss/1300108234.


References

  • Achcar, J. A., Bolfarine, H. and Pericchi, L. R. (1987). Transformation of survival data to an extreme value distribution. J. Roy. Statist. Soc. Ser. D Statistician 36 229–234.
  • Albert, J. (1988). Bayesian estimation of Poisson means using a hierarchical log-linear model. In Bayesian Statistics 3 (J. M. Bernardo, M. H. DeGroot, D. V. Lindley and A. F. M. Smith, eds.) 519–531. Oxford Univ. Press, Oxford.
  • Bernardo, J. M., DeGroot, M. H., Lindley, D. V. and Smith, A. F. M., eds. (1988). Bayesian Statistics 3. Oxford Univ. Press, Oxford.
  • Besag, J. (1986). On the statistical analysis of dirty pictures. J. R. Stat. Soc. Ser. B Stat. Methodol. 48 259–302.
  • Binder, K. (1978). Monte Carlo Methods in Statistical Physics. Springer, New York.
  • Dempster, A. P., Laird, N. M. and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm (with discussion). J. R. Stat. Soc. Ser. B Stat. Methodol. 39 1–38.
  • DuMouchel, W. (1988). A Bayesian model and a graphical elicitation procedure for multiple comparisons. In Bayesian Statistics 3 (J. M. Bernardo, M. H. DeGroot, D. V. Lindley and A. F. M. Smith, eds.) 127–146. Oxford Univ. Press, Oxford.
  • Efron, B. (1979). Bootstrap methods: Another look at the jackknife. Ann. Statist. 7 1–26.
  • Gelfand, A. E., Hills, S. E., Racine-Poon, A. and Smith, A. F. M. (1990). Illustration of Bayesian inference in normal data models using Gibbs sampling. J. Amer. Statist. Assoc. 85 972–985.
  • Gelfand, A. E. and Smith, A. F. M. (1990). Sampling-based approaches to calculating marginal densities. J. Amer. Statist. Assoc. 85 398–409.
  • Gelman, A. and King, G. (1990). Estimating the electoral consequences of legislative redistricting. J. Amer. Statist. Assoc. 85 274–282.
  • Geman, S. (1988a). Experiments in Bayesian image analysis. In Bayesian Statistics 3 (J. M. Bernardo, M. H. DeGroot, D. V. Lindley and A. F. M. Smith, eds.) 159–171. Oxford Univ. Press, Oxford.
  • Geman, S. (1988b). Stochastic relaxation methods for image restoration and expert systems. In Maximum Entropy and Bayesian Methods in Science and Engineering (Vol. 2) (G. J. Erickson and C. R. Smith, eds.). Kluwer, New York.
  • Geman, S. and Geman, D. (1984). Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images. IEEE Trans. Pattern Anal. Mach. Intell. 6 721–741.
  • Geweke, J. (1989). Bayesian inference in econometric models using Monte Carlo integration. Econometrica 57 1317–1339.
  • Geyer, C. J. (1991). Markov chain Monte Carlo maximum likelihood. In Computing Science and Statistics: Proceedings of the 23rd Symposium on the Interface (E. Keramidas, ed.) 156–163. Interface Foundation, Fairfax Station.
  • Geyer, C. J. (1995). Conditioning in Markov chain Monte Carlo. J. Comput. Graph. Statist. 4 148–154.
  • Goel, P. K. (1988). Software for Bayesian analysis: Current status and additional need. In Bayesian Statistics 3 (J. M. Bernardo, M. H. DeGroot, D. V. Lindley and A. F. M. Smith, eds.) 173–188. Oxford Univ. Press, Oxford.
  • Grieve, A. P. (1987). Applications of Bayesian software: Two examples. J. Roy. Statist. Soc. Ser. D Statistician 36 283–288.
  • Grieve, A. P. (1988). A Bayesian approach to the analysis of LD50 experiments. In Bayesian Statistics 3 (J. M. Bernardo, M. H. DeGroot, D. V. Lindley and A. F. M. Smith, eds.) 617–630. Oxford Univ. Press, Oxford.
  • Gubernatis, J. E., ed. (2003). The Monte Carlo Method in the Physical Sciences: Celebrating the 50th Anniversary of the Metropolis Algorithm. Amer. Inst. Phys., New York.
  • Hammersley, J. M. and Handscomb, D. C. (1964). Monte Carlo Methods, 2nd ed. Chapman and Hall, London.
  • Hastings, W. K. (1970). Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57 97–109.
  • Hitchcock, D. B. (2003). A history of the Metropolis–Hastings algorithm. Amer. Statist. 57 254–257.
  • Hukushima, K. and Nemoto, K. (1996). Exchange Monte Carlo method and application to spin glass simulations. J. Phys. Soc. Japan 65 1604–1608.
  • Karlin, S. and Taylor, H. M. (1975). A First Course in Stochastic Processes, 2nd ed. Academic Press, New York.
  • Kass, R. E. (1997). Review of “Markov chain Monte Carlo in practice.” J. Amer. Statist. Assoc. 92 1645–1646.
  • Kass, R. E., Tierney, L. and Kadane, J. B. (1988). Asymptotics in Bayesian computation. In Bayesian Statistics 3 (J. M. Bernardo, M. H. DeGroot, D. V. Lindley and A. F. M. Smith, eds.) 261–278. Oxford Univ. Press, Oxford.
  • Kim, C. E. and Schervish, M. J. (1988). Stochastic models of incarceration careers. In Bayesian Statistics 3 (J. M. Bernardo, M. H. DeGroot, D. V. Lindley and A. F. M. Smith, eds.) 279–305. Oxford Univ. Press, Oxford.
  • Kloek, T. and van Dijk, H. K. (1978). Bayesian estimates of equation system parameters: An application of integration by Monte Carlo. Econometrica 46 1–19.
  • Kloek, T. and van Dijk, H. K. (1980). Further experience in Bayesian analysis using Monte Carlo integration. J. Econometrics 14 307–328.
  • Li, K. H. (1988). Imputation using Markov chains. J. Statist. Comput. Simul. 30 57–79.
  • Liu, C., Rubin, D. B. and Wu, Y. N. (1998). Parameter expansion to accelerate EM: The PX-EM algorithm. Biometrika 85 755–770.
  • Liu, J. S., Wong, W. H. and Kong, A. (1994). Covariance structure of the Gibbs sampler with applications to the comparisons of estimators and augmentation schemes. Biometrika 81 27–40.
  • Liu, J. S. and Wu, Y. N. (1999). Parameter expansion scheme for data augmentation. J. Amer. Statist. Assoc. 94 1264–1274.
  • Lunn, D., Spiegelhalter, D. J., Thomas, A. and Best, N. (2009). The BUGS project: Evolution, critique and future directions. Stat. Med. 28 3049–3067.
  • Marriott, J. (1987). Bayesian numerical and graphical methods for Box–Jenkins time series. J. Roy. Statist. Soc. Ser. D Statistician 36 265–268.
  • Marriott, J. (1988). Reparametrization for Bayesian inference in ARMA time series. In Bayesian Statistics 3 (J. M. Bernardo, M. H. DeGroot, D. V. Lindley and A. F. M. Smith, eds.) 701–704. Oxford Univ. Press, Oxford.
  • Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H. and Teller, E. (1953). Equation of state calculations by fast computing machines. J. Chem. Phys. 21 1087–1092.
  • Morris, C. N. (1987). Comment on “The calculation of posterior distributions by data augmentation” by M. A. Tanner and W. H. Wong. J. Amer. Statist. Assoc. 82 542–543.
  • Morris, C. N. (1988). Approximating posterior distributions and posterior moments. In Bayesian Statistics 3 (J. M. Bernardo, M. H. DeGroot, D. V. Lindley and A. F. M. Smith, eds.) 327–344. Oxford Univ. Press, Oxford.
  • Naylor, J. C. (1987). Bayesian alternatives to t-tests. J. Roy. Statist. Soc. Ser. D Statistician 36 241–246.
  • Naylor, J. C. and Smith, A. F. M. (1982). Applications of a method for the efficient computation of posterior distributions. J. Roy. Statist. Soc. Ser. C Appl. Statist. 31 214–225.
  • O’Hagan, A. (1987). Monte Carlo is fundamentally unsound. J. Roy. Statist. Soc. Ser. D Statistician 36 247–249.
  • Pearl, J. (1987). Evidential reasoning using stochastic simulation of causal models. Artif. Intell. 32 245–257.
  • Poirier, D. J. (1988). Bayesian diagnostic testing in the general linear normal regression model. In Bayesian Statistics 3 (J. M. Bernardo, M. H. DeGroot, D. V. Lindley and A. F. M. Smith, eds.) 725–732. Oxford Univ. Press, Oxford.
  • Pole, A. (1988). Transfer response models: A numerical approach. In Bayesian Statistics 3 (J. M. Bernardo, M. H. DeGroot, D. V. Lindley and A. F. M. Smith, eds.) 733–745. Oxford Univ. Press, Oxford.
  • Ripley, B. D. (1987). Stochastic Simulation. Wiley, New York.
  • Robert, C. and Casella, G. (2010). A short history of Markov chain Monte Carlo—subjective recollections from incomplete data. In Handbook on Markov Chain Monte Carlo. Chapman and Hall/CRC Press, Boca Raton, FL.
  • Rubin, D. B. (1988). Using the SIR algorithm to simulate posterior distributions. In Bayesian Statistics 3 (J. M. Bernardo, M. H. DeGroot, D. V. Lindley and A. F. M. Smith, eds.) 395–402. Oxford Univ. Press, Oxford.
  • Rubinstein, R. Y. (1981). Simulation and the Monte Carlo Method, 1st ed. Wiley, New York.
  • Schnatter, S. (1988). Bayesian forecasting of time series by Gaussian sum approximation. In Bayesian Statistics 3 (J. M. Bernardo, M. H. DeGroot, D. V. Lindley and A. F. M. Smith, eds.) 757–764. Oxford Univ. Press, Oxford.
  • Shaw, J. E. H. (1987). Numerical Bayesian analysis of some flexible regression models. J. Roy. Statist. Soc. Ser. D Statistician 36 147–153.
  • Shaw, J. E. H. (1988a). A quasirandom approach to integration in Bayesian statistics. Ann. Statist. 16 895–914.
  • Shaw, J. E. H. (1988b). Aspects of numerical integration and summarisation. In Bayesian Statistics 3 (J. M. Bernardo, M. H. DeGroot, D. V. Lindley and A. F. M. Smith, eds.) 411–428. Oxford Univ. Press, Oxford.
  • Smith, A. F. M. (1988). What should be Bayesian about Bayesian software? In Bayesian Statistics 3 (J. M. Bernardo, M. H. DeGroot, D. V. Lindley and A. F. M. Smith, eds.) 429–435. Oxford Univ. Press, Oxford.
  • Smith, A. F. M. (1991). Bayesian computational methods. Philos. Trans. Roy. Soc. Lond. Ser. A 337 369–386.
  • Smith, A. F. M., Skene, A. M., Shaw, J. E. H. and Naylor, J. C. (1987). Progress with numerical and graphical methods for practical Bayesian statistics. J. Roy. Statist. Soc. Ser. D Statistician 36 75–82.
  • Smith, A. F. M., Skene, A. M., Shaw, J. E. H., Naylor, J. C. and Dransfield, M. (1985). The implementation of the Bayesian paradigm. Commun. Stat. Theory Methods 14 1079–1102.
  • Spiegelhalter, D. J. (1987). Coherent evidence propagation in expert systems. J. Roy. Statist. Soc. Ser. D Statistician 36 201–210.
  • Spiegelhalter, D. J. and Lauritzen, S. L. (1990). Sequential updating of conditional probabilities on directed graphical structures. Networks 20 579–605.
  • Stewart, L. (1987). Hierarchical Bayesian analysis using Monte Carlo integration: Computing posterior distributions when there are many possible models. J. Roy. Statist. Soc. Ser. D Statistician 36 211–219.
  • Sweeting, T. J. (1988). Approximate posterior distributions in censored regression models. In Bayesian Statistics 3 (J. M. Bernardo, M. H. DeGroot, D. V. Lindley and A. F. M. Smith, eds.) 791–799. Oxford Univ. Press, Oxford.
  • Swendsen, R. H. and Wang, J. S. (1987). Nonuniversal critical dynamics in Monte Carlo simulations. Phys. Rev. Lett. 58 86–88.
  • Tanner, M. A. and Wong, W. H. (1987). The calculation of posterior distributions by data augmentation (with discussion). J. Amer. Statist. Assoc. 82 528–550.
  • Tierney, L. and Kadane, J. B. (1986). Accurate approximations for posterior moments and marginal densities. J. Amer. Statist. Assoc. 81 82–86.
  • van der Merwe, A. J. and Groenewald, P. C. N. (1987). Bayes and empirical Bayes confidence intervals in applied research. J. Roy. Statist. Soc. Ser. D Statistician 36 171–179.
  • van Dijk, H. K. (1988). Discussion of Goel. In Bayesian Statistics 3 (J. M. Bernardo, M. H. DeGroot, D. V. Lindley and A. F. M. Smith, eds.) 187–188. Oxford Univ. Press, Oxford.
  • van Dijk, H. K., Hop, J. P. and Louter, A. S. (1987). An algorithm for the computation of posterior moments and densities using simple importance sampling. J. Roy. Statist. Soc. Ser. D Statistician 36 83–90.
  • van Dyk, D. A. and Meng, X. L. (2001). The art of data augmentation. J. Comput. Graph. Statist. 10 1–50.
  • Zellner, A. (1988). A Bayesian era. In Bayesian Statistics 3 (J. M. Bernardo, M. H. DeGroot, D. V. Lindley and A. F. M. Smith, eds.) 509–516. Oxford Univ. Press, Oxford.