We introduce a new sampling algorithm, the equi-energy sampler, for efficient statistical sampling and estimation. Complementary to the widely used temperature-domain methods, the equi-energy sampler, utilizing the temperature–energy duality, targets the energy directly. The focus on the energy function not only facilitates efficient sampling, but also provides a powerful means for statistical estimation, for example, the calculation of the density of states and microcanonical averages in statistical mechanics. The equi-energy sampler is applied to a variety of problems, including exponential regression in statistics, motif sampling in computational biology and protein folding in biophysics.