The Annals of Statistics

Equi-energy sampler with applications in statistical inference and statistical mechanics

S. C. Kou, Qing Zhou, and Wing Hung Wong


Abstract

We introduce a new sampling algorithm, the equi-energy sampler, for efficient statistical sampling and estimation. Complementary to the widely used temperature-domain methods, the equi-energy sampler, utilizing the temperature–energy duality, targets the energy directly. The focus on the energy function not only facilitates efficient sampling, but also provides a powerful means for statistical estimation, for example, the calculation of the density of states and microcanonical averages in statistical mechanics. The equi-energy sampler is applied to a variety of problems, including exponential regression in statistics, motif sampling in computational biology and protein folding in biophysics.
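The abstract does not spell out the algorithm, so the following Python sketch is only an illustration of the idea of targeting the energy directly. It assumes a ladder of energy levels and temperatures, chains whose targets are proportional to exp(-max(h(x), H_i) / T_i), and occasional "equi-energy jumps" in which a chain moves to a state of similar energy previously visited by the next-higher chain. The double-well energy, the names `energy`, `ring_index`, `log_target` and `equi_energy_sample`, and all parameter values are hypothetical choices for this example. Running all chains side by side with no burn-in is a simplification.

```python
import math
import random


def energy(x):
    # Hypothetical double-well energy: minima at x = -2 and x = +2, barrier 4 at x = 0.
    return (x * x - 4.0) ** 2 / 4.0


def ring_index(h, levels):
    # Index j of the energy ring [levels[j], levels[j + 1]) that contains h;
    # the last ring is unbounded above.
    j = 0
    while j + 1 < len(levels) and h >= levels[j + 1]:
        j += 1
    return j


def log_target(x, H, T):
    # Log of the unnormalized flattened, tempered density exp(-max(h(x), H) / T).
    return -max(energy(x), H) / T


def equi_energy_sample(n_iter, levels, temps, p_ee=0.1, step=0.5, seed=0):
    # levels[0] should not exceed the minimum energy and temps[0] should be 1,
    # so that chain 0 targets exp(-h(x)).
    rng = random.Random(seed)
    K = len(levels)
    x = [2.0] * K  # every chain starts in the right-hand well
    # rings[i][j] stores the states of chain i that fell in energy ring j.
    rings = [[[] for _ in levels] for _ in range(K)]
    samples = []
    for _ in range(n_iter):
        # Update the highest (flattest) chain first, then work downwards.
        for i in range(K - 1, -1, -1):
            H, T = levels[i], temps[i]
            j = ring_index(energy(x[i]), levels)
            donors = rings[i + 1][j] if i + 1 < K else []
            if donors and rng.random() < p_ee:
                # Equi-energy jump: propose a stored state of chain i + 1
                # drawn uniformly from the ring matching the current energy.
                y = rng.choice(donors)
                H_up, T_up = levels[i + 1], temps[i + 1]
                log_ratio = (log_target(y, H, T) - log_target(x[i], H, T)
                             + log_target(x[i], H_up, T_up)
                             - log_target(y, H_up, T_up))
            else:
                # Local move: symmetric Gaussian random-walk Metropolis proposal.
                y = x[i] + rng.gauss(0.0, step)
                log_ratio = log_target(y, H, T) - log_target(x[i], H, T)
            if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
                x[i] = y
            rings[i][ring_index(energy(x[i]), levels)].append(x[i])
        samples.append(x[0])  # chain 0 carries the distribution of interest
    return samples


if __name__ == "__main__":
    draws = equi_energy_sample(5000, levels=[0.0, 1.0, 4.0], temps=[1.0, 2.0, 5.0])
    left = sum(1 for d in draws if d < 0) / len(draws)
    print(f"fraction of draws in the left well: {left:.2f}")
```

In this sketch the jump proposal preserves the energy ring, so the lowest chain can cross between the two wells without climbing the barrier itself. The stored ring contents are also the kind of by-product from which energy-based quantities such as the density of states could be estimated, although that estimation step is not shown here.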

Article information

Source
Ann. Statist. Volume 34, Number 4 (2006), 1581-1619.

Dates
First available in Project Euclid: 3 November 2006

Permanent link to this document
http://projecteuclid.org/euclid.aos/1162567622

Digital Object Identifier
doi:10.1214/009053606000000515

Mathematical Reviews number (MathSciNet)
MR2283711

Zentralblatt MATH identifier
06075560

Subjects
Primary: 65C05: Monte Carlo methods
Secondary: 65C40: Computational Markov chains; 82B80: Numerical methods (Monte Carlo, series resummation, etc.) [See also 65-XX, 81T80]; 62F15: Bayesian inference

Keywords
Sampling; estimation; temperature; energy; density of states; microcanonical distribution; motif sampling; protein folding

Citation

Kou, S. C.; Zhou, Qing; Wong, Wing Hung. Equi-energy sampler with applications in statistical inference and statistical mechanics. Ann. Statist. 34 (2006), no. 4, 1581-1619. doi:10.1214/009053606000000515. http://projecteuclid.org/euclid.aos/1162567622.


