The Annals of Statistics

Equi-energy sampler with applications in statistical inference and statistical mechanics

S. C. Kou, Qing Zhou, and Wing Hung Wong

Source: Ann. Statist. Volume 34, Number 4 (2006), 1581-1619.

Abstract

We introduce a new sampling algorithm, the equi-energy sampler, for efficient statistical sampling and estimation. Complementary to the widely used temperature-domain methods, the equi-energy sampler, utilizing the temperature–energy duality, targets the energy directly. The focus on the energy function not only facilitates efficient sampling, but also provides a powerful means for statistical estimation, for example, the calculation of the density of states and microcanonical averages in statistical mechanics. The equi-energy sampler is applied to a variety of problems, including exponential regression in statistics, motif sampling in computational biology and protein folding in biophysics.
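
To illustrate what "targeting the energy directly" can look like in practice, the following is a minimal Python sketch under stated assumptions, not the authors' implementation. The abstract itself gives no algorithmic detail. The sketch assumes the commonly described construction: a ladder of energy levels H_0 < ... < H_K with matching temperatures 1 = T_0 < ... < T_K, chain i targeting a density proportional to exp(-max(h(x), H_i) / T_i), and an occasional "equi-energy jump" in which chain i proposes a stored state of chain i+1 drawn from the same energy ring. The toy double-well energy, the ladder values, the lockstep schedule (no burn-in) and all function names are hypothetical choices made for illustration.

```python
import bisect
import math
import random


def energy(x):
    """Toy double-well energy h(x): minima at x = -2 and x = +2, barrier 8 at x = 0."""
    return (x * x - 4.0) ** 2 / 2.0


def log_target(x, h_floor, temp):
    """Log of the flattened, tempered density exp(-max(h(x), H_i) / T_i)."""
    return -max(energy(x), h_floor) / temp


def ring_index(h, levels):
    """Index j of the energy ring [H_j, H_{j+1}) that contains the energy h."""
    return max(bisect.bisect_right(levels, h) - 1, 0)


def equi_energy_sampler(n_iter, levels, temps, p_ee=0.1, step=0.5, seed=0):
    """Return n_iter draws from chain 0, whose target is exp(-h(x)) when T_0 = 1."""
    rng = random.Random(seed)
    n_chains = len(levels)
    state = [2.0] * n_chains  # every chain starts in the right-hand well
    # rings[i][j] stores the states of chain i whose energy fell in ring j.
    rings = [[[] for _ in levels] for _ in range(n_chains)]
    samples = []
    for _ in range(n_iter):
        # Assumption: chains run in lockstep, hottest first, with no burn-in.
        for i in reversed(range(n_chains)):
            x = state[i]
            j = ring_index(energy(x), levels)
            donors = rings[i + 1][j] if i + 1 < n_chains else []
            if donors and rng.random() < p_ee:
                # Equi-energy jump: propose a stored state of chain i+1
                # taken from the ring that matches the current energy.
                y = rng.choice(donors)
                log_ratio = (log_target(y, levels[i], temps[i])
                             - log_target(x, levels[i], temps[i])
                             + log_target(x, levels[i + 1], temps[i + 1])
                             - log_target(y, levels[i + 1], temps[i + 1]))
            else:
                # Local random-walk Metropolis move, wider at higher temperature.
                y = x + rng.gauss(0.0, step * math.sqrt(temps[i]))
                log_ratio = (log_target(y, levels[i], temps[i])
                             - log_target(x, levels[i], temps[i]))
            if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
                state[i] = y
            rings[i][ring_index(energy(state[i]), levels)].append(state[i])
            if i == 0:
                samples.append(state[0])
    return samples


if __name__ == "__main__":
    levels = [0.0, 1.0, 2.0, 4.0, 8.0]   # hypothetical energy ladder
    temps = [1.0, 2.0, 4.0, 8.0, 16.0]   # hypothetical temperature ladder
    draws = equi_energy_sampler(5000, levels, temps, seed=1)
    print("fraction of draws in the right-hand well:",
          sum(d > 0 for d in draws) / len(draws))
```

In this sketch the jump lets the coldest chain move between the two wells without climbing the barrier, because the proposed state has nearly the same energy as the current one. The stored rings are also what an energy-indexed estimate, such as a density-of-states estimate, would be built from, although the sketch does not attempt that.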

Primary Subjects: 65C05
Secondary Subjects: 65C40, 82B80, 62F15
Keywords: Sampling; estimation; temperature; energy; density of states; microcanonical distribution; motif sampling; protein folding

Full-text: Open access

Links and Identifiers

Permanent link to this document: http://projecteuclid.org/euclid.aos/1162567622
Digital Object Identifier: doi:10.1214/009053606000000515
Mathematical Reviews number (MathSciNet): MR2283711

2010 © Institute of Mathematical Statistics