The Annals of Applied Statistics

Exploring the conformational space for protein folding with sequential Monte Carlo

Samuel W. K. Wong, Jun S. Liu, and S. C. Kou

Full-text: Open access


Computational methods for protein structure prediction from amino acid sequence are of vital importance in modern applications, for example, protein design in biomedicine. Efficient sampling of conformations according to a given energy function remains a bottleneck, yet is a vital step for energy-based structure prediction methods. While the Protein Data Bank of experimentally determined 3-D protein structures has steadily increased in size, structure predictions for new proteins tend to be unreliable in the amino acid segments where there is low sequence similarity with known structures. In this paper we introduce a new method for building such segments of protein structures, inspired by sequential Monte Carlo methods. We apply our method to examples of real 3-D structure predictions and demonstrate its promise for improving low-confidence segments. We also provide applications to the prediction of reconstructed segments in known structures, and to the assessment of energy function accuracy. We find that our method is able to produce conformations that have both low energies and good coverage of the conformational space, and hence can be a useful tool for protein design and structure prediction.

Article information

Ann. Appl. Stat., Volume 12, Number 3 (2018), 1628-1654.

Received: May 2017
Revised: November 2017
First available in Project Euclid: 11 September 2018

Keywords: Protein structure prediction, particle filter, structure refinement, energy optimization


Wong, Samuel W. K.; Liu, Jun S.; Kou, S. C. Exploring the conformational space for protein folding with sequential Monte Carlo. Ann. Appl. Stat. 12 (2018), no. 3, 1628–1654. doi:10.1214/17-AOAS1124.

