The Annals of Applied Statistics

$\gamma$-SUP: A clustering algorithm for cryo-electron microscopy images of asymmetric particles

Ting-Li Chen, Dai-Ni Hsieh, Hung Hung, I-Ping Tu, Pei-Shien Wu, Yi-Ming Wu, Wei-Hau Chang, and Su-Yun Huang


Cryo-electron microscopy (cryo-EM) has recently emerged as a powerful tool for obtaining three-dimensional (3D) structures of biological macromolecules in native states. A minimum cryo-EM image data set for deriving a meaningful reconstruction comprises thousands of randomly oriented projections of identical particles photographed with a small number of electrons. The computation of a 3D structure from 2D projections requires clustering, which aims to enhance the signal-to-noise ratio in each view by grouping similarly oriented images. Nevertheless, the prevailing clustering techniques are often compromised by three characteristics of cryo-EM data: high noise content, high dimensionality and a large number of clusters. Moreover, since clustering requires registering images of similar orientation into the same pixel coordinates by 2D alignment, it is desirable for the clustering algorithm to label misaligned images as outliers. Herein, we introduce a clustering algorithm, $\gamma$-SUP, which models the data with a $q$-Gaussian mixture, adopts the minimum $\gamma$-divergence for estimation, and uses a self-updating procedure to obtain the numerical solution. We apply $\gamma$-SUP to the cryo-EM images of two benchmark macromolecules, RNA polymerase II and the ribosome. In the former case, simulated images were chosen to decouple clustering from alignment and to demonstrate that $\gamma$-SUP is more robust to misalignment outliers than the existing clustering methods used in the cryo-EM community. In the latter case, the clustering of real cryo-EM data by our $\gamma$-SUP method eliminates noise in many views and reveals true structural features of the ribosome at the projection level.
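
The abstract gives no formulas, so the following Python sketch is only an illustration of what a self-updating procedure with a compactly supported weight could look like. The weight form $[1 - s\,d^2/\tau^2]_+^{1/s}$, the parameter names `s` and `tau`, their default values, and the function names are all assumptions made for this example rather than details taken from the text above.

```python
import numpy as np

def truncated_weight(sq_dist, s, tau):
    """Assumed weight [1 - s * d^2 / tau^2]_+^(1/s); exactly zero beyond d = tau / sqrt(s)."""
    base = np.maximum(1.0 - s * sq_dist / tau**2, 0.0)
    return base ** (1.0 / s)

def self_update(points, s=0.025, tau=1.0, max_iter=200, tol=1e-8):
    """Move every point to a weighted mean of all current points until nothing moves.

    Hypothetical helper: each point starts as its own cluster center, and
    centers that end up at the same location form one cluster.
    """
    centers = np.asarray(points, dtype=float).copy()
    for _ in range(max_iter):
        # Pairwise squared distances between the current centers.
        diff = centers[:, None, :] - centers[None, :, :]
        sq_dist = np.sum(diff**2, axis=2)
        w = truncated_weight(sq_dist, s, tau)
        # Each row of w contains the self-weight 1, so the row sums are positive.
        new_centers = w @ centers / w.sum(axis=1, keepdims=True)
        moved = np.max(np.abs(new_centers - centers))
        centers = new_centers
        if moved < tol:
            break
    return centers

# Two tight groups and one isolated point (made-up data).
pts = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                [100.0, 100.0], [100.1, 100.0],
                [50.0, -50.0]])
out = self_update(pts)
```

Because the assumed weight vanishes outside a finite radius, a point far from every other point receives no pull from them and stays where it is. That is one way an isolated image could end up flagged as an outlier instead of being forced into a cluster.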

Article information

Ann. Appl. Stat., Volume 8, Number 1 (2014), 259-285.

First available in Project Euclid: 8 April 2014

Keywords: Clustering algorithm; cryo-EM images; $\gamma$-divergence; $k$-means; mean-shift algorithm; multilinear principal component analysis; $q$-Gaussian distribution; robust statistics; self-updating process


Chen, Ting-Li; Hsieh, Dai-Ni; Hung, Hung; Tu, I-Ping; Wu, Pei-Shien; Wu, Yi-Ming; Chang, Wei-Hau; Huang, Su-Yun. $\gamma$-SUP: A clustering algorithm for cryo-electron microscopy images of asymmetric particles. Ann. Appl. Stat. 8 (2014), no. 1, 259–285. doi:10.1214/13-AOAS680.

