Statistical Science

Graphics Processing Units and High-Dimensional Optimization

Hua Zhou, Kenneth Lange, and Marc A. Suchard

Full-text: Open access


This article discusses the potential of graphics processing units (GPUs) in high-dimensional optimization problems. A single GPU card with hundreds of arithmetic cores can be inserted in a personal computer and dramatically accelerates many statistical algorithms. To exploit these devices fully, optimization algorithms should reduce to multiple parallel tasks, each accessing a limited amount of data. These criteria favor EM and MM algorithms that separate parameters and data. To a lesser extent block relaxation and coordinate descent and ascent also qualify. We demonstrate the utility of GPUs in nonnegative matrix factorization, PET image reconstruction, and multidimensional scaling. Speedups of 100-fold can easily be attained. Over the next decade, GPUs will fundamentally alter the landscape of computational statistics. It is time for more statisticians to get on-board.

Article information

Statist. Sci., Volume 25, Number 3 (2010), 311-324.

First available in Project Euclid: 4 January 2011

Permanent link to this document

Digital Object Identifier

Mathematical Reviews number (MathSciNet)

Zentralblatt MATH identifier

Block relaxation EM and MM algorithms multidimensional scaling nonnegative matrix factorization parallel computing PET scanning


Zhou, Hua; Lange, Kenneth; Suchard, Marc A. Graphics Processing Units and High-Dimensional Optimization. Statist. Sci. 25 (2010), no. 3, 311--324. doi:10.1214/10-STS336.

Export citation


  • [1] Ayers, K. L. and Lange, K. L (2008). Penalized estimation of haplotype frequencies. Bioinformatics 24 1596–1602.
  • [2] Berry, M. W., Browne, M., Langville, A. N., Pauca, V. P. and Plemmons, R. J. (2007). Algorithms and applications for approximate nonnegative matrix factorization. Comput. Statist. Data Anal. 52 155–173.
  • [3] Buckner, J., Wilson, J., Seligman, M., Athey, B., Watson, S. and Meng, F. (2010). The gputools package enables GPU computing in R. Bioinformatics 26 134–135.
  • [4] de Leeuw, J. (1992). Fitting distances by least squares. Unpublished manuscript.
  • [5] de Leeuw, J. and Heiser, W. J. (1977). Convergence of correction matrix algorithms for multidimensional scaling. In Geometric Representations of Relational Data 133–145. Mathesis Press, Ann Arbor, MI.
  • [6] de Leeuw, J. (1977). Applications of convex analysis to multidimensional scaling. In Recent Developments in Statistics (Proc. European Meeting Statisticians, Grenoble, 1976) 133–145. North-Holland, Amsterdam.
  • [7] de Leeuw, J. (1994). Block relaxation algorithms in statistics. In Information Systems and Data Analysis. Springer, Berlin.
  • [8] Dempster, A. P., Laird, N. M. and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm (with discussion). J. Roy. Statist. Soc. Ser. B 39 1–38.
  • [9] Diaconis, P., Goel, S. and Holmes, S. (2008). Horseshoes in multidimensional scaling and local kernel methods. Ann. Appl. Statist. 2 777–807.
  • [10] Groenen, P. J. F. and Heiser, W. J. (1996). The tunneling method for global optimization in multidimensional scaling. Pshychometrika 61 529–550.
  • [11] Jamshidian, M. and Jennrich, R. I. (1993). Conjugate gradient acceleration of the EM algorithm. J. Amer. Statist. Assoc. 88 221–228.
  • [12] Jamshidian, M. and Jennrich, R. I. (1997). Acceleration of the EM algorithm by using quasi-Newton methods. J. Roy. Statist. Soc. Ser. B 59 569–587.
  • [13] Koren, Y., Bell, R. and Volinsky, C. (2009). Matrix factorization techniques for recommender systems. Computer 42 30–37.
  • [14] Lange, K. L. and Carson, R. (1984). EM reconstruction algorithms for emission and transmission tomography. J. Comput. Assist. Tomogr. 8 306–316.
  • [15] Lange, K. L. (1995). A quasi-Newton acceleration of the EM algorithm. Statist. Sinica 5 1–18.
  • [16] Lange, K. L. (2004). Optimization. Springer, New York.
  • [17] Lange, K. L., Hunter, D. R. and Yang, I. (2000). Optimization transfer using surrogate objective functions (with discussion). J. Comput. Graph. Statist. 9 1–59.
  • [18] Lee, A., Yan, C., Giles, M. B., Doucet, A. and Holmes, C. C. (2009). On the utility of graphics cards to perform massively parallel simulation of advanced Monte Carlo methods. Technical report, Dept. Statistics, Oxford Univ.
  • [19] Lee, D. D. and Seung, H. S. (1999). Learning the parts of objects by non-negative matrix factorization. Nature 401 788–791.
  • [20] Lee, D. D. and Seung, H. S. (2001). Algorithms for non-negative matrix factorization. In NIPS 556–562. MIT Press, Cambridge, MA.
  • [21] Liu, C. and Rubin, D. B. (1994). The ECME algorithm: A simple extension of EM and ECM with faster monotone convergence. Biometrika 81 633–648.
  • [22] McLachlan, G. J. and Krishnan, T. (2008). The EM Algorithm and Extensions, 2nd ed. Wiley, Hoboken, NJ.
  • [23] Meng, X. L. and Rubin, D. B. (1993). Maximum likelihood estimation via the ECM algorithm: A general framework. Biometrika 80 267–278.
  • [24] Meng, X. L. and van Dyk, D. (1997). The EM algorithm—an old folk-song sung to a fast new tune (with discussion). J. Roy. Statist. Soc. Ser. B 59 511–567.
  • [25] MIT center for biological and computational learning. CBCL Face Database #1. Available at
  • [26] NVIDIA (2008). NVIDIA CUBLAS Library.
  • [27] NVIDIA (2008). NVIDIA CUDA Compute Unified Device Architecture: Programming Guide Version 2.0.
  • [28] Owens, J. D., Luebke, D., Govindaraju, N., Harris, M., Krüger, J., Lefohn, A. E. and Purcell, T. J. (2007). A survey of general-purpose computation on graphics hardware. Computer Graphics Forum 26 80–113.
  • [29] Ranola, J. M., Ahn, S., Sehl, M. E., Smith, D. J. and Lange, K. L. (2010). A Poisson model for random multigraphs. Bioinformatics 26 2004–2011.
  • [30] Roland, C., Varadhan, R. and Frangakis, C. E. (2007). Squared polynomial extrapolation methods with cycling: An application to the positron emission tomography problem. Numer. Algorithms 44 159–172.
  • [31] Silberstein, M., Schuster, A., Geiger, D., Patney, A. and Owens, J. D. (2008). Efficient computation of sum-products on GPUs through software-managed cache. In Proceedings of the 22nd Annual International Conference on Supercomputing 309–318. ACM, New York.
  • [32] Sinnott-Armstrong, N. A., Greene, C. S., Cancare, F. and Moore, J. H. (2009). Accelerating epistasis analysis in human genetics with consumer graphics hardware. BMC Res. Notes 2 149.
  • [33] Suchard, M. A. and Rambaut, A. (2009). Many-core algorithms for statistical phylogenetics. Bioinformatics 25 1370–1376.
  • [34] Suchard, M. A., Holmes, C. and West, M. (2010). Some of the what?, why?, how?, who and where? of graphics processing unit computing for Bayesian analysis. ISBA Bull. 17 12–16.
  • [35] Suchard, M. A, Wang, Q., Chan, C., Frelinger, A. and West, M. (2010). Understanding GPU programming for statistical computation: Studies in massively parallel massive mixtures. J. Comput. Graph. Statist. 19 418–438.
  • [36] Tibbits, M. M., Haran, M. and Liechty, J. C. (2010). Parallel multivariate slice sampling. Statist. Comput. To appear.
  • [37] Ueda, N. and Nakano, R. (1998). Deterministic annealing EM algorithm. Neural Networks 11 271–282.
  • [38] Varadhan, R. and Roland, C. (2008). Simple and globally convergent methods for accelerating the convergence of any EM algorithm. Scand. J. Statist. 35 335–353.
  • [39] Vardi, Y., Shepp, L. A. and Kaufman, L. (1985). A statistical model for positron emission tomography (with discussion). J. Amer. Statist. Assoc. 80 8–37.
  • [40] Wu, T. T. and Lange, K. L. (2010). The MM alternative to EM. Statist. Sci. To appear.
  • [41] Zhou, H. and Lange K. L. (2009). On the bumpy road to the dominant mode. Scand. J. Statist. DOI: 10.1111/j.1467-9469.2009.00681.x.
  • [42] Zhou, H., Alexander, D. and Lange, K. L. (2009). A quasi-Newton acceleration for high-dimensional optimization algorithms. Statist. Comput. DOI: 10.1007/s11222-009-9166-3.
  • [43] Zhou, H. and Lange, K. L. (2010). MM algorithms for some discrete multivariate distributions. J. Comput. Graph. Statist. 19 645–665.