Brazilian Journal of Probability and Statistics

The coreset variational Bayes (CVB) algorithm for mixture analysis

Qianying Liu, Clare A. McGrory, and Peter W. J. Baxter

Full-text: Open access


The pressing need for improved methods for analysing and coping with big data has opened up a new area of research for statisticians. Image analysis is an area where there is typically a very large number of data points to be processed per image, and often multiple images are captured over time. These issues make it challenging to design methodology that is reliable and yet still efficient enough to be of practical use. One promising emerging approach for this problem is to reduce the amount of data that actually has to be processed by extracting what we call coresets from the full dataset; analysis is then based on the coreset rather than the whole dataset. Coresets are representative subsamples of data that are carefully selected via an adaptive sampling approach. We propose a new approach called coreset variational Bayes (CVB) for mixture modelling; this is an algorithm which can perform a variational Bayes analysis of a dataset based on just an extracted coreset of the data. We apply our algorithm to weed image analysis.
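The two steps described above are: draw a weighted coreset, then run variational Bayes on it. The Python sketch below illustrates that pipeline for a one-dimensional Gaussian mixture. It rests on assumptions of our own and is not the authors' CVB implementation. Points are importance-sampled with a probability q_i that mixes a uniform term with a term proportional to the squared distance from the data mean; this is an assumed stand-in for the paper's adaptive sampling scheme. Each sampled point gets weight 1/(m q_i), and we assume the coreset weights enter the standard mean-field updates only by scaling each point's contribution to the sufficient statistics. The function names `build_coreset`, `weighted_vb_gmm` and `digamma`, the simulated data and the prior values are all hypothetical.

```python
import math
import random


def digamma(x):
    """Digamma function via recurrence plus asymptotic series (x > 0)."""
    result = 0.0
    while x < 6.0:  # psi(x) = psi(x + 1) - 1/x
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return result + math.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))


def build_coreset(x, m, rng):
    """Importance-sample m points; weight 1/(m*q_i) keeps weighted sums unbiased."""
    n = len(x)
    mu = sum(x) / n
    d2 = [(xi - mu) ** 2 for xi in x]
    total = sum(d2)
    # Assumed scheme: half uniform, half favouring points far from the mean.
    q = [0.5 / n + 0.5 * di / total for di in d2]
    idx = rng.choices(range(n), weights=q, k=m)
    return [x[i] for i in idx], [1.0 / (m * q[i]) for i in idx]


def weighted_vb_gmm(x, w, K, iters=50, alpha0=1.0, beta0=1.0, m0=0.0, a0=1.0, b0=1.0):
    """Mean-field VB for a 1-D Gaussian mixture in which point i counts w[i] times."""
    n = len(x)
    # Deterministic start: split the sorted points into K equal-size groups.
    r = [[0.0] * K for _ in range(n)]
    for rank, i in enumerate(sorted(range(n), key=lambda j: x[j])):
        r[i][min(rank * K // n, K - 1)] = 1.0
    for _ in range(iters):
        # Update q(pi) and q(mu_k, lambda_k) from coreset-weighted sufficient statistics.
        params = []
        for k in range(K):
            nk = sum(w[i] * r[i][k] for i in range(n)) + 1e-10
            xbar = sum(w[i] * r[i][k] * x[i] for i in range(n)) / nk
            sk = sum(w[i] * r[i][k] * (x[i] - xbar) ** 2 for i in range(n)) / nk
            beta = beta0 + nk
            m = (beta0 * m0 + nk * xbar) / beta
            a = a0 + 0.5 * nk
            b = b0 + 0.5 * (nk * sk + beta0 * nk / beta * (xbar - m0) ** 2)
            params.append((alpha0 + nk, beta, m, a, b))
        alpha_sum = sum(p[0] for p in params)
        # Per-component constants: E[ln pi_k] + 0.5 * E[ln lambda_k].
        const = [digamma(al) - digamma(alpha_sum) + 0.5 * (digamma(a) - math.log(b))
                 for al, _, _, a, b in params]
        # Update responsibilities q(z_i); -0.5*ln(2*pi) is dropped since it cancels.
        for i in range(n):
            logs = [const[k] - 0.5 * (1.0 / be + (a / b) * (x[i] - m) ** 2)
                    for k, (_, be, m, a, b) in enumerate(params)]
            top = max(logs)
            ex = [math.exp(v - top) for v in logs]
            s = sum(ex)
            r[i] = [v / s for v in ex]
    return params


# Hypothetical example: 10,000 points from two well-separated components.
rng = random.Random(1)
n = 10_000
data = [rng.gauss(-3.0, 1.0) for _ in range(n // 2)] + [rng.gauss(3.0, 1.0) for _ in range(n // 2)]
cx, cw = build_coreset(data, m=1000, rng=rng)
post = weighted_vb_gmm(cx, cw, K=2)
alpha_total = sum(p[0] for p in post)
for alpha, beta, mean, a, b in sorted(post, key=lambda p: p[2]):
    # Approximate component sd reported as 1/sqrt(E[lambda]) = sqrt(b/a).
    print(f"weight {alpha / alpha_total:.2f}  mean {mean:.2f}  sd {math.sqrt(b / a):.2f}")
```

Because each sampled point carries weight 1/(m q_i), a weighted sum over the coreset is an unbiased estimate of the corresponding sum over the full dataset. This is why `sum(cw)` should be close to `n`, and why the variational posterior reflects roughly the full-data amount of evidence even though only 1,000 of the 10,000 points are processed.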

Article information

Braz. J. Probab. Stat., Volume 33, Number 2 (2019), 267-279.

Received: February 2017
Accepted: November 2017
First available in Project Euclid: 4 March 2019


Digital Object Identifier: doi:10.1214/17-BJPS387

Keywords: Mixture modelling; coresets; variational Bayes; image analysis; Bayesian statistics


Liu, Qianying; McGrory, Clare A.; Baxter, Peter W. J. The coreset variational Bayes (CVB) algorithm for mixture analysis. Braz. J. Probab. Stat. 33 (2019), no. 2, 267–279. doi:10.1214/17-BJPS387.


