The Annals of Applied Statistics

A Bayesian approach for predicting the popularity of tweets

Tauhid Zaman, Emily B. Fox, and Eric T. Bradlow

Full-text: Open access

Abstract

We predict the popularity of tweets, the short messages posted on the micro-blogging site Twitter. We measure the popularity of a tweet by the time-series path of its retweets, that is, the instances in which people forward the tweet to others. We develop a probabilistic model for the evolution of the retweets using a Bayesian approach, and form predictions using only observations on the retweet times and the local network or “graph” structure of the retweeters. We obtain good step-ahead forecasts and predictions of the final total number of retweets even when only a small fraction (i.e., less than one tenth) of the retweet path is observed. This translates to good predictions within a few minutes of a tweet being posted, and has potential implications for understanding the spread of broader ideas, memes or trends in social networks.

Article information

Source
Ann. Appl. Stat., Volume 8, Number 3 (2014), 1583-1611.

Dates
First available in Project Euclid: 23 October 2014

Permanent link to this document
https://projecteuclid.org/euclid.aoas/1414091226

Digital Object Identifier
doi:10.1214/14-AOAS741

Mathematical Reviews number (MathSciNet)
MR3271345

Zentralblatt MATH identifier
1303.62048

Keywords
Social networks; Twitter; Bayesian inference; time series; forecasting

Citation

Zaman, Tauhid; Fox, Emily B.; Bradlow, Eric T. A Bayesian approach for predicting the popularity of tweets. Ann. Appl. Stat. 8 (2014), no. 3, 1583–1611. doi:10.1214/14-AOAS741. https://projecteuclid.org/euclid.aoas/1414091226




Supplemental materials

  • Supplementary material: Retweet time series data. These files contain the retweet time series for the root tweets studied in this paper. They also include files containing the partitions of the tweets into training and prediction sets used in the analysis.