The Annals of Applied Statistics

Marked self-exciting point process modelling of information diffusion on Twitter

Feng Chen and Wai Hong Tan

Abstract

Information diffusion occurs on microblogging platforms like Twitter as retweet cascades. When a tweet is posted, it may be retweeted, and those retweets may in turn be retweeted, so the retweeting process can continue iteratively and indefinitely. A natural measure of the popularity of a tweet is the number of retweets it generates. Accurate predictions of tweet popularity can help Twitter rank content more effectively and facilitate the assessment of potential for marketing and campaigning strategies. In this paper, we propose a model called the Marked Self-Exciting Process with Time-Dependent Excitation Function, or MaSEPTiDE for short, to model the retweeting dynamics and to predict tweet popularity. Our model does not require expensive feature engineering but is capable of leveraging the observed dynamics to accurately predict the future evolution of retweet cascades. We apply our proposed methodology to a large amount of Twitter data and report substantial improvement in prediction performance over existing approaches in the literature.

Article information

Source
Ann. Appl. Stat., Volume 12, Number 4 (2018), 2175-2196.

Dates
Received: August 2017
Revised: February 2018
First available in Project Euclid: 13 November 2018

Permanent link to this document
https://projecteuclid.org/euclid.aoas/1542078041

Digital Object Identifier
doi:10.1214/18-AOAS1148

Mathematical Reviews number (MathSciNet)
MR3875697

Keywords
B-spline; forecast; Hawkes process; integral equation; nonstationary self-exciting point process; popularity prediction; simulation

Citation

Chen, Feng; Tan, Wai Hong. Marked self-exciting point process modelling of information diffusion on Twitter. Ann. Appl. Stat. 12 (2018), no. 4, 2175–2196. doi:10.1214/18-AOAS1148. https://projecteuclid.org/euclid.aoas/1542078041

