Twitter is a popular medium for individuals to gather information and express opinions on topics that interest them. By understanding who is interested in which topics, we can gauge the public mood, especially during periods of polarization such as elections. However, while the total volume of tweets may be huge, many people tweet rarely, and tweets are short and often noisy. Hence, inferring topics directly from tweets is both complicated and difficult to scale. Instead, the network structure of Twitter (who tweets at whom, who follows whom) can signal the interests of Twitter users. We propose the Producer-Consumer Model (PCM) to link the latent topical interests of individuals to the directed structure of the network. A key component of PCM is the modeling of the incentives of Twitter users. In particular, for a user to attract more followers and become popular, she must strive to be perceived as an expert on some topic. We use this observation to reduce the parameter space of PCM, greatly increasing its scalability. We apply PCM to track the evolution of Twitter topics during the Italian elections of 2013, and also to interpret those topics using hashtags. We also show a secondary application of PCM to a citation network of machine learning papers. Extensive simulations and experiments with large real-world datasets demonstrate the accuracy and scalability of PCM.
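The following toy sketch illustrates why an "expertise" assumption can shrink a producer-consumer parameter space. It is not the paper's PCM likelihood. The link function `1 - exp(-rate)`, the helper names `edge_prob_full`, `edge_prob_expert` and `param_counts`, and the rule that each node produces on exactly one topic are all hypothetical assumptions chosen for illustration. In this toy, each node i has a nonnegative consumer vector `C[i]` over K topics. In the unconstrained variant every node also has a K-dimensional producer vector. In the "expert" variant node j produces only on topic `z[j]` with strength `p[j]`.

```python
import numpy as np


def edge_prob_full(C, P):
    """Hypothetical unconstrained toy link:
    Pr(i -> j) = 1 - exp(-sum_k C[i, k] * P[j, k])."""
    return 1.0 - np.exp(-C @ P.T)


def edge_prob_expert(C, z, p):
    """Hypothetical expertise-constrained toy link: node j produces only on
    topic z[j] with strength p[j], so Pr(i -> j) = 1 - exp(-C[i, z[j]] * p[j])."""
    # C[:, z] has shape (n, n); column j holds C[:, z[j]].
    # Broadcasting p over columns multiplies column j by p[j].
    return 1.0 - np.exp(-C[:, z] * p[None, :])


def param_counts(n, k):
    """Continuous parameters in each toy variant (topic labels z not counted)."""
    full = 2 * n * k     # a consumer vector and a producer vector per node
    expert = n * k + n   # a consumer vector plus one producer strength per node
    return full, expert


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, k = 5, 3
    C = rng.gamma(1.0, 1.0, size=(n, k))  # nonnegative consumer interests
    z = rng.integers(0, k, size=n)        # each node's single "expert" topic
    p = rng.gamma(1.0, 1.0, size=n)       # producer strength on that topic

    # The expert toy equals the full toy with one nonzero producer entry per node.
    P = np.zeros((n, k))
    P[np.arange(n), z] = p
    print(np.allclose(edge_prob_expert(C, z, p), edge_prob_full(C, P)))
    print(param_counts(10**6, 100))
```

Because the expert variant is the full variant with each producer vector forced to have a single nonzero entry, both give identical edge probabilities on such inputs. Meanwhile the continuous producer parameters drop from N·K to N (for example, 200,000,000 versus 101,000,000 total parameters at N = 10^6 and K = 100 in this toy).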
Ann. Appl. Stat. 11(4): 2298–2331 (December 2017). DOI: 10.1214/17-AOAS1079
Aiello, W., Chung, F. and Lu, L. (2000). A random graph model for massive graphs. In Proceedings of the Thirty-Second Annual ACM Symposium on Theory of Computing 171–180. ACM, New York.
Caimo, A. and Friel, N. (2011). Bayesian inference for exponential random graph models. Soc. Netw. 33 41–55. DOI:10.1016/j.socnet.2010.09.004.
Caldarelli, G., Chessa, A., Pammolli, F., Pompa, G., Puliga, M., Riccaboni, M. and Riotta, G. (2014). A multi-level geographical study of Italian political elections from Twitter data. PLoS ONE 9 e95809.
Caragea, C., Wu, J., Ciobanu, A., Williams, K., Fernández-Ramírez, J., Chen, H.-H., Wu, Z. and Giles, L. (2014). CiteSeerX: A scholarly big dataset. In Proceedings of the 36th European Conference on Information Retrieval (ECIR’14) 311–322.
Chakrabarti, D. (2017). Supplement to “Modeling node incentives in directed networks.” DOI:10.1214/17-AOAS1079SUPP.
Chakrabarti, D. and Faloutsos, C. (2006). Graph mining: Laws, generators, and algorithms. ACM Comput. Surv. 38 Article No. 2. DOI:10.1145/1132952.1132954.
Chakrabarti, D., Zhan, Y. and Faloutsos, C. (2004). R-MAT: A recursive model for graph mining. In Proceedings of the 4th SIAM International Conference on Data Mining (SDM’04) 442–446.
Chang, J. (2012). lda: Collapsed Gibbs sampling methods for topic models. Available at https://cran.r-project.org/web/packages/lda/index.html.
Chatterjee, S., Diaconis, P. and Sly, A. (2011). Random graphs with a given degree sequence. Ann. Appl. Probab. 21 1400–1435. DOI:10.1214/10-AAP728.
Fosdick, B. K. and Hoff, P. D. (2015). Testing and modeling dependencies between a network and nodal attributes. J. Amer. Statist. Assoc. 110 1047–1056.
Fu, W., Song, L. and Xing, E. P. (2009). Dynamic mixed membership blockmodel for evolving networks. In Proceedings of the 26th Annual International Conference on Machine Learning (ICML’09) 329–336.
Gehrke, J., Ginsparg, P. and Kleinberg, J. (2003). Overview of the 2003 KDD cup. ACM SIGKDD Explor. Newsl. 5 149–151. DOI:10.1145/980972.980992.
Gopalan, P. K. and Blei, D. M. (2013). Efficient discovery of overlapping communities in massive networks. Proc. Natl. Acad. Sci. USA 110 14534–14539.
Handcock, M. S. and Jones, J. H. (2004). Likelihood-based inference for stochastic models of sexual network formation. Theor. Popul. Biol. 65 413–422. DOI:10.1016/j.tpb.2003.09.006.
Hofmann, T. (1999). Probabilistic latent semantic indexing. In Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR’99) 50–57.
Hofmann, T. (2004). Latent semantic models for collaborative filtering. ACM Trans. Inf. Syst. 22 89–115. DOI:10.1145/963770.963774.
Holland, P. W. and Leinhardt, S. (1981). An exponential family of probability distributions for directed graphs. J. Amer. Statist. Assoc. 76 33–65. DOI:10.1080/01621459.1981.10477598.
Kemp, C., Tenenbaum, J. B., Griffiths, T. L., Yamada, T. and Ueda, N. (2006). Learning systems of concepts with an infinite relational model. In Proceedings of the 21st National Conference on Artificial Intelligence (AAAI’06) 381–388.
Koren, Y. (2008). Factorization meets the neighborhood: A multifaceted collaborative filtering model. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’08) 426–434.
Krivitsky, P. N., Handcock, M. S., Raftery, A. E. and Hoff, P. D. (2009). Representing degree distributions, clustering, and homophily in social networks with latent cluster random effects models. Soc. Netw. 31 204–213. DOI:10.1016/j.socnet.2009.04.001.
Lei, J. and Rinaldo, A. (2015). Consistency of spectral clustering in stochastic block models. Ann. Statist. 43 215–237. DOI:10.1214/14-AOS1274.
Leskovec, J., Chakrabarti, D., Kleinberg, J., Faloutsos, C. and Ghahramani, Z. (2010). Kronecker graphs: An approach to modeling networks. J. Mach. Learn. Res. 11 985–1042.
Miller, K. T., Griffiths, T. L. and Jordan, M. I. (2009). Nonparametric latent feature models for link prediction. In Advances in Neural Information Processing Systems 22 (NIPS’09) 1276–1284.
Palla, K., Knowles, D. A. and Ghahramani, Z. (2012). An infinite latent attribute model for network data. In Proceedings of the 29th International Conference on Machine Learning (ICML’12) 1607–1614.
Raftery, A. E., Niu, X., Hoff, P. D. and Yeung, K. Y. (2012). Fast inference for the latent space network model using a case-control approximate likelihood. J. Comput. Graph. Statist. 21 901–919.
Richardson, M., Agrawal, R. and Domingos, P. M. (2003). Trust management for the semantic web. In Proceedings of the 2nd International Semantic Web Conference (ISWC’03) 351–368.
Salter-Townshend, M. and Murphy, T. B. (2013). Variational Bayesian inference for the latent position cluster model for network data. Comput. Statist. Data Anal. 57 661–671.
Sarkar, P. and Moore, A. W. (2010). Fast nearest-neighbor search in disk-resident graphs. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD’10) 513–522.
Snijders, T. A. B. and Nowicki, K. (1997). Estimation and prediction for stochastic blockmodels for graphs with latent block structure. J. Classification 14 75–100.
Wasserman, S. and Pattison, P. (1996). Logit models and logistic regressions for social networks. I. An introduction to Markov graphs and $p^*$. Psychometrika 61 401–425.
Xu, Z., Tresp, V., Yu, K. and Kriegel, H. (2006). Infinite hidden relational models. In Proceedings of the 22nd Conference on Uncertainty in Artificial Intelligence (UAI’06) 544–551.
Yan, T., Leng, C. and Zhu, J. (2016). Asymptotics in directed exponential random graph models with an increasing bi-degree sequence. Ann. Statist. 44 31–57.
Zhang, Y., Levina, E. and Zhu, J. (2014). Detecting overlapping communities in networks using spectral methods. ArXiv e-print. Available at https://arxiv.org/abs/1412.3432.