Modeling node incentives in directed networks

Deepayan Chakrabarti

doi:10.1214/17-AOAS1079

Ann. Appl. Stat. 11(4): 2298-2331 (December 2017). DOI: 10.1214/17-AOAS1079

Abstract

Twitter is a popular medium for individuals to gather information and express opinions on topics of interest to them. By understanding who is interested in what topics, we can gauge the public mood, especially during periods of polarization such as elections. However, while the total volume of tweets may be huge, many people tweet rarely, and tweets are short and often noisy. Hence, directly inferring topics from tweets is both complicated and difficult to scale. Instead, the network structure of Twitter (who tweets at whom, who follows whom) can telegraph the interests of Twitter users. We propose the Producer-Consumer Model (PCM) to link latent topical interests of individuals to the directed structure of the network. A key component of PCM is the modeling of incentives of Twitter users. In particular, for a user to attract more followers and become popular, she must strive to be perceived as an expert on some topic. We use this to reduce the parameter space of PCM, greatly increasing its scalability. We apply PCM to track the evolution of Twitter topics during the Italian elections of 2013, and also to interpret those topics using hashtags. A secondary application of PCM to a citation network of machine learning papers is also shown. Extensive simulations and experiments with large real-world datasets demonstrate the accuracy and scalability of PCM.

Deepayan Chakrabarti "Modeling node incentives in directed networks," The Annals of Applied Statistics 11(4), 2298-2331, (December 2017). https://doi.org/10.1214/17-AOAS1079

Received: 1 May 2016; Published: December 2017


JOURNAL ARTICLE
34 PAGES
