The Annals of Applied Statistics

Estimating links of a network from time to event data

Tso-Jung Yen, Zong-Rong Lee, Yi-Hau Chen, Yu-Min Yen, and Jing-Shiang Hwang

Abstract

In this paper we develop a statistical method for identifying the links of a network from time to event data. The method models the hazard function of a node conditional on the event times of the other nodes, parameterizing the conditional hazard function with the links of the network. It then estimates the hazard function by maximizing a pseudo partial likelihood function, with the parameters subject to a user-specified penalty function and additional constraints. To make the estimation robust, it imposes a prespecified control on the number of falsely discovered links by using the Stability Selection method. A simulation study shows that under this hybrid procedure the number of falsely discovered links is tightly controlled while the true links are well recovered. We apply our method to estimate a political cohesion network that drives the donation behavior of 146 firms, using data collected during the 2008 Taiwanese legislative election. The results show that firms affiliated with elite organizations or holding monopolies are more likely to diffuse donation behavior. In contrast, firms in the technology industry are more likely to act independently on donation.
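
The Stability Selection step mentioned in the abstract can be illustrated with a minimal, generic sketch. It is not the authors' implementation: the base selector below (top-q absolute marginal correlation) is a hypothetical stand-in for the penalized pseudo partial likelihood estimator, the candidates are regression features rather than network links, and all function names and the toy data are assumptions made for illustration. The bound in `expected_false_selections_bound` is the one stated by Meinshausen and Bühlmann (2010), which holds under their assumptions.

```python
import random


def top_q_by_correlation(X, y, q):
    """Stand-in base selector (an assumption, not the paper's estimator):
    return the indices of the q columns of X most correlated with y."""
    n, p = len(y), len(X[0])
    y_mean = sum(y) / n
    y_dev = [v - y_mean for v in y]
    y_norm = sum(d * d for d in y_dev) ** 0.5
    scores = []
    for j in range(p):
        col = [row[j] for row in X]
        c_mean = sum(col) / n
        c_dev = [v - c_mean for v in col]
        c_norm = sum(d * d for d in c_dev) ** 0.5
        cov = sum(a * b for a, b in zip(c_dev, y_dev))
        corr = cov / (c_norm * y_norm) if c_norm > 0 and y_norm > 0 else 0.0
        scores.append((abs(corr), j))
    scores.sort(reverse=True)
    return {j for _, j in scores[:q]}


def stability_selection(X, y, selector, n_subsamples=50, threshold=0.9, seed=0):
    """Run `selector` on random half-samples and keep the candidates whose
    selection frequency is at least `threshold`."""
    rng = random.Random(seed)
    n, p = len(y), len(X[0])
    counts = [0] * p
    for _ in range(n_subsamples):
        idx = rng.sample(range(n), n // 2)  # subsample without replacement
        selected = selector([X[i] for i in idx], [y[i] for i in idx])
        for j in selected:
            counts[j] += 1
    freqs = [c / n_subsamples for c in counts]
    stable = {j for j, f in enumerate(freqs) if f >= threshold}
    return stable, freqs


def expected_false_selections_bound(q, p, threshold):
    """Bound on the expected number of false selections:
    q^2 / ((2 * threshold - 1) * p), valid for threshold in (0.5, 1]."""
    return q * q / ((2 * threshold - 1) * p)


def make_toy_data(n=200, p=10, seed=1):
    """Toy data (hypothetical): only columns 0 and 1 influence y."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [3 * row[0] + 3 * row[1] + rng.gauss(0, 0.5) for row in X]
    return X, y
```

A possible use is `stable, freqs = stability_selection(*make_toy_data(), selector=lambda A, b: top_q_by_correlation(A, b, q=2))`. In a network setting each candidate would be a potential link rather than a column, and the user would choose `q` and `threshold` so that the bound matches the tolerated number of falsely discovered links.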

Article information

Source
Ann. Appl. Stat. Volume 11, Number 3 (2017), 1429-1451.

Dates
Received: June 2016
Revised: March 2017
First available in Project Euclid: 5 October 2017

Permanent link to this document
https://projecteuclid.org/euclid.aoas/1507168835

Digital Object Identifier
doi:10.1214/17-AOAS1032

Keywords
Hazard network models; right-censored data; partial likelihood function; stability selection; political cohesion networks

Citation

Yen, Tso-Jung; Lee, Zong-Rong; Chen, Yi-Hau; Yen, Yu-Min; Hwang, Jing-Shiang. Estimating links of a network from time to event data. Ann. Appl. Stat. 11 (2017), no. 3, 1429–1451. doi:10.1214/17-AOAS1032. https://projecteuclid.org/euclid.aoas/1507168835


Supplemental materials

  • Supplementary Materials for “Estimating links of a network from time to event data”. The Supplementary Materials contain a numerical algorithm for obtaining the estimator (4.4), further details on the aggregation of the campaign donation data, additional results for the simulation study, and additional results for the real data application.