Electronic Journal of Statistics

Least squares estimation of spatial autoregressive models for large-scale social networks

Abstract

Due to the rapid development of various social networks, the spatial autoregressive (SAR) model is becoming an important tool in social network analysis. However, major bottlenecks remain in analyzing large-scale networks (e.g., Facebook has over 700 million active users), including computational scalability, estimation consistency, and proper network sampling. To address these challenges, we propose a novel least squares estimator (LSE) for analyzing large sparse networks based on the SAR model. Computationally, the cost of the LSE is linear in the network size, which makes it scalable to the analysis of very large networks. In theory, the LSE is $\sqrt{n}$-consistent and asymptotically normal under certain regularity conditions. A new LSE-based network sampling technique is further developed, which automatically adjusts for the autocorrelation between sampled and unsampled units and hence guarantees valid statistical inference. Moreover, we generalize the LSE approach from the classical SAR model to more complex networks associated with multiple sources of social interaction effects. Numerical results for simulated and real data are presented to illustrate the performance of the LSE.
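To make the idea of a least squares criterion with linear cost concrete, the following is a minimal sketch, not the estimator exactly as defined in the paper. It assumes a covariate-free SAR model $Y = \rho W Y + \varepsilon$ with a row-normalized sparse weight matrix $W$ with zero diagonal, and it uses the standard Gaussian fact that $Y_i - E(Y_i \mid Y_{-i}) = (\Omega Y)_i / \Omega_{ii}$, where $\Omega \propto (I - \rho W)^\top (I - \rho W)$. All function names, the random network generator, and the grid search over $\rho$ are hypothetical choices made for illustration.

```python
import random

def make_network(n, out_degree, rng):
    """Hypothetical sparse network: each node follows `out_degree` random others.
    Returns W as adjacency lists of (column, weight), row-normalised, zero diagonal."""
    rows = []
    for i in range(n):
        nbrs = set()
        while len(nbrs) < out_degree:
            j = rng.randrange(n)
            if j != i:
                nbrs.add(j)
        rows.append([(j, 1.0 / out_degree) for j in sorted(nbrs)])
    return rows

def w_times(rows, y):
    """Compute W y; cost is proportional to the number of edges."""
    return [sum(w * y[j] for j, w in row) for row in rows]

def wt_times(rows, u):
    """Compute W^T u; cost is proportional to the number of edges."""
    out = [0.0] * len(rows)
    for i, row in enumerate(rows):
        for j, w in row:
            out[j] += w * u[i]
    return out

def simulate_sar(rows, rho, rng, n_iter=60):
    """Draw Y = (I - rho W)^{-1} eps by fixed-point iteration (valid for |rho| < 1)."""
    eps = [rng.gauss(0.0, 1.0) for _ in rows]
    y = eps[:]
    for _ in range(n_iter):
        y = [rho * a + e for a, e in zip(w_times(rows, y), eps)]
    return y

def ls_objective(rows, y, col_sq, rho):
    """Sum of squared conditional-mean residuals ((S^T S y)_i / (S^T S)_ii)^2,
    with S = I - rho W. Uses two sparse products and no determinant."""
    u = [yi - rho * a for yi, a in zip(y, w_times(rows, y))]   # S y
    v = [ui - rho * b for ui, b in zip(u, wt_times(rows, u))]  # S^T S y
    # (S^T S)_ii = 1 + rho^2 * sum_k W_ki^2 because W has a zero diagonal
    return sum((vi / (1.0 + rho * rho * c)) ** 2 for vi, c in zip(v, col_sq))

def estimate_rho(rows, y, step=0.02):
    """Minimise the criterion over a grid in [-0.9, 0.9] (an illustrative choice)."""
    col_sq = [0.0] * len(rows)
    for row in rows:
        for j, w in row:
            col_sq[j] += w * w
    m = int(round(0.9 / step))
    grid = [k * step for k in range(-m, m + 1)]
    return min(grid, key=lambda rho: ls_objective(rows, y, col_sq, rho))

if __name__ == "__main__":
    rng = random.Random(1)
    W = make_network(2000, 3, rng)
    Y = simulate_sar(W, 0.4, rng)
    print("grid-search estimate of rho:", estimate_rho(W, Y))
```

Each evaluation of the criterion touches every edge a constant number of times, which is the sense in which such a criterion scales linearly with a sparse network. A likelihood-based fit would additionally need the log-determinant of $I - \rho W$.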

Article information

Source
Electron. J. Statist., Volume 13, Number 1 (2019), 1135-1165.

Dates
First available in Project Euclid: 5 April 2019

https://projecteuclid.org/euclid.ejs/1554429626

Digital Object Identifier
doi:10.1214/19-EJS1549

Mathematical Reviews number (MathSciNet)
MR3935846

Zentralblatt MATH identifier
07056148

Citation

Huang, Danyang; Lan, Wei; Zhang, Hao Helen; Wang, Hansheng. Least squares estimation of spatial autoregressive models for large-scale social networks. Electron. J. Statist. 13 (2019), no. 1, 1135–1165. doi:10.1214/19-EJS1549. https://projecteuclid.org/euclid.ejs/1554429626
