The Annals of Applied Statistics

Coauthorship and citation networks for statisticians

Pengsheng Ji and Jiashun Jin



We have collected and cleaned two network data sets: Coauthorship and Citation networks for statisticians. The data sets are based on all research papers published in four of the top journals in statistics from $2003$ to the first half of $2012$. We analyze the data sets from many different perspectives, focusing on (a) productivity, patterns and trends, (b) centrality and (c) community structures.

For (a), we find that over the 10-year period, both the average number of papers per author and the fraction of self citations have been decreasing, while the proportion of distant citations has been increasing. These findings are consistent with the belief that the statistics community has become increasingly collaborative, competitive and globalized.
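As a rough illustration of the two quantities tracked in (a), the sketch below computes an average number of papers per author for each year and a fraction of self citations on a tiny made-up data set. The record layout, the names `papers` and `citations`, and the working definition of a self citation (the citing and cited papers share at least one author) are assumptions for illustration only; the article may define these quantities differently.

```python
from collections import defaultdict

# Hypothetical toy records: paper id -> (year, set of author names).
papers = {
    "p1": (2003, {"A", "B"}),
    "p2": (2003, {"A"}),
    "p3": (2004, {"B", "C"}),
    "p4": (2004, {"D"}),
}
# Hypothetical citation pairs: (citing paper, cited paper).
citations = [("p3", "p1"), ("p3", "p2"), ("p4", "p1"), ("p4", "p3")]


def papers_per_author_by_year(papers):
    """Average number of papers per active author, for each year."""
    counts = defaultdict(lambda: defaultdict(int))
    for year, authors in papers.values():
        for author in authors:
            counts[year][author] += 1
    return {year: sum(c.values()) / len(c) for year, c in counts.items()}


def self_citation_fraction(papers, citations):
    """Share of citations whose citing and cited papers share an author.

    This definition of a self citation is an assumption of this sketch.
    """
    if not citations:
        return 0.0
    shared = sum(
        1 for citing, cited in citations if papers[citing][1] & papers[cited][1]
    )
    return shared / len(citations)


print(papers_per_author_by_year(papers))
print(self_citation_fraction(papers, citations))
```

Computing both quantities year by year on real records would give the kind of trend lines the paragraph above summarizes.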

For (b), we have identified the most prolific/collaborative/highly cited authors. We have also identified a handful of “hot” papers, suggesting “Variable Selection” as one of the “hot” areas.
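One simple way to rank authors along the three axes mentioned in (b) is to count papers, distinct coauthors and received citations. The sketch below does this on hypothetical toy data; the function name `author_rankings` and the choice of plain counts as the centrality measures are assumptions, not necessarily the measures used in the article.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: paper id -> set of author names.
papers = {
    "p1": {"A", "B"},
    "p2": {"A", "C", "D"},
    "p3": {"B"},
}
# Hypothetical citation pairs: (citing paper, cited paper).
citations = [("p2", "p1"), ("p3", "p1"), ("p3", "p2")]


def author_rankings(papers, citations):
    """Return per-author counts of papers, distinct coauthors and citations."""
    n_papers = Counter()
    coauthors = {}
    for authors in papers.values():
        for author in authors:
            n_papers[author] += 1
            coauthors.setdefault(author, set())
        # Every pair of authors on the same paper is a coauthorship edge.
        for a, b in combinations(sorted(authors), 2):
            coauthors[a].add(b)
            coauthors[b].add(a)
    n_coauthors = Counter({a: len(s) for a, s in coauthors.items()})
    n_cited = Counter()
    for _, cited in citations:
        # Each citation of a paper is credited to all of its authors.
        for author in papers[cited]:
            n_cited[author] += 1
    return n_papers, n_coauthors, n_cited


n_papers, n_coauthors, n_cited = author_rankings(papers, citations)
print(n_coauthors.most_common(1))  # most collaborative author in the toy data
print(n_cited.most_common(1))      # most cited author in the toy data
```

The number of distinct coauthors is the degree of an author in the coauthorship network, which is the simplest notion of centrality.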

For (c), we have identified about $15$ meaningful communities or research groups, including large-size ones such as “Spatial Statistics,” “Large-Scale Multiple Testing” and “Variable Selection” as well as small-size ones such as “Dimensional Reduction,” “Bayes,” “Quantile Regression” and “Theoretical Machine Learning.”
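When several methods each produce a community assignment, a common way to quantify how much two assignments agree is the adjusted Rand index. The sketch below is a minimal pure-Python version, assuming each assignment is given as a list of labels in the same node order; the abstract does not say how the reported communities were compared, so this is only an illustration.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two partitions of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label lists must have the same length")
    n = len(labels_a)
    if n < 2:
        return 1.0  # no pairs to disagree on
    # Pairs of items placed together in both partitions.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs placed together in each partition separately.
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = together_a * together_b / comb(n, 2)
    max_index = (together_a + together_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both partitions are a single block
    return (together_both - expected) / (max_index - expected)


# Same grouping with swapped label names: perfect agreement.
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
# Groupings that cut across each other: agreement below chance level.
print(adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1]))
```

The index equals 1 for identical partitions regardless of label names and is close to 0 for partitions that agree only as much as chance would suggest.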

Our findings shed light on research habits, trends and topological patterns of statisticians. The data sets provide a fertile ground for future research on social networks.

Article information

Ann. Appl. Stat. Volume 10, Number 4 (2016), 1779-1812.

Received: October 2014
Revised: November 2015
First available in Project Euclid: 5 January 2017

Keywords: Adjusted Rand index; centrality; collaboration; community detection; Degree Corrected Block Model; productivity; social network; spectral clustering


Ji, Pengsheng; Jin, Jiashun. Coauthorship and citation networks for statisticians. Ann. Appl. Stat. 10 (2016), no. 4, 1779–1812. doi:10.1214/15-AOAS896.




See also

  • Introduction to discussion of "Coauthorship and citation networks for statisticians".
  • Discussion of "Coauthorship and citation networks for statisticians".
  • Discussion of "Coauthorship and citation networks for statisticians".
  • Discussion of "Coauthorship and citation networks for statisticians".
  • Discussion of "Coauthorship and citation networks for statisticians".
  • Discussion of "Coauthorship and citation networks for statisticians".
  • Rejoinder: "Coauthorship and citation networks for statisticians".