Electronic Journal of Statistics

Visualizing bivariate long-tailed data

Justin S. Dyer and Art B. Owen

Full-text: Open access

Abstract

Variables in large data sets in biology or e-commerce often have a head, made up of very frequent values, and a long tail of ever rarer values. Models such as the Zipf or Zipf–Mandelbrot laws provide a good description of such data. The problem we address here is the visualization of two such long-tailed variables, as one might see in a bivariate Zipf context. We introduce a copula plot to display the joint behavior of such variables. The plot uses an empirical ordering of the data; we prove that this ordering is asymptotically accurate in a Zipf–Mandelbrot–Poisson model. We often see an association between entities at the head of one variable and those from the tail of the other. We present two generative models (saturation and bipartite preferential attachment) that show such qualitative behavior, and we characterize the power law behavior of the marginal distributions in these models.
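The copula plot described in the abstract can be illustrated with a small sketch. The abstract does not give the construction, so the code below is an assumption rather than the authors' method: entities of each variable are ordered from most to least frequent by observed count, each entity is given a slice of the unit interval whose width equals its share of the data, and the mass of every observed pair is spread uniformly over the corresponding rectangle and summed into a coarse grid. The names `rank_intervals`, `bin_weights`, `copula_grid`, and the toy data are hypothetical.

```python
from collections import Counter


def rank_intervals(values):
    """Map each entity to its [lo, hi) slice of the unit interval.

    Assumption: entities are ordered from most to least frequent (ties keep
    first-appearance order) and each slice width is the entity's data share.
    """
    counts = Counter(values)
    n = len(values)
    intervals = {}
    lo = 0.0
    for entity, count in counts.most_common():
        hi = lo + count / n
        intervals[entity] = (lo, hi)
        lo = hi
    return intervals


def bin_weights(lo, hi, bins):
    """Fraction of the interval [lo, hi) that falls in each equal-width bin."""
    width = hi - lo
    weights = []
    for b in range(bins):
        left, right = b / bins, (b + 1) / bins
        overlap = max(0.0, min(hi, right) - max(lo, left))
        weights.append(overlap / width)
    return weights


def copula_grid(pairs, bins=4):
    """Return a bins x bins grid of mass on the unit square.

    Row index follows the first variable, column index the second; index 0
    is the head (most frequent entities). Each margin is uniform, so every
    row and every column sums to 1 / bins.
    """
    x_int = rank_intervals([x for x, _ in pairs])
    y_int = rank_intervals([y for _, y in pairs])
    x_w = {e: bin_weights(lo, hi, bins) for e, (lo, hi) in x_int.items()}
    y_w = {e: bin_weights(lo, hi, bins) for e, (lo, hi) in y_int.items()}

    n = len(pairs)
    grid = [[0.0] * bins for _ in range(bins)]
    for x, y in pairs:
        wx, wy = x_w[x], y_w[y]
        for i in range(bins):
            if wx[i] == 0.0:
                continue
            for j in range(bins):
                grid[i][j] += wx[i] * wy[j] / n
    return grid


# Hypothetical toy data: one heavy user who rates rare items, and several
# light users who all rate the most popular item.
demo_pairs = [
    ("u1", "a"), ("u1", "b"), ("u1", "c"),
    ("u2", "a"), ("u3", "a"), ("u4", "a"),
]
demo_grid = copula_grid(demo_pairs, bins=2)
```

Under independence every cell of the grid would hold the same mass, so cells that are heavier than `1 / bins**2` mark over-represented combinations. In the toy data the off-diagonal cells (head of one variable with tail of the other) carry more mass than the diagonal ones, which is the qualitative pattern the abstract describes.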

Article information

Source
Electron. J. Statist., Volume 5 (2011), 642-668.

Dates
First available in Project Euclid: 25 July 2011

Permanent link to this document
https://projecteuclid.org/euclid.ejs/1311600465

Digital Object Identifier
doi:10.1214/11-EJS622

Mathematical Reviews number (MathSciNet)
MR2820634

Zentralblatt MATH identifier
1264.00033

Subjects
Primary: 00A66: Mathematics and visual arts, visualization; 90B15: Network models, stochastic
Secondary: 91D30: Social networks

Keywords
Copula; bivariate Zipf; bipartite preferential attachment; preferential attachment; Zipf–Mandelbrot

Citation

Dyer, Justin S.; Owen, Art B. Visualizing bivariate long-tailed data. Electron. J. Statist. 5 (2011), 642–668. doi:10.1214/11-EJS622. https://projecteuclid.org/euclid.ejs/1311600465


