Open Access
2011 Visualizing bivariate long-tailed data
Justin S. Dyer, Art B. Owen
Electron. J. Statist. 5: 642-668 (2011). DOI: 10.1214/11-EJS622

Abstract

Variables in large data sets in biology or e-commerce often have a head, made up of very frequent values, and a long tail of ever rarer values. Models such as Zipf or Zipf–Mandelbrot provide a good description of such variables. The problem we address here is the visualization of two such long-tailed variables, as one might see in a bivariate Zipf context. We introduce a copula plot to display the joint behavior of such variables. The plot uses an empirical ordering of the data; we prove that this ordering is asymptotically accurate in a Zipf–Mandelbrot–Poisson model. We often see an association between entities at the head of one variable and those in the tail of the other. We present two generative models (saturation and bipartite preferential attachment) that show such qualitative behavior, and we characterize the power law behavior of the marginal distributions in these models.
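As a rough illustration of the idea of ordering entities empirically and placing paired observations on copula-style coordinates, a minimal Python sketch follows. It is not the construction from the paper, whose details are not given in this abstract. The function names (`zipf_mandelbrot_weights`, `empirical_order`, `copula_coordinates`), the parameter values `alpha=1.2` and `q=2.0`, and the choice of cumulative observation share as the coordinate are all assumptions made for illustration. The two simulated variables are drawn independently, so the example shows only the mechanics, not the head-to-tail association described above.

```python
import random
from collections import Counter
from itertools import accumulate


def zipf_mandelbrot_weights(n_items, alpha=1.2, q=2.0):
    """Unnormalized Zipf-Mandelbrot weights w_i = (i + q) ** -alpha for i = 1..n_items.

    alpha and q are illustrative values, not taken from the paper.
    """
    return [(i + q) ** -alpha for i in range(1, n_items + 1)]


def empirical_order(values):
    """Rank distinct values by observed frequency (rank 1 = most frequent)."""
    counts = Counter(values)
    # Descending count; ties broken by the value itself so the result is deterministic.
    ordered = sorted(counts, key=lambda v: (-counts[v], v))
    return ordered, counts


def cumulative_share(values):
    """Map each distinct value to the share of observations at or above its rank."""
    ordered, counts = empirical_order(values)
    n = len(values)
    running = accumulate(counts[v] for v in ordered)
    return {v: c / n for v, c in zip(ordered, running)}


def copula_coordinates(pairs):
    """Map each (x, y) pair to (u, v) in (0, 1]^2 using the empirical orderings.

    Assumed coordinate choice: head entities get small u or v, tail entities
    get values near 1.
    """
    fx = cumulative_share([x for x, _ in pairs])
    fy = cumulative_share([y for _, y in pairs])
    return [(fx[x], fy[y]) for x, y in pairs]


# Simulated data: two independent long-tailed variables (illustration only).
rng = random.Random(0)
items = list(range(1, 201))
weights = zipf_mandelbrot_weights(len(items))
pairs = list(zip(rng.choices(items, weights, k=5000),
                 rng.choices(items, weights, k=5000)))
coords = copula_coordinates(pairs)
```

The resulting `coords` could then be binned on a grid over the unit square and drawn as a heat map with any plotting library. Because each entity contributes many tied observations, binning is likely to be more informative than a raw scatter plot.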

Citation

Justin S. Dyer, Art B. Owen. "Visualizing bivariate long-tailed data." Electron. J. Statist. 5: 642-668, 2011. https://doi.org/10.1214/11-EJS622

Information

Published: 2011
First available in Project Euclid: 25 July 2011

zbMATH: 1264.00033
MathSciNet: MR2820634
Digital Object Identifier: 10.1214/11-EJS622

Subjects:
Primary: 00A66, 90B15
Secondary: 91D30

Keywords: bipartite preferential attachment, bivariate Zipf, copula, preferential attachment, Zipf–Mandelbrot

Rights: Copyright © 2011 The Institute of Mathematical Statistics and the Bernoulli Society
