### Visualizing bivariate long-tailed data

Justin S. Dyer and Art B. Owen

#### Abstract

Variables in large data sets in biology or e-commerce often have a head, made up of very frequent values, and a long tail of ever rarer values. Models such as the Zipf or Zipf–Mandelbrot laws provide a good description of such variables. The problem we address here is the visualization of two such long-tailed variables, as one might see in a bivariate Zipf context. We introduce a copula plot to display the joint behavior of such variables. The plot uses an empirical ordering of the data; we prove that this ordering is asymptotically accurate in a Zipf–Mandelbrot–Poisson model. We often see an association between entities at the head of one variable and those in the tail of the other. We present two generative models (saturation and bipartite preferential attachment) that show such qualitative behavior, and we characterize the power law behavior of the marginal distributions in these models.
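
The idea of plotting two long-tailed variables on an empirical, frequency-based ordering can be illustrated with a small sketch. It is not taken from the paper: the function names `rank_midpoints` and `copula_coordinates`, the tie-breaking rule, and the choice to place each entity at the midpoint of its slice of [0, 1] are all assumptions made for illustration, and the authors' construction may differ in its details.

```python
from collections import Counter


def rank_midpoints(values):
    """Map each distinct value to a position in (0, 1).

    Values are ordered from most to least frequent (an empirical ordering).
    Each value gets a slice of [0, 1] whose width is its relative frequency,
    and is represented by the midpoint of that slice.
    Hypothetical helper, not the paper's definition.
    """
    counts = Counter(values)
    n = len(values)
    # Most frequent first; ties broken by string form so the order is deterministic.
    ordered = sorted(counts, key=lambda v: (-counts[v], str(v)))
    midpoints = {}
    cumulative = 0
    for v in ordered:
        midpoints[v] = (cumulative + counts[v] / 2) / n
        cumulative += counts[v]
    return midpoints


def copula_coordinates(pairs):
    """Convert (x, y) observations into points in the unit square.

    Head entities land near 0 on each axis and tail entities near 1,
    so head-to-tail association shows up near the off-diagonal corners.
    """
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mx = rank_midpoints(xs)
    my = rank_midpoints(ys)
    return [(mx[x], my[y]) for x, y in pairs]


# Toy data: entity "a" is the head of the first variable, "p" of the second.
pairs = [("a", "p"), ("a", "q"), ("a", "r"), ("b", "p"), ("c", "p"), ("a", "p")]
coords = copula_coordinates(pairs)
```

The resulting `coords` could be passed to any scatter or heat-map routine. With real data one would typically bin the unit square or jitter points within each slice rather than stacking them on midpoints.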

#### Article information

**Source**

Electron. J. Statist., Volume 5 (2011), 642–668.

**Dates**

First available in Project Euclid: 25 July 2011

**Permanent link to this document**

https://projecteuclid.org/euclid.ejs/1311600465

**Digital Object Identifier**

doi:10.1214/11-EJS622

**Mathematical Reviews number (MathSciNet)**

MR2820634

**Zentralblatt MATH identifier**

1264.00033

**Subjects**

Primary: 00A66: Mathematics and visual arts, visualization; 90B15: Network models, stochastic

Secondary: 91D30: Social networks

**Keywords**

Copula; bivariate Zipf; bipartite preferential attachment; preferential attachment; Zipf–Mandelbrot

#### Citation

Dyer, Justin S.; Owen, Art B. Visualizing bivariate long-tailed data. Electron. J. Statist. 5 (2011), 642–668. doi:10.1214/11-EJS622. https://projecteuclid.org/euclid.ejs/1311600465