The Annals of Applied Statistics
- Ann. Appl. Stat.
- Volume 8, Number 1 (2014), 89-119.
Power-law distributions in binned empirical data
Yogesh Virkar and Aaron Clauset
Abstract
Many man-made and natural phenomena, including the intensity of earthquakes, population of cities and size of international wars, are believed to follow power-law distributions. The accurate identification of power-law patterns has significant consequences for correctly understanding and modeling complex systems. However, statistical evidence for or against the power-law hypothesis is complicated by large fluctuations in the empirical distribution’s tail, and these fluctuations are worsened by the information lost when the data are binned. We adapt the statistically principled framework for testing the power-law hypothesis, developed by Clauset, Shalizi and Newman, to the case of binned data. This approach includes maximum-likelihood fitting, a hypothesis test based on the Kolmogorov–Smirnov goodness-of-fit statistic and likelihood ratio tests for comparing against alternative explanations. We evaluate the effectiveness of these methods on synthetic binned data with known structure, quantify the loss of statistical power due to binning, and apply the methods to twelve real-world binned data sets with heavy-tailed patterns.
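The abstract names two ingredients for binned data: maximum-likelihood fitting and a Kolmogorov–Smirnov goodness-of-fit statistic. A minimal sketch of both follows, under assumptions that are ours rather than the paper's: a continuous power law whose lower bound x_min equals the first bin edge, an open-ended final bin (`math.inf`), and numerical golden-section search in place of any closed-form estimator. The function names (`powerlaw_bin_logprobs`, `fit_alpha`, `binned_ks`) and the search interval for alpha are hypothetical. The bootstrap step that turns the KS distance into a p-value is not shown.

```python
import math

def powerlaw_bin_logprobs(edges, alpha):
    """Log-probability of each bin [edges[i], edges[i+1]) under a continuous
    power law p(x) proportional to x^(-alpha), with x_min = edges[0].
    Assumption: the last edge is math.inf, so bin probabilities sum to one."""
    x_min = edges[0]
    out = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        # P(lo <= X < hi) = (lo / x_min)^(1 - alpha) * (1 - (hi / lo)^(1 - alpha))
        log_p = (1.0 - alpha) * math.log(lo / x_min)
        if not math.isinf(hi):
            # expm1 keeps the second factor accurate when alpha is close to 1
            log_p += math.log(-math.expm1((1.0 - alpha) * math.log(hi / lo)))
        out.append(log_p)
    return out

def binned_loglik(counts, edges, alpha):
    # Multinomial log-likelihood of the bin counts (constant term dropped)
    return sum(h * lp for h, lp in zip(counts, powerlaw_bin_logprobs(edges, alpha)))

def fit_alpha(counts, edges, lo=1.0001, hi=6.0, tol=1e-9):
    """Binned MLE of alpha by golden-section search on [lo, hi]
    (the search interval is an illustrative assumption)."""
    g = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    while b - a > tol:
        if binned_loglik(counts, edges, c) > binned_loglik(counts, edges, d):
            b, d = d, c          # maximum lies in [a, d]
            c = b - g * (b - a)
        else:
            a, c = c, d          # maximum lies in [c, b]
            d = a + g * (b - a)
    return (a + b) / 2

def binned_ks(counts, edges, alpha):
    """Largest gap between the empirical and fitted CDFs, checked at bin edges
    (within a bin the binned data carry no finer information)."""
    n = sum(counts)
    emp = fit = gap = 0.0
    for h, lp in zip(counts, powerlaw_bin_logprobs(edges, alpha)):
        emp += h / n
        fit += math.exp(lp)
        gap = max(gap, abs(emp - fit))
    return gap
```

For example, with doubling bins `[1, 2, 4, 8, 16, 32, math.inf]` and counts proportional to the bin masses at alpha = 2.5, `fit_alpha` should return a value close to 2.5 and `binned_ks` a distance near zero. Golden-section search is adequate here because each bin's log-mass is a linear term in alpha plus log(1 - (lo/hi)^(alpha - 1)), which is concave in alpha, so the log-likelihood has a single maximum.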
Article information
Source
Ann. Appl. Stat. Volume 8, Number 1 (2014), 89-119.
Dates
First available in Project Euclid: 8 April 2014
Permanent link to this document
http://projecteuclid.org/euclid.aoas/1396966280
Digital Object Identifier
doi:10.1214/13-AOAS710
Mathematical Reviews number (MathSciNet)
MR3191984
Zentralblatt MATH identifier
06302229
Keywords
Power-law distribution; heavy-tailed distributions; model selection; binned data
Citation
Virkar, Yogesh; Clauset, Aaron. Power-law distributions in binned empirical data. Ann. Appl. Stat. 8 (2014), no. 1, 89-119. doi:10.1214/13-AOAS710. http://projecteuclid.org/euclid.aoas/1396966280.
Supplemental materials
- Supplementary material: Supplement to “Power-law distributions in binned empirical data”. In this supplemental file, we derive a closed-form expression for the binned MLE in Section 1.1, quantify the information lost when a coarser binning scheme is used in Section 1.2, and give the likelihood ratio test for the binned case in Section 2. Digital Object Identifier: doi:10.1214/13-AOAS710SUPP. Supplemental files available for subscribers.
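The likelihood ratio test mentioned above compares the power law with alternative distributions. Below is a hedged sketch of a Vuong-style normalized log-likelihood ratio for binned counts, modeled on the continuous-data test of Clauset, Shalizi and Newman; it is not the supplement's derivation. It compares the power law only against a shifted exponential tail, fits each model by golden-section search, and assumes an open-ended last bin. All function names and the parameter search ranges are illustrative assumptions.

```python
import math

def _bin_logprobs(edges, log_tail):
    """log P(lo <= X < hi) per bin, given log_tail(x) = log P(X >= x) for a
    tail model with P(X >= edges[0]) = 1. The last edge may be math.inf."""
    out = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        t_lo = log_tail(lo)
        if math.isinf(hi):
            out.append(t_lo)  # open-ended last bin holds the remaining tail
        else:
            # log(S(lo) - S(hi)) = log S(lo) + log(1 - S(hi) / S(lo)), computed stably
            out.append(t_lo + math.log(-math.expm1(log_tail(hi) - t_lo)))
    return out

def powerlaw_logprobs(edges, alpha):
    # Continuous power law with x_min = edges[0]: S(x) = (x / x_min)^(1 - alpha)
    return _bin_logprobs(edges, lambda x: (1.0 - alpha) * math.log(x / edges[0]))

def exponential_logprobs(edges, lam):
    # Shifted exponential with x_min = edges[0]: S(x) = exp(-lam * (x - x_min))
    return _bin_logprobs(edges, lambda x: -lam * (x - edges[0]))

def _golden_max(f, lo, hi, tol=1e-9):
    # Golden-section search for the maximizer of a unimodal f on [lo, hi]
    g = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    while b - a > tol:
        if f(c) > f(d):
            b, d = d, c
            c = b - g * (b - a)
        else:
            a, c = c, d
            d = a + g * (b - a)
    return (a + b) / 2

def _fit(counts, edges, logprobs, lo, hi):
    # Binned (multinomial) MLE of a one-parameter model
    def loglik(theta):
        return sum(h * lp for h, lp in zip(counts, logprobs(edges, theta)))
    return _golden_max(loglik, lo, hi)

def binned_lrt(counts, edges):
    """Return (R, p): R > 0 favours the power law, R < 0 the exponential;
    p is the two-sided significance of the sign of R."""
    n = sum(counts)
    alpha = _fit(counts, edges, powerlaw_logprobs, 1.0001, 6.0)  # assumed range
    lam = _fit(counts, edges, exponential_logprobs, 1e-6, 5.0)   # assumed range
    # Every observation in bin i shares the same pointwise log-likelihood ratio
    ratios = [a - b for a, b in zip(powerlaw_logprobs(edges, alpha),
                                    exponential_logprobs(edges, lam))]
    R = sum(h * r for h, r in zip(counts, ratios))
    mean = R / n
    var = sum(h * (r - mean) ** 2 for h, r in zip(counts, ratios)) / n
    p = math.erfc(abs(R) / math.sqrt(2.0 * n * var))
    return R, p
```

On counts generated from a power law, R should come out positive; on exponential counts, negative. The normalization needs at least two nonempty bins, otherwise the variance is zero, and the search ranges for alpha and lam would need adjusting to the scale of real data.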

