Abstract
Suppose we wish to construct a variable $k$-cell histogram based on an independent identically distributed sample of size $n - 1$ from an unknown density $f$ on an interval of finite length. A variable cell histogram requires the cutpoints and heights of all of its cells to be specified. We propose the following procedure: (i) choose from the order statistics corresponding to the sample a set of $k + 1$ cutpoints that maximize a criterion, a function of the sample spacings; (ii) compute the heights of the $k$ cells according to a formula. The resulting histogram estimates a $k$-cell theoretical histogram that is constant within each cell and that minimizes the Hellinger distance to the density $f$. The histogram tends to estimate low density regions accurately and is easy to compute. We find that a number of cells of order $n^{1/3}$ minimizes the mean Hellinger distance between the density $f$ and a class of histograms whose cutpoints are chosen from the order statistics.
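As a rough illustration of the objects involved, the sketch below builds a variable cell histogram whose cutpoints are order statistics and evaluates its squared Hellinger distance to a known density. The abstract does not state the spacing-based criterion or the height formula, so neither is reproduced here: the cutpoint indices are supplied by the caller, and the heights use the ordinary proportion-over-width rule as a stand-in. The function names `variable_cell_histogram` and `squared_hellinger` are hypothetical, and the distance is taken as $\int (\sqrt{f} - \sqrt{g})^2$ with no factor of $1/2$, which is one of several conventions.

```python
import numpy as np


def variable_cell_histogram(sample, cut_idx):
    """Return (cutpoints, heights) for cells whose edges are order statistics.

    cut_idx holds increasing indices into the sorted sample. Heights are the
    usual proportion / width values (an assumption, not the paper's formula).
    """
    x = np.sort(np.asarray(sample, dtype=float))
    cuts = x[np.asarray(cut_idx)]
    counts, _ = np.histogram(x, bins=cuts)   # last cell is closed on the right
    widths = np.diff(cuts)
    heights = counts / (counts.sum() * widths)
    return cuts, heights


def squared_hellinger(f, cuts, heights, grid_size=20001):
    """Trapezoid-rule estimate of the integral of (sqrt(f) - sqrt(g))^2,
    where g is the histogram, taken over [cuts[0], cuts[-1]] only."""
    t = np.linspace(cuts[0], cuts[-1], grid_size)
    # Map each grid point to the index of the cell that contains it.
    cell = np.clip(np.searchsorted(cuts, t, side="right") - 1,
                   0, len(heights) - 1)
    y = (np.sqrt(f(t)) - np.sqrt(heights[cell])) ** 2
    return float(np.sum((y[1:] + y[:-1]) * np.diff(t)) / 2.0)


# Usage: 199 uniform draws, 4 cells with hand-picked order-statistic cutpoints.
rng = np.random.default_rng(0)
sample = rng.uniform(0.0, 1.0, size=199)
cuts, heights = variable_cell_histogram(sample, [0, 50, 100, 150, 198])
distance = squared_hellinger(lambda t: np.ones_like(t), cuts, heights)
```

In the procedure described above, the hand-picked indices would be replaced by the set that maximizes the spacing criterion, and the heights by the stated formula; this sketch only fixes the data structures such a search would operate on.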
Citation
Yuichiro Kanazawa. "An Optimal Variable Cell Histogram Based on the Sample Spacings." Ann. Statist. 20 (1) 291 - 304, March, 1992. https://doi.org/10.1214/aos/1176348523
Information