Electronic Journal of Statistics

Sequential quantiles via Hermite series density estimation

Michael Stephanou, Melvin Varughese, and Iain Macdonald

Full-text: Open access

Abstract

Sequential quantile estimation refers to incorporating observations into quantile estimates in an incremental fashion, thus furnishing an online estimate of one or more quantiles at any given point in time. Sequential quantile estimation is also known as online quantile estimation. This area is relevant to the analysis of data streams and to the one-pass analysis of massive data sets. Applications include network traffic and latency analysis, real-time fraud detection and high-frequency trading. We introduce new techniques for online quantile estimation based on Hermite series estimators in the settings of static quantile estimation and dynamic quantile estimation. In the static quantile estimation setting, we apply the existing Gauss-Hermite expansion in a novel manner. In particular, we exploit the fact that Gauss-Hermite coefficients can be updated in a sequential manner. To treat dynamic quantile estimation, we introduce a novel expansion with an exponentially weighted estimator for the Gauss-Hermite coefficients, which we term the Exponentially Weighted Gauss-Hermite (EWGH) expansion. These algorithms go beyond existing sequential quantile estimation algorithms in that they allow arbitrary quantiles (as opposed to pre-specified quantiles) to be estimated at any point in time. In doing so, we provide a solution to online distribution function and online quantile function estimation on data streams. In particular, we derive an analytical expression for the CDF and prove consistency results for the CDF under certain conditions. In addition, we analyse the associated quantile estimator. Simulation studies and tests on real data reveal the Gauss-Hermite-based algorithms to be competitive with a leading existing algorithm.
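The core idea described in the abstract, a Hermite series density whose coefficients are updated one observation at a time, can be illustrated with a minimal sketch. It assumes the estimator f(x) = sum_k a_k h_k(x) over the orthonormal Hermite functions h_0, ..., h_N, with each coefficient updated as a_k <- a_k + w (h_k(x_n) - a_k). Here w = 1/n gives the equally weighted running mean (static setting) and a constant w = lam gives an exponentially weighted update in the spirit of EWGH. Everything below is an illustrative assumption, not the authors' implementation: the names `hermite_functions`, `HermiteQuantileSketch`, `lam`, `grid_lim` and `grid_size` are hypothetical; the data are assumed to be roughly standardized (no standardization is performed); the warm-up rule `max(1/n, lam)` is our own choice; and the CDF is obtained by trapezoidal integration on a grid, a numerical stand-in for the analytical CDF expression derived in the paper.

```python
import math
import random


def hermite_functions(x, n_max):
    """Orthonormal Hermite functions h_0(x), ..., h_{n_max}(x).

    Uses the three-term recurrence
    h_k = sqrt(2/k) * x * h_{k-1} - sqrt((k-1)/k) * h_{k-2},
    which avoids overflow in the raw Hermite polynomials.
    """
    h = [math.pi ** -0.25 * math.exp(-0.5 * x * x)]
    if n_max >= 1:
        h.append(math.sqrt(2.0) * x * h[0])
    for k in range(2, n_max + 1):
        h.append(math.sqrt(2.0 / k) * x * h[k - 1] - math.sqrt((k - 1) / k) * h[k - 2])
    return h


class HermiteQuantileSketch:
    """Hypothetical sketch: Hermite series density with sequential coefficients.

    lam=None  -> equally weighted running mean of h_k(x_i) (static setting).
    0<lam<1   -> exponentially weighted update (dynamic, EWGH-style).
    """

    def __init__(self, n_max=20, lam=None, grid_lim=10.0, grid_size=4001):
        self.n_max = n_max
        self.lam = lam
        self.n = 0
        self.coeffs = [0.0] * (n_max + 1)
        # Integration grid; assumes the data are roughly standardized.
        step = 2.0 * grid_lim / (grid_size - 1)
        self.grid = [-grid_lim + i * step for i in range(grid_size)]

    def update(self, x):
        """Fold one observation into every coefficient in O(n_max) time."""
        self.n += 1
        w = 1.0 / self.n
        if self.lam is not None:
            # Assumption: use the plain running mean until 1/n drops below lam,
            # so early estimates are not pulled toward the zero initial state.
            w = max(w, self.lam)
        for k, hk in enumerate(hermite_functions(x, self.n_max)):
            self.coeffs[k] += w * (hk - self.coeffs[k])

    def density(self, x):
        return sum(a * h for a, h in zip(self.coeffs, hermite_functions(x, self.n_max)))

    def _cdf_on_grid(self):
        # Trapezoidal integration of the series density (numerical stand-in
        # for the paper's analytical CDF).
        f = [self.density(x) for x in self.grid]
        cdf = [0.0]
        for i in range(1, len(self.grid)):
            dx = self.grid[i] - self.grid[i - 1]
            cdf.append(cdf[-1] + 0.5 * dx * (f[i] + f[i - 1]))
        # A truncated series can dip below zero: force a monotone CDF in [0, 1].
        total, running, out = cdf[-1], 0.0, []
        for c in cdf:
            running = max(running, c)
            out.append(min(running / total, 1.0))
        return out

    def quantile(self, p):
        """Any quantile, at any time, by linear inversion of the grid CDF."""
        cdf = self._cdf_on_grid()
        for i in range(1, len(cdf)):
            if cdf[i] >= p:
                lo, hi = cdf[i - 1], cdf[i]
                t = 0.0 if hi == lo else (p - lo) / (hi - lo)
                return self.grid[i - 1] + t * (self.grid[i] - self.grid[i - 1])
        return self.grid[-1]


if __name__ == "__main__":
    random.seed(0)
    static, dynamic = HermiteQuantileSketch(), HermiteQuantileSketch(lam=0.005)
    for mu in (0.0, 1.0):  # the stream's mean shifts halfway through
        for _ in range(5000):
            x = random.gauss(mu, 1.0)
            static.update(x)
            dynamic.update(x)
    print("static median:", static.quantile(0.5))
    print("dynamic median:", dynamic.quantile(0.5))
```

In this usage example the stream shifts from N(0, 1) to N(1, 1); the static sketch should report a median near 0.5 (the median of the pooled data), while the exponentially weighted sketch should track the recent regime and report a median near 1. Storage is fixed at n_max + 1 coefficients regardless of stream length, which is what makes querying an arbitrary quantile at any time cheap; in practice one would cache the grid CDF between queries rather than recompute it for every call.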

Article information

Source
Electron. J. Statist., Volume 11, Number 1 (2017), 570-607.

Dates
Received: June 2016
First available in Project Euclid: 3 March 2017

Permanent link to this document
https://projecteuclid.org/euclid.ejs/1488531636

Digital Object Identifier
doi:10.1214/17-EJS1245

Mathematical Reviews number (MathSciNet)
MR3619317

Zentralblatt MATH identifier
1392.62243

Keywords
Sequential quantile estimation; online quantile estimation; sequential distribution function estimation; online distribution function estimation

Rights
Creative Commons Attribution 4.0 International License.

Citation

Stephanou, Michael; Varughese, Melvin; Macdonald, Iain. Sequential quantiles via Hermite series density estimation. Electron. J. Statist. 11 (2017), no. 1, 570–607. doi:10.1214/17-EJS1245. https://projecteuclid.org/euclid.ejs/1488531636


