Electronic Journal of Statistics

On the exact Berk-Jones statistics and their $p$-value calculation

Amit Moscovich, Boaz Nadler, and Clifford Spiegelman

Full-text: Open access


Continuous goodness-of-fit testing is a classical problem in statistics. Despite having low power for detecting deviations at the tail of a distribution, the most popular test is based on the Kolmogorov-Smirnov statistic. While similar variance-weighted statistics such as Anderson-Darling and the Higher Criticism statistic give more weight to tail deviations, as shown in various works, they still mishandle the extreme tails.

As a viable alternative, in this paper we study some of the statistical properties of the exact $M_{n}$ statistics of Berk and Jones. In particular, we show that they are consistent and asymptotically optimal for detecting a wide range of rare-weak mixture models. Additionally, we present a new computationally efficient method to calculate $p$-values for any supremum-based one-sided statistic, including the one-sided $M_{n}^{+},M_{n}^{-}$ and $R_{n}^{+},R_{n}^{-}$ statistics of Berk and Jones and the Higher Criticism statistic. Finally, we show that $M_{n}$ compares favorably to related statistics in several finite-sample simulations.
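For readers who want a concrete picture of the $M_{n}$ statistics, the following is a minimal sketch, not the authors' implementation. The abstract does not define the statistics, so the sketch assumes the definition commonly given in the literature: after transforming the data by the null CDF, the $i$-th order statistic $u_{(i)}$ follows a Beta$(i, n-i+1)$ distribution under the null, $M_{n}^{+}=\min_i P[\mathrm{Beta}(i,n-i+1)<u_{(i)}]$, $M_{n}^{-}=\min_i P[\mathrm{Beta}(i,n-i+1)>u_{(i)}]$, and $M_{n}=\min(M_{n}^{+},M_{n}^{-})$. The function names `berk_jones_mn` and `monte_carlo_p_value` are hypothetical, and the $p$-value is estimated by plain Monte Carlo simulation as a simple stand-in for the efficient method developed in the paper.

```python
import math
import random


def berk_jones_mn(sample):
    """Return (M_n_plus, M_n_minus, M_n) for values in [0, 1].

    Assumed definition (not stated in the abstract): the sample has already
    been transformed by the null CDF, so it is Uniform(0, 1) under the null.
    Intended for small n only: the cost is O(n^2) and math.comb(n, k) is
    converted to float, which overflows for n in the low thousands.
    """
    u = sorted(sample)
    n = len(u)
    m_plus = m_minus = 1.0
    for i, x in enumerate(u, start=1):
        # Binomial(n, x) probabilities for k = 0..n successes.
        pmf = [math.comb(n, k) * x**k * (1.0 - x) ** (n - k) for k in range(n + 1)]
        # P(Beta(i, n-i+1) < x) = P(Binomial(n, x) >= i): small if u_(i) is unusually small.
        lower = sum(pmf[i:])
        # P(Beta(i, n-i+1) > x) = P(Binomial(n, x) <= i-1): small if u_(i) is unusually large.
        upper = sum(pmf[:i])
        m_plus = min(m_plus, lower)
        m_minus = min(m_minus, upper)
    return m_plus, m_minus, min(m_plus, m_minus)


def monte_carlo_p_value(sample, n_sim=2000, seed=0):
    """Estimate the p-value of M_n by simulating uniform samples (a stand-in method)."""
    rng = random.Random(seed)
    n = len(sample)
    observed = berk_jones_mn(sample)[2]
    # Small values of M_n are evidence against the null.
    hits = sum(
        berk_jones_mn([rng.random() for _ in range(n)])[2] <= observed
        for _ in range(n_sim)
    )
    return (1 + hits) / (n_sim + 1)


if __name__ == "__main__":
    data = [1e-6, 0.3, 0.5, 0.7, 0.9]  # one suspiciously small observation
    print(berk_jones_mn(data))
    print(monte_carlo_p_value(data))
```

The sketch uses the identity between Beta and binomial tail probabilities to stay within the standard library. Monte Carlo estimation cannot resolve $p$-values smaller than about $1/n_{\text{sim}}$, which is one reason an exact calculation is of interest.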

Article information

Electron. J. Statist., Volume 10, Number 2 (2016), 2329-2354.

Received: February 2016
First available in Project Euclid: 2 September 2016

Permanent link to this document: https://projecteuclid.org/euclid.ejs/1472829397

Digital Object Identifier: doi:10.1214/16-EJS1172

Primary: 62G10: Hypothesis testing; 62G20: Asymptotic properties
Secondary: 62-04: Explicit machine computation and programs (not the theory of computation or programming)

Keywords: Continuous goodness-of-fit; Hypothesis testing; $p$-value computation; Rare-weak model


Moscovich, Amit; Nadler, Boaz; Spiegelman, Clifford. On the exact Berk-Jones statistics and their $p$-value calculation. Electron. J. Statist. 10 (2016), no. 2, 2329–2354. doi:10.1214/16-EJS1172. https://projecteuclid.org/euclid.ejs/1472829397


  • Aldor-Noiman, S., Brown, L. D., Buja, A., Rolke, W. and Stine, R. A. (2014). The power to see: a new graphical test of normality (correction). Amer. Statist. 68 318.
  • Anderson, T. W. and Darling, D. A. (1952). Asymptotic theory of certain “goodness of fit” criteria based on stochastic processes. Ann. Math. Statistics 23 193–212.
  • Anderson, T. W. and Darling, D. A. (1954). A test of goodness of fit. J. Amer. Statist. Assoc. 49 765–769.
  • Barnett, I. J. and Lin, X. (2014). Analytical $p$-value calculation for the higher criticism test in finite-$d$ problems. Biometrika 101 964–970.
  • Berk, R. H. and Jones, D. H. (1978). Relatively optimal combinations of test statistics. Scand. J. Statist. 5 158–162.
  • Berk, R. H. and Jones, D. H. (1979). Goodness-of-fit test statistics that dominate the Kolmogorov statistics. Z. Wahrsch. Verw. Gebiete 47 47–59.
  • Brown, J. R. and Harvey, M. E. (2008a). Arbitrary Precision Mathematica Functions to Evaluate the One-Sided One Sample K-S Cumulative Sampling Distribution. Journal of Statistical Software 26 1–55.
  • Brown, J. R. and Harvey, M. E. (2008b). Rational Arithmetic Mathematica Functions to Evaluate the Two-Sided One Sample K-S Cumulative Sampling Distribution. Journal of Statistical Software 26 1–40.
  • Cai, T. T. and Wu, Y. (2014). Optimal detection of sparse mixtures against a given null distribution. IEEE Trans. Inform. Theory 60 2217–2232.
  • Calitz, F. (1987). An alternative to the Kolmogorov-Smirnov test for goodness of fit. Comm. Statist. Theory Methods 16 3519–3534.
  • Csörgő, M., Csörgő, S., Horváth, L. and Mason, D. M. (1986). Weighted empirical and quantile processes. Ann. Probab. 14 31–85.
  • Daniels, H. E. (1945). The statistical theory of the strength of bundles of threads. I. Proc. Roy. Soc. London. Ser. A. 183 405–435.
  • Donoho, D. and Jin, J. (2004). Higher criticism for detecting sparse heterogeneous mixtures. Ann. Statist. 32 962–994.
  • Donoho, D. and Jin, J. (2008). Higher criticism thresholding: Optimal feature selection when useful features are rare and weak. Proceedings of the National Academy of Sciences 105 14790–14795.
  • Duembgen, L. and Wellner, J. A. (2014). Confidence Bands for Distribution Functions: A New Look at the Law of the Iterated Logarithm. ArXiv e-prints.
  • Durbin, J. (1973). Distribution theory for tests based on the sample distribution function. Society for Industrial and Applied Mathematics, Philadelphia, Pa. Conference Board of the Mathematical Sciences Regional Conference Series in Applied Mathematics, No. 9.
  • Eicker, F. (1979). The asymptotic distribution of the suprema of the standardized empirical processes. Ann. Statist. 7 116–138.
  • Friedrich, T. and Schellhaas, H. (1998). Computation of the percentage points and the power for the two-sided Kolmogorov-Smirnov one sample test. Statist. Papers 39 361–375.
  • Gontscharuk, V. and Finner, H. (2016). Asymptotics of goodness-of-fit tests based on minimum P-value statistics. Communications in Statistics - Theory and Methods.
  • Gontscharuk, V., Landwehr, S. and Finner, H. (2015). The intermediates take it all: asymptotics of higher criticism statistics and a powerful alternative based on equal local levels. Biom. J. 57 159–180.
  • Gontscharuk, V., Landwehr, S. and Finner, H. (2016). Goodness of fit tests in terms of local levels with special emphasis on higher criticism tests. Bernoulli 22 1331–1363.
  • Ingster, Y. I. (1997). Some problems of hypothesis testing leading to infinitely divisible distributions. Math. Methods Statist. 6 47–69.
  • Jaeschke, D. (1979). The asymptotic distribution of the supremum of the standardized empirical distribution function on subintervals. Ann. Statist. 7 108–115.
  • Jager, L. and Wellner, J. A. (2007). Goodness-of-fit tests via phi-divergences. Ann. Statist. 35 2018–2053.
  • Janssen, A. (2000). Global power functions of goodness of fit tests. Ann. Statist. 28 239–253.
  • Kaplan, D. M. and Goldman, M. (2014). True equality (of pointwise sensitivity) at last: a Dirichlet alternative to Kolmogorov–Smirnov inference on distributions. Technical Report.
  • Karlin, S. and Rinott, Y. (1980). Classes of orderings of measures and related correlation inequalities. I. Multivariate totally positive distributions. J. Multivariate Anal. 10 467–498.
  • Keilson, J. and Sumita, U. (1983). A decomposition of the beta distribution, related order and asymptotic behavior. Ann. Inst. Statist. Math. 35 243–253.
  • Khmaladze, E. and Shinjikashvili, E. (2001). Calculation of noncrossing probabilities for Poisson processes and its corollaries. Adv. in Appl. Probab. 33 702–716.
  • Kotel'nikova, V. F. and Khmaladze, È. V. (1982). Calculation of the probability of an empirical process not crossing a curvilinear boundary. Teor. Veroyatnost. i Primenen. 27 599–607.
  • Ledwina, T. (1994). Data-driven version of Neyman’s smooth test of fit. J. Amer. Statist. Assoc. 89 1000–1005.
  • Lehmann, E. L. and Romano, J. P. (2005). Testing statistical hypotheses, third ed. Springer Texts in Statistics. Springer, New York.
  • Li, J. and Siegmund, D. (2015). Higher criticism: $p$-values and criticism. Ann. Statist. 43 1323–1350.
  • Marsaglia, G., Tsang, W. W. and Wang, J. (2003). Evaluating Kolmogorov’s distribution. Journal of Statistical Software 8 1–4.
  • Mary, D. and Ferrari, A. (2014). A non-asymptotic standardization of binomial counts in Higher Criticism. In Information Theory (ISIT), 2014 IEEE International Symposium on 561–565. IEEE.
  • Mason, D. M. and Schuenemeyer, J. H. (1983). A modified Kolmogorov-Smirnov test sensitive to tail alternatives. Ann. Statist. 11 933–946.
  • Moscovich, A. and Nadler, B. (2016). Fast calculation of boundary crossing probabilities for Poisson processes. ArXiv e-prints.
  • Neyman, J. (1937). Smooth test for goodness of fit. Skand. Aktuarietidskr. 20 149–199.
  • Noé, M. (1972). The calculation of distributions of two-sided Kolmogorov-Smirnov type statistics. Ann. Math. Statist. 43 58–64.
  • Owen, A. B. (1995). Nonparametric likelihood confidence bands for a distribution function. J. Amer. Statist. Assoc. 90 516–521.
  • Peizer, D. B. and Pratt, J. W. (1968). A normal approximation for binomial, $F$, beta, and other common, related tail probabilities. I. J. Amer. Statist. Assoc. 63 1416–1456.
  • Pratt, J. W. (1968). A normal approximation for binomial, $F$, beta, and other common, related tail probabilities. II. J. Amer. Statist. Assoc. 63 1457–1483.
  • Rayner, J. C. W., Thas, O. and Best, D. J. (2009). Smooth Tests of Goodness of Fit Using R, 2nd ed. Wiley.
  • Walther, G. (2013). The average likelihood ratio for large-scale multiple testing and detecting sparse mixtures. In From probability to statistics and back: high-dimensional models and processes. Inst. Math. Stat. (IMS) Collect. 9 317–326. Inst. Math. Statist., Beachwood, OH.
