Electronic Journal of Statistics

Hypothesis testing near singularities and boundaries

Jonathan D. Mitchell, Elizabeth S. Allman, and John A. Rhodes

Full-text: Open access

Abstract

The likelihood ratio statistic, with its asymptotic $\chi^{2}$ distribution at regular model points, is often used for hypothesis testing. However, the asymptotic distribution can differ at model singularities and boundaries, suggesting that the use of a $\chi^{2}$ distribution might be problematic nearby. Indeed, its poor behavior for testing near singularities and boundaries is apparent in simulations, and can lead to conservative or anti-conservative tests. Here we develop a new distribution designed for use in hypothesis testing near singularities and boundaries, which asymptotically agrees with that of the likelihood ratio statistic. For two example trinomial models, arising in the context of inference of evolutionary trees, we show the new distributions outperform a $\chi^{2}$ distribution.
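
As a rough illustration of the boundary effect described in the abstract, the following sketch simulates a textbook case rather than the article's trinomial models: observations are assumed to be N(mu, 1) with known variance, the parameter space is mu >= 0, and the null hypothesis mu = 0 lies on its boundary. In this setting the likelihood ratio statistic is known to follow an equal mixture of a point mass at zero and a $\chi^{2}$ distribution with one degree of freedom, so comparing it to the $\chi^{2}_{1}$ critical value gives a conservative test. The function name, sample size, and seed are hypothetical choices made for this example only.

```python
import numpy as np

# 0.95 quantile of the chi-squared distribution with 1 degree of freedom.
CHI2_1_CRIT_05 = 3.841458820694124


def boundary_lr_statistics(n, reps, mu=0.0, seed=0):
    """Simulate likelihood ratio statistics for H0: mu = 0 against mu >= 0.

    Assumes n i.i.d. N(mu, 1) observations per replicate (illustrative
    model only, not one of the article's trinomial models).
    """
    rng = np.random.default_rng(seed)
    # The sample mean of n N(mu, 1) draws is N(mu, 1/n), so draw it directly.
    xbar = rng.normal(loc=mu, scale=1.0 / np.sqrt(n), size=reps)
    # The constrained maximum likelihood estimate is the sample mean,
    # truncated at the boundary mu = 0.
    mle = np.maximum(xbar, 0.0)
    # 2 * (loglik(mle) - loglik(0)) simplifies to n * mle**2 in this model.
    return n * mle**2


if __name__ == "__main__":
    stats = boundary_lr_statistics(n=50, reps=200_000, seed=1)
    print("fraction of statistics equal to zero:", np.mean(stats == 0.0))
    print("rejection rate at nominal 5% level:", np.mean(stats > CHI2_1_CRIT_05))
```

Under these assumptions, about half of the simulated statistics are exactly zero, and the rejection rate should be close to 2.5% rather than the nominal 5%. The article concerns the harder question of what happens at points near, but not on, such boundaries and singularities.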

Note

When this article was first made public, on June 28, 2019, its page numbering was incorrect (pp. 1250–1293). The article’s page numbers were corrected to 2150–2193 on July 30, 2019.

Article information

Source
Electron. J. Statist., Volume 13, Number 1 (2019), 2150–2193.

Dates
Received: June 2018
First available in Project Euclid: 28 June 2019

Permanent link to this document
https://projecteuclid.org/euclid.ejs/1561687407

Digital Object Identifier
doi:10.1214/19-EJS1576

Mathematical Reviews number (MathSciNet)
MR3980955

Zentralblatt MATH identifier
07089017

Subjects
Primary: 62E17: Approximations to distributions (nonasymptotic)
Secondary: 92D15: Problems related to evolution

Keywords
Hypothesis testing; singularity; boundary; likelihood ratio statistic; chi-squared; phylogenomics; coalescent

Rights
Creative Commons Attribution 4.0 International License.

Citation

Mitchell, Jonathan D.; Allman, Elizabeth S.; Rhodes, John A. Hypothesis testing near singularities and boundaries. Electron. J. Statist. 13 (2019), no. 1, 2150–2193. doi:10.1214/19-EJS1576. https://projecteuclid.org/euclid.ejs/1561687407


