June, 1993
Generalized Pearson-Fisher Chi-Square Goodness-of-Fit Tests, with Applications to Models with Life History Data
Gang Li, Hani Doss
Ann. Statist. 21(2): 772-797 (June, 1993). DOI: 10.1214/aos/1176349151


Suppose that $X_1,\ldots,X_n$ are i.i.d. $\sim F$, and we wish to test the null hypothesis that $F$ is a member of the parametric family $\mathscr{F} = \{F_\theta(x); \theta \in \Theta\}$ where $\Theta \subset \mathbb{R}^q$. The classical Pearson-Fisher chi-square test involves partitioning the real axis into $k$ cells $I_1,\ldots, I_k$ and forming the chi-square statistic $X^2 = \sum^k_{i = 1}(O_i - nF_{\hat{\theta}}(I_i))^2/nF_{\hat{\theta}}(I_i)$, where $O_i$ is the number of observations falling into cell $i$ and $\hat{\theta}$ is the value of $\theta$ minimizing $\sum^k_{i = 1}(O_i - nF_\theta(I_i))^2/nF_\theta(I_i)$. We obtain a generalization of this test to any situation for which there is available a nonparametric estimator $\hat{F}$ of $F$ for which $n^{1/2}(\hat{F} - F) \rightarrow_d W$, where $W$ is a continuous zero mean Gaussian process satisfying a mild regularity condition. We allow the cells to be data dependent. Essentially, we estimate $\theta$ by the value $\hat{\theta}$ that minimizes a "distance" between the vectors $(\hat{F}(I_1),\ldots,\hat{F}(I_k))$ and $(F_\theta(I_1),\ldots, F_\theta(I_k))$, where distance is measured through an arbitrary positive definite quadratic form, and then form a chi-square type test statistic based on the difference between $(\hat{F}(I_1),\ldots,\hat{F}(I_k))$ and $(F_{\hat{\theta}}(I_1),\ldots, F_{\hat{\theta}}(I_k))$. We prove that this test statistic has asymptotically a chi-square distribution with $k - q - 1$ degrees of freedom, and point out some errors in the literature on chi-square tests in survival analysis. Our procedure is very general and applies to a number of well-known models in survival analysis, such as right censoring and left truncation. We apply our method to deal with questions of model selection in the problem of estimating the distribution of the length of the incubation period of the AIDS virus using the CDC's data on blood-transfusion related AIDS. Our analysis suggests some models that seem to fit better than those used in the literature.
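
As an illustration only, the classical Pearson-Fisher test described at the start of the abstract can be sketched in a few lines of Python. This is not the authors' generalized procedure: it covers only the uncensored i.i.d. case with fixed cells. The exponential family $F_\theta(x) = 1 - e^{-\theta x}$ (so $q = 1$), the cell boundaries, the simulated sample, the search bracket, and all function names are assumptions chosen for the example. The minimization uses a plain golden-section search, which is adequate here because the statistic is convex in $\theta$ for this particular family.

```python
import bisect
import math
import random


def cell_counts(sample, edges):
    """O_i: number of observations in each cell [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for x in sample:
        counts[bisect.bisect_right(edges, x) - 1] += 1
    return counts


def cell_probs(theta, edges):
    """F_theta(I_i) for the (assumed) exponential family with rate theta."""
    cdf = [1.0 - math.exp(-theta * e) for e in edges]
    return [cdf[i + 1] - cdf[i] for i in range(len(edges) - 1)]


def chi_square(theta, counts, edges):
    """sum_i (O_i - n F_theta(I_i))^2 / (n F_theta(I_i))."""
    n = sum(counts)
    probs = cell_probs(theta, edges)
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(counts, probs))


def golden_section_min(f, lo, hi, tol=1e-8):
    """Minimize a unimodal function f on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) < f(d):
            b = d
        else:
            a = c
    return (a + b) / 2.0


# Simulated data under the null hypothesis (illustrative, not from the paper).
rng = random.Random(0)
sample = [rng.expovariate(1.0) for _ in range(500)]

# k = 5 fixed cells partitioning [0, infinity).
edges = [0.0, 0.25, 0.5, 1.0, 2.0, math.inf]
counts = cell_counts(sample, edges)

# Minimum chi-square estimate of theta, then the test statistic X^2.
theta_hat = golden_section_min(lambda t: chi_square(t, counts, edges), 0.05, 20.0)
x2 = chi_square(theta_hat, counts, edges)

q = 1                      # number of estimated parameters
df = len(counts) - q - 1   # k - q - 1 degrees of freedom
CRITICAL_5PCT_DF3 = 7.815  # upper 5% point of chi-square with 3 df

print(f"theta_hat = {theta_hat:.4f}, X^2 = {x2:.4f}, df = {df}")
print("reject at 5% level" if x2 > CRITICAL_5PCT_DF3 else "do not reject at 5% level")
```

In the generalized setting of the paper, the empirical cell proportions $O_i/n$ would be replaced by $\hat{F}(I_i)$ from a suitable nonparametric estimator, and the diagonal weighting by a positive definite quadratic form; neither is attempted in this sketch.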



Gang Li, Hani Doss. "Generalized Pearson-Fisher Chi-Square Goodness-of-Fit Tests, with Applications to Models with Life History Data." Ann. Statist. 21 (2) 772-797, June, 1993.


Published: June, 1993
First available in Project Euclid: 12 April 2007

zbMATH: 0788.62020
MathSciNet: MR1232519
Digital Object Identifier: 10.1214/aos/1176349151

Primary: 62F05
Secondary: 62E20, 62F03

Rights: Copyright © 1993 Institute of Mathematical Statistics

