The Annals of Statistics

Generalized Pearson-Fisher Chi-Square Goodness-of-Fit Tests, with Applications to Models with Life History Data

Gang Li and Hani Doss

Full-text: Open access


Suppose that $X_1,\ldots,X_n$ are i.i.d. $\sim F$, and we wish to test the null hypothesis that $F$ is a member of the parametric family $\mathscr{F} = \{F_\theta(x); \theta \in \Theta\}$ where $\Theta \subset \mathbb{R}^q$. The classical Pearson-Fisher chi-square test involves partitioning the real axis into $k$ cells $I_1,\ldots, I_k$ and forming the chi-square statistic $X^2 = \sum^k_{i = 1}(O_i - nF_{\hat{\theta}}(I_i))^2/nF_{\hat{\theta}}(I_i)$, where $O_i$ is the number of observations falling into cell $i$ and $\hat{\theta}$ is the value of $\theta$ minimizing $\sum^k_{i = 1}(O_i - nF_\theta(I_i))^2/nF_\theta(I_i)$. We obtain a generalization of this test to any situation for which there is available a nonparametric estimator $\hat{F}$ of $F$ for which $n^{1/2}(\hat{F} - F) \rightarrow_d W$, where $W$ is a continuous zero mean Gaussian process satisfying a mild regularity condition. We allow the cells to be data dependent. Essentially, we estimate $\theta$ by the value $\hat{\theta}$ that minimizes a "distance" between the vectors $(\hat{F}(I_1),\ldots,\hat{F}(I_k))$ and $(F_\theta(I_1),\ldots, F_\theta(I_k))$, where distance is measured through an arbitrary positive definite quadratic form, and then form a chi-square type test statistic based on the difference between $(\hat{F}(I_1),\ldots,\hat{F}(I_k))$ and $(F_{\hat{\theta}}(I_1),\ldots, F_{\hat{\theta}}(I_k))$. We prove that this test statistic has asymptotically a chi-square distribution with $k - q - 1$ degrees of freedom, and point out some errors in the literature on chi-square tests in survival analysis. Our procedure is very general and applies to a number of well-known models in survival analysis, such as right censoring and left truncation. We apply our method to deal with questions of model selection in the problem of estimating the distribution of the length of the incubation period of the AIDS virus using the CDC's data on blood-transfusion related AIDS. Our analysis suggests some models that seem to fit better than those used in the literature.
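As an illustration of the classical Pearson-Fisher construction described above, the following is a minimal sketch for complete (uncensored) data only; it does not implement the generalized test of the paper. It assumes a one-parameter exponential family $F_\theta(x) = 1 - e^{-\theta x}$ (so $q = 1$), fixed cells given by user-chosen cut points, and a golden-section search for the minimum chi-square estimate $\hat{\theta}$. The function names, the exponential model, and the search bracket are all illustrative assumptions, not taken from the paper.

```python
import bisect
import math


def cell_probs(theta, edges):
    """F_theta(I_i) for an exponential model with rate theta (assumed model).

    Cells are [0, e_1), [e_1, e_2), ..., [e_last, inf) for sorted cut points.
    Survival values are differenced to avoid cancellation in the tail cell.
    """
    surv = [1.0] + [math.exp(-theta * e) for e in edges] + [0.0]
    return [surv[i] - surv[i + 1] for i in range(len(surv) - 1)]


def cell_counts(data, edges):
    """O_i: number of observations falling into each cell."""
    counts = [0] * (len(edges) + 1)
    for x in data:
        counts[bisect.bisect_right(edges, x)] += 1
    return counts


def pearson_chi2(theta, counts, edges):
    """sum_i (O_i - n F_theta(I_i))^2 / (n F_theta(I_i))."""
    n = sum(counts)
    return sum((o - n * p) ** 2 / (n * p)
               for o, p in zip(counts, cell_probs(theta, edges)))


def golden_section_min(f, lo, hi, tol=1e-8):
    """Minimize f on [lo, hi], assuming f is unimodal on that bracket."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return 0.5 * (a + b)


def pearson_fisher(data, edges, lo=None, hi=None):
    """Return (theta_hat, X2, df) for the classical test with q = 1.

    The default bracket around 1/mean is a heuristic assumption.
    """
    counts = cell_counts(data, edges)
    rate_guess = len(data) / sum(data)
    lo = 0.1 * rate_guess if lo is None else lo
    hi = 10.0 * rate_guess if hi is None else hi
    theta_hat = golden_section_min(
        lambda t: pearson_chi2(t, counts, edges), lo, hi)
    x2 = pearson_chi2(theta_hat, counts, edges)
    df = len(counts) - 1 - 1  # k - q - 1 with q = 1
    return theta_hat, x2, df
```

Under the null hypothesis, the returned statistic would be compared with a critical value of the chi-square distribution with the returned degrees of freedom. Extending the sketch to censored or truncated data would require replacing the cell proportions $O_i/n$ with $\hat{F}(I_i)$ from a suitable nonparametric estimator and using an appropriate quadratic form, as the abstract indicates.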

Article information

Ann. Statist., Volume 21, Number 2 (1993), 772-797.

First available in Project Euclid: 12 April 2007



Primary: 62F05: Asymptotic properties of tests
Secondary: 62F03: Hypothesis testing; 62E20: Asymptotic distribution theory

Goodness-of-fit test; Pearson-Fisher chi-square test; chi-squared statistic; left truncation; right censoring; Aalen model


Li, Gang; Doss, Hani. Generalized Pearson-Fisher Chi-Square Goodness-of-Fit Tests, with Applications to Models with Life History Data. Ann. Statist. 21 (1993), no. 2, 772--797. doi:10.1214/aos/1176349151.
