## The Annals of Statistics

- Ann. Statist.
- Volume 21, Number 2 (1993), 772-797.

### Generalized Pearson-Fisher Chi-Square Goodness-of-Fit Tests, with Applications to Models with Life History Data

#### Abstract

Suppose that $X_1,\ldots,X_n$ are i.i.d. $\sim F$, and we wish to test the null hypothesis that $F$ is a member of the parametric family $\mathscr{F} = \{F_\theta(x); \theta \in \Theta\}$ where $\Theta \subset \mathbb{R}^q$. The classical Pearson-Fisher chi-square test involves partitioning the real axis into $k$ cells $I_1,\ldots, I_k$ and forming the chi-square statistic $X^2 = \sum^k_{i = 1}(O_i - nF_{\hat{\theta}}(I_i))^2/nF_{\hat{\theta}}(I_i)$, where $O_i$ is the number of observations falling into cell $i$ and $\hat{\theta}$ is the value of $\theta$ minimizing $\sum^k_{i = 1}(O_i - nF_\theta(I_i))^2/nF_\theta(I_i)$. We obtain a generalization of this test to any situation for which there is available a nonparametric estimator $\hat{F}$ of $F$ for which $n^{1/2}(\hat{F} - F) \rightarrow_d W$, where $W$ is a continuous zero mean Gaussian process satisfying a mild regularity condition. We allow the cells to be data dependent. Essentially, we estimate $\theta$ by the value $\hat{\theta}$ that minimizes a "distance" between the vectors $(\hat{F}(I_1),\ldots,\hat{F}(I_k))$ and $(F_\theta(I_1),\ldots, F_\theta(I_k))$, where distance is measured through an arbitrary positive definite quadratic form, and then form a chi-square type test statistic based on the difference between $(\hat{F}(I_1),\ldots,\hat{F}(I_k))$ and $(F_{\hat{\theta}}(I_1),\ldots, F_{\hat{\theta}}(I_k))$. We prove that this test statistic has asymptotically a chi-square distribution with $k - q - 1$ degrees of freedom, and point out some errors in the literature on chi-square tests in survival analysis. Our procedure is very general and applies to a number of well-known models in survival analysis, such as right censoring and left truncation. We apply our method to deal with questions of model selection in the problem of estimating the distribution of the length of the incubation period of the AIDS virus using the CDC's data on blood-transfusion related AIDS. 
Our analysis suggests some models that seem to fit better than those used in the literature.
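The classical Pearson-Fisher procedure in the first part of the abstract is easiest to see in its simplest case. Below is a minimal sketch that assumes fixed (not data-dependent) cells, complete uncensored data, and a hypothetical one-parameter exponential family $F_\theta(x) = 1 - e^{-\theta x}$, so $q = 1$. It estimates $\hat\theta$ by minimum chi-square, as in the definition above, and refers $X^2$ to a chi-square distribution with $k - q - 1$ degrees of freedom. The golden-section search, the cut points, and all function names are illustrative choices, not the authors' method. In the generalized test, the empirical cell frequencies $O_i/n$ would be replaced by $\hat F(I_i)$ from a nonparametric estimator such as Kaplan-Meier, and the Pearson weights by a general positive definite quadratic form.

```python
import bisect
import math
import random


def exp_cell_probs(theta, cuts):
    """F_theta(I_i) for the (hypothetical) exponential family, cells [0,c1),...,[c_{k-1},inf)."""
    edges = [0.0] + list(cuts) + [math.inf]
    cdf = [1.0 - math.exp(-theta * e) for e in edges]  # exp(-inf) == 0.0
    return [cdf[i + 1] - cdf[i] for i in range(len(edges) - 1)]


def counts(data, cuts):
    """Observed counts O_i; bisect_right finds the cell index of each observation."""
    obs = [0] * (len(cuts) + 1)
    for x in data:
        obs[bisect.bisect_right(cuts, x)] += 1
    return obs


def pearson_x2(obs, probs):
    """X^2 = sum_i (O_i - n p_i)^2 / (n p_i)."""
    n = sum(obs)
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(obs, probs))


def min_chisq_theta(obs, cuts, lo, hi, iters=100):
    """Golden-section search over log(theta) on [lo, hi] for the minimum chi-square
    estimate. Assumes X^2 is unimodal in log(theta) on this bracket."""
    f = lambda t: pearson_x2(obs, exp_cell_probs(math.exp(t), cuts))
    a, b = math.log(lo), math.log(hi)
    g = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = f(c), f(d)
    for _ in range(iters):
        if fc < fd:  # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = f(c)
        else:        # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = f(d)
    return math.exp((a + b) / 2.0)


def chi2_sf(x, df):
    """P(chi^2_df > x) via the series for the regularized lower incomplete gamma
    function; adequate for moderate x, loses relative precision for tiny p-values."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 1
    while term > 1e-15 * total and n < 10000:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def pearson_fisher_test(data, cuts):
    obs = counts(data, cuts)
    theta0 = len(data) / sum(data)  # ungrouped MLE, used only to centre the search bracket
    theta_hat = min_chisq_theta(obs, cuts, theta0 / 20.0, theta0 * 20.0)
    x2 = pearson_x2(obs, exp_cell_probs(theta_hat, cuts))
    df = len(obs) - 1 - 1  # k - q - 1 with q = 1
    return theta_hat, x2, df, chi2_sf(x2, df)


rng = random.Random(0)
sample = [rng.expovariate(2.0) for _ in range(500)]
cuts = [0.1, 0.25, 0.5, 0.8, 1.2]  # k = 6 fixed cells, chosen for illustration
theta_hat, x2, df, p = pearson_fisher_test(sample, cuts)
print(f"theta_hat={theta_hat:.3f}  X2={x2:.3f}  df={df}  p={p:.3f}")
```

Minimizing over $\log\theta$ keeps the search inside $\theta > 0$ without constraints. Estimating $\theta$ from the grouped counts rather than from the raw data matters: the $k - q - 1$ degrees of freedom rely on that choice, and plugging in the ungrouped MLE would not in general yield a limiting $\chi^2_{k-q-1}$ distribution.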

#### Article information

**Source**

Ann. Statist., Volume 21, Number 2 (1993), 772-797.

**Dates**

First available in Project Euclid: 12 April 2007

**Permanent link to this document**

https://projecteuclid.org/euclid.aos/1176349151

**Digital Object Identifier**

doi:10.1214/aos/1176349151

**Mathematical Reviews number (MathSciNet)**

MR1232519

**Zentralblatt MATH identifier**

0788.62020

**Subjects**

Primary: 62F05: Asymptotic properties of tests

Secondary: 62F03: Hypothesis testing; 62E20: Asymptotic distribution theory

**Keywords**

Goodness-of-fit test, Pearson-Fisher chi-square test, chi-squared statistic, left truncation, right censoring, Aalen model

#### Citation

Li, Gang; Doss, Hani. Generalized Pearson-Fisher Chi-Square Goodness-of-Fit Tests, with Applications to Models with Life History Data. Ann. Statist. 21 (1993), no. 2, 772--797. doi:10.1214/aos/1176349151. https://projecteuclid.org/euclid.aos/1176349151