On the Asymptotic Distribution of the "PSI-Squared" Goodness of Fit Criteria for Markov Chains and Markov Sequences

B. R. Bhat

doi:10.1214/aoms/1177705138

Ann. Math. Statist. 32(1): 49-58 (March, 1961). DOI: 10.1214/aoms/1177705138

Abstract

The use of a statistic of the algebraic form of Pearson's chi-squared as a measure of goodness of fit for frequencies from a fully specified $m$th order stationary Markov chain was first discussed and contrasted with the appropriate likelihood ratio criterion by Bartlett [2]. Since the distribution of the former statistic is not that of a tabular $\chi^2$-variate, it and allied statistics are sometimes described as "psi-squared" statistics. Patankar [14] derived the approximate asymptotic distribution (as the total number of transitions $\rightarrow \infty$) of \begin{equation*}\tag{1}\psi^2_1 = \sum_i \lbrack(n_i - m_i)^2/m_i\rbrack,\end{equation*} where the $n_i$ are the marginal frequencies (1-tuples) in a large sequence from a simple stationary Markov chain and the $m_i$ are their expected values in a new sequence of the same length. The proof is based on the fact that for a large sequence of observations the marginal frequencies are asymptotically multivariate normal and then (1) is distributed as a linear function of independent $\chi^2$-variates. Since the latter can be approximated by a single Type III variate ([5], [15]) the approximate asymptotic distribution of (1) is completely specified by its first two moments.
Let $n_\mathfrak{u}$ be the frequency of the $t$-tuple $\mathfrak{u} = (u_1, u_2, \cdots, u_t)$ in a sequence of length $n + t - 1$ from an $m$th order stationary Markov chain; and let $m_\mathfrak{u}$ be its expected value in a new sequence of the same length. To test whether the chain has a specified transition probability matrix, in analogy with (1) one may construct the statistic \begin{equation*}\tag{2}\psi^2_t = \sum_\mathfrak{u}\lbrack(n_\mathfrak{u} - m_\mathfrak{u})^2/m_\mathfrak{u}\rbrack\end{equation*} and test the goodness of fit for $n_\mathfrak{u}$. In (2) the summation extends over those values of $\mathfrak{u}$ for which $m_\mathfrak{u}$ does not vanish.
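
As an illustration of how the statistic in (2) is assembled, the following is a minimal sketch for the simplest case of a first-order ($m = 1$) chain. It is not taken from the paper: the function name `psi_squared`, the labelling of states as `0, 1, ..., k-1`, and the arguments `pi` (stationary distribution) and `P` (transition matrix) are assumptions made for the example. By stationarity, the expected frequency of a $t$-tuple is taken as $n$ times its stationary probability.

```python
from collections import Counter
from itertools import product


def psi_squared(seq, t, pi, P):
    """Illustrative psi^2_t for a first-order stationary chain (hypothetical helper).

    seq : observed states, labelled 0..k-1, of length n + t - 1
    pi  : stationary distribution of the specified chain
    P   : specified transition matrix, P[a][b] = Pr(next = b | current = a)
    """
    n = len(seq) - t + 1  # number of overlapping t-tuples in the sequence
    observed = Counter(tuple(seq[i:i + t]) for i in range(n))

    total = 0.0
    for u in product(range(len(pi)), repeat=t):
        # Stationary probability of the tuple u = (u_1, ..., u_t).
        prob = pi[u[0]]
        for a, b in zip(u, u[1:]):
            prob *= P[a][b]
        m_u = n * prob  # expected frequency of u
        if m_u > 0:     # sum only over tuples whose expectation does not vanish
            total += (observed[u] - m_u) ** 2 / m_u
    return total
```

For example, `psi_squared([0, 1, 0, 1, 0], 2, [0.5, 0.5], [[0.5, 0.5], [0.5, 0.5]])` compares the four observed pairs with an expected frequency of 1 for each of the four possible pairs. The value returned is only the statistic; its reference distribution is the subject of the paper and is not that of a tabular $\chi^2$-variate.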
Using methods different from those used here, Good [9] gave the asymptotic distribution of $\psi^2_t$ for the special case of a random sequence of digits, and showed that for an equiprobable random sequence (Markovity of order $-1$) having a prime number of categories, $\psi^2_t$ is asymptotically a linear combination of independent $\chi^2$-variates. This was generalized to the case of an arbitrary number of categories and to an arbitrary random sequence (Markovity of order 0) by Billingsley [4]. Good [11] conjectured that a similar result might be true for Markovity of any order. Following Good [10], Goodman [12] has shown that this conjecture is not true, and has proceeded to study a modification that is true. For further work in this direction and additional references see [13].
Since it is clear that the distribution of (2) does not have a simple form, we might assume that it follows the Type III form approximately. This approximation is suggested by the fact that $(n_\mathfrak{u})$ is asymptotically normal and hence the quadratic form (2) in $(n_\mathfrak{u})$ is distributed asymptotically as a linear function of $\chi^2$-variates with one degree of freedom [5]. Since (2) is nonnegative, the coefficients of the corresponding linear function of $\chi^2$-variates are also nonnegative. In the case when $m = 1$ or 0, the exact values of these coefficients are also known [4], [9]. The problem of approximating the distribution of a linear function of $\chi^2$-variates has been discussed by Welch [15] and Box [5]. They observe that this Type III approximation is fairly good over a wide range of values of degrees of freedom of the different $\chi^2$ and their coefficients, especially when these coefficients are positive. The advantage of this approximation is that it enables us to test the goodness of fit by referring to standard $\chi^2$-tables.
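
The two-moment matching behind the Type III approximation can be sketched as follows. This is the standard device of replacing a linear function $\sum_j \lambda_j \chi^2_{\nu_j}$ by a scaled variate $g\chi^2_h$ with the same mean and variance; it is offered as an illustration under that assumption, not as the paper's own derivation. The function names are hypothetical.

```python
def type3_from_moments(mean, variance):
    """Match g * chi2_h to a given mean and variance (illustrative helper).

    A g * chi2_h variate has mean g*h and variance 2*g*g*h, so
    g = variance / (2 * mean) and h = 2 * mean**2 / variance.
    """
    if mean <= 0 or variance <= 0:
        raise ValueError("mean and variance must be positive")
    scale = variance / (2.0 * mean)
    df = 2.0 * mean ** 2 / variance
    return scale, df


def type3_from_coeffs(coeffs, dfs=None):
    """Same matching, starting from the coefficients of the linear function.

    coeffs : coefficients lambda_j of independent chi-squared variates
    dfs    : their degrees of freedom (defaults to one each)
    """
    if dfs is None:
        dfs = [1] * len(coeffs)
    mean = sum(c * d for c, d in zip(coeffs, dfs))
    variance = 2.0 * sum(c * c * d for c, d in zip(coeffs, dfs))
    return type3_from_moments(mean, variance)
```

With the pair `(scale, df)` in hand, the observed statistic divided by `scale` would be referred to a $\chi^2$-table with `df` degrees of freedom, generally a fractional number. When all coefficients equal one, the matching returns a scale of one and degrees of freedom equal to the number of terms, as it should.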
In Section 2 of this paper, we derive this approximate distribution of (2) by obtaining its first two moments for any $m$ and $t \geqq m$.
Let $X_1, X_2, \cdots, X_{n + t - 1}$ be a series of observations from a stationary linear Markov sequence (autoregressive) of first order; \begin{equation*}\tag{3}X_i = \rho X_{i - 1} + Y_i\quad(i = 2, 3, \cdots, n + t - 1),\end{equation*} where $|\rho| < 1$, and the $Y_i$ are independent identically distributed continuous random variables with zero mean and range $(-\infty, +\infty)$. (Even though not in universal use, the term "Markov sequence" here refers to a Markov chain with continuous state space. We follow Bartlett [3] in using it.) Let these $n + t - 1$ observations be grouped into $k$ class intervals and let $n_\mathfrak{u}$ be the frequency of the $t$-tuple $(X_{u_1}, X_{u_2}, \cdots, X_{u_t})$ in this sequence, where $X_{u_1}, X_{u_2}, \cdots, X_{u_t}$ are $t(\geqq 1)$ consecutive observations belonging to the $u_1$th, $u_2$th, $\cdots, u_t$th class intervals respectively. For these frequencies, we derive the approximate distribution of the psi-squared test defined by (2), under some mild restrictions on the distribution of $Y$ and for small class intervals, assuming $\rho$ to be known. For the case $t = 1$, and the distribution of $Y$ normal, Patankar [14] has obtained its distribution. We observe that, for $t = 1$ and $Y$ arbitrarily distributed, the same distribution is obtained.
In Section 4, the distribution of the $\psi^2_t$ test of goodness of fit for frequencies of $t$-tuples $(t \geqq 2)$ in a series of observations, grouped into a finite number of class-intervals, from the stationary linear Markov sequence (autoregressive) of second order, \begin{equation*}\tag{4}X_i = aX_{i - 1} + bX_{i -2} + Y_i,\end{equation*} is derived, under similar restrictions on the distribution of $Y$. From this, the distribution of $\psi^2_t$ for stationary linear Markov sequences (autoregressive) of arbitrary order is deduced.
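
The grouping of a first-order sequence (3) into class intervals and the counting of $t$-tuple frequencies can be sketched as below. Several choices here are assumptions made only for illustration: the $Y_i$ are taken as standard normal (the paper allows more general continuous distributions), the sequence is started from its stationary normal law, and the $k$ class intervals are defined by $k - 1$ user-supplied cut points, with the two outer intervals unbounded. The function names are hypothetical.

```python
import bisect
import random
from collections import Counter


def simulate_ar1(rho, length, rng):
    """Simulate X_i = rho * X_{i-1} + Y_i with Y_i ~ N(0, 1) (an assumption)."""
    if not abs(rho) < 1:
        raise ValueError("stationarity requires |rho| < 1")
    # Stationary variance with unit-variance normal innovations is 1 / (1 - rho^2).
    sd0 = (1.0 - rho ** 2) ** -0.5
    xs = [rng.gauss(0.0, sd0)]
    for _ in range(length - 1):
        xs.append(rho * xs[-1] + rng.gauss(0.0, 1.0))
    return xs


def classify(xs, cuts):
    """Map each observation to a class index 0..k-1, given k - 1 sorted cut points."""
    return [bisect.bisect_right(cuts, x) for x in xs]


def tuple_counts(classes, t):
    """Frequencies n_u of the overlapping t-tuples of consecutive class indices."""
    n = len(classes) - t + 1
    return Counter(tuple(classes[i:i + t]) for i in range(n))


# Usage: n = 200 triples (t = 3) from a sequence of length n + t - 1 = 202.
rng = random.Random(1961)
xs = simulate_ar1(0.5, 202, rng)
classes = classify(xs, [-1.0, 0.0, 1.0])  # k = 4 class intervals
counts = tuple_counts(classes, 3)
```

The expected frequencies $m_\mathfrak{u}$ needed for (2) are not computed here; for a continuous sequence they involve the joint probability that $t$ consecutive observations fall into the given intervals, which is where the paper's restrictions on $Y$ and on the interval widths enter.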
The distribution of (2) may also be used to calculate the power of the usual $\chi^2$-test of goodness of fit for independent observations, when the alternative is serial dependence.

Citation

B. R. Bhat. "On the Asymptotic Distribution of the 'PSI-Squared' Goodness of Fit Criteria for Markov Chains and Markov Sequences." Ann. Math. Statist. 32 (1) 49-58, March, 1961. https://doi.org/10.1214/aoms/1177705138