## Abstract

We shall consider the problem of estimating a linear functional relationship \begin{equation*}\tag{1.1}\alpha + \beta_1\tau^1 + \cdots + \beta_p\tau^p = 0\end{equation*} among $p$ variables $\tau^1, \cdots, \tau^p$ when the observed values do not satisfy it because all of them are subject to errors or fluctuations (superscripts will, in general, be indexing symbols, not powers, in this paper). Geometrically, the problem is equivalent to fitting straight lines or planes to a series of $q$ observed points when all the coordinates are subject to error. This problem has a long history. R. J. Adcock, in two papers written in 1877 [2] and 1878 [3], solved it by minimizing the sum of squares of the orthogonal distances from the points to the hyperplane (1.1). Adcock and many other authors used the model \begin{equation*}\tag{1.2}y_i = \tau_i + \epsilon_i\quad (i = 1, \cdots, q),\end{equation*} where $y_i$ and $\tau_i$ are column vectors representing the observed and true points, and the errors $\epsilon_i$ are independent random vectors with mean value zero. Since the $\tau_i$ are points lying on the hyperplane (1.1), we have in matrix notation \begin{equation*}\tag{1.3}\alpha + \beta\tau_i = 0\quad (i = 1, \cdots, q)\end{equation*} where $\beta$ is a row vector with components $\beta_1, \cdots, \beta_p$. If we assume that the $\tau_i$ are independently drawn from a probability distribution, then the estimate of $\beta$ obtained by Adcock is not consistent. In fact, in 1937, J. Neyman [21] pointed out that if the distribution of the true vectors $\tau_i$ and the errors $\epsilon_i$ is normal, then the distribution of the observed vectors $y_i$ is also normal and, being determined by moments of the first two orders, does not suffice to determine the parameters $\alpha$ and $\beta$; the functional relationship (1.1) is, therefore, nonidentifiable.
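Adcock's principle of minimizing the sum of squared orthogonal distances has a closed-form solution: the minimizing normal vector $\beta$ is an eigenvector of the centered scatter matrix of the observed points belonging to its smallest eigenvalue. A minimal sketch in modern notation (NumPy; the function name is ours, not the paper's):

```python
import numpy as np

def fit_hyperplane_orthogonal(Y):
    """Adcock's orthogonal-distance fit of a hyperplane alpha + beta . y = 0.

    Y is a (q, p) array whose rows are the observed points y_i.
    The minimizing unit normal beta is the eigenvector of the centered
    scatter matrix belonging to the smallest eigenvalue, and the fitted
    hyperplane passes through the centroid of the points.
    """
    ybar = Y.mean(axis=0)
    A = Y - ybar
    _, V = np.linalg.eigh(A.T @ A)   # eigenvalues in ascending order
    beta = V[:, 0]                   # eigenvector for the smallest eigenvalue
    alpha = -beta @ ybar             # so that alpha + beta . ybar = 0
    return alpha, beta
```

For points lying exactly on a hyperplane, the smallest eigenvalue is zero and the fit reproduces the hyperplane; with noisy points it minimizes the orthogonal sum of squares.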
Several methods have been proposed to overcome this difficulty, for which the reader is referred to a recent general survey of the literature by A. Madansky [18], which also contains an extensive bibliography. One approach is to assume that we know the covariance matrix of the errors up to a numerical factor. As was shown, in general, by C. F. Gauss [8], [9], in the case of independently and normally distributed observations whose variances are known up to a numerical factor, the maximum likelihood estimate is simply the weighted least-squares estimate. This estimate of the linear functional relationship was obtained as early as 1879 by C. H. Kummell [15], for the case in which the components $\epsilon^h_i$ of the vectors $\epsilon_i$ are independently distributed with variances $\kappa^{ii}\sigma_{hh}$, where the $\kappa^{ii}$ are known constants and the $\sigma_{hh}$ are known only up to a numerical factor. Kummell found that his estimate coincides with the estimate proposed by Adcock only in the case in which all the variances are equal. M. J. van Uven [24] considered the case in which the errors $\epsilon_i$ are independent and have the same multivariate normal distribution with a covariance matrix $\Sigma$ which is known only up to a numerical factor. His method is essentially the following. He considers $\tau^1, \cdots, \tau^p$ as skew coordinates in a new, "isotropic" space in which the rectilinear orthogonal coordinates are independent and have the same variance. In the new space he then uses Adcock's principle of adjustment, namely, he chooses as the estimate the hyperplane which minimizes the sum of squared orthogonal distances. Later, T. Koopmans [14] showed that van Uven's estimate is the maximum likelihood estimate for the case considered.
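Van Uven's construction can be sketched concretely: whiten the observations with a square root of $\Sigma$, so that in the resulting isotropic space the errors have covariance proportional to the identity, apply Adcock's orthogonal fit there, and map the fitted normal back. A hedged sketch under that reading (function name and NumPy details are our assumptions, not the paper's):

```python
import numpy as np

def fit_hyperplane_known_cov(Y, Sigma):
    """van Uven's adjustment, sketched: Adcock's fit in isotropic coordinates.

    Y is (q, p); Sigma is the (p, p) error covariance, known up to a factor.
    With Sigma = L L' (Cholesky), the coordinates z = L^{-1} y have errors
    with covariance proportional to the identity, so Adcock's orthogonal
    fit applies there.  A hyperplane c + gamma . z = 0 in z-space maps back
    to alpha + beta . y = 0 with beta = L^{-T} gamma.
    """
    L = np.linalg.cholesky(Sigma)
    Z = np.linalg.solve(L, Y.T).T        # isotropic coordinates z_i = L^{-1} y_i
    A = Z - Z.mean(axis=0)
    _, V = np.linalg.eigh(A.T @ A)
    gamma = V[:, 0]                      # fitted normal in isotropic space
    beta = np.linalg.solve(L.T, gamma)   # back-transform: beta = L^{-T} gamma
    alpha = -beta @ Y.mean(axis=0)
    return alpha, beta
```

When $\Sigma$ is a multiple of the identity this reduces to Adcock's original fit, which matches Kummell's observation that the two estimates coincide only when all the variances are equal.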
If the $\tau_i$ are assumed independently drawn from a probability distribution, the estimate of the linear functional relationship thus obtained is consistent, but the estimate of $\Sigma$ converges in probability to $p^{-1}\Sigma$ (see also [16]). B. M. Dent [7] solved the maximum likelihood equations in the case in which $\Sigma$ is not known, but, as was shown later by D. V. Lindley [16], her estimates are not consistent, and should, therefore, be rejected. More recently, J. Kiefer and J. Wolfowitz [13] showed that, under certain conditions of identifiability, when the $\tau_i$ have a probability distribution, the method of maximum likelihood, if properly applied, yields consistent estimates of both the linear functional relationship and the probability distribution of the $\tau_i$. However, Kiefer and Wolfowitz do not give explicit expressions for the maximum likelihood estimates. No difficulties with respect to the identifiability of the functional relationship or with respect to the consistency of the estimates arise if we can replicate the observations. The model is now, in matrix notation, \begin{equation*}\tag{1.4}y_{ij} = \tau_i + \epsilon_{ij}.\end{equation*} In general we have $n_i$ observed points for each of the true points $\tau_i$, and it is assumed that not all the $\tau_i$ lie on a translated subspace of dimension smaller than $p - 1$. Obviously this implies that $q \geqq p$. This model has been considered previously by G. W. Housner and J. F. Brennan [12], J. W. Tukey [23] and by F. S. Acton [1] (see also [11] and [25]). If we assume that the errors $\epsilon_{ij}$ are independently and normally distributed with a known covariance matrix $\Sigma$, we lose nothing if we consider only the averages $y_{i\cdot} = n^{-1}_i \sum_j y_{ij}$. We then have the same model (1.2), with the only difference that $y_i$ is replaced by $y_{i\cdot}$. If, however, the covariance matrix is not known, we can now obtain in the usual way an estimate $S$ of $\Sigma$.
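With replicated observations, the estimate $S$ of $\Sigma$ obtained "in the usual way" is the pooled within-group covariance of the $y_{ij}$ around their group means $y_{i\cdot}$. A small sketch under that reading (the function name is ours):

```python
import numpy as np

def pooled_error_cov(groups):
    """Pooled within-group covariance estimate S of Sigma.

    groups is a list of (n_i, p) arrays, the replicated observations
    y_{ij} of the true point tau_i.  Summing the within-group sums of
    squares and dividing by the pooled degrees of freedom
    sum_i (n_i - 1) gives the usual unbiased estimate of Sigma.
    """
    p = groups[0].shape[1]
    ss = np.zeros((p, p))
    dof = 0
    for G in groups:
        d = G - G.mean(axis=0)   # deviations from the group mean y_{i.}
        ss += d.T @ d
        dof += len(G) - 1
    return ss / dof
```

Following Acton's suggestion discussed next, this $S$ can then be substituted for $\Sigma$ in the known-covariance estimate of the linear functional relationship.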
F. S. Acton [1] suggested the use of $S$ instead of $\Sigma$ in the estimate obtained by the method of maximum likelihood in the case of known $\Sigma$. In this paper it will be shown that the estimate thus obtained is the maximum likelihood estimate when $\Sigma$ is unknown. If the design is a block design (in general an incomplete one), then, when treatment $i$ is applied on block $j$, we have \begin{equation*}\tag{1.5}y_{ij} = \tau_i + b_j + \epsilon_{ij},\end{equation*} where $b_j$ is a column vector representing the block effect. Considering the block effects $b_j$ as unknown constants, we get in the usual way the intrablock estimates $t_i$ of the treatment effects $\tau_i$. The same equation (1.2) then still holds if $y_i$ is replaced by $t_i$, but in general the errors $\epsilon_i$, and consequently also the estimates $t_i$, will no longer be independent. If the design consists of $r$ replications of a basic design, then the covariance of two errors $\epsilon_i, \epsilon_{i'}$ will be given by \begin{equation*}\tag{1.6}\operatorname{cov} (\epsilon_i,\epsilon_{i'}) = \frac{\kappa^{ii'}\Sigma}{r}\end{equation*} where the $\kappa^{ii'}$ are known coefficients and the matrix $\Sigma$ is unknown. In this paper maximum likelihood estimates for $\Sigma$ and the parameters of the linear functional relationship will be found for the case in which (1.6) holds. It will be shown that the maximum likelihood estimates $\hat\alpha, \hat\beta$ in the case of unknown $\Sigma$ are obtained from the corresponding estimates in the case of known $\Sigma$ by simply replacing $\Sigma$ by the linear regression estimate $S$.
In the last Section it will be shown that if the maximum likelihood method is applied directly to the variables $y_{ij}$ instead of the variables $t_i$ and $S$, then the same estimate $\hat\beta$ is obtained, but the estimate of $\Sigma$ is multiplied by $1 - k^{-1} + N^{-1}$, where $k$ is the number of experimental units in each block and $N$ is the total number of experimental units. All of the estimates obtained are consistent, the only exception being the estimate of $\Sigma$ obtained by this direct approach, which converges to $(1 - k^{-1})\Sigma$.

## Citation

C. Villegas. "Maximum Likelihood Estimation of a Linear Functional Relationship." Ann. Math. Statist. 32 (4): 1048–1062, December 1961. https://doi.org/10.1214/aoms/1177704845