Optimum Designs in Regression Problems

J. Kiefer; J. Wolfowitz

doi:10.1214/aoms/1177706252

Ann. Math. Statist. 30(2): 271-294 (June, 1959). DOI: 10.1214/aoms/1177706252

Abstract

Although regression problems have been considered by workers in all sciences for many years, until recently relatively little attention has been paid to the optimum design of experiments in such problems. At what values of the independent variable should one take observations, and in what proportions? The purpose of this paper is to develop useful computational procedures for finding optimum designs in regression problems of estimation, testing hypotheses, etc. In Section 2 we shall develop the theory for the case where the desired inference concerns just one of the regression coefficients, and illustrative examples will be given in Section 3. In Section 4 the theory for the case of inference on several coefficients is developed; here there is a choice of several possible optimality criteria, as discussed in [1]. In Section 5 we treat the problem of global estimation of the regression function, rather than of the individual coefficients. We shall now indicate briefly some of the computational aspects of the search for optimum designs by considering the problem of Section 2 wherein the inference concerns one of $k$ regression coefficients. For the sake of concreteness, we shall occasionally refer here to the example of polynomial regression on the real interval $\lbrack -1, 1\rbrack$, where all observations are independent and have the same variance. The quadratic case is rather trivial to treat by our methods, so we shall sometimes refer here to the case of cubic regression. In the latter case we suppose all four regression coefficients to be unknown, and we want to estimate or test a hypothesis about the coefficient $a_3$ of $x^3$. If a fixed number $N$ of observations is to be taken, we can think of representing the proportion of observations taken at any point $x$ by $\xi(x)$, where $\xi$ is a probability measure on $\lbrack -1, 1\rbrack$. 
To a first approximation (which is discussed in Section 2), we can ignore the fact that in what follows $N\xi$ can take only integer values. We consider three methods of attacking the problem of finding an optimum $\xi$:
A. The direct approach is to compute the variance of the best linear estimator of $a_3$ as a function of the values of the independent variable at which observations are taken or, equivalently, as a function of the moments of $\xi$. Denoting by $\mu_i$ the $i$th moment of $\xi$, and assuming $\xi$ to be concentrated entirely on more than three points (so that $a_3$ is estimable), we find easily that the reciprocal of this variance is proportional to $$\frac{\mu^2_5(\mu^2_1 - \mu_2) + 2\mu_5(\mu^2_2 \mu_3 + \mu_3 \mu_4 - \mu_1 \mu^2_3 - \mu_1 \mu_2 \mu_4) - \mu^3_4 + \mu^2_4(\mu^2_2 + 2\mu_1 \mu_3) - 3\mu_4 \mu_2 \mu^2_3 + \mu^4_3}{\mu_4(\mu_2 - \mu^2_1) - \mu^2_3 - \mu^3_2 + 2\mu_1 \mu_2 \mu_3} + \mu_6$$ in the case of cubic regression. The problem is to find a $\xi$ on $\lbrack -1, 1\rbrack$ which maximizes this expression. Thus, this direct approach leads to a calculation which appears quite formidable. This is true even if one uses the remark on symmetry of the next paragraph and restricts attention to symmetrical $\xi$, so that $\mu_i = 0$ for $i$ odd. For polynomials of higher degree or for regression functions which are not polynomials, the difficulties are greater.
B. The results of Section 2 yield the following approach to the problem: Let $c_0 + c_1x + c_2x^2$ be a best Chebyshev approximation to $x^3$ on $\lbrack -1, 1\rbrack$, i.e., such that the maximum over $\lbrack -1, 1\rbrack$ of $|x^3 - (c_0 + c_1x + c_2x^2)|$ is a minimum over all choices of the $c_i$, and suppose $B$ is the subset of $\lbrack -1, 1\rbrack$ where the maximum of this absolute value is taken on.
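For the cubic case this Chebyshev problem has a classical closed-form solution, sketched here as a worked equation. The symbol $T_3$ (the Chebyshev polynomial of the first kind of degree 3) does not appear in the abstract and is introduced only for illustration: $$T_3(x) = 4x^3 - 3x, \qquad x^3 - \tfrac{3}{4}x = \tfrac{1}{4}T_3(x), \qquad \max_{x \in \lbrack -1, 1\rbrack} \left|x^3 - \tfrac{3}{4}x\right| = \tfrac{1}{4},$$ so that one may take $c_0 = c_2 = 0$ and $c_1 = \tfrac{3}{4}$. The residual $x^3 - \tfrac{3}{4}x$ takes the values $-\tfrac{1}{4}, \tfrac{1}{4}, -\tfrac{1}{4}, \tfrac{1}{4}$ at $x = -1, -\tfrac{1}{2}, \tfrac{1}{2}, 1$, which gives $B = \{-1, -\tfrac{1}{2}, \tfrac{1}{2}, 1\}$ in this example.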
Then $\xi$ must give measure one to $B$, and the weights assigned by $\xi$ to the various points of $B$ (there are four in this case) can be found either by solving the linear equations (2.10) or by computing these weights so as to make $\xi$ a maximum strategy for the game discussed in Section 2. Two points should be mentioned: (1) In the general polynomial case, where there are $k$ parameters ($k = 4$ here), the results described in [10], p. 42, or in Section 2 below imply that there is an optimum $\xi$ concentrated on at most $k$ points. Thus, even if we use this result with the approach of the previous paragraph, we obtain the following comparison in a $k$-parameter problem in Section 2: Method A: minimize a nonlinear function of $2k - 1$ real variables. Method B: solve the Chebyshev problem and then solve $k - 1$ simultaneous linear equations. The fact that the solution of the Chebyshev problem can often be found in the literature (e.g., [2]) makes the comparison of the second method with the first all the more favorable. (2) Although the computational difficulty cannot in general be reduced further, in the case of polynomial regression on $\lbrack -1, 1\rbrack$ there is present a kind of symmetry (discussed in Section 2) which implies that there is an optimum $\xi$ which is symmetrical about 0 and which is concentrated on four points; thus, in the case of cubic regression, this fact reduces the computation under Method A to a minimization in 3 variables, but Method B involves only the solution of a single linear equation.
C. A third method, which rests on the game-theoretic results of Section 2, and which is especially useful when one has a reasonable guess of what an optimum $\xi$ is, involves the following steps: first guess a $\xi$, say $\xi^{\ast}$, and compute the minimum on the left side of (2.8); second, if this minimum is achieved for $c = c^{\ast}$, compute the square of the maximum on the right side of (2.9); then, if these two computations yield the same number, $\xi^{\ast}$ is optimum. If one has a guess of a class of $\xi$'s depending on one or several parameters, among which it is thought that there is an optimum $\xi$, then one can maximize over that class at the end of the first step and, the maximum being at $\xi^{\ast}$, go through the same analysis as above. This method is illustrated in Example 3.5 and Example 4. Of course, the remarks (1) and (2) of the previous paragraph can be used in applying Method C, as in these examples.
In the example of cubic regression just cited, the optimum procedure turns out to be $\xi(-1) = \xi(1) = \frac{1}{6}, \xi(\frac{1}{2}) = \xi(-\frac{1}{2}) = \frac{1}{3}$. It is striking that any of the commonly used procedures which take equal numbers of observations at equally spaced points on $\lbrack -1, 1\rbrack$ requires over 38% more observations than this optimum procedure in order to yield the same variance for the best linear estimator of $a_3$ (see Example 3.1); the comparison is even more striking for higher degree regression. The unique optimum procedure in the case of degree $h$ is given by (3.3). The comparison of a direct computational attack, analogous to that of A above, with the methods developed in Sections 4 and 5 for the problems considered there, indicates even more the inferiority of the direct attack. In particular cases, e.g., Example 5.1, special methods may prove useful.
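The figures quoted for the cubic example can be checked numerically. The sketch below is an illustration, not the paper's own procedure. It assumes the standard least-squares identity that the reciprocal variance for $a_3$ (per observation, in units of the common variance) equals the ratio of the determinants of the $4 \times 4$ and $3 \times 3$ moment matrices of $\xi$. The helper names `moment`, `det`, `information_a3`, and `equally_spaced` are hypothetical.

```python
from fractions import Fraction


def moment(design, i):
    """i-th moment of a design given as (point, weight) pairs."""
    return sum(w * x**i for x, w in design)


def det(matrix):
    """Exact determinant by Gaussian elimination over Fractions."""
    a = [list(row) for row in matrix]
    n = len(a)
    result = Fraction(1)
    for col in range(n):
        # Find a row with a nonzero entry in this column to use as pivot.
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result  # a row swap flips the sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def information_a3(design):
    """Assumed identity: det(4x4 moment matrix) / det(3x3 moment matrix)."""
    mu = [moment(design, i) for i in range(7)]
    full = [[mu[i + j] for j in range(4)] for i in range(4)]
    lower = [[mu[i + j] for j in range(3)] for i in range(3)]
    return det(full) / det(lower)


def equally_spaced(n):
    """Equal weights 1/n on n equally spaced points of [-1, 1]."""
    return [(Fraction(-1) + Fraction(2 * k, n - 1), Fraction(1, n))
            for k in range(n)]


# The optimum design stated in the abstract for cubic regression.
optimum = [(Fraction(-1), Fraction(1, 6)), (Fraction(-1, 2), Fraction(1, 3)),
           (Fraction(1, 2), Fraction(1, 3)), (Fraction(1), Fraction(1, 6))]

info_opt = information_a3(optimum)
for n in range(4, 9):
    # Ratio of observations needed by the equally spaced design to match
    # the variance achieved by the optimum design.
    ratio = info_opt / information_a3(equally_spaced(n))
    print(n, float(ratio))
```

Under these assumptions the optimum design gives the value $\frac{1}{16}$, the square of the Chebyshev deviation $\frac{1}{4}$ of $x^3$ from quadratics. Among $n = 4, \ldots, 8$ equally spaced points the ratio is smallest at $n = 5$, where it equals $\frac{25}{18} \approx 1.389$, in line with the "over 38% more observations" stated above. The loop covers only this small range of $n$ and is not a proof for every equally spaced design.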
Among recent work in the design of experiments we may mention the papers of Elfving [3], [4], Chernoff [5], Williams [11], Ehrenfeld [12], Guest [13], and Hoel [15]. Only Guest and Hoel explicitly consider computational problems of the kind discussed below. Our methods of employing Chebyshev and game-theoretic results seem to be completely new. The results obtained in the examples below are also new, except for some slight overlap with results of [13] and [15], which is explicitly described below. We shall consider elsewhere some further problems of the type considered in this paper.

Citation

J. Kiefer, J. Wolfowitz. "Optimum Designs in Regression Problems." Ann. Math. Statist. 30 (2) 271 - 294, June, 1959. https://doi.org/10.1214/aoms/1177706252