## Abstract

Suppose that an experimenter is interested in determining the relationship between a response $\eta$ and several "independent" variables, $x_1, x_2,\cdots, x_K$. The $x$-variables may be controlled by the experimenter or observed without control. Suppose, further, that these $K$ "independent" variables represent all the factors that contribute to the response $\eta$, and that the exact relationship between $\eta$ and the $x$'s is \begin{equation*}\tag{1.1}\eta = \phi(\mathbf{x}),\end{equation*} where $\mathbf{x} = (x_1, x_2,\cdots, x_K)'$. The function $\phi(\mathbf{x})$ is called the response function and, geometrically, it defines a surface in $K$-space which is called the response surface. In the real world, however, we rarely know the exact relationship, or all the variables which affect that relationship. One way of proceeding, then, is to graduate, or approximate to, the true relationship by a polynomial function, linear in some unknown parameters to be estimated and of some selected order in the independent variables. Under the tentative assumption of the validity of this linear model (which we can justify on the basis of a Taylor expansion of $\phi$), we can perform experiments, fit the model using regression techniques, and then apply standard statistical procedures to determine whether this model appears adequate. Since in practice we do not know all of the $K$ factors which affect the response, we usually select $k (< K)$ variables which we believe might have significant effects. This selection may be made on the basis of prior knowledge, or a preliminary experiment may be performed to screen the important variables out of a larger set of possible variables. 
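The graduation step described above can be sketched numerically. In this illustration (not taken from the paper; the function $\phi$, the factor levels, and all numbers are invented), a hypothetical true response in $k = 2$ selected variables is approximated by a first order polynomial fitted by ordinary least squares:

```python
import numpy as np

# Illustrative sketch: graduating an unknown response function phi by a
# first order polynomial in k = 2 variables, fitted by least squares.
# The function phi below stands in for the "true" response, which the
# experimenter would not know in practice.

rng = np.random.default_rng(0)

def phi(x1, x2):
    # Hypothetical true response: nearly linear, with a small interaction.
    return 3.0 + 2.0 * x1 - 1.5 * x2 + 0.1 * x1 * x2

# N = 20 trials: a 2^2 factorial, replicated five times.
levels = np.array([[-1, -1], [1, -1], [-1, 1], [1, 1]], dtype=float)
X = np.tile(levels, (5, 1))

# Observed responses: true response plus observation error (source (i)).
y = phi(X[:, 0], X[:, 1]) + rng.normal(0.0, 0.1, len(X))

# Design matrix for the graduating model y = b0 + b1*x1 + b2*x2.
F = np.column_stack([np.ones(len(X)), X])
beta, *_ = np.linalg.lstsq(F, y, rcond=None)
print(beta)  # estimates of (beta_0, beta_1, beta_2)
```

Because the two-level factorial is orthogonal to the omitted interaction term, the first order estimates here come out close to the linear part of $\phi$; the neglected interaction is one of the model-inadequacy errors, source (ii) below.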
We can write our graduating model as follows: \begin{equation*}\tag{1.2}y_u = f(x_{1u}, x_{2u},\cdots, x_{ku}\mid\beta_0, \beta_1,\cdots, \beta_p) + \varepsilon_u,\end{equation*} where $y_u$ is the observed response at the $u$th trial, $x_{iu}$ is the setting of the $i$th input variable for the $u$th trial, $\beta_j$ is the $j$th parameter to be estimated, $\varepsilon_u$ is the error involved in observing $y$ on the $u$th trial, $u = 1, 2,\cdots, N$; $i = 1, 2,\cdots, k < K$; and $j = 0, 1, 2,\cdots, p$. The errors $\varepsilon_u$ arise in one or more of the following ways: (i) the true response $\eta$ may be observed with error; (ii) the function $f(x_{1u}, x_{2u},\cdots, x_{ku})$ may not be the correct model; (iii) the observations on the independent variables may contain errors.

Once an experimenter has chosen a polynomial model of suitable order, the problem arises as to how best to choose the settings for the independent variables over which he has control. A particular selection of settings, or factor levels, at which observations are to be taken is called a design. Designs are usually selected to satisfy some desirable criteria chosen by the experimenter. In this paper we consider the problem of design selection when there are errors in the factor levels as well as in the response. This problem has received little attention; in fact, it appears that only Box (1963) has tackled it. Box first gives conditions under which the variance of a linear combination of the observations, say $L = \sum \alpha_u y_u$, is estimated unbiasedly. Next, he shows that these conditions are satisfied by two-level factorial and fractional factorial designs for first order models.

In Section 2, we present a criterion for selecting designs for experiments in which errors occur in the factor levels. In Section 3 we make a simplifying assumption and an approximation which lead to an approximate criterion, and this is applied to the first order model case in Section 4.
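The situation the paper addresses can be mimicked in a small simulation. This is a sketch under invented values, not the paper's analysis: a replicated $2^2$ factorial design for a first order model in which the factor levels are set with error in addition to the error in observing the response, and the fit necessarily uses the nominal levels, the only ones the experimenter knows:

```python
import numpy as np

# Illustrative sketch (all numbers invented): errors in the factor levels
# as well as in the response, for a first order model on a 2^2 factorial.

rng = np.random.default_rng(1)
beta = np.array([3.0, 2.0, -1.5])  # hypothetical true coefficients

# Nominal factor levels: a 2^2 factorial replicated 50 times (N = 200).
nominal = np.tile([[-1.0, -1.0], [1.0, -1.0], [-1.0, 1.0], [1.0, 1.0]],
                  (50, 1))

# The levels actually attained deviate from the nominal settings.
actual = nominal + rng.normal(0.0, 0.05, nominal.shape)

# The response is generated at the actual levels and observed with error.
eta = beta[0] + actual @ beta[1:]
y = eta + rng.normal(0.0, 0.1, len(eta))

# The experimenter must fit using the nominal levels.
F = np.column_stack([np.ones(len(nominal)), nominal])
b_hat, *_ = np.linalg.lstsq(F, y, rcond=None)
print(b_hat)  # estimates of (beta_0, beta_1, beta_2)
```

In this setup the level errors inflate the effective error variance but, for a two-level factorial with fixed nominal settings, the first order estimates remain close to the true coefficients, in the spirit of Box's (1963) result that such designs satisfy his unbiasedness conditions for first order models.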

## Citation

Norman R. Draper and William J. Beggs. "Errors in the Factor Levels and Experimental Design." Ann. Math. Statist. 42 (1), 46-58, February 1971. https://doi.org/10.1214/aoms/1177693493
