In the usual formulation of problems in statistical decision theory, the probability distribution of the observations is assumed to be a member of some specified class of distribution functions. No a priori information is ordinarily assumed to exist concerning which member of this class is the true distribution of the observations, although a priori probability measures defined over this class may be introduced as a technical device for generating complete classes of decision functions, minimax decision rules, etc. However, in some experimental situations it may be reasonable to suppose that such an a priori probability measure actually exists, in the sense that the distributions of observations occurring in different experiments made under similar circumstances may be thought of as having been selected from a specified class of distribution functions according to some probability law. Such an assumption seems particularly apt when measurements are made on an individual selected according to some probability law (e.g., "at random") from a population, and it is desired to make inferences about some characteristic of the individual on the basis of these measurements. If the class of probability distributions of the measurements for all individuals in the population and the law of selection are both known, an optimum Bayes decision procedure can be found. In general, however, such information will not be available to the experimenter; but there may be observations available on individuals previously selected in the same way from the same population, and, under certain circumstances, these prior observations may be used to obtain approximations to the optimum Bayes decision procedure. The possibility of using prior observations to approximate Bayes procedures was first established for certain estimation problems by H. Robbins, who coined the term "empirical Bayes procedures" to describe such approximations.
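To fix ideas, the optimum procedure referred to above may be written in the usual Bayes form; the notation below ($G$ for the law of selection, $L$ for the loss function, $\delta$ for a decision procedure) is introduced here only as an illustrative sketch, not taken from the original text:

```latex
% Bayes criterion when the a priori measure G is known (illustrative notation):
r(G,\delta) \;=\; \int R(\lambda,\delta)\, dG(\lambda) \;=\; E\,L\bigl(\Lambda,\delta(X)\bigr),
\qquad
r(G) \;=\; \inf_{\delta}\, r(G,\delta) \;=\; r(G,\delta_G),
```

where $R(\lambda,\delta)$ denotes the ordinary risk of $\delta$ when the observations have the distribution indexed by $\lambda$, and $\delta_G$ is the optimum Bayes procedure attaining the infimum. The empirical Bayes problem is then to approximate $\delta_G$ when $G$ is unknown but prior observations are available.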
Robbins discusses the estimation, using a squared error loss function, of the value $\lambda$ of a random variable $\Lambda$ associated with a discrete-valued observation $X$ whose conditional probability function, given $\lambda$, is $p(x \mid \lambda)$, where $p(x \mid \lambda)$ is known for each $\lambda$ but the (a priori) distribution of $\Lambda$ is unknown. For several specific parametric families of discrete probability functions $p(x \mid \lambda)$, Robbins shows that if prior independent observations $X_1, X_2, \cdots, X_n$, each having the same unconditional distribution as $X$, are available, then an empirical Bayes estimator using $X_1, X_2, \cdots, X_n$ can be found which converges with probability 1 to the Bayes estimator as $n$ increases, for any a priori distribution of $\Lambda$. In Sec. 2 below a similar estimation problem is considered for the "non-parametric" case, where the class of (conditional) probability distributions of $X$ is not restricted to a particular parametric family but is instead the class of all probability functions assigning probability 1 to some specified denumerable set of numbers. The quantity to be estimated is the value of a functional defined on this class of probability functions, and it is assumed that there exists an unknown a priori probability measure defined on a suitable $\sigma$-algebra of subsets of this class. For this case it is shown that, under certain circumstances, prior observations may be used to construct empirical Bayes estimators whose risks converge, as the number of prior observations increases, to the risk of the Bayes estimator for any a priori probability measure, provided that certain moments exist. The rate of convergence of these risks is also investigated for two special cases. In Sec. 3 the techniques of Sec. 2 are modified to apply to the case where the class of (conditional) distributions of $X$ is the class of all absolutely continuous distribution functions, and similar results are obtained. In Sec. 4 the results of the previous sections are used to obtain empirical Bayes solutions for certain two-decision problems of the hypothesis-testing type. Throughout this paper certain elementary properties of conditional expectations are used which are immediate consequences of results contained, for example, in Chapter VII of .
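The parametric scheme of Robbins described above can be made concrete with the Poisson family, a standard example of this setup: under squared error loss the Bayes estimator is $E(\Lambda \mid X = x) = (x+1)\,p(x+1)/p(x)$, where $p$ is the unconditional probability function of $X$, and replacing $p$ by the empirical frequencies of the prior observations yields an empirical Bayes estimator. A minimal numerical sketch follows; the gamma prior, sample size, and all variable names are illustrative assumptions, not from the text:

```python
import math
import random
from collections import Counter

random.seed(0)

def draw_x():
    """One draw from the two-stage scheme: Lambda ~ Gamma(shape 2, scale 1)
    (an illustrative prior), then X | Lambda = lam ~ Poisson(lam)."""
    lam = random.gammavariate(2.0, 1.0)
    # Poisson draw by CDF inversion (adequate for small means).
    u, k, p = random.random(), 0, math.exp(-lam)
    cdf = p
    while u > cdf:
        k += 1
        p *= lam / k
        cdf += p
    return k

n = 200_000
counts = Counter(draw_x() for _ in range(n))  # empirical frequencies of X_1,...,X_n

def empirical_bayes(x):
    """Robbins' estimator: (x+1) * #{i: X_i = x+1} / #{i: X_i = x}."""
    if counts[x] == 0:
        return 0.0  # a convention for values of x never observed
    return (x + 1) * counts[x + 1] / counts[x]

# For the Gamma(2, 1) prior the posterior of Lambda given X = x is
# Gamma(x + 2, scale 1/2), so the true Bayes estimate is (x + 2) / 2;
# the empirical values should be close for large n.
for x in range(4):
    print(x, round(empirical_bayes(x), 3), (x + 2) / 2)
```

The estimator uses only the prior observations and the known family $p(x \mid \lambda)$, never the prior itself, which is the essential point: the same code approximates the Bayes estimator whatever the a priori distribution of $\Lambda$ happens to be.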
"Non-Parametric Empirical Bayes Procedures." Ann. Math. Statist. 28(3): 649–669, September 1957. https://doi.org/10.1214/aoms/1177706877