A probability assessor, or forecaster, is a person who assigns subjective probabilities to events that will eventually either occur or not occur. There are two purposes for which one might wish to compare two forecasters: to see who has given better forecasts in the past, and to decide who will give better forecasts in the future. A method of comparison suitable for the first purpose may not be suitable for the second, and vice versa. A criterion called calibration has been suggested for comparing the forecasts of different forecasters. Calibration, in a frequency sense, is a function of long-run (future) properties of forecasts and hence is not suitable for making comparisons in the present. One method for comparing forecasters based on past performance is the use of scoring rules. In this paper, a general method for comparing forecasters after a finite number of trials is introduced. The general method is proven to include the calculation of every proper scoring rule as a special case, as well as the comparison of forecasters in every simple two-decision problem. The relationship between the general method and calibration is explored, and the general method is translated into a method for deciding who will give better forecasts in the future. An example is given using weather forecasts.
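To make the idea of comparing forecasters by scoring rules concrete, the following minimal Python sketch scores two forecasters on the same finite set of trials using two well-known proper scoring rules, the Brier (quadratic) score and the logarithmic score. It does not reproduce the paper's general method; it only illustrates the kind of special case the abstract refers to. The function names, forecast values, and outcomes are hypothetical and chosen purely for illustration.

```python
import math

def brier_score(probs, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes.

    Lower is better. This is one standard proper scoring rule.
    """
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(outcomes)

def log_score(probs, outcomes):
    """Mean negative log of the probability assigned to what actually happened.

    Lower is better. Requires every forecast to satisfy 0 < p < 1.
    """
    total = 0.0
    for p, y in zip(probs, outcomes):
        # Probability the forecaster gave to the realized outcome.
        assigned = p if y == 1 else 1.0 - p
        total += math.log(assigned)
    return -total / len(outcomes)

# Hypothetical data: 1 means the event occurred, 0 means it did not.
outcomes = [1, 0, 1, 1, 0]
forecaster_a = [0.8, 0.3, 0.7, 0.9, 0.2]  # illustrative forecasts only
forecaster_b = [0.6, 0.5, 0.5, 0.6, 0.4]  # illustrative forecasts only

for name, probs in [("A", forecaster_a), ("B", forecaster_b)]:
    print(name, brier_score(probs, outcomes), log_score(probs, outcomes))
```

On this made-up data, forecaster A receives the lower (better) score under both rules. In general, two proper scoring rules need not rank a pair of forecasters the same way, which is one reason a method that covers all such rules at once may be of interest.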
"A General Method for Comparing Probability Assessors." Ann. Statist. 17 (4) 1856 - 1879, December, 1989. https://doi.org/10.1214/aos/1176347398