Bayesian Analysis

On the measure of the information in a statistical experiment

Josep Ginebra

Full-text: Open access

Abstract

Setting aside experimental costs, the choice of an experiment is usually formulated as the maximization of a measure of information, often presented as an optimality design criterion. However, there does not seem to be universal agreement on which objects qualify as valid measures of the information in an experiment. In this article we explicitly state a minimal set of requirements that all such measures must satisfy. Under that framework, measuring the information in an experiment is equivalent to measuring the variability of its likelihood ratio statistics or, what is the same, to measuring the variability of its posterior to prior ratio statistics and the variability of the distribution of the posterior distributions that it yields. The larger that variability, the more peaked the likelihood functions and posterior distributions the experiment tends to yield, and the more informative the experiment is. By working through various measures of variability, this paper uncovers the unifying link underlying well-known information measures, as well as information measures not yet recognized as such.
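The equivalence described above can be checked numerically in a toy setting. The sketch below is illustrative only: it assumes a two-point parameter with a uniform prior and made-up Bernoulli success probabilities, and it compares a sharp experiment with a blunt one through the variance of the posterior-to-prior ratio and through mutual information.

```python
# Toy check: with a uniform prior on two hypotheses, the information in a
# binary experiment shows up as the variability (under the marginal) of its
# posterior-to-prior ratio. The Bernoulli probabilities are illustrative.
import math

def info_summaries(p0, p1, prior=0.5):
    """Variance of the posterior-to-prior ratio for theta=1, and mutual
    information, for X ~ Bernoulli(p_theta) with a two-point prior."""
    var, mi = 0.0, 0.0
    for x in (0, 1):
        f0 = p0 if x == 1 else 1 - p0          # likelihood under theta=0
        f1 = p1 if x == 1 else 1 - p1          # likelihood under theta=1
        m = (1 - prior) * f0 + prior * f1      # marginal (prior predictive)
        var += m * (f1 / m - 1.0) ** 2         # E_m[r] = 1, so this is Var_m(r)
        mi += (1 - prior) * f0 * math.log(f0 / m) + prior * f1 * math.log(f1 / m)
    return var, mi

var_a, mi_a = info_summaries(0.5, 0.9)   # sharper experiment
var_b, mi_b = info_summaries(0.5, 0.6)   # blunter experiment
print(var_a > var_b, mi_a > mi_b)        # prints: True True
```

Both summaries rank the sharper experiment as more informative, as the framework requires of any valid measure of the variability of the posterior-to-prior ratio.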

The measure of the information in an experiment is then related to the measure of the information in a given observation from it. In this framework, the choice of an experiment on statistical merit alone is posed as a decision problem in which the reward is a likelihood ratio or a posterior distribution, the utility function is convex, the utility of the reward is the information observed, and the expected utility is the information in the experiment. Finally, the information in an experiment is linked to the information and the uncertainty in a probability distribution, and we find that the measure of the information in an experiment is not always interpretable as the uncertainty in the prior minus the expected uncertainty in the posterior.
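The decision-theoretic reading above can be sketched in code. The following is a minimal illustration, not the paper's construction: it assumes a two-point parameter, treats the posterior-to-prior ratio as the reward, and evaluates the expected utility for three convex utilities. Experiment B is experiment A observed through extra symmetric noise (a garbling), so every convex utility should rank A at least as informative as B.

```python
# Sketch: the reward of an experiment is its posterior-to-prior ratio r,
# a convex utility u is applied to r, and E[u(r)] is the information in
# the experiment. Probabilities and noise level are made-up assumptions.
import math

def info(p0, p1, u, prior=0.5):
    """Expected utility of the posterior-to-prior ratio reward in a binary
    experiment with P(X=1|theta) = p_theta and a two-point prior."""
    total = 0.0
    for x in (0, 1):
        f0 = p0 if x == 1 else 1 - p0       # likelihood under theta=0
        f1 = p1 if x == 1 else 1 - p1       # likelihood under theta=1
        m = (1 - prior) * f0 + prior * f1   # marginal probability of x
        # utility of the realized reward, averaged over the prior
        total += m * ((1 - prior) * u(f0 / m) + prior * u(f1 / m))
    return total

def garble(p, eps):
    """Flip the binary outcome with probability eps (pure post-processing)."""
    return (1 - eps) * p + eps * (1 - p)

utilities = {
    "mutual information (u(r) = r log r)": lambda r: r * math.log(r),
    "chi-square type (u(r) = (r-1)^2)":    lambda r: (r - 1) ** 2,
    "total variation (u(r) = |r-1|)":      lambda r: abs(r - 1),
}
results = {}
for name, u in utilities.items():
    info_a = info(0.2, 0.8, u)                              # experiment A
    info_b = info(garble(0.2, 0.2), garble(0.8, 0.2), u)    # garbled copy B
    results[name] = (info_a, info_b)
    print(f"{name}: A={info_a:.4f}  B={info_b:.4f}")
```

All three convex utilities agree that the garbled experiment is less informative, a small instance of the sufficiency (Blackwell) ordering that any valid information measure must respect.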

Article information

Source
Bayesian Anal., Volume 2, Number 1 (2007), 167-211.

Dates
First available in Project Euclid: 22 June 2012

Permanent link to this document
https://projecteuclid.org/euclid.ba/1340390067

Digital Object Identifier
doi:10.1214/07-BA207

Mathematical Reviews number (MathSciNet)
MR2289927

Zentralblatt MATH identifier
1331.62056

Keywords
Convex ordering; design of experiments; divergence measure; Hellinger transform; likelihood ratio; measure of association; measure of diversity; measure of surprise; mutual information; optimal design; posterior to prior ratio; reference prior; location parameter; stochastic ordering; sufficiency; uncertainty; utility; value of information

Citation

Ginebra, Josep. On the measure of the information in a statistical experiment. Bayesian Anal. 2 (2007), no. 1, 167--211. doi:10.1214/07-BA207. https://projecteuclid.org/euclid.ba/1340390067



References

  • Ali, S. M., and Silvey, S. D. (1965). “Association between random variables and the dispersion of a Radon–Nikodym derivative.” Journal of the Royal Statistical Society, Series B, 27: 100-107.
  • Ali, S. M., and Silvey, S. D. (1966). “A general class of coefficients of divergence of one distribution from another.” Journal of the Royal Statistical Society, Series B, 28: 131-142.
  • Barnard, G. A. (1951). “The theory of information (with discussion).” Journal of the Royal Statistical Society, Series B, 13: 131-142.
  • ––- (1959). Discussion of “Optimum experimental designs,” by J. Kiefer. Journal of the Royal Statistical Society, Series B, 21: 311-312.
  • Barnard, G. A., Jenkins, G. M., and Winsten, C. B. (1962). “Likelihood inference and time series (with discussion).” Journal of the Royal Statistical Society, Series B, 24: 321-372.
  • Barndorff-Nielsen, O. (1978). Information and Exponential Families in Statistical Theory. New York: Wiley.
  • Barron, A. R. (1999). “Information-theoretic characterization of Bayes performance and the choice of priors in parametric and nonparametric problems.” In J. M. Bernardo, A. P. Dawid, and A. F. M. Smith (eds.), Bayesian Statistics 6, 27-52. Oxford: Oxford University Press.
  • Bassan, B., and Scarsini, M. (1991). “Convex orderings for stochastic processes.” Commentationes Mathematicae Universitatis Carolinae, 32: 115-118.
  • Basu, D. (1975). “Statistical information and likelihood (with discussion).” Sankhya A, 37: 1-71.
  • Bayarri, M. J., and Berger, J. O. (1999). “Quantifying surprise in the data and model verification.” In J. M. Bernardo, J. O. Berger, A. P. Dawid, and A. F. M. Smith (eds.), Bayesian Statistics 6, 53-82. Oxford: Oxford University Press.
  • Bernardo, J. M. (1979a). “Expected information as expected utility.” The Annals of Statistics, 7: 686-690.
  • ––- (1979b). “Reference posterior distributions for Bayesian inference (with discussion).” Journal of the Royal Statistical Society, Series B, 41: 113-147.
  • ––- (2005a). “Reference analysis.” In D. K. Dey, and C. R. Rao (eds.), Bayesian Thinking: Modeling and Computation, Handbook of Statistics 25, 17-90. Amsterdam: North Holland.
  • ––- (2005b). “Intrinsic credible regions: An objective Bayesian approach to interval estimation (with discussion).” Test, 14: 317-384.
  • Bernardo, J. M., and Smith, A. F. M. (1994). Bayesian Theory. New York: Wiley.
  • Birnbaum, A. (1962). “On the foundations of statistical inference (with discussion).” Journal of the American Statistical Association, 57: 269-326.
  • ––- (1969). “Concepts of statistical evidence.” In S. Morgenbesser, P. Suppes, and M. White (eds.), Science and Methodology, 112-143. New York: St. Martin's Press.
  • Blackwell, D. (1951). “Comparison of experiments.” In Proceedings of the 2nd Berkeley Symposium on Mathematical Statistics and Probability, 93-102. Berkeley: University of California Press.
  • ––- (1953). “Equivalent comparison of experiments.” Annals of Mathematical Statistics, 24: 265-272.
  • Blackwell, D., and Girshick, M. A. (1954). Theory of Games and Statistical Decisions. New York: Wiley.
  • Box, G. E. P. (1980). “Sampling and Bayes inference in scientific modelling and robustness.” Journal of the Royal Statistical Society, Series A, 143: 383-420.
  • Chaloner, K. (1984). “Optimal Bayesian experimental designs for linear models.” The Annals of Statistics, 12: 283-300.
  • Chaloner, K., and Verdinelli, I. (1995). “Bayesian experimental design: A review.” Statistical Science, 10: 273-304.
  • Chernoff, H. (1952). “A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations.” Annals of Mathematical Statistics, 23: 493-507.
  • Clarke, B. (1996). “Implications of reference priors for prior information and for sample size.” Journal of the American Statistical Association, 91: 173-184.
  • Cover, T. M., and Thomas, J. A. (1991). Elements of Information Theory. New York: Wiley.
  • Csiszár, I. (1963). “Eine informationstheoretische Ungleichung und ihre Anwendung auf den Beweis der Ergodizität von Markoffschen Ketten.” Publications of the Mathematical Institute of the Hungarian Academy of Sciences, 8: 85-108.
  • ––- (1967). “Information-type measures of difference of probability distributions, and indirect observations.” Studia Scientiarum Mathematicarum Hungarica, 2: 191-213.
  • Dawid, A. P. (1979). Discussion of “Reference posterior distributions for Bayesian inference,” by J. Bernardo. Journal of the Royal Statistical Society, Series B, 41: 132-133.
  • ––- (1998). “Coherent measures of discrepancy, uncertainty and dependence, with applications to Bayesian predictive experimental design.” Technical Report 139, Dept. of Statistical Science, University College, London.
  • Dawid, A. P., and Sebastiani, P. (1999). “Coherent dispersion criteria for optimal experimental design.” The Annals of Statistics, 27: 65-81.
  • DeGroot, M. H. (1962). “Uncertainty, information and sequential experiments.” Annals of Mathematical Statistics, 33: 404-419.
  • ––- (1970). Optimal Statistical Decisions. New York: McGraw-Hill.
  • ––- (1979). Discussion of “Reference posterior distributions for Bayesian inference,” by J. Bernardo. Journal of the Royal Statistical Society, Series B, 41: 135-136.
  • ––- (1984). “Changes in utility as information.” Theory and Decision, 17: 283-303.
  • Ebanks, B., Sahoo, P., and Sander, W. (1998). Characterization of Information Measures. Singapore: World Scientific Press.
  • Fazekas, I., and Liese, F. (1996). “Some properties of the Hellinger transform and its application in classification problems.” Computers and Mathematics with Applications, 31: 107-116.
  • Fisher, R. A. (1922). “On the mathematical foundations of theoretical statistics.” Philosophical Transactions of the Royal Society, A 222: 309-368.
  • Goel, P. K. (1983). “Information measures and Bayesian hierarchical models.” Journal of the American Statistical Association, 78: 408-410.
  • ––- (1988). “Comparison of experiments and information in censored data.” In S. Gupta, and J. Berger (eds.), Statistical Decision Theory and Related Topics IV (Vol. 2), 335-349. New York: Springer Verlag.
  • Goel, P. K., and DeGroot, M. H. (1979). “Comparison of experiments and information measures.” The Annals of Statistics, 7: 1066-1077.
  • ––- (1981). “Information about hyperparameters in hierarchical models.” Journal of the American Statistical Association, 76: 140-147.
  • Goel, P. K., and Ginebra, J. (2003). “When is one experiment ‘always better than’ another?” Journal of the Royal Statistical Society, Series D, 52: 515-537.
  • Goel, P. K., and Padilla, M. L. R. (1994). “Generalized Hellinger transforms as information measures.” In ASA Proceedings of the Bayesian Statistical Science Section, 78-83.
  • Gollier, C. (2001). The Economics of Risk and Time. Cambridge: The MIT Press.
  • González, E., and Ginebra, J. (2001). “Bayesian heuristic for multiperiod control.” Journal of the American Statistical Association, 96: 1113-1121.
  • Good, I. J. (1950). Probability and the Weighing of Evidence. New York: Hafner Publisher.
  • ––- (1960). “Weight of evidence, corroboration, explanatory power, information and the utility of experiments.” Journal of the Royal Statistical Society, Series B, 22: 319-331.
  • ––- (1966). “A derivation of the probabilistic explicata of information.” Journal of the Royal Statistical Society, Series B, 28: 578-581.
  • ––- (1979). “Studies in the history of probability and statistics. A. M. Turing's statistical work in World War II.” Biometrika, 66: 393-396.
  • ––- (1985). “Weight of evidence: A brief survey.” In J. M. Bernardo, M. H. DeGroot, D. V. Lindley, and A. F. M. Smith (eds.), Bayesian Statistics 2, 249-270. Amsterdam: Elsevier Science Publisher.
  • Greenshtein, E. (1996). “Comparison of sequential experiments.” The Annals of Statistics, 24: 436-448.
  • Hansen, O. H., and Torgersen, E. N. (1974). “Comparison of linear normal experiments.” The Annals of Statistics, 2: 367-373.
  • Haussler, D., and Opper, M. (1997). “Mutual information, metric entropy and cumulative relative entropy risk.” The Annals of Statistics, 25: 2451-2492.
  • Heyer, H. (1982). Theory of Statistical Experiments. New York: Springer Verlag.
  • Kiefer, J. (1959). “Optimum experimental designs (with discussion).” Journal of the Royal Statistical Society, Series B, 21: 272-319.
  • Kullback, S. (1959). Information Theory and Statistics. New York: Wiley.
  • Le Cam, L. (1964). “Sufficiency and approximate sufficiency.” Annals of Mathematical Statistics, 35: 1419-1455.
  • ––- (1975). “Distances between experiments.” In A Survey of Statistical Design and Linear Models, 383-396. Amsterdam: North Holland.
  • ––- (1986). Asymptotic Methods in Statistical Decision Theory. New York: Springer Verlag.
  • ––- (1996). “Comparison of experiments: A short review.” In T. S. Ferguson, L. S. Shapley, and J. B. MacQueen (eds.), Statistics, Probability and Game Theory, 127-138. IMS Lecture Notes-Monograph Series, Vol. 30.
  • Le Cam, L., and Yang, G. L. (2000). Asymptotics in Statistics: Some Basic Concepts (2nd ed.). New York: Springer Verlag.
  • Lehmann, E. L. (1959, 1986). Testing Statistical Hypotheses (1st and 2nd ed.). New York: Wiley.
  • ––- (1983). Theory of Point Estimation. New York: Wiley.
  • ––- (1988). “Comparing location experiments.” The Annals of Statistics, 16: 521-533.
  • Lindley, D. V. (1956). “On a measure of the information provided by an experiment.” Annals of Mathematical Statistics, 27: 986-1005.
  • ––- (1961). “Dynamic programming and decision theory.” Applied Statistics, 10: 39-51.
  • ––- (1972). Bayesian Statistics, a Review. Philadelphia: SIAM.
  • ––- (2000). “The philosophy of statistics (with discussion).” The Statistician, 49: 293-337.
  • Papaioannou, T. (2001). “On distances and measures of information: A case of diversity.” In Ch. A. Charalambides, M. V. Koutras, and N. Balakrishnan (eds.), Probability and Statistical Models with Applications, 503-515. Boca Raton: Chapman and Hall.
  • Pukelsheim, F. (1993). Optimal Design of Experiments. New York: Wiley.
  • Raiffa, H., and Schlaifer, R. O. (1961). Applied Statistical Decision Theory. Cambridge: M.I.T. Press.
  • Rao, C. R. (1982). “Analysis of diversity: A unified approach.” In S. Gupta, and J. Berger (eds.), Statistical Decision Theory and Related Topics III (Vol. 2), 233-250. New York: Academic Press.
  • Rényi, A. (1961). “On measures of entropy and information.” In Proceedings of the 4th Berkeley Symposium on Mathematical Statistics and Probability, 547-561. Berkeley: University of California Press.
  • ––- (1967a). “Statistics and information theory.” Studia Scientiarum Mathematicarum Hungarica, 2: 249-256.
  • ––- (1967b). “On some basic problems of statistics from the point of view of information theory.” In Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, 531-543. Berkeley: University of California Press.
  • Rockafellar, R. T. (1970). Convex Analysis. Princeton, NJ: Princeton University Press.
  • Savage, L. J. (1954). The Foundations of Statistics. New York: Wiley.
  • Sebastiani, P., and Wynn, H. P. (2000). “Maximum entropy sampling and optimal Bayesian experimental design.” Journal of the Royal Statistical Society, Series B, 62: 145-157.
  • Shaked, M., and Shanthikumar, J. G. (1994). Stochastic Orders and their Applications. New York: Academic Press.
  • Shannon, C. E. (1948). “A mathematical theory of communication.” Bell System Technical Journal, 27: 379-423, 623-656.
  • Sherman, S. (1951). “On a theorem of Hardy, Littlewood, Pólya and Blackwell.” Proceedings of the National Academy of Sciences, 37: 826-831.
  • Shiryaev, A. N., and Spokoiny, V. G. (2000). Statistical Experiments and Decisions: Asymptotic Theory. Singapore: World Scientific Press.
  • Shore, J. E., and Johnson, R. W. (1980). “Axiomatic derivation of the principle of maximum entropy and the principle of minimum cross-entropy.” IEEE Transactions on Information Theory, 26: 26-37.
  • Silvey, S. D. (1964). “On a measure of association.” Annals of Mathematical Statistics, 35: 1157-1166.
  • Soofi, E. (2000). “Principal information theoretic approaches.” Journal of the American Statistical Association, 95: 1349-1353.
  • Stein, C. (1951). “Notes on a seminar on theoretical statistics: Comparison of experiments.” Unpublished report.
  • Stone, M. (1961). “Non-equivalent comparison of experiments and their use for experiments involving location parameters.” Annals of Mathematical Statistics, 32: 326-332.
  • Strasser, H. (1985). Mathematical Theory of Statistics: Statistical Experiments and Asymptotic Decision Theory. New York: Walter de Gruyter.
  • Torgersen, E. N. (1976). “Comparison of statistical experiments (with discussion).” Scandinavian Journal of Statistics, 3: 186-208.
  • ––- (1991a). Comparison of Experiments. Cambridge: Cambridge University Press.
  • ––- (1991b). “Stochastic orders and comparison of experiments.” In K. Mosler, and M. Scarsini (eds.), Stochastic Orders and Decision under Risk, 334-371. IMS Lecture Notes-Monograph Series, Vol. 19.
  • ––- (1994). “Information orderings and stochastic orderings.” In Stochastic Orders and their Applications, 275-319. New York: Academic Press.
  • Vajda, I. (1989). Theory of Statistical Inference and Information. Dordrecht: Kluwer.
  • Verdinelli, I., and Kadane, J. B. (1992). “Bayesian designs for maximizing information and output.” Journal of the American Statistical Association, 87: 510-515.
  • Wald, A. (1950). Statistical Decision Functions. New York: Wiley.
  • Yuan, A., and Clarke, B. (1999). “A minimally informative likelihood for decision analysis: Robustness and illustration.” Canadian Journal of Statistics, 27: 649-665.