Statistical Science

On the Sample Information About Parameter and Prediction

Nader Ebrahimi, Ehsan S. Soofi, and Refik Soyer

Abstract

The Bayesian measure of sample information about the parameter, known as Lindley’s measure, is widely used in problems such as developing prior distributions, models for the likelihood function, and optimal designs. The predictive information is defined similarly and used for model selection and optimal designs, though to a lesser extent. The parameter and predictive information measures are proper utility functions and have also been used in combination. Yet the relationship between the two measures, and the effects of conditional dependence between the observable quantities on the Bayesian information measures, remain unexplored. We address both issues. The relationship between the two information measures is explored through the information provided by the sample about the parameter and prediction jointly. The role of dependence is explored along with the interplay between the information measures, prior and sampling design. For a conditionally independent sequence of observable quantities, decompositions of the joint information characterize Lindley’s measure as the sample information about the parameter and prediction jointly, and the predictive information as a part of it. For the conditionally dependent case, the joint information about parameter and prediction exceeds Lindley’s measure by an amount due to the dependence. More specific results are shown for the normal linear models and a broad subfamily of the exponential family. Conditionally independent samples provide relatively little information for prediction, and the gap between the parameter and predictive information measures grows rapidly with the sample size. Three dependence structures are studied: the intraclass (IC) and serially correlated (SC) normal models, and order statistics. For the IC and SC models, the information about the mean parameter decreases and the predictive information increases with the correlation, but the joint information is not monotone and has a unique minimum. Compensating for the loss of parameter information due to dependence requires larger samples. For the order statistics, the joint information exceeds Lindley’s measure by an amount which does not depend on the prior or the model for the data, but it is not monotone in the sample size and has a unique maximum.
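The decomposition described in the abstract can be made concrete with the chain rule for mutual information. In our own notation (not necessarily the paper's), write X for the sample, \theta for the parameter and Y for a future observation; Lindley's measure is I(X; \theta) and the predictive information is I(X; Y). The chain rule gives

    I(X; \theta, Y) = I(X; \theta) + I(X; Y \mid \theta), \qquad I(X; Y) \le I(X; \theta, Y).

When Y and X are conditionally independent given \theta, the term I(X; Y | \theta) vanishes, so the joint information about parameter and prediction equals Lindley's measure; under conditional dependence the joint information exceeds Lindley's measure by exactly I(X; Y | \theta), as the abstract states.

The gap between the two measures is easy to see in the conjugate normal model with known variance. The following minimal Python sketch is our own illustration, not code from the paper; the closed forms follow from Gaussian entropies for X_i | theta ~ N(theta, sigma2) with prior theta ~ N(mu0, tau2):

    import numpy as np

    def lindley_information(n, sigma2, tau2):
        """Lindley's measure I(theta; X_1..X_n), in nats:
        0.5 * log(prior variance / posterior variance)."""
        return 0.5 * np.log(1.0 + n * tau2 / sigma2)

    def predictive_information(n, sigma2, tau2):
        """Predictive information I(Y; X_1..X_n) about one future Y,
        via the Gaussian identity 0.5 * log(Var(Y) / Var(Y | X))."""
        tau2_n = 1.0 / (1.0 / tau2 + n / sigma2)  # posterior variance of theta
        return 0.5 * np.log((sigma2 + tau2) / (sigma2 + tau2_n))

    for n in (1, 5, 25, 125):
        ip = lindley_information(n, sigma2=1.0, tau2=1.0)
        iy = predictive_information(n, sigma2=1.0, tau2=1.0)
        print(f"n={n:4d}  I(theta;X)={ip:.3f}  I(Y;X)={iy:.3f}  gap={ip - iy:.3f}")

Here I(theta; X) grows without bound, roughly like (1/2) log n, while I(Y; X) is bounded above by (1/2) log((sigma2 + tau2)/sigma2), so the gap widens with the sample size, consistent with the abstract's claim for conditionally independent samples.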

Article information

Source
Statist. Sci., Volume 25, Number 3 (2010), 348–367.

Dates
First available in Project Euclid: 4 January 2011

Permanent link to this document
https://projecteuclid.org/euclid.ss/1294167964

Digital Object Identifier
doi:10.1214/10-STS329

Mathematical Reviews number (MathSciNet)
MR2791672

Zentralblatt MATH identifier
1329.62046

Keywords
Bayesian predictive distribution; entropy; mutual information; optimal design; reference prior; intraclass correlation; serial correlation; order statistics

Citation

Ebrahimi, Nader; Soofi, Ehsan S.; Soyer, Refik. On the Sample Information About Parameter and Prediction. Statist. Sci. 25 (2010), no. 3, 348–367. doi:10.1214/10-STS329. https://projecteuclid.org/euclid.ss/1294167964


References

  • Abramowitz, M. and Stegun, I. A. (1970). Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables. Dover, New York.
  • Aitchison, J. (1975). Goodness of prediction fit. Biometrika 62 547–554.
  • Amaral-Turkman, M. A. and Dunsmore, I. (1985). Measures of information in the predictive distribution. In Bayesian Statistics (J. M. Bernardo, M. H. DeGroot, D. V. Lindley and A. F. M. Smith, eds.) 2 603–612. North-Holland, Amsterdam.
  • Arnold, B., Balakrishnan, N. and Nagaraja, H. N. (1992). A First Course in Order Statistics. Wiley, New York.
  • Barlow, R. E. and Hsiung, J. H. (1983). Expected information from a life test experiment. The Statistician 32 35–45.
  • Belsley, D. A. (1991). Conditioning Diagnostics: Collinearity and Weak Data in Regression. Wiley, New York.
  • Bernardo, J. M. (1979a). Expected information as expected utility. Ann. Statist. 7 686–690.
  • Bernardo, J. M. (1979b). Reference posterior distributions for Bayesian inference (with discussion). J. Roy. Statist. Soc. Ser. B 41 113–147.
  • Bernardo, J. M. (2005). Reference analysis. In Handbook of Statistics (D. K. Dey and C. R. Rao, eds.) 25 17–90. Elsevier, Amsterdam.
  • Bernardo, J. M. and Rueda, R. (2002). Bayesian hypothesis testing: A reference approach. Int. Statist. Rev. 70 351–372.
  • Brooks, R. J. (1980). On the relative efficiency of two paired-data experiments. J. Roy. Statist. Soc. Ser. B 42 186–191.
  • Brooks, R. J. (1982). On loss of information through censoring. Biometrika 69 137–144.
  • Brown, L. D., George, E. I. and Xu, X. (2008). Admissible predictive density estimation. Ann. Statist. 36 1156–1170.
  • Carlin, B. P. and Polson, N. G. (1991). An expected utility approach to influence diagnostics. J. Amer. Statist. Assoc. 86 1013–1021.
  • Carota, C., Parmigiani, G. and Polson, N. G. (1996). Diagnostic measures for model criticism. J. Amer. Statist. Assoc. 91 753–762.
  • Chaloner, K. and Verdinelli, I. (1995). Bayesian experimental design: A review. Statist. Sci. 10 273–304.
  • Cover, T. M. and Thomas, J. A. (1991). Elements of Information Theory. Wiley, New York.
  • Ebrahimi, N. (1992). Prediction intervals for future failures in the exponential distribution under hybrid censoring. IEEE Trans. Reliability 41 127–132.
  • Ebrahimi, N. and Soofi, E. S. (1990). Relative information loss under type II censored exponential data. Biometrika 77 429–435.
  • Ebrahimi, N., Soofi, E. S. and Soyer, R. (2010). Information measures in perspective. Int. Statist. Rev. 78 doi: 10.1111/j.1751-5823.2010.00105.x. To appear.
  • Ebrahimi, N., Soofi, E. S. and Zahedi, H. (2004). Information properties of order statistics and spacings. IEEE Trans. Inform. Theory 50 177–183.
  • El-Sayyed, G. M. (1969). Information and sampling from the exponential distribution. Technometrics 11 41–45.
  • Geisser, S. (1993). Predictive Inference: An Introduction. Chapman and Hall, New York.
  • Good, I. J. (1971). Discussion of article by R. J. Buehler. In Foundations of Statistical Inference (V. P. Godambe and D. A. Sprott, eds.) 337–339. Holt, Rinehart and Winston, Toronto, ON.
  • Ibrahim, J. G. and Chen, M. H. (2000). Power prior distributions for regression models. Statist. Sci. 15 46–60.
  • Kaminsky, K. S. and Rhodin, L. S. (1985). Maximum likelihood prediction. Ann. Inst. Statist. Math. 37 505–517.
  • Keyes, T. K. and Levy, M. S. (1996). Goodness of prediction fit for multivariate linear models. J. Amer. Statist. Assoc. 91 191–197.
  • Kullback, S. (1959). Information Theory and Statistics. Wiley, New York (reprinted in 1968 by Dover).
  • Lawless, J. F. (1971). A prediction problem concerning samples from the exponential distribution with application in life testing. Technometrics 13 725–730.
  • Lindley, D. V. (1956). On a measure of information provided by an experiment. Ann. Math. Statist. 27 986–1005.
  • Lindley, D. V. (1957). Binomial sampling schemes and the concept of information. Biometrika 44 179–186.
  • Lindley, D. V. (1961). The use of prior probability distributions in statistical inference and decision. In Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability (J. Neyman, ed.) 1 436–468. Univ. California Press, Berkeley, CA.
  • Maruyama, Y. and George, E. I. (2010). Fully Bayes model selection with a generalized g-prior. Working paper. Univ. Pennsylvania.
  • Müller, P. and Quintana, F. A. (2004). Nonparametric Bayesian data analysis. Statist. Sci. 19 95–110.
  • Nicolae, D. L., Meng, X.-L. and Kong, A. (2008). Quantifying the fraction of missing information for hypothesis testing in statistical and genetic studies (with discussion). Statist. Sci. 23 287–331.
  • Parmigiani, G. and Berry, D. A. (1994). Applications of Lindley information measures to the design of clinical experiments. In Aspects of Uncertainty: A Tribute to D. V. Lindley (P. R. Freeman and A. F. M. Smith, eds.) 329–348. Wiley, Chichester, UK.
  • Polson, N. G. (1992). On the expected amount of information from a nonlinear model. J. Roy. Statist. Soc. Ser. B 54 889–895.
  • Polson, N. G. (1993). A Bayesian perspective on the design of accelerated life tests. In Advances in Reliability (A. P. Basu, ed.) 321–330. North-Holland, Amsterdam.
  • Pourahmadi, M. and Soofi, E. S. (2000). Predictive variance and information worth of observations in time series. J. Time Ser. Anal. 21 413–434.
  • San Martini, A. and Spezzaferri, F. (1984). A predictive model selection criterion. J. Roy. Statist. Soc. Ser. B 46 296–303.
  • Shannon, C. E. (1948). A mathematical theory of communication. Bell Syst. Tech. J. 27 379–423.
  • Singpurwalla, N. D. (1996). Entropy and information in reliability. In Bayesian Analysis of Statistics and Econometrics: Essays in Honor of Arnold Zellner (D. Berry, K. Chaloner and J. Geweke, eds.) 459–469. Wiley, New York.
  • Smith, A. F. M. and Verdinelli, I. (1980). A note on Bayesian design for inference using a hierarchical linear model. Biometrika 67 613–619.
  • Soofi, E. S. (1988). Principal component regression under exchangeability. Comm. Statist. Theory and Methods A17 1717–1733.
  • Soofi, E. S. (1990). Effects of collinearity on information about regression coefficients. J. Econometrics 43 255–274.
  • Stewart, G. W. (1987). On collinearity and least squares regression (with discussion). Statist. Sci. 2 68–100.
  • Stone, M. (1959). Application of a measure of information to the design and comparison of regression experiments. Ann. Math. Statist. 30 55–70.
  • Turrero, A. (1989). On the relative efficiency of grouped and censored survival data. Biometrika 76 125–131.
  • Verdinelli, I. (1992). Advances in Bayesian experimental design. In Bayesian Statistics (J. O. Berger, J. M. Bernardo, A. P. Dawid and A. F. M. Smith, eds.) 4 467–481. Oxford Univ. Press.
  • Verdinelli, I. and Kadane, J. B. (1992). Bayesian designs for maximizing information and outcome. J. Amer. Statist. Assoc. 87 510–515.
  • Verdinelli, I., Polson, N. G. and Singpurwalla, N. D. (1993). Shannon information and Bayesian design for prediction in accelerated life-testing. In Reliability and Decision Making (R. E. Barlow, C. A. Clarotti and F. Spizzichino, eds.) 247–256. Chapman and Hall, London.
  • West, M. (2003). Bayesian factor regression models in the “large p, small n” paradigm. In Bayesian Statistics (J. M. Bernardo, M. J. Bayarri, J. O. Berger, A. P. Dawid, D. Heckerman, A. F. M. Smith and M. West, eds.) 7 723–732. Oxford Univ. Press.
  • Yuan, A. and Clarke, B. (1999). An information criterion for likelihood selection. IEEE Trans. Inform. Theory 45 562–571.
  • Zellner, A. (1977). Maximal data information prior distributions. In New Developments in the Applications of Bayesian Methods (A. Aykac and C. Brumat, eds.) 211–232. North-Holland, Amsterdam.
  • Zellner, A. (1986). On assessing prior distributions and Bayesian regression analysis with g-prior distributions. In Bayesian Inference and Decision Techniques (P. Goel and A. Zellner, eds.) 233–243. North-Holland, Amsterdam.
  • Zellner, A. (1988). Optimal information processing and Bayes’ theorem (with discussion). Amer. Statist. 42 278–284.