Bayesian Analysis

On the Half-Cauchy Prior for a Global Scale Parameter

Nicholas G. Polson and James G. Scott

Abstract

This paper argues that the half-Cauchy distribution should replace the inverse-gamma distribution as a default prior for a top-level scale parameter in Bayesian hierarchical models, at least for cases where a proper prior is necessary. Our arguments involve a blend of Bayesian and frequentist reasoning, and are intended to complement the case made by Gelman (2006) in support of folded-$t$ priors. First, we generalize the half-Cauchy prior to the wider class of hypergeometric inverted-beta priors. We derive expressions for posterior moments and marginal densities when these priors are used for a top-level normal variance in a Bayesian hierarchical model. We go on to prove a proposition that, together with the results for moments and marginals, allows us to characterize the frequentist risk of the Bayes estimators under all global-shrinkage priors in the class. These results, in turn, allow us to study the frequentist properties of the half-Cauchy prior versus a wide class of alternatives. The half-Cauchy occupies a sensible middle ground within this class: it performs well near the origin, but does not lead to drastic compromises in other parts of the parameter space. This provides an alternative, classical justification for the routine use of this prior. We also consider situations where the underlying mean vector is sparse, where we argue that the usual conjugate choice of an inverse-gamma prior is particularly inappropriate, and can severely distort inference. Finally, we summarize some open issues in the specification of default priors for scale terms in hierarchical models.
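
As a rough illustration of the contrast the abstract draws near the origin, the following sketch compares the two prior densities on the scale $\tau$. It is not taken from the paper: the function names, the unit half-Cauchy scale, and the inverse-gamma hyperparameters (shape 1, scale 1, placed on $\tau^2$) are assumptions chosen only for illustration.

```python
import math

def half_cauchy_pdf(tau, scale=1.0):
    """Density of a half-Cauchy(0, scale) distribution at tau >= 0."""
    if tau < 0:
        return 0.0
    return 2.0 / (math.pi * scale * (1.0 + (tau / scale) ** 2))

def inv_gamma_scale_pdf(tau, shape=1.0, scale=1.0):
    """Density induced on tau when tau**2 ~ inverse-gamma(shape, scale).

    The hyperparameter values are illustrative assumptions.
    """
    if tau <= 0:
        return 0.0
    v = tau * tau
    # Log density of the inverse-gamma at v, computed in log space
    # so that very small v underflows cleanly to zero.
    log_pv = (shape * math.log(scale) - math.lgamma(shape)
              - (shape + 1.0) * math.log(v) - scale / v)
    # Change of variables v = tau**2 contributes the Jacobian 2 * tau.
    return 2.0 * tau * math.exp(log_pv)

if __name__ == "__main__":
    for tau in (0.01, 0.1, 0.5, 1.0, 2.0):
        print(f"tau={tau:5.2f}  half-Cauchy={half_cauchy_pdf(tau):.4f}  "
              f"inverse-gamma={inv_gamma_scale_pdf(tau):.4e}")
```

Under these assumed settings the half-Cauchy density stays finite and positive as $\tau \to 0$, while the density induced by the inverse-gamma prior vanishes there, which is one way to see why the latter can resist shrinkage toward zero.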

Article information

Source
Bayesian Anal. Volume 7, Number 4 (2012), 887-902.

Dates
First available in Project Euclid: 27 November 2012

Permanent link to this document
http://projecteuclid.org/euclid.ba/1354024466

Digital Object Identifier
doi:10.1214/12-BA730

Mathematical Reviews number (MathSciNet)
MR3000018

Zentralblatt MATH identifier
1330.62148

Keywords
hierarchical models; normal scale mixtures; shrinkage

Citation

Polson, Nicholas G.; Scott, James G. On the Half-Cauchy Prior for a Global Scale Parameter. Bayesian Anal. 7 (2012), no. 4, 887–902. doi:10.1214/12-BA730. http://projecteuclid.org/euclid.ba/1354024466.


References

  • Abramowitz, M. and Stegun, I. A. (eds.) (1964). Handbook of Mathematical Functions With Formulas, Graphs, and Mathematical Tables, volume 55 of Applied Mathematics Series. Washington, DC: National Bureau of Standards. Reprinted in paperback by Dover (1974); on-line at http://www.math.sfu.ca/~cbm/aands/.
  • Berger, J. O. (1980). “A robust generalized Bayes estimator and confidence region for a multivariate normal mean.” The Annals of Statistics, 8(4): 716–761.
  • Carvalho, C. M., Polson, N. G., and Scott, J. G. (2010). “The horseshoe estimator for sparse signals.” Biometrika, 97(2): 465–80.
  • Fourdrinier, D., Strawderman, W., and Wells, M. T. (1998). “On the construction of Bayes minimax estimators.” The Annals of Statistics, 26(2): 660–71.
  • Gelman, A. (2006). “Prior distributions for variance parameters in hierarchical models.” Bayesian Analysis, 1(3): 515–33.
  • George, E. I., Liang, F., and Xu, X. (2006). “Improved minimax predictive densities under Kullback-Leibler loss.” The Annals of Statistics, 34(1): 78–91.
  • Gordy, M. B. (1998). “A generalization of generalized beta distributions.” Finance and Economics Discussion Series 1998-18, Board of Governors of the Federal Reserve System (U.S.).
  • Gradshteyn, I. and Ryzhik, I. (1965). Table of Integrals, Series, and Products. Academic Press.
  • Griffin, J. and Brown, P. (2012). “Alternative prior distributions for variable selection with very many more variables than observations.” Australian and New Zealand Journal of Statistics. (to appear).
  • Hans, C. M. (2009). “Bayesian Lasso Regression.” Biometrika, 96(4): 835–45.
  • Jeffreys, H. (1961). Theory of Probability. Oxford University Press, 3rd edition.
  • Maruyama, Y. (1999). “Improving on the James–Stein estimator.” Statistics and Decisions, 14: 137–40.
  • Maruyama, Y. and George, E. I. (2010). “$g$BF: A Fully Bayes Factor with a Generalized g-prior.” Technical report, University of Tokyo, arXiv:0801.4410v2.
  • Morris, C. and Tang, R. (2011). “Estimating Random Effects via Adjustment for Density Maximization.” Statistical Science, 26(2): 271–87.
  • Park, T. and Casella, G. (2008). “The Bayesian Lasso.” Journal of the American Statistical Association, 103(482): 681–6.
  • Polson, N. G. and Scott, J. G. (2011). “Shrink globally, act locally: sparse Bayesian regularization and prediction.” In Bernardo, J., Bayarri, M., Berger, J. O., Dawid, A., Heckerman, D., Smith, A., and West, M. (eds.), Proceedings of the 9th Valencia World Meeting on Bayesian Statistics. Oxford University Press.
  • Polson, N. G. and Scott, J. G. (2012). “Local shrinkage rules, Lévy processes, and regularized regression.” Journal of the Royal Statistical Society (Series B). (to appear).
  • Scott, J. G. and Berger, J. O. (2006). “An exploration of aspects of Bayesian multiple testing.” Journal of Statistical Planning and Inference, 136(7): 2144–2162.
  • Spiegelhalter, D. J., Thomas, A., Best, N. G., Gilks, W. R., and Lunn, D. (1994, 2003). BUGS: Bayesian inference using Gibbs sampling. MRC Biostatistics Unit, Cambridge, England.
  • Stein, C. (1981). “Estimation of the mean of a multivariate normal distribution.” The Annals of Statistics, 9: 1135–51.
  • Strawderman, W. (1971). “Proper Bayes minimax estimators of the multivariate normal mean.” The Annals of Mathematical Statistics, 42: 385–8.
  • Tiao, G. C. and Tan, W. (1965). “Bayesian analysis of random-effect models in the analysis of variance. I. Posterior distribution of variance components.” Biometrika, 52: 37–53.
  • Yang, R. and Berger, J. O. (1997). “A Catalog of Noninformative Priors.” Technical Report 42, Duke University Department of Statistical Science.