## Statistical Science

### Penalising Model Component Complexity: A Principled, Practical Approach to Constructing Priors

#### Abstract

In this paper, we introduce a new concept for constructing prior distributions. We exploit the natural nested structure inherent to many model components, which defines the model component to be a flexible extension of a base model. Proper priors are defined to penalise the complexity induced by deviating from the simpler base model and are formulated after the input of a user-defined scaling parameter for that model component, both in the univariate and the multivariate case. These priors are invariant to reparameterisations, have a natural connection to Jeffreys’ priors, are designed to support Occam’s razor and seem to have excellent robustness properties, all of which are highly desirable and allow us to use this approach to define default prior distributions. Through examples and theoretical results, we demonstrate the appropriateness of this approach and how it can be applied in various situations.
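As a rough illustration of the construction summarised in the abstract, consider the precision $\tau$ of a Gaussian random effect, whose base model is $\tau = \infty$ (no effect). As we read the approach, the distance from the base model is $d = \sqrt{2\,\mathrm{KLD}}$, which for this component is proportional to the standard deviation $\sigma = \tau^{-1/2}$ (a model-dependent constant is absorbed into the rate). An exponential prior with rate $\lambda$ is placed on this distance. The user-defined scaling enters through a tail statement $P(\sigma > U) = \alpha$, giving $\lambda = -\ln(\alpha)/U$, and a change of variables back to the precision yields $\pi(\tau) = \frac{\lambda}{2}\tau^{-3/2}\exp(-\lambda\tau^{-1/2})$. The Python sketch below only encodes these formulas; the function names and the values chosen for $U$ and $\alpha$ are hypothetical and illustrative, not taken from the authors' software.

```python
import math


def pc_rate(u: float, alpha: float) -> float:
    """Rate of the exponential prior on sigma, chosen so that P(sigma > u) = alpha.

    Hypothetical helper for illustration; u and alpha are the user's scaling inputs.
    """
    if u <= 0.0 or not 0.0 < alpha < 1.0:
        raise ValueError("require u > 0 and 0 < alpha < 1")
    return -math.log(alpha) / u


def pc_sigma_tail(u: float, lam: float) -> float:
    """P(sigma > u) when sigma ~ Exponential(lam)."""
    return math.exp(-lam * u)


def pc_precision_logpdf(tau: float, lam: float) -> float:
    """Log density of tau = sigma**-2 implied by sigma ~ Exponential(lam).

    Change of variables: pi(tau) = lam * exp(-lam * tau**-0.5) * |d sigma / d tau|,
    where |d sigma / d tau| = 0.5 * tau**-1.5.
    """
    if tau <= 0.0:
        return -math.inf
    return math.log(lam / 2.0) - 1.5 * math.log(tau) - lam / math.sqrt(tau)


def pc_precision_cdf(tau: float, lam: float) -> float:
    """P(precision <= tau) = P(sigma >= tau**-0.5) = exp(-lam / sqrt(tau))."""
    if tau <= 0.0:
        return 0.0
    return math.exp(-lam / math.sqrt(tau))


if __name__ == "__main__":
    # Illustrative scaling only: "sigma exceeds 1 with probability 0.01".
    lam = pc_rate(u=1.0, alpha=0.01)
    print(f"lambda = {lam:.4f}")                            # ln(100), prints 4.6052
    print(f"P(sigma > 1) = {pc_sigma_tail(1.0, lam):.4f}")  # prints 0.0100
    print(f"log pi(tau = 4) = {pc_precision_logpdf(4.0, lam):.4f}")
```

Working on the $\sigma$ scale and transforming back keeps the scaling statement interpretable (it is phrased in the units of the random effect), while the density on $\tau$ is what a sampler or approximation scheme parameterised by precision would evaluate.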

#### Article information

Source
Statist. Sci. Volume 32, Number 1 (2017), 1–28.

Dates
First available in Project Euclid: 6 April 2017

Permanent link to this document
https://projecteuclid.org/euclid.ss/1491465621

Digital Object Identifier
doi:10.1214/16-STS576

#### Citation

Simpson, Daniel; Rue, Håvard; Riebler, Andrea; Martins, Thiago G.; Sørbye, Sigrunn H. Penalising Model Component Complexity: A Principled, Practical Approach to Constructing Priors. Statist. Sci. 32 (2017), no. 1, 1–28. doi:10.1214/16-STS576. https://projecteuclid.org/euclid.ss/1491465621

#### References

• Aitchison, J. (2003). The Statistical Analysis of Compositional Data. The Blackburn Press, Caldwell, NJ.
• Barnard, J., McCulloch, R. and Meng, X.-L. (2000). Modeling covariance matrices in terms of standard deviations and correlations, with application to shrinkage. Statist. Sinica 10 1281–1311.
• Bayarri, M. J. and García-Donato, G. (2008). Generalization of Jeffreys divergence-based priors for Bayesian hypothesis testing. J. R. Stat. Soc. Ser. B Stat. Methodol. 70 981–1003.
• Berger, J. (2006). The case for objective Bayesian analysis. Bayesian Anal. 1 385–402.
• Berger, J. O., Bernardo, J. M. and Sun, D. (2009). The formal definition of reference priors. Ann. Statist. 37 905–938.
• Berger, J. O., Bernardo, J. M. and Sun, D. (2015). Overall objective priors. Bayesian Anal. 10 189–221.
• Bernardinelli, L., Clayton, D. and Montomoli, C. (1995). Bayesian estimates of disease maps: How important are priors? Stat. Med. 14 2411–2431.
• Bernardo, J.-M. (1979). Reference posterior distributions for Bayesian inference. J. Roy. Statist. Soc. Ser. B 41 113–147.
• Bernardo, J. M. (2011). Integrated objective Bayesian estimation and hypothesis testing. In Bayesian Statistics 9 1–68. Oxford Univ. Press, Oxford.
• Besag, J., York, J. and Mollié, A. (1991). Bayesian image restoration, with two applications in spatial statistics. Ann. Inst. Statist. Math. 43 1–59.
• Bhattacharya, A., Pati, D., Pillai, N. S. and Dunson, D. B. (2012). Bayesian shrinkage. Preprint. Available at arXiv:1212.6088.
• Bochkina, N. A. and Green, P. J. (2014). The Bernstein–von Mises theorem and nonregular models. Ann. Statist. 42 1850–1878.
• Browne, W. J. and Draper, D. (2006). A comparison of Bayesian and likelihood-based methods for fitting multilevel models. Bayesian Anal. 1 473–513 (electronic).
• Byrne, S. and Girolami, M. (2013). Geodesic Monte Carlo on embedded manifolds. Scand. J. Stat. 40 825–845.
• Carvalho, C. M., Polson, N. G. and Scott, J. G. (2010). The horseshoe estimator for sparse signals. Biometrika 97 465–480.
• Castillo, I., Schmidt-Hieber, J. and van der Vaart, A. W. (2014). Bayesian linear regression with sparse priors. Preprint. Available at arXiv:1403.0735.
• Castillo, I. and van der Vaart, A. (2012). Needles and straw in a haystack: Posterior concentration for possibly sparse sequences. Ann. Statist. 40 2069–2101.
• Chib, S. and Greenberg, E. (1998). Analysis of multivariate probit models. Biometrika 85 347–361.
• Consonni, G. and Veronese, P. (2008). Compatibility of prior specifications across linear models. Statist. Sci. 23 332–353.
• Cui, Y., Hodges, J. S., Kong, X. and Carlin, B. P. (2010). Partitioning degrees of freedom in hierarchical and other richly parameterized models. Technometrics 52 124–136.
• Dean, C. B., Ugarte, M. D. and Militino, A. F. (2001). Detecting interaction between random region and fixed age effects in disease mapping. Biometrics 57 197–202.
• Draper, D. (2006). Coherence and calibration: Comments on subjectivity and “objectivity” in Bayesian analysis (comment on articles by Berger and by Goldstein). Bayesian Anal. 1 423–427 (electronic).
• Erisman, A. M. and Tinney, W. F. (1975). On computing certain elements of the inverse of a sparse matrix. Commun. ACM 18 177–179.
• Evans, M. and Jang, G. H. (2011). Weak informativity and the information in one prior relative to another. Statist. Sci. 26 423–439.
• Fong, Y., Rue, H. and Wakefield, J. (2010). Bayesian inference for generalized linear mixed models. Biostat. 11 397–412.
• Frühwirth-Schnatter, S. and Wagner, H. (2010). Stochastic model specification search for Gaussian and partial non-Gaussian state space models. J. Econometrics 154 85–100.
• Frühwirth-Schnatter, S. and Wagner, H. (2011). Bayesian variable selection for random intercept modeling of Gaussian and non-Gaussian data. In Bayesian Statistics 9 (J. M. Bernardo, M. J. Bayarri, J. O. Berger, A. P. Dawid, D. Heckerman, A. F. M. Smith and M. West, eds.) 165–200. Oxford Univ. Press, Oxford.
• Fuglstad, G.-A., Simpson, D., Lindgren, F. and Rue, H. (2015). Interpretable priors for hyperparameters for Gaussian random fields. Preprint. Available at arXiv:1503.00256.
• Gelman, A. (2006). Prior distributions for variance parameters in hierarchical models (comment on article by Browne and Draper). Bayesian Anal. 1 515–533 (electronic).
• Gelman, A., Jakulin, A., Pittau, M. G. and Su, Y.-S. (2008). A weakly informative default prior distribution for logistic and other regression models. Ann. Appl. Stat. 2 1360–1383.
• Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A. and Rubin, D. B. (2013). Bayesian Data Analysis. CRC Press, London.
• Genest, C., Weerahandi, S. and Zidek, J. V. (1984). Aggregating opinions through logarithmic pooling. Theory and Decision 17 61–70.
• George, E. I. and McCulloch, R. E. (1993). Variable selection via Gibbs sampling. J. Amer. Statist. Assoc. 88 881–889.
• Geweke, J. (2006). Bayesian treatment of the independent Student-$t$ linear model. J. Appl. Econometrics 8 S19–S40.
• Ghosal, S., Ghosh, J. K. and van der Vaart, A. W. (2000). Convergence rates of posterior distributions. Ann. Statist. 28 500–531.
• Ghosh, M. (2011). Objective priors: An introduction for frequentists. Statist. Sci. 26 187–202.
• Ghosh, J., Li, Y. and Mitra, R. (2015). On the use of Cauchy prior distributions for Bayesian logistic regression. Preprint. Available at arXiv:1507.07170.
• Goldstein, M. (2006). Subjective Bayesian analysis: Principles and practice. Bayesian Anal. 1 403–420 (electronic).
• Guo, J., Rue, H. and Riebler, A. (2015). Bayesian bivariate meta-analysis of diagnostic test studies with interpretable priors. Preprint. Available at arXiv:1512.06217.
• Gustafson, P. (2005). On model expansion, model contraction, identifiability and prior information: Two illustrative scenarios involving mismeasured variables. Statist. Sci. 20 111–140.
• Hastie, T. and Tibshirani, R. (1987). Generalized additive models: Some applications. J. Amer. Statist. Assoc. 82 371–386.
• Hastie, T. J. and Tibshirani, R. J. (1990). Generalized Additive Models. Monographs on Statistics and Applied Probability 43. Chapman & Hall, London.
• He, Y. and Hodges, J. S. (2008). Point estimates for variance-structure parameters in Bayesian analysis of hierarchical models. Comput. Statist. Data Anal. 52 2560–2577.
• He, Y., Hodges, J. S. and Carlin, B. P. (2007). Re-considering the variance parameterization in multiple precision models. Bayesian Anal. 2 529–556.
• Henderson, R., Shimakura, S. and Gorst, D. (2002). Modeling spatial variation in leukemia survival data. J. Amer. Statist. Assoc. 97 965–972.
• Hodges, J. S. (2014). Richly Parameterized Linear Models: Additive, Time Series, and Spatial Models Using Random Effects. CRC Press, Boca Raton, FL.
• Ishwaran, H. and Rao, J. S. (2005). Spike and slab variable selection: Frequentist and Bayesian strategies. Ann. Statist. 33 730–773.
• James, W. and Stein, C. (1961). Estimation with quadratic loss. In Proc. 4th Berkeley Sympos. Math. Statist. and Prob., Vol. I 361–379. Univ. California Press, Berkeley, CA.
• Jeffreys, H. (1946). An invariant form for the prior probability in estimation problems. Proc. Roy. Soc. London Ser. A 186 453–461.
• Jeffreys, H. (1961). Theory of Probability, 3rd ed. Clarendon Press, Oxford.
• Johnson, V. E. and Rossell, D. (2010). On the use of non-local prior densities in Bayesian hypothesis tests. J. R. Stat. Soc. Ser. B Stat. Methodol. 72 143–170.
• Jones, M. C. and Pewsey, A. (2009). Sinh-arcsinh distributions. Biometrika 96 761–780.
• Kamary, K. and Robert, C. P. (2014). Reflecting about selecting noninformative priors. Int. J. Appl. Comput. Math. 3.
• Kamary, K., Mengersen, K., Robert, C. P. and Rousseau, J. (2014). Testing hypotheses via a mixture estimation model. Preprint. Available at arXiv:1412.2044.
• Kass, R. E. and Wasserman, L. (1995). A reference Bayesian test for nested hypotheses and its relationship to the Schwarz criterion. J. Amer. Statist. Assoc. 90 928–934.
• Kass, R. and Wasserman, L. (1996). The selection of prior distributions by formal rules. J. Amer. Statist. Assoc. 91 1343–1370.
• Kennedy, M. C. and O’Hagan, A. (2001). Bayesian calibration of computer models. J. R. Stat. Soc. Ser. B Stat. Methodol. 63 425–464.
• Kullback, S. and Leibler, R. A. (1951). On information and sufficiency. Ann. Math. Stat. 22 79–86.
• Lawson, A. B. (2006). Statistical Methods in Spatial Epidemiology, 2nd ed. Wiley, Chichester.
• Lawson, A. B. (2009). Bayesian Disease Mapping: Hierarchical Modeling in Spatial Epidemiology, 2nd ed. CRC Press, Boca Raton, FL.
• Le Cam, L. (1990). Maximum likelihood: An introduction. Int. Stat. Rev. 153–171.
• Lee, J. M. (2003). Smooth Manifolds. Springer, Berlin.
• Lid Hjort, N., Holmes, C., Müller, P. and Walker, S. G., eds. (2010). Bayesian Nonparametrics. Cambridge Series in Statistical and Probabilistic Mathematics 28. Cambridge Univ. Press, Cambridge.
• Lindgren, F. and Rue, H. (2008). On the second-order random walk model for irregular locations. Scand. J. Stat. 35 691–700.
• Lindgren, F., Rue, H. and Lindström, J. (2011). An explicit link between Gaussian fields and Gaussian Markov random fields: The stochastic partial differential equation approach. J. R. Stat. Soc. Ser. B Stat. Methodol. 73 423–498.
• Lindley, D. V. (1983). Empirical Bayes inference: Theory and applications: Comment. J. Amer. Statist. Assoc. 78 61–62.
• Liu, C. (2001). Bayesian analysis of multivariate probit models—Discussion on the art of data augmentation by Van Dyk and Meng. J. Comput. Graph. Statist. 10 75–81.
• Lu, H., Hodges, J. S. and Carlin, B. P. (2007). Measuring the complexity of generalized linear hierarchical models. Canad. J. Statist. 35 69–87.
• Lunn, D., Spiegelhalter, D., Thomas, A. and Best, N. (2009). The BUGS project: Evolution, critique and future directions. Stat. Med. 28 3049–3067.
• Martins, T. G. and Rue, H. (2013). Prior for flexibility parameters: The Student’s $t$ case. Technical report No. S8-2013. Department of Mathematical Sciences, NTNU, Norway.
• Martins, T. G., Simpson, D., Lindgren, F. and Rue, H. (2013). Bayesian computing with INLA: New features. Comput. Statist. Data Anal. 67 68–83.
• Muff, S., Riebler, A., Held, L., Rue, H. and Saner, P. (2015). Bayesian analysis of measurement error models using integrated nested Laplace approximations. J. R. Stat. Soc. Ser. C Appl. Stat. 64 231–252.
• Natário, I. and Knorr-Held, L. (2003). Non-parametric ecological regression and spatial variation. Biom. J. 45 670–688.
• O’Hagan, A. and Pericchi, L. (2012). Bayesian heavy-tailed models and conflict resolution: A review. Braz. J. Probab. Stat. 26 372–401.
• Palacios, M. B. and Steel, M. F. J. (2006). Non-Gaussian Bayesian geostatistical modeling. J. Amer. Statist. Assoc. 101 604–618.
• Park, T. and Casella, G. (2008). The Bayesian lasso. J. Amer. Statist. Assoc. 103 681–686.
• Bhattacharya, A., Pati, D., Pillai, N. S. and Dunson, D. B. (2014). Dirichlet–Laplace priors for optimal shrinkage. Preprint. Available at arXiv:1401.5398.
• Piironen, J. and Vehtari, A. (2015). Projection predictive variable selection using Stan+R. Preprint. Available at arXiv:1508.02502.
• Polson, N. G. and Scott, J. G. (2012). On the half-Cauchy prior for a global scale parameter. Bayesian Anal. 7 887–902.
• Rapisarda, F., Brigo, D. and Mercurio, F. (2007). Parameterizing correlations: A geometric interpretation. IMA J. Manag. Math. 18 55–73.
• Reich, B. J. and Hodges, J. S. (2008). Modeling longitudinal spatial periodontal data: A spatially adaptive model with tools for specifying priors and checking fit. Biometrics 64 790–799.
• Reid, N., Mukerjee, R. and Fraser, D. A. S. (2003). Some aspects of matching priors. In Mathematical Statistics and Applications: Festschrift for Constance van Eeden. Institute of Mathematical Statistics Lecture Notes—Monograph Series 42 31–43. IMS, Beachwood, OH.
• Riebler, A., Sørbye, S. H., Simpson, D. and Rue, H. (2016). An intuitive Bayesian spatial model for disease mapping that accounts for scaling. Stat. Methods Med. Res. 25 1145–1165.
• Robert, C. P. (2007). The Bayesian Choice: From Decision-Theoretic Foundations to Computational Implementation, 2nd ed. Springer, New York.
• Robert, C. P., Chopin, N. and Rousseau, J. (2009). Harold Jeffreys’s theory of probability revisited. Statist. Sci. 24 141–172.
• Roos, M. and Held, L. (2011). Sensitivity analysis in Bayesian generalized linear mixed models for binary data. Bayesian Anal. 6 259–278.
• Roos, M., Martins, T. G., Held, L. and Rue, H. (2015). Sensitivity analysis for Bayesian hierarchical models. Bayesian Anal. 10 321–349.
• Rousseau, J. (2015). Comment on article by Berger, Bernardo, and Sun. Bayesian Anal. 10 233–236.
• Rousseau, J. and Robert, C. P. (2011). On moment priors for Bayesian model choice: A discussion. Discussion of “Moment priors for Bayesian model choice with applications to directed acyclic graphs” by G. Consonni and L. La Rocca. In Bayesian Statistics 9 136–137. Oxford Univ. Press, Oxford.
• Rubin, D. B. (1984). Bayesianly justifiable and relevant frequency calculations for the applied statistician. Ann. Statist. 12 1151–1172.
• Rue, H. and Held, L. (2005). Gaussian Markov Random Fields: Theory and Applications. Monographs on Statistics and Applied Probability 104. Chapman & Hall, Boca Raton, FL.
• Rue, H. and Martino, S. (2007). Approximate Bayesian inference for hierarchical Gaussian Markov random field models. J. Statist. Plann. Inference 137 3177–3192.
• Rue, H., Martino, S. and Chopin, N. (2009). Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations. J. R. Stat. Soc. Ser. B Stat. Methodol. 71 319–392.
• Self, S. G. and Liang, K.-Y. (1987). Asymptotic properties of maximum likelihood estimators and likelihood ratio tests under nonstandard conditions. J. Amer. Statist. Assoc. 82 605–610.
• Simpson, D., Rue, H., Riebler, A., Martins, T. G. and Sørbye, S. H. (2016). Supplement to “Penalising Model Component Complexity: A Principled, Practical Approach to Constructing Priors.” DOI:10.1214/16-STS576SUPP.
• Sørbye, S. H. and Rue, H. (2011). Simultaneous credible bands for latent Gaussian models. Scand. J. Stat. 38 712–725.
• Sørbye, S. H. and Rue, H. (2014). Scaling intrinsic Gaussian Markov random field priors in spatial modelling. Spat. Stat. 8 39–51.
• Spiegelhalter, D. J., Thomas, A., Best, N. G. and Gilks, W. R. (1995). BUGS: Bayesian inference using Gibbs sampling Version 0.50. MRC Biostatistics Unit, Cambridge.
• Stein, C. M. (1981). Estimation of the mean of a multivariate normal distribution. Ann. Statist. 9 1135–1151.
• Talhouk, A., Doucet, A. and Murphy, K. (2012). Efficient Bayesian inference for multivariate probit models with sparse inverse correlation matrices. J. Comput. Graph. Statist. 21 739–757.
• van der Pas, S. L., Kleijn, B. J. K. and van der Vaart, A. W. (2014). The horseshoe estimator: Posterior concentration around nearly black vectors. Electron. J. Stat. 8 2585–2618.
• Wakefield, J. (2007). Disease mapping and spatial regression with count data. Biostat. 8 158–183.
• Wakefield, J. C., Best, N. G. and Waller, L. A. (2000). Bayesian approaches to disease mapping. In Spatial Epidemiology: Methods and Applications (P. Elliot, J. C. Wakefield, N. G. Best and D. J. Briggs, eds.) 104–107. Oxford Univ. Press, Oxford.
• Wakefield, J. and Lyons, H. (2010). Spatial aggregation and the ecological fallacy. In Handbook of Spatial Statistics 541–558. CRC Press, Boca Raton, FL.
• Waller, L. and Carlin, B. (2010). Disease mapping. In Handbook of Spatial Statistics (A. E. Gelfand, P. J. Diggle, M. Fuentes and P. Guttorp, eds.). Handbooks for Modern Statistical Methods 14 217–243. Chapman & Hall/CRC, London.
• Watanabe, S. (2009). Algebraic Geometry and Statistical Learning Theory. Cambridge Monographs on Applied and Computational Mathematics 25. Cambridge Univ. Press, Cambridge.
• Wood, S. and Kohn, R. (1998). A Bayesian approach to robust binary nonparametric regression. J. Amer. Statist. Assoc. 93 203–213.

• Supplement to “Penalising Model Component Complexity: A Principled, Practical Approach to Constructing Priors”. The supplementary material contains the proofs of all theorems in the paper, together with a detailed description of the Student-$t$ simulation study used in Section 3.4. The R code for analysing all examples and generating the corresponding figures in the paper is available at www.r-inla.org/examples/case-studies/.