We study frequentist properties of Bayesian and L0 model selection, with a focus on (potentially non-linear) high-dimensional regression. We propose a construction to study how posterior probabilities and normalized L0 criteria concentrate on the (Kullback-Leibler) optimal model and other subsets of the model space. When such concentration occurs, one also obtains bounds on the frequentist probabilities of selecting the correct model and of type I and type II errors. These results hold generally, and help validate the use of posterior probabilities and L0 criteria to control frequentist error probabilities associated with model selection and hypothesis tests. Regarding regression, we help understand the effect of the sparsity imposed by the prior or the L0 penalty, and of problem characteristics such as the sample size, signal-to-noise ratio, dimension and true sparsity. A particular finding is that one may use less sparse formulations than would be asymptotically optimal, but still attain consistency and often also significantly better finite-sample performance. We also prove new results related to misspecifying the mean or covariance structures, and give tighter rates for certain non-local priors than currently available.
DR was partially funded by the Europa Excelencia grant EUR2020-112096, NIH grant R01 CA158113-01, Ramón y Cajal Fellowship RYC-2015-18544, Plan Estatal PGC2018-101643-B-I00 and Ayudas Fundación BBVA a equipos de investigación científica en Big Data 2017.
The author thanks Gabor Lugosi and James O. Berger for helpful discussions, and the Editors and Referees for invaluable feedback in improving the exposition of this manuscript.
"Concentration of Posterior Model Probabilities and Normalized L0 Criteria." Bayesian Anal. Advance Publication 1 - 27, 2021. https://doi.org/10.1214/21-BA1262