## The Annals of Statistics

### On the contraction properties of some high-dimensional quasi-posterior distributions

#### Abstract

We study the contraction properties of a quasi-posterior distribution $\check{\Pi}_{n,d}$ obtained by combining a quasi-likelihood function and a sparsity-inducing prior distribution on $\mathbb{R}^{d}$, as both $n$ (the sample size) and $d$ (the dimension of the parameter) increase. We derive some general results that highlight a set of sufficient conditions under which $\check{\Pi}_{n,d}$ puts increasingly high probability on sparse subsets of $\mathbb{R}^{d}$, and contracts toward the true value of the parameter. We apply these results to the analysis of logistic regression models and binary graphical models in high-dimensional settings. For the logistic regression model, we show that for well-behaved design matrices, the posterior distribution contracts at the rate $O(\sqrt{s_{\star}\log(d)/n})$, where $s_{\star}$ is the number of nonzero components of the parameter. For the binary graphical model, under some regularity conditions, we show that a quasi-posterior analog of the neighborhood selection of [Ann. Statist. 34 (2006) 1436–1462] contracts in the Frobenius norm at the rate $O(\sqrt{(p+S)\log(p)/n})$, where $p$ is the number of nodes, and $S$ the number of edges of the true graph.
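
As a rough numerical illustration of how the two rates above scale, the following sketch evaluates the expressions inside the $O(\cdot)$ with the unknown constant dropped. The function names and the sample values of $s_{\star}$, $d$, $p$, $S$ and $n$ are hypothetical and do not come from the paper.

```python
import math


def logistic_rate(s_star: int, d: int, n: int) -> float:
    """sqrt(s_star * log(d) / n): the logistic regression rate, constant dropped."""
    return math.sqrt(s_star * math.log(d) / n)


def graphical_rate(p: int, S: int, n: int) -> float:
    """sqrt((p + S) * log(p) / n): the binary graphical model rate, constant dropped."""
    return math.sqrt((p + S) * math.log(p) / n)


# Hypothetical problem sizes, chosen only for illustration.
r_small_n = logistic_rate(s_star=10, d=1000, n=500)
r_large_n = logistic_rate(s_star=10, d=1000, n=2000)
g_rate = graphical_rate(p=50, S=100, n=1000)

print(f"logistic rate, n=500:  {r_small_n:.4f}")
print(f"logistic rate, n=2000: {r_large_n:.4f}")
print(f"graphical rate:        {g_rate:.4f}")
```

Both expressions decay like $1/\sqrt{n}$, so quadrupling the sample size halves the value, while the dimension enters only through a logarithm.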

#### Article information

Source
Ann. Statist., Volume 45, Number 5 (2017), 2248-2273.

Dates
Revised: September 2016
First available in Project Euclid: 31 October 2017

https://projecteuclid.org/euclid.aos/1509436834

Digital Object Identifier
doi:10.1214/16-AOS1526

Mathematical Reviews number (MathSciNet)
MR3718168

Zentralblatt MATH identifier
1383.62058

Subjects
Primary: 62F15: Bayesian inference; 62Jxx: Linear inference, regression

#### Citation

Atchadé, Yves F. On the contraction properties of some high-dimensional quasi-posterior distributions. Ann. Statist. 45 (2017), no. 5, 2248–2273. doi:10.1214/16-AOS1526. https://projecteuclid.org/euclid.aos/1509436834

#### References

• [1] Alquier, P. and Lounici, K. (2011). PAC-Bayesian bounds for sparse regression estimation with exponential weights. Electron. J. Stat. 5 127–145.
• [2] Arias-Castro, E. and Lounici, K. (2014). Estimation and variable selection with exponential weights. Electron. J. Stat. 8 328–354.
• [3] Atchadé, Y. F. (2014). Estimation of high-dimensional partially-observed discrete Markov random fields. Electron. J. Stat. 8 2242–2263.
• [4] Atchadé, Y. F. (2015). A Moreau-Yosida approximation scheme for high-dimensional posterior and quasi-posterior distributions. Available at arXiv:1505.07072.
• [5] Atchadé, Y. F. (2015). A scalable quasi-Bayesian framework for Gaussian graphical models. Available at arXiv:1512.07934.
• [6] Atchadé, Y. F. (2017). Supplement to “On the contraction properties of some high-dimensional quasi-posterior distributions.” DOI:10.1214/16-AOS1526SUPP.
• [7] Banerjee, O., El Ghaoui, L. and d’Aspremont, A. (2008). Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data. J. Mach. Learn. Res. 9 485–516.
• [8] Banerjee, S. and Ghosal, S. (2015). Bayesian structure learning in graphical models. J. Multivariate Anal. 136 147–162.
• [9] Barber, R. F. and Drton, M. (2015). High-dimensional Ising model selection with Bayesian information criteria. Electron. J. Stat. 9 567–607.
• [10] Baricz, A. (2008). Mills’ ratio: Monotonicity patterns and functional inequalities. J. Math. Anal. Appl. 340 1362–1370.
• [11] Besag, J. (1974). Spatial interaction and the statistical analysis of lattice systems. J. Roy. Statist. Soc. Ser. B 36 192–236.
• [12] Castillo, I., Schmidt-Hieber, J. and van der Vaart, A. (2015). Bayesian linear regression with sparse priors. Ann. Statist. 43 1986–2018.
• [13] Castillo, I. and van der Vaart, A. (2012). Needles and straw in a haystack: Posterior concentration for possibly sparse sequences. Ann. Statist. 40 2069–2101.
• [14] Catoni, O. (2004). Statistical Learning Theory and Stochastic Optimization. Lecture Notes in Math. 1851. Springer, Berlin.
• [15] Chernozhukov, V. and Hong, H. (2003). An MCMC approach to classical estimation. J. Econometrics 115 293–346.
• [16] Dalalyan, A. S. and Tsybakov, A. B. (2007). Aggregation by exponential weighting and sharp oracle inequalities. In Learning Theory. Lecture Notes in Computer Science 4539 97–111. Springer, Berlin.
• [17] Florens, J.-P. and Simoni, A. (2012). Nonparametric estimation of an instrumental regression: A quasi-Bayesian approach based on regularized posterior. J. Econometrics 170 458–475.
• [18] Ghosh, J. K. and Ramamoorthi, R. V. (2003). Bayesian Nonparametrics. Springer Series in Statistics. Springer, New York.
• [19] Höfling, H. and Tibshirani, R. (2009). Estimation of sparse binary pairwise Markov networks using pseudo-likelihoods. J. Mach. Learn. Res. 10 883–906.
• [20] Kato, K. (2013). Quasi-Bayesian analysis of nonparametric instrumental variables models. Ann. Statist. 41 2359–2390.
• [21] Kleijn, B. J. K. and van der Vaart, A. W. (2006). Misspecification in infinite-dimensional Bayesian statistics. Ann. Statist. 34 837–877.
• [22] Li, C. and Jiang, W. (2014). Model selection for likelihood-free Bayesian methods based on moment conditions: Theory and numerical examples. Available at arXiv:1405.6693v1.
• [23] Li, Y.-H., Scarlett, J., Ravikumar, P. and Cevher, V. (2014). Sparsistency of $\ell_{1}$-regularized M-estimators. Preprint. Available at arXiv:1410.7605v1.
• [24] Liao, Y. and Jiang, W. (2011). Posterior consistency of nonparametric conditional moment restricted models. Ann. Statist. 39 3003–3031.
• [25] Marin, J.-M., Pudlo, P., Robert, C. P. and Ryder, R. J. (2012). Approximate Bayesian computational methods. Stat. Comput. 22 1167–1180.
• [26] McAllester, D. A. (1999). Some PAC-Bayesian theorems. Mach. Learn. 37 355–363.
• [27] Meinshausen, N. and Bühlmann, P. (2006). High-dimensional graphs with the lasso. Ann. Statist. 34 1436–1462.
• [28] Mitchell, T. J. and Beauchamp, J. J. (1988). Bayesian variable selection in linear regression. J. Amer. Statist. Assoc. 83 1023–1036.
• [29] Negahban, S. N., Ravikumar, P., Wainwright, M. J. and Yu, B. (2012). A unified framework for high-dimensional analysis of $M$-estimators with decomposable regularizers. Statist. Sci. 27 538–557.
• [30] Ravikumar, P., Wainwright, M. J. and Lafferty, J. D. (2010). High-dimensional Ising model selection using $\ell_{1}$-regularized logistic regression. Ann. Statist. 38 1287–1319.
• [31] Schreck, A., Fort, G., Le Corff, S. and Moulines, E. (2013). A shrinkage-thresholding Metropolis adjusted Langevin algorithm for Bayesian variable selection. Available at arXiv:1312.5658.
• [32] Sun, T. and Zhang, C.-H. (2013). Sparse matrix inversion with scaled lasso. J. Mach. Learn. Res. 14 3385–3418.
• [33] Yang, W. and He, X. (2012). Bayesian empirical likelihood for quantile regression. Ann. Statist. 40 1102–1131.
• [34] Zhang, T. (2006). From $\varepsilon$-entropy to KL-entropy: Analysis of minimum information complexity density estimation. Ann. Statist. 34 2180–2210.

#### Supplemental materials

• Supplement to “On the contraction properties of some high-dimensional quasi-posterior distributions”. The supplementary material contains the proofs of Theorems 4, 9 and 10.