The Annals of Statistics

Exact minimax estimation of the predictive density in sparse Gaussian models

Abstract

We consider estimating the predictive density under Kullback–Leibler loss in an $\ell_{0}$-sparse Gaussian sequence model. Explicit expressions for the first-order minimax risk, along with its exact constant, asymptotically least favorable priors and optimal predictive density estimates, are derived. Compared to the sparse recovery results involving point estimation of the normal mean, new decision-theoretic phenomena are seen. The suboptimal performance of the class of plug-in density estimates reflects the predictive nature of the problem, and optimal strategies need diversification of the future risk. We find that minimax optimal strategies lie outside the Gaussian family but can be constructed with threshold predictive density estimates. Novel minimax techniques involving simultaneous calibration of the sparsity adjustment and the risk diversification mechanisms are used to design optimal predictive density estimates.
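
As a rough illustration of why plug-in density estimates can be suboptimal under Kullback–Leibler loss, the sketch below treats the simplest non-sparse univariate case, which is not the sparse setting of the paper. It assumes an observation X ~ N(θ, v_x) and a future Y ~ N(θ, v_y), and compares the plug-in estimate N(X, v_y) with the wider Gaussian N(X, v_x + v_y), the Bayes predictive density under a uniform prior. The closed-form risks v_x / (2 v_y) and (1/2) log(1 + v_x / v_y) are standard; the function names, the parameter values and the Monte Carlo check are assumptions made only for this example.

```python
import math
import random


def kl_gaussian(mu_p, var_p, mu_q, var_q):
    """KL( N(mu_p, var_p) || N(mu_q, var_q) ) for univariate Gaussians."""
    return 0.5 * (math.log(var_q / var_p)
                  + (var_p + (mu_p - mu_q) ** 2) / var_q
                  - 1.0)


def plugin_risk(v_x, v_y):
    """KL risk of the plug-in estimate N(X, v_y); constant in theta."""
    return v_x / (2.0 * v_y)


def uniform_bayes_risk(v_x, v_y):
    """KL risk of N(X, v_x + v_y), the uniform-prior Bayes predictive density."""
    return 0.5 * math.log(1.0 + v_x / v_y)


def monte_carlo_risk(theta, v_x, v_y, predictive_var, n_draws=100_000, seed=0):
    """Average KL loss of N(X, predictive_var) over simulated X ~ N(theta, v_x)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        x = rng.gauss(theta, math.sqrt(v_x))
        # True future density is N(theta, v_y); the estimate is centred at x.
        total += kl_gaussian(theta, v_y, x, predictive_var)
    return total / n_draws


if __name__ == "__main__":
    v_x, v_y, theta = 1.0, 1.0, 0.7  # hypothetical values for illustration
    print("plug-in (exact):      ", plugin_risk(v_x, v_y))
    print("plug-in (simulated):  ", monte_carlo_risk(theta, v_x, v_y, v_y))
    print("uniform Bayes (exact):", uniform_bayes_risk(v_x, v_y))
    print("uniform Bayes (sim.): ", monte_carlo_risk(theta, v_x, v_y, v_x + v_y))
```

Because log(1 + r) < r for every r > 0, the wider predictive density has strictly smaller risk than the plug-in estimate in this toy setting, which conveys the idea of spreading out the future risk. The sparse minimax results summarized in the abstract concern a different and more delicate regime.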

Article information

Source
Ann. Statist., Volume 43, Number 3 (2015), 937-961.

Dates
Revised: June 2014
First available in Project Euclid: 15 May 2015

Permanent link to this document
https://projecteuclid.org/euclid.aos/1431695634

Digital Object Identifier
doi:10.1214/14-AOS1251

Mathematical Reviews number (MathSciNet)
MR3346693

Zentralblatt MATH identifier
1328.62058

Citation

Mukherjee, Gourab; Johnstone, Iain M. Exact minimax estimation of the predictive density in sparse Gaussian models. Ann. Statist. 43 (2015), no. 3, 937–961. doi:10.1214/14-AOS1251. https://projecteuclid.org/euclid.aos/1431695634

References

• Aitchison, J. (1975). Goodness of prediction fit. Biometrika 62 547–554.
• Aitchison, J. and Dunsmore, I. R. (1975). Statistical Prediction Analysis. Cambridge Univ. Press, Cambridge.
• Aslan, M. (2006). Asymptotically minimax Bayes predictive densities. Ann. Statist. 34 2921–2938.
• Barndorff-Nielsen, O. E. and Cox, D. R. (1996). Prediction and asymptotics. Bernoulli 2 319–340.
• Bell, R. M. and Cover, T. M. (1980). Competitive optimality of logarithmic investment. Math. Oper. Res. 5 161–166.
• Brown, L. (1974). Lecture notes on statistical decision theory. Available at http://www-stat.wharton.upenn.edu/~lbrown.
• Brown, L. D., George, E. I. and Xu, X. (2008). Admissible predictive density estimation. Ann. Statist. 36 1156–1170.
• Cover, T. M. and Thomas, J. A. (1991). Elements of Information Theory. Wiley, New York.
• Donoho, D. L. and Johnstone, I. M. (1994). Minimax risk over $\ell_{p}$-balls for $\ell_{q}$-error. Probab. Theory Related Fields 99 277–303.
• Donoho, D. L., Johnstone, I. M., Hoch, J. C. and Stern, A. S. (1992). Maximum entropy and the nearly black object. J. R. Stat. Soc. Ser. B. Stat. Methodol. 54 41–81.
• Fourdrinier, D., Marchand, É., Righi, A. and Strawderman, W. E. (2011). On improved predictive density estimation with parametric constraints. Electron. J. Stat. 5 172–191.
• Gatsonis, C. A. (1984). Deriving posterior distributions for a location parameter: A decision theoretic approach. Ann. Statist. 12 958–970.
• Geisser, S. (1993). Predictive Inference: An Introduction. Monographs on Statistics and Applied Probability 55. Chapman & Hall, New York.
• George, E. I., Liang, F. and Xu, X. (2006). Improved minimax predictive densities under Kullback–Leibler loss. Ann. Statist. 34 78–91.
• George, E. I., Liang, F. and Xu, X. (2012). From minimax shrinkage estimation to minimax shrinkage prediction. Statist. Sci. 27 82–94.
• Ghosh, M., Mergel, V. and Datta, G. S. (2008). Estimation, prediction and the Stein phenomenon under divergence loss. J. Multivariate Anal. 99 1941–1961.
• Hartigan, J. A. (1998). The maximum likelihood prior. Ann. Statist. 26 2083–2103.
• Johnstone, I. M. (2013). Gaussian estimation: Sequence and wavelet models. Available at http://www-stat.stanford.edu/~imj.
• Komaki, F. (1996). On asymptotic properties of predictive distributions. Biometrika 83 299–313.
• Komaki, F. (2001). A shrinkage predictive distribution for multivariate normal observables. Biometrika 88 859–864.
• Komaki, F. (2004). Simultaneous prediction of independent Poisson observables. Ann. Statist. 32 1744–1769.
• Larimore, W. E. (1983). Predictive inference, sufficiency, entropy and an asymptotic likelihood principle. Biometrika 70 175–181.
• McMillan, B. (1956). Two inequalities implied by unique decipherability. IRE Transactions on Information Theory 2 115–116.
• Mukherjee, G. (2013). Sparsity and shrinkage in predictive density estimation. Ph.D. thesis, Stanford Univ. Available at http://purl.stanford.edu/gm306wz2890.
• Mukherjee, G. and Johnstone, I. M. (2015). Supplement to “Exact minimax estimation of the predictive density in sparse Gaussian models.” DOI:10.1214/14-AOS1251SUPP.
• Murray, G. D. (1977). A note on the estimation of probability density functions. Biometrika 64 150–152.
• Ng, V. M. (1980). On the estimation of parametric density functions. Biometrika 67 505–506.
• Pinsker, M. S. (1980). Optimal filtration of square-integrable signals in Gaussian noise. Probl. Inf. Transm. 16 120–133. Originally in Russian in Problemy Peredachi Informatsii 16 52–67.
• Xu, X. and Liang, F. (2010). Asymptotic minimax risk of predictive density estimation for nonparametric regression. Bernoulli 16 543–560.
• Xu, X. and Zhou, D. (2011). Empirical Bayes predictive densities for high-dimensional normal models. J. Multivariate Anal. 102 1417–1428.

Supplemental materials

• Supplementary material to “Exact minimax estimation of the predictive density in sparse Gaussian models”. The supplement (Mukherjee and Johnstone, 2015) briefly describes the relevance of the predictive density estimation problem in related application areas. It also contains the proof of the suboptimality of the univariate threshold density estimate $\hat{p}_{T,\mathrm{LF}}$ (Section S.2) and the details of the proof of Proposition 1 (Section S.3). The arguments for the maximum quadratic risk of hard threshold point estimates are reviewed in Section S.4, and the proof of Proposition 7 is presented in Section S.5. Links to the R code used to produce Table 1 and Figure 3 are also provided.