The Annals of Statistics

Exact minimax estimation of the predictive density in sparse Gaussian models

Gourab Mukherjee and Iain M. Johnstone

Full-text: Open access


We consider estimating the predictive density under Kullback–Leibler loss in an $\ell_{0}$ sparse Gaussian sequence model. Explicit expressions for the first-order minimax risk, along with its exact constant, asymptotically least favorable priors and optimal predictive density estimates, are derived. Compared with the sparse recovery results involving point estimation of the normal mean, new decision-theoretic phenomena emerge. The suboptimal performance of the class of plug-in density estimates reflects the predictive nature of the problem, and optimal strategies require diversification of the future risk. We find that minimax optimal strategies lie outside the Gaussian family but can be constructed with threshold predictive density estimates. Novel minimax techniques involving simultaneous calibration of the sparsity adjustment and the risk diversification mechanisms are used to design optimal predictive density estimates.
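The suboptimality of plug-in estimates and the benefit of "diversifying" the future risk can be illustrated in the simplest one-coordinate Gaussian setting, which is standard in this literature (see, e.g., George, Liang and Xu, 2006) rather than taken from the paper itself. Observe $X \sim N(\theta, v_x)$ and estimate the density of $Y \sim N(\theta, v_y)$ under KL loss: the plug-in estimate $N(X, v_y)$ has risk $v_x/(2v_y)$, while the wider flat-prior Bayes predictive density $N(X, v_x + v_y)$ has the strictly smaller risk $\tfrac{1}{2}\log(1 + v_x/v_y)$. The sketch below (names `kl_normal`, `mc_risk` and the chosen parameter values are ours, for illustration only) checks this by Monte Carlo:

```python
import math
import random

def kl_normal(mu0, v0, mu1, v1):
    """KL divergence KL( N(mu0, v0) || N(mu1, v1) ) in closed form."""
    return 0.5 * (math.log(v1 / v0) + (v0 + (mu0 - mu1) ** 2) / v1 - 1.0)

def mc_risk(predictive_var, theta=1.0, vx=1.0, vy=1.0, n=200_000, seed=0):
    """Monte Carlo KL risk of the predictive density N(X, predictive_var),
    averaging over draws of X ~ N(theta, vx); the target density is N(theta, vy)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = theta + math.sqrt(vx) * rng.gauss(0.0, 1.0)
        total += kl_normal(theta, vy, x, predictive_var)
    return total / n

vx = vy = 1.0
plugin_risk = mc_risk(vy)        # analytic value: vx / (2 * vy) = 0.5
bayes_risk = mc_risk(vx + vy)    # analytic value: 0.5 * log(1 + vx/vy) ~ 0.347
```

Since $\log(1+r) < r$ for $r > 0$, the Bayes predictive density dominates the plug-in estimate for every $\theta$; inflating the predictive variance spreads (diversifies) the future risk, which is the one-dimensional germ of the phenomenon the abstract describes.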

Article information

Ann. Statist., Volume 43, Number 3 (2015), 937-961.

Received: January 2014
Revised: June 2014
First available in Project Euclid: 15 May 2015

Primary: 62C20: Minimax procedures
Secondary: 62M20: Prediction [See also 60G25]; filtering [See also 60G35, 93E10, 93E11]; 60G25: Prediction theory [See also 62M20]; 91G70: Statistical methods, econometrics

Keywords: Predictive density; risk diversification; minimax; sparsity; high-dimensional; mutual information; plug-in risk; thresholding


Mukherjee, Gourab; Johnstone, Iain M. Exact minimax estimation of the predictive density in sparse Gaussian models. Ann. Statist. 43 (2015), no. 3, 937--961. doi:10.1214/14-AOS1251.

References

  • Aitchison, J. (1975). Goodness of prediction fit. Biometrika 62 547–554.
  • Aitchison, J. and Dunsmore, I. R. (1975). Statistical Prediction Analysis. Cambridge Univ. Press, Cambridge.
  • Aslan, M. (2006). Asymptotically minimax Bayes predictive densities. Ann. Statist. 34 2921–2938.
  • Barndorff-Nielsen, O. E. and Cox, D. R. (1996). Prediction and asymptotics. Bernoulli 2 319–340.
  • Bell, R. M. and Cover, T. M. (1980). Competitive optimality of logarithmic investment. Math. Oper. Res. 5 161–166.
  • Brown, L. (1974). Lecture notes on statistical decision theory. Available at
  • Brown, L. D., George, E. I. and Xu, X. (2008). Admissible predictive density estimation. Ann. Statist. 36 1156–1170.
  • Cover, T. M. and Thomas, J. A. (1991). Elements of Information Theory. Wiley, New York.
  • Donoho, D. L. and Johnstone, I. M. (1994). Minimax risk over $l_{p}$-balls for $l_{q}$-error. Probab. Theory Related Fields 99 277–303.
  • Donoho, D. L., Johnstone, I. M., Hoch, J. C. and Stern, A. S. (1992). Maximum entropy and the nearly black object. J. R. Stat. Soc. Ser. B. Stat. Methodol. 54 41–81.
  • Fourdrinier, D., Marchand, É., Righi, A. and Strawderman, W. E. (2011). On improved predictive density estimation with parametric constraints. Electron. J. Stat. 5 172–191.
  • Gatsonis, C. A. (1984). Deriving posterior distributions for a location parameter: A decision theoretic approach. Ann. Statist. 12 958–970.
  • Geisser, S. (1993). Predictive Inference: An Introduction. Monographs on Statistics and Applied Probability 55. Chapman & Hall, New York.
  • George, E. I., Liang, F. and Xu, X. (2006). Improved minimax predictive densities under Kullback–Leibler loss. Ann. Statist. 34 78–91.
  • George, E. I., Liang, F. and Xu, X. (2012). From minimax shrinkage estimation to minimax shrinkage prediction. Statist. Sci. 27 82–94.
  • Ghosh, M., Mergel, V. and Datta, G. S. (2008). Estimation, prediction and the Stein phenomenon under divergence loss. J. Multivariate Anal. 99 1941–1961.
  • Hartigan, J. A. (1998). The maximum likelihood prior. Ann. Statist. 26 2083–2103.
  • Johnstone, I. M. (2013). Gaussian estimation: Sequence and wavelet models. Available at
  • Komaki, F. (1996). On asymptotic properties of predictive distributions. Biometrika 83 299–313.
  • Komaki, F. (2001). A shrinkage predictive distribution for multivariate normal observables. Biometrika 88 859–864.
  • Komaki, F. (2004). Simultaneous prediction of independent Poisson observables. Ann. Statist. 32 1744–1769.
  • Larimore, W. E. (1983). Predictive inference, sufficiency, entropy and an asymptotic likelihood principle. Biometrika 70 175–181.
  • McMillan, B. (1956). Two inequalities implied by unique decipherability. IRE Transactions on Information Theory 2 115–116.
  • Mukherjee, G. (2013). Sparsity and shrinkage in predictive density estimation. Ph.D. thesis, Stanford Univ. Available at
  • Mukherjee, G. and Johnstone, I. M. (2015). Supplement to “Exact minimax estimation of the predictive density in sparse Gaussian models.” DOI:10.1214/14-AOS1251SUPP.
  • Murray, G. D. (1977). A note on the estimation of probability density functions. Biometrika 64 150–152.
  • Ng, V. M. (1980). On the estimation of parametric density functions. Biometrika 67 505–506.
  • Pinsker, M. S. (1980). Optimal filtration of square-integrable signals in Gaussian noise. Probl. Inf. Transm. 16 120–133. Originally in Russian in Problemy Peredachi Informatsii 16 52–67.
  • Xu, X. and Liang, F. (2010). Asymptotic minimax risk of predictive density estimation for nonparametric regression. Bernoulli 16 543–560.
  • Xu, X. and Zhou, D. (2011). Empirical Bayes predictive densities for high-dimensional normal models. J. Multivariate Anal. 102 1417–1428.

Supplemental materials

  • Supplementary material to “Exact minimax estimation of the predictive density in sparse Gaussian models”. The supplement Mukherjee and Johnstone (2015) briefly describes the relevance of the predictive density estimation problem in related application areas, proves the suboptimality of the univariate threshold density estimate $\hat{p}_{T,\mathrm{LF}}$ (Section S.2) and details the proof of Proposition 1 (Section S.3). It also reviews the arguments for the maximum quadratic risk of hard threshold point estimates (Section S.4) and presents the proof of Proposition 7 (Section S.5). Links to the R codes used to produce Table 1 and Figure 3 are also provided.