We propose a general maximum likelihood empirical Bayes (GMLEB) method for the estimation of a mean vector based on observations with i.i.d. normal errors. We prove that under mild moment conditions on the unknown means, the average mean squared error (MSE) of the GMLEB is within an infinitesimal fraction of the minimum average MSE among all separable estimators which use a single deterministic estimating function on individual observations, provided that the risk is of greater order than $(\log n)^5/n$. We also prove that the GMLEB is uniformly approximately minimax in regular and weak $\ell_p$ balls when the order of the length-normalized norm of the unknown means is between $(\log n)^{\kappa_1}/n^{1/(p\wedge 2)}$ and $n/(\log n)^{\kappa_2}$. Simulation experiments demonstrate that the GMLEB outperforms the James–Stein and several state-of-the-art threshold estimators in a wide range of settings without much downside.
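As a rough illustration of the empirical Bayes idea named above, the following sketch assumes unit noise variance, approximates a maximum likelihood estimate of the distribution of the unknown means on a fixed grid by EM iterations, and plugs that estimate into the posterior-mean rule. The grid construction, the iteration count, the simulated data, and the function names (`fit_prior_em`, `posterior_mean`, `gmleb_sketch`) are all illustrative assumptions and should not be read as the authors' implementation.

```python
import math
import random


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def fit_prior_em(x, grid, iters=100):
    """Grid-based EM for the mixing weights of a normal location mixture.

    Assumption: x_i = theta_i + N(0, 1) noise, and the prior on theta is
    restricted to the points in `grid` (a hypothetical discretization).
    """
    m = len(grid)
    w = [1.0 / m] * m
    # lik[i][j] = phi(x_i - u_j), computed once.
    lik = [[normal_pdf(xi - u) for u in grid] for xi in x]
    for _ in range(iters):
        new_w = [0.0] * m
        for row in lik:
            # Mixture density at x_i; floor it to avoid division by zero.
            denom = max(sum(wj * lij for wj, lij in zip(w, row)), 1e-300)
            for j in range(m):
                new_w[j] += w[j] * row[j] / denom
        w = [v / len(x) for v in new_w]
    return w


def posterior_mean(xi, grid, w):
    """Posterior mean of theta given x_i under the fitted discrete prior."""
    num = 0.0
    den = 0.0
    for u, wj in zip(grid, w):
        p = wj * normal_pdf(xi - u)
        num += u * p
        den += p
    return num / den


def gmleb_sketch(x, grid_size=50, iters=100):
    """Fit the prior on an equally spaced grid over [min(x), max(x)],
    then apply the plug-in posterior-mean rule to every observation."""
    lo, hi = min(x), max(x)
    grid = [lo + (hi - lo) * j / (grid_size - 1) for j in range(grid_size)]
    w = fit_prior_em(x, grid, iters)
    return [posterior_mean(xi, grid, w) for xi in x]


def mse(a, b):
    """Average squared error between two equal-length sequences."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


# Simulated sparse example (made-up data for illustration only):
# 20 means equal to 4, 180 means equal to 0, unit-variance normal noise.
random.seed(0)
theta = [4.0] * 20 + [0.0] * 180
x = [t + random.gauss(0.0, 1.0) for t in theta]
est = gmleb_sketch(x)
mse_raw = mse(x, theta)   # error of using the observations directly
mse_est = mse(est, theta)  # error of the plug-in posterior-mean rule
```

Comparing `mse_est` with `mse_raw` gives a quick sense of how much the plug-in rule shrinks the error on this toy sparse signal; a finer grid or more EM iterations trades computation for a closer approximation to the likelihood maximizer.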