Open Access
2013 Hierarchical Bayes, maximum a posteriori estimators, and minimax concave penalized likelihood estimation
Robert L. Strawderman, Martin T. Wells, Elizabeth D. Schifano
Electron. J. Statist. 7: 973-990 (2013). DOI: 10.1214/13-EJS795
Abstract

Priors constructed from scale mixtures of normal distributions have long played an important role in decision theory and shrinkage estimation. This paper demonstrates equivalence between the maximum a posteriori estimator constructed under one such prior and Zhang’s minimax concave penalization estimator. This equivalence and related multivariate generalizations stem directly from an intriguing representation of the minimax concave penalty function as the Moreau envelope of a simple convex function. Maximum a posteriori estimation under the corresponding marginal prior distribution, a generalization of the quasi-Cauchy distribution proposed by Johnstone and Silverman, leads to thresholding estimators having excellent frequentist risk properties.
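The abstract gives no formulas, so the following is only an illustrative sketch of the kind of thresholding estimator it refers to. It assumes the standard closed form of the minimax concave penalty from Zhang [39] and the well-known fact that, in the scalar (orthonormal) case with gamma > 1, the penalized least-squares minimizer is the firm thresholding rule of Gao and Bruce [18]. The function names and the parameter names `lam` and `gamma` are hypothetical choices made for this example; the code does not reproduce the paper's hierarchical prior or its Moreau envelope representation.

```python
import numpy as np

def mcp_penalty(t, lam, gamma):
    """Minimax concave penalty in its standard closed form (assumed, after Zhang 2010)."""
    a = np.abs(t)
    return np.where(a <= gamma * lam,
                    lam * a - a**2 / (2.0 * gamma),  # concave quadratic region
                    0.5 * gamma * lam**2)            # constant beyond gamma * lam

def firm_threshold(z, lam, gamma):
    """Closed-form minimizer of 0.5*(z - theta)**2 + mcp_penalty(theta), for gamma > 1."""
    a = np.abs(z)
    shrunk = np.sign(z) * (a - lam) / (1.0 - 1.0 / gamma)
    return np.where(a <= lam, 0.0, np.where(a <= gamma * lam, shrunk, z))

def grid_minimizer(z, lam, gamma, grid):
    """Brute-force check: minimize the penalized objective over a fixed grid."""
    obj = 0.5 * (z - grid) ** 2 + mcp_penalty(grid, lam, gamma)
    return grid[np.argmin(obj)]

# Compare the closed form with the grid search for a few observations.
lam, gamma = 1.0, 3.0
grid = np.linspace(-6.0, 6.0, 120001)
for z in (0.5, 1.5, 2.5, 5.0):
    print(z, float(firm_threshold(z, lam, gamma)),
          float(grid_minimizer(z, lam, gamma, grid)))
```

Small observations are set to zero, moderate ones are shrunk, and large ones are left unchanged, which is the behaviour usually summarized as nearly unbiased thresholding.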

References

[1] Abramowitz, M. and Stegun, I. (1970). Handbook of mathematical functions. Dover Publications Inc., New York.

[2] Antoniadis, A. and Fan, J. (2001). Regularization of Wavelet Approximations. J. Am. Statist. Assoc. 96 939–955. MR1946364 10.1198/016214501753208942

[3] Armagan, A., Dunson, D. and Lee, J. (2011). Generalized double Pareto shrinkage. ArXiv e-prints. MR3076161

[4] Baricz, A. (2008). Mills’ ratio: Monotonicity patterns and functional inequalities. J. Math. Anal. Applic. 340 1362–1370. MR2390935 10.1016/j.jmaa.2007.09.063

[5] Berger, J. O. and Robert, C. (1990). Subjective hierarchical Bayes estimation of a multivariate normal mean: on the frequentist interface. Ann. Statist. 18 617–651. MR1056330 10.1214/aos/1176347619 euclid.aos/1176347619

[6] Berger, J. O. and Strawderman, W. E. (1996). Choice of hierarchical priors: admissibility in estimation of normal means. Ann. Statist. 24 931–951. MR1401831 10.1214/aos/1032526950 euclid.aos/1032526950

[7] Berger, J. O., Strawderman, W. E. and Tang, D. (2005). Posterior Propriety and Admissibility of Hyperpriors in Normal Hierarchical Models. Ann. Statist. 33 606–646. MR2163154 10.1214/009053605000000075 euclid.aos/1117114331

[8] Box, G. E. P. and Tiao, G. C. (1992). Bayesian Inference in Statistical Analysis (1973 ed., Wiley Classics Library). John Wiley and Sons, New York. MR418321

[9] Breheny, P. and Huang, J. (2009). Penalized methods for bi-level variable selection. Stat. Interface 2 369–380. MR2540094 10.4310/SII.2009.v2.n3.a10

[10] Breheny, P. and Huang, J. (2011). Coordinate descent algorithms for nonconvex penalized regression, with applications to biological feature selection. Ann. Appl. Stat. 5 232–253. MR2810396 10.1214/10-AOAS388 euclid.aoas/1300715189

[11] Bruce, A. G. and Gao, H. Y. (1996). Applied Wavelet Analysis with S-Plus. Springer, New York.

[12] Carvalho, C. M., Polson, N. G. and Scott, J. G. (2010). The horseshoe estimator for sparse signals. Biometrika 97 465–480. MR2650751 10.1093/biomet/asq017

[13] Chen, M.-H., Ibrahim, J. G. and Shao, Q.-M. (2006). Posterior Propriety and Computation for the Cox Regression Model with Applications to Missing Covariates. Biometrika 93 791–807. MR2285072 10.1093/biomet/93.4.791

[14] Chen, M.-H. and Shao, Q.-M. (2001). Propriety of Posterior Distribution for Dichotomous Quantal Response Models. Proceedings of the American Mathematical Society 129 293–302. MR1694452 10.1090/S0002-9939-00-05513-1

[15] Cox, D. R. (1972). Regression Models and Life-Tables. Journal of the Royal Statistical Society. Series B (Methodological) 34 187–220. MR341758

[16] Fan, J. and Li, R. (2001). Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties. J. Am. Statist. Assoc. 96 1348–1360. MR1946581 10.1198/016214501753382273

[17] Fourdrinier, D., Strawderman, W. E. and Wells, M. T. (1998). On the construction of Bayes minimax estimators. Ann. Statist. 26 660–671. MR1626063 10.1214/aos/1028144853 euclid.aos/1028144853

[18] Gao, H. and Bruce, A. G. (1997). Waveshrink with firm shrinkage. Statist. Sinica 7 855–874. MR1488646

[19] Gomez-Sanchez-Manzano, E., Gomez-Villegas, M. A. and Marin, J. M. (2008). Multivariate exponential power distributions as mixtures of normal distributions with Bayesian applications. Comm. Stat. Thry. Meth. 37 972–985.

[20] Griffin, J. E. and Brown, P. J. (2007). Bayesian adaptive Lassos with non-convex penalization. Technical Report, Dept. of Statistics, University of Warwick.

[21] Griffin, J. E. and Brown, P. J. (2010). Inference with normal-gamma prior distributions in regression problems. Bayesian Analysis 6 171–188. MR2596440 10.1214/10-BA507 euclid.ba/1340369797

[22] Hans, C. (2009). Bayesian Lasso regression. Biometrika 96 835–845. MR2564494 10.1093/biomet/asp047

[23] Johnstone, I. M. and Silverman, B. W. (2004). Needles and straw in haystacks: empirical Bayes estimates of possibly sparse sequences. Ann. Statist. 32 1594–1649. MR2089135 10.1214/009053604000000030 euclid.aos/1091626180

[24] Kass, R. E. and Wasserman, L. (1996). The Selection of Prior Distributions by Formal Rules. Journal of the American Statistical Association 91 1343–1370.

[25] Mazumder, R., Friedman, J. H. and Hastie, T. (2011). SparseNet: Coordinate Descent With Nonconvex Penalties. Journal of the American Statistical Association 106 1125–1138. MR2894769 10.1198/jasa.2011.tm09738

[26] Park, T. and Casella, G. (2008). The Bayesian Lasso. J. Am. Statist. Assoc. 103 681–686. MR2524001 10.1198/016214508000000337

[27] Polson, N. G. and Scott, J. G. (2011). Shrink Globally, Act Locally: Sparse Bayesian Regularization and Prediction (with discussion). In Bayesian Statistics 9 (J. M. Bernardo, M. J. Bayarri, J. O. Berger, A. P. Dawid, D. Heckerman and M. West, eds.) 501–525. Oxford University Press. MR3204017 10.1093/acprof:oso/9780199694587.003.0017

[28] Robert, C. P. (2007). The Bayesian Choice. Springer-Verlag, New York. MR2723361

[29] Rockafellar, R. T. and Wets, R. J. B. (2004). Variational Analysis. Springer-Verlag, Berlin. MR1491362

[30] Sampford, M. R. (1953). Some Inequalities on Mill’s Ratio and Related Functions. Ann. Math. Statist. 24 130–132. MR54890 10.1214/aoms/1177729093 euclid.aoms/1177729093

[31] Schifano, E. D. (2010). Topics in Penalized Estimation. PhD thesis, Cornell University. MR2801759

[32] Schifano, E. D., Strawderman, R. L. and Wells, M. T. (2010). Majorization-minimization algorithms for nonsmoothly penalized objective functions. Electron. J. Stat. 4 1258–1299. MR2738533 10.1214/10-EJS582 euclid.ejs/1289575960

[33] Strawderman, W. E. (1971). Proper Bayes minimax estimators of the multivariate normal mean. Ann. Math. Statist. 42 385–388. MR397939 10.1214/aoms/1177693528 euclid.aoms/1177693528

[34] Strawderman, R. L. and Wells, M. T. (2012). On Hierarchical Prior Specifications and Penalized Likelihood. In Contemporary Developments in Bayesian Analysis and Statistical Decision Theory: A Festschrift for William E. Strawderman (D. Fourdrinier, E. Marchand and A. Rukhin, eds.) 8 154–180. Institute of Mathematical Statistics, Hayward, CA. MR3202509 10.1214/11-IMSCOLL811

[35] Takada, Y. (1979). Stein’s positive part estimator and Bayes estimator. Ann. Inst. Statist. Math. 31 177–183. MR550792 10.1007/BF02480275

[36] Tibshirani, R. (1996). Regression Shrinkage and Selection via the Lasso. J. R. Statist. Soc. B 58 267–288. MR1379242

[37] Tipping, M. E. (2001). Sparse Bayesian learning and the relevance vector machine. J. Mach. Learn. Res. 1 211–244. MR1875838

[38] Yuan, M. and Lin, Y. (2006). Model selection and estimation in regression with grouped variables. J. R. Statist. Soc. B 68 49–67. MR2212574 10.1111/j.1467-9868.2005.00532.x

[39] Zhang, C.-H. (2010). Nearly Unbiased Variable Selection Under Minimax Concave Penalty. Ann. Statist. 38 894–942. MR2604701 10.1214/09-AOS729 euclid.aos/1266586618

[40] Zlobec, S. (2003). Estimating convexifiers in continuous optimization. Math. Comm. 8 129–137. MR2026391

[41] Zou, H. and Li, R. (2008). One-step sparse estimates in nonconcave penalized likelihood models. Ann. Statist. 36 1509–1533. MR2435443 10.1214/009053607000000802 euclid.aos/1216237287
Copyright © 2013 The Institute of Mathematical Statistics and the Bernoulli Society
Robert L. Strawderman, Martin T. Wells, and Elizabeth D. Schifano "Hierarchical Bayes, maximum a posteriori estimators, and minimax concave penalized likelihood estimation," Electronic Journal of Statistics 7, 973-990 (2013). https://doi.org/10.1214/13-EJS795
Published: 2013