Institute of Mathematical Statistics Collections

On hierarchical prior specifications and penalized likelihood

Robert L. Strawderman and Martin T. Wells

Abstract

Using a Bayesian model with a class of hierarchically specified scale-mixture-of-normals priors as motivation, we consider a generalization of the grouped lasso in which an additional penalty is placed on the penalty parameter multiplying the L2 norm. We show that the resulting MAP estimator, obtained by jointly minimizing the corresponding objective function in both the mean and the penalty parameter, is a thresholding estimator that generalizes (i) the grouped lasso estimator of Yuan and Lin (2006) and (ii) the univariate minimax concave penalization procedure of Zhang (2010) to the setting of a vector of parameters. An exact formula for the risk and a corresponding SURE formula are obtained for the proposed class of estimators.
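To fix ideas, the joint objective described above can be written as follows (the notation is ours; the abstract does not specify the additional penalty, which we denote generically by h(λ)). For an observation y ~ N_p(θ, I),

\[
(\hat{\theta}, \hat{\lambda}) \;=\; \arg\min_{\theta \in \mathbb{R}^p,\; \lambda \ge 0} \Big\{ \tfrac{1}{2}\,\lVert y - \theta \rVert_2^2 \;+\; \lambda\,\lVert \theta \rVert_2 \;+\; h(\lambda) \Big\}.
\]

With h ≡ 0 and λ held fixed, minimization over θ alone recovers the grouped lasso solution θ̂ = (1 − λ/‖y‖₂)₊ y of Yuan and Lin (2006). Conversely, profiling out λ under the illustrative quadratic choice h(λ) = (γ/2)(λ − λ₀)² reduces the penalty term to λ₀t − t²/(2γ) for t = ‖θ‖₂ ≤ γλ₀ (and a constant thereafter), which for p = 1 is exactly the minimax concave penalty of Zhang (2010). This shows concretely how joint minimization over the penalty parameter can generate MCP-type thresholding.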

A new universal threshold is proposed under appropriate sparsity assumptions; combined with the proposed class of estimators, it yields a new and interesting motivation for the class of positive-part estimators. In particular, we establish that the original positive-part estimator corresponds to a suboptimal choice of this thresholding parameter. Numerical comparisons between the proposed class of estimators and the positive-part estimator show that the former can achieve further significant reductions in risk near the origin.
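As a purely illustrative numerical sketch of this kind of risk comparison (the estimators and threshold below are stand-ins of our own choosing, not the authors' proposals), the following Python snippet contrasts the positive-part James-Stein estimator (1 − (p − 2)/‖y‖²)₊ y with a grouped-lasso-type thresholding rule (1 − λ/‖y‖)₊ y, using a universal-threshold-style choice λ = √(2 log p) and estimating quadratic risk by Monte Carlo as ‖θ‖ moves away from the origin.

```python
import numpy as np

def positive_part_js(y):
    """Positive-part James-Stein estimator: (1 - (p - 2)/||y||^2)_+ y."""
    p = y.size
    return max(0.0, 1.0 - (p - 2) / np.dot(y, y)) * y

def group_threshold(y, lam):
    """Grouped-lasso-type rule (block soft-thresholding): (1 - lam/||y||)_+ y."""
    r = np.linalg.norm(y)
    return max(0.0, 1.0 - lam / r) * y if r > 0 else 0.0 * y

def mc_risk(estimator, theta, n_rep=20_000, seed=0):
    """Monte Carlo estimate of the quadratic risk E||est(y) - theta||^2, y ~ N(theta, I)."""
    rng = np.random.default_rng(seed)
    y = theta + rng.standard_normal((n_rep, theta.size))
    est = np.apply_along_axis(estimator, 1, y)
    return float(np.mean(np.sum((est - theta) ** 2, axis=1)))

p = 10
lam = np.sqrt(2 * np.log(p))  # universal-threshold-style choice (illustrative only)
for r in (0.0, 1.0, 2.0, 4.0):
    theta = np.zeros(p)
    theta[0] = r  # signal strength, so ||theta|| = r
    print(f"||theta|| = {r:.0f}: positive-part JS risk = "
          f"{mc_risk(positive_part_js, theta):.2f}, "
          f"threshold risk = {mc_risk(lambda y: group_threshold(y, lam), theta):.2f}")
```

Substituting the paper's estimator and its proposed universal threshold for the stand-ins above would give the analogue of the comparison the authors report.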

Chapter information

Source
Dominique Fourdrinier, Éric Marchand and Andrew L. Rukhin, eds., Contemporary Developments in Bayesian Analysis and Statistical Decision Theory: A Festschrift for William E. Strawderman (Beachwood, Ohio, USA: Institute of Mathematical Statistics, 2012), 154–180

Dates
First available: 14 March 2012

Permanent link to this document
http://projecteuclid.org/euclid.imsc/1331731618

Digital Object Identifier
doi:10.1214/11-IMSCOLL811

Subjects
Primary: 62C10: Bayesian problems; characterization of Bayes procedures
62C12: Empirical decision procedures; empirical Bayes procedures
62C20: Minimax procedures
62F10: Point estimation
62F15: Bayesian inference

Keywords
hierarchical models; grouped lasso; lasso; maximum a posteriori estimate; minimax concave penalty; penalized likelihood; positive-part estimator; regularization; restricted parameter space; Stein estimation

Citation

Strawderman, Robert L.; Wells, Martin T. On hierarchical prior specifications and penalized likelihood. Contemporary Developments in Bayesian Analysis and Statistical Decision Theory: A Festschrift for William E. Strawderman, 154–180, Institute of Mathematical Statistics, Beachwood, Ohio, USA, 2012. doi:10.1214/11-IMSCOLL811. http://projecteuclid.org/euclid.imsc/1331731618.


References

  • [1] Baranchik, A. J. (1964). Multiple regression and estimation of the mean of a multivariate normal distribution. Technical Report No. 51, Department of Statistics, Stanford University.
  • [2] Baranchik, A. J. (1970). A family of minimax estimators of the mean of a multivariate normal distribution. Annals of Mathematical Statistics 41 642–645.
  • [3] Berger, J. O. (1976). Admissible minimax estimation of a multivariate normal mean with arbitrary quadratic loss. Annals of Statistics 4 223–226.
  • [4] Berger, J. O. (1980). A robust generalized Bayes estimator and confidence region for a multivariate normal mean. Annals of Statistics 8 716–761.
  • [5] Berger, J. O. and Strawderman, W. E. (1996). Choice of hierarchical priors: admissibility in estimation of normal means. Annals of Statistics 24 931–951.
  • [6] Berger, J. O., Strawderman, W. E. and Tang, D. (2005). Posterior propriety and admissibility of hyperpriors in normal hierarchical models. Annals of Statistics 33 606–646.
  • [7] Bishop, C. M. and Tipping, M. E. (2000). Variational relevance vector machines. In UAI ’00: Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence (C. Boutilier and M. Goldszmidt, eds.) 46–53. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA.
  • [8] Bock, M. E. (1988). Shrinkage estimator: pseudo-Bayes estimators for normal mean vectors. In Statistical Decision Theory and Related Topics 4, (S. S. Gupta and J. Berger, eds.) 1 281–298. Springer-Verlag, New York.
  • [9] Brandwein, A. C. and Strawderman, W. E. (1990). Stein Estimation: the spherically symmetric case. Statistical Science 5 356–369.
  • [10] Brown, L. D. (1971). Admissible estimators, recurrent diffusions, and insoluble boundary value problems. Annals of Mathematical Statistics 42 855–903.
  • [11] Brown, L. D. (1988). The differential inequality of a statistical estimation problem. In Statistical Decision Theory and Related Topics 4, (S. S. Gupta and J. Berger, eds.) 1 299–324. Springer-Verlag, New York.
  • [12] Cai, T. T. and Zhou, H. H. (2009). A data-driven block thresholding approach to wavelet estimation. Annals of Statistics 37 569–595.
  • [13] Carvalho, C. M., Polson, N. G. and Scott, J. G. (2010). The horseshoe estimator for sparse signals. Biometrika 97 465–480.
  • [14] Casella, G. and Strawderman, W. E. (1981). Estimating a bounded normal mean. Annals of Statistics 9 870–878.
  • [15] Donoho, D. and Johnstone, I. (1994). Ideal spatial adaptation by wavelet shrinkage. Biometrika 81 425–455.
  • [16] Donoho, D. and Johnstone, I. (1995). Adapting to unknown smoothness via wavelet shrinkage. Journal of the American Statistical Association 90 1200–1224.
  • [17] Efron, B. and Morris, C. (1973). Stein’s estimation rule and its competitors–an empirical Bayes approach. Journal of the American Statistical Association 68 117–130.
  • [18] Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association 96 1348–1360.
  • [19] Figueiredo, M. A. T. (2003). Adaptive sparseness for supervised learning. Pattern Analysis and Machine Intelligence, IEEE Transactions on 25 1150–1159.
  • [20] Fourdrinier, D., Strawderman, W. E. and Wells, M. T. (1998). On the construction of Bayes minimax estimators. Annals of Statistics 26 660–671.
  • [21] Fourdrinier, D. and Wells, M. T. (2010). On loss estimation. Statistical Science (to appear).
  • [22] Ghosh, M. (1992). Hierarchical and empirical Bayes multivariate estimation. In Current Issues in Statistical Inference: Essays in Honor of D. Basu. IMS Lecture Notes Monogr. Ser. 17 151–177. Institute of Mathematical Statistics, Hayward, CA.
  • [23] Gómez-Sánchez-Manzano, E., Gómez-Villegas, M. A. and Marín, J. M. (2008). Multivariate exponential power distributions as mixtures of normal distributions with Bayesian applications. Communications in Statistics - Theory and Methods 37 972–985.
  • [24] Griffin, J. E. and Brown, P. J. (2005). Alternative prior distributions for variable selection with very many more variables than observations. Technical Report, Department of Statistics, University of Warwick.
  • [25] Griffin, J. E. and Brown, P. J. (2007). Bayesian adaptive lassos with non-convex penalization. Technical Report, Department of Statistics, University of Warwick.
  • [26] Gupta, A. K. and Peña, E. A. (1991). A simple motivation for James-Stein estimators. Statistics and Probability Letters 12 337–340.
  • [27] James, W. and Stein, C. (1961). Estimation with quadratic loss. In Proc. 4th Berkeley Sympos. Math. Statist. and Prob., Vol. I 361–379. Univ. California Press, Berkeley.
  • [28] Johnstone, I. M. and Silverman, B. W. (2004). Needles and straw in haystacks: empirical Bayes estimates of possibly sparse sequences. Annals of Statistics 32 1594–1649.
  • [29] Johnstone, I. M. and Silverman, B. W. (2005). Empirical Bayes selection of wavelet thresholds. Annals of Statistics 33 1700–1752.
  • [30] Marchand, E. and Perron, F. (2001). Improving on the MLE of a bounded normal mean. Annals of Statistics 29 1078–1093.
  • [31] Mazumder, R., Friedman, J. and Hastie, T. (2009). SparseNet: coordinate descent with non-convex penalties. Technical Report, Department of Statistics, Stanford University.
  • [32] Moulin, P. and Liu, J. (1999). Analysis of multiresolution image denoising schemes using generalized Gaussian and complexity priors. IEEE Transactions on Information Theory 45 909–919.
  • [33] Natalini, P. and Palumbo, B. (2000). Inequalities for the incomplete gamma function. Mathematical Inequalities & Applications 3 69–77.
  • [34] Robert, C. (1988). An explicit formula for the risk of the positive-part James-Stein estimator. The Canadian Journal of Statistics 16 161–168.
  • [35] Schifano, E. D. (2010). Topics in Penalized Estimation. PhD dissertation, Department of Statistical Science, Cornell University.
  • [36] Shao, P. Y. and Strawderman, W. E. (1994). Improving on the James-Stein positive-part estimator. Annals of Statistics 22 1517–1538.
  • [37] Shevade, S. K. and Keerthi, S. S. (2003). A simple and efficient algorithm for gene selection using sparse logistic regression. Bioinformatics 19 2246–2253.
  • [38] Stein, C. (1956). Inadmissibility of the usual estimator for the mean of a multivariate normal distribution. In Proc. 3rd Berkeley Sympos. Math. Statist. and Prob., Vol. I 197–206. Univ. California Press, Berkeley.
  • [39] Stein, C. (1981). Estimation of the mean of a multivariate normal distribution. Annals of Statistics 9 1135–1151.
  • [40] Strawderman, W. E. (1971). Proper Bayes minimax estimators of the multivariate normal mean. Annals of Mathematical Statistics 42 385–388.
  • [41] Strawderman, W. E. (1972). On the existence of proper Bayes minimax estimators of the mean of a multivariate normal distribution. In Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability (Univ. California, Berkeley, Calif., 1970/1971), Vol. I: Theory of Statistics 51–55. Univ. California Press, Berkeley.
  • [42] Takada, Y. (1979). Stein’s positive part estimator and Bayes estimator. Annals of the Institute of Statistical Mathematics 31 177–183.
  • [43] Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B 58 267–288.
  • [44] Tibshirani, R., Saunders, M., Rosset, S., Zhu, J. and Knight, K. (2005). Sparsity and smoothness via the fused lasso. Journal of the Royal Statistical Society. Series B 67 91–108.
  • [45] Tipping, M. E. (2001). Sparse Bayesian learning and the relevance vector machine. Journal of Machine Learning Research 1 211–244.
  • [46] Yuan, M. and Lin, Y. (2006). Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society. Series B 68 49–67.
  • [47] Zhang, C.-H. (2007). Penalized linear unbiased selection. Technical Report, Department of Statistics, Rutgers University.
  • [48] Zhang, C. H. (2010). Nearly unbiased variable selection under minimax concave penalty. Annals of Statistics 38 894–942.
  • [49] Zou, H. and Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society, Series B 67 301–320.