Bayesian Analysis

Compound Poisson Processes, Latent Shrinkage Priors and Bayesian Nonconvex Penalization

Zhihua Zhang and Jin Li

Abstract

In this paper we discuss Bayesian nonconvex penalization for sparse learning problems. We explore a nonparametric formulation for latent shrinkage parameters using subordinators, which are one-dimensional Lévy processes. In particular, we study a family of continuous compound Poisson subordinators and a family of discrete compound Poisson subordinators, and we exemplify four specific subordinators: the gamma, Poisson, negative binomial, and squared Bessel subordinators. The Laplace exponents of these subordinators are Bernstein functions, so they can serve as sparsity-inducing nonconvex penalty functions. We exploit these subordinators in regression problems, yielding a hierarchical model with multiple regularization parameters, and we devise ECME (Expectation/Conditional Maximization Either) algorithms to estimate the regression coefficients and regularization parameters simultaneously. Empirical evaluation on simulated data shows that our approach is feasible and effective for high-dimensional data analysis.
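
The abstract's central device can be stated compactly from standard Lévy process facts: a subordinator T = {T(t) : t ≥ 0} is a nondecreasing one-dimensional Lévy process, and its law is determined by a Laplace exponent ψ, which is always a Bernstein function. The following is a minimal sketch in generic notation (α, β, λ > 0 are illustrative parameters, not the paper's own symbols):

```latex
% Laplace transform of a subordinator T(t) in terms of its Laplace
% exponent \psi:  E[e^{-s T(t)}] = e^{-t \psi(s)} for s >= 0.
% \psi is a Bernstein function (\psi(0) = 0, completely monotone
% derivative), i.e.,
\[
  \psi(s) = c\,s + \int_0^\infty \bigl(1 - e^{-su}\bigr)\,\nu(\mathrm{d}u),
  \qquad c \ge 0,
\]
% for a L\'evy measure \nu. Two of the subordinators named in the
% abstract have the standard exponents
\[
  \psi_{\mathrm{gamma}}(s) = \alpha \log\!\Bigl(1 + \frac{s}{\beta}\Bigr),
  \qquad
  \psi_{\mathrm{Poisson}}(s) = \lambda\bigl(1 - e^{-s}\bigr).
\]
% Both are nondecreasing and concave with \psi(0) = 0, which is what
% makes b \mapsto \psi(|b|) a sparsity-inducing nonconvex penalty.
```

To see how such a penalty behaves in regression, here is a rough, hypothetical sketch using the gamma-type penalty ψ(s) = α log(1 + s/β). It is not the authors' ECME algorithm: it instead uses the generic local linear approximation of Zou and Li (2008, cited below), which turns each step into a weighted lasso solved by coordinate descent. All function names and parameter values are illustrative.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding, the proximal operator of the l1 norm."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def log_penalized_regression(X, y, alpha=2.0, beta=0.5, outer=10, inner=50):
    """Approximately minimize
        0.5 * ||y - X b||^2 + alpha * sum_j log(1 + |b_j| / beta)
    by local linear approximation: each outer step majorizes the concave
    penalty with a weighted l1 norm (weights alpha / (beta + |b_j|)) and
    solves the resulting weighted lasso by cyclic coordinate descent.
    """
    n, p = X.shape
    col_norm2 = (X ** 2).sum(axis=0)   # per-column curvature ||X_j||^2
    b = np.zeros(p)
    r = y.copy()                       # running residual y - X @ b
    for _ in range(outer):
        w = alpha / (beta + np.abs(b))                   # LLA weights
        for _ in range(inner):
            for j in range(p):
                b_old = b[j]
                z = X[:, j] @ r + col_norm2[j] * b_old   # X_j^T (partial residual)
                b[j] = soft_threshold(z, w[j]) / col_norm2[j]
                r -= X[:, j] * (b[j] - b_old)            # keep residual in sync
    return b

# Toy check: recover a 5-sparse signal from noisy linear measurements.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 200))
b_true = np.zeros(200)
b_true[:5] = 3.0
y = X @ b_true + 0.1 * rng.standard_normal(100)
b_hat = log_penalized_regression(X, y)
print(np.argsort(-np.abs(b_hat))[:5])  # five largest estimated coefficients
```

Because the log penalty flattens out for large |b_j|, it shrinks large coefficients less aggressively than the lasso while still thresholding small ones, which is the practical appeal of nonconvex penalization.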

Article information

Source
Bayesian Anal. Volume 10, Number 2 (2015), 247–274.

Dates
First available in Project Euclid: 2 February 2015

Permanent link to this document
https://projecteuclid.org/euclid.ba/1422884974

Digital Object Identifier
doi:10.1214/14-BA892

Mathematical Reviews number (MathSciNet)
MR3420882

Zentralblatt MATH identifier
1335.62073

Keywords
nonconvex penalization; subordinators; latent shrinkage parameters; Bernstein functions; ECME algorithms

Citation

Zhang, Zhihua; Li, Jin. Compound Poisson Processes, Latent Shrinkage Priors and Bayesian Nonconvex Penalization. Bayesian Anal. 10 (2015), no. 2, 247–274. doi:10.1214/14-BA892. https://projecteuclid.org/euclid.ba/1422884974


References

  • Aalen, O. O. (1992). “Modelling heterogeneity in survival analysis by the compound Poisson distribution.” The Annals of Applied Probability, 2(4): 951–972.
  • Applebaum, D. (2004). Lévy Processes and Stochastic Calculus. Cambridge, UK: Cambridge University Press.
  • Armagan, A., Dunson, D., and Lee, J. (2013). “Generalized double Pareto shrinkage.” Statistica Sinica, 23: 119–143.
  • Bhattacharya, A., Pati, D., Pillai, N. S., and Dunson, D. B. (2012). “Bayesian Shrinkage.” Technical report. arXiv:1212.6088.
  • Brix, A. (1999). “Generalized Gamma measures and shot-noise Cox processes.” Advances in Applied Probability, 31(4): 929–953.
  • Broderick, T., Jordan, M. I., and Pitman, J. (2012). “Beta Processes, Stick-Breaking and Power Laws.” Bayesian Analysis, 7(2): 439–476.
  • Caron, F. and Doucet, A. (2008). “Sparse Bayesian nonparametric regression.” In Proceedings of the 25th International Conference on Machine Learning (ICML), 88–95.
  • Carvalho, C. M., Polson, N. G., and Scott, J. G. (2009). “Handling sparsity via the horseshoe.” In Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics (AISTATS), 73–80.
  • — (2010). “The horseshoe estimator for sparse signals.” Biometrika, 97: 465–480.
  • Cevher, V. (2009). “Learning with compressible priors.” In Advances in Neural Information Processing Systems (NIPS) 22, 261–269.
  • Feller, W. (1971). An Introduction to Probability Theory and Its Applications, volume II. New York: John Wiley and Sons, second edition.
  • Figueiredo, M. A. T. (2003). “Adaptive Sparseness for Supervised Learning.” IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(9): 1150–1159.
  • Garrigues, P. J. and Olshausen, B. A. (2010). “Group Sparse Coding with a Laplacian Scale Mixture Prior.” In Advances in Neural Information Processing Systems (NIPS) 23.
  • Ghahramani, Z., Griffiths, T., and Sollich, P. (2006). “Bayesian nonparametric latent feature models.” In the 8th World Meeting on Bayesian Statistics.
  • Griffin, J. E. and Brown, P. J. (2010). “Inference with normal-gamma prior distributions in regression problems.” Bayesian Analysis, 5(1): 171–183.
  • — (2011). “Bayesian Hyper-lassos with Non-convex Penalization.” Australian & New Zealand Journal of Statistics, 53(4): 423–442.
  • Hans, C. (2009). “Bayesian lasso regression.” Biometrika, 96: 835–845.
  • Kyung, M., Gill, J., Ghosh, M., and Casella, G. (2010). “Penalized regression, standard errors, and Bayesian lassos.” Bayesian Analysis, 5(2): 369–382.
  • Lee, A., Caron, F., Doucet, A., and Holmes, C. (2010). “A Hierarchical Bayesian Framework for Constructing Sparsity-inducing Priors.” Technical report, University of Oxford, UK.
  • Li, Q. and Lin, N. (2010). “The Bayesian Elastic Net.” Bayesian Analysis, 5(1): 151–170.
  • Liu, C. and Rubin, D. B. (1994). “The ECME algorithm: A simple extension of EM and ECM with faster monotone convergence.” Biometrika, 81(4): 633–648.
  • Magnus, J. R. and Neudecker, H. (1999). Matrix Differential Calculus with Applications in Statistics and Econometrics. New York: John Wiley & Sons, revised edition.
  • Mardia, K. V., Kent, J. T., and Bibby, J. M. (1979). Multivariate Analysis. New York: Academic Press.
  • Mazumder, R., Friedman, J., and Hastie, T. (2011). “SparseNet: Coordinate Descent with Nonconvex Penalties.” Journal of the American Statistical Association, 106(495): 1125–1138.
  • Paisley, J. and Carin, L. (2009). “Nonparametric factor analysis with beta process priors.” In Proceedings of the 26th International Conference on Machine Learning (ICML).
  • Park, T. and Casella, G. (2008). “The Bayesian Lasso.” Journal of the American Statistical Association, 103(482): 681–686.
  • Polson, N. G. and Scott, J. G. (2010). “Shrink Globally, Act Locally: Sparse Bayesian Regularization and Prediction.” In Bernardo, J. M., Bayarri, M. J., Berger, J. O., Dawid, A. P., Heckerman, D., Smith, A. F. M., and West, M. (eds.), Bayesian Statistics 9. Oxford University Press.
  • — (2011). “Data Augmentation for Support Vector Machines.” Bayesian Analysis, 6(1): 1–24.
  • — (2012). “Local shrinkage rules, Lévy processes and regularized regression.” Journal of the Royal Statistical Society, Series B, 74(2): 287–311.
  • Sato, K.-I. (1999). Lévy Processes and Infinitely Divisible Distributions. Cambridge, UK: Cambridge University Press.
  • Teh, Y. W. and Görür, D. (2009). “Indian buffet processes with power-law behavior.” In Advances in Neural Information Processing Systems (NIPS) 22.
  • Thibaux, R. and Jordan, M. I. (2007). “Hierarchical Beta Processes and the Indian Buffet Process.” In the International Conference on Artificial Intelligence and Statistics (AISTATS).
  • Tibshirani, R. (1996). “Regression shrinkage and selection via the lasso.” Journal of the Royal Statistical Society, Series B, 58: 267–288.
  • Titsias, M. K. (2007). “The Infinite Gamma-Poisson Feature Model.” In Advances in Neural Information Processing Systems (NIPS) 20.
  • Yuan, L. and Kalbfleisch, J. D. (2000). “On the Bessel Distribution and Related Problems.” Annals of the Institute of Statistical Mathematics, 52(3): 438–447.
  • Zhang, Z. and Tu, B. (2012). “Nonconvex Penalization Using Laplace Exponents and Concave Conjugates.” In Advances in Neural Information Processing Systems (NIPS) 25.
  • Zhang, Z., Wang, S., Liu, D., and Jordan, M. I. (2012). “EP-GIG priors and applications in Bayesian sparse learning.” Journal of Machine Learning Research, 13: 2031–2061.
  • Zou, H. and Li, R. (2008). “One-step sparse estimates in nonconcave penalized likelihood models.” The Annals of Statistics, 36(4): 1509–1533.