Bayesian Analysis

Locally Adaptive Smoothing with Markov Random Fields and Shrinkage Priors

James R. Faulkner and Vladimir N. Minin

Abstract

We present a locally adaptive nonparametric curve fitting method that operates within a fully Bayesian framework. This method uses shrinkage priors to induce sparsity in order-k differences in the latent trend function, providing a combination of local adaptation and global control. Using a scale mixture of normals representation of shrinkage priors, we make explicit connections between our method and kth order Gaussian Markov random field smoothing. We call the resulting processes shrinkage prior Markov random fields (SPMRFs). We use Hamiltonian Monte Carlo to approximate the posterior distribution of model parameters because this method provides superior performance in the presence of the high dimensionality and strong parameter correlations exhibited by our models. We compare the performance of three prior formulations using simulated data and find the horseshoe prior provides the best compromise between bias and precision. We apply SPMRF models to two benchmark data examples frequently used to test nonparametric methods. We find that this method is flexible enough to accommodate a variety of data generating models and offers the adaptive properties and computational tractability to make it a useful addition to the Bayesian nonparametric toolbox.
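To make the construction concrete, the following is a minimal Stan sketch (the paper fits its models with Stan and RStan) of the simplest case: k = 1 with Gaussian observations and a horseshoe prior on the first differences of the latent trend, written in its scale-mixture-of-normals form with half-Cauchy local scales. The variable names, the diffuse prior on the initial state, and the hyperprior scale on the global parameter gam are illustrative assumptions, not the paper's exact specification.

    // Hypothetical SPMRF-style sketch: horseshoe shrinkage on the first
    // differences (k = 1) of a latent trend, with Gaussian observations.
    data {
      int<lower=2> N;        // number of observations
      vector[N] y;           // observed series
    }
    parameters {
      vector[N] theta;             // latent trend
      vector<lower=0>[N - 1] tau;  // local scales, one per difference
      real<lower=0> gam;           // global scale
      real<lower=0> sigma;         // observation noise SD
    }
    model {
      // Horseshoe via its scale-mixture-of-normals representation:
      // half-Cauchy local scales, normal increments given the scales.
      tau ~ cauchy(0, 1);
      gam ~ cauchy(0, 0.01);     // illustrative hyperprior scale, not the paper's rule
      sigma ~ cauchy(0, 1);
      theta[1] ~ normal(0, 10);  // diffuse prior on the initial state
      theta[2:N] - theta[1:(N - 1)] ~ normal(0, gam * tau);
      y ~ normal(theta, sigma);
    }

Because the difference operator is linear with unit Jacobian, placing the sampling statement directly on the increments is valid. In practice, a non-centered parameterization of the trend and scale parameters markedly improves HMC performance in hierarchies like this (cf. Papaspiliopoulos et al., 2007; Betancourt and Girolami, 2015).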

Article information

Source
Bayesian Anal. Volume 13, Number 1 (2018), 225–252.

Dates
First available in Project Euclid: 24 February 2017

Permanent link to this document
https://projecteuclid.org/euclid.ba/1487905413

Digital Object Identifier
doi:10.1214/17-BA1050

Keywords
nonparametric; horseshoe prior; Lévy process; Hamiltonian Monte Carlo

Rights
Creative Commons Attribution 4.0 International License.

Citation

Faulkner, James R.; Minin, Vladimir N. Locally Adaptive Smoothing with Markov Random Fields and Shrinkage Priors. Bayesian Anal. 13 (2018), no. 1, 225–252. doi:10.1214/17-BA1050. https://projecteuclid.org/euclid.ba/1487905413

References

  • Abramovich, F., Sapatinas, T., and Silverman, B. W. (1998). “Wavelet thresholding via a Bayesian approach.” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 60(4): 725–749.
  • Adams, R. P., Murray, I., and MacKay, D. J. (2009). “Tractable nonparametric Bayesian inference in Poisson processes with Gaussian process intensities.” In Proceedings of the 26th Annual International Conference on Machine Learning, 9–16. ACM.
  • Andrews, D. F. and Mallows, C. L. (1974). “Scale mixtures of normal distributions.” Journal of the Royal Statistical Society. Series B (Methodological), 36(1): 99–102.
  • Armagan, A., Dunson, D. B., and Lee, J. (2013). “Generalized Double Pareto Shrinkage.” Statistica Sinica, 23(1): 119–143.
  • Betancourt, M. and Girolami, M. (2015). “Hamiltonian Monte Carlo for hierarchical models.” In Upadhyay, S. K., Singh, U., Dey, D. K., and Loganathan, A. (eds.), Current Trends in Bayesian Methodology with Applications, 79–97. CRC Press.
  • Bhadra, A., Datta, J., Polson, N. G., and Willard, B. (2015). “The horseshoe+ estimator of ultra-sparse signals.” arXiv:1502.00560.
  • Bhattacharya, A., Pati, D., Pillai, N. S., and Dunson, D. B. (2015). “Dirichlet-Laplace priors for optimal shrinkage.” Journal of the American Statistical Association, 110(512): 1479–1490.
  • Bochner, S. (1955). Harmonic Analysis and the Theory of Probability. University of California Press.
  • Brahim-Belhouari, S. and Bermak, A. (2004). “Gaussian process for nonstationary time series prediction.” Computational Statistics & Data Analysis, 47(4): 705–712.
  • Carlin, B. P., Gelfand, A. E., and Smith, A. F. (1992). “Hierarchical Bayesian analysis of changepoint problems.” Applied Statistics, 41(2): 389–405.
  • Carpenter, B., Gelman, A., Hoffman, M., Lee, D., Goodrich, B., Betancourt, M., Brubaker, M., Guo, J., Li, P., and Riddell, A. (2016). “Stan: a probabilistic modeling language.” Journal of Statistical Software, 76(1): 1–32.
  • Carvalho, C. M., Polson, N. G., and Scott, J. G. (2010). “The Horseshoe Estimator for Sparse Signals.” Biometrika, 97(2): 465–480.
  • Clark, P. K. (1973). “A subordinated stochastic process model with finite variance for speculative prices.” Econometrica, 41(1): 135–155.
  • DiMatteo, I., Genovese, C. R., and Kass, R. E. (2001). “Bayesian curve-fitting with free-knot splines.” Biometrika, 88(4): 1055–1071.
  • Duane, S., Kennedy, A. D., Pendleton, B. J., and Roweth, D. (1987). “Hybrid Monte Carlo.” Physics Letters B, 195(2): 216–222.
  • Faulkner, J. R. and Minin, V. N. (2017). “Supplementary Materials for ‘Locally Adaptive Smoothing with Markov Random Fields and Shrinkage Priors’.” Bayesian Analysis.
  • Figueiredo, M. A. (2003). “Adaptive sparseness for supervised learning.” IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(9): 1150–1159.
  • Fong, Y., Rue, H., and Wakefield, J. (2010). “Bayesian inference for generalized linear mixed models.” Biostatistics, 11(3): 397–412.
  • Frank, I. E. and Friedman, J. H. (1993). “A statistical view of some chemometrics regression tools.” Technometrics, 35(2): 109–135.
  • Gelman, A. and Rubin, D. B. (1992). “Inference from iterative simulation using multiple sequences.” Statistical Science, 7(4): 457–472.
  • Gelman, A. (2006). “Prior distributions for variance parameters in hierarchical models (comment on article by Browne and Draper).” Bayesian Analysis, 1(3): 515–534.
  • Green, P. J. (1995). “Reversible Jump Markov Chain Monte Carlo Computation and Bayesian Model Determination.” Biometrika, 82(4): 711–732.
  • Griffin, J. E. and Brown, P. J. (2010). “Inference with normal-gamma prior distributions in regression problems.” Bayesian Analysis, 5(1): 171–188.
  • Griffin, J. E. and Brown, P. J. (2013). “Some priors for sparse regression modelling.” Bayesian Analysis, 8(3): 691–702.
  • Hoerl, A. E. and Kennard, R. W. (1970). “Ridge regression: Biased estimation for nonorthogonal problems.” Technometrics, 12(1): 55–67.
  • Hoffman, M. D. and Gelman, A. (2014). “The no-U-turn sampler: Adaptively setting path lengths in Hamiltonian Monte Carlo.” The Journal of Machine Learning Research, 15(1): 1593–1623.
  • Jarrett, R. G. (1979). “A Note on the Intervals Between Coal-Mining Disasters.” Biometrika, 66(1): 191–193.
  • Johnstone, I. M. and Silverman, B. W. (2005). “Empirical Bayes selection of wavelet thresholds.” Annals of Statistics, 33(4): 1700–1752.
  • Kim, S.-J., Koh, K., Boyd, S., and Gorinevsky, D. (2009). “$\ell_{1}$ Trend Filtering.” SIAM Review, 51(2): 339–360.
  • Kitagawa, G. (1987). “Non-Gaussian state-space modeling of nonstationary time series.” Journal of the American Statistical Association, 82(400): 1032–1041.
  • Knorr-Held, L. and Rue, H. (2002). “On block updating in Markov random field models for disease mapping.” Scandinavian Journal of Statistics, 29(4): 597–614.
  • Kyung, M., Gill, J., Ghosh, M., and Casella, G. (2010). “Penalized Regression, Standard Errors, and Bayesian Lassos.” Bayesian Analysis, 5(2): 369–411.
  • Lang, S., Fronk, E.-M., and Fahrmeir, L. (2002). “Function estimation with locally adaptive dynamic models.” Computational Statistics, 17(4): 479–499.
  • Lindgren, F. and Rue, H. (2008). “On the Second-Order Random Walk Model for Irregular Locations.” Scandinavian Journal of Statistics, 35(4): 691–700.
  • Maguire, B. A., Pearson, E. S., and Wynn, A. H. A. (1952). “The Time Intervals Between Industrial Accidents.” Biometrika, 39(1/2): 168–180.
  • Mitchell, T. J. and Beauchamp, J. J. (1988). “Bayesian variable selection in linear regression.” Journal of the American Statistical Association, 83(404): 1023–1032.
  • Murray, I. and Adams, R. P. (2010). “Slice sampling covariance hyperparameters of latent Gaussian models.” In Advances in Neural Information Processing Systems, 1732–1740.
  • Murray, I., Adams, R. P., and MacKay, D. (2010). “Elliptical slice sampling.” In International Conference on Artificial Intelligence and Statistics, 541–548.
  • Neal, R. M. (2011). “MCMC Using Hamiltonian Dynamics.” In Brooks, S., Gelman, A., Jones, G. L., and Meng, X.-L. (eds.), Handbook of Markov Chain Monte Carlo, 113–162. CRC Press.
  • Neal, R. M. (1993). “Probabilistic inference using Markov chain Monte Carlo methods.” Technical Report CRG-TR-93-1, Department of Computer Science, University of Toronto.
  • Neal, R. M. (1998). “Regression and classification using Gaussian process priors.” Bayesian Statistics, 6: 475–501.
  • Paciorek, C. J. and Schervish, M. J. (2004). “Nonstationary covariance functions for Gaussian process regression.” Advances in Neural Information Processing Systems, 16: 273–280.
  • Paciorek, C. J. and Schervish, M. J. (2006). “Spatial modelling using a new class of nonstationary covariance functions.” Environmetrics, 17(5): 483–506.
  • Papaspiliopoulos, O., Roberts, G. O., and Sköld, M. (2003). “Non-centered parameterisations for hierarchical models and data augmentation.” In Bernardo, J., Bayarri, M., Berger, J., Dawid, A., Heckerman, D., Smith, A., and West, M. (eds.), Bayesian Statistics 7: Proceedings of the Seventh Valencia International Meeting, 307–326. Oxford University Press, USA.
  • Papaspiliopoulos, O., Roberts, G. O., and Sköld, M. (2007). “A general framework for the parametrization of hierarchical models.” Statistical Science, 22(1): 59–73.
  • Park, T. and Casella, G. (2008). “The Bayesian Lasso.” Journal of the American Statistical Association, 103(482): 681–686.
  • Polson, N. G. and Scott, J. G. (2010). “Shrink globally, act locally: sparse Bayesian regularization and prediction.” Bayesian Statistics, 9: 501–538.
  • Polson, N. G. and Scott, J. G. (2012a). “Local Shrinkage Rules, Lévy Processes and Regularized Regression.” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 74(2): 287–311.
  • Polson, N. G. and Scott, J. G. (2012b). “On the Half-Cauchy Prior for a Global Scale Parameter.” Bayesian Analysis, 7(4): 887–902.
  • R Core Team (2014). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. URL http://www.R-project.org.
  • Raftery, A. E. and Akman, V. E. (1986). “Bayesian analysis of a Poisson process with a change-point.” Biometrika, 73(1): 85–89.
  • Rasmussen, C. E. and Williams, C. K. I. (2006). Gaussian Processes for Machine Learning. MIT Press.
  • Reinsch, C. H. (1967). “Smoothing by spline functions.” Numerische Mathematik, 10(3): 177–183.
  • Reményi, N. and Vidakovic, B. (2015). “Wavelet Shrinkage with Double Weibull Prior.” Communications in Statistics – Simulation and Computation, 44(1): 88–104.
  • Roberts, G. O. and Stramer, O. (2002). “Langevin diffusions and Metropolis–Hastings algorithms.” Methodology and Computing in Applied Probability, 4(4): 337–357.
  • Roualdes, E. A. (2015). “Bayesian Trend Filtering.” arXiv:1505.07710.
  • Rue, H. (2001). “Fast sampling of Gaussian Markov random fields.” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 63(2): 325–338.
  • Rue, H. and Held, L. (2005). Gaussian Markov Random Fields: Theory and Applications. CRC Press.
  • Scheipl, F. and Kneib, T. (2009). “Locally Adaptive Bayesian P-splines with a Normal-Exponential-Gamma Prior.” Computational Statistics & Data Analysis, 53(10): 3533–3552.
  • Simpson, D. P., Martins, T. G., Riebler, A., Fuglstad, G.-A., Rue, H., and Sørbye, S. H. (2014). “Penalising model component complexity: A principled, practical approach to constructing priors.” arXiv:1403.4630.
  • Sørbye, S. H. and Rue, H. (2014). “Scaling intrinsic Gaussian Markov random field priors in spatial modelling.” Spatial Statistics, 8: 39–51.
  • Speckman, P. L. and Sun, D. (2003). “Fully Bayesian spline smoothing and intrinsic autoregressive priors.” Biometrika, 90(2): 289–302.
  • Stan Development Team (2015a). “RStan: the R interface to Stan, Version 2.6.2.” URL http://mc-stan.org/rstan.html.
  • Stan Development Team (2015b). Stan Modeling Language Users Guide and Reference Manual, Version 2.6.2. URL http://mc-stan.org/.
  • Teh, Y. W. and Rao, V. (2011). “Gaussian Process Modulated Renewal Processes.” In Advances in Neural Information Processing Systems, 2474–2482.
  • Tibshirani, R. (1996). “Regression shrinkage and selection via the lasso.” Journal of the Royal Statistical Society. Series B (Methodological), 58(1): 267–288.
  • Tibshirani, R. (2011). “Regression shrinkage and selection via the lasso: a retrospective.” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 73(3): 273–282.
  • Tibshirani, R. J. (2014). “Adaptive Piecewise Polynomial Estimation via Trend Filtering.” The Annals of Statistics, 42(1): 285–323.
  • Tibshirani, R. J. and Taylor, J. (2011). “The Solution Path of the Generalized Lasso.” The Annals of Statistics, 39(3): 1335–1371.
  • Wahba, G. (1975). “Smoothing noisy data with spline functions.” Numerische Mathematik, 24(5): 383–393.
  • West, M. (1987). “On scale mixtures of normal distributions.” Biometrika, 74(3): 646–648.
  • Whittaker, E. T. (1922). “On a new method of graduation.” Proceedings of the Edinburgh Mathematical Society, 41: 63–75.
  • Yue, Y. R., Simpson, D., Lindgren, F., and Rue, H. (2014). “Bayesian Adaptive Smoothing Splines Using Stochastic Differential Equations.” Bayesian Analysis, 9(2): 397–424.
  • Yue, Y. R., Speckman, P. L., and Sun, D. (2012). “Priors for Bayesian adaptive spline smoothing.” Annals of the Institute of Statistical Mathematics, 64(3): 577–613.
  • Zhu, B. and Dunson, D. B. (2013). “Locally adaptive Bayes nonparametric regression via nested Gaussian processes.” Journal of the American Statistical Association, 108(504): 1445–1456.

Supplemental materials

Faulkner, J. R. and Minin, V. N. (2017). “Supplementary Materials for ‘Locally Adaptive Smoothing with Markov Random Fields and Shrinkage Priors’.” Bayesian Analysis.