Statistical Science

Variable Selection for Nonparametric Gaussian Process Priors: Models and Computational Strategies

Terrance Savitsky, Marina Vannucci, and Naijun Sha

Full-text: Open access


This paper presents a unified treatment of Gaussian process models that extends to data from the exponential dispersion family and to survival data. Our specific interest is in the analysis of data sets with predictors whose association with the response has an a priori unknown, possibly nonlinear, form. The modeling approach we describe incorporates Gaussian processes in a generalized linear model framework to obtain a class of nonparametric regression models where the covariance matrix depends on the predictors. We consider, in particular, continuous, categorical and count responses, as well as models that account for survival outcomes. We explore alternative covariance formulations for the Gaussian process prior and demonstrate the flexibility of the construction. Next, we focus on the important problem of selecting variables from the set of possible predictors and describe a general framework that employs mixture priors. We compare alternative MCMC strategies for posterior inference and arrive at a computationally efficient and practical approach. We demonstrate performance on simulated and benchmark data sets.
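As a rough illustration of a covariance matrix that depends on the predictors and of variable selection through a mixture prior, the sketch below may help. It is not taken from the abstract: the exponential covariance form, the per-predictor parameters `rho`, the inclusion indicators `gamma`, the precision-like scalars `lam_a` and `lam_z`, and both function names are assumptions chosen for illustration. Under these assumptions, setting `rho[k] = 1` gives predictor `k` zero weight, so it drops out of the covariance.

```python
import numpy as np


def gp_covariance(X, rho, lam_a=1.0, lam_z=1.0):
    """Covariance matrix whose entries depend on the predictors X (n x p).

    Assumed form (illustrative): C = J / lam_a + exp(-G) / lam_z, where
    G[i, j] = sum_k -log(rho[k]) * (X[i, k] - X[j, k])**2 and J is all ones.
    rho[k] lies in (0, 1]; rho[k] == 1 gives weight 0, removing predictor k.
    """
    X = np.asarray(X, dtype=float)
    rho = np.asarray(rho, dtype=float)
    weights = -np.log(rho)                      # nonnegative per-predictor weights
    diff = X[:, None, :] - X[None, :, :]        # pairwise differences, n x n x p
    G = np.einsum("ijk,k->ij", diff ** 2, weights)
    n = X.shape[0]
    return np.ones((n, n)) / lam_a + np.exp(-G) / lam_z


def sample_rho(gamma, rng):
    """Draw rho from an assumed two-component mixture prior.

    gamma[k] == 1: rho[k] ~ Uniform(0, 1)   (predictor included)
    gamma[k] == 0: rho[k] = 1 exactly       (predictor excluded)
    """
    gamma = np.asarray(gamma, dtype=bool)
    u = rng.uniform(low=1e-12, high=1.0, size=gamma.shape)
    return np.where(gamma, u, 1.0)


rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
gamma = np.array([1, 0, 1])      # hypothetical inclusion pattern: predictor 1 excluded
rho = sample_rho(gamma, rng)
C = gp_covariance(X, rho)
```

In this sketch, changing the values of an excluded predictor leaves `C` unchanged, which is the sense in which the mixture prior performs selection. An MCMC scheme would update `gamma` and `rho` jointly; that step is omitted here.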

Article information

Statist. Sci., Volume 26, Number 1 (2011), 130-149.

First available in Project Euclid: 9 June 2011

Keywords: Bayesian variable selection; generalized linear models; Gaussian processes; latent variables; MCMC; nonparametric regression; survival data


Savitsky, Terrance; Vannucci, Marina; Sha, Naijun. Variable Selection for Nonparametric Gaussian Process Priors: Models and Computational Strategies. Statist. Sci. 26 (2011), no. 1, 130–149. doi:10.1214/11-STS354.


