Bayesian Analysis

A Fully Nonparametric Modeling Approach to Binary Regression

Maria DeYoreo and Athanasios Kottas

Full-text: Open access

Abstract

We propose a general nonparametric Bayesian framework for binary regression, built on modeling the joint response–covariate distribution. The observed binary responses are assumed to arise from underlying continuous random variables through discretization, and we model the joint distribution of these latent responses and the covariates using a Dirichlet process mixture of multivariate normals. We show that the kernel of the induced mixture model for the observed data is identifiable under a restriction on the latent variables. To allow for appropriate dependence structure while facilitating identifiability, we use a square-root-free Cholesky decomposition of the covariance matrix in the normal mixture kernel. In addition to accommodating the necessary restriction, this modeling strategy substantially simplifies the implementation of Markov chain Monte Carlo posterior simulation. We present two data examples from areas for which the methodology is especially well suited: the first involves estimation of relationships between environmental variables, and the second develops inference for natural selection surfaces in evolutionary biology. Finally, we discuss extensions to regression settings with ordinal responses.
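The construction in the abstract can be illustrated with a small simulation. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. It uses a single covariate x and approximates the Dirichlet process by truncated stick-breaking with a hypothetical truncation level. Each bivariate normal kernel is ordered as (x, z), so that the square-root-free Cholesky factorization Σ = L D Lᵀ, with L unit lower triangular and D = diag(δ_x, 1), fixes Var(z | x) = 1 within every component. That unit entry is one natural way to pin down the latent scale; the restriction and variable ordering used in the paper may differ. With Y = 1 when Z > 0, the implied regression function is a covariate-weighted mixture of probit curves, Pr(Y = 1 | x) = Σ_l w_l(x) Φ(m_l(x)), where w_l(x) ∝ p_l N(x; μ_x,l, δ_x,l) and m_l(x) is the conditional mean of z given x in component l. The function names and the base distribution used in the demo are hypothetical.

```python
import math
import random


def norm_cdf(t):
    """Standard normal CDF, written with math.erf."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def log_norm_pdf(t, mean, var):
    """Log density of N(mean, var) evaluated at t."""
    return -0.5 * ((t - mean) ** 2 / var + math.log(2.0 * math.pi * var))


def stick_breaking_weights(alpha, n_atoms, rng):
    """Truncated stick-breaking weights approximating a DP(alpha, G0).

    v_l ~ Beta(1, alpha) for the first n_atoms - 1 atoms; the last atom
    receives the leftover stick so the weights sum to one.
    """
    weights, remaining = [], 1.0
    for _ in range(n_atoms - 1):
        v = rng.betavariate(1.0, alpha)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.append(remaining)
    return weights


def component_cov(delta_x, slope):
    """Covariance of (x, z) from Sigma = L D L^T with
    L = [[1, 0], [slope, 1]] and D = diag(delta_x, 1).

    The unit entry of D is the assumed restriction Var(z | x) = 1.
    """
    return [[delta_x, slope * delta_x],
            [slope * delta_x, slope ** 2 * delta_x + 1.0]]


def prob_y1_given_x(x, weights, components):
    """Pr(Y = 1 | x) when Y = 1{Z > 0} and (x, z) follows a finite mixture
    of bivariate normals; each component is (mu_x, mu_z, Sigma)."""
    # Log of the unnormalized covariate-dependent weights p_l * N(x; mu_x, Sigma_xx).
    logs = [-math.inf if p == 0.0 else math.log(p) + log_norm_pdf(x, mx, s[0][0])
            for p, (mx, _, s) in zip(weights, components)]
    top = max(logs)
    raw = [math.exp(v - top) for v in logs]  # rescale to avoid underflow
    total = sum(raw)
    prob = 0.0
    for r, (mx, mz, s) in zip(raw, components):
        # Conditional distribution of z given x within this component.
        cond_mean = mz + s[1][0] / s[0][0] * (x - mx)
        cond_var = s[1][1] - s[1][0] ** 2 / s[0][0]
        prob += (r / total) * norm_cdf(cond_mean / math.sqrt(cond_var))
    return prob


if __name__ == "__main__":
    rng = random.Random(1)
    weights = stick_breaking_weights(alpha=1.0, n_atoms=20, rng=rng)
    components = []
    for _ in weights:
        # Hypothetical base distribution G0, chosen only for illustration.
        components.append((rng.gauss(0.0, 2.0), rng.gauss(0.0, 1.0),
                           component_cov(rng.gammavariate(2.0, 1.0), rng.gauss(0.0, 1.0))))
    for x in (-2.0, 0.0, 2.0):
        print(x, round(prob_y1_given_x(x, weights, components), 3))
```

Because each component contributes a probit curve with its own intercept and slope, and the weights w_l(x) shift with x, the mixture can produce non-monotonic response curves that a single probit or logistic link cannot.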

Article information

Source
Bayesian Anal. Volume 10, Number 4 (2015), 821-847.

Dates
First available in Project Euclid: 17 July 2015

Permanent link to this document
https://projecteuclid.org/euclid.ba/1437137636

Digital Object Identifier
doi:10.1214/15-BA963SI

Mathematical Reviews number (MathSciNet)
MR3432241

Zentralblatt MATH identifier
1334.62035

Keywords
Bayesian nonparametrics; Dirichlet process mixture model; identifiability; Markov chain Monte Carlo; ordinal regression

Citation

DeYoreo, Maria; Kottas, Athanasios. A Fully Nonparametric Modeling Approach to Binary Regression. Bayesian Anal. 10 (2015), no. 4, 821–847. doi:10.1214/15-BA963SI. https://projecteuclid.org/euclid.ba/1437137636.

