The Annals of Applied Statistics

Bayesian multinomial regression with class-specific predictor selection

Paul Gustafson and Geneviève Lefebvre



Consider a multinomial regression model where the response, which indicates a unit’s membership in one of several possible unordered classes, is associated with a set of predictor variables. Such models typically involve a matrix of regression coefficients, with the (j, k) element of this matrix modulating the effect of the kth predictor on the propensity of the unit to belong to the jth class. Thus, the supposition that only a subset of the available predictors is associated with the response corresponds to some columns of the coefficient matrix being zero. Under the Bayesian paradigm, the subset of predictors associated with the response can be treated as an unknown parameter, leading to standard Bayesian model selection and model averaging procedures. As an alternative, we investigate model selection and averaging in which individual elements of the coefficient matrix, rather than only whole columns, can be zero. That is, the subset of predictors associated with the propensity to belong to a class can vary with the class. We refer to this as class-specific predictor selection. We argue that such a scheme can be attractive on both conceptual and computational grounds.
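To make the distinction between column-wise and class-specific selection concrete, here is a small sketch that computes class probabilities for one unit under a 0/1 inclusion mask. It assumes a multinomial logit link with class 0 as a baseline (so the coefficient matrix has J - 1 rows) and no intercept terms. The paper's own link function and parameterization may differ, and the names `class_probabilities`, `is_columnwise` and `n_models` are hypothetical. The mask `gamma` marks which coefficients are included. Column-wise selection corresponds to masks whose columns are all 0 or all 1, while class-specific selection allows any pattern.

```python
import math


def class_probabilities(x, beta, gamma):
    """Class probabilities for one unit under a multinomial logit sketch.

    Assumptions (not taken from the paper): logit link, class 0 is a baseline
    whose linear predictor is fixed at 0, and no intercept terms.
      x     : p predictor values
      beta  : (J-1) x p coefficients; beta[j][k] is the effect of predictor k
              on the log-odds of class j+1 versus the baseline class
      gamma : (J-1) x p 0/1 indicators; gamma[j][k] == 0 forces beta[j][k] to 0
    """
    # Baseline class gets linear predictor 0; other classes use only included terms.
    eta = [0.0] + [
        sum(g * b * xk for g, b, xk in zip(g_row, b_row, x))
        for g_row, b_row in zip(gamma, beta)
    ]
    # Softmax, subtracting the maximum for numerical stability.
    m = max(eta)
    w = [math.exp(e - m) for e in eta]
    total = sum(w)
    return [wi / total for wi in w]


def is_columnwise(gamma):
    """True if each predictor is either in for every class or out for every class."""
    return all(len(set(col)) == 1 for col in zip(*gamma))


def n_models(n_classes, n_predictors, class_specific):
    """Size of the model space under the baseline parameterization assumed above."""
    if class_specific:
        # Each of the (J-1) x p coefficient elements can be zero or not.
        return 2 ** ((n_classes - 1) * n_predictors)
    # Column-wise selection: each predictor is in or out for all classes at once.
    return 2 ** n_predictors


if __name__ == "__main__":
    x = [1.0, 2.0]
    beta = [[0.5, -1.0],  # class 1 versus baseline class 0
            [1.5, 0.3]]   # class 2 versus baseline class 0
    col_gamma = [[1, 0], [1, 0]]  # predictor 2 dropped for every class (a zero column)
    cls_gamma = [[1, 0], [0, 1]]  # predictor 1 used for class 1 only, predictor 2 for class 2 only
    print(class_probabilities(x, beta, col_gamma), is_columnwise(col_gamma))
    print(class_probabilities(x, beta, cls_gamma), is_columnwise(cls_gamma))
    print(n_models(3, 2, False), n_models(3, 2, True))  # 4 16
```

Every column-wise mask is also a valid class-specific mask, so under these assumptions the class-specific model space contains the column-wise one and is much larger: 2^((J-1)p) models versus 2^p, that is, 16 versus 4 for J = 3 classes and p = 2 predictors.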

Article information

Ann. Appl. Stat., Volume 2, Number 4 (2008), 1478-1502.

First available in Project Euclid: 8 January 2009

Digital Object Identifier: doi:10.1214/08-AOAS188

Keywords: Bayesian model averaging, classification, Markov chain Monte Carlo, multinomial models


Gustafson, Paul; Lefebvre, Geneviève. Bayesian multinomial regression with class-specific predictor selection. Ann. Appl. Stat. 2 (2008), no. 4, 1478–1502. doi:10.1214/08-AOAS188.

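Supplemental materials: Lefebvre, G. and Gustafson, P. (2008). Supplement to “Bayesian multinomial regression with class-specific predictor selection.” DOI: 10.1214/08-AOAS188SUPP.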