## Bayesian Analysis

### Matrix-Variate Dirichlet Process Priors with Applications

#### Abstract

We propose a matrix-variate Dirichlet process (MATDP) for modeling the joint prior of a set of random matrices. Because of the clustering property of the Dirichlet process, the approach shares statistical strength among regression coefficient matrices. Moreover, because the base probability measure is a matrix-variate distribution, that distribution describes the dependence among the elements of each random matrix. We apply MATDP to multivariate supervised learning problems, devising a nonparametric discriminative model and a nonparametric latent factor model. The aim is to account for correlations both across response variables (or covariates) and across response vectors. We derive Markov chain Monte Carlo algorithms for posterior inference and prediction, and illustrate the models on multivariate regression, multi-class classification, and multi-label prediction problems.
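The clustering property mentioned in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the paper's algorithm: it assumes a matrix-normal base measure and draws matrices through the standard Pólya urn scheme for a Dirichlet process. The function names, the concentration parameter `alpha`, and the covariance arguments `U` and `V` are hypothetical choices made for illustration; the abstract does not specify the base measure or its hyperparameters.

```python
import numpy as np

def sample_matrix_normal(rng, M, U, V):
    """Draw X ~ MN(M, U, V) with row covariance U (p x p) and column covariance V (q x q)."""
    A = np.linalg.cholesky(U)          # U = A A^T
    B = np.linalg.cholesky(V)          # V = B B^T
    Z = rng.standard_normal(M.shape)   # i.i.d. standard normal entries
    return M + A @ Z @ B.T

def polya_urn_matrices(rng, n, alpha, M, U, V):
    """Draw n matrices from a DP(alpha, MN(M, U, V)) prior via the Polya urn scheme.

    Returns the list of distinct atoms and, for each draw, the index of its atom.
    """
    atoms, labels = [], []
    for i in range(n):
        # With probability alpha / (alpha + i), draw a fresh atom from the base measure.
        if rng.random() < alpha / (alpha + i):
            atoms.append(sample_matrix_normal(rng, M, U, V))
            labels.append(len(atoms) - 1)
        else:
            # Otherwise copy a uniformly chosen earlier draw, so an existing
            # cluster is picked with probability proportional to its size.
            labels.append(labels[rng.integers(i)])
    return atoms, np.array(labels)
```

Calling `polya_urn_matrices(np.random.default_rng(0), 50, 1.0, np.zeros((3, 2)), np.eye(3), np.eye(2))` would yield 50 coefficient matrices of shape 3 x 2 that typically share far fewer than 50 distinct values. Draws that share an atom share the same matrix, which is the sense in which a Dirichlet process prior lets related regression tasks borrow strength from one another.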

#### Article information

Source
Bayesian Anal., Volume 9, Number 2 (2014), 259-286.

Dates
First available in Project Euclid: 26 May 2014

Permanent link to this document
https://projecteuclid.org/euclid.ba/1401148309

Digital Object Identifier
doi:10.1214/13-BA853

Mathematical Reviews number (MathSciNet)
MR3216996

Zentralblatt MATH identifier
1327.62175

#### Citation

Zhang, Zhihua; Wang, Dakan; Dai, Guang; Jordan, Michael I. Matrix-Variate Dirichlet Process Priors with Applications. Bayesian Anal. 9 (2014), no. 2, 259–286. doi:10.1214/13-BA853. https://projecteuclid.org/euclid.ba/1401148309
