## Bayesian Analysis

### Improving classification when a class hierarchy is available using a hierarchy-based prior

#### Abstract

We introduce a new method for building classification models when we have prior knowledge of how the classes can be arranged in a hierarchy, based on how easily they can be distinguished. The new method uses a Bayesian form of the multinomial logit (MNL, a.k.a. "softmax") model, with a prior that introduces correlations between the parameters for classes that are nearby in the tree. We compare the performance on simulated data of the new method, the ordinary MNL model, and a model that uses the hierarchy in a different way. We also test the new method on page layout analysis and document classification problems, and find that it performs better than the other methods.
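The abstract does not spell out how the prior is built, so the following is only a minimal sketch of one way a tree can induce correlated MNL parameters. It assumes that each class's coefficient vector is the sum of independent Gaussian vectors attached to the branches on its path from the root. The four-class tree and the names `PATHS`, `BRANCHES`, `branch_sd`, and the helper functions are all hypothetical and chosen for illustration.

```python
import numpy as np

# Hypothetical 4-class hierarchy (an assumption for illustration):
# root -> {A, B}; A -> {class 0, class 1}; B -> {class 2, class 3}.
# Each class is described by the branches on its path from the root.
PATHS = {
    0: ["A", "A1"],
    1: ["A", "A2"],
    2: ["B", "B3"],
    3: ["B", "B4"],
}
BRANCHES = ["A", "B", "A1", "A2", "B3", "B4"]


def sample_class_coefficients(n_features, branch_sd, rng):
    """Draw one set of class coefficient vectors from the tree-structured prior."""
    # One independent zero-mean Gaussian vector per branch of the tree.
    phi = {b: rng.normal(0.0, branch_sd[b], size=n_features) for b in BRANCHES}
    # A class's coefficients are the sum of the branch vectors on its path,
    # so classes that share a branch share a common random component.
    return np.stack([sum(phi[b] for b in PATHS[c]) for c in sorted(PATHS)])


def softmax_probs(x, beta):
    """MNL (softmax) class probabilities for a single feature vector x."""
    scores = beta @ x
    scores = scores - scores.max()  # subtract the max for numerical stability
    expd = np.exp(scores)
    return expd / expd.sum()


def prior_covariance(branch_sd):
    """Implied prior covariance between classes for any single coefficient."""
    n = len(PATHS)
    cov = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            # Only branches common to both paths contribute shared variance.
            shared = set(PATHS[i]) & set(PATHS[j])
            cov[i, j] = sum(branch_sd[b] ** 2 for b in shared)
    return cov
```

With every branch standard deviation set to 1, `prior_covariance` gives a variance of 2 for each class, a covariance of 1 between sibling classes, and 0 between classes in different subtrees, which is the sense in which classes that are nearby in the tree have correlated parameters. A full Bayesian treatment would also place priors on the branch scales and sample from the posterior, which this sketch leaves out.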

#### Article information

Source
Bayesian Anal. Volume 2, Number 1 (2007), 221-237.

Dates
First available in Project Euclid: 22 June 2012

Permanent link to this document
http://projecteuclid.org/euclid.ba/1340390069

Digital Object Identifier
doi:10.1214/07-BA209

Mathematical Reviews number (MathSciNet)
MR2289929

Zentralblatt MATH identifier
1331.62316

#### Citation

Shahbaba, Babak; Neal, Radford M. Improving classification when a class hierarchy is available using a hierarchy-based prior. Bayesian Anal. 2 (2007), no. 1, 221–237. doi:10.1214/07-BA209. http://projecteuclid.org/euclid.ba/1340390069.
