Learning to Rank in Vector Spaces and Social Networks

Soumen Chakrabarti

doi:im/1243430609

Internet Math. 4(2-3): 267-298 (2007).

Abstract

We survey machine learning techniques for learning ranking functions, both for entities represented as feature vectors and for nodes in a social network. In the feature-vector scenario, an entity, e.g., a document $x$, is mapped to a feature vector $\psi(x)\in\mathbb{R}^d$ in a $d$-dimensional space, and we must search for a weight vector $\beta\in\mathbb{R}^d$. The ranking is then based on the values of $\beta\cdot\psi(x)$. This case corresponds to information retrieval in the "vector space" model. Training data consists of a partial order of preference among entities. We study probabilistic Bayesian and maximum-margin approaches to solving this problem, including recent efficient near-linear-time approximate algorithms. In the graph node-ranking scenario, we briefly review PageRank, generalize it to arbitrary Markov conductance matrices, and consider the problem of learning conductance parameters from partial orders between nodes. In another class of formulations, the graph does not establish PageRank-style prestige-flow relationships between nodes, but instead encourages smoothness between the scores (ranks) of neighboring nodes. Some of these techniques have been used by Web search companies that have very large query logs. We review some of the issues that arise when applying the theory to practical systems. Finally, we review connections between the stability of a score/rank-learning algorithm and its power to generalize to unseen test data.
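To make the feature-vector setting concrete, the following is a minimal sketch, not the survey's own algorithm, of learning $\beta$ from a partial order with a pairwise maximum-margin (RankSVM-style) objective. For every preference pair $(u, v)$, meaning $u$ should rank above $v$, it penalizes $\max(0, 1 - \beta\cdot(\psi(u) - \psi(v)))$ plus an $L_2$ regularizer $\frac{\lambda}{2}\|\beta\|^2$, and it minimizes this with plain stochastic subgradient descent. The function names `train_pairwise_hinge` and `rank`, the hyperparameters (`lam`, `lr`, `epochs`), and the toy data are hypothetical choices for illustration; the near-linear-time approximate algorithms the abstract mentions are not reproduced here.

```python
import random
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product beta . psi(x)."""
    return sum(x * y for x, y in zip(a, b))


def train_pairwise_hinge(
    features: Dict[str, Vector],      # entity id -> psi(x) in R^d
    prefs: List[Tuple[str, str]],     # (u, v): u should rank above v
    lam: float = 0.01,                # hypothetical L2 regularization weight
    lr: float = 0.1,                  # hypothetical step size
    epochs: int = 200,
    seed: int = 0,
) -> Vector:
    """Stochastic subgradient descent on
    lam/2 * ||beta||^2 + sum over (u, v) of max(0, 1 - beta . (psi(u) - psi(v)))."""
    d = len(next(iter(features.values())))
    beta = [0.0] * d
    rng = random.Random(seed)
    pairs = list(prefs)
    for _ in range(epochs):
        rng.shuffle(pairs)
        for u, v in pairs:
            diff = [a - b for a, b in zip(features[u], features[v])]
            # Subgradient of the regularizer.
            grad = [lam * w for w in beta]
            # Hinge term is active when the pair is misordered or inside the margin.
            if dot(beta, diff) < 1.0:
                grad = [g - dx for g, dx in zip(grad, diff)]
            beta = [w - lr * g for w, g in zip(beta, grad)]
    return beta


def rank(features: Dict[str, Vector], beta: Vector) -> List[str]:
    """Order entities by decreasing score beta . psi(x)."""
    return sorted(features, key=lambda k: dot(beta, features[k]), reverse=True)


if __name__ == "__main__":
    feats = {"a": [1.0, 0.0], "b": [0.5, 0.5], "c": [0.0, 1.0]}
    prefs = [("a", "b"), ("b", "c"), ("a", "c")]
    beta = train_pairwise_hinge(feats, prefs)
    print(beta, rank(feats, beta))
```

On this toy partial order ($a \succ b \succ c$), the learned $\beta$ puts positive weight on the first feature and negative weight on the second, so sorting by $\beta\cdot\psi(x)$ recovers the order `["a", "b", "c"]`. Each update touches one preference pair, so a pass costs time proportional to the number of pairs times $d$; this is the quadratic blow-up in pairs that more efficient approximate algorithms aim to avoid.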