Recently, the research community has devoted increased attention to reducing the computational time needed by web ranking algorithms. In particular, many techniques have been proposed to speed up the well-known PageRank algorithm used by Google. This interest is motivated by two dominant factors: (1) the web graph is huge and subject to dramatic updates in terms of nodes and links, so a PageRank assignment tends to become obsolete very quickly; (2) many PageRank vectors need to be computed, either for different choices of the personalization vector or when adopting collusion-detection strategies.
In this paper, we show how the PageRank computation in the original random surfer model can be transformed into the problem of solving a sparse linear system. The sparsity of the resulting system makes it possible to exploit Markov chain index reordering to speed up the PageRank computation. In particular, we rearrange the system matrix according to several permutations, and we apply different scalar and block iterative methods to solve the resulting smaller linear systems. We tested our approaches on web graphs crawled from the net; the largest contains about 24 million nodes and more than 100 million links. On this web graph, the cost of computing PageRank is reduced by 65% in terms of Mflops and by 92% in terms of time with respect to the commonly used power method.
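To make the linear-system view concrete, the following is a minimal sketch of computing PageRank by solving (I - alpha * P^T) x = (1 - alpha) * v with a sparse direct solve. It is an illustration of the general reformulation only, not the paper's permuted block solver, and it assumes every node has at least one out-link (no dangling-node correction):

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def pagerank_linear_system(A, alpha=0.85):
    # A: sparse adjacency matrix of the web graph (A[i, j] = 1 for a
    # link i -> j). Assumes every row has at least one nonzero entry,
    # i.e. no dangling nodes (this sketch does not handle them).
    n = A.shape[0]
    out_deg = np.asarray(A.sum(axis=1)).ravel()
    P = sp.diags(1.0 / out_deg) @ A           # row-stochastic transition matrix
    v = np.full(n, 1.0 / n)                   # uniform personalization vector
    M = sp.identity(n, format="csc") - alpha * P.T
    x = spla.spsolve(M, (1.0 - alpha) * v)    # solve (I - alpha*P^T) x = (1-alpha) v
    return x / x.sum()                        # normalize to a probability vector
```

Because M inherits the sparsity of the web graph, permuting its rows and columns, as the paper does, can expose structure that scalar and block iterative solvers exploit more cheaply than the power method.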
In this paper we study the size of generalised dominating sets in two graph processes that are widely used to model aspects of the World Wide Web. On the one hand, we show that graphs generated this way have fairly large dominating sets (i.e., linear in the size of the graph). On the other hand, we present efficient strategies to construct small dominating sets.
The algorithmic results represent an application of a particular analysis technique which can be used to characterise the asymptotic behaviour of a number of dynamic processes related to the web.
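As a point of reference for what a dominating-set construction looks like, below is a generic greedy heuristic on an arbitrary graph. It is a standard baseline, not the process-specific strategies analysed in the paper:

```python
import networkx as nx

def greedy_dominating_set(G):
    # Repeatedly add the vertex that dominates the largest number of
    # not-yet-dominated vertices (itself plus its neighbours).
    dominated, dom_set = set(), set()
    while len(dominated) < G.number_of_nodes():
        best = max(
            (v for v in G if v not in dom_set),
            key=lambda v: len(({v} | set(G[v])) - dominated),
        )
        dom_set.add(best)
        dominated |= {best} | set(G[best])
    return dom_set
```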
The link structure of the web is analyzed to measure the authority of pages, which can be taken into account for ranking query results. Due to the enormous dynamics of the web, with millions of pages created, updated, deleted, and linked to every day, temporal aspects of web pages and links are crucial factors for their evaluation. Users are interested in important pages (i.e., pages with high authority score) but are equally interested in the recency of information. Time, and thus the freshness of web content and link structure, emerges as a factor that should be taken into account in link analysis when computing the importance of a page. So far, only minor effort has been spent on integrating temporal aspects into link-analysis techniques. In this paper we introduce T-Rank Light and T-Rank, two link-analysis approaches that take into account the temporal aspects of freshness (i.e., timestamps of most recent updates) and activity (i.e., update rates) of pages and links. Experimental results show that T-Rank Light and T-Rank can produce better rankings of web pages.
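One simple way to fold freshness into a link-analysis computation, sketched below purely for illustration (the exact T-Rank weighting functions are defined in the paper; the half-life parameter here is a hypothetical choice), is to decay the weight of a page or link exponentially with the age of its most recent update:

```python
import time

def freshness_weight(last_update_ts, now=None, half_life_days=90.0):
    # Illustrative freshness kernel: halve the weight of a page or link
    # every `half_life_days` days since its most recent update.
    # The half-life is a hypothetical tuning knob, not taken from T-Rank.
    now = time.time() if now is None else now
    age_days = max(0.0, (now - last_update_ts) / 86_400)
    return 0.5 ** (age_days / half_life_days)
```

Weights of this kind could then scale the transition and teleportation probabilities in a PageRank-style iteration, so that fresh, active pages and links contribute more to the authority scores.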
Personalized PageRank expresses link-based page quality around user-selected pages, in a similar way as PageRank expresses quality over the entire web. Existing personalized PageRank algorithms can, however, serve online queries only for a restricted choice of pages. In this paper we achieve full personalization by a novel algorithm that precomputes a compact database; using this database, it can serve online responses for arbitrary user-selected personalization. The algorithm uses simulated random walks; we prove that for a fixed error probability the size of our database is linear in the number of web pages. We justify our estimation approach by asymptotic worst-case lower bounds: we show that on some sets of graphs, exact personalized PageRank values can only be obtained from a database of size quadratic in the number of vertices. Furthermore, we evaluate the precision of the approximation experimentally on the Stanford WebBase graph.
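The random-walk estimation idea admits a very short illustration. The sketch below is a simplified Monte Carlo estimator, not the paper's precomputed-database algorithm: each walk starts at the personalization page, continues along a random out-link with probability alpha, and stops otherwise, and the empirical distribution of walk endpoints approximates the personalized PageRank vector:

```python
import random
from collections import Counter

def approx_personalized_pagerank(adj, seed, alpha=0.85, num_walks=10_000):
    # adj: dict mapping each node to a non-empty list of out-neighbours.
    # The endpoint of a walk whose length is geometrically distributed
    # with parameter 1 - alpha is a sample from the PPR vector of `seed`.
    endpoint_counts = Counter()
    for _ in range(num_walks):
        node = seed
        while random.random() < alpha:
            node = random.choice(adj[node])
        endpoint_counts[node] += 1
    return {u: c / num_walks for u, c in endpoint_counts.items()}
```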
The small-world phenomenon includes both small average distance and the clustering effect. Randomly generated graphs with a power law degree distribution are widely used to model large real-world networks, but while these graphs have small average distance, they generally do not exhibit the clustering effect. We introduce an improved hybrid model that combines a global graph (a random power law graph) with a local graph (a graph with high local connectivity defined by network flow). We present an efficient algorithm that extracts a local graph from a given realistic network. We show that the underlying local graph is robust in the sense that when our extraction algorithm is applied to a hybrid graph, it recovers the original local graph with a small error. The proof involves a probabilistic analysis of the growth of neighborhoods in the hybrid graph model.
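For intuition about flow-defined local connectivity, edges can be tested one at a time. The sketch below is a naive illustration, not the authors' efficient extraction algorithm: an edge counts as "local" if its endpoints remain joined by at least k edge-disjoint paths once the edge itself is removed (the threshold k is an assumption of this sketch):

```python
import networkx as nx

def naive_local_edges(G, k=2):
    # Keep an edge (u, v) if u and v are still connected by at least k
    # edge-disjoint paths after removing the edge, i.e. the edge sits
    # inside a well-connected neighbourhood. One max-flow computation
    # per edge, so this is for intuition only, not web-scale graphs.
    local = []
    for u, v in list(G.edges()):
        G.remove_edge(u, v)
        if nx.edge_connectivity(G, u, v) >= k:
            local.append((u, v))
        G.add_edge(u, v)
    return local
```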
Deciding which kind of visiting strategy accumulates high-quality pages more quickly is one of the most often debated issues in the design of web crawlers.
This paper proposes a related, and previously overlooked, measure of effectiveness for crawl strategies: whether the graph obtained after a partial visit is in some sense representative of the underlying web graph as far as the computation of PageRank is concerned. More precisely, we are interested in determining how rapidly the computation of PageRank over the visited subgraph yields node orders that agree with the ones computed in the complete graph; orders are compared using Kendall’s τ.
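Concretely, the agreement between two PageRank-induced orders can be measured as in the sketch below, which uses scipy's Kendall's τ implementation on the scores of the nodes present in both graphs (function and variable names are illustrative, not from the paper):

```python
from scipy.stats import kendalltau

def rank_agreement(pr_partial, pr_full):
    # pr_partial / pr_full: dicts mapping node -> PageRank score, for the
    # partially crawled graph and the complete graph respectively.
    # Tau is 1 for identical orders and -1 for completely reversed orders.
    common = sorted(set(pr_partial) & set(pr_full))
    tau, _p_value = kendalltau(
        [pr_partial[u] for u in common],
        [pr_full[u] for u in common],
    )
    return tau
```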
We describe a number of large-scale experiments that show the following paradoxical effect: visits that gather PageRank more quickly (e.g., highest-quality first) are also those that tend to miscalculate PageRank. Finally, we perform the same kind of experimental analysis on some synthetic random graphs generated using well-known web-graph models: the results are almost the opposite of those obtained on real web graphs.