I started to read those papers by Domingos to follow their idea on Markov logic. Now I have already read a few pages from their tutorials and wish to know more about their past research background.
This paper talks about ranking, which I am not quite familiar with. This paper introduces two previous methods, i.e. HITS and PageRank, which I'd like to scan later. The HITS model is little bit complicated and can't be served online (they have to compute hubs and authorities at query time) while PageRank can utilize the PageRank at offline stage (i.e. after the crawling, they may compute the PageRank using incoming links and outgoing links, which later serves as a factor contributing to the final ranking function).
The graph PageRank builds consists of links:
- using the outgoing links and incoming links;
- if a webpage has no outgoing links, it links to all other pages;
\displaystyle P(j) = \frac{1 - \beta}{N} + \beta \sum_{i \in B(j)} \frac{P(i)}{|F_i|}where
B(j)
contains all pages link to page j
and F(i)
contains the pages linked from page i
.This paper's main idea is to incorporate the query information into the PageRank, which, you may have already seen from tabove, contains nothing about the query
q
. LetP_q(j) = (1 - \beta) P_q'(j) + \beta \sum_{i \in B_j} P_q(i) P_q(i \to j), where those query related terms are all derived from query relevance scores, e.g.
P_q'(j) = R_q(i) / \sum_{i \in W}R_q(i), \qquad P_q(i \to j) = R_q(i) / \sum_{j \in F_i} R_q(j).
To overcome the computation problem, the author suggests we pre-compute the ranking score for the search queries offline.
No comments:
Post a Comment