Search code examples
matrixtransitionprobabilityinformation-retrievalpagerank

Information Retrieval - Adjancey Matrix Graph Sketch, Teleportation Probability, Calculate PageRank


I am doing a few thing on Information Retrieval and have an exam coming up and I am absolutely clueless. First of, could anyone recommend me the shortest and best description possible for what PageRank actually is in Information Retrieval? Maybe even a good short video or your own description. I know Google use to, or did use it.

I know there are a lot of questions here but I could use as MUCH help as possible in a short length of time.

So my first question (taken from past papers, and making my own examples):

I am wanting to take a table such as:

    A   B   C
A   0   1   0
B   1   0   1
C   0   0   0

And create a graph. I believe this is correct but unsure (I could use a "yes that is correct" or a "no": enter image description here

And if I was given a graph such as: enter image description here

The table would be:

    A   B   C
A   0   1   0
B   0   0   1
C   0   0   0

Is that correct? If not, could I please get help and get it described somewhere? The lecture I am reading is not great at explaining and my lecturer isn't great at helping either.

Next I will probably be asked to use Teleportation Probability on the first table. This I desperately need help in. If the probability(the special a symbol)=1/2, does this mean multiply everything, including the 0's in the table such as 0x1/2? also 1x1/2? This is for the matrix of transition probabilities.

Next would be, how can I calculate PageRank from the above matrix. Using matrix multiplication. In words or in Pseudocode.

Another question I want to know is, will a user's page rank on twitter increase if they follow another user? I was assuming this would be a no because they are not following the user back?

Does a user's pagerank depend on how frequently you find said user if you start at a random user and click on another random persona and such till you find them? I assume this one is definitely not true. Because they might not be following said user.

I know this is a lot to ask. Does anyone have tutorials I can follow for either that are not complicated and I can look at and get it mastered today?

Thanks I really appreciate all your help. I know not one person can answer them all but can help provide assistance for some.


Solution

  • here's my stab at answering your questions:

    good learning resource: http://en.wikipedia.org/wiki/PageRank#Simplified_algorithm (no doubt you've see it already, but it's a pretty good one). Start there, understand the algorithm first, then do the implementation.

    this might be a good simple method to implement? http://pr.efactory.de/e-pagerank-algorithm.shtml

    or this: http://www.cs.princeton.edu/~chazelle/courses/BIB/pagerank.htm

    I'm guessing you can program in Python (common school language), in that case you might be interested in a package for handling graphs which has pagerank calculations: http://networkx.lanl.gov/reference/generated/networkx.algorithms.link_analysis.pagerank_alg.pagerank.html. If you have to write your own pagerank algorithm (very doable), you could use that to check the results.

    For the matrix -> graph conversion question: your professor needs to specify how directionality is encoded in the matrix. Does a 1 at B,C specify a link from B to C or from C to B? My guess would be B to C. If that's true, your first graph is wrong there, but the second graph is ok. Directionality is very important in PageRank.

    I believe the Teleportation probability is the probability that a random walker executing a new step will jump to a random node in the graph. It's in the wikipedia page under "damping factor". I don't know how it ties into multiplying numbers in your matrix.

    For the Twitter question - yes, I think you have it right. Linking to (or presumably following) a second person does nothing directly to the the first person's pagerank, but it likely increases the second person's pagerank. In practice, there could be secondary effects, like the second person noticing that the first person is interesting and following them back.

    second to last question - yes, one formulation of the pagerank algorithm is as a random walk along links with the frequency of encountering a node (page) going into the pagerank.

    good luck!