r network-programming igraph pagerank power-law

Fitting pagerank results to a power law distribution


I have calculated PageRank values for a hyperlink network of websites (about 1000 nodes). I did this in R using the igraph package.

I would now like to take the top 10 PageRank values and visualise these 10 websites against a power law graph, to give an idea of where they sit in the distribution.

How would I go about taking these results and plotting them against a power law graph (e.g. to illustrate which sites are further down the long tail)?

I am just trying to figure out a general formula or technique.

The values are as follows:

0.0810
0.0330
0.0318
0.0186
0.0161
0.0160
0.0158
0.0149
0.0136
0.0133

Solution

  • I would plot the density of the connectivity and overlay the top 10 points on that plot.

    Assuming you have the connectivity of all nodes already:

    d <- density(connectivity)
    top10 <- sort(connectivity, decreasing=TRUE)[1:10]
    
    # get the height of the density for each of the top10 nodes:
    top10y <- sapply(top10, function(node) {
      diffs <- abs(node - d$x)
      yloc <- which.min(diffs) # index of the closest grid point (first one if tied)
      d$y[yloc]
    })
    
    # now plot
    plot(d)
    points(top10, top10y, col="red")
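
The snippet above assumes a numeric vector `connectivity` already exists. As a minimal, self-contained sketch for trying it out, the following simulates 1000 values from a normal distribution. The seed, mean, and standard deviation are arbitrary assumptions (not from the answer), and `approx()` is swapped in for the nearest-grid-point lookup purely as an alternative way to read heights off the density curve.

```r
set.seed(42)                                     # arbitrary seed, for reproducibility only
connectivity <- rnorm(1000, mean = 50, sd = 10)  # assumed parameters, not from the answer

d <- density(connectivity)
top10 <- sort(connectivity, decreasing = TRUE)[1:10]

# Linearly interpolate the density curve at the top-10 positions
top10y <- approx(d$x, d$y, xout = top10)$y

plot(d, main = "Density of simulated connectivity")
points(top10, top10y, col = "red", pch = 19)
```

With real data, `connectivity` would simply be replaced by the vector of per-node values.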
    

    For example, I've simulated the connectivity of 1000 nodes to follow a normal distribution:

    [Image: density plot of the simulated connectivity with the top 10 points overlaid in red]
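
Since the question asks about a power law specifically, a common complement to the density overlay is a rank plot on log-log axes, where a power-law relationship shows up as a roughly straight line. The sketch below uses only the ten values listed in the question, so the fitted slope is purely illustrative. With the full vector of about 1000 PageRank values (for instance `igraph::page_rank(g)$vector`, where `g` is assumed to be the graph object) the same code should apply. The least-squares fit on logged values is a simple heuristic, not a rigorous power-law fit.

```r
# Top-10 PageRank values copied from the question
pr <- c(0.0810, 0.0330, 0.0318, 0.0186, 0.0161,
        0.0160, 0.0158, 0.0149, 0.0136, 0.0133)

pr_sorted <- sort(pr, decreasing = TRUE)
rk <- seq_along(pr_sorted)               # rank 1 = highest PageRank

# Straight-line fit in log-log space: log(value) = a + b * log(rank)
fit <- lm(log(pr_sorted) ~ log(rk))
slope <- unname(coef(fit)[2])            # estimated exponent b

plot(rk, pr_sorted, log = "xy", pch = 19, col = "red",
     xlab = "Rank", ylab = "PageRank")
# Overlay the fitted curve: value = exp(a) * rank^b
lines(rk, exp(predict(fit)), lty = 2)
```

When all nodes are plotted this way, the top 10 sit at the left end of the curve and the long tail extends to the right. Colouring the first ten points differently would mark them against the rest.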