Difference in closeness function in R and manual computation


I have an undirected weighted graph for which I want to calculate the closeness measure. As per the igraph documentation, closeness is the reciprocal of the average shortest-path length. I compute the shortest paths and invert their average, but I still don't get the same values as the closeness function returns. Why is this happening? What am I missing?

Here's my code:

dput(c$estimate)
structure(c(1, 10000, 10000, 2.69857209553848, 5.77115055524614, 
1.95672007809809, 2.98690863617922, 1.92161847347611, 10000, 
10000, 10000, 10000, 1, 1.97201563662035, 5.4078452590091, 10000, 
6.8534542161595, 3.51453278996925, 10000, 10000, 2.08964950396744, 
10000, 10000, 1.97201563662034, 1, 2.78868220464485, 10000, 3.41857460835551, 
10000, 1.96044036389546, 10000, 10000, 10000, 2.69857209553835, 
5.40784525900909, 2.78868220464486, 1, 10000, 10000, 3.54317409176484, 
10000, 2.33889236077342, 10000, 10000, 5.77115055524604, 10000, 
10000, 10000, 1, 10000, 10000, 10000, 10000, 10000, 10000, 1.95672007809807, 
6.85345421615961, 3.41857460835555, 10000, 10000, 1, 10000, 10000, 
2.49075030691086, 10000, 10000, 2.98690863617922, 3.51453278996926, 
10000, 3.54317409176474, 10000, 10000, 1, 10000, 10000, 10000, 
1.73687483250751, 1.92161847347613, 10000, 1.96044036389548, 
10000, 10000, 10000, 10000, 1, 4.24032760636799, 3.11756167665886, 
5.07827243244947, 10000, 10000, 10000, 2.33889236077345, 10000, 
2.49075030691088, 10000, 4.24032760636804, 1, 10000, 1.69643890905686, 
10000, 2.08964950396742, 10000, 10000, 10000, 10000, 10000, 3.11756167665892, 
10000, 1, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 1.73687483250752, 
5.0782724324492, 1.69643890905687, 10000, 1), .Dim = c(11L, 11L
), .Dimnames = list(c("jpm", "gs", "ms", "bofa", "schwab", "brk", 
"wf", "citi", "amex", "spgl", "pnc"), c("jpm", "gs", "ms", "bofa", 
"schwab", "brk", "wf", "citi", "amex", "spgl", "pnc")))

library(igraph)
g <- graph_from_adjacency_matrix(c$estimate, weighted = "wt", mode = "undirected", diag = FALSE)

closeness(g,weights= round(E(g)$wt,2))
       jpm         gs         ms       bofa     schwab        brk         wf       citi 
0.02503756 0.01877229 0.02203614 0.02151463 0.01088495 0.02189621 0.02226180 0.02418380 
      amex       spgl        pnc 
0.01988072 0.01632387 0.01913509 

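# manual: reciprocal of the mean shortest-path length per vertex
# (shortest.paths() is the older igraph name for distances())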
a <- shortest.paths(g,weights=round(E(g)$wt,2))
1/rowMeans(a)
      jpm        gs        ms      bofa    schwab       brk        wf      citi      amex 
0.2799695 0.2143414 0.2435245 0.2457002 0.1205876 0.2408583 0.2448798 0.2660218 0.2276490 
     spgl       pnc 
0.1855914 0.2140078 

Solution

  • There are two things you need to be aware of:

    1. You should set normalized = TRUE in closeness. Without it, closeness returns the reciprocal of the sum of the shortest-path lengths rather than of their average.
    2. When you use shortest-path lengths to define closeness centrality, the average is taken over the distances to the other vertices, excluding the vertex itself. Thus the denominator for averaging is vcount(g)-1 instead of vcount(g), and that's why you shouldn't use rowMeans.

    From the code below, you can see that the results of the two methods are close to each other (the minor differences might come from precision, but I am not sure).

    > closeness(g,weights = E(g)$wt,normalized = TRUE)
          jpm        gs        ms      bofa    schwab       brk        wf      citi 
    0.2504451 0.1876864 0.2203154 0.2151935 0.1088503 0.2190827 0.2226391 0.2418350
         amex      spgl       pnc
    0.1988941 0.1632546 0.1914826
    
    > (vcount(g) - 1) / rowSums(shortest.paths(g, weights = E(g)$wt))
          jpm        gs        ms      bofa    schwab       brk        wf      citi
    0.2545725 0.1947856 0.2213624 0.2234093 0.1096228 0.2190827 0.2226391 0.2418350 
         amex      spgl       pnc
    0.2070431 0.1687258 0.1946688
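
The two points above can be checked on a toy example in base R, without igraph. This is only a sketch: the 3-vertex path graph and the names d, closeness_raw, closeness_norm and inverse_row_mean are made up for illustration and are not part of the original data. Per the igraph documentation, the first quantity is what closeness returns by default and the second is what it returns with normalized = TRUE.

```r
# Hypothetical 3-vertex path graph a - b - c with unit edge weights.
# d holds the shortest-path lengths between every pair of vertices.
d <- matrix(c(0, 1, 2,
              1, 0, 1,
              2, 1, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("a", "b", "c"), c("a", "b", "c")))
n <- nrow(d)

closeness_raw    <- 1 / rowSums(d)        # reciprocal of the summed distances
closeness_norm   <- (n - 1) / rowSums(d)  # reciprocal of the mean over the other n - 1 vertices
inverse_row_mean <- 1 / rowMeans(d)       # divides by n, so the zero self-distance is counted

print(rbind(closeness_raw, closeness_norm, inverse_row_mean))
```

Here inverse_row_mean is exactly n times closeness_raw, which is consistent with the roughly elevenfold gap between the two outputs in the question for the 11-vertex graph, and it exceeds closeness_norm by the factor n / (n - 1).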