
How to compute closeness centrality measure on a network with disconnected components in R?


I want to compute the closeness centrality measure on a network with disconnected components. The closeness function in igraph does not give meaningful results on such graphs.

Then I came across this site, which explains that closeness can also be measured on graphs with disconnected components.

The site suggests the following code to achieve this:

# Load tnet
library(tnet)
 
# Load network 
# Node K is assigned node id 8 instead of 10 as isolates at the end of id sequences are not recorded in edgelists
net <- cbind(
  i=c(1,1,2,2,2,3,3,3,4,4,4,5,5,6,6,7,9,10,10,11),
  j=c(2,3,1,3,5,1,2,4,3,6,7,2,6,4,5,4,10,9,11,10),
  w=c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1))
 
# Calculate measures
closeness_w(net, gconly=FALSE)
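For reference, this is how I read the gconly=FALSE variant, based on the tnet documentation and on the numbers in the answer below: instead of inverting a sum of distances, it sums inverse distances. A node that cannot be reached therefore contributes 0 rather than making the score undefined. Here d(i,j) is the weighted shortest-path distance from i to j, and N is the number of nodes in the output (n.closeness is closeness divided by N - 1):

```latex
$$
C(i) = \sum_{j \neq i} \frac{1}{d(i,j)}, \qquad \frac{1}{\infty} := 0, \qquad
C_{\mathrm{n}}(i) = \frac{C(i)}{N - 1}
$$
```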

In my case, I have transaction data, so the network I build from it is directed and weighted. The weights are 1/(transaction amount).

This is my example data:

df <- structure(list(
    id     = c(2557L, 1602L, 18669L, 35900L, 48667L, 51341L),
    from   = c("5370", "6390", "5370", "5370", "8934", "5370"),
    to     = c("5636", "5370", "8933", "8483", "5370", "7626"),
    date   = structure(c(13099, 13113, 13117, 13179, 13238, 13249), class = "Date"),
    amount = c(2921, 8000, 169.2, 71.5, 14.6, 4214)),
  row.names = c(NA, -6L), class = "data.frame")

I use the following code to achieve what I want:

library(dplyr)
library(tnet)

df2 <- select(df, from, to, amount) %>%
  group_by(from, to) %>%
  mutate(weights = 1/sum(amount)) %>%
  select(-amount) %>%
  distinct()

network <- cbind(df2$from, df2$to, df2$weights)

cl <- closeness_w(network, directed = TRUE, gconly = FALSE)  # here it gives the error: "Error in net[, "w"]^alpha : non-numeric argument to binary operator"

# so I convert the from and to columns to integers to get rid of the error above
df2$from <- as.integer(df2$from)
df2$to <- as.integer(df2$to)
# then I run the code again
network <- cbind(df2$from, df2$to, df2$weights)
cl <- closeness_w(network, directed = TRUE, gconly = FALSE)
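The error in the first attempt most likely comes from cbind() rather than from tnet itself: a matrix can hold only one type, so character IDs turn the weight column into strings as well. A minimal base-R illustration, using two of the transactions above:

```r
# cbind() builds a matrix, and a matrix holds a single type:
# character IDs force the weights to become character strings too.
from    <- c("5370", "6390")            # IDs as stored in df (character)
to      <- c("5636", "5370")
weights <- c(1 / 2921, 1 / 8000)

m_chr <- cbind(from, to, weights)
print(typeof(m_chr))                    # "character"

# Arithmetic on that column fails, which matches the closeness_w() error message.
msg <- tryCatch(m_chr[, "weights"]^1, error = function(e) conditionMessage(e))
print(msg)                              # "non-numeric argument to binary operator"

# Converting the IDs first keeps the whole matrix numeric.
m_num <- cbind(from = as.integer(from), to = as.integer(to), weights)
print(typeof(m_num))                    # "double"
```

This is why the as.integer() conversion makes the error go away.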

However, the output is not like the one on the website, which contains only a closeness score for each node. Instead, it contains many rows with a value of 0, and I don't know why.

The output I got is as follows:

     node  closeness    n.closeness
   [1,]    1 0.00000000 0.000000000000
   [2,]    2 0.00000000 0.000000000000
   [3,]    3 0.00000000 0.000000000000
   [4,]    4 0.00000000 0.000000000000
   [5,]    5 0.00000000 0.000000000000
   ...........................................................
 [330,]  330 0.00000000 0.000000000000
 [331,]  331 0.00000000 0.000000000000
 [332,]  332 0.00000000 0.000000000000
 [333,]  333 0.00000000 0.000000000000
 [ reached getOption("max.print") -- omitted 8600 rows ]
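tnet appears to treat node IDs as row positions (the comment in the tnet example notes that isolates at the end of the ID sequence are not recorded), so account numbers up to 8934 yield 8934 output rows. One way to avoid this padding, sketched here as an assumption rather than something the tnet documentation prescribes, is to relabel the accounts to consecutive IDs with match() and keep the lookup vector:

```r
df <- data.frame(
  from   = c("5370", "6390", "5370", "5370", "8934", "5370"),
  to     = c("5636", "5370", "8933", "8483", "5370", "7626"),
  amount = c(2921, 8000, 169.2, 71.5, 14.6, 4214)
)

# Lookup table: consecutive ID k stands for account labels[k]
labels <- sort(unique(c(df$from, df$to)))

# Edge list with IDs 1..length(labels) instead of raw account numbers.
# Each (from, to) pair occurs once here, so amount equals the summed amount.
net <- cbind(
  i = match(df$from, labels),
  j = match(df$to, labels),
  w = df$amount                 # strength weights; use 1 / df$amount to mirror the question
)
print(labels)
print(net)
```

After closeness_w(net, directed = TRUE, gconly = FALSE), row k of the result should correspond to labels[k].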

Also, the i and j columns in the website's data are reciprocal, that is, 1->2 exists iff 2->1 exists. My data is not like that: 5370 sent money to 5636, but 5636 has not sent any money to 5370. So how can I compute the closeness measure correctly on such a directed network of transaction data? Has anyone tried a similar computation before?

EDIT: Since closeness_w treats weights as strengths rather than distances, I should have set the weights to sum(amount) instead of 1/sum(amount).
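Following the edit, the strength weights can be built with one row per (from, to) pair. With dplyr this would be group_by(from, to) %>% summarise(weights = sum(amount)); the sketch below uses base R's aggregate() so that it runs without extra packages:

```r
df <- data.frame(
  from   = c("5370", "6390", "5370", "5370", "8934", "5370"),
  to     = c("5636", "5370", "8933", "8483", "5370", "7626"),
  amount = c(2921, 8000, 169.2, 71.5, 14.6, 4214)
)

# One row per (from, to) pair, with the total amount sent as the tie strength
df2 <- aggregate(amount ~ from + to, data = df, FUN = sum)
names(df2)[names(df2) == "amount"] <- "weights"
print(df2)
```

In this sample every pair occurs once, so the sums equal the original amounts; with repeated transactions between the same two accounts they would be added up.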


Solution

  • You get many rows of zeros because closeness_w returns a value for every node ID from 1 to 8934 (the maximum value in your matrix). If you filter for the nodes that appear in your data frame, you'll find the values you're looking for:

    cl <- closeness_w(df2, directed = T, gconly=FALSE)
    cl[cl[, "node"] %in% c(df2$from), ]
    
         node  closeness  n.closeness
    [1,] 5370 1.37893704 1.543644e-04
    [2,] 6390 0.03668555 4.106745e-06
    [3,] 8934 5.80008056 6.492870e-04
    

    The direction has been accounted for: if you filter for the 'to' nodes, you'll see that only 5370 has a non-zero value:

    cl[cl[, "node"] %in% c(df2$to), ]
    
         node closeness  n.closeness
    [1,] 5370  1.378937 0.0001543644
    [2,] 5636  0.000000 0.0000000000
    [3,] 7626  0.000000 0.0000000000
    [4,] 8483  0.000000 0.0000000000
    [5,] 8933  0.000000 0.0000000000
    

    If you go back to the example you're following and remove nodes from the middle of the data, you'll see that it gives zeros for the missing nodes. Also try setting directed = F and you'll notice the difference.
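To see the effect of direction without tnet, here is a minimal base-R sketch. closeness_sketch() is a hypothetical helper, not part of tnet. It assumes weights are converted to costs as mean(w) / w, which is my reading of tnet's scaling and reproduces the directed values shown above (1.37893704 for 5370, 0.03668555 for 6390, 5.80008056 for 8934). It then finds all shortest paths with Floyd-Warshall and sums inverse distances:

```r
# Hypothetical helper (not part of tnet): closeness as a sum of inverse
# shortest-path distances. Assumes one row per (from, to) pair.
closeness_sketch <- function(from, to, w, directed = TRUE) {
  labels <- sort(unique(c(from, to)))
  n <- length(labels)
  i <- match(from, labels)
  j <- match(to, labels)
  cost <- mean(w) / w                       # strengths -> costs (assumed tnet-like scaling)
  d <- matrix(Inf, n, n, dimnames = list(labels, labels))
  diag(d) <- 0
  d[cbind(i, j)] <- pmin(d[cbind(i, j)], cost)
  if (!directed) d[cbind(j, i)] <- pmin(d[cbind(j, i)], cost)
  # Floyd-Warshall: allow each node k in turn as an intermediate stop
  for (k in seq_len(n)) d <- pmin(d, outer(d[, k], d[k, ], "+"))
  diag(d) <- NA
  closeness <- rowSums(1 / d, na.rm = TRUE) # unreachable nodes: 1/Inf = 0
  data.frame(node = labels, closeness = closeness, n.closeness = closeness / (n - 1))
}

df2 <- data.frame(
  from   = c("5370", "6390", "5370", "5370", "8934", "5370"),
  to     = c("5636", "5370", "8933", "8483", "5370", "7626"),
  amount = c(2921, 8000, 169.2, 71.5, 14.6, 4214)
)
dir <- closeness_sketch(df2$from, df2$to, 1 / df2$amount, directed = TRUE)
und <- closeness_sketch(df2$from, df2$to, 1 / df2$amount, directed = FALSE)
print(dir)
print(und)
```

In the directed case the accounts that only receive money (5636, 7626, 8483, 8933) reach nobody and get 0; with directed = FALSE every account can reach every other one through 5370, so all scores become positive. Note that n.closeness here divides by 6 (seven accounts), so it matches the relabelled output rather than the 8934-row one.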

    Update:

    As an alternative to building the network matrix yourself, you can pass df2 directly to the closeness_w function once you have created it. Your node labels are then mapped to indices (kept as row names), and the node column is reduced to 1:n:

    df2 <- df %>% 
      group_by(from, to) %>% 
      mutate(weights = 1/sum(amount)) %>% 
      select(from, to, weights) %>% 
      distinct
    
    cl <- closeness_w(df2, directed = T, gconly=FALSE)
    cl 
    
         node  closeness n.closeness
    5370    1 1.37893704 0.229822840
    5636    2 0.00000000 0.000000000
    7626    3 0.00000000 0.000000000
    8483    4 0.00000000 0.000000000
    8933    5 0.00000000 0.000000000
    6390    6 0.03668555 0.006114259
    8934    7 5.80008056 0.966680093
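To work with that result by account ID, the row names can be moved into a column. The sketch below copies the printed values above into a matrix (rounded as printed) and ranks the accounts by closeness with base R:

```r
# Closeness scores copied from the closeness_w() output above;
# the row names carry the original account IDs.
cl <- matrix(
  c(1, 1.37893704, 0.229822840,
    2, 0.00000000, 0.000000000,
    3, 0.00000000, 0.000000000,
    4, 0.00000000, 0.000000000,
    5, 0.00000000, 0.000000000,
    6, 0.03668555, 0.006114259,
    7, 5.80008056, 0.966680093),
  ncol = 3, byrow = TRUE,
  dimnames = list(c("5370", "5636", "7626", "8483", "8933", "6390", "8934"),
                  c("node", "closeness", "n.closeness"))
)

# Move the labels into a column and rank accounts by closeness
res <- data.frame(account = rownames(cl), closeness = cl[, "closeness"], row.names = NULL)
res <- res[order(res$closeness, decreasing = TRUE), ]
print(res)
```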