Search code examples
rigraphnetwork-efficiency

Computing graph efficiency metrics of a very large network


What are strategies for calculating the nodal and global efficiency of very large graphs in R?

I'm attempting to calculate the global efficiency of a very large igraph with brainGraph::efficiency(my_graph, type = "global").

library(igraph); library(brainGraph)  
g <- readRDS("path_to_my_graph.rds")  

> ecount(g); vcount(g) # count of edges and vertices
[1] 715758
[1] 290190

It reliably crashes R every time. Global efficiency is the mean of all nodal efficiencies, so I've attempted to calculate it that way with no success. The weights on each edge of my graph are all 1, so I've omitted the weights, but R still crashes.

# all of these cause R to crash
efficiency(g, type = "global")
efficiency(g, type = "nodal")
efficiency(g, type = "global", weights = NA)
efficiency(g, type = "nodal",  weights = NA)

My graph (~37MB) is available here on GoogleDrive as an .rds file for those who want data to test.


Solution

  • R crashes because brainGraph::efficiency() attempts to calculate an enormous and dense distance matrix, which overwhelms the memory of my machine (32 GB). But I found a solution that chunks out the operation and runs in parallel.

    Global efficiency is the mean of all nodal efficiencies in the graph. The nodal efficiency of a vertex i is:

    enter image description here

    We can sequentially calculate the nodal efficiency for each vertex in the graph, which splits the distance matrix calculation into smaller manageable bits. Because the efficiency of each vertex is independent, we can parallelize the operation so it doesn't take forever.

    library(igraph)  
    library(doParallel)
    
    # nodal efficiency function
    get_eff <- function(i){return((1/(length(V(g)) - 1))*sum(1/distances(g, V(g)[i])[-i]))}
    
    no_cores <- detectCores() - 1 
    cl       <- makeCluster(no_cores)
    registerDoParallel(cl)  
    
    result <- foreach(i = seq_along(V(g)), .combine = c, .packages = "igraph") %dopar% get_eff(i)
    
    stopCluster(cl)
    rm(cl)
    
    global_eff <- mean(result)
    

    Moreover, we can plot the distribution of nodal efficiencies, as well as the global (mean) efficiency, which gives us a better understanding of the network.

    library(ggplot2)
    data.frame(x = result) %>% 
      ggplot(aes(x)) + 
      geom_histogram() + 
      geom_vline(xintercept = mean(result), col = "red") # global efficiency
      theme_minimal() +
      labs(title = "Nodal efficiences", x = "", y = "Count")
    

    enter image description here