Search code examples
rperformanceoptimizationcluster-analysisigraph

igraph in R: efficiently count number of edges between multiple sets of vertices


I want to calculate a matrix of edge counts for a given graph and partition of this graph into groups. The solution I have at the moment does not scale for large graphs and I wonder if it is possible to speed up the computation.

I want to use the igraph R-package for this, so for a graph G and two sets of vertices set1, set2 I currently calculate the number of edges from one set to the other by using igraph's %->% operator.

el <- E(G)[set1 %->% set2]
length(el)

I wonder if there is a faster way to do this either natively in igraph or by homebrewing some solution (maybe using Rcpp)?

Example code

library(igraph)

# Set up toy graph and partition into two blocks
G <- make_full_graph(20, directed=TRUE)
m <- c(rep(1,10), rep(2,10))

block_edge_counts <- function(G, m){
  # Get list of vertices per group.
  c <- make_clusters(G, m, modularity = FALSE)
  # Calculate matrix of edge counts between blocks.
  E <- sapply(seq_along(c), function(r){
    sapply(seq_along(c), function(s){
      # Iterate over all block pairs
      el <- E(G)[c[[r]] %->% c[[s]]] # list of edges from block r to block s
      length(el) # get number of edges
    })
  })
}

block_edge_counts(G, m)
#>      [,1] [,2]
#> [1,]   90  100
#> [2,]  100   90

Example benchmark

#> install.packages("bench")

results <- bench::press(
  Nsize = c(10,100,1000),
  {
    G <- make_full_graph(Nsize, directed=TRUE)
    m <- c(rep(1,.5*Nsize), rep(2,.5*Nsize))
    bench::mark(block_edge_counts(G,m))
  }
)
#> Running with:
#>   Nsize
#> 1    10
#> 2   100
#> 3  1000
#> Warning: Some expressions had a GC in every iteration; so filtering is disabled.
results
#> # A tibble: 3 × 7
#>   expression              Nsize      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>              <dbl> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 block_edge_counts(G, m)    10   1.24ms   1.31ms    752.     39.97KB     12.6
#> 2 block_edge_counts(G, m)   100   3.24ms   3.37ms    284.      4.11MB     32.1
#> 3 block_edge_counts(G, m)  1000 262.73ms 323.61ms      3.29  408.35MB     34.2

Solution

For now I'm going with

block_edge_counts <- function(graph, partition) {
  CG <- contract(graph, partition)
  E <- as_adjacency_matrix(CG, sparse=FALSE)
}

Solution

  • One way is to contract the two sets into single vertices, then count edges between them.

    Contract the two sets, obtaining vertex id 1 for the first and 2 for the second:

    CG <- contract(G, m)
    

    Now count 1 -> 2 and 2 -> 1 edges:

    > count_multiple(CG, get.edge.ids(CG, c(1,2, 2,1)))
    [1] 100 100
    

    If you have a full partitioning of the graph, then you can count the fraction of intra-partition edges using

    modularity(G, m, resolution = 0)
    

    Your example has a full partitioning into two sets, so you can get the number of edges between these two as

    > ecount(G)*(1 - modularity(G, m, resolution = 0))
    [1] 200