I want to calculate a matrix of edge counts for a given graph and partition of this graph into groups. The solution I have at the moment does not scale for large graphs and I wonder if it is possible to speed up the computation.
I want to use the igraph
R-package for this, so for a graph G
and two sets of vertices set1
, set2
I currently calculate the number of edges from one set to the other by using igraph's %->%
operator.
el <- E(G)[set1 %->% set2]
length(el)
I wonder if there is a faster way to do this either natively in igraph
or by homebrewing some solution (maybe using Rcpp
)?
library(igraph)
# Set up toy graph and partition into two blocks
G <- make_full_graph(20, directed=TRUE)
m <- c(rep(1,10), rep(2,10))
block_edge_counts <- function(G, m){
# Get list of vertices per group.
c <- make_clusters(G, m, modularity = FALSE)
# Calculate matrix of edge counts between blocks.
E <- sapply(seq_along(c), function(r){
sapply(seq_along(c), function(s){
# Iterate over all block pairs
el <- E(G)[c[[r]] %->% c[[s]]] # list of edges from block r to block s
length(el) # get number of edges
})
})
}
block_edge_counts(G, m)
#> [,1] [,2]
#> [1,] 90 100
#> [2,] 100 90
#> install.packages("bench")
results <- bench::press(
Nsize = c(10,100,1000),
{
G <- make_full_graph(Nsize, directed=TRUE)
m <- c(rep(1,.5*Nsize), rep(2,.5*Nsize))
bench::mark(block_edge_counts(G,m))
}
)
#> Running with:
#> Nsize
#> 1 10
#> 2 100
#> 3 1000
#> Warning: Some expressions had a GC in every iteration; so filtering is disabled.
results
#> # A tibble: 3 × 7
#> expression Nsize min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <dbl> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 block_edge_counts(G, m) 10 1.24ms 1.31ms 752. 39.97KB 12.6
#> 2 block_edge_counts(G, m) 100 3.24ms 3.37ms 284. 4.11MB 32.1
#> 3 block_edge_counts(G, m) 1000 262.73ms 323.61ms 3.29 408.35MB 34.2
For now I'm going with
block_edge_counts <- function(graph, partition) {
CG <- contract(graph, partition)
E <- as_adjacency_matrix(CG, sparse=FALSE)
}
One way is to contract the two sets into single vertices, then count edges between them.
Contract the two sets, obtaining vertex id 1 for the first and 2 for the second:
CG <- contract(G, m)
Now count 1 -> 2
and 2 -> 1
edges:
> count_multiple(CG, get.edge.ids(CG, c(1,2, 2,1)))
[1] 100 100
If you have a full partitioning of the graph, then you can count the fraction of intra-partition edges using
modularity(G, m, resolution = 0)
Your example has a full partitioning into two sets, so you can get the number of edges between these two as
> ecount(G)*(1 - modularity(G, m, resolution = 0))
[1] 200