I am trying to find all downstream nodes in a large graph object (~1 million edges).
In this example, I want to see all nodes that are downstream from 4:
The below code is adapted from this answer using igraph
but it ran for over 20 minutes before I ended it. I saw the hR
package (see here) but it also has trouble with the size of the data.
I'm new to graph analyses and not sure if there are better methods
library(tidyverse)
library(igraph)
graph_df <- # real example has 1.2M connections
make_tree(100000, children = 3)
leafnodes <- # bottleneck
sapply(
X = V(graph_df),
FUN = \(x) length(neighbors(graph_df, x)) == 0
)
find_all_children <- function(root) {
paths <-
get.all.shortest.paths(
graph = graph_df,
from = V(graph_df)[root],
to = leafnodes
)$res
tibble(
path = map_chr(paths, ~paste(.x, collapse = " -> "))
)
}
# desired output
find_all_children(12)
#> 12 -> 35 -> 104 -> 311 -> 932 -> 2795 -> 8384
#> 12 -> 35 -> 104 -> 311 -> 932 -> 2795 -> 8385
#> 12 -> 35 -> 104 -> 311 -> 932 -> 2795 -> 8386
find_all_children(1234)
#> 1234 -> 3701 -> 11102 -> 33305 -> 99914
#> 1234 -> 3701 -> 11102 -> 33305 -> 99915
#> 1234 -> 3701 -> 11102 -> 33305 -> 99916
Bonus question:
Is there a purrr
method to replace the sapply()
here? I didn't have any luck with map()
Probably you can try subgraph
+ subcomponent
to prune your tree first and then find all_simple_paths
, e.g.,
n <- 100000
g <- make_tree(n, children = 3) %>%
set_vertex_attr(name = "name", value = seq(vcount(.)))
find_all_children <- function(g, root) {
g %>%
subgraph(subcomponent(., root, "out")) %>%
all_simple_paths(1, V(.)[degree(., mode = "out") == 0])
}
and run the following lines for example
r12 <- find_all_children(g, 12)
r1234 <- find_all_children(g, 1234)
which might be done in seconds even with n <- 100000
.
n <- 20
g <- make_tree(n, children = 2) %>%
set_vertex_attr(name = "name", value = seq(vcount(.)))
plot(g)
and we can obtain
> find_all_children(g, 3)
[[1]]
+ 3/7 vertices, named, from dab4822:
[1] 3 6 12
[[2]]
+ 3/7 vertices, named, from dab4822:
[1] 3 6 13
[[3]]
+ 3/7 vertices, named, from dab4822:
[1] 3 7 14
[[4]]
+ 3/7 vertices, named, from dab4822:
[1] 3 7 15