Say I have 100 nodes, then I give each a unique ID from 1:100.
If I wanted a list of every combination of nodes, I believe it would be of length 2^100
. This is if any node can be missing from the diagram.
But say I have a dataframe that represents a connections between nodes:
head(conn_)
from to
2 1 2
3 1 4
4 2 3
5 2 5
6 4 6
7 154 100
8 102 101
so, row one of this df states there exists a connection from node 11
to node 10
Say I want to enumerate every combination of valid nodes, but a combination is only valid if there is no broken connection between the elements of the set. How could I do this?
For example if I have nodes 1->2->3->4->5->6->7->8->9
, where ->
represents a two way connection (1
connects to 2
and 2
connects to 1
), then two valid subsets would be {1, 2, 3} & {4, 5, 6}
, but an invalid subset would be {1, 3, 4, 6}
. It would be invalid because there is a broken connection between two elements in the set.
If one node connects to multiple other nodes, that counts as a valid connection, meaning for the dataframe above I can have a valid set {1, 2, 4, 6}
I'm having a really hard time trying to figure out a method to do this, recursively or with for/while loops.
Also, if there is a max of five two-way connections per node, for the case of 100 nodes, then is it possible to enumerate all? How does this problem grow?
edit:
Here is an example of input / output:
conn_ =
from to
1 2
1 4
2 3
2 5
4 6
Expected output : { {1}, {1, 2}, {1, 4}, {1, 2, 4}, {1, 2, 3}, {1, 2, 5}, {1, 4, 6}, {1, 2, 4, 6}, {1, 2, 3, 4}, {1, 2, 3, 4, 6}, {1, 2, 3, 4, 5, 6}, {2}, {2, 3}, {2, 5}, {2, 3, 5}, {3}, {4}, {4, 6} }
Notice {1, 3, 5}
is not in the output, because there can't exist a break between elements in the set, but {1, 2, 4, 6}
is valid because 1
connects to 2
and 1
connects to 4
Here is a solution with igraph. It will quickly exhaust your resources for big graphs with high connectivity.
Basically, we search for all paths from each vertex. That will give us each combination twice, so we subset that to unique combinations in the end. Someone who knows more about graphs than I do might be able to create a more efficient solution.
DF <- read.table(text = "from to
1 2
1 4
2 3
2 5
4 6", header = TRUE)
library(igraph)
g <- graph_from_data_frame(DF, directed = FALSE)
plot(g)
#all paths starting from each vertex
paths <- unlist(lapply(V(g), function(from) all_simple_paths(g, from)), FALSE)
res <- lapply(paths, names) #extract vertex names from each path
res <- c(as.list(names(V(g))), res) #add single vertices
res <- lapply(res, sort) #sort
res <- res[!duplicated(res)] #remove duplicates
#for compact printing:
unname(sapply(res, paste, collapse = ","))
#[1] "1" "2" "4" "3" "5" "6" "1,2" "1,2,3" "1,2,5" "1,4" "1,4,6" "1,2,4" "1,2,4,6" "2,3"
#[15] "2,5" "1,2,3,4" "1,2,4,5" "4,6" "1,2,3,4,6" "2,3,5" "1,2,4,5,6"