I have a dataset of linked nodes that I'm trying to convert to a simple relational table. The structure is like this:
Key1 Key2
A A
A B
A C
B A
B B
B C
C A
C B
C C
D D
D E
E D
E E
F F
Ultimately, I'm trying to figure out whether there's a way in R (without a loop, which would be too slow given the size of the set) to group every set of related values under a new unique master ID. The final dataset would look something like this:
Master Key
1 A
1 B
1 C
2 D
2 E
3 F
I can't find anything on the topic because I'm likely asking the question without the proper terminology.
Any help is appreciated!
What you are describing is finding the connected components of a graph, where each key is a node and each row is an edge.
Using your data:
Dat = read.table(text="Key1 Key2
A A
A B
A C
B A
B B
B C
C A
C B
C C
D D
D E
E D
E E
F F",
header=TRUE)
We turn the edges into a graph and get the connected components.
library(igraph)
g = graph_from_edgelist(as.matrix(Dat), directed=FALSE)
components(g)$membership
A B C D E F
1 1 1 2 2 3
Note that components(g)$membership is a named vector. The keys A, B, C, D, E, F can be retrieved with names(components(g)$membership).
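To get from that named vector to the Master/Key table asked for in the question, something along these lines should work. This is only a sketch: the names memb and Result are made up for illustration, and the edge list is a shortened version of the example data (reverse and self-referencing rows dropped, except for F, which needs its own row to appear in the graph).

```r
library(igraph)

# Shortened version of the example edge list (assumed equivalent for this purpose)
Dat <- data.frame(Key1 = c("A", "A", "B", "D", "F"),
                  Key2 = c("B", "C", "C", "E", "F"))

g <- graph_from_edgelist(as.matrix(Dat), directed = FALSE)

# Compute the components once and keep the named membership vector
memb <- components(g)$membership

# One row per key: the component number becomes the master ID
Result <- data.frame(Master = as.integer(memb), Key = names(memb))
Result <- Result[order(Result$Master, Result$Key), ]
rownames(Result) <- NULL
Result
```

Storing the membership vector in a variable avoids computing the components twice, which matters on a large graph. The master IDs are just component numbers, so they carry no meaning beyond grouping the keys.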