I am trying to create a new ID for groups that should be linked together as they share common members. Example below shows the problem.
The first table tells me that dedupe id 1 has members 12,23 and 34
Dedupe ID # | Member |
---|---|
1 | 12 |
1 | 23 |
1 | 34 |
2 | 56 |
... | ... |
While the second table tells me dedupe id 5 has members 12, 23 and 47. Noting that 12 and 23 are common with dedupe id 1 in the first table.
Dedupe ID # | Member |
---|---|
5 | 12 |
5 | 23 |
5 | 47 |
2 | 7 |
... | ... |
As groups 1 and 5 overlap - I would like to ensure that members 12, 23, 34 and 47 are joined into 1 dedupe id, like below
dedupe ID # | Member |
---|---|
1 | 12 |
1 | 23 |
1 | 34 |
1 | 47 |
2 | 56 |
... | ... |
@akrun, good question. Maybe the better way of doing this is to update table 1 so that dedupeid 1 maps to Member 47 (see example above). All of the other mappings in table 1 should stay.
I fear this might succumb to multiple common members, and its iteratively appending to dat1
is a bit inefficient at scale, but ...
members1 <- split(dat1$Member, dat1$Group)
members2 <- split(dat2$Member, dat2$Group)
for (G in unique(dat1$Group)) {
m1 <- dat1$Member[dat1$Group == G]
incommon <- lengths(lapply(members2, function(m2) intersect(dat1$Member[dat1$Group == G], m2)))
move2 <- subset(dat2, Group %in% names(members2)[incommon > 0] & !Member %in% m1)
if (nrow(move2)) {
dat1 <- rbind(dat1, transform(move2, Group = G))
dat2 <- subset(dat2, !Group %in% names(members2)[incommon > 0])
}
}
rbind(dat1, dat2)
# Group Member
# 1 A 1
# 2 A 2
# 3 A 3
# 4 C 5
# 31 A 4
# 41 D 7
Data
dat1 <- structure(list(Group = c("A", "A", "A", "C"), Member = c(1L, 2L, 3L, 5L)), class = "data.frame", row.names = c(NA, -4L))
dat2 <- structure(list(Group = c("B", "B", "B", "D"), Member = c(2L, 3L, 4L, 7L)), class = "data.frame", row.names = c(NA, -4L))