I am grouping entity data. I have identified the groups based on some entity attributes, like this:
df <- data.frame(uniq_index.x = c(1426, 1426, 1426, 1426, 7796, 7796, 7796, 7796,
7159, 7159, 7159, 7159, 7857, 7857, 7857, 7857,
7158, 7158, 7158, 7158, 5440, 9861, 1641, 8685,
1644, 7525, 6030, 5672),
uniq_index.y = c(7796, 7159, 7857, 7158, 1426, 7159, 7857, 7158,
1426, 7796, 7857, 7158, 1426, 7796, 7159, 7158,
1426, 7796, 7159, 7857, 9861, 5440, 8685, 1641,
7525, 1644, 5673, 6030)
)
library(dplyr)

# grouping: one tibble per distinct uniq_index.x
a <- df %>%
  group_by(uniq_index.x) %>%
  group_split()
From the above data, 1426, 7796, 7159, 7857 and 7158 should be in the same group; 5672, 5673 and 6030 should be in another group. I can use group_by and group_split to get groups.
However, since there are duplicate groups, I used the following code to get the unique groups:
# initialize an empty data frame
b <- data.frame(V1 = character())
# loop through a (which is obtained from group_split)
for (i in seq_along(a)) {
x <- a[[i]][,1]
y <- a[[i]][,2]
x <- x %>%
mutate(uniq_index = uniq_index.x) %>%
select(uniq_index)
y <- y %>%
mutate(uniq_index = uniq_index.y) %>%
select(uniq_index)
z <- unique(x) %>%
rbind(y) %>%
arrange(uniq_index)
b <- b %>%
rbind(paste(z))
}
# unique groups
b <- b %>%
unique() %>%
mutate(
uniq_agency_id = 100000 + 1:nrow(unique(b))
)
Then I noticed this issue: in the sample data, (6030, 5672) and (5673, 6030) end up as two separate groups, but these two groups should be merged into one larger group.
I am struggling to come up with logic that produces the combined unique groups.
Solutions to this are all over this website. Here is one way using igraph:
igraph::components(igraph::graph_from_data_frame(df))$membership
1426 7796 7159 7857 7158 5440 9861 1641 8685 1644 7525 6030 5672 5673
   1    1    1    1    1    2    2    3    3    4    4    5    5    5
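If you want the result as a lookup table rather than a named vector, something along these lines might work. This is only a sketch: it treats each row of df as an undirected edge, and the names g, membership and groups are just illustrative. The uniq_agency_id numbering follows the 100000 + n scheme from the question.

```r
library(igraph)

# Sample pairs from the question, repeated so the snippet runs on its own
df <- data.frame(
  uniq_index.x = c(rep(c(1426, 7796, 7159, 7857, 7158), each = 4),
                   5440, 9861, 1641, 8685, 1644, 7525, 6030, 5672),
  uniq_index.y = c(7796, 7159, 7857, 7158, 1426, 7159, 7857, 7158,
                   1426, 7796, 7857, 7158, 1426, 7796, 7159, 7158,
                   1426, 7796, 7159, 7857, 9861, 5440, 8685, 1641,
                   7525, 1644, 5673, 6030)
)

# Each row is an edge; connected components of the graph are the merged groups
g <- graph_from_data_frame(df, directed = FALSE)
membership <- components(g)$membership

# One row per index, with a group id in the question's numbering style
groups <- data.frame(
  uniq_index = as.numeric(names(membership)),
  uniq_agency_id = 100000 + unname(membership)
)

# Members of each group as a list of vectors
split(groups$uniq_index, groups$uniq_agency_id)
```

Because 6030 appears in both (6030, 5672) and (5673, 6030), all three indices land in the same component, which is the merge the question is after.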