I'd like to extract the link between duplicated rows. I can find duplicated rows within one data frame with:
duplicated(df)
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE FALSE FALSE FALSE
[15] FALSE FALSE TRUE FALSE TRUE FALSE FALSE FALSE TRUE FALSE FALSE TRUE FALSE TRUE
[29] FALSE FALSE TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
[43] FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE TRUE FALSE
[57] FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE
I would like to find out which rows belong to each duplicated group, and how many rows each group contains.
What I expect is output of the form:
Row X --> Row Y, Row Z
which means that rows X, Y and Z are identical, and the count of this group is 3.
Depending on how many columns you have, a self-join could be an option. You'd need to join on all the columns, though:
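library(dplyr)
df <- data.frame(col1 = c(1, 1, 2, 3, 4, 5, 6),
                 col2 = c(1, 1, 2, 3, 4, 5, 6))
# Add a row index so each row can still be identified after the join
df <- data.frame(idx = seq_len(nrow(df)), df)
# Self-join on every data column: each row is paired with all rows identical to it
# (dplyr >= 1.1.0 warns about a many-to-many match here; that is expected)
pairs <- inner_join(df, df, by = c("col1", "col2"))
# Keep each pair once: this drops self-matches and mirrored pairs
pairs <- pairs %>% filter(idx.y > idx.x)
pairs[, c("idx.x", "idx.y")]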
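If you want the output in the "Row X --> Row Y, Row Z" form from the question, together with the size of each group, one possible base R sketch is to build a key per row and split the row numbers by that key. The example data, the names `key`, `groups` and `links`, and the `"\r"` separator are assumptions made for illustration; the separator only works if that character never appears in your data.

```r
# Hypothetical example data: rows 1, 2 and 5 are identical, as are rows 3 and 6
df <- data.frame(col1 = c(1, 1, 2, 3, 1, 2, 4),
                 col2 = c("a", "a", "b", "c", "a", "b", "d"))

# One key per row, built by pasting all columns together.
# Assumption: the separator "\r" does not occur anywhere in the data.
key <- do.call(paste, c(df, sep = "\r"))

# Group the row numbers by key and keep only groups with more than one row
groups <- split(seq_len(nrow(df)), key)
groups <- unname(Filter(function(rows) length(rows) > 1, groups))

# Order the groups by their first row so the output follows row order
groups <- groups[order(vapply(groups, min, integer(1)))]

# Format each group as "Row X --> Row Y, Row Z (count: n)"
links <- vapply(groups, function(rows) {
  sprintf("Row %d --> %s (count: %d)",
          rows[1],
          paste("Row", rows[-1], collapse = ", "),
          length(rows))
}, character(1))

print(links)
# [1] "Row 1 --> Row 2, Row 5 (count: 3)" "Row 3 --> Row 6 (count: 2)"
```

Unlike the self-join, this lists each group once rather than every pair, so a group of three rows gives one line instead of three pairs.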