What is the dplyr equivalent to df[duplicated(df[, subset]), ]
— that is, for each set of duplicates based on the subset
columns, keep every row except the first match?
This will show all duplicated rows, optionally by subset:
df %>% filter(n() > 1, .by = col)
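For instance, with a small made-up data frame (the column name col and values here are illustrative), n() > 1 keeps every row of a duplicated group, first occurrence included. The .by argument requires dplyr >= 1.1.0:

```r
library(dplyr)

df <- tibble(col = c("a", "a", "b", "c", "c", "c"), val = 1:6)

# every row whose `col` value occurs more than once,
# including the first occurrence of each group
dups <- df %>% filter(n() > 1, .by = col)
print(dups)  # rows for "a" (2 rows) and "c" (3 rows); "b" is dropped
```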
This is the most SQL-esque solution I could come up with, using a grouped filter (I believe dplyr maintains the original row order through filter()):
# replace group_by(col) with group_by_all() to dedupe on all columns
df %>% group_by(col) %>% filter(row_number() > 1) %>% ungroup()
Alternatives by @Onyambu:
df %>% filter(duplicated(df[,cols]))
df %>% filter(row_number() > 1, .by = cols)
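A quick sanity check that all three approaches agree, again on a hypothetical data frame. Note that when cols is a character vector, .by needs all_of() to select by it; the bare `.by = cols` form above assumes cols names the columns directly:

```r
library(dplyr)

df <- tibble(col = c("a", "a", "b", "c", "c", "c"), val = 1:6)
cols <- "col"

# 1. grouped filter: drop the first row of each group
a <- df %>% group_by(col) %>% filter(row_number() > 1) %>% ungroup()

# 2. base duplicated() inside filter: marks everything after the first match
b <- df %>% filter(duplicated(df[, cols]))

# 3. per-group row_number() via .by (dplyr >= 1.1.0)
c <- df %>% filter(row_number() > 1, .by = all_of(cols))
```

All three return the rows with val 2, 5, and 6 — every duplicate minus the first occurrence in each group.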