I have a dataset with many duplicated rows, and I would like to keep only the rows that are not duplicated. My df looks something like this:
df <- data.frame("group" = c("A", "A", "A", "A", "A", "B", "B", "B"),
                 "id" = c("id1", "id2", "id3", "id1", "id2", "id1", "id2", "id1"),
                 "Val" = c(10, 10, 10, 10, 10, 12, 12, 12))
What I would like to extract are only the rows that do not have a duplicate, i.e. my final dataset should look like this:
final <- data.frame("group" = c("A", "B"),
                    "id" = c("id3", "id2"),
                    "Val" = c(10, 12))
Note that I am not interested in finding unique values, but rather non-duplicated ones.
I know how to find unique values; for instance, df %>% distinct() does the job. It is identifying the non-duplicated rows that I am struggling with.
Here is one option.
library(dplyr)
df %>%
  group_by(group) %>%
  filter(!(duplicated(id) | duplicated(id, fromLast = TRUE)))
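To see why both duplicated() calls are needed: duplicated() alone marks only the second and later occurrences of a value, leaving the first copy unflagged. Combining it with fromLast = TRUE flags every copy of a repeated value, so negating the combination keeps only values that occur exactly once. A minimal sketch on a bare vector (using the id values from group A above):

```r
x <- c("id1", "id2", "id3", "id1", "id2")

duplicated(x)                   # FALSE FALSE FALSE  TRUE  TRUE (later copies only)
duplicated(x, fromLast = TRUE)  # TRUE  TRUE FALSE FALSE FALSE (earlier copies only)

# OR-ing the two flags marks every occurrence of a repeated value,
# so the negation keeps only values appearing exactly once:
keep <- !(duplicated(x) | duplicated(x, fromLast = TRUE))
x[keep]  # "id3"
```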
Or with dplyr alone:
df %>%
  group_by_all() %>%
  filter(n() == 1)
Or, in newer versions of dplyr (as suggested by @Pål Bjartan), since group_by_all() is superseded:
df %>%
  group_by(across(everything())) %>%
  filter(n() == 1)
Or using base R, applying the same duplicated() trick to the first two columns (group and id):
df[!(duplicated(df[1:2]) | duplicated(df[1:2], fromLast = TRUE)), ]
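Running the base R version on the example data returns exactly the two rows requested in the question (row names 3 and 7 are the original row indices, which is standard base R subsetting behavior):

```r
df <- data.frame(group = c("A", "A", "A", "A", "A", "B", "B", "B"),
                 id    = c("id1", "id2", "id3", "id1", "id2", "id1", "id2", "id1"),
                 Val   = c(10, 10, 10, 10, 10, 12, 12, 12))

# Keep only rows whose (group, id) combination occurs exactly once
df[!(duplicated(df[1:2]) | duplicated(df[1:2], fromLast = TRUE)), ]
#   group  id Val
# 3     A id3  10
# 7     B id2  12
```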