
Identify only non-duplicated rows


I have a dataset with many duplicated rows, and I would like to isolate only the non-duplicated ones. My df looks something like this:

df <- data.frame("group" = c("A", "A", "A","A","A","B","B","B"), 
                    "id" = c("id1", "id2", "id3", "id1", "id2","id1","id2","id1"), 
                    "Val" = c(10,10,10,10,10,12,12,12))

What I would like to extract are only the rows that have no duplicate, i.e. my final dataset should look like this:

final <- data.frame("group" = c("A","B"), 
                 "id" = c("id3","id2"), 
                 "Val" = c(10,12))

Note that I am not interested in finding unique values, but rather non-duplicated ones. I know how to find unique values; for instance, df %>% distinct() does the job. It is identifying the rows that occur exactly once that I am struggling with.
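To make the distinction concrete, here is a minimal base R sketch, assuming the df defined above. The names unique_rows, counts and singles are only illustrative. It shows that unique() keeps one copy of every distinct row, whereas counting each group/id pair reveals the rows that occur exactly once.

```r
df <- data.frame(
  group = c("A", "A", "A", "A", "A", "B", "B", "B"),
  id    = c("id1", "id2", "id3", "id1", "id2", "id1", "id2", "id1"),
  Val   = c(10, 10, 10, 10, 10, 12, 12, 12)
)

# unique() (like dplyr::distinct()) keeps one copy of every distinct row,
# including rows that were repeated in the input.
unique_rows <- unique(df)
print(nrow(unique_rows))

# Counting how often each group/id pair occurs shows which pairs are
# non-duplicated: only those with a count of exactly 1.
counts  <- as.data.frame(table(group = df$group, id = df$id))
singles <- counts[counts$Freq == 1, ]
print(singles)
```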


Solution

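  • Here is one option. It drops every id that occurs more than once within its group (Val is not compared).

    library(dplyr)
    df %>%
      group_by(group) %>%
      # duplicated() marks later occurrences; fromLast = TRUE marks earlier ones
      filter(!(duplicated(id) | duplicated(id, fromLast = TRUE))) %>%
      ungroup()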
    

    Or group by every column and keep only the groups that contain a single row

    df %>%
      group_by_all() %>%
      filter(n() == 1) %>%
      ungroup()
    

    Or, in dplyr 1.0.0 and later, where group_by_all() is superseded by across() (suggested by @Pål Bjartan)

    df %>%
      group_by(across(everything())) %>%
      filter(n() == 1) %>%
      ungroup()
    

    Or using base R, comparing only the first two columns (group and id)

    df[!(duplicated(df[1:2]) | duplicated(df[1:2], fromLast = TRUE)), ]
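The base R one-liner above can also be wrapped in a small reusable function. This is only a sketch: non_duplicated_rows and its cols argument are hypothetical names, not part of dplyr or base R, and by default it compares all columns rather than just the first two.

```r
# Hypothetical helper: keep only the rows whose combination of values in
# `cols` occurs exactly once in `data`.
non_duplicated_rows <- function(data, cols = names(data)) {
  key <- data[, cols, drop = FALSE]
  # duplicated() flags every occurrence after the first; with fromLast = TRUE
  # it flags every occurrence before the last. Their union marks all rows
  # that are repeated at least once.
  repeated <- duplicated(key) | duplicated(key, fromLast = TRUE)
  data[!repeated, , drop = FALSE]
}

df <- data.frame(
  group = c("A", "A", "A", "A", "A", "B", "B", "B"),
  id    = c("id1", "id2", "id3", "id1", "id2", "id1", "id2", "id1"),
  Val   = c(10, 10, 10, 10, 10, 12, 12, 12)
)

result <- non_duplicated_rows(df)
print(result)
```

Passing cols = c("group", "id") restricts the comparison to those two columns, which matches the df[1:2] version for this data.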