Find unique pairs of words ignoring their order in two columns in R


I have a data frame that contains duplicated values in two columns.

    dat <- data.frame(V1 = c("home","cat","fire","sofa","kitchen","sofa"),
                      V2 = c("cat","home","water","TV","knife","TV"),
                      V3 = c("date1","date1","date2","date3","date4","date3"))

       V1    V2    V3
1    home   cat date1
2     cat  home date1
3    fire water date2
4    sofa    TV date3
5 kitchen knife date4
6    sofa    TV date3

I would like to keep only the unique pairs from this data frame, ignoring the order in which a pair appears across the two columns.

This would be the result that I would like to obtain:

       V1    V2    V3
1    home   cat date1
2    fire water date2
3    sofa    TV date3
4 kitchen knife date4

Solution

  • dat[!duplicated(t(apply(dat, 1, sort))),]

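    apply(dat, 1, sort) sorts the values within each row and returns the sorted rows as the columns of a matrix, so t() transposes the result back to one row per original row. duplicated() then flags every row that repeats an earlier one. Because it returns a logical vector, negating it with ! and using it as a row index keeps only the first occurrence of each pair. Note that this sorts all three columns, so V3 takes part in the comparison: a row is dropped only if its date matches as well.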
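
    If the date column should not influence which rows count as duplicates, one possible variant is to build the comparison key from V1 and V2 only. This is a minimal sketch on the example data from the question; the names key and result are just illustrative and not part of the original answer.

```r
# Example data from the question
dat <- data.frame(
  V1 = c("home", "cat", "fire", "sofa", "kitchen", "sofa"),
  V2 = c("cat", "home", "water", "TV", "knife", "TV"),
  V3 = c("date1", "date1", "date2", "date3", "date4", "date3")
)

# Sort each pair within its row, looking only at V1 and V2.
# apply() returns one column per input row, so t() restores
# the one-row-per-pair layout.
key <- t(apply(dat[, c("V1", "V2")], 1, sort))

# duplicated() on a matrix compares whole rows; keep the first
# occurrence of each unordered pair.
result <- dat[!duplicated(key), ]
print(result)
```

    The surviving rows keep their original row names; rownames(result) <- NULL renumbers them from 1 if that matters.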