I have a data frame that contains duplicated values in two columns.
dat<-data.frame(V1 = c("home","cat","fire","sofa","kitchen","sofa"),
V2 = c("cat","home","water","TV","knife","TV"), V3 = c('date1','date1','date2','date3','date4','date3'))
V1 V2 V3
1 home cat date1
2 cat home date1
3 fire water date2
4 sofa TV date3
5 kitchen knife date4
6 sofa TV date3
I would like to obtain from this dataframe unique pairs ignoring the order in which the pair is presented between the two columns.
This would be the result that I would like to obtain:
V1 V2 V3
1 home cat date1
2 fire water date2
3 sofa TV date3
4 kitchen knife date4
# Compare only V1 and V2, so that a differing V3 cannot hide a duplicated pair
dat[!duplicated(t(apply(dat[, c("V1", "V2")], 1, sort))), ]
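apply with sort loops over the rows and sorts the values within each one, so rows that contain the same values in a different order end up identical. apply returns each sorted row as a column, so the result is transposed with t before duplicated checks it row by row. duplicated returns a logical vector that is TRUE for every row that repeats an earlier one, so negating it with ! and subsetting dat keeps only the first occurrence of each pair.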
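If the data frame is large, the row-wise apply can get slow. A possible vectorized alternative (a sketch, not part of the answer above) is to build an order-independent key with pmin and pmax, which return the element-wise smaller and larger value of each pair. The names lo, hi and result are only illustrative, and stringsAsFactors = FALSE is set on the assumption that the columns should be plain character vectors rather than factors.

```r
dat <- data.frame(
  V1 = c("home", "cat", "fire", "sofa", "kitchen", "sofa"),
  V2 = c("cat", "home", "water", "TV", "knife", "TV"),
  V3 = c("date1", "date1", "date2", "date3", "date4", "date3"),
  stringsAsFactors = FALSE
)

# Order-independent key: the smaller and the larger value of each pair
lo <- pmin(dat$V1, dat$V2)
hi <- pmax(dat$V1, dat$V2)

# Keep only the first occurrence of each (lo, hi) key
result <- dat[!duplicated(data.frame(lo, hi)), ]
result
```

The rows that are kept retain their original row names (1, 3, 4, 5); rownames(result) <- NULL renumbers them if needed.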