There are many similar questions but I'd like to compare 2 columns and delete all the duplicates in both columns so that all that is left is the unique observations in each column. Note: Duplicates are not side-by-side. If possible, I would also like a list of the duplicates (not just TRUE/FALSE). Thanks!
C1 C2
1 a z
2 c d
3 f a
4 e c
would become
C1 C2
1 f z
2 e d
with duplicate list
duplicates: a, c
Here is a base R method using duplicated
and lapply
.
temp <- unlist(df)
# get duplicated elements
myDupeVec <- unique(temp[duplicated(temp)])
# get list without duplicates
noDupesList <- lapply(df, function(i) i[!(i %in% myDupeVec)])
noDupesList
$C1
[1] "f" "e"
$C2
[1] "z" "d"
data
df <- read.table(header=T, text=" C1 C2
1 a z
2 c d
3 f a
4 e c ", as.is=TRUE)
Note that this returns a list. This is much more flexible structure, as there is generally a possibility that a level may be repeated more than once in a particular variable. If this is not the case, you can use do.call
and data.frame
to put the result into a rectangular structure.
do.call(data.frame, noDupesList)
C1 C2
1 f z
2 e d