
R Compare non side-by-side duplicates in 2 columns


There are many similar questions, but I'd like to compare 2 columns and delete every value that appears in both, so that only the values unique to each column are left. Note: the duplicates are not side by side (they sit in different rows). If possible, I would also like a list of the duplicated values themselves (not just TRUE/FALSE). Thanks!

        C1 C2
     1  a  z 
     2  c  d
     3  f  a 
     4  e  c 

would become

        C1 C2
     1  f  z
     2  e  d

with duplicate list

    duplicates: a, c 

Solution

  • Here is a base R method using duplicated and lapply (df is defined under "data" below).

    # flatten all columns into a single vector
    temp <- unlist(df)
    # get duplicated elements
    myDupeVec <- unique(temp[duplicated(temp)])
    
    # get list without duplicates
    noDupesList <- lapply(df, function(i) i[!(i %in% myDupeVec)])
    
    noDupesList
    $C1
    [1] "f" "e"
    
    $C2
    [1] "z" "d"
    

    data

    df <- read.table(header=TRUE, text="   C1 C2
         1  a  z 
         2  c  d
         3  f  a 
         4  e  c ", as.is=TRUE)
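
The question also asks for the duplicated values themselves. As a minimal sketch on the same toy data (the names all_values and dupes are illustrative, not from the answer), the vector can be printed directly. Note that duplicated on the flattened data also flags a value repeated within one column, whereas intersect only reports values shared between the two columns; on this data both give the same result.

```r
# Same toy data as in the question, built without read.table
df <- data.frame(C1 = c("a", "c", "f", "e"),
                 C2 = c("z", "d", "a", "c"),
                 stringsAsFactors = FALSE)

# Values that occur more than once anywhere in the data frame
all_values <- unlist(df, use.names = FALSE)
dupes <- unique(all_values[duplicated(all_values)])
dupes
# [1] "a" "c"

# Values present in both columns (ignores repeats inside one column)
shared <- intersect(df$C1, df$C2)
shared
# [1] "a" "c"

# Print in the "duplicates: a, c" form used in the question
cat("duplicates:", toString(dupes), "\n")
```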
    

    Note that this returns a list rather than a data frame. A list is the more flexible structure here, because a value may occur more than once within a single column, in which case the columns no longer have the same length once the duplicates are removed. If that does not happen, you can use do.call and data.frame to put the result back into a rectangular structure.

    do.call(data.frame, noDupesList)
      C1 C2
    1  f  z
    2  e  d
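
If the columns do end up with different lengths, data.frame will refuse to combine them. One possible workaround, sketched here on hypothetical data (df2, kept and padded are made-up names, and padding with NA is an assumption about what is wanted), is to pad the shorter elements before combining:

```r
# Hypothetical data: "a" occurs twice in C1 and once in C2
df2 <- data.frame(C1 = c("a", "a", "f", "e"),
                  C2 = c("z", "d", "a", "g"),
                  stringsAsFactors = FALSE)

# Same approach as above: find duplicated values, then drop them per column
vals <- unlist(df2, use.names = FALSE)
dupes2 <- unique(vals[duplicated(vals)])
kept <- lapply(df2, function(col) col[!(col %in% dupes2)])

lengths(kept)
# C1 C2 
#  2  3 

# Pad every element with NA up to the length of the longest one
n <- max(lengths(kept))
padded <- as.data.frame(
  lapply(kept, function(col) c(col, rep(NA_character_, n - length(col)))),
  stringsAsFactors = FALSE
)
padded
```

Whether NA padding is appropriate depends on what happens to the result next; keeping the list avoids introducing artificial missing values.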