
How can I remove duplicate rows from a data frame when two columns hold the same pair of values in reversed order?


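My data frame has two columns, Item1 and Item2. Some rows contain the same pair of values as another row, but with the values swapped between the two columns (for example, A/B in one row and B/A in another). Which order appears in a given row is random.

I want to find and remove the rows that repeat the same Item1/Item2 values as an earlier row.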


What should I do?


Solution

  • Using base R

    
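    df <- structure(list(Item1 = c("A", "C", "B", "D", "E", "F"), 
                         Item2 = c("B", "D", "A", "C", "F", "E"), 
                         Result = c(0.5, 0.1, 0.5, 0.1, 0.7, 0.6)),
                    class = "data.frame", row.names = c(NA, -6L))
    
    # For each row, sort its values and paste them into one key, so that
    # A/B and B/A produce the same key. apply() converts the whole data frame
    # to a character matrix, so Result is part of the key too: rows 5 and 6
    # (E/F and F/E) are both kept because their Result values differ.
    # The "|" separator avoids collisions such as "AB" + "C" vs "A" + "BC".
    fltr <- !duplicated(apply(df, 1, function(x) paste(sort(x), collapse = "|")))
    
    # Keep only the first row for each key
    df[fltr, ]
    #>   Item1 Item2 Result
    #> 1     A     B    0.5
    #> 2     C     D    0.1
    #> 5     E     F    0.7
    #> 6     F     E    0.6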
    

    Created on 2021-01-15 by the reprex package (v0.3.0)
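A vectorized alternative, offered as a sketch rather than part of the original answer: build an order-insensitive key with pmin() and pmax() on Item1 and Item2, then call duplicated() on that key. Unlike the apply() version, this does not coerce the whole data frame to character, and it makes explicit which columns belong to the key. The names lo, hi, pairs_only and pairs_and_result are illustrative only.

```r
df <- structure(list(Item1 = c("A", "C", "B", "D", "E", "F"),
                     Item2 = c("B", "D", "A", "C", "F", "E"),
                     Result = c(0.5, 0.1, 0.5, 0.1, 0.7, 0.6)),
                class = "data.frame", row.names = c(NA, -6L))

# Order-insensitive key: the smaller value always goes in `lo`,
# so "A"/"B" and "B"/"A" both become lo = "A", hi = "B".
lo <- pmin(df$Item1, df$Item2)
hi <- pmax(df$Item1, df$Item2)

# Variant 1: only the Item1/Item2 pair decides what counts as a duplicate.
pairs_only <- df[!duplicated(data.frame(lo, hi)), ]

# Variant 2: Result is also part of the key, matching the apply() answer.
pairs_and_result <- df[!duplicated(data.frame(lo, hi, df$Result)), ]

pairs_only
pairs_and_result
```

With this data, pairs_only keeps rows 1, 2 and 5 (row 6, F/E, is dropped as a reversal of E/F), while pairs_and_result keeps rows 1, 2, 5 and 6, the same rows as the apply() answer. Which variant fits depends on whether rows with the same pair but a different Result should count as duplicates.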