Search code examples
rduplicatesunique

R - Remove rows with same values AND rows with same combinations of values


I have some data that looks like this:

Row  From   To      Value
1    MA1007 MA1007  2
2    MA1007 MA2801  1
3    MA2801 MA1017  1
4    MA1051 MA2066  2
5    MA1051 MA2059  1
6    MA2066 MA1051  2
7    MA2066 MA2066  1

I want to remove all rows where the values are the same in the From and To columns. In the example above, that means rows 1 and 7 would be removed. However, I also want to remove any rows where the combination of values in the From and To column appear more than once (retaining just one instance of that row). For example, row 2 or 3 would be removed (they both contain the same combination of "MA" codes albeit in different columns). Either row 4 or 6 would also be removed. In case it makes any difference, the Value column will also have the same value for a specific combination. E.g. rows 2 and 3 have the same value, as do rows 4 and 6.

Here's how the final data should look:

Row  From   To      Value
2    MA1007 MA2801  1
4    MA1051 MA2066  2
5    MA1051 MA2059  1

Any thoughts on how I should approach this? Thanks :)


Solution

  • We can use subset to create an expression where the 'From' values are not equal (!=) to 'To' and (&) where it is not (!) duplicated for 'From', 'To', after sorting by row with pmin/pmax

    subset(df1, From != To & !duplicated(cbind(pmin(From, To), pmax(From, To))))
    

    -output

     Row   From     To Value
    2   2 MA1017 MA2801     1
    4   4 MA1051 MA2066     2
    5   5 MA1051 MA2059     1
    

    data

    df1 <- structure(list(Row = 1:7, From = c("MA1007", "MA1017", "MA2801", 
    "MA1051", "MA1051", "MA2066", "MA2066"), To = c("MA1007", "MA2801", 
    "MA1017", "MA2066", "MA2059", "MA1051", "MA2066"), Value = c(2L, 
    1L, 1L, 2L, 1L, 2L, 1L)), class = "data.frame", row.names = c(NA, 
    -7L))