I have some data that looks like this:
Row From To Value
1 MA1007 MA1007 2
2 MA1007 MA2801 1
3 MA2801 MA1017 1
4 MA1051 MA2066 2
5 MA1051 MA2059 1
6 MA2066 MA1051 2
7 MA2066 MA2066 1
I want to remove all rows where the values are the same in the From
and To
columns. In the example above, that means rows 1 and 7 would be removed. However, I also want to remove any rows where the combination of values in the From
and To
column appear more than once (retaining just one instance of that row). For example, row 2 or 3 would be removed (they both contain the same combination of "MA" codes albeit in different columns). Either row 4 or 6 would also be removed. In case it makes any difference, the Value
column will also have the same value for a specific combination. E.g. rows 2 and 3 have the same value, as do rows 4 and 6.
Here's how the final data should look:
Row From To Value
2 MA1007 MA2801 1
4 MA1051 MA2066 2
5 MA1051 MA2059 1
Any thoughts on how I should approach this? Thanks :)
We can use subset
to create an expression where the 'From' values are not equal (!=
) to 'To' and (&
) where it is not (!
) duplicated
for 'From', 'To', after sort
ing by row with pmin/pmax
subset(df1, From != To & !duplicated(cbind(pmin(From, To), pmax(From, To))))
-output
Row From To Value
2 2 MA1017 MA2801 1
4 4 MA1051 MA2066 2
5 5 MA1051 MA2059 1
df1 <- structure(list(Row = 1:7, From = c("MA1007", "MA1017", "MA2801",
"MA1051", "MA1051", "MA2066", "MA2066"), To = c("MA1007", "MA2801",
"MA1017", "MA2066", "MA2059", "MA1051", "MA2066"), Value = c(2L,
1L, 1L, 2L, 1L, 2L, 1L)), class = "data.frame", row.names = c(NA,
-7L))