Selecting unique rows of a dataframe when the duplicate values are swapped across multiple columns in R?

I am trying to use R to filter a dataframe based on two columns to select rows that are agnostic to which column of those two columns a value occurs in.

The input looks like this, with duplicates bolded. Note that the ID column does NOT contribute to uniqueness:

ID	Col_1	Col_2
1	A	A
1	A	B
1	A	C
1	B	A
1	B	B
1	B	C
2	C	B
2	C	C
2	C	D

I tried to filter by removing rows based on values of Col_2 that don't show up in Col_1 or vice versa, but that removes entirely too much. What I'd like the output to be is something like:

ID	Col_1	Col_2
1	A	A
1	A	B
1	A	C
1	B	B
1	B	C
2	C	C
2	C	D

I can't swap values between Col_1 and Col_2, because in the cases where there is no "duplication" it is relevant which column a given value appears in. It also doesn't matter whether, e.g., A B is selected over B A, just that only one is retained.

Solution

One way to solve your problem:

library(dplyr)

df |> 
  distinct(C1 = pmin(Col_1, Col_2), C2 = pmax(Col_1, Col_2), .keep_all=T) |> 
  select(-c(C1, C2))

  ID Col_1 Col_2
1  1     A     A
2  1     A     B
3  1     A     C
4  1     B     B
5  1     B     C
6  2     C     C
7  2     C     D

Or using built-in functions:

with(df, df[!duplicated(cbind(pmin(Col_1, Col_2), pmax(Col_1, Col_2))),])