Search code examples
rdataframematrixmergeintersection

Match rows of a matrix/dataframe to the rows of another (strings), regardless of order?


I am looking for a solution to a problem similar to this one: Match rows of a matrix to the rows of another, regardless of the column order?

But the difference is that I am using strings and not numerals:

df1 <- data.frame (name_1  = c("A", "B", "C"),
               name_2 = c("C", "D", "B"))
df2 <- data.frame (name_1  = c("C", "D", "C"),
                   name_2 = c("A", "B", "D"))

but, in my real data, it is a full string consisting of numbers and letters:

df1 <- data.frame (names_1  = c("AB1C", "BR2G"),
                   names_2 = c("CFG", "DOR5F"))
df2 <- data.frame (names_1  = c("CFG", "DOR5F"),
                   names_2 = c("AB1C", "BR2G"))

such that:

>function(df1,df2, by = c("names_1", "names_2")) # I was trying merge

would either return a data.table:

  Genes_1 Genes_2
1       C       A
2       D       B

or would return indexes from one that match to the matching rows in another:

1 2

I think I could probably just create a transposed duplicate of one of the 2 tables and run merge two separate times:

df3 <- df2
names(df3) <- c("name_2", "name_1")
merge(df1, df2, by = c("name_1", "name_2") )
#no result
merge(df1, df3, by = c("name_1", "name_2") )
   name_1 name_2
1      A      C
2      B      D

Solution

  • Seems you are interested in merging without considering the order. If there are more than 2 variables, consider sorting rowwise. If only two variables as in your case, consider rbinding the second dataframe to a reversed version of itself before merging:

    merge(df1, rbind(df2, setNames(rev(df2),names(df2))))
      name_1 name_2
    1      A      C
    2      B      D