I have two df as below df1:
M1 |
-------+
a,b,c |
a |
b,c |
c,b,a |
b,a,d |
d,a,b,c|
a,d,c |
b |
c,d |
d,a |
df2:
X1 |X2
--------+---
a |1
b |2
c |3
d |4
a,b |5
a,c |6
a,d |7
b,c |8
b,d |9
c,d |10
a,b,c |11
a,c,d |12
a,b,d |13
b,c,d |14
a,b,c,d |15
can someone help me to match values in df1$M1 and df2$X1. and put the corresponding X2 value in column M2 as below
df1:
M1 |M2
--------+---
a,b,c |11
a |1
b,c |8
c,b,a |11
b,a,d |13
d,a,b,c |15
a,d,c |12
b |2
c,d |10
d,a |7
Can someone help me
X1
and M1
have to be stored as Characters. You can check with str(df1)
, and re-assign if necessary df1 <- as.character(df1$X1)
, and the same for df2
Then, create new columns with the values in alphabetical order:
df1$Ordered <- sapply(lapply(strsplit(df1$X1, ","), sort),paste,collapse=",")
df2$Ordered <- sapply(lapply(strsplit(df2$M1, ","), sort),paste,collapse=",")
Then perform a join like so:
merge(df1, df2, by="Ordered")
If you want to include all the values in df1
regardless of whether they have a matching value in df2
, add the all.x = TRUE
argument. Same logic applies adding all = TRUE
(include everything from both data frames), or all.y = TRUE
for df2
.