I did a row by row comparison of colB of the 2 dataframes (both dataframes have the same ordered columns) I found out they are 98% identical and the only difference between 2 columns was some pairs of the values were switched like so:
df1$colB df2$colB
A B
B A
1 1
C D
D C
I took subset of when df1$colB is not equal to df2$colB
subset(df1, df1$colB != df2$colB)
Is there a way to use the row occurrence from the subset above to rearrange the entire rows of df1 so that all values from df1$colB and df2$colB now match?
I don't think subset
is the right function for this. It expects a logical vector as the second argument, (confusingly) also named "subset", and so has no ordering capability. Instead use match
and [
:
df1[ match(df1$colB, df2$colB), ]
X1.5 colB
2 2 B
1 1 A
3 3 1
5 5 D
4 4 C
But after testing with a slightly more complex version of df2 that had NA's and other values in colB, I've decided that @Ananta's suggestion of merge
is the safer way to go (but with df2 as the first argument to get the desired ordering):
> merge( df2,df1)[ names(df1) ]
X1.5 colB
1 3 1
2 1 A
3 2 B
4 4 C
5 5 D