Search code examples
rcomparisonrowcol

R - How to rearrange entire rows using subset of data


I did a row by row comparison of colB of the 2 dataframes (both dataframes have the same ordered columns) I found out they are 98% identical and the only difference between 2 columns was some pairs of the values were switched like so:

df1$colB   df2$colB
   A          B
   B          A
   1          1 
   C          D 
   D          C

I took subset of when df1$colB is not equal to df2$colB

subset(df1, df1$colB != df2$colB)

Is there a way to use the row occurrence from the subset above to rearrange the entire rows of df1 so that all values from df1$colB and df2$colB now match?


Solution

  • I don't think subset is the right function for this. It expects a logical vector as the second argument, (confusingly) also named "subset", and so has no ordering capability. Instead use match and [:

     df1[ match(df1$colB,  df2$colB), ]
      X1.5 colB
    2    2    B
    1    1    A
    3    3    1
    5    5    D
    4    4    C
    

    But after testing with a slightly more complex version of df2 that had NA's and other values in colB, I've decided that @Ananta's suggestion of merge is the safer way to go (but with df2 as the first argument to get the desired ordering):

    > merge( df2,df1)[ names(df1) ]
      X1.5 colB
    1    3    1
    2    1    A
    3    2    B
    4    4    C
    5    5    D