How to merge two data frames on common columns in R with sum of others?


R Version 2.11.1 32-bit on Windows 7

I have two data sets, data_A and data_B:

data_A

USER_A USER_B ACTION
1      11     0.3
1      13     0.25
1      16     0.63
1      17     0.26
2      11     0.14
2      14     0.28

data_B

USER_A USER_B ACTION
1      13     0.17
1      14     0.27
2      11     0.25

Now I want to add the ACTION values from data_B to those in data_A wherever both USER_A and USER_B match. For the example above, the result would be:

data_A

USER_A USER_B ACTION
1      11     0.3
1      13     0.25+0.17
1      16     0.63
1      17     0.26
2      11     0.14+0.25
2      14     0.28

How can I achieve this?


Solution

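  • You can use merge to join the two data frames on the key columns, then ddply in package plyr to sum the ACTION values:

    library(plyr)
    # Join on the key columns only; by default merge would also match on ACTION
    merged <- merge(data_A, data_B, by=c("USER_A", "USER_B"), all.x=TRUE)
    # ACTION.x comes from data_A, ACTION.y from data_B (NA where nothing matched)
    ddply(merged, .(USER_A, USER_B), summarise,
      ACTION=sum(ACTION.x, ACTION.y, na.rm=TRUE))
    

    Notice that merge is called with by=c("USER_A", "USER_B"): without it, merge joins on every shared column, ACTION included, so no rows from data_B would match and nothing would be summed. The parameter all.x=TRUE returns all of the rows in the first data.frame passed to merge, i.e. data_A, and na.rm=TRUE treats a missing data_B value as zero. This assumes each USER_A/USER_B pair appears at most once in each data set, as in the example:

      USER_A USER_B ACTION
    1      1     11   0.30
    2      1     13   0.42
    3      1     16   0.63
    4      1     17   0.26
    5      2     11   0.39
    6      2     14   0.28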
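
For anyone who wants to check the result against the question's data without installing plyr, here is a minimal base R sketch. It rebuilds both data frames from the tables in the question, joins them on USER_A and USER_B only, and adds the matched values. The names merged and result are illustrative and not part of the original post, and the sketch assumes each USER_A/USER_B pair occurs at most once per data set.

```r
# Rebuild the example data from the question.
data_A <- data.frame(
  USER_A = c(1, 1, 1, 1, 2, 2),
  USER_B = c(11, 13, 16, 17, 11, 14),
  ACTION = c(0.3, 0.25, 0.63, 0.26, 0.14, 0.28)
)
data_B <- data.frame(
  USER_A = c(1, 1, 2),
  USER_B = c(13, 14, 11),
  ACTION = c(0.17, 0.27, 0.25)
)

# Left join on the two key columns only, so ACTION is not used as a key.
# The ACTION columns come back as ACTION.x (data_A) and ACTION.y (data_B).
merged <- merge(data_A, data_B, by = c("USER_A", "USER_B"), all.x = TRUE)

# Rows of data_A with no partner in data_B get NA; count those as zero.
merged$ACTION.y[is.na(merged$ACTION.y)] <- 0

# Add the two columns and keep the original three-column layout.
merged$ACTION <- merged$ACTION.x + merged$ACTION.y
result <- merged[, c("USER_A", "USER_B", "ACTION")]

# (1, 13) becomes 0.25 + 0.17 and (2, 11) becomes 0.14 + 0.25;
# the (1, 14) row of data_B is dropped because data_A has no such pair.
print(result)
```

If rows that exist only in data_B should also be kept, all = TRUE in place of all.x = TRUE would turn this into a full outer join, with ACTION.x needing the same NA handling.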