Search code examples
rdataframedplyrsubstringmapply

Multiply across dataframes by column values


I am looking to multiply data across two different dataframes. The input is dynamic so the number of IDs per team is not constant. I've included a sample set of data below.

Branch  ID-1  Time-1  ID-2  Time-2  ID-3  Time-3
Texas   BKP   5.5     LMG   2.8     DDP   8.9
Maine   BQQ   11      BKP   8.1     OLW   3.0
NYork   DDP   2.0     ADD   6.5     BQQ   0.4

There are hundreds of branches that could utilize thousands of IDs, so the below is just a small subset I've created. We also receive a cost dataframe for all IDs similar to below:

ID     Cost
ADD    4.50
BKP    11.99
BQQ    1.50
DDP    8.99
LMG    24.99
OLW    29.99

I am trying to find the cost by ID per branch and I'm able to go by Substrings, but joining the second dataframe is where I'm running into trouble. The output that I need would look like this:

Branch  ID-1  Time-1  ID-2  Time-2  ID-3  Time-3  Cost-1  Cost-2  Cost-3
Texas   BKP   5.5     LMG   2.8     DDP   8.9     65.945  69.972  80.011
Maine   BQQ   11      BKP   8.1     OLW   3.0     16.50   97.119  89.97
NYork   DDP   2.0     ADD   6.5     BQQ   0.4     17.98   29.25   0.6

I know this isn't the prettiest output, but unfortunately it's the output needed. I'm very grateful for any help you are able to provide with this.


Solution

  • A possible solution with :

    library(data.table)
    
    melt(setDT(df1), id = 1,
         measure.vars = patterns(c("^ID","^Time")),
         value.name = c("ID","Time")
         )[df2, on = .(ID), Cost := Time * i.Cost
           ][, dcast(.SD, Branch ~ variable, value.var = c("ID","Time","Cost"))]
    

    which gives:

       Branch ID_1 ID_2 ID_3 Time_1 Time_2 Time_3 Cost_1 Cost_2 Cost_3
    1:  Maine  BQQ  BKP  OLW   11.0    8.1    3.0 16.500 97.119 89.970
    2:  NYork  DDP  ADD  BQQ    2.0    6.5    0.4 17.980 29.250  0.600
    3:  Texas  BKP  LMG  DDP    5.5    2.8    8.9 65.945 69.972 80.011
    

    Used data:

    df1 <- structure(list(Branch = c("Texas", "Maine", "NYork"), ID.1 = c("BKP", "BQQ", "DDP"), Time.1 = c(5.5, 11, 2), 
                          ID.2 = c("LMG", "BKP", "ADD"), Time.2 = c(2.8, 8.1, 6.5), ID.3 = c("DDP", "OLW", "BQQ"), Time.3 = c(8.9, 3, 0.4)), 
                     .Names = c("Branch", "ID.1", "Time.1", "ID.2", "Time.2", "ID.3", "Time.3"), class = "data.frame", row.names = c(NA, -3L))
    df2 <- structure(list(ID = c("ADD", "BKP", "BQQ", "DDP", "LMG", "OLW"), Cost = c(4.5, 11.99, 1.5, 8.99, 24.99, 29.99)),
                     .Names = c("ID", "Cost"), class = "data.frame", row.names = c(NA, -6L))