Search code examples
pythonpandasnumpydata-sciencedata-analysis

How can I add a new column to a dataframe (df1) that is the sum of multiple lookup values from df1 in another dataframe (df2)


Say I have 2 dataframes:

df1

     id       guid               name      item1        item2        item3        item4        item5         item6       item7        item8       item9
0  3031958124  85558-261955282  Alonso  85558-57439  85558-54608  85558-91361  85558-40647  85558-41305  85558-79979  85558-33076  85558-89956  85558-12554
1  3031958127  85558-261955282  Jeff    85558-57439  85558-39280  85558-91361  85558-55987  85558-83083  85558-79979  85558-33076  85558-41872  85558-12554
2  3031958129  85558-261955282  Mike    85558-57439  85558-39280  85558-91361  85558-55987  85558-40647  85558-79979  85558-33076  85558-88297  85558-12534
...

df2 where item_lookup is the index

             item_type   cost  value  target 
item_lookup
85558-57439  item1       9500   25.1   1.9
85558-54608  item2       8000   18.7   0.0 
85558-91361  item3       7000   16.5   0.9
...

I want to add the sum of cost, value, and target for each item1 through item9 using item_lookup (df2) and store that as a column on df1.

So the result should look like: df1

     id       guid               name      item1        item2        item3        item4        item5         item6       item7        item8       item9       cost   value  target
0  3031958124  85558-261955282  Alonso  85558-57439  85558-54608  85558-91361  85558-40647  85558-41305  85558-79979  85558-33076  85558-89956  85558-12554  58000   192.5   38.3
1  3031958127  85558-261955282  Jeff    85558-57439  85558-39280  85558-91361  85558-55987  85558-83083  85558-79979  85558-33076  85558-41872  85558-12554  59400   183.2   87.7
2  3031958129  85558-261955282  Mike    85558-57439  85558-39280  85558-91361  85558-55987  85558-40647  85558-79979  85558-33076  85558-88297  85558-12534  58000   101.5   18.1
...

I've tried following similar solutions online that use .map, however these examples are only for single columns whereas I am trying to sum values for 9 columns.


Solution

  • You can do this by using df.apply, basically looping through the rows and then looping through the items in the row and calculating the sum


    Since i couldn't use your dfs because they are incomplete, i made mine.

    given df1:

      item1 item2 item3
    0     b     e     j
    1     d     a     d
    2     j     b     a
    3     c     j     f
    4     e     f     c
    5     a     d     b
    6     f     c     e
    

    and df2

                 cost  value  target
    item_lookup                     
    a              19     20      12
    b              16     14      14
    c              20     18      18
    d              17     12      14
    e              20     15      17
    f              19     20      12
    j              11     17      12
    

    you can use the following function to get what you need

    def add_items(row):
         row["cost"] = row["target"] = row["value"] = 0
         # get the columns that have item in the name
         cols = [col for col in df1.columns if "item" in col]
         # get each of the columns look it up in df2 and add it to our new cols
         for col in cols:
             item_lookup = row[col]
             lookup_result = df2.loc[item_lookup]
             row["cost"] += lookup_result["cost"]
             row["target"] += lookup_result["target"]
             row["value"] += lookup_result["value"]
         return row
    

    and then apply it

    >>> df1.apply(add_items, axis=1)
      item1 item2 item3  cost  target  value
    0     b     e     j    47      43     46
    1     d     a     d    53      40     44
    2     j     b     a    46      38     51
    3     c     j     f    50      42     55
    4     e     f     c    59      47     53
    5     a     d     b    52      40     46
    6     f     c     e    59      47     53