Search code examples
pythonstringdataframeoperation

Subtract 2 dataframes into a third one but keep str columns


I have 2 different dataframes with a mix of String & Numerical columns. I want to subtract the numeric columns of these two dataframes into a new one but keep string columns.

df1 = pd.DataFrame({'col1': ['A', 'B', 'C'],
                'col2': ['black berry','green apple','red wine'],
               'col3':['black', 'green','red' ],
               'data1':[2,2,2],
                'data2':[2,2,2],
                'data3':[5,5,5]
               })
df2 = pd.DataFrame({'col1': ['A', 'B', 'C'],
                'col2': ['black berry','green apple','red wine'],
               'col3':['black', 'green','red' ],
               'data1':[1,2,2],
                'data2':[2,1,3],
                'data3':[3,2,1]
               })

df1

col1         col2          col3     data1   data2  data3
 A        black berry      black      2       2     5
 B        green apple      green      2       2     5
 C        red wine          red       2       2     5

df2

col1         col2          col3      data1   data2  data3
 A        black berry      black      1       2     3
 B        green apple      green      2       1     2
 C        red wine          red       2       3     1

I expected df3 = df2-df1

df3

col1         col2          col3     data1   data2  data3
 A        black berry      black      1       0     2
 B        green apple      green      0       1     3
 C        red wine          red       0       -1    4

I tried df3=df2-df1 but I get this error:

unsupported operand type(s) for -: 'str' and 'str' because it's not possible to substract characters columns.

So how to do this?


Solution

  • I think this is all you need

    df3 = df1.assign(
        data1=df1["data1"] - df2["data1"],
        data2=df1["data2"] - df2["data2"],
        data3=df1["data3"] - df2["data3"],
    )
    

    With this solution you just specify which columns you want to do the subtraction on.