Search code examples
pythonarrayspandasdataframe

Create Pivot table and add additional columns from another dataframe


Given two identically formatted dataframes:

df1

Counterparty  Product  Deal  Date          Value
foo           bar      Buy    01/01/24     10.00
foo           bar      Buy    01/01/24     10.00
foo           bar      Sell   01/01/24     10.00
foo           bar      Sell   01/01/24     10.00
fizz          bar      Buy    01/01/24     10.00
fizz          bar      Buy    01/01/24     10.00
fizz          buzz     Sell   01/01/24     10.00
fizz          buzz     Sell   01/01/24     10.00

df2

Counterparty  Product  Deal  Date          Value
foo           bar      Buy    01/01/24     11.00
foo           bar      Buy    01/01/24     09.00
foo           bar      Sell   01/01/24     09.00
foo           bar      Sell   01/01/24     10.00
fizz          bar      Buy    01/01/24     12.00
fizz          bar      Buy    01/01/24     08.00
fizz          buzz     Sell   01/01/24     09.00
fizz          buzz     Sell   01/01/24     10.00

I have done this so far:

out = pd.pivot_table(df1, values = 'Value', index='Counterparty', columns = 'Product', aggfunc='sum').reset_index().rename_axis(None, axis=1)
out = out.fillna(0)

 Counterparty   bar  buzz
0         fizz  20.0  20.0
1          foo  40.0   0.0 

buy How can I pivot these to create a visual like this:

Counterparty  Bar  Buzz  Total  col1 col2
foo           40    0      40    39    1
fizz          20    20     40    39    1

where col1 is coming from df2 and col2 is the difference between the Total and col1

sample:

df1 = pd.DataFrame({
    "Counterparty": ["foo", "foo", "foo", "foo", "fizz", "fizz", "fizz", "fizz"],
    "Product": ["bar", "bar", "bar", "bar", "bar", "bar", "buzz", "buzz"],
    "Deal": ["Buy","Buy", "Sell", "Sell", "Buy", "Buy", "Sell", "Sell"],
    "Date": ["01/01/24", "01/01/24", "01/01/24", "01/01/24", "01/01/24", "01/01/24", "01/01/24", "01/01/24"],
    "Value": [10, 10, 10, 10, 10, 10, 10, 10]
    })

df2 = pd.DataFrame({
    "Counterparty": ["foo", "foo", "foo", "foo", "fizz", "fizz", "fizz", "fizz"],
    "Product": ["bar", "bar", "bar", "bar", "bar", "bar", "buzz", "buzz"],
    "Deal": ["Buy","Buy", "Sell", "Sell", "Buy", "Buy", "Sell", "Sell"],
    "Date": ["01/01/24", "01/01/24", "01/01/24", "01/01/24", "01/01/24", "01/01/24", "01/01/24", "01/01/24"],
    "Value": [11, 9, 9, 10, 12, 8, 9, 10]
    })

out = pd.pivot_table(df1, values = 'Value', index='Counterparty', columns = 'Product', aggfunc='sum').reset_index().rename_axis(None, axis=1)
out = out.fillna(0)

Solution

  • Total column can be generated by summing all present columns except the first. Must be done first prior to adding other columns.

    out['Total'] = out[out.columns[1:]].sum(axis = 1)
    

    col1 column is done with a groupby, merge on Counterparty and then rename the column:

    out = out.merge(df2.groupby('Counterparty')[['Value']].sum(), on = 'Counterparty')
    out = out.rename(columns = {'Value': 'col1'})
    

    col2 is trivial:

    out['col2'] = out['Total'] - out['col1']