Given two identically formatted dataframes:
df1
Counterparty Product Deal Date Value
foo bar Buy 01/01/24 10.00
foo bar Buy 01/01/24 10.00
foo bar Sell 01/01/24 10.00
foo bar Sell 01/01/24 10.00
fizz bar Buy 01/01/24 10.00
fizz bar Buy 01/01/24 10.00
fizz buzz Sell 01/01/24 10.00
fizz buzz Sell 01/01/24 10.00
df2
Counterparty Product Deal Date Value
foo bar Buy 01/01/24 11.00
foo bar Buy 01/01/24 09.00
foo bar Sell 01/01/24 09.00
foo bar Sell 01/01/24 10.00
fizz bar Buy 01/01/24 12.00
fizz bar Buy 01/01/24 08.00
fizz buzz Sell 01/01/24 09.00
fizz buzz Sell 01/01/24 10.00
I have done this so far:
out = pd.pivot_table(df1, values = 'Value', index='Counterparty', columns = 'Product', aggfunc='sum').reset_index().rename_axis(None, axis=1)
out = out.fillna(0)
Counterparty bar buzz
0 fizz 20.0 20.0
1 foo 40.0 0.0
buy How can I pivot these to create a visual like this:
Counterparty Bar Buzz Total col1 col2
foo 40 0 40 39 1
fizz 20 20 40 39 1
where col1
is coming from df2
and col2
is the difference between the Total
and col1
sample:
df1 = pd.DataFrame({
"Counterparty": ["foo", "foo", "foo", "foo", "fizz", "fizz", "fizz", "fizz"],
"Product": ["bar", "bar", "bar", "bar", "bar", "bar", "buzz", "buzz"],
"Deal": ["Buy","Buy", "Sell", "Sell", "Buy", "Buy", "Sell", "Sell"],
"Date": ["01/01/24", "01/01/24", "01/01/24", "01/01/24", "01/01/24", "01/01/24", "01/01/24", "01/01/24"],
"Value": [10, 10, 10, 10, 10, 10, 10, 10]
})
df2 = pd.DataFrame({
"Counterparty": ["foo", "foo", "foo", "foo", "fizz", "fizz", "fizz", "fizz"],
"Product": ["bar", "bar", "bar", "bar", "bar", "bar", "buzz", "buzz"],
"Deal": ["Buy","Buy", "Sell", "Sell", "Buy", "Buy", "Sell", "Sell"],
"Date": ["01/01/24", "01/01/24", "01/01/24", "01/01/24", "01/01/24", "01/01/24", "01/01/24", "01/01/24"],
"Value": [11, 9, 9, 10, 12, 8, 9, 10]
})
out = pd.pivot_table(df1, values = 'Value', index='Counterparty', columns = 'Product', aggfunc='sum').reset_index().rename_axis(None, axis=1)
out = out.fillna(0)
Total
column can be generated by summing all present columns except the first. Must be done first prior to adding other columns.
out['Total'] = out[out.columns[1:]].sum(axis = 1)
col1
column is done with a groupby, merge on Counterparty
and then rename the column:
out = out.merge(df2.groupby('Counterparty')[['Value']].sum(), on = 'Counterparty')
out = out.rename(columns = {'Value': 'col1'})
col2
is trivial:
out['col2'] = out['Total'] - out['col1']