All,
I have the following long format dataframe:
df = pd.DataFrame({'date': ["2020-01-01","2020-01-01","2020-01-02","2020-01-02","2020-01-01","2020-01-01","2020-01-02","2020-01-02"], 'asset': ["x", "x","x", "x","y","y","y","y"], 'type': ["price", "spread","price","spread","price", "spread","price","spread"], 'value': ["1.5", "0.01","1.6", "0.01","1","0.08","1.2","0.09"]})
which looks like this:
date asset type value
0 2020-01-01 x price 1.5
1 2020-01-01 x spread 0.01
2 2020-01-02 x price 1.6
3 2020-01-02 x spread 0.01
4 2020-01-01 y price 1
5 2020-01-01 y spread 0.08
6 2020-01-02 y price 1.2
7 2020-01-02 y spread 0.09
I want to subtract the price of y
from the price of x
and maintain the same datastructure, result should should look like this:
date asset type value
0 2020-01-01 x price 1.5
1 2020-01-01 x spread 0.01
2 2020-01-02 x price 1.6
3 2020-01-02 x spread 0.01
4 2020-01-01 y price 1
5 2020-01-01 y spread 0.08
6 2020-01-02 y price 1.2
7 2020-01-02 y spread 0.09
8 2020-01-01 x_min_y pricediff 0.5
9 2020-01-02 x_min_y pricediff 0.4
I would like to use the assign()
function of pandas to create this but I am not sure how to do this.
Thanks in advance!
Use:
m = df['type'].eq('price') & df['asset'].isin(['x', 'y'])
d = df[m].pivot('date', 'asset', 'value').astype(float)
d = pd.concat(
[df, d['x'].sub(d['y']).reset_index(name='value').assign(
asset='x_min_y', type='pricediff')],
ignore_index=True)
Details:
Create a boolean mask m
to filter the rows where type
is price
and asset
is in x, y
and use DataFrame.pivot
to reshape the dataframe:
print(d) # pivoted dataframe
asset x y
date
2020-01-01 1.5 1.0
2020-01-02 1.6 1.2
Use Series.sub
to subtract column x
from y
in the pivoted dataframe and assign the columns asset
and type
, then use pd.concat
to concat this pivoted dataframe with the original dataframe df
.
print(d)
date asset type value
0 2020-01-01 x price 1.5
1 2020-01-01 x spread 0.01
2 2020-01-02 x price 1.6
3 2020-01-02 x spread 0.01
4 2020-01-01 y price 1
5 2020-01-01 y spread 0.08
6 2020-01-02 y price 1.2
7 2020-01-02 y spread 0.09
8 2020-01-01 x_min_y pricediff 0.5
9 2020-01-02 x_min_y pricediff 0.4