Apologies in advance as technically i have a few questions regarding the problem i would like to solve with python but since they are related i am putting it all in one post (at least i am hoping it will be a worthy challenge for whoever is able to help me with this).
I have the following pandas dataframe called df
example:
REF Period Product Price Type QTY
T001 Jan-20 EQ 69.87 Sell -10
T001 Feb-21 EQ 69.77 Buy 10
T002 Apr-20 BN 10.77 Buy 15
T003 Jul-20 EQ 71.25 Sell -20
T003 Aug-20 EQ 70.89 Buy 40
T003 Sep-20 EQ 70.69 Sell -20
T004 Nov-20 BN 20.83 Buy 10
T004 Dec-20 EQ 40.01 Sell 12
T005 Sep-20 FD 31.25 Buy -20
T005 Mar-21 FD 36.89 Sell 40
T005 Sep-21 FD 40.69 Buy -20
As you can see the column REF
refers to the trade reference of a portfolio that i would like to analyse.
I have been struggling with finding a solution for the following data analysis problems:
I would like python to go over each row of the trade reference column and:
Add a column TRADE_TYPE
and DIRECTION
(this part i got covered of course)
If the REF
is unique (no duplicates in the column) then TRADE_TYPE
should be = "Flat" + df[Period] (i.e. "Flat Apr-20") and DIRECTION
should be = df[Type] of that row
If the REF
is the same across 2 rows, Product
is the same across these rows, Period
is different in these rows, and Type
is different (one is Sell the other is Buy) then TRADE_TYPE
in those rows should = "Spread" and DIRECTION
should be equal to the Type of the first row (i.e. if the first row of the 2 says Sell then its a Sell and vice versa)
If the REF
is the same across 3 rows, Product
is the same across these rows, Period
is different across these rows, and the sum of the QTY
of these rows equals ZERO then TRADE_TYPE
should be = "Trio" and DIRECTION
should be equal to the Type of the first row (i.e. if the first row of the 3 is SELL second is BUY and third is SELL, then it should pick the value of the first one)
If the REF
is the same across 2 rows but the Product
is different then TRADE_TYPE
should be = "Arbitrage" and DIRECTION
should be equal to the Type
of the first row (i.e. if the first row of the 2 says Sell then its a Sell and vice versa)
The final result should be something like this if using the above table as example:
REF Period Product Price Type QTY TRADE_TYPE DIRECTION
T001 Jan-20 EQ 69.87 Sell -10 Spread Sell
T001 Feb-21 EQ 69.77 Buy 10 Spread Sell
T002 Apr-20 BN 10.77 Buy 15 Flat Apr Buy
T003 Jul-20 EQ 71.25 Sell -20 Trio Sell
T003 Aug-20 EQ 70.89 Buy 40 Trio Sell
T003 Sep-20 EQ 70.69 Sell -20 Trio Sell
T004 Nov-20 BN 20.83 Buy 10 Arbitrage Buy
T004 Dec-20 EQ 40.01 Sell 12 Arbitrage Buy
T005 Sep-20 FD 31.25 Buy -20 Trio Buy
T005 Mar-21 FD 36.89 Sell 40 Trio Buy
T005 Sep-21 FD 40.69 Buy -20 Trio Buy
Anyone that could possibly help with this problem or point me in the right direction? (how to iterate and use if conditionals at the same time over rows and columns etc).
Thank you so much for your help in advance!
Roughly - you need to do a df.groupby(["REF", "Product"]).count()
, this will give you a table that just contains ref, product, and a count. From there you can project/join in RAW_TRADE_TYPE
- one of "Flat" "Spread" or "Trio", using where()
or select()
.
You can join()
this to the original table. From there you have all the information you need to calculate the columns as you describe.