Search code examples
pythonpandasnumpydataframedata-analysis

Python - how to do complex IF iterations over all rows of a column and return a value


Apologies in advance as technically i have a few questions regarding the problem i would like to solve with python but since they are related i am putting it all in one post (at least i am hoping it will be a worthy challenge for whoever is able to help me with this).

I have the following pandas dataframe called df example:

REF   Period    Product   Price  Type    QTY 
T001  Jan-20    EQ        69.87  Sell    -10   
T001  Feb-21    EQ        69.77  Buy      10   
T002  Apr-20    BN        10.77  Buy      15      
T003  Jul-20    EQ        71.25  Sell    -20   
T003  Aug-20    EQ        70.89  Buy      40   
T003  Sep-20    EQ        70.69  Sell    -20          
T004  Nov-20    BN        20.83  Buy      10   
T004  Dec-20    EQ        40.01  Sell     12   
T005  Sep-20    FD        31.25  Buy     -20   
T005  Mar-21    FD        36.89  Sell     40
T005  Sep-21    FD        40.69  Buy     -20  

As you can see the column REF refers to the trade reference of a portfolio that i would like to analyse. I have been struggling with finding a solution for the following data analysis problems:

I would like python to go over each row of the trade reference column and:

  1. Add a column TRADE_TYPE and DIRECTION (this part i got covered of course)

  2. If the REF is unique (no duplicates in the column) then TRADE_TYPE should be = "Flat" + df[Period] (i.e. "Flat Apr-20") and DIRECTION should be = df[Type] of that row

  3. If the REF is the same across 2 rows, Product is the same across these rows, Period is different in these rows, and Type is different (one is Sell the other is Buy) then TRADE_TYPE in those rows should = "Spread" and DIRECTION should be equal to the Type of the first row (i.e. if the first row of the 2 says Sell then its a Sell and vice versa)

  4. If the REF is the same across 3 rows, Product is the same across these rows, Period is different across these rows, and the sum of the QTY of these rows equals ZERO then TRADE_TYPE should be = "Trio" and DIRECTION should be equal to the Type of the first row (i.e. if the first row of the 3 is SELL second is BUY and third is SELL, then it should pick the value of the first one)

  5. If the REF is the same across 2 rows but the Product is different then TRADE_TYPE should be = "Arbitrage" and DIRECTION should be equal to the Type of the first row (i.e. if the first row of the 2 says Sell then its a Sell and vice versa)

The final result should be something like this if using the above table as example:

REF   Period    Product   Price  Type    QTY   TRADE_TYPE  DIRECTION
T001  Jan-20    EQ        69.87  Sell    -10   Spread      Sell
T001  Feb-21    EQ        69.77  Buy      10   Spread      Sell
T002  Apr-20    BN        10.77  Buy      15   Flat Apr    Buy
T003  Jul-20    EQ        71.25  Sell    -20   Trio        Sell
T003  Aug-20    EQ        70.89  Buy      40   Trio        Sell
T003  Sep-20    EQ        70.69  Sell    -20   Trio        Sell       
T004  Nov-20    BN        20.83  Buy      10   Arbitrage   Buy
T004  Dec-20    EQ        40.01  Sell     12   Arbitrage   Buy
T005  Sep-20    FD        31.25  Buy     -20   Trio        Buy
T005  Mar-21    FD        36.89  Sell     40   Trio        Buy
T005  Sep-21    FD        40.69  Buy     -20   Trio        Buy       

Anyone that could possibly help with this problem or point me in the right direction? (how to iterate and use if conditionals at the same time over rows and columns etc).

Thank you so much for your help in advance!


Solution

  • Roughly - you need to do a df.groupby(["REF", "Product"]).count(), this will give you a table that just contains ref, product, and a count. From there you can project/join in RAW_TRADE_TYPE - one of "Flat" "Spread" or "Trio", using where() or select().

    You can join() this to the original table. From there you have all the information you need to calculate the columns as you describe.