Search code examples
pythonfunctionloopsaggregate

How can I iterate through a variable in dataframe while the same time aggregating by another variable and assign labels as a new variable?


timestamp   IFPID   Outcome Share Price Trade Qty   Date
0   10/05/2022 08:04    SP3K4K5 dn  36  100 2022-10-05
1   10/05/2022 08:04    SP3K4K5 up  64  100 2022-10-05
2   10/05/2022 08:04    SP3K4K5 up  65  100 2022-10-05
3   10/05/2022 08:04    SP3K4K5 dn  35  100 2022-10-05
4   11/05/2022 00:54    SP3K4K5 up  57  64  2022-11-05
5   11/05/2022 00:54    SP3K4K5 dn  43  64  2022-11-05

I want to new variable Expected(a binary class with buy or sell) which will be computed based on the aggregation of the the Date and Outcome variables by the share price variable and assign buy when the mean of the of up outcome for a particular date is higher than the mean of the dn outcome.


Solution

  • import pandas as pd
    import numpy as np
    
    df = pd.DataFrame({'outcome': ['dn', 'up', 'up', 'dn', 'up', 'dn'], 'share price': [36,64,65,35,57,43],'date':['10/5/22', '10/5/22', '10/5/22', '10/5/22', '11/5/22','11/5/22']})
    print(df)
    

    enter image description here

    piv = df.pivot_table(index="date", columns="outcome", values="share price", aggfunc='mean', fill_value=0)
    piv['label'] = np.select([piv['dn']<piv['up'],piv['dn']>piv['up']], ['sell','buy'])
    piv
    

    enter image description here