pandas, group-by, division

Calculation over pandas groupby object with condition within groups


I have a df as follows:

   appid  month   tag  totalvalue
0  1234   02-'22  B    50.00
1  1234   02-'22  BA   10.00
2  1234   01-'22  B    100.00
3  2345   03-'22  BA   25.00
4  2345   03-'22  B    100.00
5  2345   04-'22  BB   100.00

The output I want is as follows:

   appid  month   tag  totalvalue  %tage
0  1234   02-'22  B    50.00       1.0
1  1234   02-'22  BA   10.00       0.2
2  1234   01-'22  B    100.00      1.0
3  2345   03-'22  BA   25.00       0.25
4  2345   03-'22  B    100.00      1.0
5  2345   04-'22  BB   100.00      inf

I want to group the rows by appid and month. Within each group, if a row with tag=B exists, every row's totalvalue should be divided by that B row's totalvalue (so the B row itself gets 1.0). If the group has no tag=B row, the result should be inf.

I have tried df.groupby(['appid', 'month'])['totalvalue'], but I could not work out how to use the tag=B value as the denominator on the groupby object.


Solution

  • IIUC, you can use groupby.transform('first') on the masked totalvalue, then use the result as the divisor:

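    # True for the rows whose tag is exactly 'B'
    m = df['tag'].eq('B')

    # Keep totalvalue only on the 'B' rows, broadcast each group's first
    # non-null value to the whole group, and use 0 where a group has no 'B'
    # (dividing by 0 then gives inf)
    df['%tage'] = (df['totalvalue']
                   .div(df['totalvalue'].where(m)
                        .groupby([df['appid'], df['month']])
                        .transform('first').fillna(0))
                  )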
    

    output:

       appid   month tag  totalvalue  %tage
    0   1234  02-'22   B        50.0   1.00
    1   1234  02-'22  BA        10.0   0.20
    2   1234  01-'22   B       100.0   1.00
    3   2345  03-'22  BA        25.0   0.25
    4   2345  03-'22   B       100.0   1.00
    5   2345  04-'22  BB       100.0    inf
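
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same approach. The DataFrame is rebuilt by hand from the sample in the question, and the intermediate name `denom` is only introduced here for readability; it is not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "appid": [1234, 1234, 1234, 2345, 2345, 2345],
    "month": ["02-'22", "02-'22", "01-'22", "03-'22", "03-'22", "04-'22"],
    "tag": ["B", "BA", "B", "BA", "B", "BB"],
    "totalvalue": [50.0, 10.0, 100.0, 25.0, 100.0, 100.0],
})

# Boolean mask: True where the row's tag is exactly 'B'
m = df["tag"].eq("B")

# Per-row denominator (hypothetical helper name): the group's 'B' value,
# or 0 when the (appid, month) group contains no 'B' row
denom = (
    df["totalvalue"].where(m)                # NaN on every non-'B' row
    .groupby([df["appid"], df["month"]])
    .transform("first")                      # first non-null value per group
    .fillna(0)                               # groups without 'B' become 0
)

# Float division by 0 yields inf for a positive numerator
df["%tage"] = df["totalvalue"].div(denom)
print(df)
```

Two things to keep in mind: if a group holds more than one tag=B row, 'first' picks the first one in row order, and if a group has no B row and a totalvalue of 0, the result is NaN (0/0) rather than inf.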