Search code examples
pythonpandasdataframepandas-groupbymulti-level

Perform a calculation as a new column on multiindex dataframe


I have the following multiindex dataframe:

                                            MSISDN_COUNT
NAME            DIM1    SEGMENT 
JAN22           N       Detractor           307
                        Passive             198
                        Promoter            1256
                Y       Detractor           237
                        Passive             161
                        Promoter            1096
FEB22           N       Detractor           313
                        Passive             222
                        Promoter            1260
                Y       Detractor           261
                        Passive             169
                        Promoter            1155

I want to create a new column, which will be calculated for each NAME and for each DIM1.

The calculation is: ((Promoter - Detractor) / (Promoter + Passive + Detractor) )* 100

So the expected output is

                                            MSISDN_COUNT    OUT
NAME            DIM1    SEGMENT 
JAN22           N       Detractor           307             53.88
                        Passive             198
                        Promoter            1256
                Y       Detractor           237             57.49
                        Passive             161
                        Promoter            1096
FEB22           N       Detractor           313             52.76
                        Passive             222
                        Promoter            1260
                Y       Detractor           261             56.4
                        Passive             169
                        Promoter            1155

I can't figure out how to do this with the multi-level index dataframe.

Any help is appreciated!


Solution

  • Use DataFrame.xs for select DataFrames for 3 level, apply your formula, add third level SEGMENT with value Detractor and assign back:

    Promoter = df.xs('Promoter', level=2)
    Detractor = df.xs('Detractor', level=2)
    Passive = df.xs('Passive', level=2)
    
    s = ((Promoter - Detractor) / (Promoter + Passive + Detractor) )* 100
    
    df['OUT'] = s.assign(SEGMENT='Detractor').set_index('SEGMENT', append=True)
    print (df)
                          MSISDN_COUNT        OUT
    NAME  DIM1 SEGMENT                           
    JAN22 N    Detractor           307  53.889835
               Passive             198        NaN
               Promoter           1256        NaN
          Y    Detractor           237  57.496653
               Passive             161        NaN
               Promoter           1096        NaN
    FEB22 N    Detractor           313  52.757660
               Passive             222        NaN
               Promoter           1260        NaN
          Y    Detractor           261  56.403785
               Passive             169        NaN
               Promoter           1155        NaN
    

    Details:

    print (s)
                MSISDN_COUNT
    NAME  DIM1              
    JAN22 N        53.889835
          Y        57.496653
    FEB22 N        52.757660
          Y        56.403785
          
    print (s.assign(SEGMENT='Detractor').set_index('SEGMENT', append=True))
                          MSISDN_COUNT
    NAME  DIM1 SEGMENT                
    JAN22 N    Detractor     53.889835
          Y    Detractor     57.496653
    FEB22 N    Detractor     52.757660
          Y    Detractor     56.403785