Search code examples
pandaspivot-table

Pandas Pivot Table add subsequent columns then divide by previous column


I have the following pandas pivot table:

table = pd.pivot_table(df, index = 'group', columns = 'levels',
                       values = 'uid', aggfunc = 'count', margins = True)

There are three levels (level 1, level 2, and level 3).

Basically I have individuals that progress through these levels, but I want to know what percentage of people make it from one level to the next.

I'm thinking I need two transformations:

  1. Add subsequent columns so I know how many people made it to each level (i.e. level 1 = level 1 + level 2 + level 3; level 2 = level 2 + level 3; level 3 = level 3)
  2. Then divide each column by the previous column to know what percentage of people progressed from the previous level (i.e. level 1 = level 1 / level 1; level 2 = level 2 / level 1; level 3 = level 3 / level 2

Any advice is much appreciated. Thank you!

Initial table:

Groups Level 1 Level 2 Level 3
GroupA 5 6 3
GroupB 3 2 4

Want to end up with table:

Groups Level 1 Level 2 Level 3
GroupA 100% 64% 33%
GroupB 100% 67% 67%

Solution

  • According to your logic, you can just take the cumsum in reverse order by the rows, then divide:

    d = df.iloc[:,::-1].cumsum(axis=1).iloc[:,::-1]   
    d.div(d.shift(axis=1)).fillna(1)
    

    output:

            Level 1   Level 2   Level 3
    Groups                             
    GroupA      1.0  0.642857  0.333333
    GroupB      1.0  0.666667  0.666667