Search code examples
pythonpandascrosstab

calculate percentage relative to entire table


I want to calculate the % according to the total of the entire table, not just row or column in a pandas table with python. I can do by row and by column separately, but i rather to do the entire table. Any suggestions are welcome. Thanks so much

data

df1 = pd.DataFrame(data={'id': [1,2,3,4,5,6,7,8,9,10], 'place': [1,1,2,2,2,1,1,2,1,1], 'band': [1,2,3,3,3,2,1,2,3,1], 'status': [1,2,2,1,1,1,1,2,1,2]})

d1={1: 'north', 2: 'south'} 
d2={1: '10-20', 2: '30-40', 3: '20-30'} 
d3={1: 'green', 2: 'red'} 

df1['place']=df1['place'].map(d1).fillna('Other')
df1['band']=df1['band'].map(d2).fillna('Other')
df1['status']=df1['status'].map(d3).fillna('Other')

commands per row/col

pd.crosstab(df1.band, [df1.place, df1.status]).apply(lambda r: r/r.sum(), axis=1).round(2) #  by row 
pd.crosstab(df1.band, [df1.place, df1.status]).apply(lambda r: r/r.sum(), axis=0).round(2) #  by column

Solution

  • If you do not have to do it in a single apply, you can just divide your cross-table with the total sum.

    ctdf = pd.crosstab(df1.band, [df1.place, df1.status])
    ctdf / ctdf.sum().sum()
    >>>
    place   north   south
    status  green   red green   red
    band                
    10-20   0.2     0.1 0.0     0.0
    20-30   0.1     0.0 0.2     0.1
    30-40   0.1     0.1 0.0     0.1
    

    Another option (by @Shubham Sharma's comment) is to use ctdf.values.sum()

    Finally, you can also use pipe (instead of apply) in a single method chain:

    pd.crosstab(df1.band, [df1.place, df1.status]).pipe(lambda x: x / x.values.sum())