I want to calculate the relative percentages of a sum of a particular column of a dataframe after I have grouped by a different column. The format of the data frame would be for example:
df = pd.DataFrame({'Company': ['Company1', 'Company2', 'Company3', 'Company1', 'Company3'],
'STC Watts': [10.10, 9.6, 13.3, 5.6, 10.12],
...other columns
)
At this point I need to group by the "Company" columns and get the total sum of "STC Watts" column for each company, but I would also like to have a relative percentage of that company "STC Watts" sum out of the total for the entire dataset. I tried using a couple solutions from this question but could not get any of them to work
You can post-process the groupby.sum
adding a new column with the division by the total (optional multiplied by 100 for percent, and eventually rounded):
out = (
df.groupby('Company', as_index=False)['STC Watts'].sum()
.assign(percent=lambda d: d['STC Watts'].div(d['STC Watts'].sum())
.mul(100).round(2)
)
)
Output:
Company STC Watts percent
0 Company1 15.70 32.22
1 Company2 9.60 19.70
2 Company3 23.42 48.07