Search code examples
pythonpandasdataframegroup-by

Python pandas to calculate percentage of groupby sums?


I want to calculate the relative percentages of a sum of a particular column of a dataframe after I have grouped by a different column. The format of the data frame would be for example:

df = pd.DataFrame({'Company': ['Company1', 'Company2', 'Company3', 'Company1', 'Company3'],
                   'STC Watts': [10.10, 9.6, 13.3, 5.6, 10.12],
                   ...other columns
                     )

At this point I need to group by the "Company" columns and get the total sum of "STC Watts" column for each company, but I would also like to have a relative percentage of that company "STC Watts" sum out of the total for the entire dataset. I tried using a couple solutions from this question but could not get any of them to work


Solution

  • You can post-process the groupby.sum adding a new column with the division by the total (optional multiplied by 100 for percent, and eventually rounded):

    out = (
     df.groupby('Company', as_index=False)['STC Watts'].sum()
       .assign(percent=lambda d: d['STC Watts'].div(d['STC Watts'].sum())
                                               .mul(100).round(2)
               )
    )
    

    Output:

        Company  STC Watts  percent
    0  Company1      15.70    32.22
    1  Company2       9.60    19.70
    2  Company3      23.42    48.07