Search code examples
pandasgroup-bysum

How do I sum by certain conditions and into a new data frame?


dataframe

my attempted solution

I'm trying to make a bar chart that shows the percentage of non-white employees at each company. In my attempted solution I've summed the counts of employee by ethnicity already but I'm having trouble taking it to the next step of summing the employees by all ethnicities except white and then having a new df with columns of company and % of non-white employees.

How should I go about it? Is my method of using groupby first incorrect?

I've tried using

e2 = df_ethnicities.groupby(["Company", "Ethnicity"]).agg({"Count": sum}).reset_index()

to first get the counts of each ethnicity by company.


Solution

  • You can drop the reset_indexand then unstack. This will result in a Dataframe has the different counts for the different etnicities as columns. 1 minus the % of white employees will then yield the desired formula.

    df_agg = df_ethnicities.groupby(["Company", "Ethnicity"]).agg({"Count": sum}).unstack()
    percentatges = 1-df_agg[('Count','White')]/df_agg.sum(axis=1)