Search code examples
pythonpandasbar-chartstacked-chart

Stacked bars of counts of the elements of one column organised by another column


I am having a hard time to produce a chart where counts of one feature are stacked per another feature.

Assume the dataframe below:

     method  year proteins  values       QC
0      John  2018        A      30     PASS
1      Kate  2018        B      11     PASS
2      Kate  2018        C      22  NO-PASS
3   Patrick  2019        A      60     PASS
4   Patrick  2019        B      40  NO-PASS
5   Patrick  2019        C      50  NO-PASS
6      Mary  2017        A       8  NO-PASS
7      Mary  2017        B      11     PASS

I have so far this: (df['QC'].value_counts(dropna=True, normalize=True)*100).plot(kind='bar', rot=0, color='c', title='how many pass QC').set(xlabel="QC options", ylabel="% proteins")which produces a basic bar plot of counts.

pab plot of counts

But I actually need each bar to be "built by counts per protein", like the one below made with excel. I have tried by pivoting it as well, but then I do not have numerical data and I fail moving on from there.

bar plot of counts per protein with excel

Thank you for your help!


Solution

  • Pandas will plot each column as bars and stack them with the keyword stacked=True. So you have to calculate and reshape your data to fit this logic.

    # Get the #'s data points in the groups
    gdf = df.groupby(['proteins', 'QC'])['values'].count()
    
    # Normalize to 100.0
    gdf = gdf.div(gdf.sum())*100.0
    
    # Since we want to stack by protiens, lets make them columns
    gdf = gdf.unstack('proteins')
    
    # If you want to choose a subset of columns to plot
    col_to_plot = gdf.columns.tolist()
    
    # Plot command
    gdf[col_to_plot].plot(kind='bar', stacked=True)