I am having a hard time to produce a chart where counts of one feature are stacked per another feature.
Assume the dataframe below:
method year proteins values QC
0 John 2018 A 30 PASS
1 Kate 2018 B 11 PASS
2 Kate 2018 C 22 NO-PASS
3 Patrick 2019 A 60 PASS
4 Patrick 2019 B 40 NO-PASS
5 Patrick 2019 C 50 NO-PASS
6 Mary 2017 A 8 NO-PASS
7 Mary 2017 B 11 PASS
I have so far this: (df['QC'].value_counts(dropna=True, normalize=True)*100).plot(kind='bar', rot=0, color='c', title='how many pass QC').set(xlabel="QC options", ylabel="% proteins")
which produces a basic bar plot of counts.
But I actually need each bar to be "built by counts per protein", like the one below made with excel. I have tried by pivoting it as well, but then I do not have numerical data and I fail moving on from there.
Thank you for your help!
Pandas will plot each column as bars and stack them with the keyword stacked=True. So you have to calculate and reshape your data to fit this logic.
# Get the #'s data points in the groups
gdf = df.groupby(['proteins', 'QC'])['values'].count()
# Normalize to 100.0
gdf = gdf.div(gdf.sum())*100.0
# Since we want to stack by protiens, lets make them columns
gdf = gdf.unstack('proteins')
# If you want to choose a subset of columns to plot
col_to_plot = gdf.columns.tolist()
# Plot command
gdf[col_to_plot].plot(kind='bar', stacked=True)