I created a countplot following the approach in this answer: https://stackoverflow.com/a/33259038
import matplotlib.pyplot as plt
import seaborn as sns

ncount = len(bod)  # total number of observations across both groups
plt.figure(figsize=(14,6))
ax = sns.countplot(x="Preparedness_Q2", hue='country_group', data=bod, palette='inferno',
                   order=["Not at all", "Prepared only in some aspects", "Prepared enough",'Fully prepared','No Answer'])
plt.title('Preparedness for the pandemic', pad=20, fontsize=20)
plt.xlabel(None)
plt.ylabel('Number of Observations', labelpad=15)
# Make twin axis
ax2=ax.twinx()
ax2.axes.yaxis.set_visible(False)
# Label each bar with its share of the grand total
for p in ax.patches:
    x=p.get_bbox().get_points()[:,0]
    y=p.get_bbox().get_points()[1,1]
    ax.annotate('{:.1f}%'.format(100.*y/ncount), (x.mean(), y),
                ha='center', va='bottom')
The output looks like this:
I have not been able to split the percentages by the "hue": the labels currently show each bar as a fraction of the grand total (all the observations of "other" and "spain" added together). Is there a way to compute the percentages within each group instead? Through this answer https://stackoverflow.com/a/59433700 I found that you can normalize within each pair of bars, but not within each hue group.
Currently the "total" comes from "ncount". I was able to get the right percentages with a very ugly(?) change; this was the code:
for num, p in enumerate(ax.patches):
    x=p.get_bbox().get_points()[:,0]
    y=p.get_bbox().get_points()[1,1]
    if num <= (len(ax.patches)/2)-1:
        ax.annotate('{:.1f}%'.format(100.*y/other), (x.mean(), y),
                    ha='center', va='bottom')
    else:
        ax.annotate('{:.1f}%'.format(100.*y/spain), (x.mean(), y),
                    ha='center', va='bottom')
Here, 'other' and 'spain' are the total number of observations in each group. But my problem is that the countplot still draws the bar heights from the "counts" rather than the "percentages", so a taller bar can carry a smaller percentage label and the output is completely off; see this result:
Does anyone have a suggestion on how to approach this? Thanks in advance!
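To make the mismatch concrete, here is a minimal pandas-only sketch. The `toy` frame and its values are invented for illustration (the real `bod` data is not shown), and `pd.crosstab` is just one possible way to get the per-group totals and shares; it is not what the code above does.

```python
import pandas as pd

# Hypothetical stand-in for `bod`; the values are made up for illustration.
toy = pd.DataFrame({
    "country_group": ["other"] * 8 + ["spain"] * 2,
    "Preparedness_Q2": (["Not at all"] * 2 + ["Prepared enough"] * 6
                        + ["Not at all"] * 1 + ["Prepared enough"] * 1),
})

# Per-group totals: these play the role of `other` and `spain` above.
group_totals = toy["country_group"].value_counts()

# Raw counts per (group, answer): what countplot uses as bar heights.
counts = pd.crosstab(toy["country_group"], toy["Preparedness_Q2"])

# Share of each answer within its own group (each row sums to 1).
shares = pd.crosstab(toy["country_group"], toy["Preparedness_Q2"],
                     normalize="index")

print(counts)
print((100 * shares).round(1))
```

In this toy data the "other" bar for "Not at all" is taller (2 observations versus 1) even though its within-group share is smaller (25% versus 50%), which is the kind of disagreement between bar height and label described above.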
I feel kind of stupid: minutes after posting this question I found this GitHub comment https://github.com/mwaskom/seaborn/issues/1027#issuecomment-360866896 which contains the solution...
For completeness' sake, here is the code I used:
x, y, hue = "Preparedness_Q2", "proportion", "country_group"
# Order of the answer categories along the x axis (passed to `order=` below)
hue_order = ["Not at all", "Prepared only in some aspects", "Prepared enough",'Fully prepared','No Answer']
# Share of each answer within its own country group (each group sums to 1)
prop_df = (bod[x].groupby(bod[hue]).value_counts(normalize=True).rename(y).reset_index())
plt.figure(figsize=(15,6))
ax = sns.barplot(x=x, y=y, hue=hue, data=prop_df, order=hue_order)
plt.title('Preparedness for the pandemic', pad=20, fontsize=20)
plt.xlabel(None)
plt.ylabel('Frequencies [%]', labelpad=15)
for p in ax.patches:
    x=p.get_bbox().get_points()[:,0]
    y=p.get_bbox().get_points()[1,1]
    ax.annotate('{:.1f}%'.format(100.*y), (x.mean(), y),
                ha='center', va='bottom') # set the alignment of the text
# Rename the first legend entry
legend = ax.legend(loc='upper left')
legend.texts[0].set_text("Other countries")
plt.show()
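As a possible variation, the labels can also be attached with matplotlib's `Axes.bar_label` (available from matplotlib 3.4) by looping over `ax.containers` rather than `ax.patches`. This is only a sketch under assumptions: the `toy` data and the column name `percent` are invented here, and the proportions are multiplied by 100 up front so that the axis and the labels are both in percent.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for `bod`; the values are made up for illustration.
toy = pd.DataFrame({
    "country_group": ["other"] * 8 + ["spain"] * 4,
    "Preparedness_Q2": (["Not at all"] * 2 + ["Prepared enough"] * 6
                        + ["Not at all"] * 3 + ["Prepared enough"] * 1),
})

x, y, hue = "Preparedness_Q2", "percent", "country_group"

# Percentage of each answer within its own group (each group sums to 100).
prop_df = (toy[x]
           .groupby(toy[hue])
           .value_counts(normalize=True)
           .mul(100)
           .rename(y)
           .reset_index())

plt.figure(figsize=(8, 4))
ax = sns.barplot(x=x, y=y, hue=hue, data=prop_df,
                 order=["Not at all", "Prepared enough"])
ax.set_ylabel("Share within group [%]")

# One container per hue level; bar_label places one text per bar.
labels = []
for container in ax.containers:
    labels.extend(ax.bar_label(container, fmt="%.1f%%"))
```

Because the labels come from the bar containers, this version does not depend on the number or order of entries in `ax.patches`.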