Tags: python, matplotlib, seaborn, percentage, hue

Matplotlib/Seaborn (countplot): percentages do not take the hue into account


I created a countplot following the approach in this answer: https://stackoverflow.com/a/33259038

import matplotlib.pyplot as plt
import seaborn as sns

# bod is the survey DataFrame; ncount is the number of rows across ALL groups
ncount = len(bod)

plt.figure(figsize=(14,6))
ax = sns.countplot(x="Preparedness_Q2", hue='country_group', data=bod, palette='inferno',
                  order=["Not at all", "Prepared only in some aspects", "Prepared enough",'Fully prepared','No Answer'])
plt.title('Preparedness for the pandemic', pad=20, fontsize=20)
plt.xlabel(None)
plt.ylabel('Number of Observations', labelpad=15)

# Make twin axis (kept from the linked answer; unused here, since its y-axis is hidden and nothing is drawn on it)
ax2=ax.twinx()
ax2.axes.yaxis.set_visible(False)

# Label each bar with its count as a percentage of the overall total
for p in ax.patches:
    x=p.get_bbox().get_points()[:,0]
    y=p.get_bbox().get_points()[1,1]
    ax.annotate('{:.1f}%'.format(100.*y/ncount), (x.mean(), y),
            ha='center', va='bottom')

The output looks like this:

[Image: output of the countplot above]

I have not been able to split the percentages by the hue: they are currently computed as a fraction of the overall total (all observations of "other" and "spain" combined). Is there a way to show the percentages within each group instead? Through this answer https://stackoverflow.com/a/59433700 I found that you can compute them for each pair of columns, but not per group.
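To make the difference between the two denominators concrete, here is a small pandas sketch. The toy DataFrame and its values are made up for illustration; only the column names come from the question. pd.crosstab with normalize=True divides every count by the grand total (what dividing by "ncount" does), while normalize="columns" divides each hue column by its own total, which gives the per-group percentages being asked for.

```python
import pandas as pd

# Hypothetical toy data; only the column names are taken from the question.
df = pd.DataFrame({
    "Preparedness_Q2": ["Not at all", "Prepared enough", "Prepared enough",
                        "Not at all", "Fully prepared", "Prepared enough"],
    "country_group": ["other", "other", "other", "other", "spain", "spain"],
})

# Percentages of the grand total: all cells together sum to 100.
overall = pd.crosstab(df["Preparedness_Q2"], df["country_group"], normalize=True) * 100

# Percentages within each hue level: every column sums to 100 on its own.
within = pd.crosstab(df["Preparedness_Q2"], df["country_group"], normalize="columns") * 100

print(overall.round(1))
print(within.round(1))
```

With this data, "Not at all" in "other" is 2 of 6 rows overall (about 33.3%) but 2 of the 4 "other" rows (50%) within its group.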

Currently, "ncount" holds that overall total. I was able to get the right percentages with a rather ugly change; this was the code:

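# Relies on seaborn drawing all bars of the first hue level ('other') before all bars
# of the second ('spain'), so the first half of ax.patches belongs to 'other'.
# other and spain hold the number of observations in each group.
for num, p in enumerate(ax.patches):
    x=p.get_bbox().get_points()[:,0]
    y=p.get_bbox().get_points()[1,1]
    if num < len(ax.patches) // 2:
        ax.annotate('{:.1f}%'.format(100.*y/other), (x.mean(), y),
            ha='center', va='bottom')
    else:
        ax.annotate('{:.1f}%'.format(100.*y/spain), (x.mean(), y),
            ha='center', va='bottom')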

Here, 'other' and 'spain' are the total number of observations in each group. The labels are now correct, but the bar heights are still raw counts rather than percentages, so the plot is still off: when the groups differ in size, a taller bar can carry a smaller percentage. See this result:

[Image: countplot with per-group percentage labels on count-height bars]
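A less fragile variant of the per-group labelling hack can be sketched as follows. It assumes that the Axes holds one bar container per hue level, in hue order. That is how seaborn's countplot/barplot draw hue-split bars, but treat it as an assumption for your seaborn version. It also requires matplotlib 3.4 or newer for Axes.bar_label. Instead of assuming that the first half of ax.patches belongs to the first group, it walks ax.containers and divides each level's counts by that level's own total. The helper name label_within_hue and the demo counts are hypothetical. The demo uses plain ax.bar calls so it runs without seaborn.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt


def label_within_hue(ax, totals):
    """Label each bar with its height as a percentage of its own hue level.

    Hypothetical helper: assumes ax.containers holds exactly one BarContainer
    per hue level, in the same order as `totals`.
    """
    if len(ax.containers) != len(totals):
        raise ValueError("need exactly one total per bar container")
    all_labels = []
    for container, total in zip(ax.containers, totals):
        labels = [f"{100 * bar.get_height() / total:.1f}%" for bar in container]
        ax.bar_label(container, labels=labels)  # places the text above each bar
        all_labels.append(labels)
    return all_labels


# Demo with made-up counts for two hue levels.
categories = ["Not at all", "Prepared enough", "Fully prepared"]
counts = {"other": [30, 50, 20], "spain": [10, 25, 15]}
width = 0.4

fig, ax = plt.subplots()
for i, (level, heights) in enumerate(counts.items()):
    positions = [c + (i - 0.5) * width for c in range(len(categories))]
    ax.bar(positions, heights, width=width, label=level)  # one container per level
ax.set_xticks(range(len(categories)))
ax.set_xticklabels(categories)
ax.legend()

labels = label_within_hue(ax, [sum(h) for h in counts.values()])
print(labels)
```

With a seaborn countplot, the idea would be to pass hue_order explicitly and then call label_within_hue(ax, [other, spain]) with the totals in that same order, so that containers and totals line up. Note that this only fixes the labels. The bar heights are still counts.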

Does anyone have a suggestion on how to approach this? Thanks in advance!


Solution

  • I feel kind of stupid: minutes after posting this question I found this GitHub comment https://github.com/mwaskom/seaborn/issues/1027#issuecomment-360866896, which contains the solution.

    For completeness' sake, here is the code I used:

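    import math

    import matplotlib.pyplot as plt
    import seaborn as sns

    # y names the proportion column (it was used below without being defined)
    x, y, hue = "Preparedness_Q2", "prop", "country_group"
    # Order of the x categories; it is passed as `order`, so it is not a hue order
    x_order = ["Not at all", "Prepared only in some aspects", "Prepared enough", 'Fully prepared', 'No Answer']

    # Share of each answer within each country group, scaled to percent
    prop_df = (bod[x].groupby(bod[hue])
               .value_counts(normalize=True)
               .mul(100)
               .rename(y)
               .reset_index())

    plt.figure(figsize=(15,6))
    ax = sns.barplot(x=x, y=y, hue=hue, data=prop_df, order=x_order)

    plt.title('Preparedness for the pandemic', pad=20, fontsize=20)
    plt.xlabel(None)
    plt.ylabel('Frequencies [%]', labelpad=15)

    for p in ax.patches:
        bar_height = p.get_height()
        if math.isnan(bar_height):
            continue  # skip (x, hue) pairs with no data, which some seaborn versions draw as NaN-height bars
        bar_x = p.get_x() + p.get_width() / 2
        ax.annotate('{:.1f}%'.format(bar_height), (bar_x, bar_height),
                    ha='center', va='bottom')  # centered just above the bar

    # Rename the first legend entry (this does not change the legend title)
    legend = ax.legend(loc='upper left')
    legend.texts[0].set_text("Other countries")

    plt.show()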
    

    [Image: bar plot of per-group percentages]
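To see what prop_df contains, here is the same groupby/value_counts chain run on a tiny made-up frame. Only the column names come from the post, and the name "prop" for the y column is an arbitrary choice. The result is in long format, with one row per (hue level, answer) pair and proportions that sum to 1 within each hue level. That is the shape sns.barplot expects for x, y and hue.

```python
import pandas as pd

# Hypothetical toy data; only the column names come from the post.
bod = pd.DataFrame({
    "Preparedness_Q2": ["Not at all", "Prepared enough", "Prepared enough",
                        "Not at all", "Fully prepared", "Prepared enough"],
    "country_group": ["other", "other", "other", "other", "spain", "spain"],
})
x, y, hue = "Preparedness_Q2", "prop", "country_group"

# Long-format table: within-group proportion of each answer.
prop_df = (bod[x]
           .groupby(bod[hue])
           .value_counts(normalize=True)
           .rename(y)
           .reset_index())
print(prop_df)
```

Combinations that never occur (here "spain" / "Not at all") get no row at all rather than a zero. The corresponding bar is therefore missing, which is worth keeping in mind when a loop over ax.patches assumes a fixed number of bars.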