Tags: python, matplotlib, seaborn, displot, histplot

How to separately normalize each distribution group


Let's say I have a dataframe such as:

CATEGORY  Value
a          v1
a          v2
a          v3
a          v4
a          v5
b          v6
b          v7
b          v8

Now, if I want to plot these distributions by category, I could use something like:

sns.histplot(data, x="Value", hue="CATEGORY", stat="percent")

The problem is that category "a" makes up 5/8 of the sample and "b" makes up 3/8, and the histograms reflect this because they are normalized over the whole sample. I want each histogram to have an area of 1, instead of 5/8 and 3/8.

Below is an example of what it looks like now:

[image: histograms as currently plotted]

But each of those areas should be 1.
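
To make the 5/8 and 3/8 concrete, here is a minimal sketch of the arithmetic in plain Python. The numeric values are hypothetical stand-ins for v1..v8 (the question gives no actual numbers), and `density_areas` is an illustrative helper, not a seaborn function. It compares dividing each bar by the full sample size with dividing it by the size of its own group.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical numeric stand-ins for v1..v8; the question gives no real values.
rows = [("a", 1.0), ("a", 1.5), ("a", 2.5), ("a", 3.0), ("a", 3.5),
        ("b", 0.5), ("b", 2.0), ("b", 3.5)]
BIN_WIDTH = 1  # unit-width bins: [0, 1), [1, 2), [2, 3), [3, 4)


def density_areas(rows, common_norm):
    """Total bar area per category for a density-style histogram."""
    total = len(rows)
    group_sizes = Counter(category for category, _ in rows)
    # Number of observations per (category, bin index).
    bin_counts = Counter(
        (category, int(value // BIN_WIDTH)) for category, value in rows
    )
    areas = {category: Fraction(0) for category in group_sizes}
    for (category, _bin_index), count in bin_counts.items():
        # Shared normalization divides by all rows, separate by the group size.
        denominator = total if common_norm else group_sizes[category]
        height = Fraction(count, denominator * BIN_WIDTH)
        areas[category] += height * BIN_WIDTH
    return areas


shared = density_areas(rows, common_norm=True)
separate = density_areas(rows, common_norm=False)
print({k: str(v) for k, v in shared.items()})    # {'a': '5/8', 'b': '3/8'}
print({k: str(v) for k, v in separate.items()})  # {'a': '1', 'b': '1'}
```

With shared normalization the two areas add up to 1 together; with separate normalization each group reaches 1 on its own, which is the behavior asked for here.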

I thought of maybe iterating over the categories and plotting them one by one.
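
The per-category loop could look roughly like this with plain matplotlib, where `density=True` normalizes each call to `hist` on its own. This is only a sketch: the numeric values and the shared bin edges are assumptions made for illustration, since the question does not give real data.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical numeric stand-ins for v1..v8.
data = pd.DataFrame({
    "CATEGORY": list("aaaaabbb"),
    "Value": [1.0, 1.5, 2.5, 3.0, 3.5, 0.5, 2.0, 3.5],
})
bins = np.linspace(0, 4, 5)  # shared edges so the groups stay comparable

fig, ax = plt.subplots()
areas = {}
for name, group in data.groupby("CATEGORY"):
    # density=True scales this group's bars so that their total area is 1.
    heights, edges, _ = ax.hist(
        group["Value"], bins=bins, density=True, alpha=0.5, label=name
    )
    areas[name] = float(np.sum(heights * np.diff(edges)))
ax.legend(title="CATEGORY")
print(areas)
```

This works, but it means managing bins, colors, and the legend by hand for every group.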


Solution

  • As per this answer to the duplicate question, use common_norm=False, which normalizes each hue group independently instead of over the whole dataset.

    Also see "seaborn histplot and displot output doesn't match".

    This is not specific to stat='percent'. Other options are 'frequency', 'probability', and 'density'.

    import seaborn as sns
    import matplotlib.pyplot as plt
    
    tips = sns.load_dataset('tips')
    
    fig, axes = plt.subplots(nrows=2, figsize=(20, 10), tight_layout=True)
    
    sns.histplot(data=tips, x='total_bill', hue='day', stat='percent', multiple='dodge', bins=30, common_norm=True, ax=axes[0])
    sns.histplot(data=tips, x='total_bill', hue='day', stat='percent', multiple='dodge', bins=30, common_norm=False, ax=axes[1])
    
    axes[0].set_title('common_norm=True', fontweight='bold')
    axes[1].set_title('common_norm=False', fontweight='bold')
    
    handles = axes[1].get_legend().legend_handles
    
    for ax in axes:
        for c in ax.containers:
            ax.bar_label(c, fmt=lambda x: f'{x:0.2f}%' if x > 0 else '', rotation=90, padding=3, fontsize=8, fontweight='bold')
        ax.margins(y=0.15)
        ax.spines[['top', 'right']].set_visible(False)
        ax.get_legend().remove()
    
    _ = fig.legend(title='Day', handles=handles, labels=tips.day.cat.categories.tolist(), bbox_to_anchor=(1, 0.5), loc='center left', frameon=False)
    


    sns.displot

    g = sns.displot(data=tips, kind='hist', x='total_bill', hue='day', stat='percent', multiple='dodge', bins=30, common_norm=False, height=5, aspect=4)
    
    ax = g.axes.flat[0]  # ax = g.axes[0][0] also works
    
    for c in ax.containers:
        ax.bar_label(c, fmt=lambda x: f'{x:0.2f}%' if x > 0 else '', rotation=90, padding=3, fontsize=8, fontweight='bold')
    
