python matplotlib seaborn violin-plot

Why is the violin plot shape different for the same distribution?


I am plotting three distributions of experiment results to compare them side by side. However, one of the distributions (labeled MLP) is fixed (it is the same distribution in every plot), so I expected it to have the same shape across plots, given that I set a fixed y-axis range of (0, 1).

I am using seaborn.violinplot (Python 3) to generate the plots. [Example plots not included.]

The other distributions clearly influence its shape, but I don't know why. I tried setting a seed before plotting the distributions, and I also tried bw=0.2, bw='scott' and bw='silverman', but none of these helped. Why does the shape of the MLP violin differ between plots?

This is the code I use to produce these plots:

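import random

import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

# df, metrics, algorithms, colors, title and the file-name parts
# are assumed to be defined elsewhere.
for metric in metrics:
    random.seed(42)
    np.random.seed(42)
    file_name = f"{file_name_base}{metric}/{cancer}_{strategy_translation[strategy]}_{threshold_str}.pdf"
    # Violins in the background, without inner annotations or outlines.
    ax = sns.violinplot(data=df, x='Algorithm', y=metric, palette='turbo',
                        inner=None, linewidth=0, saturation=0.4)
    ax.set(ylim=(0, 1))
    # Narrow box plots drawn on top of the violins.
    sns.boxplot(x='Algorithm', y=metric, data=df, palette='turbo', width=0.3,
                boxprops={'zorder': 2}, ax=ax).set(title=title)

    # Dashed horizontal line at each algorithm's median.
    for i, algorithm in enumerate(algorithms):
        median = df.loc[df['Algorithm'] == algorithm][metric].median()
        plt.axhline(y=median, color=colors[i], linestyle="--")

    plt.savefig(file_name)
    plt.clf()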

And the df object looks like this:

Metric 1   Metric 2   ...   Algorithm
0.1        0.8        ...   MLP
0.2        0.81       ...   MLP
0.12       0.77       ...   GAT
0.1        0.82       ...   GAT
0.17       0.89       ...   GCN
0.13       0.79       ...   GCN

Solution

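  • As pointed out by mwaskom, the solution is to use the scale parameter. With the default scale="area", every violin is drawn with the same area, so all widths share one density scale and the drawn width of the MLP violin depends on how sharply peaked the other distributions in the same plot are. Because all my distributions have the same number of samples, I simply passed scale="count" to sns.violinplot. (In seaborn 0.13 and later, this parameter is named density_norm.)

    scale : {"area", "count", "width"}, optional. The method used to scale the width of each violin. If area, each violin will have the same area. If count, the width of the violins will be scaled by the number of observations in that bin. If width, each violin will have the same width. (From the seaborn documentation.)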
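To see why the scaling rule matters, here is a minimal sketch in plain Python. It is a simplified model of the three scaling rules described in the documentation excerpt, not seaborn's actual implementation. The helper names (gaussian_kde, violin_widths, spread_samples), the fixed bandwidth of 0.05 and the synthetic evenly spaced samples are all assumptions made for illustration. It plots the same MLP samples next to a broad and then a sharply peaked companion distribution and compares the widest point of the MLP violin.

```python
import math


def gaussian_kde(samples, grid, bandwidth=0.05):
    """Evaluate a Gaussian kernel density estimate at each grid point."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)
        for x in grid
    ]


def violin_widths(groups, scale, grid):
    """Relative violin widths (0..1) per group under a simplified scaling rule."""
    densities = {name: gaussian_kde(s, grid) for name, s in groups.items()}
    if scale == "area":
        # One shared density scale: every violin is divided by the highest
        # peak found in ANY group, so each width depends on its neighbours.
        global_peak = max(max(d) for d in densities.values())
        return {n: [v / global_peak for v in d] for n, d in densities.items()}
    if scale == "width":
        # Each violin is normalised by its own peak only.
        return {n: [v / max(d) for v in d] for n, d in densities.items()}
    if scale == "count":
        # Own-peak normalisation, shrunk by the group's share of observations.
        max_n = max(len(s) for s in groups.values())
        return {
            n: [v / max(d) * len(groups[n]) / max_n for v in d]
            for n, d in densities.items()
        }
    raise ValueError(f"unknown scale: {scale!r}")


def spread_samples(center, spread):
    """Hypothetical data: 41 evenly spaced samples around `center`."""
    return [center + spread * (i / 10) for i in range(-20, 21)]


grid = [i / 100 for i in range(101)]  # y axis from 0 to 1
mlp = spread_samples(0.5, 0.1)        # the fixed distribution
plot_a = {"MLP": mlp, "GAT": spread_samples(0.5, 0.2)}   # broad companion
plot_b = {"MLP": mlp, "GAT": spread_samples(0.5, 0.02)}  # sharply peaked companion

for scale in ("area", "count"):
    widest_a = max(violin_widths(plot_a, scale, grid)["MLP"])
    widest_b = max(violin_widths(plot_b, scale, grid)["MLP"])
    print(f"{scale}: widest point of MLP violin = {widest_a:.2f} vs {widest_b:.2f}")
```

Under the "area" rule the MLP violin gets narrower when its neighbour is sharply peaked, even though the MLP data are unchanged. Under "count" with equal sample sizes (or under "width"), each violin is normalised by its own peak, so the MLP violin is identical in both plots.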