Multiple overlapping seaborn violin plots, split by hue

I am trying to create overlapping, transparent violin plots, split by one variable, using seaborn in Python. My dataset looks like this:

The values of "name" are "one" to "nine", "distance" ranges from 0 to 1, "condition" is either "healthy" or "disease", and "sample_id" runs from 1 to 16. Each "condition" has 8 sample_ids.
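
The answer's diagnosis depends on this layout, in particular on every sample_id belonging to exactly one condition. A quick pandas check can confirm it. This is a sketch: the DataFrame built here is a hypothetical stand-in for my_dataset with the columns described above. With the real data, skip the construction and run only the two groupby lines.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for my_dataset: 16 samples, odd ids "disease", even ids "healthy"
rng = np.random.default_rng(0)
names = ["one", "two", "three", "four", "five", "six", "seven", "eight", "nine"]
my_dataset = pd.DataFrame({"sample_id": np.repeat(np.arange(1, 17), 20)})
my_dataset["name"] = rng.choice(names, len(my_dataset))
my_dataset["distance"] = rng.random(len(my_dataset))
my_dataset["condition"] = np.where(my_dataset["sample_id"] % 2 == 1, "disease", "healthy")

# Distinct conditions per sample_id: 1 everywhere means no sample mixes conditions
conditions_per_sample = my_dataset.groupby("sample_id")["condition"].nunique()
# Distinct sample_ids per condition: 8 each, according to the description
samples_per_condition = my_dataset.groupby("condition")["sample_id"].nunique()
print(conditions_per_sample.eq(1).all())
print(samples_per_condition)
```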

As you can see, the problem is that the two halves of the violin plots are on the wrong side for each of the "name" values, and the legend repeats the disease/healthy "condition" entries once for each of the 16 sample_ids.

The code that I am using for this is:

    import matplotlib.pyplot as plt
    import seaborn as sns

    my_condition_palette = {"disease": "darkorange", "healthy": "steelblue"}
    fig, ax = plt.subplots()
    for sample_id in my_ids:
        sns.violinplot(data=my_dataset[my_dataset.sample_id == sample_id], x="name", y="distance",
                       hue="condition", hue_order=["disease", "healthy"], palette=my_condition_palette,
                       cut=0, linewidth=0, inner=None, split=True, density_norm="count", common_norm=False, gap=0.1)
    for violin in ax.collections:
        violin.set_alpha(1 / 8)  # make the overlapping violins transparent

Does anyone know what I am doing wrong here? Or perhaps there is a better way of plotting this? Thank you!


  • With density_norm="count", the violin for the x value with the highest count (within the given sample_id) gets the maximum width, and the other violins are narrowed in proportion to their counts.

    In the given dataset, each sample_id seems to be either fully 'healthy' or fully 'disease'. When a single sample_id is drawn, seaborn sees only one hue level in that subset, so it does not dodge, and that one level occupies the full width at each x value. Setting dodge=True forces each violin to be narrowed and placed on the side that belongs to its hue level.

    For the legend, you can set legend=False for all except one of the sample_ids.

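    The following code creates reproducible test data and shows how everything could work. order= fixes the order of the x values, so every sample_id uses the same x positions even when its subset lists the names in a different order or lacks some of them.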

    from matplotlib import pyplot as plt
    import seaborn as sns
    import pandas as pd
    import numpy as np
    # first, create some dummy test data
    df = pd.DataFrame({'sample_id': np.repeat(np.arange(1, 17), 100)})
    names = ['one', 'two', 'three', 'four', 'five', 'six']
    prob = np.random.rand(len(names)) ** 2 + 0.1  # use different probabilities for each 'name'
    prob /= prob.sum()  # the probabilities need to sum to 1
    df['name'] = np.random.choice(names, len(df), p=prob)
    df['distance'] = np.random.rand(len(df))
    df['condition'] = np.where(df['sample_id'] % 2 == 1, 'disease', 'healthy')
    my_ids = df.sample_id.unique()
    my_condition_palette = {"disease": "darkorange", "healthy": "steelblue"}
    fig, ax = plt.subplots()
    for sample_id in my_ids:
        sns.violinplot(data=df[df['sample_id'] == sample_id], x="name", y="distance", order=names,
                       hue="condition", hue_order=["disease", "healthy"], palette=my_condition_palette,
                       cut=0, linewidth=0, inner=None, split=True, dodge=True,
                       density_norm="count", common_norm=False, gap=0.1,
                       legend=sample_id == my_ids[0], ax=ax)
    for violin in ax.collections:
        violin.set_alpha(1 / 8)
    sns.move_legend(ax, loc="upper left", bbox_to_anchor=(1, 1))
    ax.set_xlabel('')  # remove superfluous x label
    plt.show()

    PS: This is how the plot looks without dodge=True, plotting only the first sample. The "half" violins are rescaled to occupy the full width (0.8 by default) at each x value.
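
An alternative to enabling the legend for only the first sample_id is to pass legend=False to every violinplot call and build the legend once from matplotlib Patch proxy handles, one per palette entry. That way the legend does not depend on which sample happens to be drawn first. The sketch below assumes seaborn 0.13 or later (for density_norm, common_norm, gap and legend) and uses a smaller hypothetical dataset of 4 samples and 3 names; the alpha is derived from the number of samples per condition instead of being hard-coded to 1 / 8.

```python
from matplotlib import pyplot as plt
from matplotlib.patches import Patch
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical test data: 4 samples, odd ids "disease", even ids "healthy"
rng = np.random.default_rng(1)
names = ["one", "two", "three"]
df = pd.DataFrame({"sample_id": np.repeat(np.arange(1, 5), 60)})
df["name"] = rng.choice(names, len(df))
df["distance"] = rng.random(len(df))
df["condition"] = np.where(df["sample_id"] % 2 == 1, "disease", "healthy")

my_condition_palette = {"disease": "darkorange", "healthy": "steelblue"}
fig, ax = plt.subplots()
for sample_id, sample_df in df.groupby("sample_id"):
    # legend=False on every call, so seaborn adds no (duplicated) legend entries
    sns.violinplot(data=sample_df, x="name", y="distance", order=names,
                   hue="condition", hue_order=["disease", "healthy"], palette=my_condition_palette,
                   cut=0, linewidth=0, inner=None, split=True, dodge=True,
                   density_norm="count", common_norm=False, gap=0.1, legend=False, ax=ax)

# Transparency: 1 / (largest number of samples sharing one condition)
alpha = float(1 / df.groupby("condition")["sample_id"].nunique().max())
for violin in ax.collections:
    violin.set_alpha(alpha)

# One legend entry per condition, built from the palette rather than from the plotted artists
handles = [Patch(facecolor=color, label=condition) for condition, color in my_condition_palette.items()]
ax.legend(handles=handles, title="condition", loc="upper left", bbox_to_anchor=(1, 1))
ax.set_xlabel("")
```

Call plt.show() afterwards to display the figure. Since the handles come from the palette dictionary, the legend order follows the dictionary's insertion order.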

