python, matplotlib, seaborn, boxplot

Adding multiple vertical lines on boxplot in seaborn


I am trying to draw multiple boxplots on one plot, and I would like to mark a threshold value for each distribution. I currently have this code:

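import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

# df is a long-form DataFrame with "category" and "value" columns
sns.set_style('white')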
palette = 'Pastel1'
plt.figure(figsize=(20,50))
plt.style.use("seaborn-whitegrid")
ax = sns.violinplot(y="category", x="value", data=df, hue="category", dodge=False,
                    palette=palette,
                    scale="width", inner=None)
xlim = ax.get_xlim()
ylim = ax.get_ylim()
for violin in ax.collections:
    bbox = violin.get_paths()[0].get_extents()
    x0, y0, width, height = bbox.bounds
    violin.set_clip_path(plt.Rectangle((x0, y0), width, height / 2, transform=ax.transData))

ax.set_alpha(0.6)
sns.boxplot(y="category", x="value", data=df, saturation=1, showfliers=False,
            width=0.1, boxprops={'zorder': 4, 'facecolor': 'white'}, ax=ax)
old_len_collections = len(ax.collections)
sns.stripplot(y="category", x="value", data=df, hue="category", palette=palette, dodge=False, ax=ax)
for dots in ax.collections[old_len_collections:]:
    dots.set_offsets(dots.get_offsets() + np.array([0.12, 0]))
ax.set_xlim(xlim)
ax.set_ylim(ylim)
ax.legend_.remove()
plt.show()

This draws a split box-violin plot like this: [image: split box-violin plot]

Now I would like to draw a short segment on each category at a value that is specific to that category, but I can't get axvline to work. How do I do it? I'm going for something like this (hand-drawn on the previous plot; the black bars are the ones I want to draw automatically):

[image: previous plot with a hand-drawn black bar on each category]


Solution

  • Here's a basic example that I think you can adapt for your needs:

    # create some mock data
    import numpy as np
    import pandas as pd
    import seaborn as sns
    
    df = pd.DataFrame(
        {
            "a": np.random.normal(0.1, 2.3, 1000),
            "b": np.random.chisquare(2, size=1000),
            "c": np.random.gamma(2, 2, size=1000),
        }
    )
    
    # create the violin plot
    ax = sns.violinplot(data=df, orient="h")
    
    # set the position of the thresholds for each category
    thresholds = {"a": 5.3, "b": 9.2, "c": 4.8}
    
    # get the y-positions of the tick label for each category
    ypos = {c.get_text(): y for c, y in zip(ax.get_yticklabels(), ax.get_yticks())}
    
    # plot the threshold lines
    for cat in df.columns:
        ax.plot(
            [thresholds[cat], thresholds[cat]],
            [ypos[cat] - 0.1, ypos[cat] + 0.1],
            color="k",
            lw=2,
        )
    

    This gives:

    [image: horizontal violin plot with a short black threshold marker on each category]

    You can adjust the colour, thickness, or extent of the threshold markers as you see fit.
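
For long-form data like that in the question (a "category" column and a "value" column), the same idea can be sketched as below. This is a minimal sketch under stated assumptions, not a definitive implementation: the mock data, the `thresholds` values, the `order` list and `half_height` are illustrative. Instead of reading tick labels, it assumes the category order is passed explicitly via `order`, so that seaborn places category `i` at `y = i`. It uses `ax.vlines`, which takes `ymin`/`ymax` in data coordinates. `ax.axvline`, by contrast, takes them as fractions of the axes height, which is one reason it is awkward for per-category segments.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Mock long-form data using the "category" / "value" column names from the question
rng = np.random.default_rng(0)
df = pd.DataFrame(
    {
        "category": np.repeat(["a", "b", "c"], 200),
        "value": np.concatenate(
            [
                rng.normal(0.1, 2.3, 200),
                rng.chisquare(2, 200),
                rng.gamma(2, 2, 200),
            ]
        ),
    }
)

# Illustrative per-category thresholds (assumed values)
thresholds = {"a": 5.3, "b": 9.2, "c": 4.8}

# Fix the category order explicitly so each category's position is known:
# with a categorical y-axis, category i is drawn at y = i
order = ["a", "b", "c"]
ax = sns.violinplot(data=df, x="value", y="category", order=order)

# One short vertical segment per category, centred on that category's row
x = [thresholds[cat] for cat in order]
y = np.arange(len(order))
half_height = 0.1  # half the marker length, in category units
markers = ax.vlines(x, y - half_height, y + half_height,
                    colors="k", linewidth=2, zorder=5)
```

Passing `ax=ax` to later `sns.boxplot` or `sns.stripplot` calls with the same `order` should keep the markers aligned with the other layers; the `zorder` keeps them drawn on top.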