
How to stop a violin plot from cutting off outliers or extreme values


I have a seaborn violin plot on the left and a matplotlib violin plot on the right.

As you can see, matplotlib cuts off some of the values/data, and setting showextrema=True or False has no effect. How do I make the matplotlib violin plot keep those values?

```python
import matplotlib.pyplot as plt
import seaborn as sns

a = [195.0, 245.0, 142.0, 237.0, 153.0, 238.0, 168.0, 145.0, 229.0, 138.0, 176.0, 116.0, 252.0, 148.0, 199.0, 162.0, 134.0, 163.0, 130.0, 339.0, 152.0, 208.0, 152.0, 192.0, 163.0, 249.0, 113.0, 176.0, 123.0, 189.0, 150.0, 207.0, 184.0, 153.0, 228.0, 153.0, 170.0, 118.0, 302.0, 197.0, 211.0, 159.0, 228.0, 147.0, 166.0, 156.0, 167.0, 147.0, 126.0, 155.0, 138.0, 159.0, 139.0, 111.0, 133.0, 134.0, 131.0, 156.0, 240.0, 207.0, 150.0, 207.0, 265.0, 151.0, 173.0, 157.0, 261.0, 186.0, 195.0, 158.0, 272.0, 134.0, 221.0, 131.0, 252.0, 148.0, 178.0, 206.0, 146.0, 217.0, 159.0, 190.0, 156.0, 172.0, 159.0, 141.0, 167.0, 168.0, 218.0, 191.0, 207.0, 164.0]

fig, ax = plt.subplots()

# Seaborn violin plot, drawn at x = 0
sns.violinplot(data=a, width=0.6, color="w", ax=ax)

# Matplotlib violin plot, drawn at x = 1 (to the right of the seaborn one)
ax.violinplot(a, positions=[1], widths=0.6, showmeans=True, showmedians=False, showextrema=False)

# Make sure both violins are visible and label them
ax.set_xlim(-0.5, 1.5)
ax.set_xticks([0, 1])
ax.set_xticklabels(["seaborn", "matplotlib"])
plt.show()
```


Solution

  • For a matplotlib violinplot, the KDE is evaluated only over the range of the input values, from their minimum to their maximum. This is fixed pretty deep in the code, and violinplot has no parameter to change it, so there is no easy option to extend the range.

    In contrast, the seaborn violinplot gives you good control over the KDE range. By default, it extends the KDE curve by twice the KDE bandwidth beyond the smallest and largest data points. This is controlled by the cut argument, sns.violinplot(..., cut=2), which defaults to 2. If you set cut=0, the violin is limited to the data range, which gives the same result as the matplotlib violinplot. Together with the option to set the KDE bandwidth manually as a float scale factor, sns.violinplot(..., bw=0.2, cut=2), you have very good control over how the violin plot is displayed. (In seaborn 0.13 and later, bw is deprecated in favor of bw_method and bw_adjust.)

    In conclusion, just use the seaborn violinplot if you need fine-grained control over the range of the KDE curve.
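If you would rather stay with matplotlib, one possible workaround (a sketch, not a built-in option) is to compute the KDE yourself and pass the precomputed statistics to Axes.violin, the lower-level method that violinplot uses to do the drawing. The helper names scott_bandwidth, gaussian_kde and violin_stats_with_cut are hypothetical. The KDE here is a plain Gaussian kernel with Scott's rule, which is assumed to approximate the default bandwidth of both libraries, and the cut parameter imitates seaborn's by extending the evaluation grid by cut bandwidths past the data minimum and maximum.

```python
import numpy as np
import matplotlib.pyplot as plt


def scott_bandwidth(data):
    """Kernel standard deviation from Scott's rule in 1-D: n**(-1/5) * sample std."""
    data = np.asarray(data, dtype=float)
    return data.size ** (-1 / 5) * data.std(ddof=1)


def gaussian_kde(data, grid, bandwidth):
    """Evaluate a 1-D Gaussian kernel density estimate of `data` at each point of `grid`."""
    data = np.asarray(data, dtype=float)
    z = (grid[:, None] - data[None, :]) / bandwidth
    return np.exp(-0.5 * z**2).sum(axis=1) / (data.size * bandwidth * np.sqrt(2 * np.pi))


def violin_stats_with_cut(data, cut=2.0, points=100):
    """Hypothetical helper: build one stats dict for Axes.violin whose KDE grid
    extends `cut` bandwidths beyond the data minimum and maximum (like seaborn's cut)."""
    data = np.asarray(data, dtype=float)
    bw = scott_bandwidth(data)
    coords = np.linspace(data.min() - cut * bw, data.max() + cut * bw, points)
    return {
        "coords": coords,                        # positions along the value axis
        "vals": gaussian_kde(data, coords, bw),  # density; Axes.violin rescales it to the violin width
        "mean": data.mean(),
        "median": np.median(data),
        "min": data.min(),
        "max": data.max(),
    }


# A few values taken from the question's list `a`; use the full list in practice.
a = [195.0, 245.0, 142.0, 237.0, 153.0, 238.0, 168.0, 145.0, 339.0, 111.0, 302.0, 134.0]

fig, ax = plt.subplots()
# Left: default violinplot, KDE evaluated only between min(a) and max(a)
ax.violinplot(a, positions=[0], widths=0.6, showmeans=True, showextrema=False)
# Right: precomputed stats with the grid extended by 2 bandwidths on each side
stats = violin_stats_with_cut(a, cut=2.0)
parts = ax.violin([stats], positions=[1], widths=0.6, showmeans=True, showextrema=False)
ax.set_xticks([0, 1])
ax.set_xticklabels(["violinplot", "violin, cut=2"])
plt.show()
```

With cut=0 the grid matches the data-limited range of violinplot; larger values show more of the tails. This keeps the plotting in matplotlib at the cost of maintaining your own KDE, which is why seaborn remains the simpler choice when you need this control.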