Tags: python, matplotlib, seaborn, violin-plot

violinplot not correctly scaling by count


I am trying to scale my violin plot by count, but the three final violins, which are based on three data points each, are vastly bigger than the first three, which are based on many more.

My code is as follows:

fig = plt.figure(figsize=(20,10))
grid = plt.GridSpec(1, 1, wspace=0.15, hspace=0)

plotol= fig.add_subplot(grid[0,0])
olivine = sns.violinplot(x=olivinedata.Sample, y=olivinedata.FoContent, scale='count', hue=olivinedata.RimCore, order=["85B", "95B", "98", "LZa* (Tranquil)", "LZa* (Banded)", "LZb* ", "LZa", "LZb", "LZc"], ax=plotol)
plotol.set_xticklabels(plotol.get_xticklabels(), 
                          rotation=20, fontsize = 15,
                          horizontalalignment='right')
plotol.set_yticklabels(plotol.get_yticks(), size=15)


plotol.set_xlabel("Sample",size = 24,alpha=0.7)
plotol.set_ylabel("Fo# (mol. %)",size = 24,alpha=0.7)
plt.setp(plotol.get_legend().get_texts(), fontsize='22')
plotol.legend(title="Measurement Type")

I am also getting a warning message

UserWarning: FixedFormatter should only be used together with FixedLocator if sys.path[0] == '':

which results from inclusion of the line:

plotol.set_yticklabels(plotol.get_yticks(), size=15)

and I have no idea why. Any help is appreciated!

[Image: the violin plot produced by the code above]


Solution

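  • You probably want scale_hue=False. With hue set, the default scale_hue=True computes the scaling within each x category rather than across the whole plot, so the largest group in every category is drawn at full width, even if it holds only three points. (In seaborn 0.13 and later, scale and scale_hue are deprecated in favor of density_norm and common_norm.)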

    Here is a comparison of the scale options, with and without scale_hue:

    import matplotlib.pyplot as plt
    import pandas as pd
    import numpy as np
    import seaborn as sns
    
    # Three large samples (20 points each), two of them split between hue levels
    df1 = pd.DataFrame({'sample': np.repeat([*'ABC'], 20),
                        'hue': np.repeat([*'BBRBRB'], 10),
                        'val': np.random.uniform(10, 20, 60)})
    # Three tiny samples (3 points each), all with the same hue level
    df2 = pd.DataFrame({'sample': np.repeat([*'XYZ'], 3),
                        'hue': np.repeat([*'BBB'], 3),
                        'val': np.random.uniform(10, 20, 9)})
    fig, axes = plt.subplots(nrows=2, ncols=3, figsize=(24, 8))
    # Top row: scale_hue=True (the default); bottom row: scale_hue=False
    for row, scale_hue in zip([0, 1], [True, False]):
        # One column per scale option
        for ax, scale in zip(axes[row, :], ['area', 'count', 'width']):
            sns.violinplot(data=pd.concat([df1, df2]), x='sample', y='val', hue='hue',
                           scale=scale, scale_hue=scale_hue, ax=ax)
            ax.set_title(f"scale='{scale}', scale_hue={scale_hue}", size=16)
    plt.tight_layout()
    plt.show()
    

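    [Image: comparison of the violinplot scale options 'area', 'count' and 'width' (columns), with scale_hue=True (top row) and scale_hue=False (bottom row)]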
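To see why the three-point violins come out so large, it may help to look at the arithmetic behind scale='count' on its own. The following is a simplified sketch of the idea, not seaborn's actual implementation: the helper name relative_widths and the counts dictionary are made up for illustration, and the group sizes mirror the toy data above. It assumes each violin's maximum width is its observation count divided by a reference count, where the reference is the largest group in the same x category (scale_hue=True) or the largest group in the whole plot (scale_hue=False).

```python
# Simplified model of how scale='count' sizes violins when hue is used.
# Illustration only; relative_widths is a hypothetical helper, not a seaborn function.

def relative_widths(counts, scale_hue):
    """Return each violin's maximum width as a fraction of the full width.

    counts maps x category -> {hue level -> number of observations}.
    """
    global_max = max(n for hues in counts.values() for n in hues.values())
    widths = {}
    for category, hues in counts.items():
        # scale_hue=True compares only against groups in the same x category;
        # scale_hue=False compares against every group in the plot.
        reference = max(hues.values()) if scale_hue else global_max
        for hue, n in hues.items():
            widths[(category, hue)] = n / reference
    return widths


# Group sizes matching the toy data in the answer above.
counts = {
    "A": {"B": 20},
    "B": {"R": 10, "B": 10},
    "C": {"R": 10, "B": 10},
    "X": {"B": 3},
    "Y": {"B": 3},
    "Z": {"B": 3},
}

per_category = relative_widths(counts, scale_hue=True)
whole_plot = relative_widths(counts, scale_hue=False)

print("scale_hue=True :", per_category[("X", "B")])
print("scale_hue=False:", whole_plot[("X", "B")])
```

Under this model, sample X is drawn at full width with scale_hue=True because its three points are the largest (and only) group in that category, while with scale_hue=False it shrinks to 3/20 of the width of the largest violin.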