Tags: python, pandas, matplotlib, violin-plot

Pythonic way to plot a violin plot for ranges of data


I have a dataset that looks like this:

    x       y
    0.07    0.400000
    0.07    0.171429
    0.08    0.214286
    0.08    0.214286
    0.08    0.214286
    0.09    0.142857
    0.09    0.571429
    0.09    0.071429
    0.09    0.271429
    0.10    0.342857

I want to plot one violin per range of x, for example one for x from 0.07 to 0.08 and another for x from 0.09 to 0.10.

I'm using

    ax = sns.violinplot(x="x", y="y", data=df)

which, obviously, gives me one violin per distinct value of x. With the data above I get 4 violins.
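To make this reproducible, the sample table can be rebuilt as a DataFrame (a sketch; the name df simply matches the violinplot call above). Counting the distinct x values shows where the four violins come from, since x="x" treats every distinct value as its own category:

```python
import pandas as pd

# The sample data from the table above
df = pd.DataFrame({
    'x': [0.07, 0.07, 0.08, 0.08, 0.08, 0.09, 0.09, 0.09, 0.09, 0.10],
    'y': [0.400000, 0.171429, 0.214286, 0.214286, 0.214286,
          0.142857, 0.571429, 0.071429, 0.271429, 0.342857],
})

# One violin is drawn per distinct x value
print(df['x'].nunique())  # 4
print(df['x'].value_counts().sort_index())
```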


Solution

  • You could try pandas' cut to put the data into bins. The binned values can be passed directly as x, or stored in a new column:

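    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    import seaborn as sns
    
    # Random test data: x takes the values 0.06 ... 0.12
    df = pd.DataFrame({'x': np.random.randint(6, 13, 50) * 0.01,
                       'y': np.random.uniform(0, 1, 50)})
    # Bin edges 0.055, 0.075, ..., 0.135: each bin holds two adjacent x values
    ranges = np.arange(0.055, 0.14, 0.02)
    ax = sns.violinplot(x=pd.cut(df.x, ranges), y='y', data=df)
    # Label each bin with the x values it contains, e.g. "0.06-0.07"
    ax.set_xticklabels([f'{r + 0.005:.2f}-{r + 0.015:.2f}' for r in ranges[:-1]])
    plt.show()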
    

    [example plot]

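    PS: An adaptation to address the additional questions in the comments. It uses a separate first bin for 0.06 alone, stores the bins in a new category column, and appends each bin's share of the data points to its tick label: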

    df = pd.DataFrame({'x': np.random.randint(6, 13, 50) * 0.01,
                       'y': np.random.uniform(0, 1, 50)})
    # Edges 0.055, 0.065, 0.085, 0.105, 0.125: the first bin holds only 0.06,
    # the others hold two adjacent x values each
    ranges = np.append(0.055, np.arange(0.065, 0.14, 0.02))
    df['category'] = pd.cut(df.x, ranges)
    # Points per bin; observed=False also counts empty bins (as 0)
    counts = df.groupby('category', observed=False)['x'].count()
    
    ax = sns.violinplot(x='category', y='y', data=df, palette='Greens')
    labels = ['0.06'] + [f'{r + 0.005:.2f}-{r + 0.015:.2f}' for r in ranges[1:-1]]
    # Append each bin's share of all points to its tick label
    ax.set_xticklabels([f'{label}\n({count / counts.sum() * 100:.1f} %)' for label, count in zip(labels, counts)])
    plt.tight_layout()
    plt.show()
    

    [resulting plot]

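    To write the percentages onto the violins themselves, placed at each bin's mean y value (run this before plt.show()):

    counts = df.groupby('category', observed=False)['x'].count()
    means = df.groupby('category', observed=False)['y'].mean()
    # Put each bin's percentage (one decimal) at the mean y of that bin
    for i, (mean, count) in enumerate(zip(means, counts)):
        ax.text(i, mean, f'{count / counts.sum() * 100:.1f} %', ha='center', va='center', color='r')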
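A note on the bin edges: pd.cut uses right-closed intervals by default, and the answer places every edge (0.055, 0.075, ...) halfway between two neighbouring x values. Presumably this is so that bin membership never depends on a value like 0.07 comparing exactly equal to an edge. Below is a minimal pandas-only sketch of the same idea applied to the question's data and the two ranges asked for (0.07 to 0.08 and 0.09 to 0.10). The edge list, the column name x_range, and the use of the labels argument of pd.cut in place of the set_xticklabels step are my own choices, not part of the answer above:

```python
import pandas as pd

# The question's sample data
df = pd.DataFrame({
    'x': [0.07, 0.07, 0.08, 0.08, 0.08, 0.09, 0.09, 0.09, 0.09, 0.10],
    'y': [0.400000, 0.171429, 0.214286, 0.214286, 0.214286,
          0.142857, 0.571429, 0.071429, 0.271429, 0.342857],
})

# Edges sit halfway between x values, so no point lies on a boundary.
# Bins are right-closed: (0.065, 0.085] and (0.085, 0.105].
edges = [0.065, 0.085, 0.105]
df['x_range'] = pd.cut(df['x'], bins=edges, labels=['0.07-0.08', '0.09-0.10'])

# Number of points per range (in bin order) and their share of the total
counts = df['x_range'].value_counts(sort=False)
shares = counts / counts.sum() * 100
print(counts)
print(shares.round(1))
```

With the x_range column in place, sns.violinplot(x='x_range', y='y', data=df) should draw one violin per range, with the tick labels taken from the category names.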