python pandas matplotlib seaborn boxplot

Ordering boxplot x-axis in seaborn


My dataframe round_data looks like this:

      error                         username                    task_path
0      0.02  n49vq14uhvy93i5uw33tf7s1ei07vngozrzlsr6q6cnh8w...    39.png
1      0.10  n49vq14uhvy93i5uw33tf7s1ei07vngozrzlsr6q6cnh8w...    45.png
2      0.15  n49vq14uhvy93i5uw33tf7s1ei07vngozrzlsr6q6cnh8w...    44.png
3      0.25  xdoaztndsxoxk3wycpxxkhaiew3lrsou3eafx3em58uqth...    43.png
...     ...                                                ...       ...
1170  -0.11  9qrz4829q27cu3pskups0vir0ftepql7ynpn6in9hxx3ux...    33.png
1171   0.15  9qrz4829q27cu3pskups0vir0ftepql7ynpn6in9hxx3ux...    34.png


[1198 rows x 3 columns]

I want to have a boxplot showing the error of each user sorted by their average performance. What I have is:

ax = sns.boxplot(
    x='username', 
    y='error', 
    data=round_data,
    whis=np.inf,
    color='c',
    ax=ax
)

which results in this plot: [boxplot image]

How can I sort the x-axis (i.e., users) by mean error?


Solution

  • I figured out the answer:

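    # `batch`, `absolute_error`, and `i` do not appear in the question's excerpt;
    # presumably they come from the full dataset and an enclosing loop over batches.
    grouped = round_data[round_data.batch == i].groupby('username')
    # One column per user, then the per-user mean, sorted from lowest to highest.
    users_sorted_average = (
        pd.DataFrame({col: vals['absolute_error'] for col, vals in grouped})
        .mean()
        .sort_values(ascending=True)
    )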
    

    Passing users_sorted_average.index as the "order" parameter of the seaborn plot function gives the desired behavior:

    ax = sns.boxplot(
        x='username', 
        y='error', 
        data=round_data, 
        whis=np.inf,
        ax=ax,
        color='c',
        order=users_sorted_average.index,
    )
    

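    For reference, the same idea can be sketched end to end on the columns shown in the question. This is a minimal sketch, not the answer's exact code: the small round_data frame and its user names are made up for illustration, and it sorts by the mean of the 'error' column (as the question asks) rather than by 'absolute_error'. It relies only on groupby(...).mean().sort_values() and the "order" parameter of sns.boxplot.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for round_data; user names and values are invented.
round_data = pd.DataFrame({
    "username": ["alice", "alice", "bob", "bob", "carol", "carol"],
    "error": [0.30, 0.10, -0.20, 0.00, 0.05, 0.15],
})

# Mean error per user, sorted from lowest to highest.
mean_error = round_data.groupby("username")["error"].mean().sort_values()

# The sorted index is the category order for the x-axis.
user_order = mean_error.index.tolist()

fig, ax = plt.subplots()
sns.boxplot(
    x="username",
    y="error",
    data=round_data,
    order=user_order,  # boxes are drawn left to right in this order
    ax=ax,
)
fig.savefig("boxplot_sorted.png")
```

    Other arguments from the question (whis, color) can be passed alongside "order" unchanged. To sort by a different statistic, such as the median, swap .mean() for .median() before sorting.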