Tags: pandas, sorting, seaborn, bar-chart, slice

Sns barplot does not sort sliced values


I want to draw a bar plot from a pandas DataFrame using sns.barplot. With the full data, everything works fine: PLOT 1

The associated code:

result = df.groupby(['Code departement']).size().sort_values(ascending=False)
x=result.index
y=result.values
plot=sns.barplot(x, y)
plot.set(xlabel='Code departement', ylabel='Nombre de transactions')
sns.barplot(x, y, data=df).set_title('title')

But as you can see in PLOT 1, there are too many bars, so I only want the 10 highest. When I slice x and y:

x=result[:10].index
y=result[:10].values
plot=sns.barplot(x, y)

the bars are drawn unordered, like this: PLOT 2

I checked by printing the sliced x and y, and they are in the right order. I don't know what I am missing. Thank you for your help.


Solution

  • You didn't state which version you are using, but it probably isn't the latest. Seaborn and matplotlib both receive substantial improvements with each new version.

    With seaborn 0.11.1 you'd get a warning, because x and y should preferably be passed as keywords, i.e. sns.barplot(x=x, y=y). The warning is meant to avoid confusion with the data= keyword. Apart from that, the numeric x-values would appear sorted numerically, not in the order of the sorted counts.

    The order can be controlled via the order= keyword; in this case, sns.barplot(x=x, y=y, order=x). To show only the 10 highest, pass sns.barplot(x=x, y=y, order=x[:10]).

    Also note that you are creating the bar plot twice (just to change the title?), which can be very confusing. Since sns.barplot returns the ax (the subplot onto which the plot has been drawn), the usual approach is ax = sns.barplot(...) followed by ax.set_title(...). (The name ax is preferred because it makes it easier to see how matplotlib and seaborn example code can be reused in new code.)

    The following example code has been tested with seaborn 0.11.1:

    import matplotlib.pyplot as plt
    import seaborn as sns
    import pandas as pd
    import numpy as np
    
    print(sns.__version__)
    df = pd.DataFrame({'Code departement': np.random.randint(1, 51, 1000)})
    result = df.groupby(['Code departement']).size().sort_values(ascending=False)
    x = result.index
    y = result.values
    # pass x and y as keywords (positional use triggers the warning mentioned above)
    ax = sns.barplot(x=x, y=y, order=x[:10])
    ax.set(xlabel='Code departement', ylabel='Nombre de transactions')
    ax.set_title('title')
    plt.show()
    

    [Figure: bar plot, only the 10 highest]
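As a variation on the example above, the slicing and the ordering can be wrapped in one small helper so that the same top-n selection feeds both the data and the order= keyword. This is only a minimal sketch: the helper names top_counts and plot_top_counts are hypothetical, the data is synthetic, and value_counts() is used here as a shorthand for the groupby/size/sort_values chain from the question.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns


def top_counts(df, column, n=10):
    """Return the n most frequent values of `column`, largest count first."""
    # value_counts() sorts by count in descending order by default
    return df[column].value_counts().head(n)


def plot_top_counts(df, column, n=10):
    """Draw a bar plot of the n most frequent values, keeping the count order."""
    top = top_counts(df, column, n)
    # order= fixes the left-to-right order of the bars explicitly,
    # so it does not depend on how seaborn orders numeric categories
    ax = sns.barplot(x=top.index, y=top.values, order=top.index.tolist())
    ax.set(xlabel=column, ylabel="Nombre de transactions")
    return ax, top


# Synthetic data standing in for the real table (an assumption for this sketch)
rng = np.random.default_rng(0)
df = pd.DataFrame({"Code departement": rng.integers(1, 51, 1000)})

ax, top = plot_top_counts(df, "Code departement", n=10)
ax.set_title("title")
```

Call plt.show() afterwards to display the figure. Because the plotted data and order= come from the same sliced series, the bars should appear from the highest count to the lowest.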