Tags: pandas, matplotlib, seaborn

Bar plots with label on top based on another column


I have this dataset and want to generate a bar plot in which each bar is labeled on top with its corresponding speedup.

    no_dev       fw           model          time    speedup
0        8  pytorch    efficientnet   2984.223210  12.802327
1        8  pytorch           vgg16   2583.343883   6.794499
2        8  pytorch        resnet50    442.069308  24.661291
3        4  pytorch    efficientnet   5318.658496   7.183202
4        4  pytorch        resnet50    695.629588  15.672134
5        4  pytorch           vgg16   4796.323589   3.659580
6        2  pytorch        resnet50   1041.627414  10.466314
7        2  pytorch           vgg16   9465.335288   1.854401
8        2  pytorch    efficientnet   9365.883145   4.079167
9        1  pytorch        resnet50  10902.000000   1.000000
10       1  pytorch    efficientnet  38205.000000   1.000000
11       1  pytorch           vgg16  17552.527806   1.000000

Here is my code:

import matplotlib.pyplot as plt
import seaborn as sns

df = df.sort_values(by=['model', 'speedup'], ascending=True)


# Create the figure and axes with the desired size
fig, ax = plt.subplots(figsize=(10, 6))

# Create the bar plot: one group per 'model', one bar per 'no_dev' within each group
bars = sns.barplot(data=df, x='model', y='time', hue='no_dev', dodge=True)

# Iterate over the bars and the DataFrame rows simultaneously

idx = 0
speedups = df['speedup'].values
print(speedups)
for bar in bars.patches:
    # Use the 'speedup' value from the DataFrame row for the label
    label = speedups[idx]
    print(label)
    
    # Annotate the bar with the 'speedup' value
    bars.annotate(f'{label:.2f}x',  # Formatting the label as a floating point with 'x' to denote speedup
                  xy=(bar.get_x() + bar.get_width() / 2, bar.get_height()),
                  xytext=(0, 3),  # 3 points vertical offset
                  textcoords="offset points",
                  ha='center', va='bottom')
    idx += 1

# Show the legend
plt.legend()

Running this code produces a plot in which the speedup labels are placed on the wrong bars.


Solution

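  • The bar patches are not stored in left-to-right order. They are grouped by hue level: first the bars of the first hue level across all x categories (the blue bars in your case), then those of the second level (orange), and so on.

    Your labels are sorted by model first, so they do not line up with that order. Sorting by 'no_dev' first and then by 'model' gives the labels in patch order. This works here because the hue levels are drawn in ascending 'no_dev' order and the models appear on the x-axis in alphabetical order: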

    speedups = df.sort_values(by=['no_dev', 'model'])['speedup'].values
    

    Output:

    [image: resulting plot]

    For reference, here is the order of the patches:

    [image: order of the patches]
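
    As an alternative that does not depend on how the label array is sorted, the labels can be attached one hue level at a time. The following is a minimal sketch, not part of the original answer. It assumes matplotlib 3.4 or later (for `Axes.bar_label`), and it assumes that seaborn creates one bar container per hue level, in ascending `no_dev` order, with the bars inside each container following the x-axis order. The `fw` column is omitted because it is not needed for the plot, and the names `models`, `devices` and `speedup_table` are introduced here for illustration only.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Data from the question (the constant 'fw' column is left out)
df = pd.DataFrame({
    "no_dev": [8, 8, 8, 4, 4, 4, 2, 2, 2, 1, 1, 1],
    "model": ["efficientnet", "vgg16", "resnet50",
              "efficientnet", "resnet50", "vgg16",
              "resnet50", "vgg16", "efficientnet",
              "resnet50", "efficientnet", "vgg16"],
    "time": [2984.223210, 2583.343883, 442.069308,
             5318.658496, 695.629588, 4796.323589,
             1041.627414, 9465.335288, 9365.883145,
             10902.0, 38205.0, 17552.527806],
    "speedup": [12.802327, 6.794499, 24.661291,
                7.183202, 15.672134, 3.659580,
                10.466314, 1.854401, 4.079167,
                1.0, 1.0, 1.0],
})

# Fix the x-axis order explicitly; hue levels are assumed to be drawn ascending
models = sorted(df["model"].unique())
devices = sorted(df["no_dev"].unique())

# Lookup table: one row per hue level, one column per x category
speedup_table = df.pivot(index="no_dev", columns="model", values="speedup")
speedup_table = speedup_table.loc[devices, models]

fig, ax = plt.subplots(figsize=(10, 6))
sns.barplot(data=df, x="model", y="time", hue="no_dev", order=models, ax=ax)

# Assumption: ax.containers holds one BarContainer per hue level
for container, dev in zip(ax.containers, devices):
    labels = [f"{s:.2f}x" for s in speedup_table.loc[dev]]
    ax.bar_label(container, labels=labels, padding=3)
```

    Iterating over `ax.containers` instead of `ax.patches` also avoids picking up any extra patches that are not bars, which can otherwise shift the labels. Call `plt.show()` or `fig.savefig(...)` afterwards to view the result.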