
How to display custom values on a bar plot


I'd like to do two things with a Seaborn bar chart, both using values that are in the dataframe but not plotted on the graph.

  1. I'd like to display the values of one field in a dataframe while graphing another. For example, below I'm graphing 'tip', but I would like to place the value of 'total_bill' centered above each bar (e.g., 325.88 above Friday, 1778.40 above Saturday, etc.).
  2. Is there a way to scale the colors of the bars, with the lowest value of 'total_bill' getting the lightest color (in this case Friday) and the highest value of 'total_bill' getting the darkest? I'd stick with one color (e.g., blue) for the scaling.

Others have flagged this as a duplicate of another question (or two), but I'm missing how to use a value that is not in the graph as the basis for the label or the shading. How do I say "use total_bill as the basis"? I couldn't figure it out from those answers.

Starting with the following code,

import pandas as pd
import seaborn as sns
%matplotlib inline

df = pd.read_csv("https://raw.githubusercontent.com/wesm/pydata-book/1st-edition/ch08/tips.csv", sep=',')
groupedvalues = df.groupby('day').sum().reset_index()
g = sns.barplot(x='day', y='tip', data=groupedvalues)

I get the following result:

[Image: bar plot of total tip by day]

Interim Solution:

for index, row in groupedvalues.iterrows():
    g.text(row.name, row.tip, round(row.total_bill, 2), color='black', ha="center")

[Image: the same bar plot with total_bill values above the bars]

On the shading, using the example below, I tried the following:

import pandas as pd
import seaborn as sns
%matplotlib inline

df = pd.read_csv("https://raw.githubusercontent.com/wesm/pydata-book/1st-edition/ch08/tips.csv", sep=',')
groupedvalues = df.groupby('day').sum().reset_index()

pal = sns.color_palette("Greens_d", len(groupedvalues))
rank = groupedvalues.argsort().argsort()
g = sns.barplot(x='day', y='tip', data=groupedvalues)

for index, row in groupedvalues.iterrows():
    g.text(row.name, row.tip, round(row.total_bill, 2), color='black', ha="center")

But that gave me the following error:

AttributeError: 'DataFrame' object has no attribute 'argsort'

So I tried a modification:

import numpy as np
import pandas as pd
import seaborn as sns
%matplotlib inline

df = pd.read_csv("https://raw.githubusercontent.com/wesm/pydata-book/1st-edition/ch08/tips.csv", sep=',')
groupedvalues = df.groupby('day').sum().reset_index()

pal = sns.color_palette("Greens_d", len(groupedvalues))
rank = groupedvalues['total_bill'].rank(ascending=True)
g = sns.barplot(x='day', y='tip', data=groupedvalues, palette=np.array(pal[::-1])[rank])

and that leaves me with

IndexError: index 4 is out of bounds for axis 0 with size 4


Solution

  • Stick with the approach from "Changing color scale in seaborn bar plot", which uses argsort to determine the order of the bar colors. In the linked question, argsort is applied to a Series, whereas here you have a DataFrame, which has no argsort method. Select a single column of the DataFrame and apply argsort to that.

    import seaborn as sns
    import matplotlib.pyplot as plt
    import numpy as np
    
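    df = sns.load_dataset('tips')
    # sum only the numeric columns; the categorical columns in this dataset cannot be summed
    groupedvalues = df.groupby('day').sum(numeric_only=True).reset_index()
    
    pal = sns.color_palette('Greens_d', len(groupedvalues))
    # argsort().argsort() on a single column gives each row's 0-based rank
    rank = groupedvalues['total_bill'].argsort().argsort()
    g = sns.barplot(x='day', y='tip', data=groupedvalues, palette=np.array(pal[::-1])[rank])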
    
    for index, row in groupedvalues.iterrows():
        g.text(row.name, row.tip, round(row.total_bill, 2), color='black', ha='center')
        
    plt.show()
    

    [Image: bar plot shaded by total_bill rank, with total_bill values above the bars]


    The second attempt works as well. The only issue is that the rank returned by rank() starts at 1 instead of 0, so subtract 1 from the array. Indexing requires integer values, so cast the result to int.

    rank = groupedvalues['total_bill'].rank(ascending=True).values
    rank = (rank-1).astype(int)
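
As a quick check that the two ranking approaches agree, here is a minimal sketch on a small Series. The totals are illustrative values (the Thur and Sun figures are assumptions, not taken from the question), and the variable names are made up for the example.

```python
import pandas as pd

# Illustrative totals only; the Thur and Sun values are assumed for this sketch
totals = pd.Series([1096.33, 325.88, 1778.40, 1627.16],
                   index=['Thur', 'Fri', 'Sat', 'Sun'])

# argsort().argsort() yields 0-based ranks (0 = smallest value)
rank_argsort = totals.argsort().argsort().to_numpy()

# rank() yields 1-based float ranks, so shift by one and cast to int
rank_method = (totals.rank(ascending=True).to_numpy() - 1).astype(int)

print(rank_argsort)
print(rank_method)
```

Both arrays can be used to index the palette. Note that with tied values rank() returns averaged (fractional) ranks by default, so the two approaches need not agree in that case.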
    

    • From matplotlib 3.4.0, there is .bar_label, which has a labels parameter for custom labels.
      • Other answers using .bar_label didn't customize the labels with labels=.
      • See this answer from May 16, 2021, for a thorough explanation of .bar_label with links to documentation and examples.
    • The day column loads as a category dtype, which keeps the days of the week in order. This also fixes the order of the bars on the x-axis and of the values in tb.
      • .bar_label adds labels from left to right, so the values in tb are in the same order as the bars.
      • If working with a column that isn't categorical, pd.Categorical can be used on the column to set the order.
    • In sns.barplot, estimator=sum is specified to sum tip. The default is mean.
    import matplotlib.pyplot as plt
    import numpy as np
    import seaborn as sns
    
    df = sns.load_dataset("tips")
    
    # sum total_bill by day
    tb = df.groupby('day').total_bill.sum()
    
    # get the colors in blues as requested
    pal = sns.color_palette("Blues_r", len(tb))
    
    # rank the total_bill sums (argsort().argsort() gives each day's 0-based rank)
    rank = tb.argsort().argsort()
    
    # plot
    fig, ax = plt.subplots(figsize=(8, 6))
    sns.barplot(x='day', y='tip', data=df, palette=np.array(pal[::-1])[rank], estimator=sum, ci=False, ax=ax)
    
    # 1. add labels using bar_label with custom labels from tb
    ax.bar_label(ax.containers[0], labels=tb, padding=3)
    
    # pad the spacing between the number and the edge of the figure
    ax.margins(y=0.1)
    
    plt.show()
    

    [Image: bar plot in shades of blue with total_bill values above the bars]
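
For the case mentioned above where the column is not already categorical, the ordering step might look like the following sketch. The dataframe and the day_order list are hypothetical, made up only to show pd.Categorical fixing the order that groupby (and therefore the labels) will follow.

```python
import pandas as pd

# Hypothetical frame whose 'day' column holds plain strings, not a category dtype
df = pd.DataFrame({
    'day': ['Sat', 'Thur', 'Sun', 'Fri', 'Sat'],
    'total_bill': [20.0, 10.0, 30.0, 5.0, 25.0],
})

# Set the order explicitly so grouped results and plotted bars share it
day_order = ['Thur', 'Fri', 'Sat', 'Sun']
df['day'] = pd.Categorical(df['day'], categories=day_order, ordered=True)

# Grouping on the categorical column returns the sums in category order
tb = df.groupby('day', observed=False)['total_bill'].sum()
print(tb)
```

Without the pd.Categorical step, groupby would sort the string labels alphabetically (Fri, Sat, Sun, Thur), and the labels passed to .bar_label could end up on the wrong bars if the plot used a different order.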