I have a list of method names and their performance on a test set, which I want to show as a bar chart. More precisely, I would like to plot their relative improvement or degradation with respect to the baseline model. The data looks like this:
system_1,+2.5
system_2,-0.8
system_3,+0.24
I've tried the bar chart in seaborn, which gives me a simple bar chart with a fixed color. What I am looking for is a bar chart whose colours range over red, white and green, where red corresponds to data['score'].min(), white corresponds to 0 and green corresponds to data['score'].max(). I would like the darkness of each colour to show its distance from 0: dark red for the worst system, dark green for the best-performing system, and lighter colours for all the performances in between.
I've found some solutions to make gradient colours, but they don't do what I expect. Here is my code and the chart that I get.
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
import sys
import numpy as np
sns.set(style="whitegrid", color_codes=True)
data = pd.read_csv(sys.argv[1])
pal = sns.color_palette("Greens_d", len(data))
colors = [0 if c >=0 else 1 for c in data['performance']]
ax = sns.barplot(x="performance", y="System", data=data, palette=pal)
plt.tight_layout()
plt.show()
As you can see, instead of choosing each colour from the value of the data point, it varies the colours by the index of the data point. Do you have any ideas on how to fix this?
Thank you very much!
[edit in 2023: matplotlib's DivergingNorm has been renamed to TwoSlopeNorm, with the same functionality.]
The following approach uses a diverging norm and the red-yellow-green colormap to map the lowest value to the red extreme, zero to yellow and the highest to green.
As the short bars get a very light color, a black edge is added to make every bar clearly visible.
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.colors import TwoSlopeNorm
import numpy as np
sns.set(style='whitegrid', color_codes=True)
N = 11
data = pd.DataFrame({'System': [f'System {i}' for i in range(1, N + 1)],
'performance': np.random.uniform(-1.5, 2.5, N)})
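# Map [min, 0] to [0, 0.5] and [0, max] to [0.5, 1], so that zero lands on the colormap's center
norm = TwoSlopeNorm(vmin=data.performance.min(), vcenter=0, vmax=data.performance.max())
# One RGBA color per bar, looked up from the red-yellow-green colormap
colors = [plt.cm.RdYlGn(norm(c)) for c in data['performance']]
ax = sns.barplot(x='performance', y='System', data=data, palette=colors, edgecolor='black')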
plt.tight_layout()
plt.show()
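To see why zero ends up in the middle of the colormap, here is a minimal pure-Python sketch of the piecewise-linear mapping that the norm applies. The helper name two_slope is hypothetical and only mirrors the idea; in real code you would keep using matplotlib's TwoSlopeNorm. The three scores are the sample rows from the question.

```python
def two_slope(value, vmin, vcenter, vmax):
    """Map vmin -> 0.0, vcenter -> 0.5, vmax -> 1.0, linearly on each side."""
    if value < vcenter:
        # Negative side: stretch [vmin, vcenter] onto [0.0, 0.5]
        return 0.5 * (value - vmin) / (vcenter - vmin)
    # Positive side: stretch [vcenter, vmax] onto [0.5, 1.0]
    return 0.5 + 0.5 * (value - vcenter) / (vmax - vcenter)


# Sample scores from the question: system_1, system_2, system_3
scores = [2.5, -0.8, 0.24]
positions = [two_slope(s, min(scores), 0.0, max(scores)) for s in scores]
print(positions)
```

The best system lands at position 1 (the green end), the worst at 0 (the red end), and +0.24 lands just above the midpoint, so it gets a pale color. Note that this only works when the data contains both negative and positive values, since the minimum must lie below the center and the maximum above it.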
Seaborn's diverging_palette can be used to create a color palette given two hue values. A hue of 0 is red, a hue of 150 is green. By default, the center is white. You can experiment with the saturation (for example s=80) and the lightness (for example l=55).
red_green_pal = sns.diverging_palette(0, 150, n=256, as_cmap=True)
colors = [red_green_pal(norm(c)) for c in data['performance']]
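For completeness, here is a self-contained sketch that puts the red-white-green palette together with the norm, using the three sample rows from the question. The column names System and performance and the helper value_colors are assumptions for illustration. The bars are drawn with matplotlib's barh rather than sns.barplot, which is a swapped-in choice that lets each bar take its color directly.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from matplotlib.colors import TwoSlopeNorm

# Sample rows from the question; the column names are assumed
data = pd.DataFrame({
    "System": ["system_1", "system_2", "system_3"],
    "performance": [2.5, -0.8, 0.24],
})


def value_colors(values, cmap):
    """Return one RGBA color per value, with 0 pinned to the colormap's center."""
    norm = TwoSlopeNorm(vmin=values.min(), vcenter=0, vmax=values.max())
    return cmap(norm(values))


# Red (hue 0) to green (hue 150) with a light center
red_green = sns.diverging_palette(0, 150, s=80, l=55, n=256, as_cmap=True)
colors = value_colors(data["performance"].to_numpy(), red_green)

fig, ax = plt.subplots()
ax.barh(data["System"], data["performance"], color=colors, edgecolor="black")
ax.axvline(0, color="black", linewidth=0.8)  # mark the baseline
ax.invert_yaxis()  # keep the first system at the top
fig.tight_layout()
```

Call plt.show() afterwards to display the figure. As with the norm in general, this assumes the scores include at least one negative and one positive value.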