I have two dataframes. I need to compute their difference and then plot one of them stacked on top of that difference. Here is a minimal example:
import pandas as pd
import matplotlib.pyplot as plt
df1 = pd.DataFrame([[2,5,7,6,7],[4,4,4,4,3],[8,8,7,3,4],[16,10,12,13,16]], columns=["N", "A", "B", "C", "D"])
df2 = pd.DataFrame([[2,1,3,6,5],[4,1,2,3,2],[8,2,2,3,3],[16,8,10,3,11]], columns=["N", "A", "B", "C", "D"])
dfDiff = df1 - df2
dfDiff['N'] = df1['N']
# Individual barchart
colors = ['#6c8ebf', '#82b366', '#F7A01D', '#9876a7']
df1.set_index('N')[["A", "B", "C", "D"]].plot.bar(color=colors)
df2.set_index('N')[["A", "B", "C", "D"]].plot.bar(color=colors)
dfStacked = pd.DataFrame(columns=["N", "A", "A_diff", "B", "B_diff"])
dfStacked["N"] = df2["N"]
dfStacked["A"] = df2["A"]
dfStacked["B"] = df2["B"]
dfStacked["C"] = df2["C"]
dfStacked["D"] = df2["D"]
dfStacked["A_diff"] = dfDiff["A"]
dfStacked["B_diff"] = dfDiff["B"]
dfStacked["C_diff"] = dfDiff["C"]
dfStacked["D_diff"] = dfDiff["D"]
dfStacked.set_index('N').plot.bar(stacked=True)
plt.show()
The dataframes look like this:
The problem is that the new stacked chart ends up with everything merged into a single bar per "N". I want "A" stacked with "A_diff", "B" stacked with "B_diff", "C" stacked with "C_diff" and "D" stacked with "D_diff".
For example, I changed the code to do it with "A" and "A_diff" as dfStacked.set_index('N')[["A", "A_diff"]].plot.bar(stacked=True)
which looks correct, but I want A,B,C and D grouped by N like in the first two figures.
Do I need a new dataframe for this, like dfStacked? If so, in which form should the content be added? And how can I keep the same colors but add hatch="/" only for the "top" stacked bar?
Would it be better to have the dataframe in the form below?
df3 = pd.DataFrame(columns=["N", "Algorithm", "df1", "dfDiff"])
df3.loc[len(df3)] = [2, "A", 20, 10]
df3.loc[len(df3)] = [2, "A", 1, 4]
df3.loc[len(df3)] = [4, "A", 2, 3]
df3.loc[len(df3)] = [4, "A", 3, 4]
df3.loc[len(df3)] = [2, "B", 1, 3]
df3.loc[len(df3)] = [2, "B", 2, 4]
df3.loc[len(df3)] = [4, "B", 3, 3]
df3.loc[len(df3)] = [4, "B", 4, 2]
But how do I group them by "N" and "Algorithm"? Each row corresponds to one bar; the bars should be grouped by "N" with all the "Algorithms" side by side, and the last two columns are the two "parts" of each bar. Ideally the colors would match the first two figures (one per "Algorithm"), with the top part of each bar drawn with hatch="/", for example.
I'll start from df1 and df2 and build dfStacked in a slightly different way:
import pandas as pd

df1 = pd.DataFrame(
    [
        [2, 5, 7, 6, 7],
        [4, 4, 4, 4, 3],
        [8, 8, 7, 3, 4],
        [16, 10, 12, 13, 16]
    ],
    columns=["N", "A", "B", "C", "D"]
).set_index('N')

df2 = pd.DataFrame(
    [
        [2, 1, 3, 6, 5],
        [4, 1, 2, 3, 2],
        [8, 2, 2, 3, 3],
        [16, 8, 10, 3, 11]
    ],
    columns=["N", "A", "B", "C", "D"]
).set_index('N')

# Two-level columns: first level is the name (A..D), second is 'raw' or 'diff'
dfStacked = pd.concat(
    [df1, df1 - df2],
    axis=1,
    keys=['raw', 'diff']
).reorder_levels([1, 0], axis=1)
Now we have this DataFrame:
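A quick way to check the resulting column layout is to print the column tuples and select one first-level key. This is only a small inspection sketch that rebuilds dfStacked exactly as above; it adds nothing to the plotting approach itself.

```python
import pandas as pd

df1 = pd.DataFrame(
    [[2, 5, 7, 6, 7], [4, 4, 4, 4, 3], [8, 8, 7, 3, 4], [16, 10, 12, 13, 16]],
    columns=["N", "A", "B", "C", "D"],
).set_index("N")
df2 = pd.DataFrame(
    [[2, 1, 3, 6, 5], [4, 1, 2, 3, 2], [8, 2, 2, 3, 3], [16, 8, 10, 3, 11]],
    columns=["N", "A", "B", "C", "D"],
).set_index("N")

dfStacked = pd.concat(
    [df1, df1 - df2], axis=1, keys=["raw", "diff"]
).reorder_levels([1, 0], axis=1)

# Column tuples are (name, part): the name is the first level, 'raw'/'diff' the second
print(dfStacked.columns.tolist())

# Selecting a first-level key returns the two parts of that bar, indexed by N
print(dfStacked["A"])
```

Selecting dfStacked["A"] gives a two-column frame ('raw' and 'diff'), which is what the plotting loop relies on.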
To draw this data as a bar chart stacked by the first level of the columns, we can use two features of DataFrame.plot: ax and bottom. The first is the axes on which the bar plot should be drawn; the second gives the values at which the bottom edge of each bar should start. For details, run help(plt.bar) to read about bottom and help(pd.DataFrame.plot) to read about ax.
import matplotlib.pyplot as plt
from matplotlib.colors import TABLEAU_COLORS

plt.figure(figsize=(10, 7))
ax = plt.gca()

names = dfStacked.columns.levels[0]
n = len(names)
color = iter(TABLEAU_COLORS)
w = 1/(n+2)       # width
h = '/'*5         # hatch for diff values

for i, name in enumerate(names):
    c = next(color)   # color
    p = n/2 - i       # position
    # Lower part of the bar
    dfStacked[name]['raw'].plot.bar(
        ax=ax,
        position=p,
        width=w,
        color=c,
        label=f'{name} raw'
    )
    # Upper part: same color and position, hatched, starting where 'raw' ends
    dfStacked[name]['diff'].plot.bar(
        ax=ax,
        bottom=dfStacked[name]['raw'],
        hatch=h,
        position=p,
        width=w,
        color=c,
        label=f'{name} diff'
    )

# One tick per row of the frame, with a margin on each side
ax.set_xlim([-1, len(dfStacked)])
ax.tick_params(axis='x', rotation=0)
ax.legend()
And here's the output:
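For comparison, here is a minimal sketch of the same idea written directly against matplotlib's Axes.bar, using the four colors from the question. The helper name plot_grouped_stacked and its parameters are hypothetical, introduced only for this illustration. It also assumes df2 is the lower part of each bar, as in the question's dfStacked, so each bar's total height equals df1; the code above uses df1 as the lower part instead.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def plot_grouped_stacked(base, top, colors, hatch="/", ax=None):
    """Hypothetical helper: one bar per column of `base`, grouped by index,
    with the matching column of `top` stacked on it and hatched."""
    if ax is None:
        _, ax = plt.subplots(figsize=(10, 7))
    x = np.arange(len(base.index))        # one group per index value
    width = 0.8 / len(base.columns)       # each group fills 80% of its slot
    for i, (name, color) in enumerate(zip(base.columns, colors)):
        # Shift each column so the whole group is centred on its tick
        offset = (i - (len(base.columns) - 1) / 2) * width
        ax.bar(x + offset, base[name], width, color=color, label=name)
        ax.bar(x + offset, top[name], width, bottom=base[name],
               color=color, hatch=hatch, edgecolor="black",
               label=f"{name}_diff")
    ax.set_xticks(x)
    ax.set_xticklabels(base.index)
    ax.set_xlabel(base.index.name)
    ax.legend()
    return ax


df1 = pd.DataFrame(
    [[2, 5, 7, 6, 7], [4, 4, 4, 4, 3], [8, 8, 7, 3, 4], [16, 10, 12, 13, 16]],
    columns=["N", "A", "B", "C", "D"],
).set_index("N")
df2 = pd.DataFrame(
    [[2, 1, 3, 6, 5], [4, 1, 2, 3, 2], [8, 2, 2, 3, 3], [16, 8, 10, 3, 11]],
    columns=["N", "A", "B", "C", "D"],
).set_index("N")

colors = ["#6c8ebf", "#82b366", "#F7A01D", "#9876a7"]  # colors from the question
ax = plot_grouped_stacked(df2, df1 - df2, colors)
```

Call plt.show() afterwards to display the figure. The edgecolor is set so that the hatch lines are visible on top of the fill color; swapping the first two arguments for df1 and df1 - df2 should reproduce the stacking used in the answer above.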