I have a dataset df that looks like this:
ID Week VarA VarB VarC VarD
s001 w1 2 5 4 7
s001 w2 4 5 2 3
s001 w3 7 2 0 1
s002 w1 4 0 9 8
s002 w2 1 5 2 5
s002 w3 7 3 6 0
s001 w1 6 5 7 9
s003 w2 2 0 1 0
s003 w3 6 9 3 4
For each ID, I am trying to plot the progress of each variable (VarB, VarC, VarD) by Week, with VarA as the reference data.
I reshape the data with df.melt() to get the long format below. The plotting code that follows works.
ID Week Var Value
s001 w1 VarA 2
s001 w2 VarA 4
s001 w3 VarA 7
s002 w1 VarA 4
s002 w2 VarA 1
s002 w3 VarA 7
s001 w1 VarA 6
s003 w2 VarA 2
s003 w3 VarA 6
s001 w1 VarB 5
s001 w2 VarB 5
...
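For reference, here is a minimal sketch of the reshaping step. The question only shows df.melt(), so the id_vars, var_name and value_name arguments are assumptions inferred from the column names in the output above, and the definition of idlist is likewise a guess at what the loop iterates over. Only the first six rows of the sample data are used.

```python
import pandas as pd

# First rows of the sample data from the question (for illustration only).
df = pd.DataFrame({
    "ID":   ["s001", "s001", "s001", "s002", "s002", "s002"],
    "Week": ["w1", "w2", "w3", "w1", "w2", "w3"],
    "VarA": [2, 4, 7, 4, 1, 7],
    "VarB": [5, 5, 2, 0, 5, 3],
    "VarC": [4, 2, 0, 9, 2, 6],
    "VarD": [7, 3, 1, 8, 5, 0],
})

# Assumed melt arguments: keep ID and Week as identifiers and stack
# VarA..VarD into a 'Var' (name) column and a 'Value' column.
df_melt = df.melt(id_vars=["ID", "Week"], var_name="Var", value_name="Value")

# Assumed definition of idlist: the distinct IDs, in order of appearance.
idlist = df_melt["ID"].unique()

print(df_melt.head())
```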
Code:
for id in idlist:
    # get VarA into new df
    newdf = df_melt[df_melt.Var == 'VarA']
    # remove rows with VarA so it won't be included in facet_wrap()
    tmp = df_melt[df_melt.Var != 'VarA']
    plot2 = ggplot() + ggtitle(id) + labs(x='Week', y="Value") \
        + geom_point(newdf[newdf['ID'] == id], aes(x='Week', y='Value')) \
        + geom_point(tmp[tmp['ID'] == id], aes(x='Week', y='Value', color='Var')) \
        + theme(axis_text_x=element_text(rotation=45))
    print(plot2)
However, when I add facet_wrap('Var', ncol=3, scales='free'), I get the error below:
IndexError: arrays used as indices must be of integer (or boolean) type
I also couldn't connect the points with a line using geom_line().
Is this because different DataFrames are used? Is there a way to combine multiple geom_point() layers that use different DataFrames with facet_wrap in one ggplot object?
The problem in the question is caused by a bug in plotnine, which can be reproduced with the following code. The bug has been fixed, and the fix will be included in the next version of plotnine.
import pandas as pd
from plotnine import *

df1 = pd.DataFrame({
    'x': list("abc"),
    'y': [1, 2, 3],
    'g': list("AAA")
})
df2 = pd.DataFrame({
    'x': list("abc"),
    'y': [4, 5, 6],
    'g': list("AAB")
})

(ggplot(aes("x", "y"))
 + geom_point(df1)
 + geom_point(df2)
 + facet_wrap("g", scales="free_x")
)
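Until the fixed version is available, one possible approach is to keep every layer on a single DataFrame. The sketch below is an assumption-laden illustration, not something verified against the affected plotnine version: it keeps VarA as its own column while melting only VarB, VarC and VarD, so each facet panel can show VarA as the reference next to the faceted variable. The helper names to_long_with_reference and plot_id are hypothetical. Because Week is categorical, the sketch sets the group aesthetic explicitly for geom_line(); without a group, each x value is treated as its own group and the points are not connected.

```python
import pandas as pd


def to_long_with_reference(df):
    """Melt VarB..VarD to long format, keeping VarA as a column on every row."""
    return df.melt(id_vars=["ID", "Week", "VarA"],
                   var_name="Var", value_name="Value")


def plot_id(long_df, id_):
    """Build one faceted plot for a single ID (hypothetical helper)."""
    # plotnine is imported here so the reshaping step can run without it.
    from plotnine import (ggplot, aes, geom_point, geom_line, facet_wrap,
                          ggtitle, labs, theme, element_text)

    data = long_df[long_df["ID"] == id_]
    return (
        ggplot(data, aes(x="Week", y="Value"))
        + geom_point(aes(y="VarA"))                 # reference series, repeated in every panel
        + geom_point(aes(color="Var"))              # the variable shown in this panel
        + geom_line(aes(color="Var", group="Var"))  # explicit group so the line connects
        + facet_wrap("Var", ncol=3, scales="free")
        + ggtitle(id_)
        + labs(x="Week", y="Value")
        + theme(axis_text_x=element_text(rotation=45))
    )


# Rows for s001 from the question's sample data.
df = pd.DataFrame({
    "ID":   ["s001", "s001", "s001"],
    "Week": ["w1", "w2", "w3"],
    "VarA": [2, 4, 7],
    "VarB": [5, 5, 2],
    "VarC": [4, 2, 0],
    "VarD": [7, 3, 1],
})
long_df = to_long_with_reference(df)
```

Calling plot_id(long_df, "s001") returns the ggplot object for that ID, which can then be printed or saved inside the loop over IDs.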