Tags: python, python-ggplot, plotnine

Plot multiple DataFrames in one plot with facet_wrap


I have a dataset df that looks like this:

ID      Week    VarA    VarB    VarC    VarD
s001    w1      2       5       4       7
s001    w2      4       5       2       3
s001    w3      7       2       0       1
s002    w1      4       0       9       8
s002    w2      1       5       2       5
s002    w3      7       3       6       0
s001    w1      6       5       7       9
s003    w2      2       0       1       0
s003    w3      6       9       3       4

For each ID, I am trying to plot the weekly progress of every variable (VarB, VarC, VarD), with VarA as the reference data.

I reshape the data with df.melt() into the long format below, and the plotting code that follows works.

ID     Week  Var  Value
s001    w1  VarA    2
s001    w2  VarA    4
s001    w3  VarA    7
s002    w1  VarA    4
s002    w2  VarA    1
s002    w3  VarA    7
s001    w1  VarA    6
s003    w2  VarA    2
s003    w3  VarA    6
s001    w1  VarB    5
s001    w2  VarB    5
...
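
The reshaping call itself is not shown in the question. A minimal sketch of what it might look like, assuming the column names from the first table and the Var and Value names used in the plotting code (the idlist definition is also an assumption):

```python
import pandas as pd

# First three rows of the question's table, enough to show the reshape
df = pd.DataFrame({
    "ID": ["s001", "s001", "s001"],
    "Week": ["w1", "w2", "w3"],
    "VarA": [2, 4, 7],
    "VarB": [5, 5, 2],
    "VarC": [4, 2, 0],
    "VarD": [7, 3, 1],
})

# Keep ID and Week as identifiers; stack VarA..VarD into Var/Value pairs
df_melt = df.melt(id_vars=["ID", "Week"], var_name="Var", value_name="Value")

# Assumed definition of the idlist that the plotting loop iterates over
idlist = df_melt["ID"].unique()

print(df_melt.head())
```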

Code:

from plotnine import ggplot, aes, geom_point, ggtitle, labs, theme, element_text

# get the VarA rows (the reference data) into a new df
newdf = df_melt[df_melt.Var == 'VarA']

# remove rows with VarA so they won't be included in facet_wrap()
tmp = df_melt[df_melt.Var != 'VarA']

# one plot per ID: reference points first, then the other variables colored by Var
for id in idlist:
    plot2 = (ggplot() + ggtitle(id) + labs(x='Week', y='Value')
             + geom_point(newdf[newdf['ID'] == id], aes(x='Week', y='Value'))
             + geom_point(tmp[tmp['ID'] == id], aes(x='Week', y='Value', color='Var'))
             + theme(axis_text_x=element_text(rotation=45)))

    print(plot2)

However, when I add facet_wrap('Var', ncol=3, scales='free'), I get the error below:

IndexError: arrays used as indices must be of integer (or boolean) type

I also couldn't connect the points with geom_line().

This is my expected output: (image not included)

Is this because the layers use different DataFrames? Is there a way to combine several geom_point() layers, each with its own DataFrame, with facet_wrap() in a single ggplot object?


Solution

  • The error in the question is caused by a bug in plotnine, which the following code reproduces. The bug has been fixed, and the fix will be included in the next plotnine release.

    import pandas as pd
    from plotnine import *
    
    df1 = pd.DataFrame({
        'x': list("abc"),
        'y': [1, 2, 3],
        'g': list("AAA")
    })
    
    df2 = pd.DataFrame({
        'x': list("abc"),
        'y': [4, 5, 6],
        'g': list("AAB")
    })
    
    (ggplot(aes("x", "y"))
     + geom_point(df1)
     + geom_point(df2)
     + facet_wrap("g", scales="free_x")
    )