Tags: python, r, ggplot2, pandas, rpy2

Pandas stacked multilevel index plot


I would like to make a stacked bar plot for one index level while the other remains unstacked. The code below creates tuples for each index row:

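import matplotlib.pyplot as plt
from pandas import DataFrame, MultiIndex
from numpy import repeat
from numpy.random import randn

# Two index levels: letters (a, a, b, b) and bool (True, False, True, False)
arrays = [repeat('a b'.split(), 2), [True, False, True, False]]
midx = MultiIndex.from_tuples(list(zip(*arrays)), names=['letters', 'bool'])
df = DataFrame(randn(4, 2)**2 + 5, index=midx)

# Stacks columns 0 and 1 on top of each other for every (letters, bool) row
df.plot(kind='bar', stacked=True)
plt.legend(loc="center right", bbox_to_anchor=(1.5, 0.5), ncol=2)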


But I would rather see columns 0 and 1 side by side, each in its own panel, as when using this R code (in IPython):

# The R magic now ships with rpy2; older IPython releases used: %load_ext rmagic
%load_ext rpy2.ipython
dr = df.stack().reset_index()
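
For reference, a small sketch of what that reshape yields. The frame is rebuilt here so the snippet runs on its own, and the final rename is only illustrative (it mirrors the names the R code assigns); it is not part of the original post.

```python
import numpy as np
import pandas as pd

# Rebuild a frame shaped like the one in the question (values are random)
midx = pd.MultiIndex.from_product([['a', 'b'], [True, False]],
                                  names=['letters', 'bool'])
df = pd.DataFrame(np.random.randn(4, 2) ** 2 + 5, index=midx)

# stack() moves the column labels (0, 1) into a third index level;
# reset_index() then turns all three levels into ordinary columns
dr = df.stack().reset_index()

# Illustrative rename (assumption): the same names the R code assigns
dr.columns = ['letters', 'bool', 'n', 'value']
print(dr.head())
```

Each row of the result is one bar segment: a letter, a bool value, the original column it came from (`n`), and the value itself, which is the long format ggplot2 expects.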

and then

%%R -i dr

library(ggplot2)
names(dr) <- c('letters','bool','n','value')

x <- ggplot() +
  geom_bar(data=dr, aes(y = value, x = letters, fill = bool),
           stat="identity", position='stack') +
  theme_bw() +
  facet_grid( ~ n)

print(x)


Now: is there a way of doing this within pandas? Or should I torture matplotlib instead, install ggplot for Python, or just keep running ggplot2 in IPython via Rmagic (as I just did)? I could not get rpy2's ggplot class

from rpy2.robjects.lib import ggplot2

to work with my layout (yet).


Solution

  • If you already have the R code, porting it to rpy2 can be done progressively. A first step is to run the R code unchanged from Python:

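    import rpy2.robjects as ro
    from rpy2.robjects import pandas2ri
    from rpy2.robjects.conversion import localconverter
    
    # dr is a pandas DataFrame; the pandas converter turns it into an R data.frame
    with localconverter(ro.default_converter + pandas2ri.converter):
        ro.globalenv['dr'] = dr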
    
    ro.r("""
    library(ggplot2)
    names(dr) <- c('letters','bool','n','value')
    
    x <- ggplot() +
      geom_bar(data=dr, aes(y = value, x = letters, fill = bool), 
               stat="identity", position='stack') +
      theme_bw() + 
      facet_grid( ~ n)
    
    print(x)
    """)
    

    The drawback is that this puts `dr` into R's global environment (GlobalEnv). Wrapping the R code in a function is more elegant, since the data is then passed as an argument.

    make_plot = ro.r("""
    function(dr) {
      names(dr) <- c('letters','bool','n','value')
    
      x <- ggplot() +
        geom_bar(data=dr, aes(y = value, x = letters, fill = bool), 
                 stat="identity", position='stack') +
        theme_bw() + 
        facet_grid( ~ n)
    
      print(x)
    }""")
    
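    from rpy2.robjects import pandas2ri
    from rpy2.robjects.conversion import localconverter
    
    # dr is a pandas DataFrame; the pandas converter must be active for the call
    with localconverter(ro.default_converter + pandas2ri.converter):
        make_plot(dr)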
    

    An alternative is to use the ggplot2 mapping in rpy2 and build the plot without writing any R code:

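    import rpy2.robjects as ro
    from rpy2.robjects import Formula, pandas2ri
    from rpy2.robjects.conversion import localconverter
    from rpy2.robjects.lib.ggplot2 import ggplot, geom_bar, aes_string, theme_bw, facet_grid
    
    ## Convert the pandas DataFrame into an R data.frame explicitly
    ## (conversion API as of rpy2 3.5+; older releases name it differently)
    with localconverter(ro.default_converter + pandas2ri.converter):
        drr = ro.conversion.get_conversion().py2rpy(dr)
    
    ## The last two column names in the example are awkward; they can be fixed
    ## either in the pandas structure or, as here, by renaming on the R object
    drr.names[2] = 'n'
    drr.names[3] = 'value'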
    
    p = ggplot(drr) + \
        geom_bar(aes_string(x="letters", y="value", fill="bool"),
                 stat="identity", position="stack") + \
        theme_bw() + \
        facet_grid(Formula('~ n'))
    
    p.plot()
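
As for the "within pandas" part of the question, a rough sketch that approximates the faceted layout with plain pandas and matplotlib might look like the following. It is not part of the rpy2 approach above: the subplot arrangement, figure size, legend placement, and the headless `Agg` backend are all assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop this for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Rebuild a frame shaped like the one in the question (values are random)
midx = pd.MultiIndex.from_product([['a', 'b'], [True, False]],
                                  names=['letters', 'bool'])
df = pd.DataFrame(np.random.randn(4, 2) ** 2 + 5, index=midx)

# One panel per original column, roughly mimicking facet_grid(~ n)
fig, axes = plt.subplots(1, df.shape[1], sharey=True, figsize=(8, 4))
for ax, col in zip(axes, df.columns):
    # unstack('bool') gives rows = letters and columns = False/True,
    # so stacked=True stacks the two bool values within each letter
    df[col].unstack('bool').plot(kind='bar', stacked=True, ax=ax,
                                 title=f"n = {col}", legend=False)

# A single shared legend, built from the handles of the last panel
handles, labels = axes[-1].get_legend_handles_labels()
fig.legend(handles, labels, title='bool', loc='center right')
fig.tight_layout(rect=(0, 0, 0.88, 1))
```

The idea is simply that `unstack` on one index level decides what gets stacked, while the loop over columns decides what sits side by side.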