
How do the factors in factor_cmap in Bokeh work?


I am trying to construct a grouped vertical bar chart in Bokeh from a pandas dataframe. I'm struggling with understanding the use of factor_cmap and how the color mapping works with this function. There's an example in the documentation (https://docs.bokeh.org/en/latest/docs/user_guide/categorical.html#pandas) that was helpful to follow, here:

from bokeh.io import output_file, show
from bokeh.palettes import Spectral5
from bokeh.plotting import figure
from bokeh.sampledata.autompg import autompg_clean as df
from bokeh.transform import factor_cmap

output_file("bar_pandas_groupby_nested.html")

df.cyl = df.cyl.astype(str)
df.yr = df.yr.astype(str)

group = df.groupby(by=['cyl', 'mfr'])

index_cmap = factor_cmap('cyl_mfr', palette=Spectral5, factors=sorted(df.cyl.unique()), end=1)

p = figure(plot_width=800, plot_height=300, title="Mean MPG by # Cylinders and Manufacturer",
           x_range=group, toolbar_location=None, tooltips=[("MPG", "@mpg_mean"), ("Cyl, Mfr", "@cyl_mfr")])

p.vbar(x='cyl_mfr', top='mpg_mean', width=1, source=group,
       line_color="white", fill_color=index_cmap, )

p.y_range.start = 0
p.x_range.range_padding = 0.05
p.xgrid.grid_line_color = None
p.xaxis.axis_label = "Manufacturer grouped by # Cylinders"
p.xaxis.major_label_orientation = 1.2
p.outline_line_color = None

show(p)

This yields the following (again, a screenshot from the documentation): [image: Grouped Vbar output]

I think I understand how factor_cmap is working here. Each value in the dataframe's index is made up of multiple factors, and we take only the first by slicing (as seen with end=1). But when I instead try to color by the second index level, mfr (setting start=1, end=2), the index mapping breaks and I get this. I based this change on my assumption that the factors were hierarchical and that I needed to slice them to get the second level.

I think I must be thinking about the indexing with these categorical factors wrong, but I'm not sure what I'm doing wrong. How do I get a categorical mapper to color by the second level of the factor? I assumed the format of the factors was ('cyl', 'mfr') but maybe that assumption is wrong?

Here's the documentation for factor_cmap, although it wasn't very helpful: https://docs.bokeh.org/en/latest/docs/reference/transform.html#bokeh.transform.factor_cmap


Solution

  • If you mean you are trying this:

    index_cmap = factor_cmap('cyl_mfr', 
                             palette=Spectral5, 
                             factors=sorted(df.cyl.unique()), 
                             start=1, end=2)
    

    Then there are at least two issues:

    • 2 is out of bounds for the length of the list of sub-factors ('cyl', 'mfr'). You would just want start=1 and leave end with its default value of None (which means to the end of the list, as usual for any Python slice).

    • In this specific case, start=1 means "colormap based on the mfr sub-factors of the values", but you are still configuring the colormapper with the cylinders as the factors for the map:

      factors=sorted(df.cyl.unique())
      

      When the colormapper goes to look up a value with mfr="mazda" in the mapping, it does not find anything (because you only put cylinder values in the mapping) so it gets shaded the default color grey (as expected).
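
The lookup described above can be modeled in a few lines of plain Python. This is only a simplified sketch for illustration, not Bokeh's actual implementation: lookup_color, the placeholder colors, and the short factor lists are all hypothetical, and "gray" stands in for the default nan_color of factor_cmap.

```python
# Simplified model of a categorical color lookup (NOT Bokeh's real code).
NAN_COLOR = "gray"  # stands in for factor_cmap's default nan_color


def lookup_color(value, palette, factors, start=0, end=None):
    """Return the color for one categorical value such as ("4", "mazda")."""
    sub = value[start:end]                   # keep only the requested sub-factor levels
    key = sub[0] if len(sub) == 1 else sub   # a single level is matched as a plain string
    if key in factors:
        index = factors.index(key)
        if index < len(palette):             # factors beyond the palette get no color
            return palette[index]
    return NAN_COLOR


palette = ["red", "green", "blue"]           # placeholder colors
cyl_factors = ["4", "6", "8"]                # hypothetical cylinder factors
mfr_factors = ["ford", "mazda", "toyota"]    # hypothetical manufacturer factors
value = ("4", "mazda")

print(lookup_color(value, palette, cyl_factors, end=1))    # "4" is found among the cylinders
print(lookup_color(value, palette, cyl_factors, start=1))  # "mazda" is not a cylinder value
print(lookup_color(value, palette, mfr_factors, start=1))  # "mazda" is found among the manufacturers
```

In this model the second call falls back to the gray default because the sliced sub-factor and the configured factors come from different index levels, which is the mismatch described above.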

    So you could do something like this:

    index_cmap = factor_cmap('cyl_mfr', 
                             palette=Spectral5, 
                             factors=sorted(df.mfr.unique()), 
                             start=1)
    

    Which "works", modulo the fact that there are far more manufacturer values than there are colors in the Spectral5 palette.

    In a real situation you'll need to make sure you use a palette at least as big as the number of (sub-)factors that you configure.
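
One way to catch a too-small palette early is a small guard that runs before the colormapper is built. The sketch below is plain Python with a hypothetical helper name (palette_for) and placeholder colors and factors; it is not part of Bokeh.

```python
def palette_for(factors, palette):
    """Return one color per factor, or fail loudly if the palette is too small."""
    if len(palette) < len(factors):
        raise ValueError(
            f"palette has {len(palette)} colors but {len(factors)} factors were given"
        )
    return list(palette[:len(factors)])


five_colors = ["#111111", "#333333", "#555555", "#777777", "#999999"]  # placeholder colors
few_factors = ["ford", "mazda", "toyota"]                              # hypothetical factors

print(palette_for(few_factors, five_colors))  # three colors, one per factor

try:
    palette_for(["a", "b", "c", "d", "e", "f"], five_colors)  # six factors, five colors
except ValueError as err:
    print("too many factors:", err)
```

The returned list could then be passed as palette= together with the same factors list. For larger factor sets, bokeh.palettes also provides palette functions such as viridis(n) that return n colors.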