bokeh - multiline line plot with flexible mapping


I have a Pandas DataFrame of the following format

name -  date   - score
 A   - 1/1/10  - 100
 A   - 1/2/10  - 200
 A   - 1/3/10  - 300
 B   - 1/1/10  - 150
 B   - 1/2/10  - 400
 B   - 1/3/10  - 600

I want to create a Bokeh plot that has date on the x axis, score on the y axis, and a separate line and colour for each name. I'm working from a Jupyter notebook.

Here's some test data, although I want something that works for an arbitrary number of values in name, rather than just A and B.

import pandas as pd
import datetime
test_data = {'name':['A','A','A','B','B','B'],
        'date':[datetime.date(2010,1,1),
               datetime.date(2010,2,1),
              datetime.date(2010,3,1),
              datetime.date(2010,1,1),
              datetime.date(2010,2,1),
              datetime.date(2010,3,1),],
        'score':[100,200,300,150,400,600]}

plot_df = pd.DataFrame(test_data)

Using Seaborn, I would do it like this.

import seaborn as sns
ax = sns.lineplot(data=plot_df, x='date',y='score',hue='name')

What is the most efficient way to do the same thing using Bokeh?

I can plot a single name like this.

import bokeh.plotting as bp
bp.output_notebook()

filtered_df = plot_df[plot_df['name'] == 'A'].sort_values(by=['date'])
plot_ds = bp.ColumnDataSource(filtered_df)
plot = bp.figure()
plot.line('date','score',source=plot_ds)
bp.show(plot)

I'm wondering how to get this to work for arbitrary number of different names. Again, I need it to be robust to a change in the number of distinct names.

I think I should use a colour mapper somehow, but I am confused about exactly how to incorporate it. I also see that there is another answer here that hardcodes the variable --> colour mapping, and I am trying to think of the easiest way of generalising this.

EDIT - the multiline chart would also need a legend for each name, similar to Seaborn example.

The next step would be getting this to work so that you can dynamically change the names and date range using slider + radio-buttons, but I want to get this simpler plot working first. This is why I'm not just sticking with Seaborn.


Solution

  • Maybe something like this (written for Bokeh 1.1.0; the legend keyword used here was later replaced by legend_label and legend_field):

    import pandas as pd
    import datetime
    import bokeh.plotting as bp
    from bokeh.palettes import Category10
    
    test_data = {'name': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'],
                 'date': [datetime.date(2010, 1, 1),
                          datetime.date(2010, 2, 1),
                          datetime.date(2010, 3, 1),
                          datetime.date(2010, 1, 1),
                          datetime.date(2010, 2, 1),
                          datetime.date(2010, 3, 1),
                          datetime.date(2010, 1, 1),
                          datetime.date(2010, 2, 1),
                          datetime.date(2010, 3, 1), ],
                 'score': [100, 200, 300, 150, 400, 600, 150, 250, 400]}
    
    plot_df = pd.DataFrame(test_data)
    gby = plot_df.groupby('name')
    names = list(gby.groups.keys())
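    # Category10 is keyed by palette size, starting at 3, so clamp the lower bound
    palette = Category10[max(len(names), 3)]
    
    # Map each name to its colour row by row, regardless of row order
    plot_df['color'] = plot_df['name'].map(dict(zip(names, palette)))
    
    plot = bp.figure(x_axis_type = 'datetime')
    # Draw one line per name; each group carries a single colour and name
    for name, d in plot_df.groupby('name'):
        plot.line('date', 'score', line_color = d['color'].iloc[0], line_width = 3, legend = name, source = d)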
    
    bp.show(plot)
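
For more recent Bokeh releases, the same idea can be sketched with an explicit loop. This is only a minimal sketch under a few assumptions: it assumes Bokeh 1.4 or later, where `legend_label` is available, and `make_line_plot` is a hypothetical helper name, not part of Bokeh. Cycling through `Category10[10]` is one way to cope with any number of names, at the cost of repeating colours once there are more than ten.

```python
import datetime
from itertools import cycle

import pandas as pd
from bokeh.palettes import Category10
from bokeh.plotting import figure


def make_line_plot(df):
    """Hypothetical helper: draw one line per distinct value of df['name']."""
    plot = figure(x_axis_type="datetime")
    # Category10[10] holds ten colours; cycle() reuses them if there are more names.
    colors = cycle(Category10[10])
    for (name, group), color in zip(df.groupby("name"), colors):
        group = group.sort_values("date")
        plot.line(
            pd.to_datetime(group["date"]),  # convert date objects to datetime64
            group["score"],
            line_color=color,
            line_width=3,
            legend_label=str(name),  # one legend entry per name
        )
    return plot


# Same shape as the test data in the question
plot_df = pd.DataFrame({
    "name": ["A", "A", "A", "B", "B", "B", "C", "C", "C"],
    "date": [datetime.date(2010, m, 1) for m in (1, 2, 3)] * 3,
    "score": [100, 200, 300, 150, 400, 600, 150, 250, 400],
})
plot = make_line_plot(plot_df)
```

In a notebook, calling `bokeh.plotting.show(plot)` after `output_notebook()` should display the result. Because the helper takes the DataFrame as an argument, it can be called again on a filtered frame when the set of names changes.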
    

    or using multi_line:

    import pandas as pd
    import datetime
    import bokeh.plotting as bp
    from bokeh.palettes import Category10
    from bokeh.models import ColumnDataSource
    
    test_data = {'name': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'],
                 'date': [datetime.date(2010, 1, 1),
                          datetime.date(2010, 2, 1),
                          datetime.date(2010, 3, 1),
                          datetime.date(2010, 1, 1),
                          datetime.date(2010, 2, 1),
                          datetime.date(2010, 3, 1),
                          datetime.date(2010, 1, 1),
                          datetime.date(2010, 2, 1),
                          datetime.date(2010, 3, 1), ],
                 'score': [100, 200, 300, 150, 400, 600, 150, 250, 400]}
    
    plot_df = pd.DataFrame(test_data)
    gby = plot_df.groupby('name')
    
    plot = bp.figure(x_axis_type = 'datetime')
    
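    # Take names from the groupby itself so they line up with x and y
    names = [name for name, sdf in gby]
    x = [list(sdf['date']) for i, sdf in gby]
    y = [list(sdf['score']) for i, sdf in gby]
    # Category10 is keyed by palette size, starting at 3, so clamp and then trim
    colors = list(Category10[max(len(names), 3)][:len(names)])
    source = ColumnDataSource(dict( x = x, 
                                    y = y, 
                                    legend = names, 
                                    color = colors))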
    plot.multi_line('x', 'y', legend = 'legend', line_color = 'color', line_width = 3, source = source)
    bp.show(plot)
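
One thing to watch with multi_line is that the x, y, legend and colour columns must all list the names in the same order. pandas `groupby` iterates over groups in sorted key order by default, whereas `unique()` returns values in order of first appearance, so the two can disagree when the rows are not sorted by name. Below is a minimal sketch of a helper that builds all four columns in a single pass over the groups. `multi_line_columns` is a hypothetical name, and the palette argument is assumed to be any list of colour strings (for example `Category10[10]`).

```python
import datetime

import pandas as pd


def multi_line_columns(df, palette):
    """Hypothetical helper: build aligned columns for a multi_line data source."""
    names, xs, ys = [], [], []
    # groupby yields groups in sorted key order, so every list shares that order
    for name, group in df.groupby("name"):
        group = group.sort_values("date")
        names.append(name)
        xs.append(group["date"].tolist())
        ys.append(group["score"].tolist())
    # Reuse palette entries if there are more names than colours
    colors = [palette[i % len(palette)] for i in range(len(names))]
    return dict(x=xs, y=ys, legend=names, color=colors)


# Rows deliberately out of order: B appears before A, and dates are reversed
example_df = pd.DataFrame({
    "name": ["B", "B", "A", "A"],
    "date": [datetime.date(2010, 2, 1), datetime.date(2010, 1, 1),
             datetime.date(2010, 2, 1), datetime.date(2010, 1, 1)],
    "score": [400, 150, 200, 100],
})
columns = multi_line_columns(example_df, ["#1f77b4", "#ff7f0e"])
```

The returned dict can be passed to `ColumnDataSource` in place of the hand-built one.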
    

    Result (both options): [image not reproduced]