I am trying to replicate the Bokeh bar chart with nested categories presented here:
https://docs.bokeh.org/en/latest/docs/user_guide/categorical.html
My starting point is the dataframe below:
import pandas as pd

df = pd.DataFrame(data=[['2019-01-01', 'A', 1],
                        ['2019-01-01', 'B', 2],
                        ['2019-01-01', 'C', 3],
                        ['2019-01-02', 'A', 4],
                        ['2019-01-02', 'B', 5],
                        ['2019-01-02', 'C', 6],
                        ['2019-01-03', 'A', 7],
                        ['2019-01-03', 'B', 8],
                        ['2019-01-03', 'C', 9]],
                  columns=['Date', 'Category', 'Count'])
I want to transform the data into a dictionary like the one below, but I am having difficulty with the conversion:
Category = ['A', 'B', 'C']
Data = {'Category': Category,
        '2019-01-01': [1, 2, 3],
        '2019-01-02': [4, 5, 6],
        '2019-01-03': [7, 8, 9]}
I tried DataFrame.to_dict with each of its orient options ('dict', 'list', 'series', 'split', 'records', 'index'), but none of them gave me the desired output.
My question:
How can I transform the dataframe into this dictionary?
Maybe this is not the best way to build the data for such a chart when starting from a dataframe; if so, how should this case be handled instead?
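For the dictionary exactly as asked, one option is to pivot the dataframe to wide form first, so that each date becomes its own column and to_dict(orient='list') then yields one list per date. This is a minimal sketch, assuming pandas' DataFrame.pivot and that each (Date, Category) pair appears only once (pivot raises a ValueError on duplicate entries):

```python
import pandas as pd

# Same data as in the question.
df = pd.DataFrame(
    data=[
        ["2019-01-01", "A", 1], ["2019-01-01", "B", 2], ["2019-01-01", "C", 3],
        ["2019-01-02", "A", 4], ["2019-01-02", "B", 5], ["2019-01-02", "C", 6],
        ["2019-01-03", "A", 7], ["2019-01-03", "B", 8], ["2019-01-03", "C", 9],
    ],
    columns=["Date", "Category", "Count"],
)

# Wide form: one row per Category, one column per Date (both sorted by pivot).
wide = df.pivot(index="Category", columns="Date", values="Count")

# orient="list" maps each column label (a date) to its list of values;
# the categories come from the pivoted index.
data = {"Category": wide.index.tolist(), **wide.to_dict(orient="list")}
print(data)
```

The values in each list follow the row order of the pivoted index, so they line up with data['Category'].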
You are missing the important point of that example: what you need to construct is a list of coordinates, which in this case is a list of (date, category) tuples, together with a list of the corresponding counts. These can be obtained with df.groupby in various ways; here is one:
In [26]: g = df.groupby(by=['Date', 'Category'])
In [27]: coords = list(g.groups.keys())
In [28]: counts = [float(g.get_group(x).Count.iloc[0]) for x in coords]
In [29]: coords
Out[29]:
[('2019-01-01', 'A'),
('2019-01-01', 'B'),
('2019-01-01', 'C'),
('2019-01-02', 'A'),
('2019-01-02', 'B'),
('2019-01-02', 'C'),
('2019-01-03', 'A'),
('2019-01-03', 'B'),
('2019-01-03', 'C')]
In [30]: counts
Out[30]: [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
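Calling get_group once per key works, but the same two lists can also be produced in a single grouped aggregation. A minimal sketch, assuming that summing is the right behaviour if a (Date, Category) pair ever occurs more than once; note the counts come out as integers rather than floats here:

```python
import pandas as pd

df = pd.DataFrame(
    data=[
        ["2019-01-01", "A", 1], ["2019-01-01", "B", 2], ["2019-01-01", "C", 3],
        ["2019-01-02", "A", 4], ["2019-01-02", "B", 5], ["2019-01-02", "C", 6],
        ["2019-01-03", "A", 7], ["2019-01-03", "B", 8], ["2019-01-03", "C", 9],
    ],
    columns=["Date", "Category", "Count"],
)

# One Series indexed by a sorted (Date, Category) MultiIndex.
s = df.groupby(["Date", "Category"])["Count"].sum()

coords = list(s.index)  # iterating a MultiIndex yields (date, category) tuples
counts = s.tolist()     # counts in the same order as coords
```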
These lists can then be passed to Bokeh:
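from bokeh.models import ColumnDataSource, FactorRange
from bokeh.plotting import figure, show

source = ColumnDataSource(data=dict(coords=coords, counts=counts))
# FactorRange(*coords) turns the (date, category) tuples into nested factors.
# Bokeh 3 removed plot_height; height works in Bokeh 2.4 and later.
p = figure(x_range=FactorRange(*coords), height=250, toolbar_location=None, tools="")
p.vbar(x='coords', top='counts', width=0.9, source=source)
p.y_range.start = 0
p.x_range.range_padding = 0.1
p.xaxis.major_label_orientation = 1
p.xgrid.grid_line_color = None
show(p)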
This produces a bar chart with the three dates as the outer factors and categories A, B and C nested under each date.
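If the data is already in the dictionary shape from the question, the same coords and counts lists can be derived from it with plain comprehensions, similar to how the Bokeh guide builds its factors. A minimal sketch, assuming every key other than 'Category' is a date and each date's list is ordered like Data['Category']:

```python
# The dictionary shape from the question.
Category = ["A", "B", "C"]
Data = {
    "Category": Category,
    "2019-01-01": [1, 2, 3],
    "2019-01-02": [4, 5, 6],
    "2019-01-03": [7, 8, 9],
}

# Every key except "Category" is treated as a date (dicts keep insertion order).
dates = [key for key in Data if key != "Category"]

# Nested factors: date is the outer level, category the inner level.
coords = [(date, cat) for date in dates for cat in Data["Category"]]

# Flatten the counts in the same date-major order so they line up with coords.
counts = [n for date in dates for n in Data[date]]
```

Both lists can then go into the same ColumnDataSource and FactorRange as above.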