python: dataframe into dictionary


I am trying to replicate the Bokeh bar chart with nested categories presented here:

https://docs.bokeh.org/en/latest/docs/user_guide/categorical.html

My starting point is the DataFrame below:

import pandas as pd

test__df = pd.DataFrame(data=[['2019-01-01', 'A', 1],
                              ['2019-01-01', 'B', 2],
                              ['2019-01-01', 'C', 3],
                              ['2019-01-02', 'A', 4],
                              ['2019-01-02', 'B', 5],
                              ['2019-01-02', 'C', 6],
                              ['2019-01-03', 'A', 7],
                              ['2019-01-03', 'B', 8],
                              ['2019-01-03', 'C', 9]],
                        columns=['Date', 'Category', 'Count'])

I want to transform the data into a dictionary like the one below, but I am having difficulty with the conversion.

Category = ['A', 'B', 'C']

Data = {'Category' : Category,
        '2019-01-01'   : [1,2,3],
        '2019-01-02'   : [4,5,6],
        '2019-01-03'   : [7,8,9]}

I tried .to_dict with the different orient values ('dict', 'list', 'series', 'split', 'records', 'index'), but none of them gave me the desired output.

My question:

How can the DataFrame be transformed into this dictionary?

Maybe this is not the best way to build that kind of dictionary for the chart when starting from a DataFrame. If so, how can this case be handled better?


Solution

  • You are missing the important bit from that example: what you need to construct is a list of coordinates, which in this case is a list of (date, category) tuples, and a list of corresponding counts. These can be obtained with df.groupby in various ways (df here is the question's test__df). Here is one:

    In [26]: g = df.groupby(by=['Date', 'Category'])
    
    In [27]: coords = list(g.groups.keys())
    
    In [28]: counts = [float(g.get_group(x).Count.iloc[0]) for x in coords]
    
    In [29]: coords
    Out[29]:
    [('2019-01-01', 'A'),
     ('2019-01-01', 'B'),
     ('2019-01-01', 'C'),
     ('2019-01-02', 'A'),
     ('2019-01-02', 'B'),
     ('2019-01-02', 'C'),
     ('2019-01-03', 'A'),
     ('2019-01-03', 'B'),
     ('2019-01-03', 'C')]
    
    In [30]: counts
    Out[30]: [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
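
The same coordinates and counts can also be obtained without calling get_group for each key. The following is a minimal sketch, assuming that each (Date, Category) pair should contribute the sum of its Count values. With the sample data every pair occurs exactly once, so the sum is just that single value. The name totals is introduced here for illustration only.

```python
import pandas as pd

df = pd.DataFrame(
    data=[['2019-01-01', 'A', 1], ['2019-01-01', 'B', 2], ['2019-01-01', 'C', 3],
          ['2019-01-02', 'A', 4], ['2019-01-02', 'B', 5], ['2019-01-02', 'C', 6],
          ['2019-01-03', 'A', 7], ['2019-01-03', 'B', 8], ['2019-01-03', 'C', 9]],
    columns=['Date', 'Category', 'Count'])

# Sum Count per (Date, Category). The result is a Series indexed by a
# two-level MultiIndex, sorted by the group keys.
totals = df.groupby(['Date', 'Category'])['Count'].sum()

coords = list(totals.index)              # list of (Date, Category) tuples
counts = totals.astype(float).tolist()   # bar heights in the same order
```

Because both lists come from the same Series, they stay aligned by construction.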
    

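    These can then be used with Bokeh as follows:

    from bokeh.io import show
    from bokeh.models import ColumnDataSource, FactorRange
    from bokeh.plotting import figure

    # coords and counts come from the groupby step above
    source = ColumnDataSource(data=dict(coords=coords, counts=counts))
    
    # Bokeh 3.0 removed plot_height; height is the current name
    p = figure(x_range=FactorRange(*coords), height=250, toolbar_location=None, tools="")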
    
    p.vbar(x='coords', top='counts', width=0.9, source=source)
    
    p.y_range.start = 0
    p.x_range.range_padding = 0.1
    p.xaxis.major_label_orientation = 1
    p.xgrid.grid_line_color = None
    
    show(p)
    

    This produces a bar chart with the categories A, B, and C nested under each date.
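
For completeness, the dictionary layout asked for in the question can be sketched with a pivot. This is an illustration under assumptions rather than part of the answer above: it assumes every (Date, Category) pair occurs at most once, and the names wide, dates, x, and counts are introduced here only for the example.

```python
import pandas as pd

test__df = pd.DataFrame(
    data=[['2019-01-01', 'A', 1], ['2019-01-01', 'B', 2], ['2019-01-01', 'C', 3],
          ['2019-01-02', 'A', 4], ['2019-01-02', 'B', 5], ['2019-01-02', 'C', 6],
          ['2019-01-03', 'A', 7], ['2019-01-03', 'B', 8], ['2019-01-03', 'C', 9]],
    columns=['Date', 'Category', 'Count'])

# One row per Category, one column per Date, Count as the cell values.
wide = test__df.pivot(index='Category', columns='Date', values='Count')

Category = wide.index.tolist()
# to_dict(orient='list') maps each column label (a date) to its list of values.
Data = {'Category': Category, **wide.to_dict(orient='list')}

# Flatten the dictionary into the (date, category) coordinates and counts
# that the nested bar chart needs.
dates = list(wide.columns)
x = [(date, cat) for date in dates for cat in Category]
counts = [Data[date][i] for date in dates for i in range(len(Category))]
```

If the data could contain repeated (Date, Category) pairs, pivot raises an error. In that case pivot_table with an aggregation such as aggfunc='sum' may be the better fit.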