I'm trying to create vbar graph like from bokeh documentation on graph :mean MPG by #cylinders and manufacturer (https://docs.bokeh.org/en/latest/docs/user_guide/categorical.html#userguide-categorical), but keep getting error (BAD_COLUMN_NAME): Glyph refers to nonexistent column name: value".
Here is my csv file:
id,name,value,vendorname
1, contract abc, "2,000,500.00", company x
2, contract bcd, "1,300,500.00", company y
3, contract cde, "1,344,000.00", company x
4, contract def, "400,000.00", company z
5, contract efg, "566,000.00", company s
and so on....
here is the code:
from bokeh.io import show, output_file
from bokeh.palettes import Viridis256
from bokeh.transform import factor_cmap
import pandas as pd
from bokeh.core.properties import value
from bokeh.models import FactorRange, ColumnDataSource
from bokeh.palettes import Spectral5
from bokeh.plotting import figure, show
from bokeh.embed import components
df = pd.read_csv('contract.csv')
group = df.groupby(['vendorname', 'name'])
index_cmap = factor_cmap('vendorname_name', palette=Viridis256, factors=sorted(df.vendorname.unique()), end=1)
p = figure(plot_width=1000, plot_height=1500, title="Value Contract by # Contract and Vendor", x_range=group, toolbar_location=None,
tooltips=[("Value", "@value"), ("vendorname, name", "@vendorname_name")])
p.vbar(x='vendorname_name', top='value', width=1, source=group, line_color="white", fill_color=index_cmap, )
p.y_range.start = 0
p.x_range.range_padding = 0.05
p.xgrid.grid_line_color = None
p.xaxis.axis_label = "Contract grouped by # Vendor"
p.xaxis.major_label_orientation = 1.2
p.outline_line_color = None
output_file("contract.html")
show(p)
When you perform a groupby
on a dataframe, there are only aggregates for the group available:
In [3]: source = ColumnDataSource(group)
In [4]: source.data
Out[4]:
{'id_count': array([1, 1, 1], dtype=object),
'id_unique': array([1, 1, 1], dtype=object),
'id_top': array([' "2', ' "1', ' "1'], dtype=object),
'id_freq': array([1, 1, 1], dtype=object),
'value_count': array([1, 1, 1], dtype=object),
'value_unique': array([1, 1, 1], dtype=object),
'value_top': array(['500.00"', '000.00"', '500.00"'], dtype=object),
'value_freq': array([1, 1, 1], dtype=object),
'vendorname_name': array([(' company x', '000'), (' company x', '344'),
(' company y', '300')], dtype=object)}
But notice these values are junk, because your read_csv
is also not working. In your data, the value
column is an array of strings of poorly formatted numbers. The commas in the numbers confuse pandas with the basic code you have above. You need to follow the advice in https://stackoverflow.com/a/22137890/3406693 to read in that column as actual numbers. However, your CSV is malformed, with extra space around the commas, this also confuses pandas. I was not able to get pandas to read in the dataframe correctly until i fixed up the CSV to remove the spaces:
id,name,value,vendorname
1,contract abc,"2,000,500.00",company x
2,contract bcd,"1,300,500.00",company y
3,contract cde,"1,344,000.00",company x
4,contract def,"400,000.00",company z
5,contract efg,"566,000.00",company s
Then, this read_csv
command
df = pd.read_csv('contract.csv', thousands=",", quotechar='"', quoting=1)
Yields an intelligible dataframe:
In [3]: df
Out[3]:
id name value vendorname
0 1 contract abc 2000500.0 company x
1 2 contract bcd 1300500.0 company y
2 3 contract cde 1344000.0 company x
3 4 contract def 400000.0 company z
4 5 contract efg 566000.0 company s
It's possible you could get around having to fix up the CSV by telling Pandas explicitly what the column types are.
Anyway after all this you can plot using value_mean
from the group:
p.vbar(x='vendorname_name', top='value_meamn', ...) # use value_mean
Which yields: