Tags: python, bokeh, scatter-plot, frequency, tfidfvectorizer

Bokeh graphs y_range coordinate off by half a coordinate


I am plotting the frequency of nouns in a dataframe using Bokeh. The data consists of companies and their patents, from which I extracted the nouns.

When I use a y_range of (0, 10), the data is displayed correctly. When I use the list of companies as the y_range, the points are offset by half a coordinate along the y-axis.

scatter = figure(plot_width=800, plot_height=200,
                 x_range=max_words,
                 y_range=companies,
                 tools=tools)


compared to

scatter = figure(plot_width=800, plot_height=200,
                 x_range=max_words,
                 y_range=(0, 10),
                 tools=tools)


Any suggestions on how to resolve this issue?


Solution

  • If you provide a list of categorical factors, e.g. y_range=companies, then the coordinate values in the data must also be those same (string) categorical factors, not numbers.

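    There is an underlying synthetic coordinate system for categorical ranges, which is why passing numbers "works" in any sense at all. In that system each factor is currently centered on a half-integer position (0.5, 1.5, ...), so integer y-values land half a step away from the labels, which is the offset you are seeing. But passing numbers is not the intended usage, and there is no guarantee that the mapping from categorical factors to (internal) synthetic numeric coordinates won't change at any time (i.e. it should not be relied upon).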

    See the User's Guide chapter Handling Categorical Data for more information and many examples.

    Alternatively, if you really want to keep numerical y-coordinates, you could use a FuncTickFormatter that converts the integer coordinates into company names to display, in order to "fake" a categorical y-axis.
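Both options above can be sketched roughly as follows. This is only a minimal illustration, not code from the question: the helper names (to_factors, categorical_scatter, tick_label_js) are made up, and it assumes the y-values are currently integer row indices into companies and that max_words is the list of words used for x_range.

```python
import json


def to_factors(y_indices, companies):
    """Translate numeric row indices into the string factors used by y_range.

    Assumes each y value is an integer index into `companies`.
    """
    return [companies[i] for i in y_indices]


def categorical_scatter(words, y_indices, max_words, companies, **figure_kwargs):
    """Option 1: plot with company names as the actual y-coordinates."""
    # Imported lazily so the pure-Python helpers work without Bokeh installed.
    from bokeh.plotting import figure

    # Size, tools, etc. can be passed through figure_kwargs.
    p = figure(x_range=max_words, y_range=companies, **figure_kwargs)
    # The y data now holds the same strings as y_range, so each point
    # sits on the row of its company label.
    p.scatter(x=words, y=to_factors(y_indices, companies), size=10)
    return p


def tick_label_js(companies):
    """Option 2: JavaScript body for a tick formatter on a numeric y-axis.

    The company names are embedded as a JSON array; `tick` is the variable
    the formatter exposes for the current tick value. Ticks with no
    matching name get an empty label.
    """
    return "const labels = %s; return labels[tick] || '';" % json.dumps(companies)
```

For the second option, the string returned by tick_label_js(companies) would be passed as the code argument of FuncTickFormatter and assigned to the y-axis formatter of a figure that keeps a numeric y_range, ideally with the ticker fixed to the integer positions. As far as I know, newer Bokeh releases (3.x) call this class CustomJSTickFormatter, so check which name your installed version provides.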