Tags: python, python-3.x, plotly, sankey-diagram

How can we format numbers in a Sankey chart and set labels outside of the chart?


I've got some simple code that produces a nice Sankey chart.

import holoviews as hv
import plotly.graph_objects as go
import plotly.express as pex
hv.extension('bokeh')


sankey1 = hv.Sankey(df_final, kdims=['Sub_Market', 'Sport League'], vdims=["Revenue"])
sankey1.opts(cmap='Colorblind', label_position='right',
             edge_color='Sub_Market', edge_line_width=0,
             node_alpha=1.0, node_width=40, node_sort=True,
             width=800, height=600, bgcolor="snow",
             title="Flow of Revenue between Sub Market and Conference")


Unfortunately, the numbers are displayed in exponential notation. I really want them displayed in millions. Also, is there a way to show the right-hand labels on the right and the left-hand labels on the left, so that they are all outside the chart and easier to read?


Solution

  • First, HoloViews allows the configuration of custom formatters for dimensions.

    To render the numbers as-is, you can use the built-in str function as the formatter for the dimension.

    I have used a sample DataFrame to show how this can be achieved. You can run it in a Colab notebook.

    import holoviews as hv
    from holoviews.core import Store
    import pandas as pd
    
    # Load the bokeh backend and make it the active one
    hv.ipython.notebook_extension('bokeh')
    Store.set_current_backend('bokeh')
    
    # Sample data standing in for the asker's df_final
    df_final = pd.DataFrame({
        'Sub_Market': ['Central Texas', 'Southern California', 'Florida'],
        'Sport League': ['MLS', 'NBA', 'MLS'],
        'Revenue': [1.4981211 * 10**5, 2.921212 * 10**6, 1.2121112 * 10**6]
    })
    
    # value_format=str formats each Revenue value with str()
    graph = hv.Sankey(
        df_final,
        kdims=['Sub_Market', 'Sport League'],
        vdims=[hv.Dimension("Revenue", value_format=str)],
    )
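The question asks for values in millions rather than the raw numbers that str produces. Below is a minimal sketch of one way to do that, assuming value_format accepts any callable that takes a number and returns a string (as str does above). format_millions is a hypothetical helper name, not part of HoloViews.

```python
def format_millions(value):
    """Format a raw number as millions with two decimals, e.g. 2921212 -> '2.92M'."""
    return f"{value / 1e6:.2f}M"


# Sample Revenue values matching the DataFrame above
for revenue in (149812.11, 2921212.0, 1212111.2):
    print(format_millions(revenue))
```

Under that assumption, passing hv.Dimension("Revenue", value_format=format_millions) in place of value_format=str should label the nodes in millions.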
    

    Next, to customise the position of the labels, you need the rendered plot.

    Since we are using bokeh as the backend, we can get the plot by passing the graph object to the get_plot method of the bokeh renderer.

    renderer = Store.renderers['bokeh']
    plot = renderer.get_plot(graph)
    

    Now we can access the plot handles that we wish to customise. The default x_offset value applied to all labels is 0, so we only need to apply offsets to the left labels.

    To do so, we augment the data source for the labels with an 'x_offset' field and set the offset for the labels that we wish to position on the left side of the quads.

    We also need to move the start of the plot's x_range so that the shifted labels are not cut off.

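    offset = -200  # horizontal shift in pixels; negative moves labels to the left
    label_source = plot.handles['text_1_source']
    num_nodes = len(label_source.data['x'])
    num_left_nodes = 3  # the three Sub_Market nodes come first in this example
    # Shift only the left-hand labels; all other labels keep the default offset of 0
    label_source.data['x_offset'] = [offset] * num_left_nodes + [0] * (num_nodes - num_left_nodes)
    # Make the text glyph read its offset from the new column
    plot.handles['text_1_glyph'].x_offset = {'field': 'x_offset'}
    # Extend the left edge of the x-range so the shifted labels are not cut off
    plot.handles['plot'].x_range.start += 2 * offset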
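Hardcoding the number of left nodes works for the sample data but has to be adjusted for every dataset. The sketch below derives the per-label offsets from the label x positions instead. It assumes that all labels in the left-hand column share the smallest value in text_1_source.data['x'], which has not been verified against HoloViews internals. left_label_offsets is a hypothetical helper name.

```python
import math


def left_label_offsets(xs, offset):
    """Return `offset` for labels in the left-most column and 0 for all others."""
    left_x = min(xs)
    return [offset if math.isclose(x, left_x) else 0 for x in xs]


# Illustrative x positions: three labels in the left column, two in the right
print(left_label_offsets([15.0, 15.0, 15.0, 1000.0, 1000.0], -200))
```

If the assumption holds, the returned list could be assigned to plot.handles['text_1_source'].data['x_offset'] in place of the manual slice.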
    

    Finally, we can render the plot to an SVG component and display it in the notebook.

    hv.ipython.notebook_extension('bokeh')
    data, metadata = hv.ipython.display_hooks.render(plot, fmt='svg')
    hv.ipython.display(hv.ipython.HTML(data["text/html"]))
    

    [Figure: Sankey plot with customised label positions]