
Node labels not appearing in Sankey from Python Plotly


I'm building my first Sankey flow diagram with Python Plotly. The diagram renders correctly, with the right sources and targets, but I am unable to include the node labels. The labels are stored in a dataframe. Without labels for the various sources and targets the diagram is incomplete. Here is the code I used. My aim is to get a static diagram.

    link = dict(source=Service_df, target=Manufac_df, value=Revenue_df)
    node = dict(label=Label_df, pad=50, thickness=5)
    data = dict(type='sankey', hoverinfo='all', link=link, node=node)
    fig = go.Figure(data)
    fig.show()

Later, I tried to hard-code the labels for the source and target. That didn't work either.

When I hover over a link in the diagram, I get the message Source:Undefined Target:Undefined. The incoming and outgoing flow counts, however, do appear when I hover over a source or target node.

I'm trying this out in a Jupyter Notebook with Python 3.8.5.

The dataframes are as follows:

    MainData.info()

    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 31 entries, 0 to 30
    Data columns (total 3 columns):
     #   Column        Non-Null Count  Dtype
    ---  ------        --------------  -----
     0   Service       31 non-null     int64
     1   Manufacturer  31 non-null     int64
     2   Revenue       31 non-null     float64
    dtypes: float64(1), int64(2)

    LabelData.info()

    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 10 entries, 0 to 9
    Data columns (total 1 columns):
     #   Column  Non-Null Count  Dtype
    ---  ------  --------------  -----
     0   Label   10 non-null     object
    dtypes: object(1)

The image I get is here: https://drive.google.com/drive/folders/1QDc-qVyMYSTJNI0coNJf8Ehuq8tlL-mP?usp=sharing


Solution

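  • Without seeing your dataframes, my guess is that your Label_df has a different shape or size than the link data expects. The label sequence needs one entry for every node index that appears in source and target. Note that the example below passes single columns (such as Label_df[0]) rather than whole dataframes.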

    import pandas as pd
    import plotly.graph_objects as go
    
    # mock data, identical to the plotly documentation https://plotly.com/python/sankey-diagram/
    Label_df = pd.DataFrame(["A1", "A2", "B1", "B2", "C1", "C2"])
    Service_df = pd.DataFrame([0, 1, 0, 2, 3, 3])
    Manufac_df = pd.DataFrame([2, 3, 3, 4, 4, 5])
    Revenue_df = pd.DataFrame([8, 4, 2, 8, 4, 2])
    
    link = dict(source = Service_df[0], target=Manufac_df[0], value=Revenue_df[0])
    # this works
    node = dict(label=Label_df[0], pad=50, thickness=5)
    data = dict(type='sankey', hoverinfo='all', link=link, node=node)
    fig = go.Figure(data)
    fig.show()
    
    # this should give Source:Undefined Target:Undefined for most links:
    # only nodes 0 and 1 get a label, but the links reference nodes 0 to 5
    node = dict(label=list(Label_df[0])[:2], pad=50, thickness=5)
    data = dict(type='sankey', hoverinfo='all', link=link, node=node)
    fig = go.Figure(data)
    fig.show()
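
To make the size check above explicit, here is a minimal sketch of a helper that refuses to build the trace when a node index used by a link has no label. The name build_sankey_trace is hypothetical (it is not part of Plotly). It assumes that the source and target values are zero-based positions into the label list.

```python
def build_sankey_trace(source, target, value, labels):
    """Return a plain dict describing a Sankey trace, after checking the labels.

    Hypothetical helper: every argument is a flat sequence, for example a
    list or a single DataFrame column (not a whole DataFrame).
    """
    # Convert to plain Python types so the result does not depend on pandas.
    source = [int(i) for i in source]
    target = [int(i) for i in target]
    value = [float(v) for v in value]
    labels = [str(s) for s in labels]

    if not (len(source) == len(target) == len(value)):
        raise ValueError("source, target and value must have the same length")

    # Every node index used by a link needs a label at that position.
    used = set(source) | set(target)
    missing = sorted(i for i in used if not 0 <= i < len(labels))
    if missing:
        raise ValueError(f"no label for node indices {missing}")

    link = dict(source=source, target=target, value=value)
    node = dict(label=labels, pad=50, thickness=5)
    return dict(type="sankey", hoverinfo="all", link=link, node=node)


# Same mock data as in the answer above.
trace = build_sankey_trace(
    source=[0, 1, 0, 2, 3, 3],
    target=[2, 3, 3, 4, 4, 5],
    value=[8, 4, 2, 8, 4, 2],
    labels=["A1", "A2", "B1", "B2", "C1", "C2"],
)
```

The returned dict can be passed to go.Figure(trace) just like data in the answer. With the frames from the question, the call would presumably be build_sankey_trace(MainData["Service"], MainData["Manufacturer"], MainData["Revenue"], LabelData["Label"]). A ValueError at that point would confirm that some Service or Manufacturer value points past the 10 available labels.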