Tags: pandas, jupyter-notebook, plotly, google-colaboratory, sankey-diagram

How to organize a very illogical Plotly Sankey diagram


My diagram is complex, with 189 nodes, but I expected the layout to follow the flow of data from source to target. Instead, it is circuitous and loopy.

These are trades flowing through an algo trading system, so there are entries, targets, and stops. I expected a natural layout with entries on the left, targets in the middle, and final exits on the right. To a certain extent the entries and stops came out that way, but the middle is a mess. The actual data flows more logically than this diagram suggests.

For example, there are flows from ratchet bracket 1 to ratchet bracket 2 to ratchet bracket 3, so I assumed these would appear in order from left to right, but they do not. The same goes for grab cents 1, 2, and 3.

Is there any way to coax Plotly into rendering this in a more logical, flow-based way? Or can I somehow influence the placement by ordering the lists that feed the diagram? Is there another Sankey implementation that might handle this better? I am doing this in Google Colab.

[Screenshot: the original diagram]

I've also found that dragging nodes into order does not make the render look any better. Instead of flowing from ratchet bracket 3 to 4, the link still runs in the wrong direction and is joined by a disjointed connection that wraps off-screen. It is impossible to follow.

[Screenshot: the diagram after dragging nodes into order]
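On the question of influencing placement: a sketch, assuming pinned positions are acceptable. As far as I know, `go.Sankey` derives node positions from the links unless you supply both `node.x` and `node.y` (numbers between 0 and 1), and `arrangement` controls how much Plotly may move them afterwards ("snap", the default, honours them but may nudge nodes apart; "fixed" keeps nodes stationary). The helper `sankey_spec` and the five-node chain are hypothetical. The function returns a plain figure dictionary, which `go.Figure(spec)` accepts. Values exactly at 0 or 1 are sometimes reported to misbehave, so the example keeps them slightly inside that range.

```python
def sankey_spec(labels, links, x, y, arrangement="snap"):
    """Build a plain Plotly figure dict for a Sankey with user-supplied node positions.

    labels: node names; links: (source_label, target_label, value) triples;
    x, y: one position per label, each between 0 and 1 (hypothetical helper).
    """
    if not (len(labels) == len(x) == len(y)):
        raise ValueError("need exactly one x and one y per label")
    index = {name: i for i, name in enumerate(labels)}  # label -> integer node index
    return {
        "data": [{
            "type": "sankey",
            # "snap" uses x/y but may shift nodes to avoid overlaps; "fixed" keeps them put
            "arrangement": arrangement,
            "node": {"label": labels, "x": x, "y": y, "pad": 15, "thickness": 20},
            "link": {
                "source": [index[s] for s, _, _ in links],
                "target": [index[t] for _, t, _ in links],
                "value": [v for _, _, v in links],
            },
        }],
        "layout": {"title": {"text": "Sankey with pinned columns"}},
    }


# Hypothetical chain mirroring ratchet bracket 1 -> 2 -> 3
labels = ["entry", "ratchet_bracket1", "ratchet_bracket2", "ratchet_bracket3", "exit"]
links = [
    ("entry", "ratchet_bracket1", 10),
    ("ratchet_bracket1", "ratchet_bracket2", 7),
    ("ratchet_bracket2", "ratchet_bracket3", 5),
    ("ratchet_bracket3", "exit", 5),
]
# One column per stage, left to right, kept just inside the 0..1 range
x = [0.01, 0.25, 0.5, 0.75, 0.99]
y = [0.5, 0.5, 0.5, 0.5, 0.5]
spec = sankey_spec(labels, links, x, y)
```

In Colab you would then render it with `go.Figure(spec).show()` after `import plotly.graph_objects as go`. Pinning positions in the data rather than by dragging means the layout is reproduced on every render.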

Update following answer

I followed the advice in Serge's answer, which helped but did not solve all the problems. For example, why are there so many nodes at the edges, why are there still "retrograde" paths when nodes are introduced in their order of appearance, and why aren't the major nodes spaced more evenly?
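One possible cause of the retrograde paths, offered as an assumption rather than a diagnosis: layouts of this kind assign columns from the links, so if the data contains a cycle (for example, a trade recorded as moving from ratchet bracket 3 back to 2), at least one link has to point right-to-left no matter how the nodes are ordered. The hypothetical helper `find_back_edges` below runs a depth-first search over (source, target) pairs and returns the links that close a cycle, which is a quick way to check the real 189-node data.

```python
def find_back_edges(links):
    """Return the (source, target) links that close a cycle in a directed link list.

    In a left-to-right Sankey, each returned link must be drawn backwards.
    """
    graph = {}
    for s, t in links:
        graph.setdefault(s, []).append(t)
        graph.setdefault(t, [])
    # 0 = not visited yet, 1 = on the current DFS path, 2 = fully explored
    state = {node: 0 for node in graph}
    back_edges = []

    def visit(node):
        state[node] = 1
        for nxt in graph[node]:
            if state[nxt] == 1:  # nxt is an ancestor on the current path: cycle
                back_edges.append((node, nxt))
            elif state[nxt] == 0:
                visit(nxt)
        state[node] = 2

    for node in graph:  # dicts keep insertion (first appearance) order
        if state[node] == 0:
            visit(node)
    return back_edges


# Hypothetical data with one backward move from bracket 3 to bracket 2
links = [
    ("entry", "rb1"),
    ("rb1", "rb2"),
    ("rb2", "rb3"),
    ("rb3", "rb2"),
    ("rb3", "exit"),
]
print(find_back_edges(links))  # [('rb3', 'rb2')]
```

If this returns an empty list for the real data, cycles are not the explanation and the column assignment itself is the more likely culprit.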

Is there anything else I can do to make this diagram more user-friendly? The white space is clearly not well used. I experimented with the figure dimensions, but that didn't seem to help.
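For the uneven spacing, one option is to compute `node.y` yourself so the nodes in each column are spread over the full height instead of bunching up. This is a sketch that assumes you have already decided which column each node belongs to. `spread_y` and its `margin` parameter are hypothetical; its values would be passed as `node.y` together with a matching `node.x`. With `arrangement="snap"`, Plotly may still shift nodes to avoid overlaps, and tall nodes (large flows) need more room than an even split gives them.

```python
def spread_y(columns, margin=0.05):
    """Spread nodes evenly over the vertical range within each column.

    columns: dict mapping node label -> column index.
    Returns a dict label -> y between margin and 1 - margin. Nodes keep the
    dict's insertion order within their column; a lone node sits at 0.5.
    """
    by_column = {}
    for label, col in columns.items():
        by_column.setdefault(col, []).append(label)
    y = {}
    for labels in by_column.values():
        n = len(labels)
        for i, label in enumerate(labels):
            # Evenly spaced from the top margin to the bottom margin
            y[label] = 0.5 if n == 1 else margin + i * (1 - 2 * margin) / (n - 1)
    return y


# Hypothetical column assignment: three entries, one bracket, one exit
columns = {"entry1": 0, "entry2": 0, "entry3": 0, "ratchet_bracket1": 1, "final_exit": 2}
y = spread_y(columns)
node_y = [y[name] for name in columns]  # same order as the Sankey labels
```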

enter image description here


Solution

  • You need to order the nodes in a meaningful way: for example, entry nodes first, then intermediate nodes, and finally exit nodes. Layer the nodes and, if possible, adjust the links. Here is an example.

    import pandas as pd
    import plotly.graph_objects as go
    
    # Example link table: each row is one flow from `source` to `target` with weight `value`
    data = {
        'source': ['entry1', 'entry2', 'entry3', 'entry4', 'entry5',
                   'ratchet_bracket1', 'ratchet_bracket1', 'ratchet_bracket2', 'ratchet_bracket3', 'ratchet_bracket4',
                   'grab_cents1', 'grab_cents2', 'grab_cents3', 'grab_cents4', 'grab_cents5',
                   'target1', 'target2', 'target3', 'target4', 'target5'],
        'target': ['ratchet_bracket1', 'ratchet_bracket2', 'ratchet_bracket3', 'grab_cents1', 'grab_cents2',
                   'ratchet_bracket2', 'target1', 'ratchet_bracket3', 'target2', 'target3',
                   'grab_cents2', 'grab_cents3', 'target4', 'target5', 'ratchet_bracket4',
                   'final_exit1', 'final_exit2', 'final_exit3', 'final_exit4', 'final_exit5'],
        'value': [10, 15, 10, 12, 14, 7, 8, 9, 5, 6, 10, 15, 11, 13, 12, 20, 18, 22, 24, 19]
    }
    
    df = pd.DataFrame(data)
    print(df)
    
    # Node list in order of first appearance: all sources first, then any target not
    # already seen. Concatenating explicitly avoids depending on the array's memory layout.
    all_nodes = list(pd.unique(pd.concat([df['source'], df['target']])))
    mapping = {node: idx for idx, node in enumerate(all_nodes)}
    
    # go.Sankey refers to nodes by their integer index in `all_nodes`
    df['source_idx'] = df['source'].map(mapping)
    df['target_idx'] = df['target'].map(mapping)
    
    fig = go.Figure(data=[go.Sankey(
        node=dict(
            pad=15,        # gap between nodes in the same column, in px
            thickness=20,  # width of each node bar, in px
            line=dict(color="black", width=0.5),
            label=all_nodes,
        ),
        link=dict(
            source=df['source_idx'],
            target=df['target_idx'],
            value=df['value'],
        )
    )])
    
    fig.update_layout(title_text="Sankey Diagram of Algo Trading System", font_size=10)
    fig.show()
    
    

    which gives

    [Screenshot: the resulting Sankey diagram]
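Ordering the labels may not be enough on its own, because the column of each node is derived from the links rather than from the list order. A more direct route, sketched here as an assumption rather than as Plotly's own algorithm, is to compute a column per node with longest-path layering (each node sits one column to the right of its deepest parent) and pass the result as `node.x`. The helpers `longest_path_layers` and `layer_x` are hypothetical; they use plain lists copied from the example data so they run without pandas or Plotly.

```python
from collections import deque


def longest_path_layers(links):
    """Map each node to the length of the longest path reaching it from a node with no inflow.

    links: iterable of (source, target) pairs; they must form a DAG (no cycles).
    """
    succ, indeg = {}, {}
    for s, t in links:
        succ.setdefault(s, []).append(t)
        succ.setdefault(t, [])
        indeg.setdefault(s, 0)
        indeg[t] = indeg.get(t, 0) + 1
    depth = {node: 0 for node in succ}
    queue = deque(node for node in succ if indeg[node] == 0)  # nodes with no inflow
    processed = 0
    while queue:  # Kahn's topological order: a node is popped only after all its parents
        node = queue.popleft()
        processed += 1
        for nxt in succ[node]:
            depth[nxt] = max(depth[nxt], depth[node] + 1)
            indeg[nxt] -= 1
            if indeg[nxt] == 0:
                queue.append(nxt)
    if processed != len(succ):
        raise ValueError("links contain a cycle; longest-path layering needs a DAG")
    return depth


def layer_x(depth, left=0.01, right=0.99):
    """Turn integer layers into node.x values, spacing the columns evenly."""
    deepest = max(depth.values()) or 1  # avoid dividing by zero if every node is a source
    return {node: left + (right - left) * d / deepest for node, d in depth.items()}


# Same links as the pandas example above
sources = ['entry1', 'entry2', 'entry3', 'entry4', 'entry5',
           'ratchet_bracket1', 'ratchet_bracket1', 'ratchet_bracket2', 'ratchet_bracket3', 'ratchet_bracket4',
           'grab_cents1', 'grab_cents2', 'grab_cents3', 'grab_cents4', 'grab_cents5',
           'target1', 'target2', 'target3', 'target4', 'target5']
targets = ['ratchet_bracket1', 'ratchet_bracket2', 'ratchet_bracket3', 'grab_cents1', 'grab_cents2',
           'ratchet_bracket2', 'target1', 'ratchet_bracket3', 'target2', 'target3',
           'grab_cents2', 'grab_cents3', 'target4', 'target5', 'ratchet_bracket4',
           'final_exit1', 'final_exit2', 'final_exit3', 'final_exit4', 'final_exit5']
depth = longest_path_layers(zip(sources, targets))
x = layer_x(depth)
```

To use it, build `node.x` in the same order as `all_nodes` (for example `[x[name] for name in all_nodes]`) and supply a matching `node.y`. In this sample, `grab_cents4` and `grab_cents5` have no incoming links, so they land in the leftmost column beside the entries; nodes like these in the real data may account for some of the nodes that end up at the edges.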