Tags: python, pandas, dataframe, label, networkx

How to create multi-line node labels when creating Networkx Directed Graphs from Pandas Dataframe


My question is a continuation of my previous question.

I am trying to create a NetworkX flow diagram from a pandas DataFrame. The DataFrame records how an order flows through multiple firms. Most rows are connected to other rows, and those links are spread across several columns (prior_event_id and route_id). Sample data:

import numpy as np
import pandas as pd

df = pd.DataFrame({'Company': ['A', 'A', 'B', 'B', 'B', 'C', 'C'],
                   'event_type': ['new', 'route', 'receive', 'execute', 'route', 'receive', 'execute'],
                   'event_id': ['110', '120', '200', '210', '220', '300', '310'],
                   'prior_event_id': [np.nan, '110', np.nan, '120', '210', np.nan, '300'],
                   'route_id': [np.nan, 'foo', 'foo', np.nan, 'bar', 'bar', np.nan]})

The DataFrame looks like this:

  Company event_type event_id prior_event_id route_id
0       A        new      110            NaN      NaN
1       A      route      120            110      foo
2       B    receive      200            NaN      foo
3       B    execute      210            120      NaN
4       B      route      220            210      bar
5       C    receive      300            NaN      bar
6       C    execute      310            300      NaN

I was able to create the source and target columns from the sample data using the code below:

df['event_sub'] = df.groupby([df.Company, df.event_type]).cumcount() + 1
df['event'] = df.Company + ' ' + df.event_type + ' ' + df.event_sub.astype(str)

# map event IDs to event names; IDs with no match (e.g. NaN) become NaN
replace_dict_event = dict(df[['event_id', 'event']].values)
df['source'] = df['prior_event_id'].map(replace_dict_event)
df['target'] = df['event_id'].map(replace_dict_event)

# a 'receive' event's source is the 'route' event that shares its route_id
replace_dict_rtd = dict(df[df.event_type == 'route'][['route_id', 'event']].values)
is_receive = df.event_type == 'receive'
df.loc[is_receive, 'source'] = df.loc[is_receive, 'route_id'].map(replace_dict_rtd)

Now the dataframe looks like this:

[image: the resulting DataFrame]

The slight difference between the result above and the result in my previous question is that I now include the company name in the event names. The NetworkX graph I created from the source and target columns looks like this:

[image: the resulting graph]
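The question doesn't show how the graph itself was built. A minimal, self-contained sketch, assuming nx.from_pandas_edgelist with create_using=nx.DiGraph (an assumption, not necessarily the asker's code), reproduces the source/target construction on the sample data and builds the directed graph. Rows without a source (the initial 'new' event) contribute no edge.

```python
import numpy as np
import pandas as pd
import networkx as nx

# Sample data from the question
df = pd.DataFrame({
    'Company': ['A', 'A', 'B', 'B', 'B', 'C', 'C'],
    'event_type': ['new', 'route', 'receive', 'execute', 'route', 'receive', 'execute'],
    'event_id': ['110', '120', '200', '210', '220', '300', '310'],
    'prior_event_id': [np.nan, '110', np.nan, '120', '210', np.nan, '300'],
    'route_id': [np.nan, 'foo', 'foo', np.nan, 'bar', 'bar', np.nan],
})

# Single-line event names, as in the question's code
df['event_sub'] = df.groupby(['Company', 'event_type']).cumcount() + 1
df['event'] = df.Company + ' ' + df.event_type + ' ' + df.event_sub.astype(str)

# Link each event to its prior event within the order flow
event_by_id = dict(zip(df.event_id, df.event))
df['source'] = df.prior_event_id.map(event_by_id)
df['target'] = df.event_id.map(event_by_id)

# A 'receive' event is linked to the 'route' event with the same route_id
is_route = df.event_type == 'route'
is_receive = df.event_type == 'receive'
route_event = dict(zip(df.loc[is_route, 'route_id'], df.loc[is_route, 'event']))
df.loc[is_receive, 'source'] = df.loc[is_receive, 'route_id'].map(route_event)

# Drop rows with no source, then build the directed graph from the edge columns
edges = df.dropna(subset=['source'])
G = nx.from_pandas_edgelist(edges, source='source', target='target',
                            create_using=nx.DiGraph)
print(sorted(G.edges()))
```

The node IDs here are the single-line event strings, which is worth keeping in mind for the labeling problem below.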

However, in my actual data the company names are longer and there are many more nodes, so the labels are often squeezed together and become unreadable. The first solution that came to mind was to break each label across multiple lines. My desired node looks like this:

[image: desired node with a multi-line label]
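For long company names, the standard library's textwrap.fill can do the line breaking. A minimal sketch; wrap_label and its width default of 12 characters are hypothetical choices for illustration, not part of NetworkX:

```python
import textwrap


def wrap_label(company: str, event: str, width: int = 12) -> str:
    """Hypothetical helper: wrap the company name to at most `width`
    characters per line, then put the event on its own final line."""
    return textwrap.fill(company, width=width) + '\n' + event


print(wrap_label('Very Long Company Name Ltd', 'route 1'))
```

textwrap.fill breaks only at whitespace by default, so a single word longer than width is split mid-word (break_long_words=True); pass break_long_words=False to keep such words intact on an overlong line.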

What I tried was adding '\n' to the relevant column, so I changed the second line of my last code block to:

df['event'] = df.Company + '\n' + df.event_type + ' ' + df.event_sub.astype(str) 

But this didn't give me what I wanted. Instead I got "KeyError: 'Node A\nnew 1 not in graph.'" I also tried some other methods I found on Stack Overflow, with no luck.

Is there any way to achieve this?


Solution

    import numpy as np
    import networkx as nx
    import matplotlib.pyplot as plt

    # dummy data: random 0/1 adjacency matrix
    # (nx.from_numpy_matrix was removed in NetworkX 3.0; use from_numpy_array)
    a = np.random.randint(0, 2, size=(10, 10))
    G = nx.from_numpy_array(a)

    pos = nx.spring_layout(G)

    # draw without labels, then draw labels separately
    nx.draw_networkx(G, pos=pos, with_labels=False)

    # draw_networkx_labels takes a keyword argument `labels`: a dictionary
    # that maps each node ID to the text to display. The node IDs themselves
    # stay unchanged; only the displayed text contains line breaks.
    nodenames = {n: 'firstline\nsecondline\nthirdline' for n in G.nodes()}

    # and then draw:
    nx.draw_networkx_labels(G, pos=pos, labels=nodenames)
    plt.show()
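Applied to the question's data, the same idea means keeping the single-line event strings as node IDs and passing a separate labels dict that contains the line breaks. A sketch, assuming the edges that the question's source/target code produces on the sample data (hard-coded here for brevity) and the "Company event_type n" naming. Splitting with rsplit(' ', 2) keeps company names that contain spaces together on the first line. Because the node IDs never change, anything keyed on the original names (such as pos) keeps working; that may be what went wrong in the question, although the question doesn't show enough code to confirm it.

```python
import matplotlib
matplotlib.use('Agg')  # headless backend; drop this line to display interactively
import matplotlib.pyplot as plt
import networkx as nx

# Edges produced by the question's source/target code on the sample data
edges = [('A new 1', 'A route 1'), ('A route 1', 'B receive 1'),
         ('A route 1', 'B execute 1'), ('B execute 1', 'B route 1'),
         ('B route 1', 'C receive 1'), ('C receive 1', 'C execute 1')]
G = nx.DiGraph(edges)

# Node IDs stay single-line; only the displayed text is split:
# "Company event_type n" -> "Company\nevent_type n".
# rsplit(' ', 2) keeps multi-word company names together on the first line.
labels = {n: '{}\n{} {}'.format(*n.rsplit(' ', 2)) for n in G.nodes}

pos = nx.spring_layout(G, seed=42)
nx.draw_networkx(G, pos=pos, with_labels=False, node_size=1500)
# draw_networkx_labels returns a dict of matplotlib Text objects keyed by node
texts = nx.draw_networkx_labels(G, pos=pos, labels=labels, font_size=8)
plt.axis('off')
plt.savefig('graph.png')
```

With longer real company names, the first part of each label could also be passed through textwrap.fill before joining.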