My question is a continuation from my previous question.
I am trying to create a networkx flow diagram from a pandas dataframe. The dataframe records how an order flows through multiple firms. Most of the rows in the dataframe are connected and the connections are manifested in multiple columns. Sample data is as below:
import numpy as np
import pandas as pd

df = pd.DataFrame({'Company': ['A', 'A', 'B', 'B', 'B', 'C', 'C'],
                   'event_type': ['new', 'route', 'receive', 'execute', 'route', 'receive', 'execute'],
                   'event_id': ['110', '120', '200', '210', '220', '300', '310'],
                   'prior_event_id': [np.nan, '110', np.nan, '120', '210', np.nan, '300'],
                   'route_id': [np.nan, 'foo', 'foo', np.nan, 'bar', 'bar', np.nan]}
                  )
The dataframe looks like below:
  Company event_type event_id prior_event_id route_id
0       A        new      110            NaN      NaN
1       A      route      120            110      foo
2       B    receive      200            NaN      foo
3       B    execute      210            120      NaN
4       B      route      220            210      bar
5       C    receive      300            NaN      bar
6       C    execute      310            300      NaN
I was able to create the source and target columns from the sample data using the code below:
df['event_sub'] = df.groupby([df.Company, df.event_type]).cumcount()+1
df['event'] = df.Company + ' ' + df.event_type + ' ' + df.event_sub.astype(str)
replace_dict_event = dict(df[['event_id', 'event']].values)
df['source'] = df['prior_event_id'].apply(lambda x: replace_dict_event.get(x) if replace_dict_event.get(x) else np.nan )
df['target'] = df['event_id'].apply(lambda x: replace_dict_event.get(x) if replace_dict_event.get(x) else np.nan )
replace_dict_rtd = dict(df[df.event_type == 'route'][['route_id', 'event']].values)
df.loc[df.event_type == 'receive', 'source'] = df[df.event_type == 'receive']['route_id'].apply(lambda x: replace_dict_rtd.get(x))
Now the dataframe looks like this:
The slight difference between the result above and the result in my previous question is that I incorporated the company name in the current result. And the networkx graph I created from the source and target columns looks like below:
However, the problem I am facing is that in my actual data, the company names are longer and there are more nodes. Therefore quite often the labels are all squeezed together and basically become unintelligible. The first solution that came to my mind is to break the labels into multiple lines. My desired node looks like this:
I tried adding '\n' to the relevant column, so I changed the second line of my last code block to:
df['event'] = df.Company + '\n' + df.event_type + ' ' + df.event_sub.astype(str)
But this didn't give me what I wanted. Instead I got "KeyError: 'Node A\nnew 1 not in graph.'" I tried some other methods I found on SO, but had no luck either.
Is there any way to achieve this?
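import numpy as np
import networkx as nx

# dummy data
a = np.random.randint(0,2,size=(10,10))
# from_numpy_matrix was removed in networkx 3.0; from_numpy_array is the replacement
G = nx.from_numpy_array(a)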
pos = nx.spring_layout(G)
# draw without labels, then draw labels separately
nx.draw_networkx(G, pos=pos, with_labels=False)
# draw_networkx_labels takes as keyword argument a dictionary called labels
# which links the id of a node to a name.
# you can create one using dictionary comprehension like so:
nodenames = {n:'firstline \n secondline \n thirdline' for n in G.nodes()}
# and then draw:
nx.draw_networkx_labels(G, pos=pos, labels=nodenames)
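Applied to the data in the question, the same idea could look roughly like the sketch below. The node ids stay as the original single-line names, and the line break only appears in the labels dictionary. The edge list is typed out by hand from the sample data, and the names `edges` and `labels`, the `rsplit` rule, the layout seed, and the node and font sizes are my own assumptions, not something from the question. The split assumes the event type itself contains no spaces.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line to use plt.show()
import matplotlib.pyplot as plt
import networkx as nx
import pandas as pd

# Edges written out by hand from the sample data (single-line node names).
edges = pd.DataFrame({
    "source": ["A new 1", "A route 1", "A route 1",
               "B execute 1", "B route 1", "C receive 1"],
    "target": ["A route 1", "B receive 1", "B execute 1",
               "B route 1", "C receive 1", "C execute 1"],
})

# Node ids keep the single-line names, so lookups by name keep working.
G = nx.from_pandas_edgelist(edges, "source", "target", create_using=nx.DiGraph)

# Build display labels only: split off the last two tokens (event type and
# counter) so that a company name containing spaces stays on the first line.
labels = {}
for node in G.nodes():
    company, event_type, counter = node.rsplit(" ", 2)
    labels[node] = f"{company}\n{event_type} {counter}"

pos = nx.spring_layout(G, seed=0)
nx.draw_networkx(G, pos=pos, with_labels=False, node_size=2500)
nx.draw_networkx_labels(G, pos=pos, labels=labels, font_size=8)
plt.axis("off")
```

After this, `plt.show()` (with an interactive backend) or `plt.savefig(...)` should give a graph with two-line labels, since the newline lives only in the label text and never in the node id.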