This follows up on my previous question: Pandas Dataframe - running through two columns 'Father' and 'Son' to rebuild end-to-end links step by step
What I want to do with the dataframes in that question would be much easier by building a networkx MultiDiGraph.
However, I have already tried networkx and could not use it, because I always get this error when I try to draw my graphs:
import numpy as np
import pandas as pd
import networkx as nx

df_ = pd.DataFrame({
    'key' : ['E', 'E', 'E', 'E', 'K', 'K', 'K', 'K', 'K'],
    'father' : ['A', 'D', 'C', 'B', 'F', 'G', 'H', 'I', 'J'],
    'son' : ['B', 'E', 'D', 'C', 'G', 'H', 'I', 'J', 'K']
})
df_
G = nx.from_pandas_edgelist(df_, source='father', target='son',
                            create_using=nx.MultiDiGraph)
nx.draw(G)
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
~\Anaconda3\lib\site-packages\networkx\utils\decorators.py in _random_state(func, *args, **kwargs)
395 try:
--> 396 random_state_arg = args[random_state_index]
397 except TypeError as e:
IndexError: tuple index out of range
The above exception was the direct cause of the following exception:
NetworkXError Traceback (most recent call last)
<ipython-input-13-4a39fa27cfaf> in <module>
2 edge_attr=True,
3 create_using=nx.MultiDiGraph)
----> 4 nx.draw(G)
~\Anaconda3\lib\site-packages\networkx\drawing\nx_pylab.py in draw(G, pos, ax, **kwds)
121 kwds["with_labels"] = "labels" in kwds
122
--> 123 draw_networkx(G, pos=pos, ax=ax, **kwds)
124 ax.set_axis_off()
125 plt.draw_if_interactive()
~\Anaconda3\lib\site-packages\networkx\drawing\nx_pylab.py in draw_networkx(G, pos, arrows, with_labels, **kwds)
331
332 if pos is None:
--> 333 pos = nx.drawing.spring_layout(G) # default to spring layout
334
335 draw_networkx_nodes(G, pos, **node_kwds)
~\Anaconda3\lib\site-packages\decorator.py in fun(*args, **kw)
229 if not kwsyntax:
230 args, kw = fix(args, kw, sig)
--> 231 return caller(func, *(extras + args), **kw)
232 fun.__name__ = func.__name__
233 fun.__doc__ = func.__doc__
~\Anaconda3\lib\site-packages\networkx\utils\decorators.py in _random_state(func, *args, **kwargs)
398 raise nx.NetworkXError("random_state_index must be an integer") from e
399 except IndexError as e:
--> 400 raise nx.NetworkXError("random_state_index is incorrect") from e
401
402 # Create a numpy.random.RandomState instance
NetworkXError: random_state_index is incorrect
I want to convert the graph back into a dataframe that looks like this:
df_2 = pd.DataFrame({
    'key' : ['E', 'K'],
    'step_0' : ['A', 'F'],
    'step_1' : ['B', 'G'],
    'step_2' : ['C', 'H'],
    'step_3' : ['D', 'I'],
    'step_4' : ['E', 'J'],
    'step_5' : [np.nan, 'K']
})
df_2
I know I could do this with networkx, and it was also suggested in a comment on the linked question, but I don't understand how to get rid of the networkx error.
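For reference, here is a minimal sketch of the graph-to-dataframe step, assuming networkx imports and runs correctly and that every key corresponds to one simple linear chain (no branches, no cycles). The `key_by_son` lookup and the use of `topological_sort` per weakly connected component are my own illustrative choices, not something taken from the linked question.

```python
import networkx as nx
import pandas as pd

df_ = pd.DataFrame({
    'key':    ['E', 'E', 'E', 'E', 'K', 'K', 'K', 'K', 'K'],
    'father': ['A', 'D', 'C', 'B', 'F', 'G', 'H', 'I', 'J'],
    'son':    ['B', 'E', 'D', 'C', 'G', 'H', 'I', 'J', 'K'],
})

G = nx.from_pandas_edgelist(df_, source='father', target='son',
                            create_using=nx.MultiDiGraph)

# Map each 'son' to its 'key' (assumption: one key per chain).
key_by_son = dict(zip(df_['son'], df_['key']))

rows = []
for nodes in nx.weakly_connected_components(G):
    # For a simple linear chain, the topological order is the chain itself.
    chain = list(nx.topological_sort(G.subgraph(nodes)))
    row = {'key': key_by_son[chain[-1]]}
    row.update({f'step_{i}': node for i, node in enumerate(chain)})
    rows.append(row)

# Shorter chains get NaN in the steps they do not reach.
df_2 = pd.DataFrame(rows).sort_values('key').reset_index(drop=True)
print(df_2)
```

This does not draw anything, so it never reaches the `spring_layout` call shown in the traceback. If the data contained branching or cyclic links, the topological order would no longer be a single chain and this sketch would need to be adapted.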
My environment is the latest version of Anaconda, with Jupyter Notebook, Python 3.8.8, networkx 2.5, decorator 5.0.6 and Matplotlib 3.3.4. I mention this because, according to this question, the latest versions of decorator should have this problem fixed.
I think the cause was decorator 5.0.6, so I upgraded Anaconda to the latest version, which ships decorator 5.1.0 and no longer causes this problem.
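To check which versions are actually installed in the active environment before and after upgrading, something like the following can be run in a notebook cell. It is only a sketch: `installed_version` is a hypothetical helper name, and it relies on `importlib.metadata`, which is part of the standard library from Python 3.8 onwards.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# The packages involved in the error discussed above.
for name in ("networkx", "decorator", "matplotlib"):
    print(name, installed_version(name))
```

If `decorator` still reports 5.0.6 alongside networkx 2.5, the combination described above is still in place.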
To follow up on the previous question I had posted: I also had to install Graphviz and Pygraphviz, because they provide better plotting for the graphs. This was not easy either, because they are not compatible with the Anaconda environment out of the box, and the pygraphviz documentation recommends against using it with Anaconda.
However, I asked a separate question about installing these libraries on Anaconda, which causes difficulty for quite a few people. That question links to a tutorial for installing Graphviz on Anaconda, and the question itself shows how to install Pygraphviz on Anaconda.