Search code examples
pythongraphnetworkxcycle-detection

Cycle Detection in a Graph using NetworkX Python


My CSV file has ~2.6 million records of transaction among different people. I am trying to make a graph from this file as: person having unique IDs as nodes and edges representing transaction between two people and want to fetch all possible cycles from the graph. I am trying to use networkx.simple_cycles(graph_name) to fetch all the cycles from this graph but getting this error:

    NetworkXNotImplemented                    Traceback (most recent call 
    last)
    <ipython-input-21-c36e4cd0e220> in <module>()
    ----> 1 nx.simple_cycles(Directed_G)

    <decorator-gen-215> in simple_cycles(G)

    ~\AppData\Local\Continuum\anaconda3\lib\site- 
     packages\networkx\utils\decorators.py in _not_implemented_for(f, *args, 
     **kwargs)
     64         if match:
     65             raise nx.NetworkXNotImplemented('not implemented for %s 
     type'%
     ---> 66                                             ' 
     '.join(graph_types))
     67         else:
     68             return f(*args,**kwargs)

     NetworkXNotImplemented: not implemented for undirected type

My Python code looks like this:

    import pandas as pd
    import time
    import networkx as nx
    import numpy as np
    import matplotlib.pyplot as plt
    import seaborn as sns
    %matplotlib inline

    data=pd.read_csv(path/to/csv/file)
    Directed_G=nx.DiGraph()
    Directed_G=nx.from_pandas_dataframe(data, 'from', 'to')
    nx.simple_cycles(Directed_G)

My data looks something like this:

            from                to  
    0       000028c1f8598db 1a55bc3aab8562f     
    1       00003147f02a255 9c1f54d9859ce12     
    2       00003cdc5ed35a0 472f48d28903b43     
    3       00003cdc5ed35a0 5ab9e7e07978f9d 
    4       00003cdc5ed35a0 693452b7ae2fd0c

Can someone please help me with the error. Can there be some other way to find all the possible cycles from this graph?


Solution

  • When you do this:

    Directed_G=nx.from_pandas_dataframe(data, 'from', 'to')
    

    it creates a graph from the pandas dataframe and assigns that result to the name Directed_G. It doesn't go and check first what sort of graph Directed_G was previously. So it creates a graph using the default type (which is Graph) and the previous graph that was stored in Directed_G is overwritten, lost to the great garbage collector in the sky. Then the command to find the cycles dies because it can't handle an undirected graph.

    Add the optional argument create_using=DiGraph to your call to from_pandas_dataframe.

    You should be aware that in the latest version of networkx from_pandas_dataframe has been removed: https://networkx.github.io/documentation/networkx-2.0/release/release_2.0.html

    Previously, the function from_pandas_dataframe assumed that the dataframe has edge-list like structures, but to_pandas_dataframe generates an adjacency matrix. We now provide four functions from_pandas_edgelist, to_pandas_edgelist, from_pandas_adjacency, and to_pandas_adjacency.