Search code examples
pythonpandasdataframerecursionhierarchy

How to use recursion to record all routes in a parent child hierarchy?


I am trying to go through a hierarchy dataframe and record every possible routes into another dataframe. These routes can have variable depth.

Original dataframe (df). The highest column means that the value in the parent column is not a child of any:

parent child highest
a b 1
b c 0
b d 0
d e 0

End goal dataframe:

level 3 level 2 level 1 level 0
a b c
a b d e

This what I currently have

def search(parent):
    for i in range(df.shape[0]):
        if(df.iloc[i,0] == parent):
            search(df.iloc[i,1])

for i in range(df.shape[0]):
    if(df.iloc[i,2] == 1):
        search(df.iloc[i,0])

I am able to go through the hierarchy but I do not know how to save it in the format I want.


Solution

  • You can use networkx to solve the problem. Note if you use networkx you don't need the highest columns. The main function to find all paths is all_simple_paths

    # Python env: pip install networkx
    # Anaconda env: conda install networkx
    import networkx as nx
    
    # Create network from your dataframe
    #G = nx.from_pandas_edgelist(df, source='parent', target='child',
    #                            create_using=nx.DiGraph)
    
    # For older versions of networkx
    G = nx.DiGraph()
    for _, (source, target) in df[['parent', 'child']].iterrows():
        G.add_edge(source, target)
    
    # Find roots of your graph (a root is a node with no input)
    roots = [node for node, degree in G.in_degree() if degree == 0]
    
    # Find leaves of your graph (a leaf is a node with no output)
    leaves = [node for node, degree in G.out_degree() if degree == 0]
    
    # Find all paths
    paths = []
    for root in roots:
      for leaf in leaves:
        for path in nx.all_simple_paths(G, root, leaf):
            paths.append(path)
    
    # Create a new dataframe
    out = pd.DataFrame(paths).fillna('')
    out.columns = reversed(out.add_prefix('level ').columns)
    

    Output:

    >>> out
      level 3 level 2 level 1 level 0
    0       a       b       c        
    1       a       b       d       e