Search code examples
loopsrecursionnested-loopsdepth-first-searchbreadth-first-search

Find all paths to one ultimate parent based on a dataset showing 1-1 parent child ownership using recursion in Python


I have the following example dataset with 3 columns, child entity, parent entity, ownership percentage of parent in that child:

Child   Parent   Ownership
   A       B         50%
   C       D         10%
   B       C         20%
   E       D         30%
   B       E         40%

Note that D is the ultimate parent, and I want to use python to calculate the ownership percentage of D in each of the child entities. To do this, I need to find all paths that lead to D, and multiply the ownership percentage up the chain and add up the percentages if there are multiple paths that lead to D. See the result I want to reach below:

For A: [A->B->C->D],[A->B->E->D]. D_ownership = 50%*20%*10% + 50%*40%*30%=7%
For B: [B->C->D],[B->E->D]. D_ownership = 20%*10% + 40%*30% = 14%
For C: [C->D]. D_ownership = 10%
For E: [E->D]. D_ownership = 30%

At first I used a nested for loop where I first use a loop to find the parent of each child, I use that parent as the new child and loop through the dataset again to find the next level parent until I reach D. However, I realized this is way too complicated, and did a lot of research and found that this is a recursive problem and maybe I should use breadth first search or depth first search to find all the paths. But the dataset I have is not a graph yet and I'm not sure about the nodes. So I want to ask for inspirations on how to solve this. I think there's a much easier solution than what I had but I can't think of it yet.

Thanks all!


Solution

  • Your dataframe represents a weighted directed acyclic graph (DAG), a fancy name for "something that looks kinda like a tree".

    At first I used a nested for loop where I first use a loop to find the parent of each child, I use that parent as the new child and loop through the dataset again to find the next level parent until I reach D.

    The idea is good, and this will work. However, you can make it a bit more simple and a lot more efficient, if you first convert your data structure to a different data structure representing the same data, but so that finding the child of each parent, or the parent of each child, can be done directly without a loop. In python, such a structure is called a dict. So I suggest building a dict, either mapping each parent to its list of children, or mapping each child to its list of parents. In the code below I mapped each parent to its list of children, but the other way around would have worked too.

    However, I realized this is way too complicated, and did a lot of research and found that this is a recursive problem and maybe I should use breadth first search or depth first search to find all the paths.

    Very good observation. There is no reason to prefer one over the other, you can solve this problem either with depth-first-search or with breadth-first-search. However, depth-first-search is somewhat simpler to implement with a recursive function, so I suggest using depth-first-search.

    import pandas as pd
    from collections import defaultdict
    
    df = pd.DataFrame({'Child': list('ACBEB'), 'Parent': list('BDCDE'), 'Ownership': [.5,.1,.2,.3,.4]})
    
    dag_as_dict = defaultdict(list) # {parent: list of (child, weight)}
    for idx, row in df.iterrows():
        dag_as_dict[row['Parent']].append((row['Child'], row['Ownership']))
    
    print(dag_as_dict)
    # defaultdict(<class 'list'>,
    #             {'B': [('A', 0.5)],
    #              'D': [('C', 0.1), ('E', 0.3)],
    #              'C': [('B', 0.2)],
    #              'E': [('B', 0.4)]})
    
    def get_weights_of_descendants(dag_as_dict, root, root_weight=1.0, result=None):
        if result is None:
            result = defaultdict(float)
        for child, weight in dag_as_dict[root]:
            new_weight = weight * root_weight
            result[child] += new_weight
            get_weights_of_descendants(dag_as_dict, child, new_weight, result) # result is mutable so it will be updated by this recursive call
        return result
    
    weights = get_weights_of_descendants(dag_as_dict, 'D')
    
    print(weights)
    # defaultdict(<class 'float'>, {'C': 0.1, 'B': 0.14, 'A': 0.07, 'E': 0.3})