Consider the dataframe below:
   id1  id2
0  aaa  111
1  bbb  222
2  333  ccc
3  999  zzz
4  ccc  111
5  888  zzz
6  zzz  222
7  ddd  888
8  eee  888
How can I recursively get a dataframe for each level of matches (all the children, then all of their children, and so on) for a given input? In my case, input = [111, 222].
i.e.
Parent1: 111
  Child1: aaa
  Child2: ccc (from row 4)
    Child of Child2: 333 (from row 2)
Parent2: 222
  Child1: bbb
  Child2: zzz (from row 6)
    ChildA of Child2: 888 (from row 5)
    ChildB of Child2: 999 (from row 3)
      Child_i of ChildA: ddd (from row 7)
      Child_ii of ChildA: eee (from row 8)
The expected output for every level (parent -> child -> child of child) would be:
### for i = 111
# parent level
   id1  id2
0  aaa  111
1  ccc  111
# child level
   id1  id2
0  333  ccc

### for i = 222
# parent level
   id1  id2
0  bbb  222
1  zzz  222
# child level
   id1  id2
0  888  zzz
1  999  zzz
# child of child level
   id1  id2
0  ddd  888
1  eee  888
I tried:
parents = [111, 222]
while len(parents) != 0:
    for i in parents:
        children = df[df['id2'].apply(lambda x: i in str(x))][['id1', 'id2']]
        print(children)  # print dataframe of match
    parents = children['id1']
But it doesn't go all the way through. I thought of changing i in the lambda to a list comprehension, but didn't manage to make it work.
If you only want to print an indented graph, you could use a simple recursive function:
def desc(i, indent=0):
    print(' '*indent + i)
    for j in df.loc[df['id2'] == i, 'id1']:
        desc(j, indent + 2)

for i in ('111', '222'):
    desc(i)
With the example df, it gives:
111
  aaa
  ccc
    333
222
  bbb
  zzz
    999
    888
      ddd
      eee
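
If you need one dataframe per level rather than a printed tree, the same idea can be run level by level. The following is only a sketch under a few assumptions: both columns hold strings, matching is by exact equality (not the substring test from the question), and the names levels, frames and seen are made up for this illustration. Rows keep their original order, so 999 comes before 888 in the second level for 222.

```python
import pandas as pd

# Sample data from the question; storing everything as strings is an assumption.
df = pd.DataFrame({
    "id1": ["aaa", "bbb", "333", "999", "ccc", "888", "zzz", "ddd", "eee"],
    "id2": ["111", "222", "ccc", "zzz", "111", "zzz", "222", "888", "888"],
})

def levels(df, root):
    """Return a list of dataframes, one per generation below `root` (hypothetical helper)."""
    frames = []
    parents = {root}
    seen = set()  # ids already expanded; guards against cycles (none in the sample)
    while parents:
        seen |= parents
        # All rows whose id2 is one of the current parents.
        children = df[df["id2"].isin(parents)].reset_index(drop=True)
        if children.empty:
            break
        frames.append(children)
        # The children become the parents of the next level.
        parents = set(children["id1"]) - seen
    return frames

for root in ("111", "222"):
    print(f"### for i = {root}")
    for frame in levels(df, root):
        print(frame)
```

Using isin on the whole set of parents handles a full level in one filter, which is what the loop in the question was missing: it overwrote parents with the children of the last value only.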