Consider the following list of dicts as an example:
d2 = [{'event_id': 't1',
'display_name': 't1',
'form_count': 0,
'repetition_id': None,
'children': [{'event_id': 't_01',
'display_name': 't(1)',
'form_count': 1,
'repetition_id': 't1',
'children': [],
'forms': [{'form_id': 'f1',
'form_repetition_id': '1',
'form_name': 'fff1',
'is_active': True,
'is_submitted': False}]}],
'forms': []},
{'event_id': 't2',
'display_name': 't2',
'form_count': 0,
'repetition_id': None,
'children': [{'event_id': 't_02',
'display_name': 't(2)',
'form_count': 1,
'repetition_id': 't2',
'children': [{'event_id': 't_03',
'display_name': 't(3)',
'form_count': 1,
'repetition_id': 't3',
'children': [],
'forms': [{'form_id': 'f3',
'form_repetition_id': '1',
'form_name': 'fff3',
'is_active': True,
'is_submitted': False}]}],
'forms': [{'form_id': 'f2',
'form_repetition_id': '1',
'form_name': 'fff2',
'is_active': True,
'is_submitted': False}]}],
'forms': []}]
Above, d2 is a list of dicts, where children is a list of nested dicts with the same keys as the parent.
children can also be nested to an arbitrary depth that is not known upfront, so I don't know how many times to keep exploding it.
Current df:
In [54]: df11 = pd.DataFrame(d2)
In [55]: df11
Out[55]:
event_id display_name form_count repetition_id children forms
0 t1 t1 0 None [{'event_id': 't_01', 'display_name': 't(1)', ... []
1 t2 t2 0 None [{'event_id': 't_02', 'display_name': 't(2)', ... []
I want to flatten it in the below way.
Expected output:
event_id display_name form_count repetition_id children forms
0 t1 t1 0 None {'event_id': 't_01', 'display_name': 't(1)', '... []
1 t2 t2 0 None {'event_id': 't_02', 'display_name': 't(2)', '... []
0 t_01 t(1) 1 t1 [] [{'form_id': 'f1', 'form_repetition_id': '1', ...
1 t_02 t(2) 1 t2 {'event_id': 't_03', 'display_name': 't(3)', ... [{'form_id': 'f2', 'form_repetition_id': '1', ...
0 t_03 t(3) 1 t3 [] [{'form_id': 'f3', 'form_repetition_id': '1', ...
How can I tell how many levels of nested children there are?
My attempt:
In [58]: df12 = df11.explode('children')
In [64]: final = pd.concat([df12, pd.json_normalize(df12.children)])
In [72]: final
Out[72]:
event_id display_name form_count repetition_id children forms
0 t1 t1 0 None {'event_id': 't_01', 'display_name': 't(1)', '... []
1 t2 t2 0 None {'event_id': 't_02', 'display_name': 't(2)', '... []
0 t_01 t(1) 1 t1 [] [{'form_id': 'f1', 'form_repetition_id': '1', ...
1 t_02 t(2) 1 t2 [{'event_id': 't_03', 'display_name': 't(3)', ... [{'form_id': 'f2', 'form_repetition_id': '1', ...
This can be solved with a breadth-first traversal that uses a queue, so the nesting depth does not need to be known in advance:
from collections import deque

import pandas as pd

# Start with the top-level events; children are appended as they are found.
queue = deque(d2)
d3 = []
while queue:
    item = queue.popleft()
    d3.append(item)
    # Optionally add a parent_event_id. Remove if you don't need it.
    queue += [
        {**child, "parent_event_id": item["event_id"]}
        for child in item.get("children", [])
    ]
df = pd.DataFrame(d3)
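If you also want to know how deep the nesting goes, one option is a recursive generator that records the level of each event as it walks the tree. This is only a sketch: the function name flatten_events and the depth and parent_event_id columns are hypothetical additions, not part of the original data, and the sample is a trimmed-down stand-in for d2 with the forms omitted. Note that it visits events depth-first, so the row order differs from the queue-based loop above.

```python
def flatten_events(events, parent_event_id=None, depth=0):
    """Yield one flat dict per event, walking `children` recursively."""
    for event in events:
        # Copy every key except `children`, so the row itself is not nested.
        row = {key: value for key, value in event.items() if key != "children"}
        row["parent_event_id"] = parent_event_id  # hypothetical helper column
        row["depth"] = depth                      # hypothetical helper column
        yield row
        # Recurse into the children, however deep they go.
        yield from flatten_events(
            event.get("children", []), event["event_id"], depth + 1
        )


# Trimmed-down sample with the same shape as d2 (forms omitted for brevity).
sample = [
    {"event_id": "t1", "children": [{"event_id": "t_01", "children": []}]},
    {"event_id": "t2", "children": [
        {"event_id": "t_02", "children": [{"event_id": "t_03", "children": []}]},
    ]},
]

rows = list(flatten_events(sample))
# The deepest level reached; top-level events are at depth 0.
max_depth = max(row["depth"] for row in rows)
```

The resulting rows can be passed straight to pd.DataFrame(rows), and max_depth tells you how many levels of children were found.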