I have a json dataset where each item/index can contain 2 nested dictionaries. The problem is that one of these nested dictionaries contains all of the exact key:value pairs as its parent dictionary. To put it in other words, I have a parent "Account" and any time there are "Sub-Accounts" it places the Sub-Accounts in the nested dictionary, and they are never seen as their own standalone item/index.
Here is the sample json of one item/index. Essentially, I need the sub_accounts object to extracted and become its own index. As you can see, it contains all of the same key:value objects as the parent containins the sub_accounts.
{
"classification": [
{
"classificationId": "Cash",
"taxonomyId": "accounting.gp"
}
],
"id": "235",
"kind": "Real",
"name": "Checking",
"sub_accounts": [
{
"classification": [
{
"classificationId": "Cash",
"taxonomyId": "accounting.gp"
}
],
"id": "236",
"kind": "Real",
"name": "Cash Reserve",
"sub_accounts": []
}
]
},
I have been able to use json_normalize or even variations of .pop() to accomplish a flattening of data and I have tried to explore other flattening options, but with no luck on the specific task I am trying to accomplish. Those solutions usually just result with the subaccounts still be associated to the original index.
I don't have a generic answer, but this seems to do what you need:
raw_data = """
[
{
"classification": [
{
"classificationId": "Cash",
"taxonomyId": "accounting.gp"
}
],
"id": "235",
"kind": "Real",
"name": "Checking",
"sub_accounts": [
{
"classification": [
{
"classificationId": "Cash",
"taxonomyId": "accounting.gp"
}
],
"id": "236",
"kind": "Real",
"name": "Cash Reserve",
"sub_accounts": []
}
]
}
]
"""
import json
jdict = json.loads(raw_data)
empty_list = list()
result = list()
for elem in jdict:
sub_elem_list = elem['sub_accounts']
elem['sub_accounts'] = empty_list
result.append(elem)
for sub_elem in sub_elem_list:
result.append(sub_elem)
print(json.dumps(result, indent=4))
output = """
[
{
"classification": [
{
"classificationId": "Cash",
"taxonomyId": "accounting.gp"
}
],
"id": "235",
"kind": "Real",
"name": "Checking",
"sub_accounts": []
},
{
"classification": [
{
"classificationId": "Cash",
"taxonomyId": "accounting.gp"
}
],
"id": "236",
"kind": "Real",
"name": "Cash Reserve",
"sub_accounts": []
}
]
"""
When you have nested structures you need to nest your loops. The other answer has recursion, which can cause problems if you're nesting over a thousand recursive calls (so probably not this case). I also assumed that you care about order, preferring the parent's id to be first. Also, if you're trying to get rid of the sub_accounts
from the json, then you'd want to pop it from the records, but I again assume that the structure should be maintained.