Search code examples
pythonjsondictionaryindexingflatten

Python - How can I move a nested json dictionary up to its own index?


I have a json dataset where each item/index can contain 2 nested dictionaries. The problem is that one of these nested dictionaries contains all of the exact key:value pairs as its parent dictionary. To put it in other words, I have a parent "Account" and any time there are "Sub-Accounts" it places the Sub-Accounts in the nested dictionary, and they are never seen as their own standalone item/index.

Here is the sample json of one item/index. Essentially, I need the sub_accounts object to extracted and become its own index. As you can see, it contains all of the same key:value objects as the parent containins the sub_accounts.

    {
        "classification": [
            {
                "classificationId": "Cash",
                "taxonomyId": "accounting.gp"
            }
        ],
        "id": "235",
        "kind": "Real",
        "name": "Checking",
        "sub_accounts": [
            {
                "classification": [
                    {
                        "classificationId": "Cash",
                        "taxonomyId": "accounting.gp"
                    }
                ],
                "id": "236",
                "kind": "Real",
                "name": "Cash Reserve",
                "sub_accounts": []
            }
        ]
    },

I have been able to use json_normalize or even variations of .pop() to accomplish a flattening of data and I have tried to explore other flattening options, but with no luck on the specific task I am trying to accomplish. Those solutions usually just result with the subaccounts still be associated to the original index.


Solution

  • I don't have a generic answer, but this seems to do what you need:

    raw_data = """                                                                                                                                                                                 
    [                                                                                                                                                                                              
        {                                                                                                                                                                                          
            "classification": [                                                                                                                                                                    
                {                                                                                                                                                                                  
                    "classificationId": "Cash",                                                                                                                                                    
                    "taxonomyId": "accounting.gp"                                                                                                                                                  
                }                                                                                                                                                                                  
            ],                                                                                                                                                                                     
            "id": "235",                                                                                                                                                                           
            "kind": "Real",                                                                                                                                                                        
            "name": "Checking",                                                                                                                                                                    
            "sub_accounts": [                                                                                                                                                                      
                {                                                                                                                                                                                  
                    "classification": [                                                                                                                                                            
                        {                                                                                                                                                                          
                            "classificationId": "Cash",                                                                                                                                            
                            "taxonomyId": "accounting.gp"                                                                                                                                          
                        }                                                                                                                                                                          
                    ],                                                                                                                                                                             
                    "id": "236",                                                                                                                                                                   
                    "kind": "Real",                                                                                                                                                                
                    "name": "Cash Reserve",                                                                                                                                                        
                    "sub_accounts": []                                                                                                                                                             
                }                                                                                                                                                                                  
            ]                                                                                                                                                                                      
        }                                                                                                                                                                                          
    ]                                                                                                                                                                                              
    """                                                                                                                                                                                            
                                                                                                                                                                                                   
    import json                                                                                                                                                                                    
    jdict = json.loads(raw_data)                                                                                                                                                                   
                                                                                                                                                                                                   
    empty_list = list()                                                                                                                                                                            
    result     = list()                                                                                                                                                                            
    for elem in jdict:                                                                                                                                                                             
        sub_elem_list = elem['sub_accounts']                                                                                                                                                       
        elem['sub_accounts'] = empty_list                                                                                                                                                          
        result.append(elem)                                                                                                                                                                        
        for sub_elem in sub_elem_list:                                                                                                                                                             
            result.append(sub_elem)                                                                                                                                                                
                                                                                                                                                                                                   
    print(json.dumps(result, indent=4)) 
    
    output = """
    [
        {
            "classification": [
                {
                    "classificationId": "Cash",
                    "taxonomyId": "accounting.gp"
                }
            ],
            "id": "235",
            "kind": "Real",
            "name": "Checking",
            "sub_accounts": []
        },
        {
            "classification": [
                {
                    "classificationId": "Cash",
                    "taxonomyId": "accounting.gp"
                }
            ],
            "id": "236",
            "kind": "Real",
            "name": "Cash Reserve",
            "sub_accounts": []
        }
    ]
    
    """
    

    When you have nested structures you need to nest your loops. The other answer has recursion, which can cause problems if you're nesting over a thousand recursive calls (so probably not this case). I also assumed that you care about order, preferring the parent's id to be first. Also, if you're trying to get rid of the sub_accounts from the json, then you'd want to pop it from the records, but I again assume that the structure should be maintained.