Tags: python, json, jupyter-notebook, nested-loops, hierarchy

Flatten a nested JSON hierarchy in Python


I'll preface this by saying I am not a developer by any means, but I was handed this assignment and I am lost. This is my first time using Python and my first time coding in 7+ years, and it's not going well.

The JSON I have is an organizational tree, where each level potentially has children underneath it.

I need to write a Python script in a Jupyter Notebook that flattens it into this format, or something similar, where each child becomes a new row:

 level1 | level2 | level3
 org1   |        |
 org1   | org2   |
 org1   | org2   | org3

Here is the JSON:

[{
    "Id": "f035de7f",
    "Name": "Org1",
    "ParentId": null,
    "Children": [{
        "Id": "8c18a70d",
        "Name": "Org2",
        "ParentId": "f035de7f",
        "Children": []
    }, {
        "Id": "b4514099",
        "Name": "Org3",
        "ParentId": "f035de7f",
        "Children": [{
            "Id": "8abe58d1",
            "Name": "Org4",
            "Children": []
        }]
    }, {
        "Id": "8e35bdc3",
        "Name": "Org5",
        "ParentId": "f035de7f",
        "Children": [{
            "Id": "331fffbf",
            "Name": "Org6",
            "ParentId": "8e35bdc3",
            "Children": [{
                "Id": "3bc3e085",
                "Name": "Org7",
                "ParentId": "331fffbf",
                "Children": []
            }]
        }]
    }]
}]
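The post doesn't show how orgs_json is created, so here is a minimal sketch assuming the JSON above is available as a string (for a file you would use json.load on an open file handle instead). It shows what the structure looks like once it is in Python: the outer array becomes a list, each organisation becomes a dict, and null becomes None.

```python
import json

# The org tree from the question, pasted in as a string (assumption: in
# practice it might come from a file or an API response instead).
raw = """
[{"Id": "f035de7f", "Name": "Org1", "ParentId": null,
  "Children": [
    {"Id": "8c18a70d", "Name": "Org2", "ParentId": "f035de7f", "Children": []},
    {"Id": "b4514099", "Name": "Org3", "ParentId": "f035de7f",
     "Children": [{"Id": "8abe58d1", "Name": "Org4", "Children": []}]},
    {"Id": "8e35bdc3", "Name": "Org5", "ParentId": "f035de7f",
     "Children": [{"Id": "331fffbf", "Name": "Org6", "ParentId": "8e35bdc3",
                   "Children": [{"Id": "3bc3e085", "Name": "Org7",
                                 "ParentId": "331fffbf", "Children": []}]}]}
  ]}]
"""

# json.loads turns JSON text into ordinary Python objects:
# arrays -> lists, objects -> dicts, null -> None.
orgs_json = json.loads(raw)

top = orgs_json[0]                                   # the single top-level organisation
print(top["Name"])                                   # Org1
print(top["ParentId"])                               # None
print([child["Name"] for child in top["Children"]])  # ['Org2', 'Org3', 'Org5']
```

Each dict's "Children" value is itself a list of the same kind of dicts, which is why a loop that only goes one or two levels deep can't reach everything.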

I've tried a variety of for loops and have scoured the internet for days, but I think I'm missing some very basic knowledge to make this work. I would greatly appreciate any help.

Here are my starting attempts:

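orgs_list = []  # the list has to exist before we can append to it
for item in orgs_json:
    # only collects the direct children of each top-level org (one level deep)
    orgs_json_children = item["Children"]
    orgs_list.append(orgs_json_children)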

or

import json

wanted = ['Children', 'Name']

for item in orgs_json:  # each item is one top-level organisation (a dict)
    for key in wanted:
        print(key, ':', json.dumps(item[key], indent=4))
    # Put a blank line at the end of the details for each item
    print()

Solution

  • You can process the nested structure with a stack:

    • Start with the outermost list, reversed, as the stack, pairing each organisation with an empty tuple that will track its organisation path.
    • In a while stack: loop, pop the top element from the stack. Do what you need to do with that organisation, such as recording its name: produce a row from the organisation path with the current organisation name appended.
    • Push all the elements of its Children list onto the stack, also reversed, each paired with the path that now ends in the parent organisation's name.
    • Loop until the stack is empty.

    The reversals are needed because a stack is last-in, first-out: popping elements returns them in the reverse of the order they were pushed. You still want a stack for this job (rather than a queue) because the output should be depth-first, with each organisation immediately followed by all of its descendants.
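To see why the stack matters, here is a sketch of the same loop with a queue swapped in (collections.deque with popleft). The function name paths_breadth_first and the small A/B/C/D tree are hypothetical, made up for illustration. Because a queue hands back the oldest entry first, no reversing is needed, but the output becomes breadth-first: a grandchild no longer appears directly under its parent.

```python
from collections import deque

def paths_breadth_first(orgs):
    """Hypothetical queue-based variant, for contrast with the stack version."""
    # Each entry is (organisation dict, tuple of ancestor names).
    queue = deque((o, ()) for o in orgs)
    while queue:
        org, path = queue.popleft()  # oldest entry first (FIFO)
        path += (org["Name"],)
        yield path
        # Children go to the back of the queue, so every organisation at one
        # depth is emitted before any organisation at the next depth.
        queue.extend((child, path) for child in org["Children"])

# Small hypothetical tree: A has children B and C; B has child D.
tree = [{"Name": "A", "Children": [
    {"Name": "B", "Children": [{"Name": "D", "Children": []}]},
    {"Name": "C", "Children": []},
]}]

for p in paths_breadth_first(tree):
    print(*p, sep="\t")
# A
# A  B
# A  C
# A  B  D   <- D is separated from its parent B by C
```

With the stack-based version the same tree comes out as A, A B, A B D, A C, which is the row order the question asks for.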

    In code, this looks like:

    def flatten_orgs(orgs):
        stack = [(o, ()) for o in reversed(orgs)]  # organisation plus path
        while stack:
            org, path = stack.pop()  # top element
            path += (org['Name'],)   # update path, adding the current name
            yield path               # give this path to the caller
            # add all children to the stack, with the current path
            stack += ((o, path) for o in reversed(org['Children']))
    

    You can then loop over the above function to get all the paths:

    >>> for path in flatten_orgs(orgs_json):
    ...     print(*path, sep='\t')
    ...
    Org1
    Org1    Org2
    Org1    Org3
    Org1    Org3    Org4
    Org1    Org5
    Org1    Org5    Org6
    Org1    Org5    Org6    Org7
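The question asked for level1 | level2 | level3 columns rather than tab-separated output. A minimal sketch, assuming you want one column per depth with blanks for missing levels: pad every path to the deepest length, then write the rows with the standard csv module. The helper name to_level_rows and the sample paths are hypothetical; in a notebook you would pass it flatten_orgs(orgs_json) instead.

```python
import csv
import io

def to_level_rows(paths):
    """Hypothetical helper: pad paths into equal-width rows, one column per level."""
    paths = list(paths)  # materialise a generator so we can scan it twice
    depth = max((len(p) for p in paths), default=0)
    header = [f"level{i}" for i in range(1, depth + 1)]
    # Fill the missing deeper levels with empty strings.
    rows = [list(p) + [""] * (depth - len(p)) for p in paths]
    return header, rows

# Sample paths in the same shape flatten_orgs yields them.
paths = [("Org1",), ("Org1", "Org2"), ("Org1", "Org3"), ("Org1", "Org3", "Org4")]
header, rows = to_level_rows(paths)

# Write CSV text to an in-memory buffer; to save a file instead, use
# open("orgs.csv", "w", newline="") (the filename is just an example).
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(header)
writer.writerows(rows)
print(buf.getvalue())
```

If pandas is installed in the notebook, pd.DataFrame(rows, columns=header) gives the same padded rows as a displayed table.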