Python - iterate through each nested JSON entry to store two specific values at the same tree level


So I have a JSON file which looks like this:

data = {
    "path": "/",
    "subPages": [
        {
        "path": "/1",
        "subPages": [
            {
                "path": "/12",
                "subPages": [
                    {
                        "path": "/123",
                        "subPages": [],
                        "url": "123_URL",
                    }
                ],
                "url": "12_URL",
            },
            {
                "path": "/13",
                "subPages": [
                    {
                        "path": "/131",
                        "subPages": [
                            {
                                "path": "/1311",
                                "subPages": [
                                    {
                                        "path": "/13111",
                                        "subPages": [],
                                        "url": "13111_URL",
                                    }
                                ],
                                "url": "1311_URL",
                            }
                        ],
                        "url": "131_URL",
                    }
                ],
                "url": "13_URL",
            }
        ],
        "url": "1_URL",
        }
    ]
}

I want to parse this JSON into a dictionary that maps each "path" to the "url" found at the same level. Something like:

result = {"/1": "1_URL", "/12": "12_URL", "/123": "123_URL", "/13": "13_URL"}

And so on. This has been difficult to accomplish because I need to visit each level of the hierarchy independently to extract the intended values, and the file may have another two levels of nesting to parse.

The challenge is that the "subPages" array is always defined before the "url" key. My recursive approach failed because of this:

def json_extract(obj, key):
    arr = []

    def extract(obj, arr, key):
        if isinstance(obj, dict):
            for k, v in obj.items():
                if isinstance(v, (dict, list)):
                    extract(v, arr, key)
                elif k == key:
                    arr.append(v)
        elif isinstance(obj, list):
            for item in obj:
                extract(item, arr, key)
        return arr

    values = extract(obj, arr, key)
    return values
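
To make the failure concrete, here is a minimal sketch. It assumes the "path" and "url" lists collected by json_extract are then paired by position; that pairing step is an assumption and is not shown above. The two-level `sample` is a hypothetical cut-down version of the data with the same key order:

```python
def json_extract(obj, key):
    """Collect every value stored under `key`, in traversal order."""
    arr = []

    def extract(obj, arr, key):
        if isinstance(obj, dict):
            for k, v in obj.items():
                if isinstance(v, (dict, list)):
                    extract(v, arr, key)
                elif k == key:
                    arr.append(v)
        elif isinstance(obj, list):
            for item in obj:
                extract(item, arr, key)
        return arr

    return extract(obj, arr, key)


# Hypothetical sample: "subPages" comes before "url", as in the real file.
sample = {
    "path": "/1",
    "subPages": [{"path": "/12", "subPages": [], "url": "12_URL"}],
    "url": "1_URL",
}

paths = json_extract(sample, "path")
urls = json_extract(sample, "url")
print(paths)  # ['/1', '/12']
print(urls)   # ['12_URL', '1_URL']

# Pairing by position (assumed step) attaches each path to the wrong url.
print(dict(zip(paths, urls)))  # {'/1': '12_URL', '/12': '1_URL'}
```

Each "path" is collected before the function descends into "subPages", but each "url" is collected only after the descent, so the two lists come back in different orders.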

Do you have any idea how I can achieve this? Even just some logic you can point me to would help. Thanks in advance!


Solution

  • Try:

    def get_kv(o):
        if isinstance(o, dict):
            if "path" in o and "url" in o:
                yield o["path"], o["url"]
            for v in o.values():
                yield from get_kv(v)
        elif isinstance(o, list):
            for v in o:
                yield from get_kv(v)
    
    
    print(dict(get_kv(data)))
    

    Prints (reformatted here for readability):

    {
        "/1": "1_URL",
        "/12": "12_URL",
        "/123": "123_URL",
        "/13": "13_URL",
        "/131": "131_URL",
        "/1311": "1311_URL",
        "/13111": "13111_URL",
    }
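
The root "/" entry is missing from that output because it has a "path" but no "url". Since the question mentions a JSON file, here is a rough sketch of the same idea applied to parsed JSON text. It assumes every page keeps its children under "subPages", so it descends only into that key rather than into every value. The `raw` string and the `iter_pages` name are hypothetical stand-ins, not part of the original data or answer:

```python
import json

# Hypothetical stand-in for the file contents. Strict JSON does not allow
# the trailing commas that the Python literal in the question uses.
raw = """
{
  "path": "/",
  "subPages": [
    {
      "path": "/1",
      "subPages": [
        {"path": "/12", "subPages": [], "url": "12_URL"}
      ],
      "url": "1_URL"
    }
  ]
}
"""


def iter_pages(page):
    """Yield (path, url) for a page and its descendants via "subPages" only."""
    # Read both keys from the same dict, so key order no longer matters.
    if "path" in page and "url" in page:
        yield page["path"], page["url"]
    for child in page.get("subPages", []):
        yield from iter_pages(child)


data = json.loads(raw)
result = dict(iter_pages(data))
print(result)  # {'/1': '1_URL', '/12': '12_URL'}
```

To read from an actual file, `json.load` on an open file object would replace `json.loads(raw)`. If pages can also appear under keys other than "subPages", the generic `get_kv` above is the safer choice.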