So I have a JSON file which looks like this:
data = {
    "path": "/",
    "subPages": [
        {
            "path": "/1",
            "subPages": [
                {
                    "path": "/12",
                    "subPages": [
                        {
                            "path": "/123",
                            "subPages": [],
                            "url": "123_URL",
                        }
                    ],
                    "url": "12_URL",
                },
                {
                    "path": "/13",
                    "subPages": [
                        {
                            "path": "/131",
                            "subPages": [
                                {
                                    "path": "/1311",
                                    "subPages": [
                                        {
                                            "path": "/13111",
                                            "subPages": [],
                                            "url": "13111_URL",
                                        }
                                    ],
                                    "url": "1311_URL",
                                }
                            ],
                            "url": "131_URL",
                        }
                    ],
                    "url": "13_URL",
                }
            ],
            "url": "1_URL",
        }
    ]
}
I want to parse this JSON into a dictionary that maps each "path" to its "url". Something like:
dict = {"/" : "1_URL", "/12" : "12_URL", "/123" : "123_URL", "/13" : "13_URL" }
And so on. This has been difficult because I need to visit each level of the hierarchy separately to extract the values, and the real file may have another two levels of nesting.
The challenge is that the "subPages" array is always defined before the "url" key. My recursive approach failed because of this:
def json_extract(obj, key):
    arr = []

    def extract(obj, arr, key):
        if isinstance(obj, dict):
            for k, v in obj.items():
                if isinstance(v, (dict, list)):
                    extract(v, arr, key)
                elif k == key:
                    arr.append(v)
        elif isinstance(obj, list):
            for item in obj:
                extract(item, arr, key)
        return arr

    values = extract(obj, arr, key)
    return values
Do you have any idea how I can achieve this? Even just a pointer to the right logic would help. Thanks in advance!
Try:
def get_kv(o):
    if isinstance(o, dict):
        if "path" in o and "url" in o:
            yield o["path"], o["url"]
        for v in o.values():
            yield from get_kv(v)
    elif isinstance(o, list):
        for v in o:
            yield from get_kv(v)


print(dict(get_kv(data)))
Prints:
{
    "/1": "1_URL",
    "/12": "12_URL",
    "/123": "123_URL",
    "/13": "13_URL",
    "/131": "131_URL",
    "/1311": "1311_URL",
    "/13111": "13111_URL",
}
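If you would rather avoid recursion, for example because the real file might nest much deeper than the sample, the same idea can be written with an explicit stack. This is only a sketch under a few assumptions: `flatten_pages` is a name I made up, children are assumed to live only under "subPages", and the small `sample` string is a cut-down stand-in for the question's data written as strict JSON (no trailing commas) so that `json.loads` accepts it.

```python
import json


def flatten_pages(page):
    """Map each page's "path" to its "url", walking "subPages" with an explicit stack."""
    result = {}
    stack = [page]
    while stack:
        node = stack.pop()
        # The root in the question has no "url", so skip nodes missing either key.
        if "path" in node and "url" in node:
            result[node["path"]] = node["url"]
        # Only descend into "subPages"; any other keys are ignored.
        stack.extend(node.get("subPages", []))
    return result


# Cut-down sample in the same shape as the question's data (an assumption for illustration).
sample = json.loads("""
{
  "path": "/",
  "subPages": [
    {
      "path": "/1",
      "subPages": [
        {"path": "/12", "subPages": [], "url": "12_URL"},
        {"path": "/13", "subPages": [], "url": "13_URL"}
      ],
      "url": "1_URL"
    }
  ]
}
""")

pages = flatten_pages(sample)
print(pages)
```

Because each node is read as a whole dict, it does not matter whether "subPages" appears before or after "url". The pairs come out in a different order than with the recursive generator, which should not matter for a plain lookup dictionary.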