Tags: python, pyspark, databricks, azure-databricks

Check if nested JSON is empty or not in Python


I have JSON files in the format below; some are empty and some have records:

{"result": []}

or

{"result": [{"approval": "Approved", "opened_at": "02/06/2023 18:49:18", "state": "Work in Progress", "sys_id": "73a6167b1b83a51"}, {"approval": "Approved", "opened_at": "07/11/2023 13:19:38", "state": "Open", "sys_id": "4bfd299371d4b41"}, .......]}

If the JSON has a 'sys_id' key, it should print "sys_id available"; otherwise it should print "empty json".

I tried the code below, but I always get "empty list" as the output, even when sys_id is available.

import json
import pandas as pd

spark.conf.set("spark.sql.legacy.json.allowEmptyString.enabled", True)

file = "/dbfs/mnt/landing/2023/sample.json"
with open(file) as f:
    data = json.load(f)
    res = pd.json_normalize(data)
    if 'sys_id' in data:
        print("sys_id available")
    else:
        print("empty list")

Solution

  • It seems like the issue in your code is that you're checking whether 'sys_id' is a top-level key of the parsed JSON object, but the only top-level key is 'result'. The 'sys_id' keys live inside the dictionaries of the 'result' list, so that is where you need to look.

    import json
    
    with open('/dbfs/mnt/landing/2023/sample.json') as f:
        data = json.load(f)
    
    # Check if 'result' is a non-empty list
    if 'result' in data and isinstance(data['result'], list) and len(data['result']) > 0:
        for item in data['result']:
            if 'sys_id' in item:
                print(f"sys_id available: {item['sys_id']}")
    else:
        print("empty list")