
Save json to a file in Azure Data Lake Storage Gen 2


In Databricks, using Python, I am making a GET request with the requests library, and the response is JSON.

Here is an example of the get request:

json_data = requests.get("https://prod-noblehire-api-000001.appspot.com/job?").json()

I would like to save the json_data variable as a file in Azure Data Lake Storage. I don't want to read it into a PySpark/pandas DataFrame first and then save it.

If I were saving it to a local folder on my computer, I would use the following code:

j = json.dumps(json_data)
with open("MyJsonFile.json", "w") as f:
    f.write(j)

However, since I would like to save it in Azure Data Lake Storage, I should be using the following, according to Microsoft's documentation:

def upload_file_to_directory():
    try:
        file_system_client = service_client.get_file_system_client(file_system="my-file-system")

        directory_client = file_system_client.get_directory_client("my-directory")

        file_client = directory_client.create_file("uploaded-file.txt")
        local_file = open("C:\\file-to-upload.txt", 'r')

        file_contents = local_file.read()

        file_client.append_data(data=file_contents, offset=0, length=len(file_contents))

        file_client.flush_data(len(file_contents))

    except Exception as e:
        print(e)

How can I combine both pieces of code to save the variable as a file in ADLS? Also, is there a better way to do that?


Solution

  • You don't really have to save locally. Instead, you can mount your ADLS storage account and write the JSON content to it directly, just as you would to a local file. Below is the code that worked for me.

    import requests
    import json
    
    json_data = requests.get("<YOUR_URL>").json()
    j = json.dumps(json_data)
    with open("/<YOUR_MOUNT_POINT>/<FILE_NAME>.json", "w") as f:
        f.write(j)
    

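  • To answer the first part of the question directly: the two snippets can be combined without any local file by serializing json_data with json.dumps and handing the resulting string to the SDK's file client. A minimal sketch, assuming an already-authenticated DataLakeServiceClient (service_client) as in the docs snippet, placeholder file-system and directory names, and a recent azure-storage-file-datalake package where DataLakeFileClient.upload_data is available:

    ```python
    import json

    def upload_json_to_adls(service_client, json_data, file_name):
        # Serialize the parsed JSON back to a string, exactly as in the
        # local-file version, but pass it to the SDK instead of open().
        payload = json.dumps(json_data)

        # Same client chain as the Microsoft docs snippet; the
        # file-system and directory names are placeholders.
        file_system_client = service_client.get_file_system_client(file_system="my-file-system")
        directory_client = file_system_client.get_directory_client("my-directory")
        file_client = directory_client.create_file(file_name)

        # upload_data() creates/overwrites the file in a single call,
        # replacing the append_data()/flush_data() pair above.
        file_client.upload_data(payload, overwrite=True)
    ```

    Calling `upload_json_to_adls(service_client, json_data, "MyJsonFile.json")` then writes the response in one request, with no intermediate file on disk.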