python · azure-blob-storage · azure-storage

Reading multiple json files from Azure storage into Python dataframe


I'm using the code below to read a JSON file from Azure storage into a dataframe in Python.

from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient
import json
import pandas as pd
from pandas import DataFrame
from datetime import datetime
import uuid

filename = "raw/filename.json"

container_name="test"
constr = ""

blob_service_client = BlobServiceClient.from_connection_string(constr)
container_client = blob_service_client.get_container_client(container_name)
blob_client = container_client.get_blob_client(filename)
streamdownloader = blob_client.download_blob()

fileReader = json.loads(streamdownloader.readall())
df = pd.DataFrame(fileReader)
rslt_df = df[df['ID'] == 'f2a8141f-f1c1-42c3-bb57-910052b78110']
rslt_df.head()

This works fine, but I want to read multiple files into a dataframe. Is there any way to pass a pattern in the file name, like below, to read multiple files from Azure storage recursively?

filename = "raw/filename*.json"

Thank you


Solution

  • I tried this in my environment and was able to read multiple JSON files successfully:

    from azure.storage.blob import BlobServiceClient
    import json
    import pandas as pd

    blob_service_client = BlobServiceClient.from_connection_string("<CONNECTION STRING>")
    container_client = blob_service_client.get_container_client("container1")
    blob_list = container_client.list_blobs(name_starts_with="directory1")
    for blob in blob_list:
        print()
        print("The file " + blob.name + " contains:")
        blob_client = container_client.get_blob_client(blob.name)
        streamdownloader = blob_client.download_blob()
        fileReader = json.loads(streamdownloader.readall())
        dataframe = pd.DataFrame(fileReader)
        print(dataframe.to_string())
    

    I uploaded three JSON files to my container, and the output printed each file's contents as a dataframe.
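Azure Blob Storage has no native wildcard support, but you can emulate `raw/filename*.json` by listing blobs and filtering names with the standard-library `fnmatch` module, then combining the per-file dataframes with `pd.concat`. A minimal local sketch of that filtering and concatenation logic (the blob names and payloads below are made-up placeholders; in practice `blob_names` would come from `container_client.list_blobs()` and each payload from `json.loads(blob_client.download_blob().readall())`):

```python
import fnmatch
import pandas as pd

# Hypothetical blob names, standing in for container_client.list_blobs()
blob_names = [
    "raw/filename1.json",
    "raw/filename2.json",
    "raw/other.csv",
]

# Emulate the wildcard: keep only blobs matching raw/filename*.json
pattern = "raw/filename*.json"
matched = [name for name in blob_names if fnmatch.fnmatch(name, pattern)]

# Placeholder payloads; in practice each would be the parsed JSON of a blob
payloads = {
    "raw/filename1.json": [{"ID": "a1", "value": 10}],
    "raw/filename2.json": [{"ID": "b2", "value": 20}],
}

# Build one dataframe per matched blob, then combine them into one
frames = [pd.DataFrame(payloads[name]) for name in matched]
df = pd.concat(frames, ignore_index=True)
print(df)
```

The same filter slots straight into the answer's loop: call `container_client.list_blobs(name_starts_with="raw/filename")` to narrow the listing server-side, skip any `blob.name` that fails `fnmatch.fnmatch(blob.name, pattern)`, and `pd.concat` the resulting dataframes.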