Reading multiple json files recursively from Azure storage in Python


I'm using the code below to read multiple JSON files from Azure Storage.

from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient
import json
import pandas as pd
from pandas import DataFrame
from datetime import datetime
import uuid
connect_str = ""
blob_service_client = BlobServiceClient.from_connection_string(connect_str)
container_name="raw"
container_client=blob_service_client.get_container_client(container_name)
blob_list = container_client.list_blobs(name_starts_with="path")

for blob in blob_list:
    #print("\t" + blob.name)
    blob_client = container_client.get_blob_client(blob)
    streamdownloader = blob_client.download_blob()
    fileReader = json.loads(streamdownloader.readall())
    df = pd.DataFrame(fileReader)
    print(df.to_string())

Below is the output.

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 2 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   Date                  2 non-null      object 
 1   id                    2 non-null      float64

memory usage: 256.0+ bytes
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 2 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   Date                 2 non-null      object 
 1   id                   2 non-null      float64

dtypes: float64(4), int64(1), object(3)
memory usage: 256.0+ bytes
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 2 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   Date                 2 non-null      object 
 1   id                   2 non-null      float64
 

Below is the sample data.

      Date       id 
0  15-Jun-2022  200.0  
1  15-Jun-2022  160.0  

     Date        id
0  15-Jun-2022  200.0  
1  15-Jun-2022  160.0  

     Date        id
0  16-Jun-2022  200.0  
1  16-Jun-2022  160.0  

I am unable to filter the final dataframe df; it gives me an empty dataset. I think the issue is with the final dataframe structure. How can I concatenate the rows from all the files into the final df?

Thank you.


Solution

  • I tried to reproduce this in my environment, and your code works fine.

    I ran the code above successfully and got the expected output. Please check the data retrieved from the JSON files stored in the storage account.

    The JSON files in the storage account are stored in a consistent row-and-column format, which is why the data comes back correctly aligned.

    Check how the JSON files in your storage account are formatted and adjust them as required.

    For reference:

    All the Ways to Filter Pandas Dataframes • datagy
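
On the concatenation question itself: in the loop above, df is reassigned on every iteration, so after the loop it holds only the rows of the last file. A minimal sketch of one way around that is to build one DataFrame per file and combine them with pd.concat before filtering. It assumes each blob holds a JSON array of records with Date and id fields (the actual file layout is not shown in the question). The helper name combine_payloads and the sample_payloads list are hypothetical stand-ins for the bytes returned by streamdownloader.readall(), and the date format string is inferred from the sample data.

```python
import json

import pandas as pd


def combine_payloads(payloads):
    """Parse each JSON payload into a DataFrame and stack them into one.

    `payloads` is any iterable of bytes/str, e.g. what
    streamdownloader.readall() returns for each blob (hypothetical helper).
    """
    frames = [pd.DataFrame(json.loads(payload)) for payload in payloads]
    if not frames:
        # pd.concat raises on an empty list, so return an empty frame instead.
        return pd.DataFrame()
    # ignore_index=True gives one continuous 0..n-1 index across all files.
    return pd.concat(frames, ignore_index=True)


# Stand-in data mirroring the sample in the question (assumed JSON layout).
sample_payloads = [
    b'[{"Date": "15-Jun-2022", "id": 200.0}, {"Date": "15-Jun-2022", "id": 160.0}]',
    b'[{"Date": "15-Jun-2022", "id": 200.0}, {"Date": "15-Jun-2022", "id": 160.0}]',
    b'[{"Date": "16-Jun-2022", "id": 200.0}, {"Date": "16-Jun-2022", "id": 160.0}]',
]

combined = combine_payloads(sample_payloads)

# Date arrives as strings (dtype object), so parse it before filtering on it.
combined["Date"] = pd.to_datetime(combined["Date"], format="%d-%b-%Y")

# Boolean-mask filter: rows from 16 June 2022 with id above 180.
filtered = combined[
    (combined["Date"] == pd.Timestamp("2022-06-16")) & (combined["id"] > 180)
]

print(combined)
print(filtered)
```

With the Azure client from the question, the payloads could be gathered inside the existing loop by appending streamdownloader.readall() to a list for each blob, then passing that list to combine_payloads once the loop has finished.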