I'm using the code below to read multiple JSON files from Azure Storage.
from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient
import json
import pandas as pd
from pandas import DataFrame
from datetime import datetime
import uuid

connect_str = ""
blob_service_client = BlobServiceClient.from_connection_string(connect_str)
container_name = "raw"
container_client = blob_service_client.get_container_client(container_name)
blob_list = container_client.list_blobs(name_starts_with="path")
for blob in blob_list:
    #print("\t" + blob.name)
    blob_client = container_client.get_blob_client(blob)
    streamdownloader = blob_client.download_blob()
    fileReader = json.loads(streamdownloader.readall())
    df = pd.DataFrame(fileReader)
    print(df.to_string())
Below is the output.
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 2 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Date 2 non-null object
1 id 2 non-null float64
memory usage: 256.0+ bytes
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 2 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Date 2 non-null object
1 id 2 non-null float64
dtypes: float64(4), int64(1), object(3)
memory usage: 256.0+ bytes
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 2 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Date 2 non-null object
1 id 2 non-null float64
Below is the sample data.
          Date     id
0  15-Jun-2022  200.0
1  15-Jun-2022  160.0
          Date     id
0  15-Jun-2022  200.0
1  15-Jun-2022  160.0
          Date     id
0  16-Jun-2022  200.0
1  16-Jun-2022  160.0
I am unable to filter the final DataFrame df; it gives me an empty dataset. I think the issue is with the structure of the final DataFrame. How can I concatenate the rows from all the files into the final df?
Thank you.
I tried to reproduce this in my environment, and your code works fine.
Please find below a snapshot of the output after executing the above code successfully. Kindly check the data retrieved from the JSON files stored in the storage account.
The JSON files are stored in the storage account in a particular format (rows and columns), which is why the data is retrieved with the correct alignment.
Please find below the format of the JSON files in the storage account and modify it as per your requirement.
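On the concatenation part of the question: in the loop as posted, df is reassigned on every iteration, so after the loop it holds only the rows of the last file, and a filter on it may well come back empty. Below is a minimal sketch of one way to collect every file into a single DataFrame. It assumes each file is a JSON array of records shaped like the sample data in the question; the `payloads` strings and the helper name `combine_json_payloads` are hypothetical stand-ins so the example runs without a storage account.

```python
import json

import pandas as pd


def combine_json_payloads(payloads):
    """Parse each JSON payload into a DataFrame and stack them into one."""
    frames = [pd.DataFrame(json.loads(p)) for p in payloads]
    # ignore_index=True renumbers the rows 0..n-1 instead of repeating 0, 1 per file.
    return pd.concat(frames, ignore_index=True)


# Stand-ins for the blob contents (assumed layout, inferred from the sample data).
payloads = [
    '[{"Date": "15-Jun-2022", "id": 200.0}, {"Date": "15-Jun-2022", "id": 160.0}]',
    '[{"Date": "15-Jun-2022", "id": 200.0}, {"Date": "15-Jun-2022", "id": 160.0}]',
    '[{"Date": "16-Jun-2022", "id": 200.0}, {"Date": "16-Jun-2022", "id": 160.0}]',
]

df = combine_json_payloads(payloads)

# The Date column arrives as text; parse it so comparisons work on real dates.
df["Date"] = pd.to_datetime(df["Date"], format="%d-%b-%Y")

# Example filter: keep only the rows for 16 June 2022.
filtered = df[df["Date"] == pd.Timestamp("2022-06-16")]
print(filtered.to_string())
```

With the real container, the same helper could be fed from the loop in the question, for example by passing `container_client.get_blob_client(blob).download_blob().readall()` for each blob in `blob_list` instead of the hard-coded strings.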
For your Reference: