I appear to have exhausted the internet searching for what feels like a common occurrence, and I need some help, please.
I'm making API calls using the requests library; each call returns one JSON response, and I'm going to loop through and make multiple calls.
I want to combine all of the responses from the many API calls into one Python data structure and then export the result to CSV.
One API response looks like this:
{
    "status": "1",
    "msg": "Success",
    "data": {
        "id": "12345",
        "PriceDetail": [
            {
                "item": "Apple",
                "amount": "10",
                "weight": "225",
                "price": "92",
                "bestbeforeendeate": "30/09/2023"
            }
        ]
    }
}
My final output should be a CSV file with the following headers and data in the subsequent rows:
id | item | amount | weight | price | bestbeforeendeate |
---|---|---|---|---|---|
12345 | apple | 10 | 225 | 92 | 30/09/2023 |
..... | ..... | .. | ... | .. | .......... |
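For reference, a layout like the one above can be written with the standard-library csv.DictWriter — a minimal sketch using one hard-coded row taken from the sample response (in practice each row would come from a flattened API response):

```python
import csv

# One flattened row per PriceDetail entry, with the parent id attached
rows = [
    {"id": "12345", "item": "Apple", "amount": "10",
     "weight": "225", "price": "92", "bestbeforeendeate": "30/09/2023"},
]

with open("output.csv", "w", newline="") as f:
    writer = csv.DictWriter(
        f,
        fieldnames=["id", "item", "amount", "weight",
                    "price", "bestbeforeendeate"],
    )
    writer.writeheader()   # header row
    writer.writerows(rows)  # one CSV line per dict
```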
I've looked at combining the responses in a dictionary, a named tuple, and a dataframe, and I've tried the various export options from those structures (DictWriter, csv.writer, json_normalize, etc.). Still, I'm struggling to make any of it work.
The closest I got was (I saved the results to a JSON file to stop hitting the API):
with open('item.json') as json_file:
    data_set = json.load(json_file)

return_data = {}
for data in data_set:
    if data['msg'] == 'Success':
        id = data['data']['id']
        return_data[id] = data['data']['PriceDetail']
        df = pd.json_normalize(data['data']['PriceDetail'])
        print(df)
I couldn't get the id added to the dataframe.
Any suggestions would be appreciated.
Thanks,
Pandas has a function called json_normalize, which can convert a dict directly into a dataframe. To convert a JSON string into a dict, you can simply use the json library.
A good source I found would be this.
import json
import pandas as pd
# Test string, assuming it is from API
test_string = """{
"status": "1",
"msg": "Success",
"data": {
"id": "12345",
"PriceDetail": [
{
"item": "Apple",
"amount": "10",
"weight": "225",
"price": "92",
"bestbeforeendeate": "30/09/2023"
}
]
}
}"""
# Function parses the API result, adds the parent id, and appends the rows to df
def add_new_entry_to_dataframe(df, api_result_string):
    input_parsed = json.loads(api_result_string)
    df_with_new_data = pd.json_normalize(input_parsed['data']['PriceDetail'])
    # Carry the id from the parent "data" object into every row
    df_with_new_data.insert(0, 'id', input_parsed['data']['id'])
    # DataFrame.append was removed in pandas 2.0; use pd.concat instead
    df = pd.concat([df, df_with_new_data])
    return df

# The dataframe where you want to store everything
df = pd.DataFrame()

## Loop where you fetch new data
for i in range(10):
    newly_fetched_result = test_string
    df = add_new_entry_to_dataframe(df, newly_fetched_result)

# drop=True discards the old per-chunk index instead of adding it as a column
df = df.reset_index(drop=True)

# Save as .csv (index=False keeps the row numbers out of the file)
df.to_csv('output.csv', index=False)
print(df)
The output of the above code:
      id   item amount weight price bestbeforeendeate
0  12345  Apple     10    225    92        30/09/2023
1  12345  Apple     10    225    92        30/09/2023
2  12345  Apple     10    225    92        30/09/2023
3  12345  Apple     10    225    92        30/09/2023
4  12345  Apple     10    225    92        30/09/2023
5  12345  Apple     10    225    92        30/09/2023
6  12345  Apple     10    225    92        30/09/2023
7  12345  Apple     10    225    92        30/09/2023
8  12345  Apple     10    225    92        30/09/2023
9  12345  Apple     10    225    92        30/09/2023
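A side note on performance: growing a dataframe inside a loop copies the accumulated data on every iteration, so with many API calls it is usually cheaper to collect the per-call frames in a list and concatenate once at the end. A sketch under the same assumptions as above (test_string standing in for a real API response):

```python
import json
import pandas as pd

# Stand-in for one API response, as in the question
test_string = """{"status": "1", "msg": "Success", "data": {"id": "12345",
"PriceDetail": [{"item": "Apple", "amount": "10", "weight": "225",
"price": "92", "bestbeforeendeate": "30/09/2023"}]}}"""

frames = []
for _ in range(10):
    parsed = json.loads(test_string)
    frame = pd.json_normalize(parsed['data']['PriceDetail'])
    frame.insert(0, 'id', parsed['data']['id'])  # carry the parent id along
    frames.append(frame)

# One concatenation (and one copy) instead of one per iteration
df = pd.concat(frames, ignore_index=True)
df.to_csv('output.csv', index=False)
```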
EDIT: I had another look at the problem and thought I'd share another solution, which might be better for you. Instead of building a huge dataframe over time, the code below appends the fetched data directly to the CSV file. The advantage is that all data is already in the CSV if the program crashes or if you terminate it.
import os

# Function parses the JSON string, adds the id, and appends the rows directly to the CSV file
def add_json_string_to_csv(api_result_string):
    input_parsed = json.loads(api_result_string)
    df_with_new_data = pd.json_normalize(input_parsed['data']['PriceDetail'])
    df_with_new_data.insert(0, 'id', input_parsed['data']['id'])
    # Write the header row only once, when the file does not exist yet
    write_header = not os.path.exists('output.csv')
    df_with_new_data.to_csv('output.csv', mode='a', header=write_header, index=False)

## Loop where you fetch new data
for i in range(10):
    newly_fetched_result = test_string
    add_json_string_to_csv(newly_fetched_result)