python json pandas python-requests export-to-csv

Flatten JSON response and output to csv

I appear to have exhausted the internet searching for what feels like a common occurrence, and I need some help, please.

I'm making an API call using the requests library, which returns one JSON response per call - I'm going to loop through and make multiple calls.

I want to combine all of the responses from the many API calls into one python data structure and then export the results to CSV.

One API response looks like this:

{
    "status": "1",
    "msg": "Success",
    "data": {
      "id": "12345",
      "PriceDetail": [
        {
          "item": "Apple",
          "amount": "10",
          "weight": "225",
          "price": "92",
          "bestbeforeendeate": "30/09/2023"
        }
        ]
    }
}

My final output should be a CSV file with the following headers and data in the subsequent rows:

id	item	amount	weight	price	bestbeforeendeate
12345	apple	10	225	92	30/09/2023
.....	.....	..	...	..	..........

I've looked at combining the responses in a dictionary, named tuple, dataframe and tried the various options to export to from said structures like dictwriter, csvwriter, normalize etc. Still, I'm struggling to make any of it work.

The closest I got was (I saved the results to a JSON file to stop hitting the API):

with open('item.json') as json_file: 
    data_set = json.load(json_file) 
    for data in data_set: 
        if data['msg'] == 'Success': 
            id = data['data']['id'] 
            return_data[id] = data['data']['PriceDetail'] 

df = pd.json_normalize(data['data']['PriceDetail']) 
print(df)

I couldn't get the id added to the dataframe

Any suggestions would be appreciated.

Thanks,

Solution

Pandas has a function called json_normalize, which can directly convert a dict into a dataframe. In order to convert a JSON string into a dict you can simply use the json library. Good source I found would be this`.

import json
import pandas as pd

# Test string, assuming it is from API
test_string = """{
    "status": "1",
    "msg": "Success",
    "data": {
      "id": "12345",
      "PriceDetail": [
        {
          "item": "Apple",
          "amount": "10",
          "weight": "225",
          "price": "92",
          "bestbeforeendeate": "30/09/2023"
        }
        ]
    }
}"""

# Function converts the api result to the dataframe and appends it to df
def add_new_entry_to_dataframe(df, api_result_string):
    input_parsed = json.loads(api_result_string)
    df_with_new_data = pd.json_normalize(input_parsed['data']['PriceDetail'])
    df = df.append(df_with_new_data)
    return df
    

# The dataframe you want to store everything
df = pd.DataFrame()

## Loop where you fetch new data
for i in range(10):
    newly_fetched_result = test_string
    df = add_new_entry_to_dataframe(df, newly_fetched_result)


df = df.reset_index()

# Save as .csv
df.to_csv('output.csv')

print(df)

The output of above code:

item amount weight price bestbeforeendeate
0  Apple     10    225    92        30/09/2023
0  Apple     10    225    92        30/09/2023
0  Apple     10    225    92        30/09/2023
0  Apple     10    225    92        30/09/2023
0  Apple     10    225    92        30/09/2023
0  Apple     10    225    92        30/09/2023
0  Apple     10    225    92        30/09/2023
0  Apple     10    225    92        30/09/2023
0  Apple     10    225    92        30/09/2023
0  Apple     10    225    92        30/09/2023

EDIT: I had another look at the problem and thought I share another solution, which might be better for you. Instead of building a huge dataframe over time, the code below appends the fetched data directly into the CSV file. The advantage is that all data is already in the CSV if the program crashes or if you terminate it.

# Function converts the json string to a dataframe and appends it directly to the CSV file
def add_json_string_to_csv(api_result_string):
    input_parsed = json.loads(api_result_string)
    df_with_new_data = pd.json_normalize(input_parsed['data']['PriceDetail'])
    df_with_new_data.to_csv('output.csv', mode='a', header=False)

## Loop where you fetch new data
while (True):
    newly_fetched_result = test_string
    add_json_string_to_csv(newly_fetched_result)