Search code examples
pythonjsonpandaspython-requestsexport-to-csv

Flatten JSON response and output to csv


I appear to have exhausted the internet searching for what feels like a common occurrence, and I need some help, please.

I'm making an API call using the requests library, which returns one JSON response per call - I'm going to loop through and make multiple calls.

I want to combine all of the responses from the many API calls into one python data structure and then export the results to CSV.

One API response looks like this:

{
    "status": "1",
    "msg": "Success",
    "data": {
      "id": "12345",
      "PriceDetail": [
        {
          "item": "Apple",
          "amount": "10",
          "weight": "225",
          "price": "92",
          "bestbeforeendeate": "30/09/2023"
        }
        ]
    }
}

My final output should be a CSV file with the following headers and data in the subsequent rows:

id item amount weight price bestbeforeendeate
12345 apple 10 225 92 30/09/2023
..... ..... .. ... .. ..........

I've looked at combining the responses in a dictionary, named tuple, dataframe and tried the various options to export to from said structures like dictwriter, csvwriter, normalize etc. Still, I'm struggling to make any of it work.

The closest I got was (I saved the results to a JSON file to stop hitting the API):

with open('item.json') as json_file: 
    data_set = json.load(json_file) 
    for data in data_set: 
        if data['msg'] == 'Success': 
            id = data['data']['id'] 
            return_data[id] = data['data']['PriceDetail'] 

df = pd.json_normalize(data['data']['PriceDetail']) 
print(df) 

I couldn't get the id added to the dataframe

Any suggestions would be appreciated.

Thanks,


Solution

  • Pandas has a function called json_normalize, which can directly convert a dict into a dataframe. In order to convert a JSON string into a dict you can simply use the json library. Good source I found would be this`.

    import json
    import pandas as pd
    
    # Test string, assuming it is from API
    test_string = """{
        "status": "1",
        "msg": "Success",
        "data": {
          "id": "12345",
          "PriceDetail": [
            {
              "item": "Apple",
              "amount": "10",
              "weight": "225",
              "price": "92",
              "bestbeforeendeate": "30/09/2023"
            }
            ]
        }
    }"""
    
    # Function converts the api result to the dataframe and appends it to df
    def add_new_entry_to_dataframe(df, api_result_string):
        input_parsed = json.loads(api_result_string)
        df_with_new_data = pd.json_normalize(input_parsed['data']['PriceDetail'])
        df = df.append(df_with_new_data)
        return df
        
    
    # The dataframe you want to store everything
    df = pd.DataFrame()
    
    ## Loop where you fetch new data
    for i in range(10):
        newly_fetched_result = test_string
        df = add_new_entry_to_dataframe(df, newly_fetched_result)
    
    
    df = df.reset_index()
    
    # Save as .csv
    df.to_csv('output.csv')
    
    print(df)
    

    The output of above code:

    item amount weight price bestbeforeendeate
    0  Apple     10    225    92        30/09/2023
    0  Apple     10    225    92        30/09/2023
    0  Apple     10    225    92        30/09/2023
    0  Apple     10    225    92        30/09/2023
    0  Apple     10    225    92        30/09/2023
    0  Apple     10    225    92        30/09/2023
    0  Apple     10    225    92        30/09/2023
    0  Apple     10    225    92        30/09/2023
    0  Apple     10    225    92        30/09/2023
    0  Apple     10    225    92        30/09/2023
    

    EDIT: I had another look at the problem and thought I share another solution, which might be better for you. Instead of building a huge dataframe over time, the code below appends the fetched data directly into the CSV file. The advantage is that all data is already in the CSV if the program crashes or if you terminate it.

    # Function converts the json string to a dataframe and appends it directly to the CSV file
    def add_json_string_to_csv(api_result_string):
        input_parsed = json.loads(api_result_string)
        df_with_new_data = pd.json_normalize(input_parsed['data']['PriceDetail'])
        df_with_new_data.to_csv('output.csv', mode='a', header=False)
    
    ## Loop where you fetch new data
    while (True):
        newly_fetched_result = test_string
        add_json_string_to_csv(newly_fetched_result)