python arrays json pandas concatenation

Concatenate several JSON responses


I am trying to concatenate several JSON responses from XHR requests.

I have a list of XHR endpoints that I want to loop through, putting all the results in the same CSV. I understand that I should probably write the CSV outside the loop, but initially I just want it to work. The two commented-out lines at the end are the part I can't get to work. I also added a break at the end so you don't have to go through everything.

import requests
import pandas as pd

h = {
     'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36',
     'X-Requested-With': 'XMLHttpRequest',
}

#VARIABLES
i=0

projects = "https://cdn-search-standard-prod.azureedge.net/api/v1/tags/all/2af3c43b-98aa-49d8-b4ff-da6d5a992751" 

y=requests.get(projects,headers=h).json()
df=pd.json_normalize(y)
df.to_csv(r'C:\Users\abc\Documents\Python Scripts\ListOfProjects_20200628.csv', index=False, sep=';',encoding='utf-8')

export=[]
i=29

for y in y:
    print(str(df.id[i]))
    u = "https://cdn-search-standard-prod.azureedge.net/api/v1/search/getstageobjects/"+str(df.id[i])
    i = i+1
    units = requests.get(u,headers=h).json()
    dp=pd.DataFrame(units)
    dp = pd.json_normalize(units)

    dp.to_csv(r'C:\Users\abc\Documents\Python Scripts\Units_20200628.csv', index=False, sep=';',encoding='utf-8')
    #export = pd.concat([export,dp], ignore_index=False, sort=False)
    #export.to_csv(r'C:\Users\abc\Documents\Python Scripts\Units_20200628.csv', index=False, sep=';',encoding='utf-8')
    break

Solution

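  • Here's one way to do it. Keep all the 'partial' DataFrames in a list, then build a single large DataFrame with pd.concat once the loop has finished, and save that with to_csv. The commented-out attempt fails because export starts out as a plain list, and pd.concat only accepts Series or DataFrame objects. Here's the relevant part of the code: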

    df_list = []

    # For development purposes, iterate over 10 project ids only,
    # starting at the offset i set earlier in the script.
    for project_id in df["id"].iloc[i:i + 10]:
        print(project_id)
        u = "https://cdn-search-standard-prod.azureedge.net/api/v1/search/getstageobjects/" + str(project_id)
        units = requests.get(u, headers=h).json()
        dp = pd.json_normalize(units)  # flatten the JSON response into a DataFrame
        df_list.append(dp)

    # Concatenate once, after the loop, and renumber the rows.
    res = pd.concat(df_list, ignore_index=True)
    res.to_csv("final_result.csv")
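
To try the collect-then-concat pattern without hitting the API, here is a minimal sketch that uses made-up payloads in place of the real XHR responses. The `responses` list and its field names are assumptions for illustration only; the actual responses will have different columns. It also writes the CSV with `sep=';'` and `index=False`, as in the question.

```python
import io

import pandas as pd

# Hypothetical stand-ins for the parsed JSON of each XHR response
# (in the real script these come from requests.get(u, headers=h).json()).
responses = [
    [{"id": 1, "name": "Unit A", "price": {"amount": 100, "currency": "SEK"}}],
    [
        {"id": 2, "name": "Unit B", "price": {"amount": 250, "currency": "SEK"}},
        {"id": 3, "name": "Unit C", "price": {"amount": 175, "currency": "SEK"}},
    ],
]

# Flatten each response into its own DataFrame and collect them in a list.
df_list = [pd.json_normalize(units) for units in responses]

# Concatenate once, after the loop; ignore_index=True renumbers the rows.
res = pd.concat(df_list, ignore_index=True, sort=False)

# Write to an in-memory buffer here; pass a file path instead to save to disk.
buffer = io.StringIO()
res.to_csv(buffer, index=False, sep=";")
csv_text = buffer.getvalue()
print(csv_text)
```

Calling pd.concat once at the end is also generally cheaper than concatenating inside the loop, since each call to pd.concat copies the data accumulated so far.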