I have a list of header keys that I need to iterate over to fetch data from an API. For each one, I create a temporary dataframe to hold the API response and use union to append it to the final dataframe. This code works, but it is very slow. Please help me find a more efficient solution.
# Created df_final empty dataframe before the for loop
list1 = [<contains list of lists of header data>]
for i in range(0, len(list1)):
    api_header_data = list1[i]['header']
    # Call Api function
    input_data = get_api_function(api_header_data)
    response = postrequest(input_data)
    columns = response.json()["result"]["Headers"]
    data = response.json()["result"]["Data"]
    # Create temp dataframe and union it to the main dataframe
    df_temp = spark.createDataFrame(data, columns)
    df_final = df_final.union(df_temp)
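Creating a dataframe and unioning it on every iteration adds Spark overhead each time through the loop. You can collect the data in a plain Python list first and then create the dataframe once, after the loop: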
# Created df_final empty dataframe before the for loop
list1 = [<contains list of lists of header data>]
all_data = []
columns = None
for headers in list1:
    api_header_data = headers['header']
    # Call the API and parse the JSON body once per response
    input_data = get_api_function(api_header_data)
    result = postrequest(input_data).json()["result"]
    # Assumes every response returns the same headers
    columns = result["Headers"]
    all_data.extend(result["Data"])
# Create the dataframe once, only if any rows were returned
if all_data:
    df_final = spark.createDataFrame(all_data, columns)
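One thing to watch for: this approach only works if every response has the same headers, since a single column list is used for all rows. Here is a rough sketch of the collection step as a standalone function that checks this. The names `collect_rows` and `fake_fetch` are made up for illustration, and `fake_fetch` is just a stand-in that returns an already-parsed payload in the same shape as `response.json()` in the question, so the snippet runs without Spark or a real API.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple

Row = Sequence[Any]


def collect_rows(
    header_list: List[Dict[str, Any]],
    fetch: Callable[[Any], Dict[str, Any]],
) -> Tuple[List[str], List[Row]]:
    """Call `fetch` once per header entry and gather all rows in one list."""
    columns: List[str] = []
    rows: List[Row] = []
    for entry in header_list:
        result = fetch(entry["header"])["result"]
        if not columns:
            # Remember the headers from the first response
            columns = list(result["Headers"])
        elif list(result["Headers"]) != columns:
            # A single column list cannot describe rows with different layouts
            raise ValueError(f"Unexpected headers: {result['Headers']}")
        rows.extend(result["Data"])
    return columns, rows


def fake_fetch(header: Any) -> Dict[str, Any]:
    """Hypothetical stand-in for the API call (not part of the original code)."""
    return {"result": {"Headers": ["key", "value"], "Data": [[header, 1], [header, 2]]}}


columns, rows = collect_rows([{"header": "a"}, {"header": "b"}], fake_fetch)
# With a live SparkSession named `spark`, the dataframe is then built once:
# df_final = spark.createDataFrame(rows, columns)
```

In the real code you would pass a function that wraps `get_api_function` and `postrequest` and returns `response.json()` in place of `fake_fetch`.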