I have a dataframe of a couple of thousand rows:

input_df

    case_id  api_param  stat
    1        data1      1
    2        data2      0
    1        data3      0
    4        data4      0
    1        data5      1
I do a groupby(case_id) and get:

    case_id  1      2      3
    1        data1  data3  data5
    2        data2  nan    nan
    4        data4  nan    nan
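For reference, this is roughly how the wide table above can be produced (a minimal sketch; the cumcount/pivot combination is my assumption about how it was built):

```python
import pandas as pd

# The long-format input described above.
input_df = pd.DataFrame({
    "case_id": [1, 2, 1, 4, 1],
    "api_param": ["data1", "data2", "data3", "data4", "data5"],
    "stat": [1, 0, 0, 0, 1],
})

# Number each row within its case_id, then pivot so each case_id
# becomes one row with its api_param values spread across columns.
input_df["n"] = input_df.groupby("case_id").cumcount() + 1
wide = input_df.pivot(index="case_id", columns="n", values="api_param")
print(wide)
```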
Now suppose that I would like to modify the date value in the api_param column for every row where stat == 0, i.e. modify data2, data3, and data4. To do so, I choose a new date within k data points of the prior one and call the API to check that the new date is valid, e.g.:

    url = https://example.com/over/there?name=api_param[i]

with api_param == data2 + k data points, for case_id == 2 above. If the API responds with 200, I overwrite the old value in input_df.
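The "k data points" shift can be computed like this (a sketch, assuming the values are "%Y-%m-%d" dates and one data point is one week, as in my code below):

```python
from datetime import datetime

import pandas as pd

date = "2023-01-01"
k = 2  # hypothetical offset: shift the date out by k weeks

# Convert the offset to a timedelta in days and add it to the parsed date.
td = pd.to_timedelta(k * 7, unit="d")
new_date = (datetime.strptime(date, "%Y-%m-%d") + td).strftime("%Y-%m-%d")
print(new_date)  # 2023-01-15
```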
Now I may have thousands of such cases in my file, and each case may have many data points to change. Say I have 300 cases, each with 100 dates to modify. Making these requests sequentially with the Python requests library would be very slow, so I would like to use concurrent.futures. How could I go about doing it?

Here are the functions I have so far:
    from concurrent import futures
    from datetime import datetime

    import pandas as pd
    import requests

    def check_api_call(count, dates):
        executor = futures.ThreadPoolExecutor()
        for i in range(len(dates.values)):
            date = dates.values[i]
            future = executor.submit(task_api, date)
            response = future.result()
            # While the API rejects the date, push it out by `count` weeks
            # and try again.
            while not response:
                count += 1
                td = pd.to_timedelta(count * 7, unit='d')
                delta_date = datetime.strptime(date, "%Y-%m-%d") + td
                new_date = delta_date.strftime("%Y-%m-%d")
                future = executor.submit(task_api, new_date)
                response = future.result()
                if response:
                    dates.values[i] = new_date
        executor.shutdown()
        return True, dates
    def task_api(date):
        url = "https://example.com/over/there?name=" + date
        response = requests.get(url)
        # Anything other than 404 counts as a valid date.
        return response.status_code != 404
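I realize that in my version each submit() is immediately followed by result(), so the calls still run one at a time. The fan-out pattern I am aiming for looks roughly like this (a sketch; check_dates_concurrently and the stand-in check are illustrative, not my real task_api):

```python
from concurrent import futures

def check_dates_concurrently(dates, check):
    # Fan out one API check per date; executor.map preserves input order
    # and runs up to max_workers checks at the same time.
    with futures.ThreadPoolExecutor(max_workers=20) as executor:
        return list(executor.map(check, dates))

# Example with a stand-in check instead of a live HTTP call:
results = check_dates_concurrently(
    ["2023-01-01", "2023-01-08", "2023-01-15"],
    check=lambda d: d != "2023-01-08",  # pretend the API rejects this one
)
print(results)  # [True, False, True]
```

But I am not sure how to combine this fan-out with the retry loop in check_api_call above.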