I have a dataframe of a couple of thousand rows:

input_df

    case_id  api_param  stat
    1        data1      1
    2        data2      0
    1        data3      0
    4        data4      0
    1        data5      1
I do a groupby(case_id) and get:

    case_id  1      2      3
    1        data1  data3  data5
    2        data2  nan    nan
    4        data4  nan    nan
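For reference, this is roughly how the wide table above can be produced (a minimal sketch; the cumcount/pivot combination is my assumption about how it was built):

```python
import pandas as pd

# The long-format input described above.
input_df = pd.DataFrame({
    "case_id": [1, 2, 1, 4, 1],
    "api_param": ["data1", "data2", "data3", "data4", "data5"],
    "stat": [1, 0, 0, 0, 1],
})

# Number each row within its case_id, then pivot so each case_id
# becomes one row with its api_param values spread across columns.
input_df["n"] = input_df.groupby("case_id").cumcount() + 1
wide = input_df.pivot(index="case_id", columns="n", values="api_param")
print(wide)
```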
Now suppose that I would like to modify the date value in the api_param column for every row where stat == 0, i.e. modify data2, data3, and data4. To do so, I choose a new date within k data points of the prior one and call the API to check that the new date is valid, e.g.:

    url = https://example.com/over/there?name=api_param[i]

with api_param == data2 + k data points, for case_id == 2 above. If the API responds with 200, I overwrite the old value in input_df.
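The "k data points" shift can be computed like this (a sketch, assuming the values are "%Y-%m-%d" dates and one data point is one week, as in my code below):

```python
from datetime import datetime

import pandas as pd

date = "2023-01-01"
k = 2  # hypothetical offset: shift the date out by k weeks

# Convert the offset to a timedelta in days and add it to the parsed date.
td = pd.to_timedelta(k * 7, unit="d")
new_date = (datetime.strptime(date, "%Y-%m-%d") + td).strftime("%Y-%m-%d")
print(new_date)  # 2023-01-15
```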
Now I may have thousands of such cases in my file, and each case may have many data points to change. Say I have 300 cases, each with 100 dates to modify. Making these requests sequentially with the Python requests library would be very slow, so I would like to use concurrent.futures. How could I go about doing it?

Here are the functions I have so far:
    from concurrent import futures
    from datetime import datetime

    import pandas as pd
    import requests

    def check_api_call(count, dates):
        executor = futures.ThreadPoolExecutor()
        for i in range(len(dates.values)):
            date = dates.values[i]
            future = executor.submit(task_api, date)
            response = future.result()
            # While the API rejects the date, push it out by `count` weeks
            # and try again.
            while not response:
                count += 1
                td = pd.to_timedelta(count * 7, unit='d')
                delta_date = datetime.strptime(date, "%Y-%m-%d") + td
                new_date = delta_date.strftime("%Y-%m-%d")
                future = executor.submit(task_api, new_date)
                response = future.result()
                if response:
                    dates.values[i] = new_date
        executor.shutdown()
        return True, dates
    def task_api(date):
        url = "https://example.com/over/there?name=" + date
        response = requests.get(url)
        # Anything other than 404 counts as a valid date.
        return response.status_code != 404
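I realize that in my version each submit() is immediately followed by result(), so the calls still run one at a time. The fan-out pattern I am aiming for looks roughly like this (a sketch; check_dates_concurrently and the stand-in check are illustrative, not my real task_api):

```python
from concurrent import futures

def check_dates_concurrently(dates, check):
    # Fan out one API check per date; executor.map preserves input order
    # and runs up to max_workers checks at the same time.
    with futures.ThreadPoolExecutor(max_workers=20) as executor:
        return list(executor.map(check, dates))

# Example with a stand-in check instead of a live HTTP call:
results = check_dates_concurrently(
    ["2023-01-01", "2023-01-08", "2023-01-15"],
    check=lambda d: d != "2023-01-08",  # pretend the API rejects this one
)
print(results)  # [True, False, True]
```

But I am not sure how to combine this fan-out with the retry loop in check_api_call above.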