I have a function to which I pass a subset (a chunk) of a pandas DataFrame. I'm using concurrent.futures and requests_futures to request each URL in one of the DataFrame's columns concurrently. That works very well. However, once the requests have completed, I cannot match the responses back to the corresponding rows of the DataFrame.
Here is the code:
import threading
from concurrent.futures import as_completed
from requests_futures.sessions import FuturesSession

def img_helper_function(df_chunk):
    # multithreading lock
    lock = threading.Lock()  # Actually not used
    with FuturesSession(max_workers=10) as session:
        futures = [session.get(j, stream=True) for j in df_chunk["billedurl"].tolist()]
        filenames = df_chunk["image_filename"].tolist()  # New, slugify filenames
        for idx, future in enumerate(as_completed(futures)):
            response = future.result()
            # These two elements don't match
            response.url
            filenames[idx]
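as_completed yields futures in the order they finish, not the order they were submitted, so the counter from enumerate does not point at the row a response came from. (Comparing against response.url is also unreliable: after a redirect it holds the final URL rather than the one you requested.) You can fix this by storing the futures in a dictionary instead of a list, with each future as the key and whatever identifies its request (for example its URL or its row label) as the value. Futures are hashable, so when as_completed hands you a finished future, you can use it to look up the corresponding arguments. In code that would look like: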
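futures = {session.get(url, stream=True): row_label
           for row_label, url in df_chunk["billedurl"].items()}
for future in as_completed(futures):
    row_label = futures[future]  # the row this response belongs to
    response = future.result()
    filename = df_chunk.at[row_label, "image_filename"]
    ...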
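Below is a self-contained sketch of the same idea that runs without network access. It swaps in the standard library's ThreadPoolExecutor for FuturesSession (both hand back concurrent.futures.Future objects) and uses a hypothetical fake_fetch with artificial delays instead of real HTTP requests, so the "requests" finish in a different order than they were submitted. rows is a hypothetical stand-in for df_chunk, mapping row labels to (URL, filename).

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical stand-in for df_chunk: row label -> (url, filename).
rows = {
    10: ("https://example.com/slow.jpg", "slow.jpg"),
    11: ("https://example.com/medium.jpg", "medium.jpg"),
    12: ("https://example.com/fast.jpg", "fast.jpg"),
}

# Hypothetical delays chosen so later submissions tend to finish first.
DELAYS = {"slow.jpg": 0.3, "medium.jpg": 0.2, "fast.jpg": 0.1}


def fake_fetch(url):
    """Hypothetical replacement for session.get(url): sleep, then echo the URL."""
    time.sleep(DELAYS[url.rsplit("/", 1)[-1]])
    return url


def fetch_all(rows):
    """Return ({filename: result}, row labels in completion order)."""
    results = {}
    completion_order = []
    with ThreadPoolExecutor(max_workers=10) as executor:
        # Key each future by the row label it was submitted for.
        futures = {executor.submit(fake_fetch, url): label
                   for label, (url, _filename) in rows.items()}
        for future in as_completed(futures):
            label = futures[future]  # recover the originating row
            completion_order.append(label)
            _url, filename = rows[label]
            results[filename] = future.result()
    return results, completion_order


results, order = fetch_all(rows)
print("completion order:", order)  # typically [12, 11, 10], not submission order
for filename, fetched_url in results.items():
    print(filename, "<-", fetched_url)
```

Because the lookup goes through the future itself, each result lands on the right filename regardless of completion order; an enumerate counter over as_completed would have paired them by finishing position instead.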