Tags: python, multithreading, concurrent.futures

How to match the results of as_completed(futures) back to their original inputs after the threads have executed?


I have a function that takes a subset of a Pandas dataframe as input. I'm using concurrent.futures and requests_futures to request each URL in one of the dataframe's columns concurrently. This works very well, but I cannot match the responses back to the corresponding rows of the dataframe once the concurrent operations have finished.

Here is the code:

import threading
from concurrent.futures import as_completed
from requests_futures.sessions import FuturesSession

def img_helper_function(df_chunk):

    # multithreading lock
    lock = threading.Lock() # Actually not used

    with FuturesSession(max_workers=10) as session:
        futures = [session.get(j, stream=True) for j in df_chunk["billedurl"].tolist()]
        filenames = [filename for filename in df_chunk["image_filename"].tolist()] # New, slugify filenames

        for idx, future in enumerate(as_completed(futures)):
            response = future.result()

            # These two elements don't match
            response.url
            filenames[idx]

Solution

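  • You can achieve that by storing the futures in a dictionary instead of a list, with each future as the key and its arguments as the value. as_completed yields futures in the order they finish, not the order they were submitted, so the enumerate index cannot be used to find the matching row. The dictionary lets you use the finished future to look up the arguments it was created with. In code, that would look like: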

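    # Map each future to the URL it was created for
    futures = {session.get(url, stream=True): url for url in df_chunk["billedurl"].tolist()}

    for future in as_completed(futures):
        url = futures[future]       # look up the argument stored for this future
        response = future.result()
    ...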
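
To try the pattern without a dataframe or network access, here is a minimal, self-contained sketch. It uses the standard library's ThreadPoolExecutor in place of FuturesSession, and fake_download, fetch_all, and the example rows are hypothetical stand-ins, not part of the original code. The dictionary stores the filename as the value, since that is what the question wants to pair with each response.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def fake_download(url, delay):
    """Hypothetical stand-in for session.get(): sleeps, then returns a fake body."""
    time.sleep(delay)
    return f"content of {url}"


def fetch_all(rows):
    """Take (url, filename, delay) tuples and return {filename: body}."""
    results = {}
    with ThreadPoolExecutor(max_workers=4) as executor:
        # Key: the future. Value: the filename that belongs to the same row.
        future_to_filename = {
            executor.submit(fake_download, url, delay): filename
            for url, filename, delay in rows
        }
        # Futures arrive in completion order; the dict restores the pairing.
        for future in as_completed(future_to_filename):
            filename = future_to_filename[future]
            results[filename] = future.result()
    return results


# Assumed example data: the delays make the tasks finish out of submission order.
rows = [
    ("https://example.com/a.jpg", "a.jpg", 0.03),
    ("https://example.com/b.jpg", "b.jpg", 0.01),
    ("https://example.com/c.jpg", "c.jpg", 0.02),
]
results = fetch_all(rows)
```

With the dataframe from the question, the same idea would probably mean iterating over zip(df_chunk["billedurl"], df_chunk["image_filename"]) when building the dictionary, so that each future carries its own filename.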