Search code examples
pythonconcurrent.futures

Python's concurrent.futures hangs in both multithreading and multiprocessing on some systems


The code I wrote is to do some data analysis work, and it has been working well for several months now. But the size of my source data increased significantly recently, and I see that the code hangs now without errors at around the same point in execution (but not always at the same point).

The code looks like this:

def submission_loop(data, submission):
    # No loops in this function
    # Do some data analysis

    return result

def data_loop(arg1, arg2, data_row):
    # Check this marker against all the criteria
    results = []
    for data in data_row:
        for submission in submissions:
            results.append(submission_loop(data, submission))
            
    # Do something with result here
    return results

if __name__ == '__main__':
    with concurrent.futures.ProcessPoolExecutor(max_workers=cpu_count()) as executor:
        chunksize = max(1, int(len(data_rows)/cpu_count()))
        results = executor.map(functools.partial(data_loop, *args), data_rows, chunksize=chunksize)
        results = list(results)     

Note the three levels of loops.

Now, I've tested this on 3 machines:

  1. Inside a docker container running python:3.8.5 running on an Ubuntu 20 host.
  2. Inside a docker container running python:3.8.5 running on a Windows 10 host using Docker Desktop.
  3. Directly on another Windows 10 machine running python 3.8.5.

On both 1 and 2, the above mentioned issue is seen consistently. On 3, the task completes successfully.

I changed this to use ThreadPoolExecutor and the issue does not resolve, which makes me say that the number of cores is irrelevant here. If I remove concurrent.futures usage and use a serial loop, it works perfectly.

Is this a bug with concurrent.futures?


Solution

  • Further to my previous comment. I noticed that pebble also was causing similar issues. I switched to futureproof and the issue has been resolved.