Tags: python, multiprocessing, fork, deadlock

CPU idle when parallelising using Python multiprocessing


I've used Python's multiprocessing to parallelise a function over a list of arguments, and each process freezes partway through. When this happens, I run top (https://man7.org/linux/man-pages/man1/top.1.html) and press 1 on the Ubuntu machine, and I see that the cores are all now mostly idle.

This is my code:

from multiprocessing import Pool, Queue


class Parallelisable:
    # Adapted from: https://stackoverflow.com/questions/41992810/python-multiprocessing-across-different-files
    def _apply_function(self, function, args_queue):
        while not args_queue.empty():
            args = args_queue.get()
            # Apply function to arguments
            function(*args)

    def parallelise(self, function, args_list):
        queue = Queue()
        for args in args_list:
            queue.put(args)
        pool = Pool(None, self._apply_function, (function, queue,))
        pool.close()  # signal that we won't submit any more tasks to pool
        pool.join()  # wait until all processes are done



if __name__ == '__main__':
    # Define data_dir, output_dir, filenames and some_frozenset here
    # data_dir and output_dir are strings
    # filenames is a list of strings
    # some_frozenset is a frozenset of strings

    Parallelisable().parallelise(some_function, [(filename, data_dir, output_dir, some_frozenset) for filename in filenames])

I suspect that it is due to a deadlock.

I have come up with these possible explanations, but they don't make much sense to me:

  1. The Parallelisable object is a shared resource, with its lock held by one child process at any one point in time, preventing self._apply_function() from running in multiple child processes at once. I don't think this is the case, as I've had 2 child processes running at the same time. I'm guessing this could be ruled out by forcing child processes to call execve via the multiprocessing "spawn" start method (see the sketch after this list)
  2. function in parallelise() and _apply_function() is a shared resource, similar to point 1 above
  3. Some of my code wasn't in __main__, but I don't see that as a problem since I'm running on Ubuntu (where the default start method is fork), not Windows
  4. One of the arguments (filename, data_dir, output_dir, some_frozenset) isn't thread-safe, which shouldn't be the case since the first 3 are immutable strings and the last is an immutable frozenset of immutable strings
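
For reference, point 1's "spawn" idea would look roughly like this (a sketch; it assumes the target function is defined at a module's top level so that "spawn" can pickle it):

import multiprocessing as mp

def parallelise_spawn(function, args_list):
    # "spawn" starts each worker as a fresh interpreter (fork followed by execve),
    # so children inherit nothing from the parent except what is pickled over
    ctx = mp.get_context("spawn")
    with ctx.Pool() as pool:
        return pool.starmap(function, args_list)  # blocks until all tasks finish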

Is there anything I'm missing?

By the way, I think that I can rewrite the code above like this:

from multiprocessing import Pool

def parallelise_v2(function, args_list):
    with Pool(None) as pool:  # Use the "spawn" context if I want to call execve
        # Block until all tasks finish: the with block calls pool.terminate()
        # on exit, which would otherwise kill the workers mid-task
        pool.starmap_async(function, args_list).get()


if __name__ == '__main__':
    # Define data_dir, output_dir, filenames and some_frozenset here

    parallelise_v2(some_function, [(filename, data_dir, output_dir, some_frozenset) for filename in filenames])

Solution

  • It seemed to be because the child processes were memory-intensive and the Linux out-of-memory (OOM) killer silently killed them (a mitigation is sketched below). The parent process simply waited for the child processes to return. I checked the reason for the killed processes using dmesg -T | grep -E -i -m10 -B20 'killed process', where -m specifies the maximum number of matches to return and -B specifies the number of lines to print before each match of "killed process".

    For more possible reasons and troubleshooting tips, see the question description and the comments on the question. They include a few things to look out for and the suggestion to use viztracer to trace what happened. Quoting @minker:

    viztracer will output the file if the script is stuck. Just Ctrl-C out of it. Just remember you need --log_multiprocess for multiprocessing library when you use viztracer.
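
If memory pressure is the culprit, one possible mitigation is to cap the number of simultaneous workers and recycle each one after a fixed number of tasks, so that a child's memory footprint stays bounded. A sketch, assuming some_function is the memory-intensive work:

from multiprocessing import Pool
import os

def parallelise_capped(function, args_list):
    # Fewer simultaneous workers lowers peak memory usage; maxtasksperchild
    # replaces each worker after 10 tasks, releasing any memory it accumulated
    n_workers = max(1, (os.cpu_count() or 2) // 2)
    with Pool(processes=n_workers, maxtasksperchild=10) as pool:
        pool.starmap_async(function, args_list).get()  # wait for all tasks to finish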