Tags: python, multiprocessing, python-multiprocessing

Looping through a list using multiprocessing.Pool()


I'm calling a function with a large memory overhead on a list of N files. The large memory footprint is caused by a number of factors that can't be resolved without modifying the function itself; however, I have worked around the leaking memory using the multiprocessing module. By creating a subprocess for each of the N files and then calling pool.close(), the memory used by the function is released back to the OS with minimal overhead. I achieve this in the following example:

import multiprocessing as mp


def my_function(n):
    do_something(file=n)


if __name__ == '__main__':

    # Create a fresh single-worker pool for each file so the worker's
    # memory is released back to the OS once the pool is closed.
    for n in range(N):
        pool = mp.Pool(processes=1)
        results = pool.map(my_function, [n])
        pool.close()
        pool.join()

This does exactly what I want: with processes=1 in the pool, one file is processed at a time across the N files. After each file n, I call pool.close(), which shuts down the worker process and releases its memory back to the OS. Before, I didn't use multiprocessing at all, just a for loop, and the memory would accumulate until my system crashed.

My questions are:

  1. Is this the correct way to implement this?
  2. Is there a better way to implement this?
  3. Is there a way to run more than one process at a time (processes>1), and still have the memory released after each n?

I'm just learning about the multiprocessing module. I've found many multiprocessing examples here, but none specific to this problem. I'd appreciate any help I can get.


Solution

  • Is this the correct way to implement this?

    "Correct" is a value judgement in this case. One could consider this either a bodge or a clever hack.

  • Is there a better way to implement this?

    Yes. Fix my_function so that it doesn't leak memory. If a Python function is leaking lots of memory, chances are you are doing something wrong.

  • Is there a way to run more than one process at a time (processes>1), and still have the memory released after each n?

    Yes. Use the maxtasksperchild argument when creating a Pool.
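
    For example, here is a minimal sketch of that approach, reusing the my_function, do_something and N placeholders from the question. With maxtasksperchild=1, each worker process exits after completing a single task and is replaced by a fresh one, so its memory is returned to the OS; the choice of four workers is arbitrary, and a larger maxtasksperchild value would trade some of that memory release for less process-spawning overhead.

    import multiprocessing as mp

    def my_function(n):
        do_something(file=n)  # placeholder from the question

    if __name__ == '__main__':
        # Four workers run in parallel; maxtasksperchild=1 makes each
        # worker process exit after one task, so its memory is returned
        # to the OS before a replacement worker takes the next task.
        pool = mp.Pool(processes=4, maxtasksperchild=1)
        results = pool.map(my_function, range(N))
        pool.close()
        pool.join()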