
Multiprocess and Multiprocessing with no file IO: OSError: [Errno 24] Too many open files


Summary:

I'm trying to use multiprocess and multiprocessing to parallelise work with the following attributes:

  • Shared datastructure
  • Multiple arguments passed to a function
  • Setting number of processes based on current system

Errors:

My approach works for a small amount of work but fails with the following on larger tasks:

OSError: [Errno 24] Too many open files

Solutions tried

Running on a macOS Catalina system, ulimit -n reports 1024 within PyCharm.

Is there a way to avoid having to change ulimit? I want to avoid this because the code should ideally work out of the box on various systems.

I've seen related questions, such as this thread, where the comments recommend using .join and gc.collect; other threads recommend closing any opened files, but I do not access files in my code.

import gc
import time

import numpy as np

from math import pi
from multiprocess import Process, Manager
from multiprocessing import Semaphore, cpu_count

def do_work(element, shared_array, sema):
    shared_array.append(pi*element)
    gc.collect()
    sema.release()

# example_ar = np.arange(1, 1000) # works
example_ar = np.arange(1, 10000) # fails

# Parallel code
start = time.time()
# Instantiate a manager object and a share a datastructure
manager = Manager()
shared_ar = manager.list()
# Create a semaphore sized to the physical cores on the system (1/2 of reported cpu_count)
sema = Semaphore(cpu_count()//2)
job = []
# Loop over every element and start a job
for e in example_ar:
    sema.acquire()
    p = Process(target=do_work, args=(e, shared_ar, sema))
    job.append(p)
    p.start()
_ = [p.join() for p in job]
end_par = time.time()

# Serial code equivalent
single_ar = []
for e in example_ar:
    single_ar.append(pi*e)
end_single = time.time()

print(f'Parallel work took {end_par-start} seconds, result={sum(list(shared_ar))}')
print(f'Serial work took {end_single-end_par} seconds, result={sum(single_ar)}')

Solution

  • The way to avoid changing the ulimit is to make sure that the number of processes you create never approaches 1024. In your loop every element gets its own Process, and none of them are joined until all jobs have been started, so each finished child still holds file descriptors open in the parent; that is why 1000 works and 10000 fails.

    Here's an example of managing the processes with a Pool, which keeps a fixed number of workers alive and so ensures you don't go above the ceiling of your ulimit value:

    from multiprocessing import Pool
    
    def f(x):
        return x*x
    
    if __name__ == '__main__':
        with Pool(5) as p:
            print(p.map(f, [1, 2, 3]))
    

    https://docs.python.org/3/library/multiprocessing.html#introduction
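
    As a rough sketch (my adaptation, not code from the docs above), the original example can be reshaped around a Pool: the per-element Process, the Semaphore and the Manager list all go away, the workers simply return their results, and only cpu_count()//2 processes, and therefore only a handful of file descriptors, ever exist at once:

    import time
    from math import pi
    from multiprocessing import Pool, cpu_count

    import numpy as np

    def do_work(element):
        # Return the result; no shared Manager list or Semaphore is needed
        return pi * element

    if __name__ == '__main__':
        example_ar = np.arange(1, 10000)
        start = time.time()
        # A fixed pool of workers bounds the number of live processes (and
        # therefore open file descriptors) no matter how many elements there are
        with Pool(max(1, cpu_count() // 2)) as pool:
            shared_ar = pool.map(do_work, example_ar)
        end_par = time.time()
        print(f'Parallel work took {end_par - start} seconds, result={sum(shared_ar)}')

    If a worker needs multiple arguments, Pool.starmap accepts an iterable of argument tuples instead of single elements.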

    other threads recommend closing any opened files but I do not access files in my code

    You don't have files open, but every Process you start opens file descriptors in the parent (the pipes multiprocessing uses to talk to each child), and those descriptors are what the OS is counting here.
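
    If you also want the code to adapt to whatever limit the current system imposes (rather than assuming 1024), a small sketch using the standard resource module (POSIX only) reads that limit at runtime; the soft limit is what ulimit -n reports, and the soft // 4 cap below is just an illustrative safety margin, not a rule:

    import resource
    from multiprocessing import cpu_count

    # The soft limit is what `ulimit -n` reports; the hard limit is the OS ceiling
    # (this sketch assumes the soft limit is a finite number, as on macOS/Linux)
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print(f'File descriptor limits: soft={soft}, hard={hard}')

    # Keep the worker count well below the descriptor limit; soft // 4 is an
    # arbitrary illustrative margin, not a recommendation from the docs
    workers = min(max(1, cpu_count() // 2), max(1, soft // 4))
    print(f'Using {workers} worker processes')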