Tags: python, ctypes, python-multiprocessing

Python multiprocessing claims too many open files when no files are even opened


I'm attempting to speed up an algorithm that makes use of a gigantic matrix. I've parallelised it to operate on rows, and put the data matrix in shared memory so the system doesn't get clogged. However, instead of working smoothly as I'd hoped, it now throws a weird error about files, which I don't understand, since I don't even open any files in it.

Here's a mock-up of roughly what's going on in the actual program; the 1000-iteration for loop is representative of what happens in the algorithm too.

import multiprocessing
import ctypes
import numpy as np

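# a 10x10 matrix of doubles in shared memory, wrapped as a numpy array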
shared_array_base = multiprocessing.Array(ctypes.c_double, 10*10)
shared_array = np.ctypeslib.as_array(shared_array_base.get_obj())
shared_array = shared_array.reshape(10, 10)

def my_func(i, shared_array):
    shared_array[i,:] = i

def pool_init(_shared_array, _constans):
    global shared_array, constans
    shared_array = _shared_array
    constans = _constans

def pool_my_func(i):
    my_func(i, shared_array)

if __name__ == '__main__':
    for i in np.arange(1000):
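        # a fresh 8-worker pool (and its pipes) is created on every iteration and never closed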
        pool = multiprocessing.Pool(8, pool_init, (shared_array, 4))
        pool.map(pool_my_func, range(10))
    print(shared_array)

And it throws this error (I'm on OS X):

Traceback (most recent call last):
  File "weird.py", line 24, in <module>
    pool = multiprocessing.Pool(8, pool_init, (shared_array, 4))
  File "//anaconda/lib/python3.4/multiprocessing/context.py", line 118, in Pool
    context=self.get_context())
  File "//anaconda/lib/python3.4/multiprocessing/pool.py", line 168, in __init__
    self._repopulate_pool()
  File "//anaconda/lib/python3.4/multiprocessing/pool.py", line 233, in _repopulate_pool
    w.start()
  File "//anaconda/lib/python3.4/multiprocessing/process.py", line 105, in start
    self._popen = self._Popen(self)
  File "//anaconda/lib/python3.4/multiprocessing/context.py", line 267, in _Popen
    return Popen(process_obj)
  File "//anaconda/lib/python3.4/multiprocessing/popen_fork.py", line 21, in __init__
    self._launch(process_obj)
  File "//anaconda/lib/python3.4/multiprocessing/popen_fork.py", line 69, in _launch
    parent_r, child_w = os.pipe()
OSError: [Errno 24] Too many open files

I'm quite stumped. I don't open any files anywhere in here. All I want is to pass shared_array to the individual processes in a way that won't clog the system memory; I don't even need to modify it within the parallelised processes, if that helps.

Also, in case it matters, the exact error thrown by the actual code is slightly different:

Traceback (most recent call last):
  File "tcap.py", line 206, in <module>
  File "tcap.py", line 202, in main
  File "tcap.py", line 181, in tcap_cluster
  File "tcap.py", line 133, in ap_step
  File "//anaconda/lib/python3.4/multiprocessing/context.py", line 118, in Pool
  File "//anaconda/lib/python3.4/multiprocessing/pool.py", line 168, in __init__
  File "//anaconda/lib/python3.4/multiprocessing/pool.py", line 233, in _repopulate_pool
  File "//anaconda/lib/python3.4/multiprocessing/process.py", line 105, in start
  File "//anaconda/lib/python3.4/multiprocessing/context.py", line 267, in _Popen
  File "//anaconda/lib/python3.4/multiprocessing/popen_fork.py", line 21, in __init__
  File "//anaconda/lib/python3.4/multiprocessing/popen_fork.py", line 69, in _launch
OSError: [Errno 24] Too many open files

So yeah, I have no idea how to proceed. Any help would be appreciated. Thanks in advance!


Solution

  • You're creating 1000 process pools, and they are never reclaimed: a pool's worker-handler thread holds a reference to it, so it is not garbage-collected until you explicitly close or terminate it. Each pool opens several pipes for communication between the main process and its children, and those pipes have consumed all the file descriptors available to your main process.

    Perhaps you'd want to use:

    pool = multiprocessing.Pool(8, pool_init, (shared_array, 4))  # create the pool once
    for _ in range(1000):
        pool.map(pool_my_func, range(10))
    pool.close()  # no more tasks: let the workers exit
    pool.join()   # wait for them, releasing their pipes
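
    Since you're on Python 3.4, the pool can also be used as a context manager, which terminates it (and releases its pipes) automatically. Here's a minimal sketch of the same loop, reusing pool_init, pool_my_func and shared_array from your snippet:

    import multiprocessing

    if __name__ == '__main__':
        # one pool serves all 1000 iterations; it is terminated when the
        # with-block exits, so its pipe file descriptors are released
        with multiprocessing.Pool(8, pool_init, (shared_array, 4)) as pool:
            for _ in range(1000):
                pool.map(pool_my_func, range(10))
        print(shared_array)

    Note that the with-block calls terminate() rather than close() on exit, which is fine here since map() has already collected all the results. Either way, the key is to create the pool, and hence its pipes, once instead of 1000 times; you can check your per-process descriptor limit with ulimit -n (on OS X the default is typically 256, far fewer than 1000 pools' worth of pipes).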