I'm attempting to speed up an algorithm that operates on a gigantic matrix. I've parallelised it to operate on rows, and put the data matrix in shared memory so the system doesn't get clogged. However, instead of running smoothly as I'd hoped, it now throws a strange file-related error, which I don't understand since I never open any files in the code.
Here's a mock-up of roughly what's going on in the real program; the 1000-iteration for loop is representative of what happens in the actual algorithm.
import multiprocessing
import ctypes
import numpy as np

# 10x10 array of doubles backed by shared memory
shared_array_base = multiprocessing.Array(ctypes.c_double, 10*10)
shared_array = np.ctypeslib.as_array(shared_array_base.get_obj())
shared_array = shared_array.reshape(10, 10)

def my_func(i, shared_array):
    shared_array[i, :] = i

def pool_init(_shared_array, _constans):
    global shared_array, constans
    shared_array = _shared_array
    constans = _constans

def pool_my_func(i):
    my_func(i, shared_array)

if __name__ == '__main__':
    for i in np.arange(1000):
        # a fresh pool is created on every iteration
        pool = multiprocessing.Pool(8, pool_init, (shared_array, 4))
        pool.map(pool_my_func, range(10))
        print(shared_array)
And this throws the following error (I'm on OS X):
Traceback (most recent call last):
File "weird.py", line 24, in <module>
pool = multiprocessing.Pool(8, pool_init, (shared_array, 4))
File "//anaconda/lib/python3.4/multiprocessing/context.py", line 118, in Pool
context=self.get_context())
File "//anaconda/lib/python3.4/multiprocessing/pool.py", line 168, in __init__
self._repopulate_pool()
File "//anaconda/lib/python3.4/multiprocessing/pool.py", line 233, in _repopulate_pool
w.start()
File "//anaconda/lib/python3.4/multiprocessing/process.py", line 105, in start
self._popen = self._Popen(self)
File "//anaconda/lib/python3.4/multiprocessing/context.py", line 267, in _Popen
return Popen(process_obj)
File "//anaconda/lib/python3.4/multiprocessing/popen_fork.py", line 21, in __init__
self._launch(process_obj)
File "//anaconda/lib/python3.4/multiprocessing/popen_fork.py", line 69, in _launch
parent_r, child_w = os.pipe()
OSError: [Errno 24] Too many open files
I'm quite stumped. I don't even open files in here. All I want to do is pass shared_array
to the individual processes in a manner that won't clog the system memory. I don't even need to modify it within the parallelised processes, if that helps anything.
Also, in case it matters, the exact error thrown by the real code is slightly different:
Traceback (most recent call last):
File "tcap.py", line 206, in <module>
File "tcap.py", line 202, in main
File "tcap.py", line 181, in tcap_cluster
File "tcap.py", line 133, in ap_step
File "//anaconda/lib/python3.4/multiprocessing/context.py", line 118, in Pool
File "//anaconda/lib/python3.4/multiprocessing/pool.py", line 168, in __init__
File "//anaconda/lib/python3.4/multiprocessing/pool.py", line 233, in _repopulate_pool
File "//anaconda/lib/python3.4/multiprocessing/process.py", line 105, in start
File "//anaconda/lib/python3.4/multiprocessing/context.py", line 267, in _Popen
File "//anaconda/lib/python3.4/multiprocessing/popen_fork.py", line 21, in __init__
File "//anaconda/lib/python3.4/multiprocessing/popen_fork.py", line 69, in _launch
OSError: [Errno 24] Too many open files
So yeah, I have no idea how to proceed. Any help would be appreciated. Thanks in advance!
You're creating 1000 process pools and never closing them; each pool opens pipes for communicating between the main process and its worker processes, and those open pipes eventually exhaust the file descriptors available to your main process.
Perhaps you'd want to use:
pool = multiprocessing.Pool(8, pool_init, (shared_array, 4))
for _ in range(1000):
    pool.map(pool_my_func, range(10))
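If you really do need a fresh pool on each iteration, make sure each one is torn down before the next is created so its pipes are released. A minimal sketch (not from the original answer), reusing pool_init, pool_my_func, and shared_array from the question, and relying on the fact that a Pool used as a context manager calls terminate() on exit:

import multiprocessing

if __name__ == '__main__':
    for _ in range(1000):
        # the pool is terminated when the with-block exits, closing the
        # worker pipes before the next pool is created
        with multiprocessing.Pool(8, pool_init, (shared_array, 4)) as pool:
            pool.map(pool_my_func, range(10))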