I have two arrays, arrayone and arraytwo, which have identical dimensions, are static, and will not need to be altered. A third array, masterarray, is preallocated: integers are compiled into an intermediate array, which is then placed into the preallocated masterarray.
Moving along the ndarray's columns (j) for each row (i) is fast, but I'd like to use multiprocessing to accelerate it further and to share these arrays without excessive memory consumption. Specifically, I'd like to run multiple processes that each loop across the columns (j) of a given row (i) and write the result to masterarray in shared memory. I've perused this answer, but the potential instability caused by sharedmem has led me to ask this question. For reference, my code is as follows:
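import numpy as np

def gridagg():
    masterarray = np.empty([1228, 2606, 208])
    for index, val in np.ndenumerate(arrayone):
        # index is (i, j); arraytwo holds the (row, col) pairs to pull from stacked
        selection = arraytwo[index[0]][index[1]]
        piece = stacked[selection[:, 0], selection[:, 1]].tolist()
        piece = [j for i in piece for j in i]  # flatten the nested lists
        comparray = np.array(piece)
        if index[1] == 0:
            # first column of the row: start a new accumulation
            compiled = comparray
        else:
            stage1 = comparray
            stage2 = compiled
            if index[1] == 1:
                compiled = np.stack([stage2, stage1])
            else:
                compiled = np.vstack([stage2, stage1[None, :]])
        if index[1] == 2605:
            # last column of the row: write the finished row into masterarray
            masterarray[index[0], :] = compiled
    return masterarray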
Having multiple processes modify the same array is usually a bad idea. It's better to have the worker tasks only calculate the values and let the main process actually fill in the array.
import multiprocessing as mp
import numpy as np

def initialize_arrays(a, b):
    # Runs once in each worker process and stores the read-only inputs as globals.
    global arrayone, arraytwo
    arrayone = a
    arraytwo = b

def get_masterarray_row(index):
    contents = ...  # calculate the contents of masterarray[index]
    return index, contents

def main():
    masterarray = np.empty([1228, 2606, 208])
    with mp.Pool(initializer=initialize_arrays, initargs=(arrayone, arraytwo)) as pool:
        # Rows arrive in arbitrary order, so each result carries its own index.
        for index, contents in pool.imap_unordered(get_masterarray_row, range(1228)):
            masterarray[index, :] = contents
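To make the pattern above concrete, here is a minimal runnable sketch on toy data. Several things in it are assumptions rather than anything from the question: the small shapes (ROWS, COLS, K, DEPTH), the helper names make_inputs, build_parallel and build_serial, and the layout of stacked and arraytwo (each arraytwo[i, j] is taken to be K pairs of (row, col) indices into stacked). The initializer is handed stacked and arraytwo here, because those are the arrays the row calculation reads. The row function also swaps the per-column loop for a single fancy-indexing call, which should give the same values if the data really has that layout.

```python
import multiprocessing as mp
import numpy as np

# Toy sizes (assumed); the question uses 1228 x 2606 x 208.
ROWS, COLS, K, DEPTH = 4, 5, 3, 2


def make_inputs(seed=0):
    """Hypothetical stand-ins for the question's stacked and arraytwo."""
    rng = np.random.default_rng(seed)
    stacked = rng.integers(0, 100, size=(6, 7, DEPTH))
    arraytwo = np.stack(
        [rng.integers(0, 6, size=(ROWS, COLS, K)),   # row indices into stacked
         rng.integers(0, 7, size=(ROWS, COLS, K))],  # column indices into stacked
        axis=-1,
    )  # shape (ROWS, COLS, K, 2)
    return stacked, arraytwo


def initialize_arrays(s, b):
    # Runs once per worker; keeps the read-only inputs as module globals.
    global stacked, arraytwo
    stacked = s
    arraytwo = b


def get_masterarray_row(index):
    selection = arraytwo[index]                             # (COLS, K, 2)
    picked = stacked[selection[..., 0], selection[..., 1]]  # (COLS, K, DEPTH)
    return index, picked.reshape(COLS, K * DEPTH)


def build_parallel(stacked, arraytwo, processes=2):
    masterarray = np.empty((ROWS, COLS, K * DEPTH))
    with mp.Pool(processes, initializer=initialize_arrays,
                 initargs=(stacked, arraytwo)) as pool:
        # Only the parent process writes to masterarray.
        for index, contents in pool.imap_unordered(get_masterarray_row, range(ROWS)):
            masterarray[index] = contents
    return masterarray


def build_serial(stacked, arraytwo):
    # Reference loop that mirrors the question's element-by-element approach.
    out = np.empty((ROWS, COLS, K * DEPTH))
    for i in range(ROWS):
        for j in range(COLS):
            sel = arraytwo[i, j]
            out[i, j] = stacked[sel[:, 0], sel[:, 1]].ravel()
    return out


if __name__ == "__main__":
    s, b = make_inputs()
    assert np.array_equal(build_parallel(s, b), build_serial(s, b))
```

Because each worker returns a whole row, the only data crossing process boundaries are the inputs (once per worker) and one row per task, and no locking is needed on masterarray.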