Tags: python, numpy, python-multiprocessing, numpy-ndarray

Using NumPy ndenumerate on a Shared Array With Multiprocessing


I have two arrays, arrayone and arraytwo, which have identical dimensions, are static, and never need to be altered. A third array, masterarray, is preallocated. For each row, the selected integers are compiled into a temporary array, which is then placed into masterarray.

Moving along the ndarray's columns (j) within each row (i) is fast; however, I'd like to use multiprocessing to accelerate this and to share these arrays without excessive memory consumption. Specifically, I'd like to run multiple processes that each loop across every column (j) of a given row (i) and write the result to masterarray in shared memory. I've read this answer, but the potential instability of sharedmem led me to ask this question. For reference, my code is as follows:

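import numpy as np

# arrayone, arraytwo and stacked are module-level arrays defined elsewhere.
def gridagg():
    masterarray = np.empty([1228, 2606, 208])
    for index, val in np.ndenumerate(arrayone):
        # Index pairs into stacked for this (row, column) cell.
        selection = arraytwo[index[0]][index[1]]
        piece = stacked[selection[:, 0], selection[:, 1]].tolist()
        # Flatten the nested list by one level.
        piece = [j for i in piece for j in i]
        comparray = np.array(piece)
        if index[1] == 0:
            # First column: start a new row.
            compiled = comparray
        else:
            stage1 = comparray
            stage2 = compiled
            if index[1] == 1:
                compiled = np.stack([stage2, stage1])
            else:
                compiled = np.vstack([stage2, stage1[None, :]])
        if index[1] == 2605:
            # Last column: store the completed row.
            masterarray[index[0], :] = compiled
    return masterarray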

Solution

  • Having multiple processes write to the same array is usually a bad idea. It's better to have the worker tasks only calculate the values and let the main process fill in the array.

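    import multiprocessing as mp
    import numpy as np

    def initialize_arrays(a, b):
        # Runs once in each worker process; stores the read-only inputs as globals.
        global arrayone, arraytwo
        arrayone = a
        arraytwo = b

    def get_masterarray_row(index):
        contents = ...  # placeholder: calculate the contents of masterarray[index] here
        return index, contents

    def main():
        # arrayone and arraytwo must already exist in the parent process.
        masterarray = np.empty([1228, 2606, 208])
        with mp.Pool(initializer=initialize_arrays, initargs=(arrayone, arraytwo)) as pool:
            # Results arrive in completion order, so each one carries its own row index.
            for index, contents in pool.imap_unordered(get_masterarray_row, range(1228)):
                masterarray[index, :] = contents
        return masterarray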
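
The per-row calculation is left open above. As a rough sketch only, here is one way it might look on small synthetic data. It assumes that arraytwo is a 4-D integer array whose cell [i, j] holds k (row, column) index pairs into a 3-D stacked array, which is a guess based on the question's indexing and not something the question confirms. The names init_worker, build_masterarray, stacked_demo and arraytwo_demo are hypothetical. Each worker fills a preallocated (columns, values) buffer instead of calling vstack repeatedly, and only the main process writes into masterarray.

```python
import multiprocessing as mp

import numpy as np


def init_worker(stacked_, arraytwo_):
    # Runs once per worker process; keeps the read-only inputs as globals.
    global stacked, arraytwo
    stacked = stacked_
    arraytwo = arraytwo_


def get_masterarray_row(i):
    # Assumption: arraytwo[i, j] has shape (k, 2) and stacked has shape (H, W, depth),
    # so every cell contributes k * depth values.
    n_cols, k = arraytwo.shape[1], arraytwo.shape[2]
    row = np.empty((n_cols, k * stacked.shape[2]))
    for j in range(n_cols):
        selection = arraytwo[i, j]
        # Same values as the question's tolist() plus one-level flatten.
        row[j] = stacked[selection[:, 0], selection[:, 1]].ravel()
    return i, row


def build_masterarray(stacked_, arraytwo_, processes=2):
    n_rows, n_cols, k, _ = arraytwo_.shape
    masterarray = np.empty((n_rows, n_cols, k * stacked_.shape[2]))
    with mp.Pool(processes, initializer=init_worker,
                 initargs=(stacked_, arraytwo_)) as pool:
        # Only the main process writes to masterarray.
        for i, row in pool.imap_unordered(get_masterarray_row, range(n_rows)):
            masterarray[i] = row
    return masterarray


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    stacked_demo = rng.random((5, 6, 3))                    # hypothetical stand-in for stacked
    arraytwo_demo = rng.integers(0, 5, size=(4, 7, 2, 2))   # hypothetical stand-in for arraytwo
    result = build_masterarray(stacked_demo, arraytwo_demo)
    print(result.shape)
```

Note that initargs sends a copy of the inputs to every worker, so this keeps the writes safe but does not by itself reduce memory use. With the real data, the shapes would come from the actual arrays, and the result would need to match the 1228 x 2606 x 208 layout from the question.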