python, parallel-processing, pathos

Getting results from pool.map() of pathos


I have tried to reproduce my issue in the code below:

import numpy as np
from pathos.multiprocessing import ProcessPool
import multiprocessing
shapes_to_display = []
products = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

def hay(i):
    print('PROCESSOR ID: ', multiprocessing.current_process().pid, 'Iteration: ', i)
    product = products[i]
    shapes_to_display.append([i, product])
    return shapes_to_display

pool = ProcessPool(nodes=4)
res = pool.map(hay, range(len(products)))

pool.close()
pool.restart()

res = np.asarray(res)
print(res.shape)

for r in res:
    r = np.asarray(r)
    print(r)

What I want to get as result is:

(10,) [[0 0], [1 1], .. .. .. [9 9]]

What I end up getting:

(10,) [[0 0]] [[1 1]] [[2 2]] [[3 3]] [[0 0] [4 4]] [[1 1] [5 5]] [[2 2] [6 6]] [[3 3] [7 7]] [[0 0] [4 4] [8 8]] [[1 1] [5 5] [9 9]]

Can any of you help me figure this out? What do I need to do to get the result the way I want?


Solution

  • Q : "What do I need to do to get the result the way I want to ?"

    You first need to understand the toys & their mechanics.

    Process-based pathos distributed computing does not uniformly "share" variables "across" all the remote processes. Rather the very opposite happens: precisely because these processes are separate, the central and all the distributed GIL-locks become & remain independent, each free to run its computation without waiting for its turn in the otherwise un-escapable [SERIAL] queue of a single Python interpreter, where all wait but one does a small amount of work, if it is its turn, in an endless GIL-lock queue of One_step-after-Another_step-after-Another_step-...

    Pathos helped you cut this pure-[SERIAL] chain in which all wait while one (and only one) works.

    The cut (sending some work into fully-independent processes) also means that those processes know nothing but the state of the original (main) Python process at the time of their instantiation (yes, they were created as a full copy of the originating Python interpreter, including its whole internal state, i.e. with all import-s done and all data structures fully replicated into the herd of new process copies).

    This helps for doing some serious work, yet it is pretty expensive to deploy all that heavy artillery for cases where only a few computing steps get done at the far end of the remote-process code execution; such use-cases never justify the add-on costs that were burnt to make it happen.

    Yet the process independence also means that there is no way (unless explicitly programmed) any independent process will ever hear about a variable assignment made in any other of the (still independent) processes.

    So doing an .append() is always trouble: it modifies only the remote copy of a variable that was instantiated in the main interpreter before being replicated into the remote, independent instances.
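
    To see that in action, here is a minimal sketch (the names collected and worker are only illustrative): each worker appends to its own process-local copy, while the list in the parent process remains empty.

    from pathos.multiprocessing import ProcessPool
    import os

    collected = []                               # lives in the main interpreter

    def worker(i):
        collected.append(i)                      # mutates this process's private copy only
        return (os.getpid(), len(collected))     # length grows per worker, not globally

    pool = ProcessPool(nodes=4)
    print(pool.map(worker, range(8)))            # e.g. [(pidA, 1), (pidB, 1), ..., (pidA, 2), ...]
    pool.close()

    print(collected)                             # [] -- the parent's copy was never touched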

    Do a return [i, product] in hay(i) and pool.map() will assemble these respective results into the shape and form you expected.
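
    Applied to the original example, the rework could look like this (a sketch of the suggested fix; note the assembled array then has shape (10, 2), one [i, product] pair per row):

    import numpy as np
    from pathos.multiprocessing import ProcessPool
    import multiprocessing

    products = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

    def hay(i):
        print('PROCESSOR ID: ', multiprocessing.current_process().pid, 'Iteration: ', i)
        return [i, products[i]]                   # hand the pair back instead of appending

    pool = ProcessPool(nodes=4)
    res = pool.map(hay, range(len(products)))     # pool.map() collects the returned pairs in order
    pool.close()

    res = np.asarray(res)
    print(res.shape)                              # (10, 2)
    print(res)                                    # [[0 0] [1 1] ... [9 9]]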