python parallel-processing pickle scikit-image joblib

Multiprocessing with joblib and skimage: PicklingError: Could not pickle the task to send it to the workers


I'm trying to parallelize the task of finding minimum cost paths through a raster cost surface, but I keep bumping into the same PicklingError: Could not pickle the task to send it to the workers.

This is a code example of what's going on:

import numpy as np
from skimage.graph import MCP_Geometric
import timeit
from joblib import Parallel, delayed

np.random.seed(123)  
cost_surface = np.random.rand(1000, 1000)  
mcp = MCP_Geometric(cost_surface)
pois = [(np.random.randint(0, 1000), np.random.randint(0, 1000)) for _ in range(20)]

def task(poi):
    costs_array, traceback = mcp.find_costs(starts=[poi], ends=pois)
    ends_idx = tuple(np.asarray(pois).T.tolist())
    costs = costs_array[ends_idx]
    tracebacks = [mcp.traceback(end) for end in pois]

Parallel(n_jobs=6)(delayed(task)(poi) for poi in pois)

I'm fairly new to parallelizing tasks, but the code I'm running might take weeks if done sequentially, so I want to take advantage of parallel processing. I understand that some complex objects cannot be pickled, so I'm also open to alternatives.


Solution

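  • This happens because MCP_Geometric objects cannot be pickled. To send the task to the worker processes, joblib has to serialize it together with the global mcp object it refers to, and that step fails. Move the initialization of this class into the task function instead: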

    
    import numpy as np
    from skimage.graph import MCP_Geometric
    from joblib import Parallel, delayed
    
    np.random.seed(123)
    cost_surface = np.random.rand(1000, 1000)
    pois = [(np.random.randint(0, 1000), np.random.randint(0, 1000)) for _ in range(20)]
    
    def task(poi):
        # Each call builds its own MCP_Geometric instance inside the worker,
        # so the `mcp` object never has to be pickled.
        mcp = MCP_Geometric(cost_surface)
        costs_array, _ = mcp.find_costs(starts=[poi], ends=pois)
        ends_idx = tuple(np.asarray(pois).T.tolist())
        costs = costs_array[ends_idx]
        tracebacks = [mcp.traceback(end) for end in pois]
        # Return the results so the parent process can collect them.
        return costs, tracebacks
    
    results = Parallel(n_jobs=6)(delayed(task)(poi) for poi in pois)
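
To see the same pattern without skimage or joblib installed, here is a minimal sketch that uses only the standard library. The Unpicklable class is a hypothetical stand-in for an object like MCP_Geometric (it holds a lock, which pickle refuses to serialize), and can_pickle is a helper name made up for this illustration. It is not how joblib checks tasks internally; it only shows why passing plain data and building the object inside the task avoids the problem.

```python
import pickle
import threading


class Unpicklable:
    """Hypothetical stand-in for an object that pickle cannot serialize."""

    def __init__(self, data):
        self.data = data
        self._lock = threading.Lock()  # lock objects cannot be pickled

    def total(self):
        return sum(self.data)


def can_pickle(obj):
    """Return True if pickle can serialize obj, False otherwise."""
    try:
        pickle.dumps(obj)
    except Exception:
        return False
    return True


def task(data):
    # Only plain, picklable data crosses the process boundary;
    # the unpicklable object is built inside the task itself.
    return Unpicklable(data).total()


data = [1, 2, 3]
print(can_pickle(Unpicklable(data)))  # False
print(can_pickle(data))               # True
print(task(data))                     # 6
```

A quick check like can_pickle on the objects a task refers to can help narrow down which one triggers the PicklingError before restructuring the code.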