
Parallel Processing - Pool - Python


I'm trying to learn how to use multiprocessing in Python. I have read about it, and I am trying to do something like this:

I have the following class (partial code), which has a method to produce Voronoi diagrams:

import math
import random

class ImageData:

    def generate_voronoi_diagram(self, seeds):
        """
        Generate a Voronoi diagram with *seeds* seeds
        :param seeds: the number of seeds in the Voronoi diagram
        """
        nx = []
        ny = []
        gs = []
        for i in range(seeds):
            # Generate a cell position
            pos_x = random.randrange(self.width)
            pos_y = random.randrange(self.height)
            nx.append(pos_x)
            ny.append(pos_y)

            # Save the f(x,y) data
            x = Utils.translate(pos_x, 0, self.width, self.range_min, self.range_max)
            y = Utils.translate(pos_y, 0, self.height, self.range_min, self.range_max)
            z = Utils.function(x, y)

            gs.append(z)

        for y in range(self.height):
            for x in range(self.width):
                # Start from the largest possible distance (the image diagonal)
                d_min = math.hypot(self.width - 1, self.height - 1)
                j = -1
                for i in range(seeds):
                    # The distance from seed i to the x, y point being considered
                    d = math.hypot(nx[i] - x, ny[i] - y)
                    if d < d_min:
                        d_min = d
                        j = i
                # Assign the point the value of its nearest seed
                self.data[x][y] = gs[j]

I have to generate a large number of these diagrams, which takes a lot of time, so I thought this was a typical problem to parallelize. I was doing it the "normal" way, like this:

if __name__ == "__main__":
    entries = []
    for n in range(images):
        entry = ImD.ImageData(width, height)
        entry.generate_voronoi_diagram(seeds)
        entry.generate_heat_map_image("ImagesOutput/Entries/Entry"+str(n))
        entries.append(entry)

To parallelize this, I tried the following:

if __name__ == "__main__":
    entries = []
    seeds = np.random.poisson(100)
    p = Pool()
    entry = ImD.ImageData(width, height)
    res = p.apply_async(entry.generate_voronoi_diagram,(seeds))
    entries.append(entry)
    entry.generate_heat_map_image("ImagesOutput/Entries/EntryX")

But besides not working even for a single diagram, I don't know how to specify that this has to be done N times.

Any help would be greatly appreciated. Thanks.


Solution

  • Python's multiprocessing doesn't share memory between processes (unless you explicitly tell it to). That means you won't see the "side effects" of any function that runs in a worker process. Your generate_voronoi_diagram method works by adding data to an entry object, which is a side effect. To see the results, you need to pass them back as a return value from your function.

    Here's one approach that handles the entry instance as an argument and return value:

    def do_voroni(entry, seeds):
        entry.generate_voronoi_diagram(seeds)
        return entry
    

    Now, you can use this function in your worker processes:

    import multiprocessing
    import numpy

    if __name__ == "__main__":
        entries = [ImD.ImageData(width, height) for _ in range(images)]
        seeds = numpy.random.poisson(100, images) # array of values

        with multiprocessing.Pool() as pool:
            # starmap blocks until every task is done and returns the results in order
            results = pool.starmap(do_voroni, zip(entries, seeds))
        for i, e in enumerate(results):
            e.generate_heat_map_image("ImagesOutput/Entries/Entry{:02d}".format(i))
    

    The e values in the loop are not references to the values in the entries list. Rather, they're copies of those objects, which have been sent to the worker processes (which added data to them) and then sent back.
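
To see the difference between a side effect and a return value in isolation, here is a minimal sketch that does not depend on the question's ImageData class. The Box class and the fill_only and fill_and_return functions are hypothetical stand-ins invented for this illustration: Box.fill mutates the object in place, much as generate_voronoi_diagram fills self.data.

```python
import multiprocessing


class Box:
    """Hypothetical stand-in for ImageData: holds data that a method fills in."""

    def __init__(self):
        self.data = []

    def fill(self, n):
        # Mutates the object in place, like generate_voronoi_diagram does.
        self.data = list(range(n))


def fill_only(box, n):
    # Side effect only: in a worker, this changes the worker's copy and is lost.
    box.fill(n)


def fill_and_return(box, n):
    # Same side effect, but the modified copy is sent back to the caller.
    box.fill(n)
    return box


def main():
    boxes = [Box() for _ in range(3)]
    sizes = [1, 2, 3]

    with multiprocessing.Pool(processes=2) as pool:
        # Arguments are pickled and sent to the workers, so the originals
        # in this process are not modified.
        pool.starmap(fill_only, zip(boxes, sizes))
        untouched = [len(b.data) for b in boxes]

        # Returning the object gives the parent process a filled-in copy.
        filled = pool.starmap(fill_and_return, zip(boxes, sizes))
        returned = [len(b.data) for b in filled]

    return untouched, returned


if __name__ == "__main__":
    untouched, returned = main()
    print("originals after fill_only:", untouched)
    print("copies from fill_and_return:", returned)
```

Run as a script, the first list should contain only zeros, because the parent's Box objects were never touched, while the second should match sizes. The functions are defined at module level because a pool needs to pickle them to send them to the workers, which is also why the answer uses a plain do_voroni function.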