I'm trying to learn how to use multiprocessing in Python. I've read about multiprocessing, and I'm trying to do something like this:
I have the following class (partial code), which has a method to produce Voronoi diagrams:
class ImageData:
    def generate_voronoi_diagram(self, seeds):
        """
        Generate a Voronoi diagram with *seeds* seeds
        :param seeds: the number of seeds in the Voronoi diagram
        """
        nx = []
        ny = []
        gs = []
        for i in range(seeds):
            # Generate a cell position
            pos_x = random.randrange(self.width)
            pos_y = random.randrange(self.height)
            nx.append(pos_x)
            ny.append(pos_y)
            # Save the f(x,y) data
            x = Utils.translate(pos_x, 0, self.width, self.range_min, self.range_max)
            y = Utils.translate(pos_y, 0, self.height, self.range_min, self.range_max)
            z = Utils.function(x, y)
            gs.append(z)
        for y in range(self.height):
            for x in range(self.width):
                # Start from the largest possible distance (the image diagonal)
                d_min = math.hypot(self.width - 1, self.height - 1)
                j = -1
                for i in range(seeds):
                    # The Euclidean distance from a cell to the (x, y) point being considered
                    d = math.hypot(nx[i] - x, ny[i] - y)
                    if d < d_min:
                        d_min = d
                        j = i
                self.data[x][y] = gs[j]
I have to generate a large number of these diagrams, which consumes a lot of time, so I thought this was a typical problem to parallelize. I was doing it the "normal" (serial) way, like this:
if __name__ == "__main__":
    entries = []
    for n in range(images):
        entry = ImD.ImageData(width, height)
        entry.generate_voronoi_diagram(seeds)
        entry.generate_heat_map_image("ImagesOutput/Entries/Entry" + str(n))
        entries.append(entry)
To parallelize this, I tried the following:
if __name__ == "__main__":
    entries = []
    seeds = np.random.poisson(100)
    p = Pool()
    entry = ImD.ImageData(width, height)
    res = p.apply_async(entry.generate_voronoi_diagram, (seeds))
    entries.append(entry)
    entry.generate_heat_map_image("ImagesOutput/Entries/EntryX")
But besides the fact that it doesn't work even to generate a single diagram, I don't know how to specify that this has to be done N times.
Any help would be very appreciated. Thanks.
Python's multiprocessing doesn't share memory (unless you explicitly tell it to). That means you won't see "side effects" of any function that gets run in a worker process. Your generate_voronoi_diagram method works by adding data to an entry value, which is a side effect. In order to see the results, you need to pass them back as return values from your function.
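To make that concrete, here's a minimal sketch (not part of your code; Box and fill are names I'm making up for illustration) showing that a mutation made inside a worker is invisible to the parent unless the worker returns the modified object:

import multiprocessing

class Box:
    def __init__(self):
        self.items = []

def fill(box):
    # This mutates the worker's pickled copy of *box*, not the parent's object
    box.items.append("data")
    return box  # returning the copy is what makes the change visible

if __name__ == "__main__":
    box = Box()
    with multiprocessing.Pool(1) as pool:
        result = pool.apply(fill, (box,))
    print(box.items)     # [] -- the parent's object was never touched
    print(result.items)  # ['data'] -- the returned copy carries the change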
Here's one approach that handles the entry instance as an argument and return value:
def do_voroni(entry, seeds):
    entry.generate_voronoi_diagram(seeds)
    return entry
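One design point worth noting: do_voroni needs to be defined at module level (not nested inside another function or inside the if __name__ block) so that multiprocessing can pickle it and send it to the worker processes.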
Now, you can use this function in your worker processes:
if __name__ == "__main__":
    entries = [ImD.ImageData(width, height) for _ in range(images)]
    seeds = numpy.random.poisson(100, images)  # array of values, one per image
    pool = multiprocessing.Pool()
    results = pool.starmap_async(do_voroni, zip(entries, seeds)).get()
    for i, e in enumerate(results):
        e.generate_heat_map_image("ImagesOutput/Entries/Entry{:02d}".format(i))
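Note that starmap_async (and starmap) were only added in Python 3.3. If you're on an older version, a sketch of the same idea using plain pool.map with a small wrapper should work (do_voroni_star is a name I'm introducing here, not something from your code):

def do_voroni_star(args):
    # pool.map passes a single argument, so unpack the (entry, seeds) pair here
    entry, seeds = args
    entry.generate_voronoi_diagram(seeds)
    return entry

if __name__ == "__main__":
    entries = [ImD.ImageData(width, height) for _ in range(images)]
    seeds = numpy.random.poisson(100, images)
    pool = multiprocessing.Pool()
    results = pool.map(do_voroni_star, zip(entries, seeds))
    pool.close()
    pool.join()
    for i, e in enumerate(results):
        e.generate_heat_map_image("ImagesOutput/Entries/Entry{:02d}".format(i))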
The e values in the loop are not references to the objects in the entries list. Rather, they're copies of those objects, which have been passed out to a worker process (which added data to them) and then passed back.
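If you want to convince yourself of that, here's a small sketch (my addition, reusing the same names as above) that checks object identity after the round trip:

if __name__ == "__main__":
    originals = [ImD.ImageData(width, height) for _ in range(images)]
    seeds = numpy.random.poisson(100, images)
    pool = multiprocessing.Pool()
    returned = pool.starmap_async(do_voroni, zip(originals, seeds)).get()
    # Each returned object is a pickled round-trip copy, not the original:
    print(all(r is not o for r, o in zip(returned, originals)))  # True
    # The originals in the parent process are still missing the diagram data.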