python numpy optimization centroid

Faster way to find centroids of pixel areas


For a game, I made a territory map consisting of pixels, with each territory having a different color. From there, I want to add a name to every territory.

For visual purposes, I want to put each name on the centroid of its area. Therefore, I used PIL to convert the image into a single large matrix. I built a class to record the centroid data of each territory, collected in a dictionary. Then, I iterate over the pixels to compute the centroids. This method is very slow and takes around a minute for a 2400 x 1100 map.

import numpy

territory_map = numpy.array([
    [0, 0, 0, 1, 0, 0, 0],
    [0, 2, 2, 1, 0, 0, 0],
    [2, 2, 1, 1, 3, 3, 3],
    [2, 0, 0, 1, 3, 0, 0],
])

centroid_data = {}

class CentroidRecord(object):
    def __init__(self, x, y):
        super(CentroidRecord, self).__init__()
        self.x = float(x)
        self.y = float(y)
        self.volume = 1

    def add_mass(self, x, y):
        #           new_x = (old_x * old_volume + x) / (old_volume + 1),
        # therefore new_x = old_x + (x - old_x) / v,
        # for v = volume + 1.
        self.volume += 1
        self.x += (x - self.x) / self.volume
        self.y += (y - self.y) / self.volume


for y in range(territory_map.shape[0]):
    for x in range(territory_map.shape[1]):
        cell = territory_map[y][x]
        if cell == 0:
            continue
        if cell not in centroid_data:
            centroid_data[cell] = CentroidRecord(x, y)
        else:
            centroid_data[cell].add_mass(x, y)

for area in centroid_data:
    data = centroid_data[area]
    print(f"{area}: ({data.x}, {data.y})")

This should print the following:

1: (2.8, 1.6)
2: (0.8, 1.8)
3: (4.75, 2.25)

Is there any faster method to do this?


Solution

  • Each coordinate of the centroid for a colour is simply the mean of all the coordinates of points of that colour. Accordingly, we can use a dict comprehension:

    import numpy as np
    
    n_colours = territory_map.max()
    
    {i: tuple(c.mean() for c in np.where(territory_map.T == i)) 
     for i in range(1, n_colours + 1)}
    

    Output:

    {1: (2.8, 1.6), 
     2: (0.8, 1.8), 
     3: (4.75, 2.25)}
    

    Note that we take the transpose because numpy indexes rows (y-coordinate) before columns (x-coordinate); without it, np.where would return each centroid as (y, x) instead of (x, y).

    Time taken on randomly generated data:

    81.6 ms ± 5.36 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
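
The dict comprehension above scans the whole array once per colour. As a possible alternative, here is a minimal sketch that accumulates the coordinate sums for all colours in a single pass with np.bincount. The function name centroids_bincount is hypothetical, and the sketch assumes the map holds non-negative integer labels with 0 as the background. It has not been benchmarked against the timing above, so any speed-up on a map with many territories is an expectation rather than a measurement.

```python
import numpy as np

def centroids_bincount(territory_map):
    """Return {label: (x, y)} for every nonzero label (hypothetical helper).

    Assumes non-negative integer labels, with 0 meaning "no territory".
    """
    # Row (y) and column (x) indices of all non-background pixels.
    ys, xs = np.nonzero(territory_map)
    labels = territory_map[ys, xs]

    # Pixel count and coordinate sums per label, each computed in one pass.
    counts = np.bincount(labels)
    sum_x = np.bincount(labels, weights=xs)
    sum_y = np.bincount(labels, weights=ys)

    # Skip label values that never occur in the map.
    present = np.nonzero(counts)[0]
    return {
        int(i): (float(sum_x[i] / counts[i]), float(sum_y[i] / counts[i]))
        for i in present
    }

territory_map = np.array([
    [0, 0, 0, 1, 0, 0, 0],
    [0, 2, 2, 1, 0, 0, 0],
    [2, 2, 1, 1, 3, 3, 3],
    [2, 0, 0, 1, 3, 0, 0],
])

result = centroids_bincount(territory_map)
print(result)  # {1: (2.8, 1.6), 2: (0.8, 1.8), 3: (4.75, 2.25)}
```

No transpose is needed here because the row and column indices are unpacked by name, and the (x, y) order is chosen when the tuples are built.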